LLaVA
LLaVA (Large Language and Vision Assistant) is a large multimodal model that combines a vision encoder with a large language model for general-purpose visual and language understanding.
llava-1.5-7b
7B parameters
Multimodal
Pull this model
Use the following command with the HoML CLI:
```shell
homl pull llava:1.5-7b
```
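Once the model is pulled, a multimodal request pairs a text prompt with an image attachment in the OpenAI-style chat format. The sketch below only builds such a request payload; the endpoint path, server address, and the assumption that HoML exposes an OpenAI-compatible API are illustrative and not documented here — check your HoML setup for the actual serving details.

```python
def build_vision_request(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat payload that pairs text with an image.

    This mirrors the widely used multi-part "content" format for vision
    models; whether HoML accepts exactly this shape is an assumption.
    """
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

payload = build_vision_request(
    "llava:1.5-7b",
    "What is in this image?",
    "https://example.com/photo.jpg",  # placeholder image URL
)
# POST this payload to the server's chat-completions endpoint
# (e.g. with requests.post), assuming an OpenAI-compatible API.
print(payload["model"])
```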
Resource Requirements
| Quantization | Disk Space | GPU Memory |
|---|---|---|
| BF16 | 14 GB | 14 GB |
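The figures in these tables follow directly from the precision: BF16 stores each parameter in 2 bytes, so model weights occupy roughly 2 GB per billion parameters, on disk and in GPU memory alike. A minimal sketch of that arithmetic (note it counts weights only; actual serving also needs memory for the KV cache and activations):

```python
def bf16_weight_gb(params_billion: float) -> float:
    # BF16 = 2 bytes per parameter; using 1 GB = 1e9 bytes to match
    # the rounded figures in the tables on this page.
    return params_billion * 2

print(bf16_weight_gb(7))   # 14.0 GB, as listed for the 7B variants
print(bf16_weight_gb(13))  # 26.0 GB for the 13B variants
print(bf16_weight_gb(34))  # 68.0 GB for llava-v1.6-34b
```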
llava-1.5-13b
13B parameters
Multimodal
Pull this model
Use the following command with the HoML CLI:
```shell
homl pull llava:1.5-13b
```
Resource Requirements
| Quantization | Disk Space | GPU Memory |
|---|---|---|
| BF16 | 26 GB | 26 GB |
llava-v1.6-mistral-7b
7B parameters
Multimodal
Pull this model
Use the following command with the HoML CLI:
```shell
homl pull llava:v1.6-mistral-7b
```
Resource Requirements
| Quantization | Disk Space | GPU Memory |
|---|---|---|
| BF16 | 14 GB | 14 GB |
llava-v1.6-vicuna-7b
7B parameters
Multimodal
Pull this model
Use the following command with the HoML CLI:
```shell
homl pull llava:v1.6-vicuna-7b
```
Resource Requirements
| Quantization | Disk Space | GPU Memory |
|---|---|---|
| BF16 | 14 GB | 14 GB |
llava-v1.6-vicuna-13b
13B parameters
Multimodal
Pull this model
Use the following command with the HoML CLI:
```shell
homl pull llava:v1.6-vicuna-13b
```
Resource Requirements
| Quantization | Disk Space | GPU Memory |
|---|---|---|
| BF16 | 26 GB | 26 GB |
llava-v1.6-34b
34B parameters
Multimodal
Pull this model
Use the following command with the HoML CLI:
```shell
homl pull llava:v1.6-34b
```
Resource Requirements
| Quantization | Disk Space | GPU Memory |
|---|---|---|
| BF16 | 68 GB | 68 GB |