HoML CLI Documentation
Install HoML
Go to the download page to get the HoML CLI for your system. Once installed, run the following command to set up the HoML server:
homl server install
Pull a model from Hugging Face Hub
Download a model to your local machine. You can use a shorthand alias for curated models.
homl pull qwen3:0.6b
Or use the full Hugging Face model ID:
homl pull Qwen/Qwen3-0.6B
Run a model
Run a downloaded model. This will start the model and make it available for chat and API access.
homl run qwen3:0.6b
Run a model in interactive chat mode
Start a conversation with a model.
homl chat qwen3:0.6b
List local models
List all models that are available locally.
homl list
Check running models
Check the status of models that are currently running.
homl ps
Stop a model
Stop a running model to free up resources.
homl stop qwen3:0.6b
Automatic GPU Memory Management
HoML is designed to manage your GPU resources efficiently. When you make a request to the OpenAI-compatible API for a specific model, HoML automatically loads it into memory. Currently, only one model can run at a time. If you make a request for a different model, HoML will unload the previous one and load the new one.
To free up your GPU for other applications, models are automatically unloaded after a period of inactivity. The default idle timeout is 10 minutes. You can configure this timeout using the homl config set model_unload_idle_time <seconds>
command.
Authenticate with Hugging Face
Set your Hugging Face token to pull private or gated models. You can provide the token directly or load it automatically from the default Hugging Face cache.
homl auth hugging-face <your-token>
Or load it automatically:
homl auth hugging-face --auto
Manage HoML Server
You can manage the HoML server with the following commands:
homl server stop
homl server restart
homl server log
Manage HoML Configuration
You can manage the HoML configuration with the following commands:
homl config list
Get a config value:
homl config get port
Set a config value:
homl config set port 8080