## Key Features

- **Ollama-like Experience:** A simple, intuitive CLI that just works, inspired by Ollama.
- **High-Performance Inference:** Powered by vLLM for maximum speed and throughput.
- **Automatic GPU Memory Management:** Models load on demand and unload automatically after a configurable idle timeout (10 minutes by default), freeing your GPU for other tasks.
- **OpenAI-Compatible API:** Integrate seamlessly with your existing tools and workflows.
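Because the API is OpenAI-compatible, any OpenAI client can talk to a running model. Below is a minimal sketch that builds a standard chat-completions request using only the Python standard library; the host and port in `BASE_URL` are placeholders, not homl defaults, so use the address your homl server actually reports.

```python
import json
import urllib.request

# Placeholder local address: homl exposes an OpenAI-compatible API, but the
# host/port here are assumptions -- use the address your homl server reports.
BASE_URL = "http://localhost:8000/v1"

# A standard OpenAI chat-completions request body.
payload = {
    "model": "qwen3:0.6b",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once a model is running (e.g. after `homl run qwen3:0.6b`):
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

The same endpoint works with existing OpenAI SDKs by pointing their base URL at the local server, so tools built against the OpenAI API need no code changes.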
## Frequently Used Commands

### Pull a model from Hugging Face Hub

Download a model to your local machine. You can use a shorthand alias for curated models.

```shell
homl pull qwen3:0.6b
```

### Run a model

Run a downloaded model. This starts the model and makes it available for chat and API access.

```shell
homl run qwen3:0.6b
```

### Run a model in interactive chat mode

Start a conversation with a model.

```shell
homl chat qwen3:0.6b
```