HoML Logo

HoML

Welcome to HoML

The easiest & fastest way to run LLMs in your home lab.

HoML

Key Features

Ollama-like Experience

A simple, intuitive CLI that just works, inspired by Ollama.

High-Performance Inference

Powered by vLLM for maximum speed and throughput.

Automatic GPU Memory Management

Models load on demand and unload automatically after a configurable idle time (default 10 mins), freeing up your GPU for other tasks.

OpenAI-Compatible API

Integrate seamlessly with your existing tools and workflows.

Frequently Used Commands

Pull a model from Hugging Face Hub

Download a model to your local machine. You can use a shorthand alias for curated models.

homl pull qwen3:0.6b

Run a model

Run a downloaded model. This will start the model and make it available for chat and API access.

homl run qwen3:0.6b

Run a model in interactive chat mode

Start a conversation with a model.

homl chat qwen3:0.6b