Getting Started with HoML

This guide will walk you through setting up HoML, running your first model, and interacting with it through the command line and the OpenAI-compatible API.

Step 1: Install HoML

First, head over to the download page to get the HoML CLI for your system. Once it's installed, you need to set up the HoML server. This one-time command will download and configure the necessary components.

homl server install

This will start an OpenAI-compatible API server on the port you configured (7456 by default).
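
To confirm the server is reachable before moving on, you can query it from Python. This is a minimal sketch assuming HoML follows the usual OpenAI convention of exposing a model-listing endpoint at /v1/models on the default port:

import json
import urllib.request

# Assumption: like most OpenAI-compatible servers, HoML answers
# GET /v1/models with a JSON object containing a "data" list.
# The list may be empty until you pull a model in Step 2.
with urllib.request.urlopen("http://localhost:7456/v1/models") as resp:
    payload = json.load(resp)

for model in payload.get("data", []):
    print(model.get("id"))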

Even better: Install Open WebUI (since v0.2.2)

homl server install --webui

This will start an Open WebUI server on port 7457.

Step 2: Pull a Model

Next, download a model to run on your local machine. We'll use a small, efficient model for this example. The server will start automatically in the background.

homl pull qwen3:0.6b

Step 3: Chat with Your Model

If you installed with the --webui option, head over to http://localhost:7457 to access the web interface.

You can also start a conversation directly from your terminal. This is the easiest way to interact with your model.

homl chat qwen3:0.6b

Step 4: Use the OpenAI-Compatible API

HoML exposes an API that is compatible with OpenAI's tools and libraries. The server runs by default on port 7456. You don't need to run the model separately; the server loads it automatically when it receives a request.

Using `curl`

You can send a request to the API using `curl` from your terminal.

curl -X POST http://localhost:7456/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:0.6b",
    "messages": [
      {
        "role": "user",
        "content": "Explain the importance of low-latency LLMs"
      }
    ]
  }'

Using Python

You can also use the official OpenAI Python client to interact with the API. First, install the library:

pip install openai

Then, use the following Python script:

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:7456/v1',
    api_key='homl' # required, but unused
)

response = client.chat.completions.create(
    model="qwen3:0.6b",
    messages=[
        {"role": "user", "content": "Explain the importance of low-latency LLMs"}
    ]
)

print(response.choices[0].message.content)
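
For interactive use, you may prefer to receive tokens as they are generated rather than waiting for the full response. Assuming HoML honors the standard stream parameter of the chat completions API (not confirmed here), the same client can stream:

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:7456/v1',
    api_key='homl'  # required, but unused
)

# Assumption: HoML supports the standard OpenAI `stream` flag.
stream = client.chat.completions.create(
    model="qwen3:0.6b",
    messages=[
        {"role": "user", "content": "Explain the importance of low-latency LLMs"}
    ],
    stream=True
)

# Print each token delta as it arrives.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)
print()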

Step 5: Use the OpenAI-Compatible Completion API

HoML also supports the standard OpenAI /v1/completions endpoint for text completion tasks. You can use this endpoint with tools like curl or the OpenAI Python client.

Using `curl`

Send a completion request from your terminal:

curl -X POST http://localhost:7456/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:0.6b",
    "prompt": "What is the capital of France?",
    "max_tokens": 32
  }'

Using Python

You can also use the OpenAI Python client for completions:

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:7456/v1',
    api_key='homl' # required, but unused
)

response = client.completions.create(
    model="qwen3:0.6b",
    prompt="What is the capital of France?",
    max_tokens=32
)

print(response.choices[0].text)
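
Because the API is OpenAI-compatible, existing tools built on the official client can often be pointed at HoML without code changes. The OpenAI Python client reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment, so a sketch like this (assuming the default port) needs no explicit configuration in code:

import os

# Point any openai-client-based code at the local HoML server.
# setdefault keeps real values if they are already configured.
os.environ.setdefault("OPENAI_BASE_URL", "http://localhost:7456/v1")
os.environ.setdefault("OPENAI_API_KEY", "homl")  # required, but unused

from openai import OpenAI

client = OpenAI()  # picks up the environment variables above

response = client.chat.completions.create(
    model="qwen3:0.6b",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)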