Getting Started with HoML

This guide will walk you through setting up HoML, running your first model, and interacting with it through the command line and the OpenAI-compatible API.

Step 1: Install HoML

First, head over to the download page to get the HoML CLI for your system. Once it's installed, you need to set up the HoML server. This one-time command will download and configure the necessary components.

homl server install

This will start an OpenAI-compatible API server on the port you configured (7456 by default).
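
To confirm the server is reachable before moving on, you can query it from Python. This is a minimal sketch assuming HoML follows the usual OpenAI convention of exposing a model-listing endpoint at /v1/models on the default port:

import json
import urllib.request

# Assumption: like most OpenAI-compatible servers, HoML answers
# GET /v1/models with a JSON object containing a "data" list.
# The list may be empty until you pull a model in Step 2.
with urllib.request.urlopen("http://localhost:7456/v1/models") as resp:
    payload = json.load(resp)

for model in payload.get("data", []):
    print(model.get("id"))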

Even better: Install Open WebUI (since v0.2.2)

homl server install --webui

This will start an Open WebUI server on port 7457.

Step 2: Pull a Model

Next, download a model to run on your local machine. We'll use a small, efficient model for this example. The server will start automatically in the background.

homl pull qwen3:0.6b

Step 3: Chat with Your Model

If you installed with the --webui option, head over to http://localhost:7457 to access the web interface.

You can also start a conversation directly from your terminal. This is the easiest way to interact with your model.

homl chat qwen3:0.6b

Step 4: Use the OpenAI-Compatible API

HoML exposes an API that is compatible with OpenAI's tools and libraries. The server runs by default on port 7456. You don't need to run the model separately; the server loads it automatically when it receives a request.

Using `curl`

You can send a request to the API using `curl` from your terminal.

curl -X POST http://localhost:7456/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:0.6b",
    "messages": [
      {
        "role": "user",
        "content": "Explain the importance of low-latency LLMs"
      }
    ]
  }'

Using Python

You can also use the official OpenAI Python client to interact with the API. First, install the library:

pip install openai

Then, use the following Python script:

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:7456/v1',
    api_key='homl' # required, but unused
)

response = client.chat.completions.create(
    model="qwen3:0.6b",
    messages=[
        {"role": "user", "content": "Explain the importance of low-latency LLMs"}
    ]
)

print(response.choices[0].message.content)
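
For interactive use, you may prefer to receive tokens as they are generated rather than waiting for the full response. Assuming HoML honors the standard stream parameter of the chat completions API (not confirmed here), the same client can stream:

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:7456/v1',
    api_key='homl'  # required, but unused
)

# Assumption: HoML supports the standard OpenAI `stream` flag.
stream = client.chat.completions.create(
    model="qwen3:0.6b",
    messages=[
        {"role": "user", "content": "Explain the importance of low-latency LLMs"}
    ],
    stream=True
)

# Print each token delta as it arrives.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)
print()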

Step 5: Use the OpenAI-Compatible Completion API

HoML also supports the standard OpenAI /v1/completions endpoint for text completion tasks. You can use this endpoint with tools like curl or the OpenAI Python client.

Using `curl`

Send a completion request from your terminal:

curl -X POST http://localhost:7456/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:0.6b",
    "prompt": "What is the capital of France?",
    "max_tokens": 32
  }'

Using Python

You can also use the OpenAI Python client for completions:

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:7456/v1',
    api_key='homl' # required, but unused
)

response = client.completions.create(
    model="qwen3:0.6b",
    prompt="What is the capital of France?",
    max_tokens=32
)

print(response.choices[0].text)
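
Because the API is OpenAI-compatible, existing tools built on the official client can often be pointed at HoML without code changes. The OpenAI Python client reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment, so a sketch like this (assuming the default port) needs no explicit configuration in code:

import os

# Point any openai-client-based code at the local HoML server.
# setdefault keeps real values if they are already configured.
os.environ.setdefault("OPENAI_BASE_URL", "http://localhost:7456/v1")
os.environ.setdefault("OPENAI_API_KEY", "homl")  # required, but unused

from openai import OpenAI

client = OpenAI()  # picks up the environment variables above

response = client.chat.completions.create(
    model="qwen3:0.6b",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)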