HUD gives you three things: a unified API for every model, a way to turn your code into agent-callable tools, and infrastructure to run evaluations at scale.

Install

# Install CLI
uv tool install hud-python --python 3.12

# Set your API key
hud set HUD_API_KEY=your-key-here
Get your API key at hud.ai/settings/api-keys.

1. Gateway: Any Model, One API

Stop juggling API keys. Point any OpenAI-compatible client at inference.hud.ai and use Claude, GPT, Gemini, or Grok:
from openai import AsyncOpenAI
import os

client = AsyncOpenAI(
    base_url="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"]
)

response = await client.chat.completions.create(
    model="claude-sonnet-4-5",  # or gpt-4o, gemini-2.5-pro, grok-4-1-fast...
    messages=[{"role": "user", "content": "Hello!"}]
)
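Because every model sits behind the same endpoint, switching providers is just a different model string. A minimal sketch reusing the client above (the model list and prompt are illustrative):
models = ["claude-sonnet-4-5", "gpt-4o", "gemini-2.5-pro", "grok-4-1-fast"]

for model in models:
    # Same client, same request shape; only the model string changes
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Summarize HUD in one sentence."}]
    )
    print(model, response.choices[0].message.content)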
Every call is traced. View them at hud.ai/home. More on Gateway

2. Environments: Your Code, Agent-Ready

A production API is a single live instance with shared state: you can't run 1,000 parallel tests against it without them stepping on each other. Environments spin up fresh for every evaluation, so each run is isolated, deterministic, and reproducible, and each run generates training data. Turn your code into tools agents can call, then define scenarios that evaluate what agents do:
from hud import Environment

env = Environment("my-env")

@env.tool()
def search(query: str) -> str:
    """Search the knowledge base."""
    return db.search(query)

@env.scenario("find-answer")
async def find_answer(question: str):
    answer = yield f"Find the answer to: {question}"
    yield 1.0 if "correct" in answer.lower() else 0.0
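Scoring doesn't have to be a keyword check on the word "correct". Here is a sketch of a second scenario that scores against an expected answer; the extra expected parameter is an assumption, passed as a keyword argument the same way question is in the eval example below:
@env.scenario("exact-answer")
async def exact_answer(question: str, expected: str):
    # First yield: the prompt the agent receives
    answer = yield f"Answer concisely: {question}"
    # Second yield: the score (1.0 if the expected string appears in the answer)
    yield 1.0 if expected.lower() in answer.lower() else 0.0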
Scenarios define the prompt (first yield) and the scoring logic (second yield). The agent runs in between. More on Environments

3. Evals: Test and Improve

Run your scenario with different models. Compare results:
import hud

task = env("find-answer", question="What is 2+2?")

async with hud.eval(task, variants={"model": ["gpt-4o", "claude-sonnet-4-5"]}, group=5) as ctx:
    response = await client.chat.completions.create(
        model=ctx.variants["model"],
        messages=[{"role": "user", "content": ctx.prompt}]
    )
    await ctx.submit(response.choices[0].message.content)
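You can also vary more than the model. The sketch below assumes variants accepts multiple keys and runs every combination, which is an assumption about the API rather than something shown above; temperature is a standard chat-completions parameter:
# Assumption: multi-key variants are crossed (model x temperature)
async with hud.eval(
    task,
    variants={"model": ["gpt-4o", "claude-sonnet-4-5"], "temperature": [0.0, 0.7]},
    group=5,
) as ctx:
    response = await client.chat.completions.create(
        model=ctx.variants["model"],
        temperature=ctx.variants["temperature"],
        messages=[{"role": "user", "content": ctx.prompt}]
    )
    await ctx.submit(response.choices[0].message.content)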
Variants test different configurations; groups repeat each variant so you can see the distribution of scores. Results show up on hud.ai with scores, traces, and side-by-side comparisons. More on A/B Evals

4. Deploy and Scale

Push your environment to GitHub, connect it on hud.ai, and run thousands of evals in parallel. Every run generates training data.
hud init                    # Scaffold environment
git push                    # Push to GitHub
# Connect on hud.ai → New → Environment
hud eval my-eval --model gpt-4o --group-size 100
More on Deploy

Enterprise

Building agents at scale? We work with teams on custom environments, benchmarks, and training pipelines. 📅 Book a call · 📧 [email protected]