Use cases

What people build with NEO

Real projects built by NEO — from LLM benchmarks to agent swarms. Pick a workflow below to browse, or start with a featured use case.

Featured

150+ tasks, 10 categories

Evaluate & Benchmark

Benchmarking LLMs on Real Tasks

An async LLM benchmarking platform that evaluates models from OpenAI, Anthropic, Google, and more across 150+ real-world tasks covering coding, reasoning, structured output, and...
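As a rough illustration of the async evaluation pattern described above, here is a minimal sketch, not the project's actual code. It assumes each model is wrapped as an async callable and that scoring is an exact match; `Task`, `echo_model`, and `run_benchmark` are hypothetical names invented for this example, and `echo_model` stands in for a real provider client.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

@dataclass
class Task:
    category: str   # e.g. "coding", "reasoning"
    prompt: str
    expected: str

# Hypothetical model interface: takes a prompt, returns the model's answer.
ModelFn = Callable[[str], Awaitable[str]]

async def echo_model(prompt: str) -> str:
    """Stand-in for a real provider call; simply upper-cases the prompt."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return prompt.upper()

async def run_benchmark(
    models: dict[str, ModelFn],
    tasks: list[Task],
    max_concurrency: int = 8,
) -> dict[str, dict[str, float]]:
    """Run every model on every task concurrently; return accuracy per category."""
    sem = asyncio.Semaphore(max_concurrency)  # cap the number of in-flight calls

    async def score(name: str, fn: ModelFn, task: Task):
        async with sem:
            answer = await fn(task.prompt)
        # Assumption: exact-match scoring; real evals usually need richer graders.
        return name, task.category, float(answer.strip() == task.expected)

    results = await asyncio.gather(
        *(score(n, f, t) for n, f in models.items() for t in tasks)
    )

    # Group scores by model and category, then average them.
    grouped: dict[str, dict[str, list[float]]] = {}
    for name, category, s in results:
        grouped.setdefault(name, {}).setdefault(category, []).append(s)
    return {
        name: {cat: sum(v) / len(v) for cat, v in cats.items()}
        for name, cats in grouped.items()
    }

if __name__ == "__main__":
    demo_tasks = [Task("coding", "abc", "ABC"), Task("reasoning", "hi", "HI")]
    print(asyncio.run(run_benchmark({"echo": echo_model}, demo_tasks)))
```

The semaphore keeps the number of concurrent requests bounded, which matters once the model callables hit rate-limited APIs.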


Dual-LLM optimization loop

Evaluate & Benchmark

Auto prompt optimization

Closed-loop system: an optimizer LLM writes prompts and reads failure summaries, a target LLM runs batches against synthetic data, and a JSON ledger tracks every iteration until scores converge.
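The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: both LLMs are replaced by plain Python callables, scoring is an exact match, and "converged" is taken to mean the score changed by less than `tol` between iterations. All names (`optimize_prompt`, `toy_optimizer`, `toy_target`) are hypothetical.

```python
import json
from typing import Callable

Example = tuple[str, str]  # (input text, expected output)

def optimize_prompt(
    propose: Callable[[str, list[str]], str],  # optimizer: (prompt, failure summaries) -> new prompt
    run_target: Callable[[str, str], str],     # target: (prompt, input text) -> output
    dataset: list[Example],                    # assumed non-empty
    seed_prompt: str,
    max_iters: int = 10,
    tol: float = 1e-3,
) -> list[dict]:
    """Iterate until the score stops changing; return the ledger of all iterations."""
    ledger: list[dict] = []
    prompt, failures = seed_prompt, []
    for i in range(max_iters):
        if i > 0:
            # The optimizer sees the previous prompt and its failure summaries.
            prompt = propose(prompt, failures)
        failures = []
        for text, expected in dataset:
            got = run_target(prompt, text)
            if got != expected:
                failures.append(f"input={text!r} expected={expected!r} got={got!r}")
        score = 1 - len(failures) / len(dataset)
        ledger.append({"iteration": i, "prompt": prompt, "score": score, "failures": failures})
        # Convergence check (assumption): score moved by less than tol.
        if len(ledger) >= 2 and abs(score - ledger[-2]["score"]) < tol:
            break
    return ledger

# Toy stand-ins for the two LLMs, only to make the sketch runnable.
def toy_optimizer(prompt: str, failures: list[str]) -> str:
    return prompt + " Respond in uppercase." if failures else prompt

def toy_target(prompt: str, text: str) -> str:
    return text.upper() if "uppercase" in prompt else text

if __name__ == "__main__":
    data = [("hello", "HELLO"), ("neo", "NEO")]
    ledger = optimize_prompt(toy_optimizer, toy_target, data, "Echo the input.")
    print(json.dumps(ledger, indent=2))  # the JSON ledger of every iteration
```

Keeping every iteration in the ledger, rather than only the best prompt, makes it possible to audit why a prompt changed and to resume a run later.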


+4.62% returns, 10 agents

Build Agents

Trading Agent Swarm

10 specialized agents coordinating over an async message bus: +4.62% returns across 250 days of S&P 500 data.
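To make the coordination pattern concrete, here is a minimal sketch of agents talking over an async message bus. It is an assumption-laden toy, not the swarm itself: it uses two agents instead of ten, an in-process topic bus built on `asyncio.Queue`, and a deliberately naive rule (buy on a price drop, otherwise sell) that is not the strategy behind the reported returns. `MessageBus`, `signal_agent`, `execution_agent`, and `run_swarm` are hypothetical names.

```python
import asyncio
from collections import defaultdict

class MessageBus:
    """Tiny in-process pub/sub bus: one asyncio.Queue per subscriber per topic."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, topic: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[topic].append(queue)
        return queue

    async def publish(self, topic: str, message) -> None:
        for queue in self._subscribers[topic]:
            await queue.put(message)

async def signal_agent(bus: MessageBus, prices: asyncio.Queue) -> None:
    """Toy rule (assumption): 'buy' when the price drops, otherwise 'sell'."""
    prev = None
    while (price := await prices.get()) is not None:  # None marks end of stream
        if prev is not None:
            await bus.publish("signals", "buy" if price < prev else "sell")
        prev = price
    await bus.publish("signals", None)  # propagate shutdown downstream

async def execution_agent(signals: asyncio.Queue) -> list[str]:
    """Collects the signals it receives; a real agent would place orders."""
    orders = []
    while (signal := await signals.get()) is not None:
        orders.append(signal)
    return orders

async def run_swarm(price_series: list[float]) -> list[str]:
    bus = MessageBus()
    prices_q = bus.subscribe("prices")
    signals_q = bus.subscribe("signals")
    signal_task = asyncio.create_task(signal_agent(bus, prices_q))
    exec_task = asyncio.create_task(execution_agent(signals_q))
    for price in price_series:
        await bus.publish("prices", price)
    await bus.publish("prices", None)
    await signal_task
    return await exec_task

if __name__ == "__main__":
    print(asyncio.run(run_swarm([100.0, 98.0, 101.0])))
```

Because agents only share topics, not references to each other, adding a specialized agent means subscribing it to the topics it cares about without touching the existing ones.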


Browse by workflow

Same stack you're already debugging

Agents with brittle tool calls. Prompts that need another pass. Evals before you trust a model swap. NEO lives in VS Code or Cursor and helps you turn that work into real code and runs, so you iterate on behavior, not boilerplate.

Get started