Deep Agents Evals

End-to-end behavioral evaluation suite for the Deep Agents SDK. Each eval runs an agent against a real LLM, captures the full trajectory (tool calls, file mutations, final response), and scores it on correctness and efficiency.
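To make "trajectory" and "scored on correctness and efficiency" concrete, here is a minimal sketch. It is hypothetical and is not the suite's actual code. `ToolCall`, `Trajectory`, `Score`, `score_trajectory`, the file-content check, the `must_mention` check, and the tool-call budget are all illustrative assumptions. The real eval harness and scoring rules are documented in CONTRIBUTING.md.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    # Hypothetical record of one tool invocation made by the agent.
    name: str
    args: dict


@dataclass
class Trajectory:
    # Hypothetical container for what one eval run captures.
    tool_calls: list[ToolCall] = field(default_factory=list)
    files: dict[str, str] = field(default_factory=dict)  # path -> final contents
    final_response: str = ""


@dataclass
class Score:
    correct: bool
    efficiency: float  # 1.0 means the agent stayed within the tool-call budget


def score_trajectory(
    traj: Trajectory,
    expected_files: dict[str, str],
    must_mention: str,
    tool_call_budget: int,
) -> Score:
    """Score a trajectory on correctness and efficiency (illustrative rules only)."""
    # Correctness (assumed rule): every expected file has the expected contents,
    # and the final response mentions the required string (case-insensitive).
    files_ok = all(traj.files.get(path) == body for path, body in expected_files.items())
    response_ok = must_mention.lower() in traj.final_response.lower()
    # Efficiency (assumed rule): budget / calls made, capped at 1.0.
    n_calls = len(traj.tool_calls)
    efficiency = 1.0 if n_calls <= tool_call_budget else tool_call_budget / n_calls
    return Score(correct=files_ok and response_ok, efficiency=efficiency)


if __name__ == "__main__":
    traj = Trajectory(
        tool_calls=[
            ToolCall("write_file", {"path": "notes.txt", "content": "hello"}),
            ToolCall("read_file", {"path": "notes.txt"}),
        ],
        files={"notes.txt": "hello"},
        final_response="I wrote hello to notes.txt.",
    )
    print(score_trajectory(traj, {"notes.txt": "hello"}, "notes.txt", tool_call_budget=1))
    # prints Score(correct=True, efficiency=0.5)
```

Correctness and efficiency are kept as separate fields rather than merged into one number. That way, an agent that reaches the right answer with extra tool calls is distinguishable from one that fails outright.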

See EVAL_CATALOG.md for the full list of evals and categories, and MODEL_GROUPS.md for the model catalog used by the eval workflow.

The suite also includes a Harbor integration for running sandboxed benchmarks such as Terminal Bench 2.0.

Results

Suite	CI workflow	LangSmith
Evals	evals.yml	deepagents-evals
Harbor	harbor.yml	deepagents-harbor

Contributing

Architecture, writing new evals, category system, Harbor setup, and LangSmith integration are all documented in CONTRIBUTING.md.

Resources

LangChain Academy: comprehensive, free courses on LangChain libraries and products, made by the LangChain team.
Code of Conduct: community guidelines and standards.
