a2a-crews

One command. AI designs the team. Agents write the code.

You describe the task. The AI planner reads your codebase, assesses feasibility, designs a custom team, and launches agents that build it — in parallel, with tests. What used to take hours of prompt engineering happens in one command.

Requires: Bun · GitHub Copilot CLI · Windows Terminal or tmux

$ crews plan "Build a search ranking classifier"

  ╔══════════════════════════════════════════════╗
  ║  PLANNING                                    ║
  ╚══════════════════════════════════════════════╝

  ⚠️  RISKY (60%) — 4 roles, 4 tasks

  Concerns:
    • Position bias in training data may skew rankings
    • No existing evaluation harness — need to build from scratch

  ROLES
    data-engineer:    Feature pipeline + data profiling
    ranking-modeler:  ONNX model + hyperparameter search
    api-integrator:   Serving endpoint + integration
    evaluator:        NDCG/MRR metrics + A/B test plan

  TASKS
    ⏳ profile-data     → data-engineer    [pending]
    ⏳ train-model      → ranking-modeler  [pending]  (← profile-data)
    ⏳ build-endpoint   → api-integrator   [pending]  (← train-model)
    ⏳ evaluate         → evaluator        [pending]  (← build-endpoint)

  ✅ Plan written. Run `crews apply` to create the team.

That's not a template. The AI planner read the codebase, recognized it as an ML problem, invented four domain-specific roles, and flagged position bias as a real risk. None of the frameworks in the comparison below does this out of the box.

Why a2a-crews?

	a2a-crews	CrewAI	LangGraph	AutoGen
AI planner reads codebase	✅ Via Copilot CLI	❌	❌	❌
Auto-generates team from task	✅ Or use 13 presets	⚠️ You define roles	⚠️ You build graph	⚠️ You configure
Pre-flight feasibility gate	✅ Heuristic scoring	❌	❌	❌
A2A protocol (SDK types)	✅ `@a2a-js/sdk`	❌	❌	❌
Agent-to-agent messaging	✅ Bridge relay	❌ File-based	❌	⚠️ Chat
Cost & token tracking	✅ Per-agent + budget	❌	❌	❌
Human-in-the-loop	✅ `input-required` state	⚠️ Manual	⚠️ Interrupt	⚠️ Manual
Auto-retry + recovery	✅ Exponential backoff	⚠️ Basic	❌	❌

The difference: You describe the task. a2a-crews spawns an AI planner (via Copilot CLI) that explores your repo, understands the domain, and designs a team with feasibility checks — before spending tokens on execution.

Quick Start

# Install (one command)
bun install -g a2a-crews

crews plan "Build a REST API with auth and tests"
crews apply
crews launch

# Or target a different project directory
crews -d /path/to/project plan "Build a dashboard"

Agents spawn in parallel terminal tabs, coordinate via the A2A protocol, and deliver working code. Watch progress with `crews watch`; stop with `crews stop`. Agents can message each other, report errors, and request human input, all through the bridge.

See It Adapt

The same tool. Different intelligence for each problem.

Simple task → preset template:

$ crews plan "Build a calculator"

  ✅ GO (85%) — feature template
  🏗️ Architect → 💻 Coder → 🔍 Reviewer
  Waves: design → implement → review

Multi-part task → preset template with parallel waves:

$ crews plan "Build a fullstack dashboard with real-time updates"

  ✅ GO (85%) — fullstack template
  🏗️ Architect → ⚙️ Backend ║ 🎨 Frontend → 🔍 Reviewer
  Waves: design → backend + frontend (parallel!) → review

Ambiguous task → AI planner invents roles + flags risks:

$ crews plan "Build a classifier for search ranking"

  ⚠️ RISKY (60%) — AI-generated, 4 custom roles
  🔧 Data Engineer → 📊 Ranking Modeler → 🔌 API Integrator → 📈 Evaluator
  Concerns: position bias, no eval harness, needs domain expertise

The AI Planner

This is what makes a2a-crews different. When no template fits, the AI planner takes over.

Explores your codebase — reads file tree, package.json, source files, README
Understands the domain — ML vs API vs frontend vs infrastructure
Creates custom roles — not "architect/coder/reviewer" but "data-engineer/ranking-modeler/evaluator"
Assesses feasibility — flags real concerns with confidence scores

Every plan passes a three-factor feasibility gate before any tokens are spent on execution:

Factor	What it checks
Technical	Dependencies, ESM config, complexity keywords
Scope	Scenario size, codebase scale, feature splitting risk
Risk	Test coverage, git safety, security-sensitive keywords

Verdicts: GO (>80%) · RISKY (50-80%) · NO-GO (<50%, won't proceed)
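The verdict bands map a feasibility score to a go/no-go decision. Below is a minimal TypeScript sketch, assuming the score is a 0 to 100 percentage. The README does not say which band a score of exactly 50 or 80 falls into, so those boundaries are a guess. `verdictFor` and `mayProceed` are hypothetical names, not the project's API.

```typescript
type Verdict = "GO" | "RISKY" | "NO-GO";

// Assumption: exactly 80 counts as RISKY and exactly 50 counts as RISKY.
function verdictFor(score: number): Verdict {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be in [0, 100], got ${score}`);
  }
  if (score > 80) return "GO";
  if (score >= 50) return "RISKY";
  return "NO-GO";
}

// Only NO-GO blocks the run; RISKY proceeds, with its concerns shown to the user.
function mayProceed(score: number): boolean {
  return verdictFor(score) !== "NO-GO";
}

console.log(verdictFor(85), verdictFor(60), verdictFor(40)); // GO RISKY NO-GO
```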

13 Templates

Preset team compositions. Or let the AI planner create a custom team.

Category	Templates
Engineering	`feature` · `fullstack` · `bugfix` · `refactor` · `harness`
Data Science	`data-science` · `ml-experiment` · `data-pipeline`
Operations	`audit` · `ship` · `sprint` · `research`
Docs	`doc-review`

How It Works

graph LR
    A["📝 Describe task"] --> B["🔍 AI plans team"]
    B --> C["🌉 A2A Bridge"]
    C --> D["⚙️ Agent 1"]
    C --> E["⚙️ Agent 2"]
    C --> F["⚙️ Agent 3"]
    D --> G["📦 Working Code"]
    E --> G
    F --> G

crews plan — Describe what you want. AI assesses feasibility and composes a team.
crews apply — Review the plan. Approve or tweak.
crews launch — Agents spawn in terminal tabs, register with the A2A bridge, execute in waves.
crews watch — Stream status via SSE. Run crews stop to cancel.
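One way to model "execute in waves" is to layer the task dependency graph: each wave contains every task whose dependencies have all finished. The sketch below uses a hypothetical `PlannedTask` shape and `computeWaves` helper. These are not a2a-crews internals. The example reproduces the fullstack template's `design → backend + frontend → review` waves.

```typescript
// Hypothetical task shape: the README shows task ids, owning roles, and
// "← dependency" arrows, but not the internal data structure.
interface PlannedTask {
  id: string;
  role: string;
  dependsOn: string[];
}

// A task joins the first wave in which all of its dependencies are done.
// Tasks in the same wave have no dependencies on each other and can run in parallel.
function computeWaves(tasks: PlannedTask[]): string[][] {
  const ids = new Set(tasks.map((t) => t.id));
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      if (!ids.has(dep)) throw new Error(`${t.id} depends on unknown task ${dep}`);
    }
  }
  const done = new Set<string>();
  const waves: string[][] = [];
  let remaining = [...tasks];
  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.dependsOn.every((d) => done.has(d)));
    if (ready.length === 0) {
      throw new Error("dependency cycle among: " + remaining.map((t) => t.id).join(", "));
    }
    waves.push(ready.map((t) => t.id));
    for (const t of ready) done.add(t.id);
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return waves;
}

const fullstack: PlannedTask[] = [
  { id: "design", role: "architect", dependsOn: [] },
  { id: "backend", role: "backend", dependsOn: ["design"] },
  { id: "frontend", role: "frontend", dependsOn: ["design"] },
  { id: "review", role: "reviewer", dependsOn: ["backend", "frontend"] },
];
console.log(computeWaves(fullstack)); // [["design"], ["backend", "frontend"], ["review"]]
```

A cycle or a dangling dependency fails fast here rather than leaving agents waiting forever. How the real orchestrator handles those cases is not stated in this README.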

What Gets Built

Example scenarios the planner handles — from simple presets to AI-generated custom teams:

Scenario	Team generated	Execution
`"Build a calculator"`	3 roles (preset: feature)	3 sequential waves
`"Build a fullstack dashboard"`	4 roles (preset: fullstack)	Parallel backend + frontend
`"Build a search ranking classifier"`	4 custom AI-generated roles	4 waves, feasibility warnings

The framework handles planning, spawning, coordination, and retry. Your agents do the coding.

Under the Hood — Real A2A Protocol

A2A Protocol Implementation

Built on the official @a2a-js/sdk (v0.3.13) from Google's A2A project.

All types come from the SDK: AgentCard, Message, Task, Part, Artifact, TaskState.

JSON-RPC Methods (10 total; 8 listed below)

Method	Description	Streaming
`message/send`	Send a message, get a Task back	No
`message/stream`	Send + SSE task updates	✅
`message/relay`	Route message between agents	No
`messages/poll`	Check agent inbox for messages	No
`tasks/get`	Retrieve task state + history + artifacts	No
`tasks/list`	Query tasks with filters + pagination	No
`tasks/cancel`	Cancel a running task	No
`tasks/subscribe`	Subscribe to live task updates	✅

Plus REST endpoints: agent registration, heartbeat, error reporting, task CRUD.
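Here is a sketch of what a `message/send` call looks like on the wire, written against the A2A v0.3 message shape (`kind`, `messageId`, `role`, `parts`) and JSON-RPC 2.0. The types are redeclared locally so the block is self-contained; in the project they come from `@a2a-js/sdk`. `buildSendRequest` and `sendMessage` are hypothetical helpers. The README does not state the bridge's host, port, or JSON-RPC path, so `bridgeUrl` is a placeholder you must supply.

```typescript
// Local stand-ins for the SDK types (subset of the A2A v0.3 shapes).
interface TextPart { kind: "text"; text: string }
interface A2AMessage {
  kind: "message";
  messageId: string;
  role: "user" | "agent";
  parts: TextPart[];
}
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params: P;
}

function buildSendRequest(
  text: string,
  id: number | string = 1,
  messageId: string = crypto.randomUUID(),
): JsonRpcRequest<{ message: A2AMessage }> {
  return {
    jsonrpc: "2.0",
    id,
    method: "message/send",
    params: {
      message: { kind: "message", messageId, role: "user", parts: [{ kind: "text", text }] },
    },
  };
}

// bridgeUrl is an assumption: point it at wherever your bridge accepts JSON-RPC POSTs.
async function sendMessage(bridgeUrl: string, text: string): Promise<unknown> {
  const res = await fetch(bridgeUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildSendRequest(text)),
  });
  if (!res.ok) throw new Error(`bridge returned HTTP ${res.status}`);
  const body = (await res.json()) as { result?: unknown; error?: { code: number; message: string } };
  if (body.error) throw new Error(`JSON-RPC error ${body.error.code}: ${body.error.message}`);
  return body.result; // per the table above, a Task
}
```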

Agent Discovery

GET /.well-known/agent-card.json
→ AgentCard { protocolVersion: "0.3.0", skills, capabilities, ... }
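A client can discover an agent by fetching the well-known card and checking its shape before trusting it. This is a minimal sketch that validates only a few fields: `protocolVersion`, `skills`, and `capabilities` from the line above, plus `name`, which the A2A spec requires. `AgentCardSubset`, `agentCardUrl`, `isAgentCard`, and `discoverAgent` are hypothetical names. The full `AgentCard` type lives in `@a2a-js/sdk`.

```typescript
interface AgentCardSubset {
  name: string;
  protocolVersion: string;
  skills: unknown[];
  capabilities: Record<string, unknown>;
}

function agentCardUrl(baseUrl: string): string {
  // An absolute path replaces any path already on the base URL.
  return new URL("/.well-known/agent-card.json", baseUrl).toString();
}

function isAgentCard(value: unknown): value is AgentCardSubset {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.protocolVersion === "string" &&
    Array.isArray(v.skills) &&
    typeof v.capabilities === "object" &&
    v.capabilities !== null
  );
}

async function discoverAgent(baseUrl: string): Promise<AgentCardSubset> {
  const res = await fetch(agentCardUrl(baseUrl));
  if (!res.ok) throw new Error(`no agent card at ${baseUrl} (HTTP ${res.status})`);
  const card: unknown = await res.json();
  if (!isAgentCard(card)) throw new Error("response is not a valid agent card");
  return card;
}
```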

Key Capabilities

Task history: Every message (user and agent) is accumulated in history[] and returned via tasks/get
Agent messaging: Agents relay messages through the bridge (message/relay → inbox → messages/poll)
Error reporting: POST /agents/:name/events with structured types (port_conflict, tool_failure, etc.) — fatal errors auto-fail linked tasks
Human-in-the-loop: input-required TaskState pauses tasks until user provides input, then resumes
Cost tracking: Agents report token usage on task completion. /status shows per-agent and crew-wide totals with budget limits
Bridge registry: Active bridges register at ~/.a2a-crews/active-bridges/ for cross-repo discovery
Exponential backoff retry: the timeout is a 10 min base × 1.5^attempt, tripled while the agent's files are still changing (30 to 67.5 min for active agents across attempts 0 through 2)
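The timeout formula can be checked with a few lines of arithmetic. In this sketch the constants come from the line above. The function name and the boolean `filesChanging` input are assumptions about how the condition is represented.

```typescript
const BASE_MINUTES = 10;
const GROWTH = 1.5;
const ACTIVE_MULTIPLIER = 3; // applied while the agent's files are still changing

// Hypothetical helper: timeout in minutes for a given zero-based retry attempt.
function retryTimeoutMinutes(attempt: number, filesChanging: boolean): number {
  if (!Number.isInteger(attempt) || attempt < 0) {
    throw new RangeError(`attempt must be a non-negative integer, got ${attempt}`);
  }
  const base = BASE_MINUTES * GROWTH ** attempt;
  return filesChanging ? base * ACTIVE_MULTIPLIER : base;
}

for (const attempt of [0, 1, 2]) {
  console.log(attempt, retryTimeoutMinutes(attempt, false), retryTimeoutMinutes(attempt, true));
}
// 0 10 30
// 1 15 45
// 2 22.5 67.5
```

The ×3 multiplier gives an agent that is visibly making progress more time before it is treated as stuck.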

Production Hardening

Rate limits (100K tasks, 1K agents, 100 SSE, 1K inbox) · Circular event log (10K) · Text truncation (1MB) · SSE auto-close on disconnect · Input validation · Standard JSON-RPC error codes · Budget exceeded events
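A circular event log keeps memory bounded. Once it holds its capacity of entries, each new event overwrites the oldest one. Below is a hypothetical ring-buffer sketch, not the bridge's actual class. Only the 10K capacity comes from the line above.

```typescript
class CircularLog<T> {
  private readonly capacity: number;
  private readonly buf: (T | undefined)[];
  private start = 0; // index of the oldest entry
  private size = 0;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.buf = new Array<T | undefined>(capacity);
  }

  push(item: T): void {
    const end = (this.start + this.size) % this.capacity;
    this.buf[end] = item;
    if (this.size < this.capacity) {
      this.size++;
    } else {
      // Buffer was full: the slot just written held the oldest entry, so advance start.
      this.start = (this.start + 1) % this.capacity;
    }
  }

  // Entries ordered oldest to newest.
  toArray(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.size; i++) {
      out.push(this.buf[(this.start + i) % this.capacity] as T);
    }
    return out;
  }

  get length(): number {
    return this.size;
  }
}

const eventLog = new CircularLog<string>(10_000);
eventLog.push("agent registered");
```

Push is O(1) and never allocates after construction, which matters for a log that is written on every agent event.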

Architecture Decisions

Design choices are grounded in the A2A spec and existing frameworks, and are documented as 10 ADRs in docs/architecture-decisions.md. Selected ADRs:

ADR	Decision	Source
001	Official `@a2a-js/sdk` types	A2A project
003	CrewAI Crew/Agent/Task pattern	CrewAI (47K⭐)
005	Task lifecycle follows A2A spec	A2A proto
006	SSE streaming, not polling	A2A spec
010	Agent card per spawned agent	A2A §7

Install

git clone https://github.com/aviraldua93/a2a-crews.git
cd a2a-crews && bun install

Other install methods

# Global install
bun install -g a2a-crews

# Compiled binary
bun run build && ./crews plan "Build a calculator"

Roadmap

v0.1 — CLI, A2A bridge, 13 templates, wave orchestration, evidence recovery
v0.2 — AI planner, feasibility assessment, @a2a-js/sdk integration
v0.3 — Agent messaging, cost tracking, error reporting, input-required, exponential backoff retry, bridge registry, 149 tests
v0.4 — Review feedback loops, harness iteration
v1.0 — Web dashboard, push notifications, external agent federation

See ROADMAP.md for details.

Built on A2A Protocol · @a2a-js/sdk · Bun · CrewAI patterns
