Building Multi-Agent Systems the Easy Way - NVIDIA AIQToolkit
A Hands-On Guide to NVIDIA's New Agent Intelligence Toolkit
Why you should keep reading
One toolkit, any framework:
Mix LangChain, LlamaIndex, CrewAI, Semantic Kernel, or your own custom Python agents in the same workflow.
Full-stack observability & profiling:
Spot token hogs and bottlenecks before your cloud bill does.
Batteries included:
OpenAI-compatible API, support for NVIDIA NIM models, built-in ReAct/Tool-Calling/Reasoning agents, and an MCP server/client. No GPU required to start.
The news in a nutshell
NVIDIA released the Agent Intelligence (AIQ) toolkit in April 2025, one of its major releases following GTC 2025. It's an open-source Python library that treats every agent, tool, or workflow as a function call, letting development teams connect, evaluate, and optimize heterogeneous teams of AI agents without abandoning their favorite frameworks.
Think of AIQ as a universal power adapter: plug LangChain into one socket, LlamaIndex into another, and still have room for your in-house tools and agents, all working together seamlessly.
The toolkit was previously known as AgentIQ but has been renamed to NVIDIA Agent Intelligence toolkit (though the API remains fully compatible with previous releases).
"I already learned LangChain - why another toolkit?"
I understand. If you're comfortable with LangChain, LlamaIndex, or LangGraph, you're not alone: these frameworks have become the go-to tools for building agentic workflows and RAG pipelines.
LangChain excels at composing chains and single agents, but large-scale agentic projects quickly run into challenges: mixing components built on different frameworks, profiling and debugging across agent boundaries, sharing agents between teams, and evaluating whole workflows rather than individual calls.
Enter NVIDIA AIQ Toolkit
AIQ isn’t here to replace LangChain or LlamaIndex - it’s the connective tissue that lets you use them together, profile them as a whole, and rapidly build, test, and share agentic workflows at scale. Think of it as a “universal adapter” for agentic systems: every agent, tool, or workflow becomes a function call, regardless of its origin. This means you can:
Combine LangChain, LlamaIndex, CrewAI, Semantic Kernel, or your own Python agents in a single YAML-configured workflow, with no replatforming required.
Profile and debug the entire system, down to individual tool calls, with built-in tracing and OpenTelemetry integration.
Package agents as plugins, making them instantly reusable and shareable across teams or projects.
Evaluate workflows end-to-end, ensuring consistent and reliable outputs as your system evolves.
"You already have one agent coded up and you're just adding it into the configuration file and now you're getting this multi-agent system. There's no extra coding needed."
What can you build?
AIQ toolkit enables a wide range of agentic applications:
Single-agent chatbots with live internet search
Multi-framework RAG systems that combine Llama Index retrievers with LangChain agents
Alert triage agents that can remediate issues from server and networking equipment
Multi-RAG agents that orchestrate multiple retrieval pipelines for complex reasoning
MCP-compatible agents that other services can call over server-sent events
The toolkit ships with reference implementations including ReAct agents, Tool-calling agents, and Reasoning agents for rapid development, plus examples like automated description generation for vector database collections.
Under the hood: what's supported?
All components are opt-in: you can start small with just what you need and scale up as your project grows. The toolkit is designed to be lightweight and flexible, working with your existing tech stack without requiring replatforming.
Get started in 10 commands
# 1. Install prerequisites (Git, Git LFS, uv, Python 3.11/3.12)
# 2. Clone the repository
git clone git@github.com:NVIDIA/AIQToolkit.git aiqtoolkit
cd aiqtoolkit
# 3. Initialize submodules
git submodule update --init --recursive
# 4. Fetch datasets
git lfs install
git lfs fetch
git lfs pull
# 5. Create Python environment
uv venv --seed .venv
source .venv/bin/activate
# 6. Install the toolkit (with all optional dependencies)
uv sync --all-groups --all-extras
# Or just the core: uv sync
# 7. Verify installation
aiq --version
# 8. Set API key (for NVIDIA NIM access)
export NVIDIA_API_KEY=<your_api_key>
# 9. Run a simple workflow (Hello World)
aiq run --config_file examples/simple/configs/config.yml
# 10. Join the Discord for support
The entire stack runs fine on a laptop or in WSL on Windows; no GPU is required.
For LLM inference, you can connect to cloud services like build.nvidia.com, which provides hosted models including Llama 3 variants.
A minimal example
Here's a simple workflow configuration (YAML) that creates a ReAct agent with Wikipedia search capabilities:
functions:
  # Add a tool to search Wikipedia
  wikipedia_search:
    _type: wikipedia_search
    max_results: 2

llms:
  # Define which LLM to use
  nim_llm:
    _type: nim
    model_name: meta/llama-3.1-70b-instruct
    temperature: 0.0

workflow:
  # Use a ReAct agent that 'reasons' and 'acts'
  _type: react_agent
  # Give it access to our Wikipedia search tool
  tool_names: [wikipedia_search]
  # Tell it which LLM to use
  llm_name: nim_llm
  # Configuration options
  verbose: true
  retry_parsing_errors: true
  max_retries: 3
This configuration can be run with a simple command:
aiq run --config_file workflow.yaml --input "List five subspecies of Aardvarks"
For LangChain developers, here's how you can wrap existing code as an AIQ function:
from langchain.agents import AgentType, initialize_agent

from aiq.plugins.langchain import aiq_function


@aiq_function(framework_wrappers=["langchain"])
async def my_react_agent(config, builder):
    # Resolve the tools and LLM client declared in the workflow configuration
    tools = builder.get_tools(config.tool_names)
    llm = builder.get_llm_client(config.llm_name)

    # Wrap them in a standard LangChain ReAct-style agent
    agent_executor = initialize_agent(
        tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=config.verbose
    )
    return await agent_executor.arun(config.input)
Once registered, this function becomes available as a component that can be used in any workflow configuration.
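For example, a workflow YAML could then reference it by name as its top-level type. The snippet below is a minimal sketch under the assumption that the registered function is exposed under a _type called my_react_agent; the other fields simply mirror the earlier Wikipedia example rather than the toolkit's exact schema:

workflow:
  # Hypothetical: use the custom function registered above as the workflow type
  _type: my_react_agent
  tool_names: [wikipedia_search]   # assumes the wikipedia_search function from the earlier config
  llm_name: nim_llm                # assumes the nim_llm definition from the earlier config
  verbose: true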
Deep dive highlights
1. Full-stack profiling
AIQ's profiler tracks token usage, response timings, and hidden latencies at a granular level. This helps you identify bottlenecks and optimize system performance. The profiling data can be used with NVIDIA NIM and NVIDIA Dynamo to optimize the performance of agentic systems, enabling better business outcomes without infrastructure upgrades.
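To give a flavor of how this is wired up, here is a hedged sketch of profiling and tracing being switched on in the same YAML file that defines the workflow. The option names below are illustrative placeholders, not the authoritative schema; the toolkit's profiler and observability docs list the real ones:

general:
  telemetry:
    tracing:
      # Illustrative: export spans to an OpenTelemetry-compatible collector
      otel_collector:
        _type: otelcollector
        endpoint: http://localhost:4318

eval:
  general:
    profiler:
      # Illustrative flags for token and latency analysis
      token_usage_forecast: true
      bottleneck_analysis: true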
2. Framework-agnostic composition
Need to combine a LangGraph agent with a Semantic Kernel tool and a LlamaIndex retriever? AIQ makes it possible by treating each component as a function call. The Multi-Frameworks example shows how to integrate:
A RAG agent built with LlamaIndex
A research agent built with LangChain
A chitchat agent built with Haystack's pipeline
All working together in a single workflow with a supervisor agent that routes queries to the appropriate worker.
3. MCP for micro-services
Launch any workflow as an MCP server with aiq mcp, and enable other services to connect to its tools. AIQ also supports connecting to external MCP servers, creating a bidirectional ecosystem of AI services.
# Start an MCP server
aiq mcp --config_file your_workflow.yml
# List available tools on the server
aiq mcp list
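Going the other direction, a tool exposed by an external MCP server can be wrapped as an ordinary AIQ function and handed to any agent. The snippet below is a sketch under that assumption; the wrapper type, URL, and tool name are illustrative rather than taken from the toolkit's reference docs:

functions:
  remote_calculator:
    # Illustrative MCP client wrapper around a tool served elsewhere
    _type: mcp_tool_wrapper
    url: http://localhost:9901/sse        # e.g. a server started with aiq mcp
    mcp_tool_name: calculator_multiply    # hypothetical tool name on that server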
4. Evaluation harness
AIQ includes built-in evaluation tools to validate and maintain the accuracy of both RAG and end-to-end workflows. This helps ensure consistency and relevance of agent responses over time-crucial for production deployments.
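As a rough sketch of what an evaluation run can look like (the evaluator type, metric name, and paths below are placeholders; the evaluation guide documents the real options), you add an eval section to the workflow config and point it at a dataset of question/answer pairs:

eval:
  general:
    output_dir: ./eval_output
    dataset:
      _type: json
      file_path: data/eval_questions.json   # hypothetical dataset of Q&A pairs
  evaluators:
    answer_accuracy:
      _type: ragas                          # placeholder evaluator type
      metric: AnswerAccuracy

# Hypothetical invocation, assuming the eval subcommand:
# aiq eval --config_file workflow.yaml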
Case Study: Building a Multi-Framework Agent Team
Think of this as assembling an AI dream team - where each specialist handles what they do best, coordinated by AIQ's framework-agnostic approach.
Architecture Breakdown
This system combines three specialized agents behind an intelligent dispatcher: a LangGraph supervisor routes each query to a LlamaIndex RAG agent, a LangChain research agent, or a Haystack chitchat agent.
Step-by-Step Implementation
1. Prerequisites
# Install core toolkit + multi-framework example
uv pip install -e "examples/multi_frameworks[all]"
# Set required API keys (free tiers available)
export NVIDIA_API_KEY="your_nvidia_key" # From build.nvidia.com
export TAVILY_API_KEY="your_tavily_key" # From tavily.com
2. Explore the Configuration
The config.yml acts as your team roster:
# examples/multi_frameworks/configs/config.yml
agents:
  supervisor:
    type: langgraph_router
    workers: ["rag_agent", "research_agent", "chitchat_agent"]
  rag_agent:
    type: llama_index_rag
    document_path: "./README.md"   # Agent's knowledge base
  research_agent:
    type: langchain_tool
    tools: [arxiv_search, web_search]
  chitchat_agent:
    type: haystack_pipeline
    model: "gpt-3.5-turbo"
3. Run the Workflow
# Query about documentation (routes to RAG agent)
aiq run --config_file=config.yml --input "How do I evaluate agent performance?"
# Query about ML concepts (routes to Research agent)
aiq run --config_file=config.yml --input "Explain retrieval-augmented generation"
# Casual conversation (routes to Chitchat agent)
aiq run --config_file=config.yml --input "Tell me a joke about GPUs"
4. Understand the Flow
The user query enters the supervisor
LangGraph analyzes the query's intent
The supervisor routes it to the appropriate specialist agent
The result returns through a unified interface
Sample Output:
[Input] "What evaluation metrics does AIQ support?"
→ Router detects technical query → RAG Agent responds:
"AIQ Toolkit provides built-in evaluation for:
- Answer relevance (0-5 scale)
- Context precision/recall
- Toxicity detection
- Cost/profanity checks
View metrics in the dashboard with: aiq viz --session_id=..."
Road-map & community
NVIDIA is actively developing the AIQ toolkit with upcoming features including NeMo Guardrails integration, agentic accelerations in partnership with Dynamo, and a data feedback loop.
Connect with the AIQ toolkit community:
Join the NVIDIA Developer Discord (#agent-toolkit channel)
Attend office hours on May 15th and May 21st at 9:00 AM PT
Contribute to the open-source repository
Ready to build with AIQ?
Star the repo: github.com/NVIDIA/AIQToolkit
Join the hackathon: Running May 12-23, 2025, with the chance to win an NVIDIA GeForce RTX 5090 signed by Jensen Huang
Key Benefits for Hackathon Participants
Mix & Match Frameworks: Replace LlamaIndex with your custom RAG in 3 lines of YAML
Built-in Telemetry: Track token usage per agent with aiq profile --output=report.html
Easy Debugging: Replay any conversation with aiq replay <session_id>
Tips: Start simple by wrapping your existing agents and gradually adopt more AIQ features; focus on creating reusable components that others can build upon; and consider extending the plugin system for your specific use case.
Pro Tip: Try modifying the config.yml to add a new Wolfram Alpha math agent - AIQ will automatically handle routing to your new component without changing existing code.
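For instance, the change could be as small as one new entry in the roster and one new name in the supervisor's worker list. This is purely illustrative: the component type and credential field below are hypothetical placeholders, not documented AIQ components:

agents:
  supervisor:
    type: langgraph_router
    workers: ["rag_agent", "research_agent", "chitchat_agent", "math_agent"]
  math_agent:
    type: wolfram_alpha_tool        # hypothetical component name
    app_id: ${WOLFRAM_APP_ID}       # hypothetical credential field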
The NVIDIA Agent Intelligence toolkit hackathon is open to all developers-whether you're a seasoned researcher, a novice developer, an ISV partner, or a cloud service provider. It's a perfect opportunity to build your skills with this powerful new toolkit.
Thanks for reading - now go build something amazing with the NVIDIA Agent Intelligence toolkit!