Description
We're sharing our roadmap leading up to the v0.4.0 release (end of June) to foster open-source development of Dynamo.
By then, our goal is to make Dynamo production-ready for GenAI inference.
This roadmap is subject to change, and community contributions are highly encouraged. If you're interested in contributing to specific features, please comment on this issue.
To accelerate progress, Dynamo releases will follow a biweekly cadence (one release every two weeks). Expect a major release each month (incrementing the version by 0.1), with minor releases in between.
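As a rough illustration of that numbering, here is a minimal Python sketch. It assumes exactly one minor release falls between two consecutive monthly releases, which matches v0.2.0, v0.2.1, and v0.3.0 in the timeline; the helper name `planned_versions` is hypothetical, and the v0.3.1 it produces is inferred from the cadence rather than announced anywhere in this roadmap.

```python
def planned_versions(first: int, last: int) -> list[str]:
    """List version tags from v0.<first>.0 through v0.<last>.0.

    Assumption (not stated in the roadmap): one in-between release,
    tagged v0.<n>.1, ships between each pair of monthly releases.
    """
    versions = []
    for n in range(first, last + 1):
        versions.append(f"v0.{n}.0")      # monthly release, bumps by 0.1
        if n < last:
            versions.append(f"v0.{n}.1")  # in-between release two weeks later
    return versions


if __name__ == "__main__":
    for tag in planned_versions(2, 4):
        print(tag)
```

Actual tags and dates may differ; the timeline section of this issue is the authoritative list.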
Key objectives from v0.2.0 until v0.4.0
- LLM support
  - Performant disaggregated serving including DeepSeek R1
  - Multi-LoRA
  - Speculative decoding
  - Full feature support with TRT-LLM, vLLM, and SGLang
- Multimodal
  - Text, image, and video model support
  - E/P/D disaggregation
  - Multimodal cache
- KV cache manager
  - KV offloading to multiple levels of the memory hierarchy
  - Local and network storage support with most known storage vendors
  - Performant multi-turn conversations
- Planner
  - Dynamic allocation of prefill and decode
  - SLA-driven real-time performance tuning
- Fault tolerance
  - Model execution fault tolerance
  - Instance fault tolerance
- Agents
  - Constrained decoding and function calling
  - Performant KV offloading and pre-fetching based on predicted time of agent execution
  - MCP support
- Performance benchmarking
  - GPU-level metrics
  - Energy metrics for TCO calculation
- Validated Kubernetes workflow for deployment
  - Scale up to 64 GPUs
  - Helm charts and custom operators
  - AWS/Azure/GKE support and tutorials
Expected timeline
Here are the major features you can expect in the upcoming releases. We will add details for later releases as plans firm up, so please stay tuned for updates.
- v0.2.0 (End of April)
  - KV Manager
    - Offloading enabled for GPU, host memory, SSD, and network storage
  - Planner
    - Dynamically allocate prefill and decode
  - Validated Kubernetes workflow
    - Helm charts and custom operators
  - NIXL AWS EFA support and NIXL microbench
- v0.2.1 (Mid May)
  - vLLM v1 support and reduced performance overhead
  - SGLang integration
  - KV Manager
    - Offloading enabled for GPU, host memory, SSD, and network storage
  - Planner with Kubernetes support
  - Multimodal model support
    - Functional E/P/D disaggregation with text + image model (LLaVA 1.5)
  - NIXL Mooncake plugin integration
- v0.3.0 (Targeted for June 4)
  - Performant DeepSeek R1 disaggregated serving with SGLang, TRT-LLM, and vLLM
    - SGLang focused on Hopper performance
    - TRT-LLM focused on Blackwell performance
  - Fault tolerance for Dynamo components
  - KV Manager integrated with Dynamo runtime and vLLM
  - Kubernetes support
    - Model caching across pipelines
    - Initial GitOps implementation for rolling updates and zero-downtime deployment
  - Multimodal model support
    - Performant E/P/D disaggregation with text + image model
    - Functional E/P/D disaggregation with text + video model
  - Planner
    - Guide and sweep script to help users pick a good starting configuration based on their SLA
  - NIXL
    - Generic object storage support
    - GPU-initiated communication
    - UCX resiliency
Describe the problem you're encountering
Sharing Dynamo roadmap for developer visibility.