[Roadmap]: Dynamo roadmap until 0.4.0 (Mid July) #762

Open
@harryskim

We're sharing our roadmap leading up to the v0.4.0 release (end of June) to foster open-source development of Dynamo.
By then, our goal is to make Dynamo production-ready for GenAI inference.

This roadmap is subject to change, and community contributions are highly encouraged. If you're interested in contributing to specific features, please comment on this issue.

To accelerate progress, Dynamo releases will follow a biweekly cadence. Expect a major release each month (incrementing the version by 0.1), with minor releases in between.
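As a rough illustration of that numbering scheme, here is a minimal sketch that lists the version strings it would produce between v0.2.0 and v0.4.0. The function name `release_sequence` and the default of one in-between release per month are assumptions made for this example (a biweekly cadence with a monthly major release suggests roughly one); the actual set of releases may differ, and versions not named in the timeline below are hypothetical.

```python
def release_sequence(first=2, last=4, interim_per_month=1):
    """List version strings from v0.<first>.0 through v0.<last>.0.

    Each monthly release bumps the middle number by one (0.1 overall).
    interim_per_month is an assumed count of in-between releases.
    """
    versions = []
    for monthly in range(first, last + 1):
        versions.append(f"v0.{monthly}.0")
        # No interim releases are listed after the final target version.
        if monthly < last:
            for interim in range(1, interim_per_month + 1):
                versions.append(f"v0.{monthly}.{interim}")
    return versions


if __name__ == "__main__":
    print(release_sequence())
```

Under these assumptions the list starts v0.2.0, v0.2.1, v0.3.0, which lines up with the first three entries in the timeline below.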


Key objectives from v0.2.0 until v0.4.0

  • LLM support
    • Performant disaggregated serving including DeepSeek R1
    • Multi-LoRA
    • Speculative decoding
    • Full feature support with TRT-LLM, vLLM, and SGLang
  • Multimodal
    • Text, image, and video model support
    • Encode/prefill/decode (E/P/D) disaggregation
    • Multimodal cache
  • KV cache manager
    • KV offloading to multiple levels of memory hierarchy
    • Local and network storage support across most major storage vendors
    • Performant multi-turn conversations
  • Planner
    • Dynamic allocation of prefill and decode
    • SLA-based real-time performance tuning
  • Fault tolerance
    • Model execution fault tolerance
    • Instance fault tolerance
  • Agents
    • Constrained decoding and function calling
    • Performant KV offloading and pre-fetching based on predicted time of agent execution
    • MCP support
  • Performance benchmarking
    • GPU level metrics
    • Energy metrics for TCO calculation
  • Validated K8s workflow for deployment
    • Scale up to 64 GPUs
    • Helm charts and custom operators
    • AWS/Azure/GKE support and tutorials

Expected timeline

Here are the major features you can expect in our upcoming releases. We will add details for subsequent releases iteratively to ensure transparency, so please stay tuned for updates.

  • v0.2.0 (End of April)
    • KV Manager
      • Offloading enabled for GPU, host memory, SSD, and network storage
    • Planner
      • Dynamically allocate prefill and decode
    • Validated K8s workflow
      • Helm charts and custom operators
    • NIXL AWS EFA support and NIXL microbench
  • v0.2.1 (Mid May)
    • vLLM v1 support and reduced performance overhead
    • SGLang integration
    • KV Manager
      • Offloading enabled for GPU, host memory, SSD, and network storage
    • Planner with K8s support
    • Multimodal model support
      • Functional E/P/D disaggregation with text + image model (LLaVA 1.5)
    • NIXL Mooncake plugin integration
  • v0.3.0 (Targeted for 6/4)
    • Performant DeepSeek R1 disaggregated serving with SGLang, TRT-LLM, and vLLM
      • SGLang focused on Hopper performance
      • TRT-LLM focused on Blackwell performance
    • Fault tolerance for Dynamo components
    • KV Manager integrated with Dynamo runtime and vLLM
    • K8s support
      • Model caching across pipelines
      • Initial GitOps implementation for rolling updates and zero-downtime deployment
    • Multimodal model support
      • Performant E/P/D disaggregation with text + image model
      • Functional E/P/D disaggregation with text + video model
    • Planner
      • Provide a guide and sweep script to help users pick a good starting configuration based on their SLA
    • NIXL
      • Generic object storage support
      • GPU initiated communication
      • UCX Resiliency

Describe the problem you're encountering

Sharing Dynamo roadmap for developer visibility.
