NVIDIA Cosmos

🤗 Hugging Face  | Cosmos Cookbook

NVIDIA Cosmos Reason is an open, customizable reasoning vision language model (VLM) for physical AI and robotics. It enables robots and vision AI agents to reason like humans, using prior knowledge, physics understanding, and common sense to understand and act in the real world. The model understands space, time, and fundamental physics, and can serve as a planning model to reason about what steps an embodied agent might take next.

Cosmos Reason excels at navigating the long tail of diverse physical-world scenarios with spatial-temporal understanding. It is post-trained on physical common sense and embodied reasoning data using supervised fine-tuning and reinforcement learning, and it uses chain-of-thought reasoning to understand world dynamics without human annotations.




News!

  • [February 9, 2026] Improved documentation and troubleshooting guidance, expanded platform support to GB200 and ARM (torchcodec and inference sample fixes), enhanced quantization and training debuggability, and updated CUDA compatibility.
  • [December 19, 2025] We have released the Cosmos-Reason2 models and code for Physical AI common sense and embodied reasoning. The 2B and 8B models are now available on Hugging Face.

Model Family

Setup

This repository contains only documentation, examples, and utilities; you do not need it to run inference. See the Inference section below for a minimal example. The following setup instructions are needed only to run the examples in this repository.

Clone the repository:

git clone https://github.com/nvidia-cosmos/cosmos-reason2.git
cd cosmos-reason2

Install one of the following environments:

Virtual Environment

Install system dependencies:

sudo apt-get install curl ffmpeg git git-lfs unzip
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uvx hf auth login

Install the repository:

uv sync --extra cu128
source .venv/bin/activate

CUDA variants:

CUDA Version   Arguments       Notes
CUDA 12.8      --extra cu128   NVIDIA Driver
CUDA 13.0      --extra cu130   NVIDIA Driver

For DGX Spark and Jetson AGX, you must use CUDA 13.0. Additionally, you must set TRITON_PTXAS_PATH to your system PTXAS:

export TRITON_PTXAS_PATH="/usr/local/cuda/bin/ptxas"

Docker Container

Please make sure you have access to Docker on your machine and the NVIDIA Container Toolkit is installed.

Build the container:

image_tag=$(docker build -f Dockerfile --build-arg=CUDA_VERSION=12.8.1 -q .)

CUDA variants:

CUDA Version   Arguments                         Notes
CUDA 12.8      --build-arg=CUDA_VERSION=12.8.1   NVIDIA Driver
CUDA 13.0      --build-arg=CUDA_VERSION=13.0.0   NVIDIA Driver

For DGX Spark and Jetson AGX, you must use CUDA 13.0.

Run the container:

docker run -it --gpus all --ipc=host --rm -v .:/workspace -v /workspace/.venv -v /workspace/examples/cosmos_rl/.venv -v /root/.cache:/root/.cache -e HF_TOKEN="$HF_TOKEN" $image_tag

Optional arguments:

  • --ipc=host: Use the host system's shared memory, since parallel torchrun consumes a large amount of shared memory. If your security policy does not allow this, increase --shm-size instead (documentation).
  • -v /root/.cache:/root/.cache: Mount the host cache to avoid re-downloading models and other cached files.
  • -e HF_TOKEN="$HF_TOKEN": Pass your Hugging Face token to avoid re-authenticating.

Inference

Minimum GPU Memory

Model               GPU Memory
Cosmos-Reason2-2B   24 GB
Cosmos-Reason2-8B   32 GB

Tested Platforms

Cosmos-Reason2 runs on NVIDIA Hopper and Blackwell GPU architectures. Additional hardware configurations may work but are not officially validated at the time of this release.

Examples have been tested on the following devices:

GPU                             CUDA Version   Functionality
NVIDIA H100                     12.8           inference / post-training / quantization
NVIDIA GB200                    13.0           inference
NVIDIA DGX Spark                13.0           inference
NVIDIA Jetson AGX Thor (Edge)   13.0           Transformers inference; vLLM inference is coming soon!

Transformers

Cosmos-Reason2 is included in transformers>=4.57.0.

Minimal example (sample output):

python scripts/inference_sample.py
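
If you prefer to call the model from Python directly instead of the bundled script, the sketch below uses the generic transformers image-text-to-text chat API. It is a minimal sketch, not the repository's reference implementation: the message format and the AutoModelForImageTextToText class are assumptions based on standard transformers conventions, and scripts/inference_sample.py remains the authoritative example.

import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

# Model ID from this README; everything else below is standard transformers
# usage and may differ from scripts/inference_sample.py.
model_id = "nvidia/Cosmos-Reason2-2B"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "path": "assets/sample.png"},
        {"type": "text", "text": "What should the robot do next? Answer briefly."},
    ],
}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt.
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])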

Deployment

For deployment and batch inference, we recommend using vllm>=0.11.0.

Online Serving

Start the server in a separate terminal or a background process.

Tip

Docker users: Run docker exec -it <CONTAINER_ID> bash to open a shell in your running container. Find your container ID with docker ps.

vllm serve nvidia/Cosmos-Reason2-2B \
  --allowed-local-media-path "$(pwd)" \
  --max-model-len 16384 \
  --media-io-kwargs '{"video": {"num_frames": -1}}' \
  --reasoning-parser qwen3 \
  --port 8000

Optional arguments:

  • --max-model-len 16384: Maximum context length, capped to avoid out-of-memory errors. Recommended range: 8192-16384.
  • --media-io-kwargs '{"video": {"num_frames": -1}}': Allow overriding the video FPS per sample.
  • --reasoning-parser qwen3: Parse the reasoning trace separately from the final answer.
  • --port 8000: Server port. Change it if you encounter "Address already in use" errors.

Note

First startup takes a couple of minutes for model loading and CUDA graph compilation. Subsequent starts are faster thanks to cached graphs.

Once ready, the server will print Application startup complete..
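
Because vllm serve exposes an OpenAI-compatible API, you can also query the server directly with the openai Python client. This is a minimal sketch under two assumptions: the video path lies under the directory passed to --allowed-local-media-path, and reasoning_content is populated because --reasoning-parser is enabled.

import os
from openai import OpenAI

# vLLM's OpenAI-compatible endpoint; no real API key is needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# The local file must be under the --allowed-local-media-path directory.
video_path = os.path.abspath("assets/sample.mp4")

response = client.chat.completions.create(
    model="nvidia/Cosmos-Reason2-2B",
    messages=[{
        "role": "user",
        "content": [
            {"type": "video_url", "video_url": {"url": f"file://{video_path}"}},
            {"type": "text", "text": "Caption this video."},
        ],
    }],
    max_tokens=512,
)
message = response.choices[0].message
print(getattr(message, "reasoning_content", None))  # reasoning trace, if parsed
print(message.content)                              # final answer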

Warning

Remember to stop the server when done! The vllm server consumes significant GPU memory while running. To stop it:

  • If running in foreground: Press Ctrl+C
  • If running in background: Find the process with ps aux | grep vllm and kill it with kill <PID>

Caption a video (sample output):

cosmos-reason2-inference online --port 8000 -i prompts/caption.yaml --reasoning --videos assets/sample.mp4 --fps 4

Embodied reasoning with verbose output (sample output):

cosmos-reason2-inference online -v --port 8000 -i prompts/embodied_reasoning.yaml --reasoning --images assets/sample.png

To list available arguments:

cosmos-reason2-inference online --help

Offline Inference

Temporally caption a video and save the input frames to outputs/temporal_localization for debugging (sample output):

cosmos-reason2-inference offline -v --max-model-len 16384 -i prompts/temporal_localization.yaml --videos assets/sample.mp4 --fps 4 -o outputs/temporal_localization

To list available arguments:

cosmos-reason2-inference offline --help

Common arguments:

  • --model nvidia/Cosmos-Reason2-2B: Model name or path.
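
The offline CLI wraps vLLM's Python API, so you can also run batch inference directly. The sketch below assumes vLLM's standard LLM.chat interface; the parameter names are vLLM's, not this repository's, and the wrapper may configure the engine differently.

import os
from vllm import LLM, SamplingParams

video_path = os.path.abspath("assets/sample.mp4")

# allowed_local_media_path lets the engine read local files via file:// URLs.
llm = LLM(
    model="nvidia/Cosmos-Reason2-2B",
    max_model_len=16384,
    allowed_local_media_path=os.path.dirname(video_path),
)

messages = [{
    "role": "user",
    "content": [
        {"type": "video_url", "video_url": {"url": f"file://{video_path}"}},
        {"type": "text", "text": "Caption this video."},
    ],
}]
outputs = llm.chat(messages, SamplingParams(max_tokens=512))
print(outputs[0].outputs[0].text)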

Post-Training

Quantization

Troubleshooting

See the troubleshooting guide.

Additional Resources

License and Contact

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

NVIDIA Cosmos source code is released under the Apache License 2.0.

NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact [email protected].
