🤗 Hugging Face | Cosmos Cookbook
NVIDIA Cosmos Reason is an open, customizable reasoning vision language model (VLM) for physical AI and robotics. It enables robots and vision AI agents to reason like humans, using prior knowledge, physics understanding, and common sense to understand and act in the real world. The model understands space, time, and fundamental physics, and can serve as a planning model to reason about the steps an embodied agent might take next.
Cosmos Reason excels at navigating the long tail of diverse physical-world scenarios with spatial-temporal understanding. It is post-trained on physical common sense and embodied reasoning data using supervised fine-tuning and reinforcement learning, and it uses chain-of-thought reasoning to understand world dynamics without human annotations.
Table of Contents
- News!
- Model Family
- Setup
- Inference
- Post-Training
- Quantization
- Troubleshooting
- Additional Resources
- License and Contact
News!
- [February 9, 2026] Improved documentation and troubleshooting guidance, expanded platform support to GB200 and ARM (torchcodec and inference sample fixes), enhanced quantization and training debuggability, and updated CUDA compatibility.
- [December 19, 2025] We have released the Cosmos-Reason2 models and code for Physical AI common sense and embodied reasoning. The 2B and 8B models are now available on Hugging Face.
Setup
This repository contains only documentation, examples, and utilities; you do not need it to run inference (see the Inference section for a minimal example). The following setup instructions are needed only to run the examples in this repository.
Clone the repository:

```bash
git clone https://github.com/nvidia-cosmos/cosmos-reason2.git
cd cosmos-reason2
```

Install one of the following environments:
Virtual Environment
Install system dependencies:
```bash
sudo apt-get install curl ffmpeg git git-lfs unzip
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uvx hf auth login
```

Install the repository:

```bash
uv sync --extra cu128
source .venv/bin/activate
```

CUDA variants:
| CUDA Version | Arguments | Notes |
|---|---|---|
| CUDA 12.8 | `--extra cu128` | NVIDIA Driver |
| CUDA 13.0 | `--extra cu130` | NVIDIA Driver |
For DGX Spark and Jetson AGX, you must use CUDA 13.0. Additionally, you must set TRITON_PTXAS_PATH to your system PTXAS:

```bash
export TRITON_PTXAS_PATH="/usr/local/cuda/bin/ptxas"
```

Docker Container
Please make sure you have access to Docker on your machine and that the NVIDIA Container Toolkit is installed.
Build the container:
```bash
image_tag=$(docker build -f Dockerfile --build-arg=CUDA_VERSION=12.8.1 -q .)
```

CUDA variants:
| CUDA Version | Arguments | Notes |
|---|---|---|
| CUDA 12.8 | `--build-arg=CUDA_VERSION=12.8.1` | NVIDIA Driver |
| CUDA 13.0 | `--build-arg=CUDA_VERSION=13.0.0` | NVIDIA Driver |
For DGX Spark and Jetson AGX, you must use CUDA 13.0.
Run the container:
```bash
docker run -it --gpus all --ipc=host --rm \
  -v .:/workspace \
  -v /workspace/.venv \
  -v /workspace/examples/cosmos_rl/.venv \
  -v /root/.cache:/root/.cache \
  -e HF_TOKEN="$HF_TOKEN" \
  $image_tag
```

Optional arguments:
- `--ipc=host`: Use the host system's shared memory, since parallel torchrun consumes a large amount of shared memory. If this is not allowed by your security policy, increase `--shm-size` instead (documentation).
- `-v /root/.cache:/root/.cache`: Mount the host cache to avoid re-downloading cache entries.
- `-e HF_TOKEN="$HF_TOKEN"`: Set the Hugging Face token to avoid re-authenticating.
GPU memory requirements:

| Model | GPU Memory |
|---|---|
| Cosmos-Reason2-2B | 24GB |
| Cosmos-Reason2-8B | 32GB |
Cosmos-Reason2 works on NVIDIA Hopper and Blackwell GPUs. Additional hardware configurations may work but are not officially validated at the time of this release.
Examples have been tested on the following devices:
| GPU | CUDA Version | Functionality |
|---|---|---|
| NVIDIA H100 | 12.8 | inference/post-training/quantization |
| NVIDIA GB200 | 13.0 | inference |
| NVIDIA DGX Spark | 13.0 | inference |
| NVIDIA Jetson AGX Thor (Edge) | 13.0 | Transformers inference. vLLM inference is coming soon! |
Inference
Cosmos-Reason2 is included in `transformers>=4.57.0`.
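As a quick illustration, the sketch below runs the model directly through the Transformers generation API. It is not the repository's reference script (that is `scripts/inference_sample.py`, shown next); the prompt and image path here are illustrative placeholders.

```python
# Minimal sketch of direct Transformers inference.
# Assumes transformers>=4.57.0 and a CUDA GPU; prompt and image path
# are illustrative, not taken from the repository.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "nvidia/Cosmos-Reason2-2B"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "path": "assets/sample.png"},  # hypothetical local image
            {"type": "text", "text": "What should the robot do next? Explain briefly."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=512)
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```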
Minimal example (sample output):

```bash
python scripts/inference_sample.py
```

For deployment and batch inference, we recommend using `vllm>=0.11.0`.
Start the server in a separate terminal or as a background process.
Tip
Docker users: Run `docker exec -it <CONTAINER_ID> bash` to exec into your container. Find your container ID with `docker ps`.
```bash
vllm serve nvidia/Cosmos-Reason2-2B \
  --allowed-local-media-path "$(pwd)" \
  --max-model-len 16384 \
  --media-io-kwargs '{"video": {"num_frames": -1}}' \
  --reasoning-parser qwen3 \
  --port 8000
```

Optional arguments:
- `--max-model-len 16384`: Maximum model length, to avoid OOM. Recommended range: 8192 to 16384.
- `--media-io-kwargs '{"video": {"num_frames": -1}}'`: Allow overriding the FPS per sample.
- `--reasoning-parser qwen3`: Parse the reasoning trace.
- `--port 8000`: Server port. Change it if you encounter `Address already in use` errors.
Note
First startup takes a couple of minutes for model loading and CUDA graph compilation. Subsequent starts are faster with cached graphs.
Once ready, the server will print `Application startup complete.`.
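At that point you can also query the server from any OpenAI-compatible client, not just the `cosmos-reason2-inference` CLI shown below. Here is a minimal sketch using the `openai` Python package; the `file://` URL relies on the `--allowed-local-media-path` flag above, and `reasoning_content` is a vLLM extension populated when `--reasoning-parser` is set. The file path and prompt are illustrative.

```python
# Hedged sketch: query the running vLLM server through its
# OpenAI-compatible API. Assumes `pip install openai` and the
# `vllm serve` command above is running on port 8000.
import os
from openai import OpenAI

# vLLM does not check the API key by default; base_url points at the local server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Local files work as file:// URLs because the server was started with
# --allowed-local-media-path "$(pwd)".
image_url = "file://" + os.path.abspath("assets/sample.png")  # illustrative path

response = client.chat.completions.create(
    model="nvidia/Cosmos-Reason2-2B",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": "What should the robot do next?"},
            ],
        }
    ],
    max_tokens=512,
)

message = response.choices[0].message
# With --reasoning-parser qwen3, vLLM returns the reasoning trace separately
# from the final answer; the field is absent if the parser is disabled.
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)
```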
Warning
Remember to stop the server when done! The vllm server consumes significant GPU memory while running. To stop it:
- If running in the foreground: press `Ctrl+C`.
- If running in the background: find the process with `ps aux | grep vllm` and kill it with `kill <PID>`.
Caption a video (sample output):

```bash
cosmos-reason2-inference online --port 8000 -i prompts/caption.yaml --reasoning --videos assets/sample.mp4 --fps 4
```

Embodied reasoning with verbose output (sample output):
```bash
cosmos-reason2-inference online -v --port 8000 -i prompts/embodied_reasoning.yaml --reasoning --images assets/sample.png
```

To list available arguments:
```bash
cosmos-reason2-inference online --help
```

Temporally caption a video and save the input frames to `outputs/temporal_localization` for debugging (sample output):
```bash
cosmos-reason2-inference offline -v --max-model-len 16384 -i prompts/temporal_localization.yaml --videos assets/sample.mp4 --fps 4 -o outputs/temporal_localization
```

To list available arguments:
```bash
cosmos-reason2-inference offline --help
```

Common arguments:
- `--model nvidia/Cosmos-Reason2-2B`: Model name or path.
Additional Resources
- Troubleshooting
- Example prompts
- Cosmos-Reason2 is based on the Qwen3-VL architecture.
- vLLM
License and Contact
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
NVIDIA Cosmos source code is released under the Apache 2 License.
NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact [email protected].
