🤗 Hugging Face | Cosmos Cookbook
NVIDIA Cosmos Reason is an open, customizable reasoning vision language model (VLM) for physical AI and robotics. It enables robots and vision AI agents to reason like humans, using prior knowledge, physics understanding, and common sense to understand and act in the real world. The model understands space, time, and fundamental physics, and can serve as a planning model to reason about the steps an embodied agent might take next.
Cosmos Reason excels at navigating the long tail of diverse physical-world scenarios with spatial-temporal understanding. It is post-trained on physical common sense and embodied reasoning data using supervised fine-tuning and reinforcement learning, and it uses chain-of-thought reasoning to understand world dynamics without human annotations.
Table of Contents
- News!
- Model Family
- Setup
- Inference
- Post-Training
- Quantization
- Troubleshooting
- Additional Resources
- License and Contact
News!
- [February 9, 2026] Improved documentation and troubleshooting guidance, expanded platform support to GB200 and ARM (torchcodec and inference sample fixes), enhanced quantization and training debuggability, and updated CUDA compatibility.
- [December 19, 2025] We have released the Cosmos-Reason2 models and code for Physical AI common sense and embodied reasoning. The 2B and 8B models are now available on Hugging Face.
Setup
This repository contains only documentation, examples, and utilities; you do not need it to run inference (see the Inference section for a minimal example). The following setup instructions are needed only to run the examples in this repository.
Clone the repository:

```bash
git clone https://github.com/nvidia-cosmos/cosmos-reason2.git
cd cosmos-reason2
```

Install one of the following environments:
Virtual Environment
Install system dependencies:
```bash
sudo apt-get install curl ffmpeg git git-lfs unzip
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uvx hf auth login
```

Install the repository:

```bash
uv sync --extra cu128
source .venv/bin/activate
```

CUDA variants:
| CUDA Version | Arguments | Notes |
|---|---|---|
| CUDA 12.8 | `--extra cu128` | NVIDIA Driver |
| CUDA 13.0 | `--extra cu130` | NVIDIA Driver |
For DGX Spark and Jetson AGX, you must use CUDA 13.0. Additionally, you must set TRITON_PTXAS_PATH to your system PTXAS:

```bash
export TRITON_PTXAS_PATH="/usr/local/cuda/bin/ptxas"
```

Docker Container
Please make sure you have access to Docker on your machine and that the NVIDIA Container Toolkit is installed.
Build the container:
```bash
image_tag=$(docker build -f Dockerfile --build-arg=CUDA_VERSION=12.8.1 -q .)
```

CUDA variants:
| CUDA Version | Arguments | Notes |
|---|---|---|
| CUDA 12.8 | `--build-arg=CUDA_VERSION=12.8.1` | NVIDIA Driver |
| CUDA 13.0 | `--build-arg=CUDA_VERSION=13.0.0` | NVIDIA Driver |
For DGX Spark and Jetson AGX, you must use CUDA 13.0.
Run the container:
```bash
docker run -it --gpus all --ipc=host --rm \
  -v .:/workspace \
  -v /workspace/.venv \
  -v /workspace/examples/cosmos_rl/.venv \
  -v /root/.cache:/root/.cache \
  -e HF_TOKEN="$HF_TOKEN" \
  $image_tag
```

Optional arguments:
- `--ipc=host`: Use the host system's shared memory, since parallel torchrun consumes a large amount of shared memory. If this is not allowed by your security policy, increase `--shm-size` instead (documentation).
- `-v /root/.cache:/root/.cache`: Mount the host cache to avoid re-downloading cache entries.
- `-e HF_TOKEN="$HF_TOKEN"`: Set the Hugging Face token to avoid re-authenticating.
GPU memory requirements:

| Model | GPU Memory |
|---|---|
| Cosmos-Reason2-2B | 24GB |
| Cosmos-Reason2-8B | 32GB |
Cosmos-Reason2 works on NVIDIA Hopper and Blackwell GPUs. Additional hardware configurations may work but are not officially validated at the time of this release.
Examples have been tested on the following devices:
| GPU | CUDA Version | Functionality |
|---|---|---|
| NVIDIA H100 | 12.8 | inference/post-training/quantization |
| NVIDIA GB200 | 13.0 | inference |
| NVIDIA DGX Spark | 13.0 | inference |
| NVIDIA Jetson AGX Thor (Edge) | 13.0 | Transformers inference. vLLM inference is coming soon! |
Inference
Cosmos-Reason2 is included in `transformers>=4.57.0`.
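As a quick illustration, the sketch below runs the model directly through the Transformers generation API. It is not the repository's reference script (that is `scripts/inference_sample.py`, shown next); the prompt and image path here are illustrative placeholders.

```python
# Minimal sketch of direct Transformers inference.
# Assumes transformers>=4.57.0 and a CUDA GPU; prompt and image path
# are illustrative, not taken from the repository.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "nvidia/Cosmos-Reason2-2B"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "path": "assets/sample.png"},  # hypothetical local image
            {"type": "text", "text": "What should the robot do next? Explain briefly."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=512)
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```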
Minimal example (sample output):

```bash
python scripts/inference_sample.py
```

For deployment and batch inference, we recommend using `vllm>=0.11.0`.
Start the server in a separate terminal or as a background process.
Tip
Docker users: Run `docker exec -it <CONTAINER_ID> bash` to exec into your container. Find your container ID with `docker ps`.
```bash
vllm serve nvidia/Cosmos-Reason2-2B \
  --allowed-local-media-path "$(pwd)" \
  --max-model-len 16384 \
  --media-io-kwargs '{"video": {"num_frames": -1}}' \
  --reasoning-parser qwen3 \
  --port 8000
```

Optional arguments:
- `--max-model-len 16384`: Maximum model length, to avoid OOM. Recommended range: 8192 to 16384.
- `--media-io-kwargs '{"video": {"num_frames": -1}}'`: Allow overriding the FPS per sample.
- `--reasoning-parser qwen3`: Parse the reasoning trace.
- `--port 8000`: Server port. Change it if you encounter `Address already in use` errors.
Note
First startup takes a couple of minutes for model loading and CUDA graph compilation. Subsequent starts are faster with cached graphs.
Once ready, the server will print `Application startup complete.`.
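At that point you can also query the server from any OpenAI-compatible client, not just the `cosmos-reason2-inference` CLI shown below. Here is a minimal sketch using the `openai` Python package; the `file://` URL relies on the `--allowed-local-media-path` flag above, and `reasoning_content` is a vLLM extension populated when `--reasoning-parser` is set. The file path and prompt are illustrative.

```python
# Hedged sketch: query the running vLLM server through its
# OpenAI-compatible API. Assumes `pip install openai` and the
# `vllm serve` command above is running on port 8000.
import os
from openai import OpenAI

# vLLM does not check the API key by default; base_url points at the local server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Local files work as file:// URLs because the server was started with
# --allowed-local-media-path "$(pwd)".
image_url = "file://" + os.path.abspath("assets/sample.png")  # illustrative path

response = client.chat.completions.create(
    model="nvidia/Cosmos-Reason2-2B",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": "What should the robot do next?"},
            ],
        }
    ],
    max_tokens=512,
)

message = response.choices[0].message
# With --reasoning-parser qwen3, vLLM returns the reasoning trace separately
# from the final answer; the field is absent if the parser is disabled.
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)
```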
Warning
Remember to stop the server when done! The vllm server consumes significant GPU memory while running. To stop it:
- If running in the foreground: press `Ctrl+C`.
- If running in the background: find the process with `ps aux | grep vllm` and kill it with `kill <PID>`.
Caption a video (sample output):

```bash
cosmos-reason2-inference online --port 8000 -i prompts/caption.yaml --reasoning --videos assets/sample.mp4 --fps 4
```

Embodied reasoning with verbose output (sample output):
```bash
cosmos-reason2-inference online -v --port 8000 -i prompts/embodied_reasoning.yaml --reasoning --images assets/sample.png
```

To list available arguments:
```bash
cosmos-reason2-inference online --help
```

Temporally caption a video and save the input frames to `outputs/temporal_localization` for debugging (sample output):
```bash
cosmos-reason2-inference offline -v --max-model-len 16384 -i prompts/temporal_localization.yaml --videos assets/sample.mp4 --fps 4 -o outputs/temporal_localization
```

To list available arguments:
```bash
cosmos-reason2-inference offline --help
```

Common arguments:
- `--model nvidia/Cosmos-Reason2-2B`: Model name or path.
Additional Resources
- Troubleshooting
- Example prompts
- Cosmos-Reason2 is based on the Qwen3-VL architecture.
- vLLM
License and Contact
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
NVIDIA Cosmos source code is released under the Apache 2 License.
NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact [email protected].
