This repository contains the official implementation of "The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion".
Language of Motion (LoM) is a framework that models human motion generation as a sequence modeling problem using language models. It decomposes the human body into separate regions (face, hands, upper body, and lower body) to effectively capture and generate natural human movement conditioned on modalities such as text and audio.
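To make the pipeline concrete, the following minimal sketch (illustrative only; names such as quantize, BODY_PARTS, and the stand-in codebooks are hypothetical and not this repository's API) shows how per-part motion streams can be discretized and interleaved with text tokens into a single sequence for a language model:

import numpy as np

BODY_PARTS = ["face", "hands", "upper", "lower"]

def quantize(features, codebook):
    # Nearest-codebook lookup: map each frame's feature vector (T, D)
    # to the index of its closest codebook entry (K, D) -> token ids (T,)
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
# Stand-in "learned" codebooks and a fake 8-frame motion clip, one stream per part
codebooks = {part: rng.normal(size=(512, 64)) for part in BODY_PARTS}
motion = {part: rng.normal(size=(8, 64)) for part in BODY_PARTS}

# 1) Compositional tokenization: one discrete token stream per body part
part_tokens = {part: quantize(motion[part], codebooks[part]) for part in BODY_PARTS}

# 2) Sequence modeling: interleave text (or audio) tokens with per-part motion
#    tokens so a language model can be trained or prompted on one flat sequence
sequence = ["<text>", "a", "person", "waves", "</text>"]
for part in BODY_PARTS:
    sequence += [f"<{part}>"] + [f"<{part}_{t}>" for t in part_tokens[part]] + [f"</{part}>"]
print(sequence[:12])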
- Initial code release
- Inference code for text-to-motion
- Inference code for co-speech gesture generation
- Tokenizer training code
- AMASS and LibriSpeech preprocessing code
- Evaluation benchmark results
- Text-to-motion results in the rotation format
- Language model training code
We use Conda for environment management. Follow these steps to set up the development environment:
# Create and activate the conda environment
conda create --name lom -y python=3.10
conda activate lom
# Install PyTorch with CUDA support
conda install pytorch==2.4.0 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
# Alternative for RTX 5090 users: install PyTorch from the nightly cu128 wheels instead
# pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
# Pin the pip version and install dependencies
python -m pip install pip==21.3
pip install -r requirements.txt
# Install additional packages
pip install turbot5 -U
# Alternative for RTX 5090 users: upgrade triton to support the new architecture
# pip install --upgrade "git+https://github.com/openai/triton.git@main#egg=triton&subdirectory=python"
# Then set Triton's target compute capability from your GPU:
# export TRITON_JIT_CUDA_ARCHITECTURES=$(python -c "import torch; p = torch.cuda.get_device_properties(0); print(f'{p.major}{p.minor}')")
# Install NLP tools
python -m spacy download en_core_web_sm
# Set up fairseq (required for some components)
mkdir -p third_party
cd third_party
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
cd ../..
We use TEMOS for rendering, which requires Blender. Install it with our provided script:
# Execute the setup script to install Blender and its dependencies
chmod +x setup_blender.sh
./setup_blender.sh
This script will:
- Download and extract Blender 2.93.18
- Verify the Blender Python path
- Install all necessary Python packages for rendering
Please register an account on the Max Planck Institute for Intelligent Systems (MPI-IS) website to access the SMPLX models. Then download the SMPLX models and the Hubert, T5, and T2M metric checkpoints by running the following script:
chmod +x build_resources.sh
./build_resources.sh
After running the script, you will have the following directory structure:
model_files/
├── hubert_models/ # Hubert audio tokenizer models
├── smplx_models/ # SMPLX body models
├── t2m_evaluators/ # Text-to-Motion evaluation metrics
└── t5_models/ # T5 language models
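To confirm the download completed, a quick sanity check like the one below (a convenience snippet, not part of the repository) can be run from the project root:

from pathlib import Path

expected = ["hubert_models", "smplx_models", "t2m_evaluators", "t5_models"]
missing = [d for d in expected if not (Path("model_files") / d).is_dir()]
print("All resource folders found." if not missing else f"Missing folders: {missing}")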
Pretrained models are being uploaded gradually! Visit the Hugging Face repository to download them.
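One way to fetch a checkpoint is with the huggingface_hub client; the repo id and local directory below are placeholders, so substitute the repository linked above and whatever local path your configs expect:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<org>/<lom-checkpoints>",  # placeholder: use the Hugging Face repo linked above
    local_dir="checkpoints/lom",        # placeholder: any local path works
)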
python demo.py --cfg configs/demo_text2motion.yaml --text examples/text2motion.txt --task text2motion --render
python demo.py --cfg configs/demo_cospeech.yaml --audio examples/2_scott_0_111_111.wav --task cospeech --render
To train the model, you will need to download the following datasets:
- AMASS: Human motion dataset from the AMASS website, with text annotations from HumanML3D.
- BEAT2: Co-speech gesture dataset containing synchronized speech, emotion labels, and motion data, available from the BEAT website.
- LibriSpeech: Large-scale (1000+ hours) corpus of read English speech, downloadable from the LibriSpeech website.
After downloading, organize the datasets according to the following structure (detailed preprocessing instructions will be provided soon):
datasets/
├── AMASS/
├── BEAT2/
├── beat_chinese_v2.0.0/
├── beat_english_v2.0.0/
├── beat_japanese_v2.0.0/
├── beat_spanish_v2.0.0/
└── LibriSpeech/
Our comprehensive training documentation is coming soon! We'll provide detailed instructions for all three stages:
- Compositional Motion Tokenization (VQ-VAE Training)
- Language Model Pretraining
- Task-Specific Fine-tuning
Stay tuned for updates on our training procedures and best practices.
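Until the official training code lands, the snippet below gives a generic, illustrative view of the vector-quantization step behind stage 1; it uses a standard straight-through estimator and is not this repository's implementation (the 0.25 commitment weight is a placeholder):

import torch
import torch.nn.functional as F

def vq_straight_through(z_e, codebook):
    # z_e: (B, T, D) encoder features; codebook: (K, D) learned embeddings
    dists = torch.cdist(z_e, codebook.unsqueeze(0))   # (B, T, K) pairwise distances
    idx = dists.argmin(-1)                            # (B, T) discrete motion tokens
    z_q = codebook[idx]                               # (B, T, D) quantized vectors
    commit = F.mse_loss(z_e, z_q.detach())            # pull encoder toward the codebook
    embed = F.mse_loss(z_e.detach(), z_q)             # pull codebook toward the encoder
    z_q = z_e + (z_q - z_e).detach()                  # straight-through gradient to z_e
    return z_q, idx, embed + 0.25 * commit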
Evaluation metrics and benchmarking results are currently being prepared. Soon, we'll provide:
- Standardized evaluation scripts for all supported tasks
- Benchmark results on public datasets
- Comparison with SOTA methods
Check back for updates or follow our GitHub repository for notifications.
If you find our work useful for your research, please consider citing:
@inproceedings{chen2024language,
  title={The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion},
  author={Chen, Changan and Zhang, Juze and Lakshmikanth, Shrinidhi K and Fang, Yusu and Shao, Ruizhi and Wetzstein, Gordon and Fei-Fei, Li and Adeli, Ehsan},
  booktitle={CVPR},
  year={2025}
}
This project was partially funded by NIH grant R01AG089169 and UST. The authors would also like to thank Georgios Pavlakos for valuable discussions, and Chaitanya Patel, Jingyan Zhang, and Bin Li for their feedback on the paper.