Starred repositories
Spot the conversation: speaker diarisation in the wild
A toolkit to calculate speech audio quality. Not affiliated with the original authors.
A 10000+ hours dataset for Chinese speech recognition
Let AI agents browse the web. An autonomous toolkit for browser-based AI agents.
OpenBrowser is an open-source, AI-native browser built on Chromium — a truly privacy-first alternative to ChatGPT Atlas, Perplexity Comet, and Dia.
Lightpanda: the headless browser designed for AI and automation
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
MOSS-TTS-Nano is an open-source multilingual tiny speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed for realtime speech generation, can run direc…
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.
MOSS-Audio is an open-source foundation model for unified audio understanding, enabling speech, sound, music, captioning, QA, and reasoning in real-world scenarios.
MOSS-Speech is a true speech-to-speech large language model without text guidance.
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
High-fidelity world models for general embodied intelligence, such as data engines and world simulators.
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.
[ECCV 2026] VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
Official codebase for Fast-WAM: Do World Action Models Need Test-time Future Imagination?
RynnVLA-002: A Unified Vision-Language-Action and World Model
Official code of Motus: A Unified Latent Action World Model
GigaWorld-Policy: An Efficient Action-Centered World–Action Model
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
Code to pretrain, fine-tune, and evaluate DreamZero and run sim & real-world evals
[RSS 2026] Causal video-action world model for generalist robot control
A Curated List of Vision-Language-Action (VLA) and World Action Models (WAM) Research and Beyond
A flexible and efficient codebase for training visually-conditioned language models (VLMs)