4IK1d

11 followers · 18 following

Starred repositories

joonson / voxconverse

Spot the conversation: speaker diarisation in the wild

168 17 Updated Jul 26, 2022

fakerybakery / utmos

A toolkit to calculate speech audio quality. Not affiliated with the original authors

Python 73 6 Updated Aug 13, 2024

wenet-e2e / WenetSpeech

A 10000+ hours dataset for Chinese speech recognition

Shell 617 56 Updated Jan 9, 2026

ntegrals / openbrowser

Let AI agents browse the web. An autonomous toolkit for browser-based AI agents.

TypeScript 9,479 866 Updated Apr 2, 2026

OpenBrowserAI / openbrowser

OpenBrowser is an open-source, AI-native browser built on Chromium — a truly privacy-first alternative to ChatGPT Atlas, Perplexity Comet, and Dia.

TypeScript 56 14 Updated Feb 24, 2026

lightpanda-io / browser

Lightpanda: the headless browser designed for AI and automation

Zig 31,482 1,393 Updated Jun 29, 2026

Blaizzy / mlx-audio

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

Python 7,449 651 Updated Jun 28, 2026

OpenMOSS / MOSS-TTS-Nano

MOSS-TTS-Nano is an open-source multilingual tiny speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed for realtime speech generation, can run direc…

Python 3,790 481 Updated Jun 2, 2026

xzf-thu / Voices-in-the-Wild-Bench

Python 26 Updated May 22, 2026

QwenLM / Qwen3-ASR

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,991 303 Updated Jun 26, 2026

OpenMOSS / MOSS-Audio

MOSS-Audio is an open-source foundation model for unified audio understanding, enabling speech, sound, music, captioning, QA, and reasoning in real-world scenarios.

Python 585 41 Updated Jun 2, 2026

OpenMOSS / MOSS-Speech

MOSS-Speech is a true speech-to-speech large language model without text guidance.

Python 138 7 Updated Feb 13, 2026

opendatalab / OmniDocBench

[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation

Python 1,854 182 Updated Jun 26, 2026

huggingface / lerobot

🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning

Python 25,346 4,929 Updated Jun 28, 2026

unitreerobotics / unifolm-world-model-action

Python 1,051 132 Updated Mar 18, 2026

boundless-large-model / boundless-world-model

High-fidelity world models for general embodied intelligence, such as data engines and world simulators.

Python 1,856 76 Updated Jun 24, 2026

knightnemo / Awesome-World-Models

A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.

3,090 127 Updated Jun 28, 2026

ginwind / VLA-JEPA

[ECCV 2026] VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model

Python 430 32 Updated May 2, 2026

yuantianyuan01 / FastWAM

Official codebase for Fast-WAM: Do World Action Models Need Test-time Future Imagination?

Python 1,044 113 Updated Apr 3, 2026

alibaba-damo-academy / RynnVLA-002

RynnVLA-002: A Unified Vision-Language-Action and World Model

Python 1,080 64 Updated Dec 2, 2025

thu-ml / Motus

Official code of Motus: A Unified Latent Action World Model

Python 1,172 65 Updated Jan 5, 2026

open-gigaai / giga-world-policy

GigaWorld-Policy: An Efficient Action-Centered World–Action Model

Python 1,294 101 Updated Apr 20, 2026

open-gigaai / giga-brain-0

GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Python 2,546 200 Updated Mar 10, 2026

dreamzero0 / dreamzero

Code to pretrain, fine-tune, and evaluate DreamZero and run sim & real-world evals

Python 2,343 199 Updated Apr 19, 2026

Robbyant / lingbot-va

[RSS 2026] Causal video-action world model for generalist robot control

Python 1,392 124 Updated Apr 29, 2026

DravenALG / awesome-vla-wam

A Curated List of Vision-Language-Action (VLA) and World Action Models (WAM) Research and Beyond

795 27 Updated Jun 21, 2026

Robbyant / lingbot-vla

A Pragmatic VLA Foundation Model

Python 1,522 159 Updated Jun 11, 2026

TRI-ML / prismatic-vlms

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

Python 998 1,137 Updated Jul 4, 2024

PyO3 / pyo3

Rust bindings for the Python interpreter

Rust 15,860 981 Updated Jun 26, 2026

3xp10it / stockbook

A collection and ranking of classic securities books from Douban

Python 189 44 Updated Mar 25, 2021

Starred topics

Vibe coding

flowchart-generator