SoniTranslate is a video translation and dubbing system that produces synchronized target-language audio tracks for existing video content. It provides a web UI built with Gradio: users upload a video, choose source and target languages, and run a pipeline that handles transcription, translation, and speech re-synthesis. Under the hood, it uses speech recognition and diarization models to separate speakers, align audio with timecodes, and respect subtitle timing, so the generated dub track stays in sync with the original video. The project supports a wide range of languages, from major world languages (English, Spanish, French, German, Chinese, Arabic, etc.) to many regional or less widely spoken ones, making it suitable for broad internationalization. It offers multiple usage modes: a Colab notebook for cloud-based experimentation, a Hugging Face Space demo for quick trials, and instructions for local installation.
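The transcribe → translate → synthesize flow described above can be sketched roughly as follows. This is an illustrative outline, not SoniTranslate's actual API: the function names, the `Segment` structure, and the placeholder bodies are all assumptions standing in for the real ASR, translation, and TTS models.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float    # seconds into the source audio
    end: float
    speaker: str
    text: str

def transcribe(audio_path: str) -> list[Segment]:
    # Placeholder: a real pipeline would run an ASR model here and
    # return time-stamped, speaker-attributed segments.
    return [Segment(0.0, 2.5, "SPEAKER_00", "Hello and welcome.")]

def translate(segments: list[Segment], target_lang: str) -> list[Segment]:
    # Placeholder: a real pipeline would call a translation model,
    # preserving each segment's timing and speaker label.
    return [Segment(s.start, s.end, s.speaker, f"[{target_lang}] {s.text}")
            for s in segments]

def synthesize(segments: list[Segment]) -> list[tuple[float, float, str]]:
    # Placeholder: a real pipeline would run per-speaker TTS and fit each
    # clip into its original [start, end] window to keep the dub in sync.
    return [(s.start, s.end, f"tts({s.text})") for s in segments]

def dub(audio_path: str, target_lang: str) -> list[tuple[float, float, str]]:
    """Chain the three stages into one dubbing pass."""
    return synthesize(translate(transcribe(audio_path), target_lang))
```

The key design point the sketch preserves is that timing metadata travels through every stage, which is what lets the final synthesis stay aligned with the original video.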
Features
- Gradio-powered web interface for end-to-end video translation and dubbing
- Automated pipeline for transcription, translation, TTS and audio re-timing to subtitles
- Broad language support across dozens of source and target languages
- Colab notebook and Hugging Face demo for quick, no-install experimentation
- GPU-accelerated local installation path using PyTorch, CUDA and FFmpeg
- Optional speaker diarization and advanced alignment using Pyannote and related models
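To illustrate the re-timing idea from the feature list: translated speech is often longer than the original line, so a dubbing pipeline must decide how much to speed up each synthesized clip to fit its subtitle window. The helper below is a minimal sketch of that calculation; the function name, the clamp value, and the fallback behavior are illustrative assumptions, not SoniTranslate's actual implementation.

```python
def speed_factor(tts_duration: float, slot_duration: float,
                 max_rate: float = 1.8) -> float:
    """Playback rate needed so a synthesized clip fits its original time slot.

    The rate is clamped at max_rate so sped-up speech stays intelligible;
    a real pipeline might instead shorten the translation or let the clip
    spill slightly past its window.
    """
    if tts_duration <= slot_duration:
        return 1.0  # already fits; no time-stretching needed
    return min(tts_duration / slot_duration, max_rate)
```

For example, a 3-second synthesized clip targeting a 2-second subtitle window would be played back at 1.5x, while an extreme overrun is capped rather than rendered as unintelligibly fast speech.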