Voicebox is an open-source desktop app that lets you clone and synthesize any voice using the Qwen3‑TTS model. Everything is generated locally on your PC, so you don't depend on tokens or paid subscriptions.
Voice cloning and synthesis with Qwen3‑TTS
In Voicebox, you can clone a voice from just a few seconds of reference audio (up to a maximum of 30 seconds), relying on Qwen3‑TTS to reproduce tone, timbre, and accent with high fidelity. To provide the reference audio, upload a voice file or record directly from the microphone. Then type out what is said in the recording, or use the built-in transcription feature to convert it to text. With just this information, the app saves a voice profile that you can reuse to generate audio without any retraining.
Generate audio with any voice
Once the profile has been created, you type the text you want the voice to say, and Voicebox generates the corresponding audio. This is useful, for example, for generating translated versions of a text in other languages while keeping your own tone of voice. You can also make funny homemade dubs, or even dub video games. Additionally, you can export each voice profile and send it to someone else so they can create audio with it on their own PC.
Transcription with Whisper and audio capture
Voicebox natively integrates Whisper to transcribe the input audio and align the text with the generated voice. This allows you to edit the text directly on the timeline and see how the pronunciation changes, which is very useful if you want to correct errors, adjust pauses, or edit dialogues while maintaining the same cloned voice. It also includes system audio capture, so you can record the sound coming from your PC (from a game or a call, for example) and use it as a reference to clone a voice, or as a basis for an audio scene.
Local API and voice server
Beyond its graphical interface, Voicebox is designed to be integrated into other projects: it offers a REST API and a local server that you can start with a single click to use voice generation from games, apps, or AI agents. Through this API, you can send text, select a voice profile, and get the generated audio back in a standard format, which lets you automate narrations, dialogue, or voice notifications without going through cloud services.
The app is built with Tauri, Rust, and Python, which makes it relatively lightweight compared to Electron-based solutions and lets it run locally without major hardware requirements. Even so, performance depends on your GPU/CPU and on how the Qwen3‑TTS model is configured on your machine, since all processing is done on your own hardware.
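As a rough idea of what calling the local API from a script could look like, here is a minimal Python sketch using only the standard library. The port, the /generate route, the text and profile_id field names, and the assumption that the response body is the raw audio are all hypothetical placeholders chosen for illustration. This review does not document the real endpoints, so check Voicebox's own API documentation before relying on any of them.

```python
import json
import urllib.request

# Hypothetical values: the real port, route, and field names are not
# given in this review and must be taken from Voicebox's API documentation.
BASE_URL = "http://127.0.0.1:8000"
GENERATE_PATH = "/generate"


def build_generate_request(text, profile_id, base_url=BASE_URL):
    """Build (but do not send) a POST request asking for speech from text."""
    # Assumed JSON payload: the text to speak and the voice profile to use.
    payload = {"text": text, "profile_id": profile_id}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + GENERATE_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_to_file(text, profile_id, out_path, timeout=300):
    """Send the request and save the response body to out_path.

    Assumes the server replies with the audio bytes directly. The long
    default timeout reflects that local generation can take minutes.
    """
    request = build_generate_request(text, profile_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With the local server running, a call such as generate_to_file("Hello from a local voice.", "my-voice", "hello.wav") would write the returned audio to disk, provided the placeholder route and fields are replaced with the real ones.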
Generate stories with a voice
In addition to generating individual audio clips, Voicebox has a section for generating voiced stories, where you enter what you want to happen in the story. This is ideal for creating content to entertain your kids, or even for telling jokes in different voices.
My experience using Voicebox
When testing Voicebox on my PC, I found it very easy to clone a voice from just a few seconds of audio, save the profile, and then generate several clips from text to assemble a short dialogue on the timeline. However, some features are missing, such as the ability to add more audio samples to a profile or to make advanced voice adjustments so the result sounds more realistic.
What I liked most and what could be improved
• What I liked most: all the cloning and synthesis processing is done locally, with no cloud services or subscriptions.
• What I would improve: being able to customize the voices and use multiple samples so that the final audio sounds better. Also, depending on how powerful your PC is, generating content may take several minutes. There are lighter models you can use, but the more complex the model, the better the result will be.
Voicebox is for you if...
✓ You want to clone voices and generate speech without relying on cloud services or paying for subscriptions.
✓ You work with podcasts, dubbing, games, or AI agents and need a timeline editor to put together voice scenes.
✓ You prefer a local, private, and open solution that you can integrate into other projects.
Download Voicebox and start using a voice synthesis and cloning studio that runs entirely on your computer, with support for Qwen3‑TTS and Whisper.