Voicebox is an open-source desktop app that lets you synthesize and clone any voice using the Qwen3‑TTS model. All generation runs locally on your PC, so you don't depend on tokens or paid subscriptions.

Voice cloning and synthesis with Qwen3‑TTS

In Voicebox, you can clone a voice from just a few seconds of reference audio (up to a maximum of 30 seconds), relying on Qwen3‑TTS to reproduce tone, timbre, and accent with high fidelity. To provide reference audio, upload a voice file or record directly from the microphone. Then type out what is said in the clip, or use the built-in transcription feature to convert it to text. With just this information, the app saves a voice profile that you can reuse to generate audio without any retraining.
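
The 30-second cap on reference audio is easy to check before you upload a clip. The following is a minimal sketch and not part of Voicebox: it assumes the clip is an uncompressed WAV file and that the limit is inclusive, and the names MAX_REFERENCE_SECONDS, clip_duration_seconds, and is_valid_reference are hypothetical.

```python
import wave

# Upper bound on reference audio mentioned in the review (assumed to be inclusive).
MAX_REFERENCE_SECONDS = 30.0


def clip_duration_seconds(source):
    """Return the duration of a WAV clip; `source` is a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(source, max_seconds=MAX_REFERENCE_SECONDS):
    """True if the clip is non-empty and no longer than the allowed maximum."""
    duration = clip_duration_seconds(source)
    return 0.0 < duration <= max_seconds
```

For example, is_valid_reference("my_voice.wav") would tell you whether a recording (the file name is a placeholder) fits within the limit before you create a profile from it.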

Generate audio with any voice

Once the profile has been created, you write the text you want the voice to say, and Voicebox generates the corresponding audio. This is useful, for example, for voicing translated text in other languages while keeping your own tone of voice. You can also make funny homemade dubs, or even dub video games. Additionally, you can export a voice profile and send it to someone else so they can generate audio with it on their own PC.

Transcription with Whisper and audio capture

Voicebox natively integrates Whisper to transcribe the input audio and align the text with the generated voice. This allows you to edit the text directly on the timeline and see how the pronunciation changes, which is very useful if you want to correct errors, adjust pauses, or edit dialogues while maintaining the same cloned voice. It also includes system audio capture, so you can record the sound coming from your PC (from a game or a call, for example) and use it as a reference to clone a voice, or as a basis for an audio scene.
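
To illustrate the kind of data a timeline editor needs, the sketch below turns a Whisper-style transcription result into timestamped lines. It assumes the open-source openai-whisper Python package, whose transcribe() call returns a dict containing a "segments" list with "start", "end", and "text" entries. How Voicebox itself calls Whisper is not described in this review, and format_timeline and transcribe_file are hypothetical helper names.

```python
def format_timeline(result):
    """Turn a Whisper-style result dict into '[start to end] text' lines."""
    lines = []
    for seg in result.get("segments", []):
        text = seg["text"].strip()
        lines.append(f"[{seg['start']:.2f}s to {seg['end']:.2f}s] {text}")
    return lines


def transcribe_file(path, model_name="base"):
    """Transcribe an audio file (needs `pip install openai-whisper` and ffmpeg)."""
    import whisper  # imported lazily so format_timeline works without the package

    model = whisper.load_model(model_name)
    return model.transcribe(path)
```

With the package installed, format_timeline(transcribe_file("clip.wav")) would list each spoken segment with its start and end time; the file name is a placeholder.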

Local API and voice server

Beyond its graphical interface, Voicebox is designed to be integrated into other projects: it offers a REST API and a local server that you can start with a single click, making voice generation available to games, apps, or AI agents. Through this API, you can send text, select a voice profile, and receive the generated audio in a standard format, which lets you automate narrations, dialogue, or voice notifications without going through cloud services.

The app is built with Tauri, Rust, and Python, which makes it relatively lightweight compared to Electron-based solutions and lets it run locally without major hardware requirements. Even so, because all processing happens on your own machine, performance depends on your GPU/CPU and on how the Qwen3‑TTS model is configured.
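
As an illustration of how a script might call a local API like this, here is a minimal sketch using only the Python standard library. The base URL, the /generate route, and the JSON field names "text" and "profile_id" are assumptions made for the example and are not confirmed by this review; check the project's own API documentation for the real routes and payloads.

```python
import json
import urllib.request

# Assumed address of the local Voicebox server; adjust to your setup.
BASE_URL = "http://localhost:8000"


def build_generate_request(text, profile_id, base_url=BASE_URL):
    """Build (but do not send) a POST request asking for speech synthesis."""
    payload = json.dumps({"text": text, "profile_id": profile_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/generate",  # hypothetical route
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, profile_id, timeout=300):
    """Send the request and return the raw response body as bytes."""
    request = build_generate_request(text, profile_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling synthesize("Hello", "my-profile") only works while the local server is running. The generous timeout reflects that generation can take a while on slower machines.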

Generate stories with a voice

In addition to generating audio, Voicebox has a section for generating voiced stories, where you describe what you want to happen in the story. This is ideal for creating content to entertain your kids, or even for telling jokes in different voices.

My experience using Voicebox

When testing Voicebox on my PC, I found it very easy to clone a voice with just a few seconds of audio, save the profile, and then generate several text clips to put together a short dialogue on the timeline. However, some features are missing, such as the ability to add more reference audio to a profile or to make advanced voice adjustments so the result sounds more realistic.

What I liked most and what could be improved

• What I liked most: all the cloning and synthesis processing is done locally, with no cloud services or subscriptions.

• What I would improve: being able to customize the voices and use multiple samples so that the final generated audio sounds better. Additionally, depending on how powerful your PC is, it may take several minutes to generate the content. Lighter models are available, but the more complex the model, the better the result tends to be.

Voicebox is for you if...

✓ You want to clone voices and generate speech without relying on cloud services or paying for subscriptions.

✓ You work with podcasts, dubbing, games, or AI agents and need a timeline editor to put together voice scenes.

✓ You prefer a local, private, and open solution that you can integrate into other projects.

Download Voicebox and start using a voice synthesis and cloning studio that runs entirely on your computer, with support for Qwen3‑TTS and Whisper.

	Developer	Jamie Pine
	License	Free
	Category	General
	Rating	Not specified
	Languages	English and 47 more

	Required permissions	Not applicable
	Advertising	Not specified

	Downloads	374
	Date	Feb 27, 2026
	File type	EXE
	Size	292.02 MB
	SHA256	0e039f9ef42a5a2ccfe5f714b303c2b1e600ed5af2c45335a56b2bf380200a4f

Voicebox

Clone voices with AI and generate audio in multiple languages


Information about Voicebox 0.1.13
