- Considering
- https://github.com/myshell-ai/OpenVoice
- https://github.com/coqui-ai/TTS
- https://github.com/erew123/alltalk_tts/tree/alltalkbeta
- https://github.com/SWivid/F5-TTS
- https://github.com/speechbrain/speechbrain
- https://github.com/openai/whisper
- Went with Whisper because SpeechBrain gave me dependency issues with PyTorch
- Considering
- Anthropic/claude
- MistralAI/mixtral
- Google/gemini
- Went with Gemini because it has a straightforward API and a sufficient free plan
- Integrated MS1 and MS2 with multiprocessing queues
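A minimal sketch of how two stages can be chained with multiprocessing queues. The stage functions below are stand-ins (they only format strings) and the names `transcribe_stage`, `respond_stage` and `run_pipeline` are hypothetical; the real MS1 and MS2 code would call the speech-to-text model and the LLM API at the marked places.

```python
import multiprocessing as mp

STOP = None  # sentinel that tells the next stage to shut down


def transcribe_stage(audio_chunks, text_queue):
    """MS1 stand-in: turn each audio chunk into text and pass it on."""
    for chunk in audio_chunks:
        # A real implementation would run the speech-to-text model here.
        text_queue.put(f"transcript of {chunk}")
    text_queue.put(STOP)


def respond_stage(text_queue, reply_queue):
    """MS2 stand-in: read transcripts and produce one reply per transcript."""
    while True:
        text = text_queue.get()
        if text is STOP:
            reply_queue.put(STOP)
            break
        # A real implementation would call the LLM API here.
        reply_queue.put(f"reply to [{text}]")


def run_pipeline(audio_chunks):
    """Run both stages in separate processes and collect the replies."""
    text_queue = mp.Queue()
    reply_queue = mp.Queue()
    stages = [
        mp.Process(target=transcribe_stage, args=(audio_chunks, text_queue)),
        mp.Process(target=respond_stage, args=(text_queue, reply_queue)),
    ]
    for stage in stages:
        stage.start()
    replies = []
    while True:
        item = reply_queue.get()
        if item is STOP:
            break
        replies.append(item)
    for stage in stages:
        stage.join()
    return replies


if __name__ == "__main__":
    print(run_pipeline(["chunk-1", "chunk-2"]))
```

The sentinel is forwarded from queue to queue so each stage knows when to stop without sharing any other state.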
- Considering
- Went with PiperVoice because it is easy to use and has lots of voice models available
- Integrated MS1, MS2 and MS3
- Considering
- https://github.com/coqui-ai/TTS
- https://github.com/myshell-ai/OpenVoice
- https://github.com/RVC-Boss/GPT-SoVITS
- https://github.com/yl4579/StyleTTS2
- Went with StyleTTS2 because of compatibility issues with Coqui TTS and OpenVoice
- Had issues with StyleTTS2 as well
- Then fine-tune a model
- Copy the following files into the StyleTTS2 repo dir `/styletts2`:
  - `styletts2/data/wavs_clean`
  - `styletts2/data/config_ft.yml`
  - `styletts2/data/val_list.txt`
  - `styletts2/data/train_list.txt`
- Adjust the paths inside `config_ft.yml`:
  - `data_params:`
    - `root_path: "xxx/StyleTTS2/styletts2/data/wavs_clean"`
    - `train_data: "xxx/StyleTTS2/styletts2/train_list.txt"`
    - `val_data: "xxx/StyleTTS2/styletts2/val_list.txt"`
- Cd into `xxx/StyleTTS2` and run
  - `python train_finetune.py --config_path styletts2/config_ft.yml`
- Just no way to get this to work with my hardware
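A rough sketch of how `train_list.txt` and `val_list.txt` could be produced from a set of recordings. It assumes the `filename.wav|transcription|speaker` line format, which should be checked against the StyleTTS2 README (the transcriptions may also need to be phonemized first). The helper names, the speaker id `"0"` and the 10% validation share are illustrative choices, not taken from this project.

```python
import random
from pathlib import Path


def split_entries(entries, val_fraction=0.1, seed=0):
    """Shuffle (wav_name, text) pairs and split them into train and validation parts."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def write_list(path, entries, speaker="0"):
    """Write one `wav|text|speaker` line per entry (assumed list format)."""
    lines = [f"{wav}|{text}|{speaker}" for wav, text in entries]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical recordings; real entries would come from the recorded samples.
    entries = [(f"paragraph_{i:02d}.wav", f"text of paragraph {i}") for i in range(1, 21)]
    train, val = split_entries(entries)
    write_list("train_list.txt", train)
    write_list("val_list.txt", val)
```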
- Trying with GPT-SoVITS
- Failed again
- Trying with F5-TTS
- Install F5-TTS repository: clone and then
  - `(mindmirror) repos/mindmirror$ pip install -e .`
- Record voice sample:
  - `(mindmirror) repos/mindmirror$ python record_sample.py`
- Adjust to the pretrained vocab: `PRETRAINED_VOCAB_PATH = files("f5_tts").joinpath("../../data/Emilia_ZH_EN_pinyin/vocab.txt")` in `repos/F5-TTS/src/f5_tts/train/datasets/prepare_csv_wavs.py`
- Prepare dataset:
  - `(mindmirror) repos/F5-TTS$ python src/f5_tts/train/datasets/prepare_csv_wavs.py ../mindmirror/data/MyVoice/ ./data/MyVoice_pinyin`
- Adjust `num_workers=os.cpu_count()` in `repos/F5-TTS/src/f5_tts/train/finetune_cli.py`
- Train on top of model F5TTS v1 base
  - `(mindmirror) repos/F5-TTS$ python src/f5_tts/train/finetune_cli.py --exp_name F5TTS_v1_Base --dataset_name MyVoice --finetune --pretrain ckpts/F5TTS_v1_Base/model_1250000.safetensors --tokenizer pinyin --learning_rate 5e-5 --epochs 50 --batch_size_type sample --batch_size_per_gpu 1 --grad_accumulation_steps 4 --save_per_updates 5000 --keep_last_n_checkpoints 1`
- Test loading the model
  - `(mindmirror) repos/mindmirror$ python f5_tts/verify_model.py`
- Test inference
  - `(mindmirror) repos/F5-TTS$ python src/f5_tts/infer/infer_cli.py --model F5TTS_v1_Base --ckpt_file ckpts/MyVoice/model_last.pt --ref_audio ../mindmirror/voice_samples/wavs_clean/paragraph_01.wav --ref_text "The birch canoe slid on the smooth planks." --gen_text "This is a test. I am checking if my voice model is overtrained."`
- `(mindmirror) repos/F5-TTS$ python integration.py`
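The training flags interact with the dataset size, so it can help to estimate how many optimizer updates a run will perform. The sketch below is plain arithmetic under the assumption that one update consumes `batch_size_per_gpu * grad_accumulation_steps` samples on a single GPU; the dataset size of 200 clips is hypothetical.

```python
import math


def updates_per_epoch(num_samples, batch_size_per_gpu=1,
                      grad_accumulation_steps=4, num_gpus=1):
    """Approximate number of optimizer updates needed to see every sample once."""
    samples_per_update = batch_size_per_gpu * grad_accumulation_steps * num_gpus
    return math.ceil(num_samples / samples_per_update)


def total_updates(num_samples, epochs=50, **kwargs):
    """Approximate number of optimizer updates over the whole run."""
    return epochs * updates_per_epoch(num_samples, **kwargs)


num_samples = 200        # hypothetical number of recorded clips
save_per_updates = 5000  # value used in the training command
total = total_updates(num_samples)
print(f"about {total} updates in total")
print(f"about {total // save_per_updates} intermediate checkpoints")
```

With these numbers the run would finish before the first `--save_per_updates` interval is reached, so a smaller interval may be worth considering for small datasets.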
- Create conda env
  - `repos/mindmirror$ conda create -n mindmirror python=3.11 -y`
  - `repos/mindmirror$ conda activate mindmirror`
  - `(mindmirror) repos/mindmirror$ pip install -r requirements.txt`
- Add the Hugging Face API key and the Gemini API key to `.env`
- Download a Piper voice of your choice into `repos/mindmirror/pipervoice/en/` (I used semaine)
- Edit `integration.py` to use Piper as the TTS system
- Run
  - `(mindmirror) repos/mindmirror$ python integration.py`
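Before running, it can be useful to check that `.env` actually contains both keys. The sketch below parses simple `KEY=VALUE` lines with the standard library only. The variable names `GEMINI_API_KEY` and `HF_TOKEN` are assumptions; use whatever names `integration.py` actually reads.

```python
from pathlib import Path

# Hypothetical key names; adjust them to the names the project expects.
REQUIRED_KEYS = ["GEMINI_API_KEY", "HF_TOKEN"]


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, names):
    """Return the names that are absent or empty in the parsed values."""
    return [name for name in names if not values.get(name)]


if __name__ == "__main__":
    env = load_env_file(".env")
    print("missing keys:", missing_keys(env, REQUIRED_KEYS))
```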
For custom voice:
- Clone the F5-TTS repository and install it:
  - `(mindmirror) repos/mindmirror$ pip install -e .`
- Record voice sample:
  - `(mindmirror) repos/mindmirror$ python record_sample.py`
- Download the pretrained model "F5TTS_v1_Base" from Hugging Face
- Adjust to the pretrained vocab: `PRETRAINED_VOCAB_PATH = files("f5_tts").joinpath("../../data/Emilia_ZH_EN_pinyin/vocab.txt")` in `repos/F5-TTS/src/f5_tts/train/datasets/prepare_csv_wavs.py`
- Prepare dataset:
  - `(mindmirror) repos/F5-TTS$ python src/f5_tts/train/datasets/prepare_csv_wavs.py ../mindmirror/data/MyVoice/ ./data/MyVoice_pinyin`
- Adjust `num_workers=os.cpu_count()` in `repos/F5-TTS/src/f5_tts/train/finetune_cli.py`
- Train on top of model F5TTS v1 base
  - `(mindmirror) repos/F5-TTS$ python src/f5_tts/train/finetune_cli.py --exp_name F5TTS_v1_Base --dataset_name MyVoice --finetune --pretrain ckpts/F5TTS_v1_Base/model_1250000.safetensors --tokenizer pinyin --learning_rate 5e-5 --epochs 50 --batch_size_type sample --batch_size_per_gpu 1 --grad_accumulation_steps 4 --save_per_updates 5000 --keep_last_n_checkpoints 1`
- Test loading the model
  - `(mindmirror) repos/mindmirror$ python f5_tts/verify_model.py`
- Test inference
  - `(mindmirror) repos/F5-TTS$ python src/f5_tts/infer/infer_cli.py --model F5TTS_v1_Base --ckpt_file ckpts/MyVoice/model_last.pt --ref_audio ../mindmirror/voice_samples/wavs_clean/paragraph_01.wav --ref_text "The birch canoe slid on the smooth planks." --gen_text "This is a test. I am checking if my voice model is overtrained."`
- Run
  - `(mindmirror) repos/mindmirror$ python integration.py`
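The dataset preparation step needs the recordings and their transcripts in a layout the script can read. The sketch below writes a pipe-delimited `metadata.csv` next to a `wavs/` folder. That layout (an `audio_file|text` header followed by one `wavs/<name>|<text>` row per clip) is an assumption about what `prepare_csv_wavs.py` expects and should be checked against the script's docstring; `write_metadata` is a hypothetical helper, not part of F5-TTS.

```python
from pathlib import Path


def write_metadata(dataset_dir, transcripts):
    """Write metadata.csv for the clips under dataset_dir/wavs.

    `transcripts` maps a wav file name to its spoken text. Clips that are
    missing on disk are skipped. Returns the number of rows written.
    """
    dataset_dir = Path(dataset_dir)
    lines = ["audio_file|text"]  # assumed header row
    for wav_name, text in transcripts.items():
        if not (dataset_dir / "wavs" / wav_name).exists():
            continue  # no recording for this transcript
        if "|" in text:
            raise ValueError(f"transcript for {wav_name} contains the '|' delimiter")
        lines.append(f"wavs/{wav_name}|{text}")
    (dataset_dir / "metadata.csv").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines) - 1


if __name__ == "__main__":
    # Hypothetical transcript mapping; the sentence is the reference text used above.
    count = write_metadata(
        "data/MyVoice",
        {"paragraph_01.wav": "The birch canoe slid on the smooth planks."},
    )
    print(f"wrote {count} rows")
```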