mindmirror

Want to have a conversation with yourself? An AI chat assistant with speech recognition via Whisper, responses from Gemini, and text-to-speech output via Piper or F5-TTS with a custom cloned voice.

MS1: Get (English) speech input via the microphone and output it as text
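
A minimal sketch of this step, assuming the openai-whisper, sounddevice and soundfile packages (the repo's actual recording and transcription code may differ):

    import sounddevice as sd
    import soundfile as sf
    import whisper

    SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono audio
    SECONDS = 5            # length of the recording window

    # Record a short utterance from the default microphone.
    audio = sd.rec(int(SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
    sd.wait()
    sf.write("utterance.wav", audio, SAMPLE_RATE)

    # Transcribe it with an off-the-shelf English Whisper model.
    model = whisper.load_model("base.en")
    result = model.transcribe("utterance.wav")
    print(result["text"])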

MS2: Send the text to an AI API and print the response

  • Considered
    • Anthropic/Claude
    • MistralAI/Mixtral
    • Google/Gemini
  • Went with Gemini as it has a straightforward API and a sufficient free plan
  • Integrated MS1 and MS2 with multiprocessing queues (see the sketch below this list)
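
A rough sketch of that integration, assuming the google-generativeai client; the process layout, environment variable and model id are illustrative, not the repo's actual structure:

    import multiprocessing as mp
    import os

    import google.generativeai as genai

    def stt_worker(text_queue):
        # Placeholder for the MS1 Whisper loop: push each transcript onto the queue.
        text_queue.put("Hello, how are you today?")
        text_queue.put(None)  # sentinel: no more input

    def llm_worker(text_queue):
        genai.configure(api_key=os.environ["GEMINI_API_KEY"])  # assumed variable name
        model = genai.GenerativeModel("gemini-1.5-flash")      # assumed model id
        while (text := text_queue.get()) is not None:
            response = model.generate_content(text)
            print(response.text)

    if __name__ == "__main__":
        queue = mp.Queue()
        workers = [mp.Process(target=stt_worker, args=(queue,)),
                   mp.Process(target=llm_worker, args=(queue,))]
        for w in workers:
            w.start()
        for w in workers:
            w.join()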

MS3: TTS with an off-the-shelf voice model
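
A minimal sketch using the piper CLI (from the piper-tts package) via subprocess; the voice filename is a placeholder for whichever Piper voice you downloaded:

    import subprocess

    TEXT = "Hello from mindmirror."
    VOICE = "pipervoice/en/en_GB-semaine-medium.onnx"  # placeholder voice path

    # Pipe the text into piper and let it write a wav file for playback.
    subprocess.run(
        ["piper", "--model", VOICE, "--output_file", "reply.wav"],
        input=TEXT.encode("utf-8"),
        check=True,
    )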

MS4: Voice cloning

  • Considered several options
  • Went with StyleTTS2, since TTS and OpenVoice had compatibility issues
    • StyleTTS2 had issues as well
    • Then fine-tune a model:
      • Copy the following files into the StyleTTS2 repo dir styletts2/:
        • styletts2/data/wavs_clean
        • styletts2/data/config_ft.yml
        • styletts2/data/val_list.txt
        • styletts2/data/train_list.txt
      • Adjust the paths inside config_ft.yml
        • data_params:
            root_path: "xxx/StyleTTS2/styletts2/data/wavs_clean"
            train_data: "xxx/StyleTTS2/styletts2/train_list.txt"
            val_data: "xxx/StyleTTS2/styletts2/val_list.txt"
      • cd into xxx/StyleTTS2 and run:
        • python train_finetune.py --config_path styletts2/config_ft.yml
        • Could not get this to work on my hardware
  • Tried GPT-SoVITS
    • Failed again
  • Tried F5-TTS (this is what finally worked)
    1. Clone the F5-TTS repository and install it: (mindmirror) repos/F5-TTS$ pip install -e .
    2. Record a voice sample: (mindmirror) repos/mindmirror$ python record_sample.py
    3. Point PRETRAINED_VOCAB_PATH at the pretrained vocab in repos/F5-TTS/src/f5_tts/train/datasets/prepare_csv_wavs.py: PRETRAINED_VOCAB_PATH = files("f5_tts").joinpath("../../data/Emilia_ZH_EN_pinyin/vocab.txt")
    4. Prepare dataset: (mindmirror) repos/F5-TTS$ python src/f5_tts/train/datasets/prepare_csv_wavs.py ../mindmirror/data/MyVoice/ ./data/MyVoice_pinyin
    5. Set num_workers=os.cpu_count() in repos/F5-TTS/src/f5_tts/train/finetune_cli.py
    6. Train on top of the F5TTS v1 base model
    • (mindmirror) repos/F5-TTS$ python src/f5_tts/train/finetune_cli.py --exp_name F5TTS_v1_Base --dataset_name MyVoice --finetune --pretrain ckpts/F5TTS_v1_Base/model_1250000.safetensors --tokenizer pinyin --learning_rate 5e-5 --epochs 50 --batch_size_type sample --batch_size_per_gpu 1 --grad_accumulation_steps 4 --save_per_updates 5000 --keep_last_n_checkpoints 1
    7. Test loading the model: (mindmirror) repos/mindmirror$ python f5_tts/verify_model.py
    8. Test inference (a hedged Python sketch also follows this list)
    • (mindmirror) repos/F5-TTS$ python src/f5_tts/infer/infer_cli.py --model F5TTS_v1_Base --ckpt_file ckpts/MyVoice/model_last.pt --ref_audio ../mindmirror/voice_samples/wavs_clean/paragraph_01.wav --ref_text "The birch canoe slid on the smooth planks." --gen_text "This is a test. I am checking if my voice model is overtrained."
  • Run the integration: (mindmirror) repos/mindmirror$ python integration.py
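
For testing the fine-tuned model from Python, a hedged sketch assuming the f5_tts.api.F5TTS wrapper from the F5-TTS repo (constructor and argument names vary between F5-TTS versions, so check src/f5_tts/api.py in your checkout; this is not the repo's verify_model.py):

    from f5_tts.api import F5TTS

    # Load the fine-tuned checkpoint produced by finetune_cli.py.
    tts = F5TTS(ckpt_file="ckpts/MyVoice/model_last.pt")

    # Synthesize a short test sentence in the cloned voice and also write it to disk.
    wav, sr, spect = tts.infer(
        ref_file="../mindmirror/voice_samples/wavs_clean/paragraph_01.wav",
        ref_text="The birch canoe slid on the smooth planks.",
        gen_text="This is a quick smoke test of the cloned voice.",
        file_wave="smoke_test.wav",
    )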

How to run

  1. Create conda env
    1. repos/mindmirror$ conda create -n mindmirror python=3.11 -y
    2. repos/mindmirror$ conda activate mindmirror
    3. (mindmirror) repos/mindmirror$ pip install -r requirements.txt
  2. Add your Hugging Face API key and Gemini API key to .env (see the example below this list)
  3. Download a Piper voice of your choice into repos/mindmirror/pipervoice/en/ (I used semaine)
  4. Edit integration.py to use Piper as the TTS system
  5. Run (mindmirror) repos/mindmirror$ python integration.py
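
The variable names in .env are whatever integration.py actually reads; as an illustration only (names are assumptions), the file looks something like:

    HF_API_KEY=hf_xxxxxxxxxxxxxxxxxxxx
    GEMINI_API_KEY=your-gemini-api-key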

For custom voice:

  1. Clone the F5-TTS repository and install it: (mindmirror) repos/F5-TTS$ pip install -e .
  2. Record a voice sample: (mindmirror) repos/mindmirror$ python record_sample.py
  3. Download the pretrained model "F5TTS_v1_Base" from Hugging Face
  4. Point PRETRAINED_VOCAB_PATH at the pretrained vocab in repos/F5-TTS/src/f5_tts/train/datasets/prepare_csv_wavs.py: PRETRAINED_VOCAB_PATH = files("f5_tts").joinpath("../../data/Emilia_ZH_EN_pinyin/vocab.txt")
  5. Prepare dataset: (mindmirror) repos/F5-TTS$ python src/f5_tts/train/datasets/prepare_csv_wavs.py ../mindmirror/data/MyVoice/ ./data/MyVoice_pinyin
  6. Set num_workers=os.cpu_count() in repos/F5-TTS/src/f5_tts/train/finetune_cli.py
  7. Train on top of the F5TTS v1 base model
    • (mindmirror) repos/F5-TTS$ python src/f5_tts/train/finetune_cli.py --exp_name F5TTS_v1_Base --dataset_name MyVoice --finetune --pretrain ckpts/F5TTS_v1_Base/model_1250000.safetensors --tokenizer pinyin --learning_rate 5e-5 --epochs 50 --batch_size_type sample --batch_size_per_gpu 1 --grad_accumulation_steps 4 --save_per_updates 5000 --keep_last_n_checkpoints 1
  8. Test loading the model: (mindmirror) repos/mindmirror$ python f5_tts/verify_model.py
  9. Test inference
    • (mindmirror) repos/F5-TTS$ python src/f5_tts/infer/infer_cli.py --model F5TTS_v1_Base --ckpt_file ckpts/MyVoice/model_last.pt --ref_audio ../mindmirror/voice_samples/wavs_clean/paragraph_01.wav --ref_text "The birch canoe slid on the smooth planks." --gen_text "This is a test. I am checking if my voice model is overtrained."
  10. Run (mindmirror) repos/mindmirror$ python integration.py
