Automatic data processing - Upload audio files and get training-ready datasets. Segmentation and transcription are automatic, using Silero VAD and Whisper.
Voice training and generation - Train and run text-to-speech models. Currently supports VibeVoice.
OpenVoiceLab is a web interface for working with VibeVoice, a text-to-speech model. It handles the full workflow: prepare data, train on custom voices, and generate speech.
Instead of running Python scripts and juggling config files, you get a Gradio interface with tabs for each step. It's designed to make TTS finetuning accessible to people who aren't ML researchers.
Prepare training data - Upload audio files (podcasts, recordings, audiobooks) and automatically convert them into datasets. Handles segmentation and transcription.
Finetune models - Train VibeVoice on custom voices using LoRA. The interface handles the training process and lets you monitor progress.
Generate speech - Convert text to audio using pretrained models or your finetuned voices. Adjust settings like CFG scale and reference voices.
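Before cloning and running the setup script, it can help to confirm the basic tools are on your PATH. This is only a sketch: it assumes `git` and a `python3` interpreter are what `scripts/setup.sh` needs, which may not match the script's exact requirements.

```shell
# Assumed prerequisites: git to clone the repo, python3 for setup.sh
# to build its environment. Each line prints "ok" if the tool is found.
command -v git >/dev/null && echo "git: ok"
command -v python3 >/dev/null && echo "python3: ok"
```

If either check prints nothing, install the missing tool before running the setup script.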
git clone https://github.com/fakerybakery/openvoicelab.git
cd openvoicelab
./scripts/setup.sh # or setup.bat on Windows
./scripts/run.sh # or run.bat on Windows

Open http://localhost:7860 in your browser.
Network Access
To access the interface from other devices on your network, add --host 0.0.0.0; to get a temporary public link instead, use --share:
./scripts/run.sh --host 0.0.0.0 # or run.bat --host 0.0.0.0 on Windows
./scripts/run.sh --share # or run.bat --share on Windows

Note: TensorBoard will not work from another device.
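With --host 0.0.0.0 the server listens on all interfaces, so other devices connect to this machine's LAN address on port 7860. One way to find that address is the classic UDP-socket trick shown below (a sketch, not part of OpenVoiceLab; no packets are actually sent, and it falls back to loopback if no route exists):

```shell
# Print the LAN IP other devices should use, e.g. http://192.168.1.20:7860
python3 - <<'EOF'
import socket
try:
    # Connecting a UDP socket picks a source address without sending data.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect(("8.8.8.8", 80))
        print(s.getsockname()[0])
except OSError:
    print("127.0.0.1")  # no outbound route; only this machine can connect
EOF
```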
See the installation guide for details.
OpenVoiceLab is in beta. The core features work, but some things are still rough: