Automatic data processing - Upload audio files and get training-ready datasets. Segmentation and transcription are automatic, using Silero VAD and Whisper.
Voice training and generation - Train and run text-to-speech models. Currently supports VibeVoice.
OpenVoiceLab is a web interface for working with VibeVoice, a text-to-speech model. It handles the full workflow: prepare data, train on custom voices, and generate speech.
Instead of running Python scripts and juggling config files, you get a Gradio interface with tabs for each step. It's designed to make TTS finetuning accessible to people who aren't ML researchers.
Prepare training data - Upload audio files (podcasts, recordings, audiobooks) and automatically convert them into datasets. Handles segmentation and transcription.
Finetune models - Train VibeVoice on custom voices using LoRA. The interface handles the training process and lets you monitor progress.
Generate speech - Convert text to audio using pretrained models or your finetuned voices. Adjust settings like CFG scale and reference voices.
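Before cloning and running the setup script, it can help to confirm the basic tools are on your PATH. This is only a sketch: it assumes `git` and a `python3` interpreter are what `scripts/setup.sh` needs, which may not match the script's exact requirements.

```shell
# Assumed prerequisites: git to clone the repo, python3 for setup.sh
# to build its environment. Each line prints "ok" if the tool is found.
command -v git >/dev/null && echo "git: ok"
command -v python3 >/dev/null && echo "python3: ok"
```

If either check prints nothing, install the missing tool before running the setup script.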
git clone https://github.com/fakerybakery/openvoicelab.git
cd openvoicelab
./scripts/setup.sh # or setup.bat on Windows
./scripts/run.sh # or run.bat on Windows

Open http://localhost:7860 in your browser.
Network Access
To access the interface from other devices on your network, add --host 0.0.0.0; to get a temporary public link instead, use --share:
./scripts/run.sh --host 0.0.0.0 # or run.bat --host 0.0.0.0 on Windows
./scripts/run.sh --share # or run.bat --share on Windows

Note: TensorBoard will not work from another device.
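With --host 0.0.0.0 the server listens on all interfaces, so other devices connect to this machine's LAN address on port 7860. One way to find that address is the classic UDP-socket trick shown below (a sketch, not part of OpenVoiceLab; no packets are actually sent, and it falls back to loopback if no route exists):

```shell
# Print the LAN IP other devices should use, e.g. http://192.168.1.20:7860
python3 - <<'EOF'
import socket
try:
    # Connecting a UDP socket picks a source address without sending data.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect(("8.8.8.8", 80))
        print(s.getsockname()[0])
except OSError:
    print("127.0.0.1")  # no outbound route; only this machine can connect
EOF
```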
See the installation guide for details.
OpenVoiceLab is in beta. The core features work, but some things are still rough: