OpenVoiceLab: a web interface for TTS finetuning

Train and run text-to-speech models. Currently supports VibeVoice.

What is this?

OpenVoiceLab is a web interface for working with VibeVoice, a text-to-speech model. It handles the full workflow: prepare data, train on custom voices, and generate speech.

Instead of running Python scripts and juggling config files, you get a Gradio interface with tabs for each step. It's designed to make TTS finetuning accessible to people who aren't ML researchers.

What you can do

Prepare training data - Upload audio files (podcasts, recordings, audiobooks) and automatically convert them into datasets. Handles segmentation and transcription.

Finetune models - Train VibeVoice on custom voices using LoRA. The interface handles the training process and lets you monitor progress.

Generate speech - Convert text to audio using pretrained models or your finetuned voices. Adjust settings like CFG scale and reference voices.

Requirements

  • Python 3.9 or newer
  • For training: NVIDIA GPU with 16+ GB VRAM (RTX 3090/4090 or similar)
  • For inference: 8+ GB VRAM or CPU (slower)
  • Works on Linux, macOS, and Windows
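
Before installing, you can check the basics from a terminal. This is a convenience sketch, not a script shipped with OpenVoiceLab:

```shell
#!/usr/bin/env sh
# Check the Python version meets the 3.9+ requirement.
python3 -c 'import sys; assert sys.version_info >= (3, 9), sys.version' \
  && echo "Python OK: $(python3 --version)"

# If an NVIDIA GPU is present, report its VRAM (training wants 16+ GB).
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
  echo "No NVIDIA GPU detected: CPU inference only (slower)."
fi
```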

Quick start

bash
git clone https://github.com/fakerybakery/openvoicelab.git
cd openvoicelab
./scripts/setup.sh  # or setup.bat on Windows
./scripts/run.sh    # or run.bat on Windows

Open http://localhost:7860 in your browser.
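
If the browser shows nothing, a quick way to check whether the server is answering (assuming curl is installed):

```shell
# Sanity check: the Gradio server should respond on port 7860.
curl -sf http://localhost:7860 >/dev/null \
  && echo "OpenVoiceLab is up" \
  || echo "Server not responding yet"
```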

Network Access

To access from other devices on your network, add --host 0.0.0.0 or --share for a public link:

bash
./scripts/run.sh --host 0.0.0.0  # or run.bat --host 0.0.0.0 on Windows
./scripts/run.sh --share          # or run.bat --share on Windows
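
With `--host 0.0.0.0`, other devices connect to this machine's LAN address on port 7860. One portable way to find that address (a sketch; it assumes a default route exists, and sends no actual traffic):

```shell
python3 - <<'PY'
import socket

# UDP "connect" only picks a local interface via the routing table;
# no packets are sent.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(("8.8.8.8", 80))
print("Open http://%s:7860 from the other device" % s.getsockname()[0])
s.close()
PY
```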

Note: the TensorBoard view only works on the machine running the server; it will not load from another device.

See the installation guide for details.

Current status

OpenVoiceLab is in beta. The core features work, but some things are still rough:

  • Data chunking could be smarter (support for longer segments is in the works)
  • Training results can vary depending on your data quality
  • Documentation is being improved

Feedback and contributions welcome on GitHub or Discord.

Released under the BSD-3-Clause License.