Troubleshooting

Common issues and how to fix them.

Installation Issues

Python not found

Error: python3: command not found

Install Python 3.9+ from python.org or your package manager.

Check version:

bash
python3 --version

Setup script fails

On Linux/macOS:

bash
chmod +x scripts/setup.sh
./scripts/setup.sh

On Windows: Run scripts\setup.bat from Command Prompt, not PowerShell.

PyTorch installation fails

Install manually:

bash
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install torch torchvision torchaudio

For CUDA support:

bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Port 7860 already in use

Use a different port:

bash
python -m ovl.cli --port 8080
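To find out whether a port is actually taken before picking another one, a small Python sketch can try to bind to it. The helper name `port_can_be_bound` is illustrative and not part of OpenVoiceLab.

```python
import socket


def port_can_be_bound(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if host:port can be bound, i.e. no other program holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    # 7860 is the default port named above; 8080 is the suggested alternative.
    for candidate in (7860, 8080):
        state = "free" if port_can_be_bound(candidate) else "in use"
        print(f"Port {candidate}: {state}")
```

If both ports report "in use", pass any other free port number to --port.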

Model Loading Issues

Out of memory when loading model

Solutions:

  • Close other GPU applications
  • Use CPU device (slower): select "cpu" in device dropdown
  • Use smaller model (1.5B instead of 7B)
  • Restart computer to clear GPU memory

Model download timeout

When downloading for the first time:

  • The model is 3-6 GB and takes 5-20 minutes to download, depending on internet speed
  • Let it finish completely
  • If interrupted, delete partially downloaded files and retry

Find downloaded models:

bash
ls ~/.cache/huggingface/hub/
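To spot a partial download, it can help to see how much space each cached model takes. The following is a minimal sketch that assumes the default cache location shown above; the function name is illustrative. Symlinks are skipped so that files are not counted twice.

```python
from pathlib import Path


def cached_model_sizes(hub_dir: Path) -> dict[str, float]:
    """Map each folder in the cache to its size in GB, ignoring symlinks."""
    sizes = {}
    if not hub_dir.is_dir():
        return sizes
    for repo in sorted(hub_dir.iterdir()):
        if not repo.is_dir():
            continue
        total_bytes = sum(
            f.stat().st_size
            for f in repo.rglob("*")
            if f.is_file() and not f.is_symlink()
        )
        sizes[repo.name] = total_bytes / 1e9
    return sizes


if __name__ == "__main__":
    # Default location; it differs if the HF_HOME environment variable is set.
    hub = Path.home() / ".cache" / "huggingface" / "hub"
    for name, gb in cached_model_sizes(hub).items():
        print(f"{gb:6.2f} GB  {name}")
```

A folder far smaller than the expected 3-6 GB is a likely candidate for deleting and retrying.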

CUDA not available

Check if GPU is detected:

bash
python -c "import torch; print(torch.cuda.is_available())"

If False:

  • Update NVIDIA drivers
  • Reinstall PyTorch with CUDA support
  • Check if GPU is being used by another process
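To narrow down which of these applies, a slightly longer check can report whether the installed PyTorch build includes CUDA at all. This is a sketch using standard PyTorch attributes; the function name is illustrative.

```python
def torch_cuda_report() -> dict:
    """Collect basic facts about the installed PyTorch build and visible GPUs."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means the build has no CUDA support compiled in.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "gpus": [],
    }
    if report["cuda_available"]:
        report["gpus"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda is None, reinstalling PyTorch with CUDA support is the likely fix. If it shows a version but cuda_available is False, the driver or another process is the more likely cause.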

LoRA adapter won't load

Check path:

  • Make sure path points to checkpoints/ folder
  • Example: training_runs/run_20251008_143022/checkpoints/
  • Don't include a specific file, just the folder

Verify checkpoint exists:

bash
ls training_runs/run_TIMESTAMP/checkpoints/

Should contain adapter_model.bin or similar files.
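The same checks can be scripted. The sketch below assumes adapter weight files are named adapter_model.<extension>, which is a guess based on the adapter_model.bin example above; the function name is illustrative.

```python
from pathlib import Path


def find_adapter_files(checkpoint_dir: str) -> list[str]:
    """Return adapter weight file names in a folder, or raise a clear error."""
    folder = Path(checkpoint_dir)
    if folder.is_file():
        raise ValueError(f"{folder} is a file; pass the checkpoints/ folder instead")
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder} does not exist")
    # Assumption: weight files follow the adapter_model.<ext> naming pattern.
    return sorted(p.name for p in folder.glob("adapter_model.*"))


if __name__ == "__main__":
    try:
        # Replace run_TIMESTAMP with the name of your actual run folder.
        print(find_adapter_files("training_runs/run_TIMESTAMP/checkpoints"))
    except (FileNotFoundError, ValueError) as err:
        print(f"Problem with adapter path: {err}")
```

An empty list means the folder exists but holds no adapter weights, so training may not have saved a checkpoint yet.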

Data Processing Issues

No audio segments found after VAD

Causes:

  • Audio is too quiet
  • Audio is too noisy
  • No actual speech in audio

Solutions:

  • Use cleaner audio
  • Boost audio volume before processing
  • Try different audio files

Transcriptions are wrong

Solutions:

  • Use a larger Whisper model (medium or large)
  • Check audio quality
  • Verify audio language matches model
  • Some errors are okay - model can handle minor transcription mistakes

Processing is very slow

This is normal on CPU. Whisper is slow without a GPU.

Solutions:

  • Use smaller Whisper model (tiny or base)
  • Enable GPU if available
  • Process fewer files at once
  • Be patient - it's a one-time process

Out of memory during data processing

Solutions:

  • Close other applications
  • Use smaller Whisper model
  • Process files in batches (multiple datasets)

Training Issues

Out of memory during training

Solutions:

  • Reduce batch size (try 2 or 1)
  • Reduce LoRA rank (try 4)
  • Close other GPU applications
  • Use 1.5B model instead of 7B
  • Restart computer

Check VRAM usage:

bash
nvidia-smi  # on NVIDIA GPUs
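For a compact used/total readout while adjusting batch size, nvidia-smi can be queried from Python. This is a sketch built on the tool's --query-gpu option; the function names are illustrative.

```python
import subprocess


def parse_gpu_memory(csv_text: str) -> list[tuple[int, int]]:
    """Parse 'used, total' MiB pairs, one line per GPU."""
    pairs = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        pairs.append((used, total))
    return pairs


def query_gpu_memory() -> list[tuple[int, int]]:
    """Ask nvidia-smi for used and total memory in MiB for every GPU."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(output)


if __name__ == "__main__":
    try:
        for index, (used, total) in enumerate(query_gpu_memory()):
            print(f"GPU {index}: {used} / {total} MiB ({used / total:.0%} used)")
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError) as err:
        print(f"Could not read GPU memory from nvidia-smi: {err}")
```

Running it before and after lowering the batch size or LoRA rank shows how much VRAM each change frees.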

Training loss not decreasing

Check:

  • Dataset quality (listen to samples)
  • Transcription accuracy (check metadata.csv)

Try:

  • More epochs (5-7 instead of 3)
  • Different learning rate (try 1.5e-4)
  • Better quality data

Training crashes

Check logs:

bash
cat training_runs/run_TIMESTAMP/train.log

Common causes:

  • Out of memory - reduce batch size
  • Corrupted dataset - verify files
  • Disk full - free up space
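Long logs can be scanned for the first two causes automatically. The message fragments below are assumptions about what typically appears in such logs, not strings confirmed for OpenVoiceLab; a corrupted dataset has no single telltale message and still needs a manual check.

```python
from pathlib import Path

# Assumed message fragments; adjust them to what your logs actually contain.
KNOWN_CAUSES = {
    "out of memory": "Out of memory: reduce batch size",
    "no space left on device": "Disk full: free up space",
}


def diagnose(log_lines) -> list[str]:
    """Return a hint for each known error fragment found in the log lines."""
    hints = []
    for line in log_lines:
        lowered = line.lower()
        for fragment, hint in KNOWN_CAUSES.items():
            if fragment in lowered and hint not in hints:
                hints.append(hint)
    return hints


if __name__ == "__main__":
    # Replace run_TIMESTAMP with the name of your actual run folder.
    log_path = Path("training_runs/run_TIMESTAMP/train.log")
    if log_path.is_file():
        lines = log_path.read_text(errors="replace").splitlines()
        print(diagnose(lines) or "No known cause found; read the log manually.")
    else:
        print(f"{log_path} not found")
```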

TensorBoard won't load

Wait 30-60 seconds after training starts, then click Refresh.

Check if running:

bash
ps aux | grep tensorboard

Manual start:

bash
tensorboard --logdir training_runs/run_TIMESTAMP/logs --port 6006

Training is very slow

Normal: Training takes 2-6 hours for typical datasets.

ETA in logs:

Step 100/1500 | 2.3s/it

Calculate: (1500 - 100) * 2.3 seconds = 3220 seconds, or about 54 minutes remaining
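The same estimate as a small Python sketch, using the numbers from the example log line. The function name is illustrative.

```python
from datetime import timedelta


def remaining_time(current_step: int, total_steps: int, seconds_per_it: float) -> timedelta:
    """Estimate time left as (total - current) * seconds per iteration."""
    seconds = (total_steps - current_step) * seconds_per_it
    return timedelta(seconds=round(seconds))


if __name__ == "__main__":
    # Values taken from the log line "Step 100/1500 | 2.3s/it"
    print(remaining_time(100, 1500, 2.3))  # 0:53:40
```

The estimate assumes the speed per step stays constant, so treat it as approximate.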

Speed depends on:

  • Dataset size
  • Batch size
  • GPU speed
  • Model size

Generation Issues

Generated audio sounds robotic

Try:

  • Lower CFG scale (1.0-1.2)
  • Different reference voice
  • Model may be overtrained - reduce epochs next time

Audio doesn't match expected voice

Check:

  • LoRA adapter loaded correctly
  • Using appropriate reference voice
  • Try higher CFG scale (1.5-1.8)

Random background music appears

This is normal VibeVoice behavior. The model was trained on data with background sounds.

Mitigate:

  • Use different reference voice (cleaner samples)
  • Regenerate (results vary)
  • Try different text

Generation is too slow

An RTF (real-time factor) above 1.0, meaning generation takes longer than the audio it produces, is normal for large models on consumer hardware.

Speed up:

  • Use GPU instead of CPU
  • Use 1.5B instead of 7B
  • Shorten text
  • Close other applications
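To compare settings, RTF can be computed by hand. This sketch assumes the common definition, generation time divided by the duration of the generated audio; how the application itself measures RTF is not stated in this guide.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return generation_seconds / audio_seconds


if __name__ == "__main__":
    # Example: 45 seconds of compute for a 15-second clip.
    print(real_time_factor(45.0, 15.0))  # 3.0
```

A value of 3.0 means each second of audio takes three seconds to generate. Lower is faster.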

Audio has artifacts or glitches

Try:

  • Regenerate (may be random)
  • Different reference voice
  • Different text
  • Check if system is overloaded

General Issues

Interface won't load

Check if running:

bash
ps aux | grep "python -m ovl.cli"

Restart:

bash
./scripts/run.sh  # or run.bat on Windows

Check browser URL:

http://localhost:7860

Network Access

To access from other devices on your network, add --host 0.0.0.0 or --share:

bash
./scripts/run.sh --host 0.0.0.0  # or run.bat --host 0.0.0.0 on Windows
./scripts/run.sh --share          # or run.bat --share on Windows

Note: TensorBoard will not work from another device.

Changes not appearing

Refresh browser: Ctrl+R or Cmd+R

Hard refresh (bypasses the browser cache): Ctrl+Shift+R or Cmd+Shift+R

Gradio connection lost

Long-running operations may time out. Check the logs to see whether the process is still running.

Refresh the page. Training and processing continue in the background.

Disk full

Check space:

bash
df -h  # Linux/macOS
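df is not available in Windows Command Prompt. A short Python sketch using the standard library's shutil.disk_usage reports free space on any platform; the function name is illustrative.

```python
import shutil


def free_space_gb(path: str = ".") -> float:
    """Return free disk space, in GB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    # Run from the project folder to check the drive that holds training_runs/.
    print(f"Free space: {free_space_gb():.1f} GB")
```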

Clean up:

bash
# Delete old training runs
rm -rf training_runs/run_OLD_TIMESTAMP

# Delete old outputs
rm outputs/generated_OLD_*.wav

Python version issues

Check version:

bash
python3 --version

OpenVoiceLab needs Python 3.9 or newer. If your version is older, update Python.
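The check can also be run from inside Python, which confirms the version of the interpreter that is actually in use (for example the one in the virtual environment). A minimal sketch with an illustrative function name:

```python
import sys

MINIMUM = (3, 9)  # minimum version stated in this guide


def python_is_supported(version=sys.version_info) -> bool:
    """Return True if the given version tuple is at least MINIMUM."""
    return tuple(version[:2]) >= MINIMUM


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "OK" if python_is_supported() else "too old, need 3.9+"
    print(f"Python {current}: {verdict}")
```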

Platform-Specific Issues

macOS: MPS not available

Apple Silicon Macs should detect MPS automatically.

Check:

bash
python -c "import torch; print(torch.backends.mps.is_available())"

If False, update PyTorch:

bash
pip install --upgrade torch

Windows: Scripts won't run

Use Command Prompt, not PowerShell.

Or run Python directly:

cmd
python -m ovl.cli

Linux: Permission denied

Make scripts executable:

bash
chmod +x scripts/setup.sh scripts/run.sh

Getting More Help

Check logs:

bash
# Application log
cat logs/openvoicelab.log

# Training log
cat training_runs/run_TIMESTAMP/train.log

Error messages usually indicate the problem. Search for the error online or ask on Discord.

When asking the community for help, provide:

  • OpenVoiceLab version
  • Python version (python3 --version)
  • OS and version
  • GPU model (if applicable)
  • Error message or logs
  • Steps to reproduce
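Part of this list can be collected automatically. The sketch below gathers Python, OS, and (if PyTorch is installed) GPU details; the function name is illustrative. The OpenVoiceLab version, the error message, and the steps to reproduce still need to be added by hand.

```python
import platform
import sys


def environment_report() -> str:
    """Build a short text block with Python, OS, and GPU details."""
    lines = [
        f"Python: {platform.python_version()} ({sys.executable})",
        f"OS: {platform.system()} {platform.release()}",
        f"Machine: {platform.machine()}",
    ]
    try:
        import torch

        lines.append(f"PyTorch: {torch.__version__}")
        if torch.cuda.is_available():
            lines.append(f"GPU: {torch.cuda.get_device_name(0)}")
        else:
            lines.append("GPU: no CUDA device visible to PyTorch")
    except ImportError:
        lines.append("PyTorch: not installed")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Paste the output into your help request along with the relevant log excerpt.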

Still Having Issues?

If nothing here helps:

  1. Check FAQ for common questions
  2. Search GitHub issues
  3. Ask on Discord
  4. Open a new GitHub issue with details

Remember: OpenVoiceLab is in beta. Some rough edges are expected. Your feedback helps improve it.

Released under the BSD-3-Clause License.