Towards Achieving Robust Universal Neural Vocoding
A PyTorch implementation of "Towards Achieving Robust Universal Neural Vocoding".
Audio samples can be found here.
Fig 1: Architecture of the vocoder.
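As a rough orientation, the vocoder pairs a conditioning network over the log-Mel frames with an autoregressive RNN that predicts 10-bit quantized samples through a softmax. The sketch below only illustrates that general shape; the layer sizes, names, and wiring are illustrative assumptions, not the implementation in this repo.

```python
# A simplified sketch of an RNN vocoder conditioned on log-Mel spectrograms.
# Layer sizes and the exact wiring are illustrative assumptions, not the
# code in this repository.
import torch
import torch.nn as nn


class ToyVocoder(nn.Module):
    def __init__(self, n_mels=80, cond_dim=128, rnn_dim=896, n_classes=2 ** 10, hop_length=200):
        super().__init__()
        self.hop_length = hop_length
        # conditioning network: a bidirectional GRU over the Mel frames
        self.cond_rnn = nn.GRU(n_mels, cond_dim, batch_first=True, bidirectional=True)
        # embedding of the previous quantized sample (instead of a one-hot vector)
        self.embedding = nn.Embedding(n_classes, 256)
        # autoregressive RNN over samples, conditioned on upsampled Mel features
        self.rnn = nn.GRU(256 + 2 * cond_dim, rnn_dim, batch_first=True)
        self.proj = nn.Sequential(nn.Linear(rnn_dim, 256), nn.ReLU(), nn.Linear(256, n_classes))

    def forward(self, prev_samples, mels):
        # prev_samples: (batch, T) integer sample indices, mels: (batch, frames, n_mels)
        cond, _ = self.cond_rnn(mels)                          # (batch, frames, 2 * cond_dim)
        cond = cond.repeat_interleave(self.hop_length, dim=1)  # upsample to the sample rate
        cond = cond[:, : prev_samples.size(1)]                 # align with the waveform length
        x = torch.cat([self.embedding(prev_samples), cond], dim=-1)
        out, _ = self.rnn(x)
        return self.proj(out)                                  # logits over the 2**10 sample values
```

At inference time generation is sequential: each predicted sample is fed back in as the previous sample for the next step.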
Quick Start
Ensure you have Python 3.6 and PyTorch 1.7 or greater installed. Then install the package with:
```
pip install univoc
```
Example Usage
```python
import torch
import soundfile as sf
from univoc import Vocoder

# download pretrained weights (and optionally move to GPU)
vocoder = Vocoder.from_pretrained(
    "https://github.com/bshall/UniversalVocoding/releases/download/v0.2/univoc-ljspeech-7mtpaq.pt"
).cuda()

# load the log-Mel spectrogram to be vocoded
mel = ...

# generate the waveform
with torch.no_grad():
    wav, sr = vocoder.generate(mel)

# save the output
sf.write("path/to/save.wav", wav, sr)
```
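The `mel = ...` line is left to the user. As a rough illustration, a log-Mel spectrogram could be computed with librosa along the lines below; every feature parameter shown (sample rate, FFT size, hop length, number of Mel bands, normalization) and the final tensor shape are assumptions and must be replaced with the exact settings used by `preprocess.py` and the pretrained model.

```python
# A minimal sketch of preparing a log-Mel input for vocoder.generate().
# All feature parameters here are illustrative assumptions; use the exact
# values from preprocess.py / the model config for real inference.
import librosa
import numpy as np
import torch

wav, _ = librosa.load("path/to/utterance.wav", sr=16000)      # assumed 16 kHz audio
spec = librosa.feature.melspectrogram(
    y=wav, sr=16000, n_fft=2048, hop_length=200, n_mels=80    # assumed analysis parameters
)
logmel = librosa.power_to_db(spec, ref=np.max, top_db=80)     # dB scale, floored 80 dB below the peak
logmel = logmel / 80 + 1                                      # assumed normalization to roughly [0, 1]
mel = torch.from_numpy(logmel.T).unsqueeze(0).float().cuda()  # assumed shape (1, frames, n_mels); check what generate() expects
```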
Train from Scratch
- Clone the repo:
```
git clone https://github.com/bshall/UniversalVocoding
cd ./UniversalVocoding
```
- Install requirements:
```
pip install -r requirements.txt
```
- Download and extract the LJ-Speech dataset:
```
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xvjf LJSpeech-1.1.tar.bz2
```
- Download the train split here and extract it in the root directory of the repo.
- Extract Mel spectrograms and preprocess audio:
```
python preprocess.py in_dir=path/to/LJSpeech-1.1 out_dir=datasets/LJSpeech-1.1
```
- Train the model:
```
python train.py checkpoint_dir=ljspeech dataset_dir=datasets/LJSpeech-1.1
```
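Once training has produced a checkpoint in the `ljspeech` directory, it can be loaded back for generation. The snippet below is only a sketch: the checkpoint filename and the `"model"` state-dict key are assumptions about what `train.py` writes, so inspect the actual checkpoint and adjust accordingly.

```python
# A sketch of generating audio from a locally trained checkpoint.
# "ljspeech/model.pt" and the "model" key are hypothetical; inspect the files
# that train.py actually writes and adapt the paths/keys.
import torch
import soundfile as sf
from univoc import Vocoder

# build a Vocoder with the released architecture, then overwrite its weights
vocoder = Vocoder.from_pretrained(
    "https://github.com/bshall/UniversalVocoding/releases/download/v0.2/univoc-ljspeech-7mtpaq.pt"
).cuda()
checkpoint = torch.load("ljspeech/model.pt", map_location="cuda")  # hypothetical filename
vocoder.load_state_dict(checkpoint["model"])                       # assumed checkpoint key

mel = ...  # log-Mel spectrogram, as in the Example Usage section
with torch.no_grad():
    wav, sr = vocoder.generate(mel)
sf.write("generated.wav", wav, sr)
```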
Pretrained Models
Pretrained weights for the 10-bit LJ-Speech model are available here.
Notable Differences from the Paper
- Trained on 16kHz audio from a single speaker. For an older version trained on 102 different speakers from the ZeroSpeech 2019: TTS without T English dataset, click here.
- Uses an embedding layer instead of one-hot encoding (see the sketch below).
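A minimal sketch of what this difference means in PyTorch, using assumed sizes rather than the repo's actual configuration: with 10-bit quantized audio there are 1024 discrete sample values, and an `nn.Embedding` maps each previous-sample index to a dense learned vector instead of a sparse 1024-dimensional one-hot vector.

```python
# Contrast between one-hot conditioning and an embedding layer (sizes are
# illustrative assumptions, not the repo's exact values).
import torch
import torch.nn as nn
import torch.nn.functional as F

n_classes = 2 ** 10   # 10-bit quantized samples -> 1024 values
embedding_dim = 256   # illustrative embedding size

prev_samples = torch.randint(0, n_classes, (1, 100))  # (batch, time) of sample indices

# one-hot alternative: a large, sparse input vector per time step
one_hot = F.one_hot(prev_samples, n_classes).float()  # (1, 100, 1024)

# embedding layer: a dense, learned representation of the same indices
embed = nn.Embedding(n_classes, embedding_dim)
dense = embed(prev_samples)                           # (1, 100, 256)
```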
Acknowledgements