Text-to-Speech for Ukrainian

High-fidelity speech synthesis for Ukrainian using modern neural networks.

Demo

Try the demo on Hugging Face Spaces, or listen to the audio samples.
Features
- Multi-speaker model: 2 female (Tetiana, Lada) + 1 male (Mykyta) voices;
- Fine-grained control over speech parameters, including duration, fundamental frequency (F0), and energy;
- High-fidelity speech generation using the RAD-TTS++ acoustic model;
- Fast vocoding using Vocos;
- Synthesizes long sentences effectively;
- Supports a sampling rate of 44.1 kHz;
- Tested on Linux environments and Windows/WSL;
- Python API (requires Python 3.9 or later);
- CUDA-enabled for GPU acceleration.
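Before installing, it can help to confirm that the interpreter meets the Python 3.9 requirement and whether a CUDA device is visible. The snippet below is a minimal sketch, not part of the package: `check_environment` is a hypothetical helper, and it assumes that GPU availability is detected through PyTorch's `torch.cuda.is_available()`.

```python
import importlib.util
import sys

# Minimum Python version stated in the feature list.
MIN_PYTHON = (3, 9)


def check_environment():
    """Return a small report about the Python version and CUDA visibility.

    Hypothetical helper for illustration; it is not shipped with tts-uk.
    """
    report = {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        # Stays None when PyTorch is not installed, so the check cannot run.
        "cuda_available": None,
    }
    if report["torch_installed"]:
        import torch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `cuda_available` value of `False` or `None` only means that this check found no GPU. It says nothing about whether synthesis works on CPU.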
Installation
# Install from PyPI
pip install tts-uk
# OR, for the latest development version:
pip install git+https://github.com/egorsmkv/tts_uk
# OR, use git and local setup
git clone https://github.com/egorsmkv/tts_uk
cd tts_uk
uv sync # uv will handle the virtual environment
If uv is not installed yet, see the installation section of its documentation.
Alternatively, download the repository as a ZIP archive.
Getting started
Code example:
import torchaudio

from tts_uk.inference import synthesis

# The model produces 44.1 kHz audio.
sampling_rate = 44_100

mels, wave, stats = synthesis(
    text="Ви можете протестувати синтез мовлення українською мовою. Просто введіть текст, який ви хочете прослухати.",
    voice="tetiana",
    n_takes=1,
    use_latest_take=False,
    token_dur_scaling=1,
    f0_mean=0,
    f0_std=0,
    energy_mean=0,
    energy_std=0,
    sigma_decoder=0.8,
    sigma_token_duration=0.666,
    sigma_f0=1,
    sigma_energy=1,
)

print(stats)

# Move the waveform to the CPU and save it as a signed-integer PCM WAV file.
torchaudio.save("audio.wav", wave.cpu(), sampling_rate, encoding="PCM_S")
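To synthesize the same text with each voice, the keyword arguments from the example above can be generated in a loop. The following is a minimal sketch: `build_requests` is a hypothetical helper, the defaults are copied from the example, and the identifiers `"lada"` and `"mykyta"` are assumed by analogy with `"tetiana"` (only `"tetiana"` appears in the example).

```python
# Default values copied from the synthesis example above.
DEFAULT_PARAMS = {
    "n_takes": 1,
    "use_latest_take": False,
    "token_dur_scaling": 1,
    "f0_mean": 0,
    "f0_std": 0,
    "energy_mean": 0,
    "energy_std": 0,
    "sigma_decoder": 0.8,
    "sigma_token_duration": 0.666,
    "sigma_f0": 1,
    "sigma_energy": 1,
}

# "tetiana" is taken from the example; the other two names are assumptions.
VOICES = ("tetiana", "lada", "mykyta")


def build_requests(text, voices=VOICES, **overrides):
    """Build one keyword-argument dict per voice (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    return [
        {"text": text, "voice": voice, **DEFAULT_PARAMS, **overrides}
        for voice in voices
    ]


if __name__ == "__main__":
    for kwargs in build_requests("Привіт!"):
        print(kwargs["voice"], kwargs["sigma_decoder"])
```

Each dict can then be passed on as `synthesis(**kwargs)`, with the resulting waveform saved under a per-voice file name.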
You can also try the project's Google Colab notebooks.
Or run synthesis in a terminal:
uv run example.py
If you need to synthesize long texts such as articles, we recommend splitting them into sentences first, for example with wtpsplit.
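The idea of splitting an article before synthesis can be sketched as follows. This sketch makes several assumptions: `naive_split` is a regex-based stand-in for a real sentence segmenter such as wtpsplit (it does not use the wtpsplit API), `chunk_sentences` is a hypothetical helper, and the `max_chars` limit is an arbitrary illustrative value, not a documented model constraint.

```python
import re


def naive_split(text):
    """Split text after sentence-final punctuation.

    Stand-in for a proper segmenter; it will mis-split abbreviations.
    """
    parts = re.split(r"(?<=[.!?…])\s+", text.strip())
    return [part for part in parts if part]


def chunk_sentences(sentences, max_chars=300):
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    article = "Перше речення. Друге речення! Третє?"
    for chunk in chunk_sentences(naive_split(article), max_chars=25):
        print(chunk)
```

Each chunk can then be synthesized separately and the resulting waveforms concatenated in order.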
Get help and support
If you have questions or run into problems, please open an issue in the repository's Issues section.
License
The code is released under the MIT license.

Model authors
Acoustic
Vocoder

Also, follow our Speech-UK initiative on Hugging Face!
Acknowledgements