
Security News
NIST Under Federal Audit for NVD Processing Backlog and Delays
As vulnerability data bottlenecks grow, the federal government is formally investigating NIST’s handling of the National Vulnerability Database.
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers.
Our model enables highly natural speech generation from text prompts when given a speaker embedding or audio prefix, and can accurately perform speech cloning when given a reference clip spanning just a few seconds. The conditioning setup also allows for fine control over speaking rate, pitch variation, audio quality, and emotions such as happiness, fear, sadness, and anger. The model outputs speech natively at 44kHz.
Zonos follows a straightforward architecture: text normalization and phonemization via eSpeak, followed by DAC token prediction through a transformer or hybrid backbone. An overview of the architecture can be seen below.
import torch
import torchaudio
from zonos.model import Zonos
from zonos.conditioning import make_cond_dict
# model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-hybrid", device="cuda")
model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-transformer", device="cuda")
wav, sampling_rate = torchaudio.load("assets/exampleaudio.mp3")
speaker = model.make_speaker_embedding(wav, sampling_rate)
cond_dict = make_cond_dict(text="Hello, world!", speaker=speaker, language="en-us")
conditioning = model.prepare_conditioning(cond_dict)
codes = model.generate(conditioning)
wavs = model.autoencoder.decode(codes).cpu()
torchaudio.save("sample.wav", wavs[0], model.autoencoder.sampling_rate)
uv run gradio_interface.py
# python gradio_interface.py
This should produce a sample.wav
file in your project root directory.
For repeated sampling we highly recommend using the gradio interface instead, as the minimal example needs to load the model every time it is run.
At the moment this repository only supports Linux systems (preferably Ubuntu 22.04/24.04) with recent NVIDIA GPUs (3000-series or newer, 6GB+ VRAM).
See also Docker Installation
Zonos depends on the eSpeak library phonemization. You can install it on Ubuntu with the following command:
apt install -y espeak-ng
We highly recommend using a recent version of uv for installation. If you don't have uv installed, you can install it via pip: pip install -U uv
.
uv sync
uv sync --extra compile
uv pip install -e .
uv pip install -e .[compile]
pip install -e .
pip install --no-build-isolation -e .[compile]
For convenience we provide a minimal example to check that the installation works:
uv run sample.py
# python sample.py
git clone https://github.com/Zyphra/Zonos.git
cd Zonos
# For gradio
docker compose up
# Or for development you can do
docker build -t zonos .
docker run -it --gpus=all --net=host -v /path/to/Zonos:/Zonos -t zonos
cd /Zonos
python sample.py # this will generate a sample.wav in /Zonos
FAQs
Text-to-speech
We found that zonos demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
As vulnerability data bottlenecks grow, the federal government is formally investigating NIST’s handling of the National Vulnerability Database.
Research
Security News
Socket’s Threat Research Team has uncovered 60 npm packages using post-install scripts to silently exfiltrate hostnames, IP addresses, DNS servers, and user directories to a Discord-controlled endpoint.
Security News
TypeScript Native Previews offers a 10x faster Go-based compiler, now available on npm for public testing with early editor and language support.