
Implementation of E2-TTS (Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS) with the MLX framework.
This implementation is based on the lucidrains implementation in PyTorch, which differs from the paper in that it uses a multi-stream transformer for text and audio, with conditioning applied at every transformer block.
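The per-block conditioning described above can be sketched in plain NumPy. This is a toy illustration, not the package's actual MLX code: `layer`, `w_proj`, and the additive injection are stand-ins for the real attention blocks.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w):
    """Stand-in for a transformer block: one nonlinear map plus a residual."""
    return x + np.tanh(x @ w)

# Two streams with their own widths; a projection maps text dim -> audio dim.
n, d_audio, d_text = 16, 8, 4
audio = rng.normal(size=(n, d_audio))
text = rng.normal(size=(n, d_text))
w_audio = rng.normal(size=(d_audio, d_audio)) * 0.1
w_text = rng.normal(size=(d_text, d_text)) * 0.1
w_proj = rng.normal(size=(d_text, d_audio)) * 0.1

depth = 3  # conditioning happens at every block, not just at the input
for _ in range(depth):
    text = layer(text, w_text)
    audio = layer(audio, w_audio) + text @ w_proj  # inject text into audio stream

print(audio.shape)  # (16, 8)
```

The point of the multi-stream design is visible even in this sketch: the text stream evolves independently, and the audio stream reads from it at every depth rather than only once at the input.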
pip install mlx-e2-tts
import mlx.core as mx
from e2_tts_mlx.model import E2TTS
from e2_tts_mlx.trainer import E2Trainer
from e2_tts_mlx.data import load_libritts_r
e2tts = E2TTS(
    tokenizer="char-utf8",  # or "phoneme_en"
    cond_drop_prob = 0.25,
    frac_lengths_mask = (0.7, 0.9),
    transformer = dict(
        dim = 1024,
        depth = 24,
        heads = 16,
        text_depth = 12,
        text_heads = 8,
        text_ff_mult = 4,
        max_seq_len = 4096,
        dropout = 0.1
    )
)
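One plausible reading of `frac_lengths_mask = (0.7, 0.9)` is the paper's infilling objective: during training, a contiguous span covering a random 70–90% of the frames is masked and the model learns to fill it in from the surrounding audio and the text. A NumPy sketch of that idea (the function name and the exact span-sampling logic are assumptions, not the package's implementation):

```python
import numpy as np

def frac_lengths_mask(seq_len, frac_range=(0.7, 0.9), rng=None):
    """Boolean mask over one contiguous span whose length is a random
    fraction (within frac_range) of the sequence."""
    if rng is None:
        rng = np.random.default_rng()
    frac = rng.uniform(*frac_range)
    span = int(seq_len * frac)
    start = rng.integers(0, seq_len - span + 1)
    mask = np.zeros(seq_len, dtype=bool)
    mask[start:start + span] = True
    return mask

m = frac_lengths_mask(100, rng=np.random.default_rng(0))
print(m.sum())  # somewhere between 70 and 90 frames masked
```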
mx.eval(e2tts.parameters())
batch_size = 128
max_duration = 30
dataset = load_libritts_r(split="dev-clean") # or any audio/caption dataset
trainer = E2Trainer(model = e2tts, num_warmup_steps = 1000)
trainer.train(
    train_dataset = ...,
    learning_rate = 7.5e-5,
    batch_size = batch_size,
    total_steps = 1_000_000
)
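`num_warmup_steps = 1000` implies a warmup schedule; a common choice is a linear ramp up to the base learning rate. A minimal sketch assuming linear warmup with no decay (the trainer's actual schedule may differ):

```python
def warmup_lr(step, base_lr=7.5e-5, num_warmup_steps=1000):
    """Linear warmup: ramp from 0 to base_lr over the warmup window,
    then hold. (A real trainer may decay afterwards; this sketch does not.)"""
    if step < num_warmup_steps:
        return base_lr * step / num_warmup_steps
    return base_lr

print(warmup_lr(500))   # 3.75e-05
print(warmup_lr(2000))  # 7.5e-05
```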
# ... after much training ...
cond = ...
text = ...
duration = ... # from a trained DurationPredictor or otherwise
generated_audio = e2tts.sample(
    cond = cond,
    text = text,
    duration = duration,
    steps = 32,
    cfg_strength = 1.0,  # if trained for cfg
    use_vocos = True  # set to False to get mel spectrograms instead of audio
)
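`cfg_strength` refers to classifier-free guidance: because `cond_drop_prob = 0.25` drops the conditioning on a quarter of training steps, the model can produce both conditional and unconditional predictions at sampling time, and the two are combined. A sketch of the standard CFG formula (the exact combination inside `sample` may differ):

```python
import numpy as np

def cfg_combine(pred_cond, pred_uncond, cfg_strength=1.0):
    """Classifier-free guidance: push the conditional prediction away
    from the unconditional one by cfg_strength."""
    return pred_cond + cfg_strength * (pred_cond - pred_uncond)

# Toy vectors standing in for model outputs at one sampling step.
guided = cfg_combine(np.array([1.0, 2.0]), np.array([0.5, 1.0]), cfg_strength=1.0)
print(guided)  # [1.5 3. ]
```

With `cfg_strength = 0` this reduces to the plain conditional prediction; larger values trade diversity for stronger adherence to the text.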
Note that the model size specified above (from the paper) is very large. See train_example.py for a more practically sized model you can train on your local device.
Thanks to lucidrains for the original implementation in PyTorch.
@inproceedings{Eskimez2024E2TE,
title = {E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS},
author = {Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda},
year = {2024},
url = {https://api.semanticscholar.org/CorpusID:270738197}
}
@article{Burtsev2021MultiStreamT,
title = {Multi-Stream Transformers},
author = {Mikhail S. Burtsev and Anna Rumshisky},
journal = {ArXiv},
year = {2021},
volume = {abs/2107.10342},
url = {https://api.semanticscholar.org/CorpusID:236171087}
}
The code in this repository is released under the MIT license as found in the LICENSE file.