Vocos — MLX
An implementation of Vocos in the MLX framework. Vocos reconstructs high-quality audio from mel spectrograms or EnCodec tokens.
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Paper: arXiv:2306.00814
Installation
To use Vocos in inference mode, install it using:
pip install vocos-mlx
Usage
Mel Spectrogram
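from vocos_mlx import Vocos, load_audio, log_mel_spectrogram

# Load the pretrained mel-spectrogram model
vocos = Vocos.from_pretrained("lucasnewman/vocos-mel-24khz")

# Load the input audio at 24 kHz
audio = load_audio("audio.wav", 24_000)

# Reconstruct the audio directly from the waveform
reconstructed_audio = vocos(audio)

# Or compute the log-mel spectrogram yourself and decode it
mel_spec = log_mel_spectrogram(audio, n_mels=100)
decoded_audio = vocos.decode(mel_spec)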
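The example above stops at `decoded_audio`. A minimal sketch of writing a decoded waveform to disk follows, using only the Python standard library. `write_wav` is a hypothetical helper, not part of `vocos_mlx`, and it assumes the waveform is mono with float samples in the range [-1, 1].

```python
import struct
import wave

SAMPLE_RATE = 24_000  # matches the 24 kHz models used in this README


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file.

    Hypothetical helper for illustration; not part of vocos_mlx.
    """
    # Clip each sample to [-1, 1], then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)           # mono
        f.setsampwidth(2)           # 2 bytes per sample = 16 bits
        f.setframerate(sample_rate)
        f.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

With an MLX array, something like `write_wav("decoded.wav", decoded_audio.flatten().tolist())` should work, assuming the model output is a single float waveform.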
EnCodec
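from vocos_mlx import Vocos, load_audio

# Load the pretrained EnCodec-token model
vocos = Vocos.from_pretrained("lucasnewman/vocos-encodec-24khz")

# Load the input audio at 24 kHz
audio = load_audio("audio.wav", 24_000)

# Reconstruct the audio directly from the waveform;
# bandwidth_id selects the EnCodec target bandwidth
reconstructed_audio = vocos(audio, bandwidth_id=3)

# Or extract the EnCodec codes yourself and decode them
codes = vocos.get_encodec_codes(audio, bandwidth_id=3)
decoded_audio = vocos.decode_from_codes(codes, bandwidth_id=3)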
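The `bandwidth_id` argument is an index rather than a bitrate. The sketch below shows how it would map to a bitrate and a codebook count, assuming this port follows the upstream Vocos convention of indexing into the bandwidths 1.5, 3, 6, and 12 kbps, and the 24 kHz EnCodec layout of 75 code frames per second with 1024-entry codebooks. `describe_bandwidth` is a hypothetical helper, not part of `vocos_mlx`.

```python
# Assumed mapping, following the upstream Vocos EnCodec model.
BANDWIDTHS_KBPS = [1.5, 3.0, 6.0, 12.0]
FRAME_RATE_HZ = 75   # EnCodec 24 kHz: one code frame every 320 samples
BITS_PER_CODE = 10   # 1024-entry codebooks need 10 bits per code


def describe_bandwidth(bandwidth_id):
    """Return (kbps, number of codebooks) for a bandwidth_id.

    Hypothetical helper for illustration; not part of vocos_mlx.
    """
    kbps = BANDWIDTHS_KBPS[bandwidth_id]
    # Each codebook contributes FRAME_RATE_HZ * BITS_PER_CODE bits per second.
    n_codebooks = round(kbps * 1000 / (FRAME_RATE_HZ * BITS_PER_CODE))
    return kbps, n_codebooks
```

Under these assumptions, `bandwidth_id = 3` corresponds to 12 kbps and 16 codebooks per frame, while `bandwidth_id = 0` corresponds to 1.5 kbps and 2 codebooks.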
Appreciation
Thanks to Awni Hannun for the reference EnCodec implementation for MLX.
Citations
@article{siuzdak2023vocos,
title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
author={Siuzdak, Hubert},
journal={arXiv preprint arXiv:2306.00814},
year={2023}
}
License
The code in this repository is released under the MIT license as found in the
LICENSE file.