
MLX-Audio is a text-to-speech (TTS) and speech-to-speech (STS) library built on Apple's MLX framework, for running models locally and efficiently on Apple Silicon Macs.
# Install the package
pip install mlx-audio
# For web interface and API dependencies
pip install -r requirements.txt
To generate audio from the command line:
# Basic usage
mlx_audio.tts.generate --text "Hello, world"
# Specify prefix for output file
mlx_audio.tts.generate --text "Hello, world" --file_prefix hello
# Adjust speaking speed (0.5-2.0)
mlx_audio.tts.generate --text "Hello, world" --speed 1.4
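When scripting these invocations, it can help to assemble the argument list programmatically before handing it to `subprocess.run`. A minimal sketch — the `build_generate_cmd` helper is hypothetical, assuming the `python -m mlx_audio.tts.generate` entry point shown later in this README:

```python
import sys

def build_generate_cmd(text, speed=None, file_prefix=None):
    """Build an argv list for the mlx_audio.tts.generate CLI (hypothetical helper)."""
    cmd = [sys.executable, "-m", "mlx_audio.tts.generate", "--text", text]
    if speed is not None:
        # The CLI documents a speed range of 0.5-2.0; validate before invoking.
        if not 0.5 <= speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")
        cmd += ["--speed", str(speed)]
    if file_prefix is not None:
        cmd += ["--file_prefix", file_prefix]
    return cmd

# Example: the same invocation as above, ready for subprocess.run(cmd)
cmd = build_generate_cmd("Hello, world", speed=1.4, file_prefix="hello")
```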
To generate audio from Python:
from mlx_audio.tts.generate import generate_audio

# Example: Generate an audiobook chapter as WAV audio
generate_audio(
    text=("In the beginning, the universe was created...\n"
          "...or the simulation was booted up."),
    model_path="prince-canuma/Kokoro-82M",
    voice="af_heart",
    speed=1.2,
    lang_code="a",  # Kokoro: (a)f_heart, or comment out for auto
    file_prefix="audiobook_chapter1",
    audio_format="wav",
    sample_rate=24000,
    join_audio=True,
    verbose=True  # Set to False to disable print messages
)
print("Audiobook chapter successfully generated!")
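For a longer book, one approach is to split the text into chapters and call `generate_audio` once per chunk, varying `file_prefix`. The `split_chapters` helper below is an illustration, not part of the library:

```python
def split_chapters(text, marker="CHAPTER"):
    """Split book text into chapters wherever a line starts with `marker` (illustrative)."""
    chapters, current = [], []
    for line in text.splitlines():
        if line.strip().upper().startswith(marker) and current:
            chapters.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chapters.append("\n".join(current).strip())
    return chapters

book = "CHAPTER 1\nIn the beginning...\nCHAPTER 2\n...or the simulation booted."
for i, chapter in enumerate(split_chapters(book), start=1):
    # Each chunk would be passed to generate_audio(text=chapter,
    # file_prefix=f"audiobook_chapter{i}", ...) as in the example above.
    pass
```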
MLX-Audio also includes a web interface featuring a 3D visualization that reacts to audio frequencies.
To start the web interface and API server:
# Using the command-line interface
mlx_audio.server
# With custom host and port
mlx_audio.server --host 0.0.0.0 --port 9000
# With verbose logging
mlx_audio.server --verbose
Available command line arguments:
--host: Host address to bind the server to (default: 127.0.0.1)
--port: Port to bind the server to (default: 8000)
--verbose: Enable verbose logging
Then open your browser and navigate to:
http://127.0.0.1:8000
The server provides the following REST API endpoints:
POST /tts: Generate TTS audio
  text: The text to convert to speech (required)
  voice: Voice to use (default: "af_heart")
  speed: Speech speed from 0.5 to 2.0 (default: 1.0)
GET /audio/(unknown): Retrieve generated audio file
POST /play: Play audio directly from the server
  filename: The filename of the audio to play (required)
POST /stop: Stop any currently playing audio
POST /open_output_folder: Open the output folder in the system's file explorer
Note: Generated audio files are stored in ~/.mlx_audio/outputs by default, or in a fallback directory if that location is not writable.
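A client can exercise these endpoints with any HTTP library. The sketch below builds (but does not send) a `POST /tts` request with the standard library; note that the JSON body shape is an assumption based on the parameter list above — check the server code for the exact format it expects:

```python
import json
from urllib import request

def build_tts_request(base_url, text, voice="af_heart", speed=1.0):
    """Build a POST /tts request object (the JSON payload format is an assumption)."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    payload = json.dumps({"text": text, "voice": voice, "speed": speed}).encode()
    return request.Request(
        base_url.rstrip("/") + "/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_tts_request("http://127.0.0.1:8000", "Hello, world", speed=1.2)
# To actually send it (requires the server to be running):
# with request.urlopen(req) as resp: print(resp.status)
```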
Kokoro is a multilingual TTS model that supports various languages and voice styles.
from mlx_audio.tts.models.kokoro import KokoroPipeline
from mlx_audio.tts.utils import load_model
from IPython.display import Audio, display
import soundfile as sf

# Initialize the model
model_id = 'prince-canuma/Kokoro-82M'
model = load_model(model_id)

# Create a pipeline with American English
pipeline = KokoroPipeline(lang_code='a', model=model, repo_id=model_id)

# Generate audio
text = "The MLX King lives. Let him cook!"
for i, (_, _, audio) in enumerate(pipeline(text, voice='af_heart', speed=1, split_pattern=r'\n+')):
    # Display audio in notebook (if applicable)
    display(Audio(data=audio, rate=24000, autoplay=False))
    # Save each segment to its own file so later segments do not overwrite earlier ones
    sf.write(f'audio_{i}.wav', audio[0], 24000)
Supported language codes:
'a' - American English
'b' - British English
'j' - Japanese (requires pip install misaki[ja])
'z' - Mandarin Chinese (requires pip install misaki[zh])
CSM is a model from Sesame that supports text-to-speech and lets you customize voices using reference audio samples.
# Generate speech using CSM-1B model with reference audio
python -m mlx_audio.tts.generate --model mlx-community/csm-1b --text "Hello from Sesame." --play --ref_audio ./conversational_a.wav
You can pass any audio file to clone the voice from, or download a sample audio file here.
You can quantize models for improved performance:
from mlx_audio.tts.utils import quantize_model, load_model
import json
import os
import mlx.core as mx

model = load_model(repo_id='prince-canuma/Kokoro-82M')
config = model.config

# Quantize to 8-bit
group_size = 64
bits = 8
weights, config = quantize_model(model, config, group_size, bits)

# Save quantized model
os.makedirs('./8bit', exist_ok=True)
with open('./8bit/config.json', 'w') as f:
    json.dump(config, f)
mx.save_safetensors("./8bit/kokoro-v1_0.safetensors", weights, metadata={"format": "mlx"})
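Conceptually, group-wise b-bit quantization maps each group of `group_size` weights to integers in [0, 2^b - 1] using a per-group scale and offset, trading a small reconstruction error (at most half a quantization step per weight) for memory savings. A toy sketch of that idea in plain Python — an illustration of the scheme, not MLX's implementation:

```python
def quantize_group(values, bits=8):
    """Affine-quantize one group of floats to ints in [0, 2**bits - 1] (toy illustration)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (2**bits - 1) or 1.0  # avoid zero scale for constant groups
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo

def dequantize_group(q, scale, lo):
    """Invert the affine mapping; error is bounded by scale / 2 per weight."""
    return [x * scale + lo for x in q]

weights = [0.01, -0.73, 0.42, 0.99]
q, scale, lo = quantize_group(weights, bits=8)
restored = dequantize_group(q, scale, lo)
```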