
MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. It implements OpenAI-compatible API endpoints, enabling seamless integration with existing OpenAI SDK clients while leveraging the power of local ML inference.
The server implements OpenAI-compatible endpoints:
- /v1/chat/completions - Chat completions
- /v1/audio/speech - Text-to-Speech
- /v1/audio/transcriptions - Speech-to-Text
- /v1/models - List models
- /v1/models/{model} - Retrieve or Delete model
- /v1/images/generations - Image generation
- /v1/embeddings - Create embeddings for text
Follow these simple steps to get started with MLX Omni Server:
# Install the package
pip install mlx-omni-server
# Start the server
mlx-omni-server
# Send a test request
curl http://localhost:10240/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mlx-community/gemma-3-1b-it-4bit-DWQ",
"messages": [
{
"role": "user",
"content": "What can you do?"
}
]
}'
That's it! You're now running AI locally on your Mac. See Advanced Usage for more examples.
# Start with default settings (port 10240)
mlx-omni-server
# Or specify a custom port
mlx-omni-server --port 8000
# View all available options
mlx-omni-server --help
from openai import OpenAI
# Connect to your local server
client = OpenAI(
base_url="http://localhost:10240/v1", # Point to local server
api_key="not-needed" # API key not required
)
# Make a simple chat request
response = client.chat.completions.create(
model="mlx-community/gemma-3-1b-it-4bit-DWQ",
messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
MLX Omni Server supports multiple ways of interaction and various AI capabilities. Here's how to use each:
Access the server directly using HTTP requests:
# Chat completions endpoint
curl http://localhost:10240/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mlx-community/gemma-3-1b-it-4bit-DWQ",
"messages": [{"role": "user", "content": "Hello"}]
}'
# Get available models
curl http://localhost:10240/v1/models
Use the official OpenAI Python SDK for seamless integration:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:10240/v1", # Point to local server
api_key="not-needed" # API key not required for local server
)
See the FAQ section for information on using TestClient for development.
response = client.chat.completions.create(
model="mlx-community/Llama-3.2-3B-Instruct-4bit",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
temperature=0,
stream=True # this time, we set stream=True
)
for chunk in response:
    print(chunk)
    print(chunk.choices[0].delta.content)
    print("****************")
curl http://localhost:10240/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
"stream": true,
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello!"
}
]
}'
speech_file_path = "mlx_example.wav"
response = client.audio.speech.create(
model="lucasnewman/f5-tts-mlx",
voice="alloy", # voice is not working for now
input="MLX project is awesome.",
)
response.stream_to_file(speech_file_path)
curl -X POST "http://localhost:10240/v1/audio/speech" \
-H "Content-Type: application/json" \
-d '{
"model": "lucasnewman/f5-tts-mlx",
"input": "MLX project is awesome",
"voice": "alloy"
}' \
--output ~/Desktop/mlx.wav
audio_file = open("speech.mp3", "rb")
transcript = client.audio.transcriptions.create(
model="mlx-community/whisper-large-v3-turbo",
file=audio_file
)
print(transcript.text)
curl -X POST "http://localhost:10240/v1/audio/transcriptions" \
-H "Content-Type: multipart/form-data" \
-F "file=@mlx_example.wav" \
-F "model=mlx-community/whisper-large-v3-turbo"
Response:
{
"text": " MLX Project is awesome!"
}
image_response = client.images.generate(
model="argmaxinc/mlx-FLUX.1-schnell",
prompt="A serene landscape with mountains and a lake",
n=1,
size="512x512"
)
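The response follows the OpenAI Images API shape. Below is a sketch of consuming it, assuming the server populates either a URL or base64-encoded image data (the output filename is arbitrary):
import base64
image = image_response.data[0]
if getattr(image, "b64_json", None):
    # Base64 payload: decode it and write the image to disk
    with open("landscape.png", "wb") as f:
        f.write(base64.b64decode(image.b64_json))
else:
    # Otherwise the response carries a URL for the generated image
    print(image.url)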
curl http://localhost:10240/v1/images/generations \
-H "Content-Type: application/json" \
-d '{
"model": "argmaxinc/mlx-FLUX.1-schnell",
"prompt": "A cute baby sea otter",
"n": 1,
"size": "1024x1024"
}'
# Generate embedding for a single text
response = client.embeddings.create(
model="mlx-community/all-MiniLM-L6-v2-4bit", input="I like reading"
)
# Examine the response structure
print(f"Response type: {type(response)}")
print(f"Model used: {response.model}")
print(f"Embedding dimension: {len(response.data[0].embedding)}")
curl http://localhost:10240/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "mlx-community/all-MiniLM-L6-v2-4bit",
"input": ["Hello world!", "Embeddings are useful for semantic search."]
}'
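As a sketch of how the returned vectors can be used for semantic comparison, the snippet below computes cosine similarity between embeddings; the helper function is illustrative and not part of the server or SDK:
import math
from openai import OpenAI
client = OpenAI(base_url="http://localhost:10240/v1", api_key="not-needed")
response = client.embeddings.create(
    model="mlx-community/all-MiniLM-L6-v2-4bit",
    input=["I like reading", "Reading books is my hobby", "The weather is sunny"],
)
vectors = [item.embedding for item in response.data]
def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
# The first sentence should score closer to the second than to the third
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))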
For more detailed examples, check out the examples directory.
MLX Omni Server uses Hugging Face for model downloading and management. When you specify a model ID that hasn't been downloaded yet, the framework will automatically download it. However, since download times can vary significantly, it is recommended to download models in advance or to point the model parameter at a local model path.
# Using a model from Hugging Face
response = client.chat.completions.create(
model="mlx-community/gemma-3-1b-it-4bit-DWQ", # Will download if not available
messages=[{"role": "user", "content": "Hello"}]
)
# Using a local model
response = client.chat.completions.create(
model="/path/to/your/local/model", # Local model path
messages=[{"role": "user", "content": "Hello"}]
)
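One way to download a model ahead of time is with the huggingface_hub library; this is a sketch, and the huggingface-cli download command or any other Hugging Face tooling works just as well:
from huggingface_hub import snapshot_download
# Fetch the model into the local Hugging Face cache before starting the server
local_path = snapshot_download("mlx-community/gemma-3-1b-it-4bit-DWQ")
print(local_path)  # this directory can also be passed directly as the model parameter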
The models currently available on the machine can also be listed in the following ways:
curl http://localhost:10240/v1/models
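Or, equivalently, through the OpenAI SDK; a minimal sketch reusing the local-server client settings from the earlier examples:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:10240/v1", api_key="not-needed")
# Print the IDs of the models currently available locally
for model in client.models.list():
    print(model.id)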
Use the model parameter when creating a request:
response = client.chat.completions.create(
model="mlx-community/gemma-3-1b-it-4bit-DWQ", # Specify model here
messages=[{"role": "user", "content": "Hello"}]
)
TestClient allows you to use the OpenAI client without starting a local server, which is particularly useful for development and testing scenarios:
from openai import OpenAI
from fastapi.testclient import TestClient
from mlx_omni_server.main import app
# Use TestClient directly - no network service needed
client = OpenAI(
    base_url="http://test/v1",    # placeholder URL; requests never leave the process
    api_key="not-needed",         # the client requires a value, but the local server ignores it
    http_client=TestClient(app)   # dispatch requests directly to the FastAPI app
)
# Now you can use the client just like with a running server
response = client.chat.completions.create(
model="mlx-community/gemma-3-1b-it-4bit-DWQ",
messages=[{"role": "user", "content": "Hello"}]
)
This approach bypasses the HTTP server entirely, making it ideal for unit testing and quick development iterations.
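For example, a unit test might look like the following sketch (the test name and assertion are illustrative):
from fastapi.testclient import TestClient
from openai import OpenAI
from mlx_omni_server.main import app

def test_chat_completion():
    # Requests are dispatched straight to the FastAPI app, no server process needed
    client = OpenAI(
        base_url="http://test/v1",
        api_key="not-needed",
        http_client=TestClient(app),
    )
    response = client.chat.completions.create(
        model="mlx-community/gemma-3-1b-it-4bit-DWQ",
        messages=[{"role": "user", "content": "Hello"}],
    )
    assert response.choices[0].message.content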
We welcome contributions! If you're interested in contributing to MLX Omni Server, please check out our Development Guide for detailed information.
For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.
This project is not affiliated with or endorsed by OpenAI or Apple. It's an independent implementation that provides OpenAI-compatible APIs using Apple's MLX framework.