llama-stack

Open-source, OpenAI-compatible API server with pluggable providers for any model and any infrastructure

PyPI

Version: 0.3.3

Maintainers: 0

Llama Stack

Quick Start | Documentation | OpenAI API Compatibility | Discord

Open-source agentic API server for building AI applications. OpenAI-compatible. Any model, any infrastructure.

Llama Stack is a drop-in replacement for the OpenAI API that you can run anywhere — your laptop, your datacenter, or the cloud. Use any OpenAI-compatible client or agentic framework. Swap between Llama, GPT, Gemini, Mistral, or any model without changing your application code.

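from openai import OpenAI

# Point the standard OpenAI client at the local Llama Stack server.
# The api_key value here is only a placeholder.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
# Print the assistant's reply text.
print(response.choices[0].message.content)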
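
For readers without the openai package installed, the call above corresponds roughly to a single HTTP POST. The following is a minimal sketch using only the Python standard library; the helper name build_chat_request is hypothetical, and the bearer-token header mirrors what OpenAI clients normally send rather than anything this README specifies. It builds the request without sending it.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, model: str, user_text: str, api_key: str = "fake"
) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint.

    Hypothetical helper for illustration; not part of llama-stack.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: same header an OpenAI client would send.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request("http://localhost:8321/v1", "llama-3.3-70b", "Hello")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With a server running, passing req to urllib.request.urlopen would send it; the sketch stops short of that so it can be run offline.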

What you get

Chat Completions & Embeddings — standard /v1/chat/completions, /v1/completions, and /v1/embeddings endpoints, compatible with any OpenAI client
Responses API — server-side agentic orchestration with tool calling, MCP server integration, and built-in file search (RAG) in a single API call (learn more)
Vector Stores & Files — /v1/vector_stores and /v1/files for managed document storage and search
Batches — /v1/batches for offline batch processing
Open Responses conformant — the Responses API implementation passes the Open Responses conformance test suite
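
The endpoint paths named in the list above can be collected into a small lookup table. This is an illustrative sketch only: the names ENDPOINTS, endpoint_url, and embeddings_body are hypothetical, the server root reuses the localhost address from the earlier example, and the embeddings body follows the usual OpenAI request shape (a model plus an input list). The model name is a placeholder.

```python
# Paths exactly as listed above; the dictionary keys are made up for this sketch.
ENDPOINTS = {
    "chat_completions": "/v1/chat/completions",
    "completions": "/v1/completions",
    "embeddings": "/v1/embeddings",
    "vector_stores": "/v1/vector_stores",
    "files": "/v1/files",
    "batches": "/v1/batches",
}


def endpoint_url(server_root: str, name: str) -> str:
    """Join a server root (no /v1 suffix) with one of the listed paths."""
    return server_root.rstrip("/") + ENDPOINTS[name]


def embeddings_body(model: str, texts: list[str]) -> dict:
    """Request body in the usual OpenAI embeddings shape."""
    return {"model": model, "input": list(texts)}


for name in ENDPOINTS:
    print(name, "->", endpoint_url("http://localhost:8321", name))

# "my-embedding-model" is a placeholder, not a model this README names.
print(embeddings_body("my-embedding-model", ["first document", "second document"]))
```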

Use any model, use any infrastructure

Llama Stack has a pluggable provider architecture. Develop locally with Ollama, deploy to production with vLLM, or connect to a managed service — the API stays the same.

See the provider documentation for the full list.
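
One way to keep application code unchanged across environments is to read the server address and model name from configuration. The sketch below assumes two environment variable names, LLAMA_STACK_BASE_URL and LLAMA_STACK_MODEL, which are invented for this example and are not defined by llama-stack; the production URL and model name are placeholders too.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    base_url: str
    model: str


def load_settings(env: Mapping[str, str]) -> Settings:
    """Read connection settings, falling back to the local defaults.

    The variable names are hypothetical; pick whatever fits your deployment.
    """
    return Settings(
        base_url=env.get("LLAMA_STACK_BASE_URL", "http://localhost:8321/v1"),
        model=env.get("LLAMA_STACK_MODEL", "llama-3.3-70b"),
    )


def chat_payload(settings: Settings, user_text: str) -> dict:
    """Application code: identical regardless of which provider serves it."""
    return {
        "model": settings.model,
        "messages": [{"role": "user", "content": user_text}],
    }


# Local development: no variables set, so the defaults apply.
dev = load_settings({})
# Production: placeholder values standing in for a real deployment.
prod = load_settings(
    {
        "LLAMA_STACK_BASE_URL": "https://stack.example.internal/v1",
        "LLAMA_STACK_MODEL": "some-production-model",
    }
)
print(dev.base_url, chat_payload(dev, "Hello"))
print(prod.base_url, chat_payload(prod, "Hello"))
```

In a real application, os.environ would be passed to load_settings; a plain dictionary is used here so the example runs without touching the environment.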

Get started

Install and run a Llama Stack server:

# One-line install
curl -LsSf https://github.com/llamastack/llama-stack/raw/main/scripts/install.sh | bash

# Or install via uv
uv pip install llama-stack

# Start the server (uses the starter distribution with Ollama)
llama stack run

Then connect with any OpenAI client — Python, TypeScript, curl, or any framework that speaks the OpenAI API.

See the Quick Start guide for detailed setup.
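
Before pointing a client at the server, it can help to confirm that something is listening on the expected port. The following is a rough sketch using a plain TCP connection attempt; it assumes the localhost:8321 address from the earlier example, the helper names are hypothetical, and it does not call any llama-stack health endpoint.

```python
import socket
from urllib.parse import urlsplit


def host_port(base_url: str) -> tuple[str, int]:
    """Extract (host, port) from a base URL, defaulting the port by scheme."""
    parts = urlsplit(base_url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname or "localhost", parts.port or default_port


def server_is_up(base_url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the server's host and port succeeds.

    This only shows that a process accepts connections there, not that
    the process is a healthy Llama Stack server.
    """
    try:
        with socket.create_connection(host_port(base_url), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    url = "http://localhost:8321/v1"
    print(host_port(url), "reachable:", server_is_up(url))
```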

Resources

Documentation — full reference
OpenAI API Compatibility — endpoint coverage and provider matrix
Getting Started Notebook — text and vision inference walkthrough
Contributing — how to contribute

Client SDKs:

Language	SDK
Python	llama-stack-client-python
TypeScript	llama-stack-client-typescript

Community

We hold regular community calls every Thursday at 09:00 AM PST — see the Community Event on Discord for details.

Thanks to all our amazing contributors!
