š
LiteLLM
Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, Groq etc.]
LiteLLM manages:
- Translate inputs to provider's
completion
, embedding
, and image_generation
endpoints - Consistent output, text responses will always be available at
['choices'][0]['message']['content']
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - Router
- Set Budgets & Rate limits per project, api key, model LiteLLM Proxy Server (LLM Gateway)
Jump to LiteLLM Proxy (LLM Gateway) Docs
Jump to Supported LLM Providers
šØ Stable Release: Use docker images with the -stable
tag. These have undergone 12 hour load tests, before being published.
Support for more providers. Missing a provider or LLM Platform, raise a feature request.
[!IMPORTANT]
LiteLLM v1.0.0 now requires openai>=1.0.0
. Migration guide here
LiteLLM v1.40.14+ now requires pydantic>=2.0.0
. No changes required.
pip install litellm
from litellm import completion
import os
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"
messages = [{ "content": "Hello, how are you?","role": "user"}]
response = completion(model="gpt-3.5-turbo", messages=messages)
response = completion(model="command-nightly", messages=messages)
print(response)
Call any model supported by a provider, with model=<provider_name>/<model_name>
. There might be provider-specific details here, so refer to provider docs for more information
from litellm import acompletion
import asyncio
async def test_get_response():
user_message = "Hello, how are you?"
messages = [{"content": user_message, "role": "user"}]
response = await acompletion(model="gpt-3.5-turbo", messages=messages)
return response
response = asyncio.run(test_get_response())
print(response)
Streaming (Docs)
liteLLM supports streaming the model response back, pass stream=True
to get a streaming iterator in response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)
from litellm import completion
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for part in response:
print(part.choices[0].delta.content or "")
response = completion('claude-2', messages, stream=True)
for part in response:
print(part.choices[0].delta.content or "")
Logging Observability (Docs)
LiteLLM exposes pre defined callbacks to send data to Lunary, Langfuse, DynamoDB, s3 Buckets, Helicone, Promptlayer, Traceloop, Athina, Slack, MLflow
from litellm import completion
os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
os.environ["HELICONE_API_KEY"] = "your-helicone-auth-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["ATHINA_API_KEY"] = "your-athina-api-key"
os.environ["OPENAI_API_KEY"]
litellm.success_callback = ["lunary", "langfuse", "athina", "helicone"]
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi š - i'm openai"}])
LiteLLM Proxy Server (LLM Gateway) - (Docs)
Track spend + Load Balance across multiple projects
Hosted Proxy (Preview)
The proxy provides:
- Hooks for auth
- Hooks for logging
- Cost tracking
- Rate Limiting
Quick Start Proxy - CLI
pip install 'litellm[proxy]'
Step 1: Start litellm proxy
$ litellm --model huggingface/bigcode/starcoder
#INFO: Proxy running on http://0.0.0.0:4000
Step 2: Make ChatCompletions Request to Proxy
[!IMPORTANT]
š” Use LiteLLM Proxy with Langchain (Python, JS), OpenAI SDK (Python, JS) Anthropic SDK, Mistral SDK, LlamaIndex, Instructor, Curl
import openai
client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:4000")
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
{
"role": "user",
"content": "this is a test request, write a short poem"
}
])
print(response)
Proxy Key Management (Docs)
Connect the proxy with a Postgres DB to create proxy keys
git clone https://github.com/BerriAI/litellm
cd litellm
echo 'LITELLM_MASTER_KEY="sk-1234"' > .env
echo 'LITELLM_SALT_KEY="sk-1234"' > .env
source .env
docker-compose up
UI on /ui
on your proxy server
Set budgets and rate limits across multiple projects
POST /key/generate
Request
curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["gpt-3.5-turbo", "gpt-4", "claude-2"], "duration": "20m","metadata": {"user": "ishaan@berri.ai", "team": "core-infra"}}'
Expected Response
{
"key": "sk-kdEXbIqZRwEeEiHwdg7sFA", # Bearer token
"expires": "2023-11-19T01:38:25.838000+00:00" # datetime object
}
Supported Providers (Docs)
Read the Docs
Contributing
To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change.
Here's how to modify the repo locally:
Step 1: Clone the repo
git clone https://github.com/BerriAI/litellm.git
Step 2: Navigate into the project, and install dependencies:
cd litellm
poetry install -E extra_proxy -E proxy
Step 3: Test your change:
cd litellm/tests # pwd: Documents/litellm/litellm/tests
poetry run flake8
poetry run pytest .
Step 4: Submit a PR with your changes! š
- push your fork to your GitHub repo
- submit a PR from there
Building LiteLLM Docker Image
Follow these instructions if you want to build / run the LiteLLM Docker Image yourself.
Step 1: Clone the repo
git clone https://github.com/BerriAI/litellm.git
Step 2: Build the Docker Image
Build using Dockerfile.non_root
docker build -f docker/Dockerfile.non_root -t litellm_test_image .
Step 3: Run the Docker Image
Make sure config.yaml is present in the root directory. This is your litellm proxy config file.
docker run \
-v $(pwd)/proxy_config.yaml:/app/config.yaml \
-e DATABASE_URL="postgresql://xxxxxxxx" \
-e LITELLM_MASTER_KEY="sk-1234" \
-p 4000:4000 \
litellm_test_image \
--config /app/config.yaml --detailed_debug
Enterprise
For companies that need better security, user management and professional support
Talk to founders
This covers:
- ā
Features under the LiteLLM Commercial License:
- ā
Feature Prioritization
- ā
Custom Integrations
- ā
Professional Support - Dedicated discord + slack
- ā
Custom SLAs
- ā
Secure access with Single Sign-On
Code Quality / Linting
LiteLLM follows the Google Python Style Guide.
We run:
If you have suggestions on how to improve the code quality feel free to open an issue or a PR.
Support / talk with founders
Why did we build this
- Need for simplicity: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI and Cohere.
Contributors