LLMRouter: An Open-Source Library for LLM Routing
Introduction
LLMRouter is an intelligent routing system designed to optimize LLM inference by dynamically selecting the most suitable model for each query. To achieve intelligent routing, it defines:
Smart Routing: Automatically routes queries to the optimal LLM based on task complexity, cost, and performance requirements.
Multiple Router Models: Support for over 16 routing models, organized into four major categories (single-round routers, multi-round routers, agentic routers, and personalized routers) and covering a wide range of strategies such as KNN, SVM, MLP, Matrix Factorization, Elo Rating, graph-based routing, BERT-based routing, hybrid probabilistic methods, transformed-score routers, and more.
Unified CLI: Complete command-line interface for training, inference, and interactive chat with a Gradio-based UI.
Data Generation Pipeline: Complete pipeline for generating training data from 11 benchmark datasets with automatic API calling and evaluation.
News
[2026-02]: OpenClaw Router - an OpenAI-compatible server with OpenClaw integration! We've also released llmrouter-lib v0.3.0. Deploy LLMRouter as a production API server that works seamlessly with Slack, Discord, and other messaging platforms via OpenClaw. Features include multimodal understanding (image/audio/video), retrieval-augmented routing memory, streaming support, and all 16+ LLMRouter routing strategies. See OpenClaw Router Integration.
[2026-01]: LLMRouter just crossed 1K GitHub stars! We've also released llmrouter-lib v0.2.0. Updates include service-specific dict configs (OpenAI, Anthropic, etc.) and multimodal routing (Video/Image + Text) on Geometry3K, MathVista, and Charades-Ego, all in the first unified open-source LLM routing library with 16+ routers, a unified CLI, a Gradio UI, and 11 datasets. Install via pip install llmrouter-lib. More updates soon!
Clone the repository and install in editable mode using a virtual environment (e.g., with anaconda3):
# Clone the repository
git clone https://github.com/ulab-uiuc/LLMRouter.git
cd LLMRouter
# Create and activate virtual environment
conda create -n llmrouter python=3.10
conda activate llmrouter
# Install the package (base installation)
pip install -e .
# Optional: Install with RouterR1 support (requires GPU)
# RouterR1 is tested with vllm==0.6.3 (torch==2.4.0); the extra pins these versions.
pip install -e ".[router-r1]"

# Optional: Install all optional dependencies
pip install -e ".[all]"
Install from PyPI
pip install llmrouter-lib
Setting Up API Keys
LLMRouter requires API keys to make LLM API calls for inference, chat, and data generation. Set the API_KEYS environment variable using one of the following formats:
Free NVIDIA API Keys: The NVIDIA endpoints currently used in LLMRouter have freely available API keys. To get started, visit https://build.nvidia.com/ to create an account, then you can generate your API keys at no cost.
Service-Specific Dict Format (recommended for multiple providers)
Use this format when you have models from different service providers (e.g., NVIDIA, OpenAI, Anthropic) and want to use different API keys for each provider:
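For example (the provider names follow the dict format used elsewhere in this README; the key values below are placeholders you must replace):

```shell
# Placeholder keys - substitute your real provider keys.
export API_KEYS='{"NVIDIA": "nvapi-REPLACE_ME", "OpenAI": "sk-REPLACE_ME"}'

# Sanity-check that the value parses as JSON before starting LLMRouter
echo "$API_KEYS" | python3 -c "import sys, json; print(sorted(json.load(sys.stdin)))"
```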
LLMRouter supports locally hosted LLM inference servers that provide OpenAI-compatible APIs (e.g., Ollama, vLLM, SGLang). For local providers, you can use an empty string "" as the API key; the system automatically detects localhost endpoints and handles authentication accordingly.
Example with Ollama:
export API_KEYS='{"Ollama": ""}'
{
  "gemma3": {
    "size": "3B",
    "feature": "Gemma 3B model hosted locally via Ollama",
    "input_price": 0.0,
    "output_price": 0.0,
    "model": "gemma3",
    "service": "Ollama",
    "api_endpoint": "http://localhost:11434/v1"
  }
}
Important: Use the /v1 endpoint (OpenAI-compatible), not the native API endpoints. Empty API keys are accepted only for localhost endpoints (localhost or 127.0.0.1), which are detected automatically.
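The localhost detection described above can be sketched roughly as follows. This is a simplification for illustration, not LLMRouter's actual implementation:

```python
from urllib.parse import urlparse

def is_local_endpoint(url: str) -> bool:
    """Return True for endpoints where an empty API key is acceptable."""
    host = urlparse(url).hostname
    return host in ("localhost", "127.0.0.1")

print(is_local_endpoint("http://localhost:11434/v1"))            # True
print(is_local_endpoint("https://integrate.api.nvidia.com/v1"))  # False
```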
Testing Model Availability
You can test the availability of different candidate models using the following curl commands. This is useful for verifying that your API keys work correctly and that specific models are accessible:
Note: If you're using the dict format for API_KEYS, extract the NVIDIA key first (e.g., using echo $API_KEYS | python3 -c "import sys, json; print(json.load(sys.stdin)['NVIDIA'].split(',')[0])"), or set a temporary variable with your NVIDIA API key.
# export API_KEYS=...

# Example API endpoint - adjust based on your configuration.
# This example uses NVIDIA's endpoint, but you should use the endpoint
# specified in your LLM candidate JSON or router config.
API_ENDPOINT="https://integrate.api.nvidia.com/v1/chat/completions"

# Example model list - adjust based on your LLM candidate configuration.
# These are example models; replace with the actual model names/IDs
# from your LLM candidate JSON file.
MODELS=(
  "qwen/qwen2.5-7b-instruct"
  "meta/llama-3.1-8b-instruct"
  "mistralai/mistral-7b-instruct-v0.3"
  "nvidia/llama-3.3-nemotron-super-49b-v1"
  "mistralai/mixtral-8x7b-instruct-v0.1"
  "mistralai/mixtral-8x22b-instruct-v0.1"
)

SYSTEM_PROMPT="Hello."
PROMPT="Hello."

for MODEL in "${MODELS[@]}"; do
echo "===== $MODEL ====="
curl "$API_ENDPOINT" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEYS" \
-d "{
\"model\": \"$MODEL\",
\"messages\": [
{
\"role\": \"system\",
\"content\": \"$SYSTEM_PROMPT\"
},
{
\"role\": \"user\",
\"content\": \"$PROMPT\"
}
],
\"temperature\": 0.8,
\"max_tokens\": 200
}"
echo
done
This script will test each model in the list and display the response, helping you verify which models are available and working with your API key.
Preparing Training Data
LLMRouter includes a complete data generation pipeline that transforms raw benchmark datasets into formatted routing data with embeddings. The pipeline supports 11 diverse benchmark datasets including Natural QA, Trivia QA, MMLU, GPQA, MBPP, HumanEval, GSM8K, CommonsenseQA, MATH, OpenbookQA, and ARC-Challenge.
Multimodal Integration: Learn how to incorporate complex multimodal tasks (Video/Image + Text) into LLMRouter by checking our Multimodal Task Guide. We currently support 5 multimodal tasks across 3 datasets (Geometry3K, MathVista, Charades-Ego).
Pipeline Overview
The data generation pipeline consists of three main steps:
Generate Query Data - Extract queries from benchmark datasets and create train/test split JSONL files
Generate LLM Embeddings - Create embeddings for LLM candidates from their metadata
API Calling & Evaluation - Call LLM APIs, evaluate responses, and generate unified embeddings + routing data
Quick Start
Start with the sample configuration file:
# Step 1: Generate query data
python llmrouter/data/data_generation.py --config llmrouter/data/sample_config.yaml
# Step 2: Generate LLM embeddings
python llmrouter/data/generate_llm_embeddings.py --config llmrouter/data/sample_config.yaml
# Step 3: API calling & evaluation (requires API_KEYS - see "Setting Up API Keys" section above)
python llmrouter/data/api_calling_evaluation.py --config llmrouter/data/sample_config.yaml --workers 100
Output Files
The pipeline generates the following files:
Query Data (JSONL): query_data_train.jsonl and query_data_test.jsonl - Query data with train/test split
LLM Embeddings (JSON): default_llm_embeddings.json - LLM metadata with embeddings
Query Embeddings (PyTorch): query_embeddings_longformer.pt - Unified embeddings for all queries
Routing Data (JSONL): default_routing_train_data.jsonl and default_routing_test_data.jsonl - Complete routing data with model responses, performance scores, and token usage
Example routing data entry:
{"task_name":"gsm8k","query":"Janet has 4 apples. She gives 2 to Bob. How many does she have left?","ground_truth":"2","metric":"GSM8K","model_name":"llama3-chatqa-1.5-8b","response":"Janet has 4 apples and gives 2 to Bob, so she has 4 - 2 = 2 apples left.","performance":1.0,"embedding_id":42,"token_num":453}
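As a quick standalone sketch (not part of the library), routing-data records in this format can be aggregated per model, e.g. to compare average performance across candidates:

```python
import json

# One record per line, following the routing-data fields shown above
lines = [
    '{"model_name": "llama3-chatqa-1.5-8b", "performance": 1.0, "token_num": 453}',
    '{"model_name": "llama3-chatqa-1.5-8b", "performance": 0.0, "token_num": 310}',
    '{"model_name": "mistral-7b-instruct", "performance": 1.0, "token_num": 120}',
]

# Group performance scores by model, then average them
scores: dict[str, list[float]] = {}
for line in lines:
    rec = json.loads(line)
    scores.setdefault(rec["model_name"], []).append(rec["performance"])

avg = {m: sum(v) / len(v) for m, v in scores.items()}
print(avg)  # {'llama3-chatqa-1.5-8b': 0.5, 'mistral-7b-instruct': 1.0}
```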
Configuration
All paths and parameters are controlled via YAML configuration. The sample config file (llmrouter/data/sample_config.yaml) references the example data directory and can be used as-is or customized for your setup.
Note: Step 3 requires API keys for calling LLM services. See the Setting Up API Keys section above for configuration details.
For complete documentation including detailed file formats, embedding mapping system, configuration options, and troubleshooting, see llmrouter/data/README.md.
Training a Router
Before training, ensure you have prepared your data using the Data Generation Pipeline or use the example data in data/example_data/.
Train any of the supported router models using your configuration file and the unified CLI.
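The exact train subcommand and flags are assumptions here, modeled on the `llmrouter infer` call shown later in this README; the router name and config path are placeholders. Verify against `llmrouter --help` before use:

```shell
# Assumed syntax - verify with `llmrouter --help`.
TRAIN_CMD='llmrouter train --router knn_router --config path/to/your_config.yaml'
if command -v llmrouter >/dev/null 2>&1; then
  $TRAIN_CMD
else
  echo "llmrouter not installed; would run: $TRAIN_CMD"
fi
```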
Adding Your Own Routers
LLMRouter supports a plugin system that allows you to add custom router implementations without modifying the core codebase. This makes it easy to experiment with new routing strategies or domain-specific routers.
Quick Start
1. Create your router directory:
mkdir -p custom_routers/my_router
2. Implement your router (custom_routers/my_router/router.py):
from llmrouter.models.meta_router import MetaRouter
import torch.nn as nn
class MyRouter(MetaRouter):
    """Your custom router implementation."""

    def __init__(self, yaml_path: str):
        # Initialize with a model (can be nn.Identity() for simple routers)
        model = nn.Identity()
        super().__init__(model=model, yaml_path=yaml_path)
        # Get available LLM names from config
        self.llm_names = list(self.llm_data.keys())

    def route_single(self, query_input: dict) -> dict:
        """Route a single query to the best LLM."""
        query = query_input['query']
        # Your custom routing logic here
        # Example: route based on query length
        selected_llm = (self.llm_names[0] if len(query) < 50
                        else self.llm_names[-1])
        return {
            "query": query,
            "model_name": selected_llm,
            "predicted_llm": selected_llm,
        }

    def route_batch(self, batch: list) -> list:
        """Route multiple queries."""
        return [self.route_single(q) for q in batch]
3. Create a configuration file (custom_routers/my_router/config.yaml):

data_path:
  llm_data: 'data/example_data/llm_candidates/default_llm.json'

hparam:
  # Your hyperparameters here

  # Optional: Default API endpoint (used as fallback if models don't specify their own)
  # Individual models can override this by specifying api_endpoint in the llm_data JSON file
  api_endpoint: 'https://integrate.api.nvidia.com/v1'
4. Use your custom router (same as built-in routers!):
# Inference
llmrouter infer --router my_router \
--config custom_routers/my_router/config.yaml \
    --query "What is machine learning?"

# List all routers (including custom ones)
llmrouter list-routers
Embedding-based routing:

from llmrouter.utils import get_longformer_embedding

def route_single(self, query_input):
    embedding = get_longformer_embedding(query_input['query'])
    # Use embedding similarity to select the best model
    selected = self._find_best_model(embedding)
    return {"model_name": selected}
Cost-optimized routing:
def route_single(self, query_input):
    difficulty = self._estimate_difficulty(query_input)
    # Select the cheapest model that can handle the difficulty
    for model_name, info in sorted(self.llm_data.items(),
                                   key=lambda x: x[1]['cost']):
        if info['capability'] >= difficulty:
            return {"model_name": model_name}
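The same cost-aware selection can be sketched as a self-contained function. The cost/capability fields and model names below are illustrative assumptions, not the actual llm_data schema:

```python
def select_cheapest_capable(llm_data: dict, difficulty: float):
    """Pick the cheapest model whose capability covers the difficulty."""
    for model_name, info in sorted(llm_data.items(),
                                   key=lambda x: x[1]["cost"]):
        if info["capability"] >= difficulty:
            return model_name
    return None  # no model is capable enough

models = {
    "small-7b":  {"cost": 0.1, "capability": 0.4},
    "mid-34b":   {"cost": 0.5, "capability": 0.7},
    "large-70b": {"cost": 1.0, "capability": 0.95},
}
print(select_cheapest_capable(models, 0.6))   # mid-34b
print(select_cheapest_capable(models, 0.99))  # None
```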
Adding Your Own Tasks
LLMRouter supports custom task definitions that allow you to add new task types with custom prompt templates and evaluation metrics. Custom tasks are automatically discovered and integrated into the data generation and evaluation pipeline.
Quick Start
1. Create a task formatter (custom_tasks/my_tasks.py):
Follow our step-by-step walkthrough in the Charades-Ego Integration Guide to process paired egocentric videos, generate VLM-based features, and train routers for Activity, Object, and Verb recognition.
OpenClaw Router (OpenClaw Integration)
OpenClaw Router is an OpenAI-compatible API server that brings LLMRouter's intelligent routing to production environments. It integrates seamlessly with OpenClaw, enabling you to deploy LLM routing via Slack, Discord, and other messaging platforms.
Why OpenClaw Router?
OpenAI-Compatible API: Drop-in replacement for any OpenAI client (/v1/chat/completions)
All Routing Strategies: Use any of the 16+ LLMRouter strategies (KNN, SVM, MLP, LLM-based, etc.)
Multimodal Understanding: Process images, audio, and video, converting them to text for routing decisions
Routing Memory: Persist query-to-model history; retrieve similar past routes for better decisions
Streaming Support: Full streaming responses with an optional [model_name] prefix
Multi-Provider: Route to Together AI, NVIDIA, OpenAI, Anthropic, or local models
Roadmap
Improve personalized routers: stronger user profiling, cold-start strategies, and online feedback updates.
Integrate a multimodal router: support image/audio inputs and route by modality + task type to the right multimodal model.
Add continual/online learning to adapt routers to domain drift (e.g., periodic re-training + feedback loops).
Acknowledgments
LLMRouter builds upon the excellent research from the community. We gratefully acknowledge the following works that inspired our router implementations:
RouteLLM - Learning to Route LLMs with Preference Data (ICLR 2025)
RouterDC - Query-Based Router by Dual Contrastive Learning (NeurIPS 2024)
AutoMix - Automatically Mixing Language Models (NeurIPS 2024)
Hybrid LLM - Cost-Efficient and Quality-Aware Query Routing (ICLR 2024)
GraphRouter - A Graph-based Router for LLM Selections (ICLR 2025)
GMTRouter - Personalized LLM Router over Multi-turn User Interactions
PersonalizedRouter - Personalized LLM Routing via Graph-based User Preference Modeling
Router-R1 - Teaching LLMs Multi-Round Routing and Aggregation via RL (NeurIPS 2025)
FusionFactory - Fusing LLM Capabilities with Multi-LLM Log Data
Contribution
We warmly welcome contributions from the community. LLMRouter is a living, extensible research framework, and its impact grows through the creativity and expertise of its contributors.
If you have developed a new routing strategy, learning objective, training paradigm, or evaluation protocol, we strongly encourage you to submit a pull request to integrate it into LLMRouter. All accepted contributions are explicitly credited, documented, and made available to a broad research and practitioner audience.
Contributing to LLMRouter is more than adding code. It is an opportunity to increase the visibility, adoption, and long-term impact of your work within the LLM systems community. Together, we aim to build the most comprehensive and extensible open-source library for LLM routing.
Notable contributions may be highlighted in documentation, examples, benchmarks, or future releases.
Star History
Citation
If you find LLMRouter useful for your research or projects, please cite it as:
@misc{llmrouter2025,
title = {LLMRouter: An Open-Source Library for LLM Routing},
author = {Tao Feng and Haozhen Zhang and Zijie Lei and Haodong Yue and Chongshan Lin and Ge Liu and Jiaxuan You},
year = {2025},
howpublished = {\url{https://github.com/ulab-uiuc/LLMRouter}},
note = {GitHub repository}
}