sglang-router

High-performance Rust-based load balancer for SGLang with multiple routing algorithms and prefill-decode disaggregation support

Version 0.1.8 (PyPI)

SGLang Router

SGLang router is a standalone Rust module that enables data parallelism across SGLang instances, providing high-performance request routing and advanced load balancing. The router supports multiple load balancing algorithms including cache-aware, power of two, random, and round robin, and acts as a specialized load balancer for prefill-decode disaggregated serving architectures.

Documentation

  • User Guide: docs.sglang.ai/router/router.html

Quick Start

Prerequisites

Rust and Cargo:

# Install rustup (Rust installer and version manager)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Follow the installation prompts, then reload your shell
source "$HOME/.cargo/env"

# Verify installation
rustc --version
cargo --version

Python with pip installed

Installation

Option A: Build and Install Wheel

# Install build dependencies
pip install setuptools-rust wheel build

# Build the wheel package
python -m build

# Install the generated wheel
pip install dist/*.whl

# One-liner for development (rebuild + install)
python -m build && pip install --force-reinstall dist/*.whl

Option B: Development Mode

pip install -e .

⚠️ Warning: Editable installs may suffer performance degradation. Use wheel builds for performance testing.

Basic Usage

# Build Rust components
cargo build

Launch Router with Worker URLs in regular mode

# Launch router with worker URLs
python -m sglang_router.launch_router \
    --worker-urls http://worker1:8000 http://worker2:8000

Launch Router with Worker URLs in prefill-decode mode

# Prefill and decode URLs must be provided in the following formats:
#   decode nodes:  http://<ip>:<port>
#   prefill nodes: http://<ip>:<port> <bootstrap-port>  (the bootstrap port is optional)
# Launch the router in prefill-decode mode
python -m sglang_router.launch_router \
    --pd-disaggregation \
    --policy cache_aware \
    --prefill http://127.0.0.1:30001 9001 \
    --prefill http://127.0.0.2:30002 9002 \
    --prefill http://127.0.0.3:30003 9003 \
    --prefill http://127.0.0.4:30004 9004 \
    --decode http://127.0.0.5:30005 \
    --decode http://127.0.0.6:30006 \
    --decode http://127.0.0.7:30007 \
    --host 0.0.0.0 \
    --port 8080
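
The argument layout above can also be assembled programmatically. The following is a minimal sketch and not part of sglang-router: the helper name build_pd_args and its input shapes are hypothetical, and only the flags themselves are taken from the command above.

```python
import shlex
from typing import List, Optional, Sequence, Tuple


def build_pd_args(
    prefill: Sequence[Tuple[str, Optional[int]]],
    decode: Sequence[str],
    policy: str = "cache_aware",
    host: str = "0.0.0.0",
    port: int = 8080,
) -> List[str]:
    """Assemble launch_router arguments for prefill-decode mode (hypothetical helper)."""
    args = [
        "python", "-m", "sglang_router.launch_router",
        "--pd-disaggregation",
        "--policy", policy,
    ]
    for url, bootstrap_port in prefill:
        args += ["--prefill", url]
        # The bootstrap port is optional, so it is appended only when given.
        if bootstrap_port is not None:
            args.append(str(bootstrap_port))
    for url in decode:
        args += ["--decode", url]
    args += ["--host", host, "--port", str(port)]
    return args


command = build_pd_args(
    prefill=[("http://127.0.0.1:30001", 9001), ("http://127.0.0.2:30002", None)],
    decode=["http://127.0.0.5:30005"],
)
print(shlex.join(command))
```

Building the list first and joining it with shlex.join keeps each URL and port as a separate argument, which matches how the flags are written in the shell example.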

Configuration

Logging

Enable structured logging with optional file output:

from sglang_router import Router

# Console logging (default)
router = Router(worker_urls=["http://worker1:8000", "http://worker2:8000"])

# File logging enabled
router = Router(
    worker_urls=["http://worker1:8000", "http://worker2:8000"],
    log_dir="./logs"  # Daily log files created here
)

Set the log level with the --log-level flag (see the documentation).

Metrics

The Prometheus metrics endpoint is available at 127.0.0.1:29000 by default.

# Custom metrics configuration
python -m sglang_router.launch_router \
    --worker-urls http://localhost:8080 http://localhost:8081 \
    --prometheus-host 0.0.0.0 \
    --prometheus-port 9000
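
To inspect the endpoint from a script, something like the sketch below may help. It is an illustration only: the /metrics path is the usual Prometheus convention and is assumed here rather than confirmed for this router, the metric names in SAMPLE are made up, and the parser assumes samples carry no trailing timestamp.

```python
import urllib.request
from typing import Dict


def parse_prometheus_text(body: str) -> Dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}."""
    samples: Dict[str, float] = {}
    for raw_line in body.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: no timestamp, so the value follows the last space.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(host: str = "127.0.0.1", port: int = 29000,
                  path: str = "/metrics") -> Dict[str, float]:
    """Fetch and parse metrics; the path is an assumed convention."""
    url = f"http://{host}:{port}{path}"
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_prometheus_text(response.read().decode("utf-8"))


# Made-up sample payload, used so the parser can be tried without a running router.
SAMPLE = """\
# HELP example_requests_total Made-up counter for illustration.
# TYPE example_requests_total counter
example_requests_total{route="/generate"} 42
example_active_workers 2
"""

print(parse_prometheus_text(SAMPLE))
```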

Request ID Tracking

Track requests across distributed systems with configurable headers:

# Use custom request ID headers
python -m sglang_router.launch_router \
    --worker-urls http://localhost:8080 \
    --request-id-headers x-trace-id x-request-id

Default headers: x-request-id, x-correlation-id, x-trace-id, request-id
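
One way to picture header-based request ID tracking is the sketch below. It is not the router's implementation: the function name extract_request_id is hypothetical, and both the first-match ordering and the generated fallback ID are assumptions made for illustration. Only the default header names come from the list above.

```python
import uuid
from typing import Mapping, Sequence

# Default header names as listed in this README.
DEFAULT_REQUEST_ID_HEADERS = (
    "x-request-id",
    "x-correlation-id",
    "x-trace-id",
    "request-id",
)


def extract_request_id(
    headers: Mapping[str, str],
    candidates: Sequence[str] = DEFAULT_REQUEST_ID_HEADERS,
) -> str:
    """Return the first configured request ID header found, else a new ID."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in candidates:
        value = lowered.get(name.lower())
        if value:
            return value
    # Assumption: generate a fresh ID when the client sent none.
    return uuid.uuid4().hex


print(extract_request_id({"X-Trace-Id": "trace-123"}))
```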

Advanced Features

Kubernetes Service Discovery

The router can automatically discover and manage workers in Kubernetes environments.

Basic Service Discovery

python -m sglang_router.launch_router \
    --service-discovery \
    --selector app=sglang-worker role=inference \
    --service-discovery-namespace default
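
The --selector value is a list of key=value pairs. The sketch below shows how such pairs can be parsed and matched against pod labels, assuming the standard Kubernetes equality-based semantics in which every pair must match. The function names are hypothetical, and the router's own matching logic may differ.

```python
from typing import Dict, Iterable, Mapping


def parse_selector(pairs: Iterable[str]) -> Dict[str, str]:
    """Turn ["app=sglang-worker", "role=inference"] into a dict."""
    selector: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        selector[key] = value
    return selector


def matches(selector: Mapping[str, str], labels: Mapping[str, str]) -> bool:
    """A pod matches only if it carries every selector pair (logical AND)."""
    return all(labels.get(key) == value for key, value in selector.items())


selector = parse_selector(["app=sglang-worker", "role=inference"])
pod_labels = {"app": "sglang-worker", "role": "inference", "zone": "a"}
print(matches(selector, pod_labels))
```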

PD (Prefill-Decode) Mode

For disaggregated prefill/decode routing:

python -m sglang_router.launch_router \
    --pd-disaggregation \
    --policy cache_aware \
    --service-discovery \
    --prefill-selector app=sglang component=prefill \
    --decode-selector app=sglang component=decode \
    --service-discovery-namespace sglang-system

# With separate routing policies:
python -m sglang_router.launch_router \
    --pd-disaggregation \
    --prefill-policy cache_aware \
    --decode-policy power_of_two \
    --service-discovery \
    --prefill-selector app=sglang component=prefill \
    --decode-selector app=sglang component=decode \
    --service-discovery-namespace sglang-system

Kubernetes Pod Configuration

Prefill Server Pod:

apiVersion: v1
kind: Pod
metadata:
  name: sglang-prefill-1
  labels:
    app: sglang
    component: prefill
  annotations:
    sglang.ai/bootstrap-port: "9001"  # Optional: Bootstrap port
spec:
  containers:
  - name: sglang
    image: lmsys/sglang:latest
    ports:
    - containerPort: 8000  # Main API port
    - containerPort: 9001  # Optional: Bootstrap port

Decode Server Pod:

apiVersion: v1
kind: Pod
metadata:
  name: sglang-decode-1
  labels:
    app: sglang
    component: decode
spec:
  containers:
  - name: sglang
    image: lmsys/sglang:latest
    ports:
    - containerPort: 8000
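
The pod metadata above carries what a discovery component needs to build a worker entry. The following is a rough sketch of that mapping and not the router's Rust implementation: the function name worker_from_pod and the example pod IPs are hypothetical, while the component label, the sglang.ai/bootstrap-port annotation, and the default port 8000 are taken from this README.

```python
from typing import Mapping, Optional, Tuple

BOOTSTRAP_PORT_ANNOTATION = "sglang.ai/bootstrap-port"


def worker_from_pod(
    pod_ip: str,
    labels: Mapping[str, str],
    annotations: Mapping[str, str],
    port: int = 8000,
) -> Tuple[str, str, Optional[int]]:
    """Map pod metadata to (component, worker URL, bootstrap port)."""
    component = labels.get("component")
    if component not in ("prefill", "decode"):
        raise ValueError(f"unsupported component label: {component!r}")
    url = f"http://{pod_ip}:{port}"
    bootstrap_port: Optional[int] = None
    if component == "prefill":
        # The annotation is optional, so a missing value maps to None.
        raw = annotations.get(BOOTSTRAP_PORT_ANNOTATION)
        bootstrap_port = int(raw) if raw is not None else None
    return component, url, bootstrap_port


prefill = worker_from_pod(
    "10.0.0.11",  # hypothetical pod IP
    {"app": "sglang", "component": "prefill"},
    {"sglang.ai/bootstrap-port": "9001"},
)
decode = worker_from_pod("10.0.0.12", {"app": "sglang", "component": "decode"}, {})
print(prefill, decode)
```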

RBAC Configuration

Namespace-scoped (recommended):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: sglang-router
  namespace: sglang-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: sglang-system
  name: sglang-router
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sglang-router
  namespace: sglang-system
subjects:
- kind: ServiceAccount
  name: sglang-router
  namespace: sglang-system
roleRef:
  kind: Role
  name: sglang-router
  apiGroup: rbac.authorization.k8s.io

Complete PD Example

python -m sglang_router.launch_router \
    --pd-disaggregation \
    --policy cache_aware \
    --service-discovery \
    --prefill-selector app=sglang component=prefill environment=production \
    --decode-selector app=sglang component=decode environment=production \
    --service-discovery-namespace production \
    --host 0.0.0.0 \
    --port 8080 \
    --prometheus-host 0.0.0.0 \
    --prometheus-port 9090

Command Line Arguments Reference

Service Discovery

  • --service-discovery: Enable Kubernetes service discovery
  • --service-discovery-port: Port for worker URLs (default: 8000)
  • --service-discovery-namespace: Kubernetes namespace to watch
  • --selector: Label selectors for regular mode (format: key1=value1 key2=value2)

PD Mode

  • --pd-disaggregation: Enable Prefill-Decode disaggregated mode
  • --prefill: Initial prefill server (format: URL [BOOTSTRAP_PORT]; the bootstrap port is optional)
  • --decode: Initial decode server URL
  • --prefill-selector: Label selector for prefill pods
  • --decode-selector: Label selector for decode pods
  • --policy: Routing policy (cache_aware, random, power_of_two, round_robin)
  • --prefill-policy: Separate routing policy for prefill nodes (optional, overrides --policy for prefill)
  • --decode-policy: Separate routing policy for decode nodes (optional, overrides --policy for decode)
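
The override rule for --prefill-policy and --decode-policy can be sketched as follows. The function name resolve_pd_policies is hypothetical and the validation step is an added assumption. The precedence itself follows the descriptions above.

```python
from typing import Optional, Tuple

# Policy names as listed for --policy.
VALID_POLICIES = ("cache_aware", "random", "power_of_two", "round_robin")


def resolve_pd_policies(
    policy: str,
    prefill_policy: Optional[str] = None,
    decode_policy: Optional[str] = None,
) -> Tuple[str, str]:
    """Return the effective (prefill, decode) routing policies."""
    # A role-specific policy wins; otherwise fall back to --policy.
    effective_prefill = prefill_policy or policy
    effective_decode = decode_policy or policy
    for name in (effective_prefill, effective_decode):
        if name not in VALID_POLICIES:
            raise ValueError(f"unknown policy: {name!r}")
    return effective_prefill, effective_decode


print(resolve_pd_policies("cache_aware", decode_policy="power_of_two"))
```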

Development

Build Process

# Build Rust project
cargo build

# Build Python binding (see Installation section above)

Note: When modifying Rust code, you must rebuild the wheel for changes to take effect.

Troubleshooting

VSCode Rust Analyzer Issues: Set rust-analyzer.linkedProjects to the absolute path of Cargo.toml:

{
  "rust-analyzer.linkedProjects": ["/workspaces/sglang/sgl-router/Cargo.toml"]
}

CI/CD Pipeline

The continuous integration pipeline includes comprehensive testing, benchmarking, and publishing:

Build & Test

  • Build Wheels: Uses cibuildwheel for manylinux x86_64 packages
  • Build Source Distribution: Creates source distribution for pip fallback
  • Rust HTTP Server Benchmarking: Performance testing of router overhead
  • Basic Inference Testing: End-to-end validation through the router
  • PD Disaggregation Testing: Benchmark and sanity checks for prefill-decode load balancing

Publishing

  • PyPI Publishing: Wheels and source distributions are published only when the version changes in pyproject.toml
  • Container Images: Docker images published using /docker/Dockerfile.router

Features

  • High Performance: Rust-based routing with connection pooling and optimized request handling
  • Advanced Load Balancing: Multiple algorithms including:
    • Cache-Aware: Intelligent routing based on cache locality for optimal performance
    • Power of Two: Chooses the less loaded of two randomly selected workers
    • Random: Distributes requests randomly across available workers
    • Round Robin: Sequential distribution across workers in rotation
  • Prefill-Decode Disaggregation: Specialized load balancing for separated prefill and decode servers
  • Service Discovery: Automatic Kubernetes worker discovery and health management
  • Monitoring: Comprehensive Prometheus metrics and structured logging
  • Scalability: Handles thousands of concurrent connections with efficient resource utilization
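
As an illustration of the Power of Two policy described above, the sketch below samples two distinct workers and returns the less loaded one. It is a generic rendering of the technique and not the router's Rust code: the function name is hypothetical, and using an in-flight request count as the load measure is an assumption.

```python
import random
from typing import Mapping, Optional


def pick_power_of_two(
    loads: Mapping[str, int],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the less loaded of two randomly sampled workers."""
    if rng is None:
        rng = random.Random()
    workers = list(loads)
    if not workers:
        raise ValueError("no workers available")
    if len(workers) == 1:
        return workers[0]
    # Sample two distinct workers, then keep the one with the lower load.
    first, second = rng.sample(workers, 2)
    return first if loads[first] <= loads[second] else second


# Hypothetical in-flight request counts per worker.
loads = {"http://worker1:8000": 1, "http://worker2:8000": 2, "http://worker3:8000": 9}
print(pick_power_of_two(loads))
```

Because any sampled pair that includes the busiest worker resolves to the other one, the busiest worker is never selected when loads are distinct, while only two load values are compared per request.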
