LLM Client - Enhanced Functional Programming Architecture
A powerful, functional programming approach to LLM orchestration with tool calling, provider routing, and automatic failover capabilities.
Architecture Overview
The LLM Client follows a functional programming architecture with clear separation of concerns:
```
┌────────────────────────────────────────────────────────────┐
│ LLM Client Architecture                                    │
├────────────────────────────────────────────────────────────┤
│ Core Layer (orchestrator_v2.py)                            │
│ ├── Data Structures (LLMRequest, LLMResponse, etc.)        │
│ ├── Tool Functions (execute_tool_call, get_available_tools)│
│ ├── Router Service (ProviderRouter, RoutingStrategy)       │
│ └── Core Orchestrator Functions (4 main functions)         │
├────────────────────────────────────────────────────────────┤
│ Integration Layer (__init__.py)                            │
│ ├── Low-level Functions (requests.py, serialization.py)    │
│ ├── Tool System (tools/)                                   │
│ └── Public API (6 core functions)                          │
├────────────────────────────────────────────────────────────┤
│ Provider Layer (requests.py, serialization.py)             │
│ ├── HTTP Request Building                                  │
│ ├── Response Parsing (OpenAI, Google formats)              │
│ └── Streaming Support                                      │
└────────────────────────────────────────────────────────────┘
```
Key Design Principles
1. Functional Programming
- Pure Functions: No side effects, predictable behavior
- Composability: Functions can be easily combined and pipelined
- Immutability: Data structures are immutable, no hidden state
- Stateless: No instance variables to manage
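To make the immutability and composition points concrete, here is a small sketch (not library code) of the copy-on-modify style this enables; dataclasses.replace works on any dataclass, including LLMRequest:
```python
from dataclasses import replace
from utils.llm_client import LLMRequest

base = LLMRequest(messages=[{"role": "user", "content": "Hello!"}])

# Copy-on-modify: `replace` returns a new request with the changed field,
# leaving `base` untouched - no hidden state, easy to compose.
low_temp = replace(base, temperature=0.2)
```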
2. Automatic Tool Detection
- Tools are automatically detected from ToolMeta.registry
- No manual tool management required
- Smart inclusion based on the tools_enabled flag
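A minimal sketch of what this detection could look like; ToolMeta.registry and get_available_tools are named in the architecture above, but the import path, registry layout, and schema attribute used here are assumptions:
```python
from utils.llm_client.tools import ToolMeta  # assumed import path

# Hypothetical sketch - assumes ToolMeta.registry maps tool names to tool
# objects exposing a JSON-schema-style `schema` attribute.
def get_available_tools() -> list:
    return [tool.schema for tool in ToolMeta.registry.values()]

def build_payload(request) -> dict:
    payload = {"messages": request.messages}
    if request.tools_enabled:                  # smart inclusion
        payload["tools"] = get_available_tools()
    return payload
```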
3. Multiple Routing Strategies
- PRIORITY: Try providers in priority order (default)
- RANDOM: Start with random provider, then priority
- CYCLE: Cycle through providers continuously
4. Proper Tool Call Parsing
- OpenAI Format: Parses tool_calls with function.name and function.arguments
- Google Format: Parses function_call with name and args
- Automatic Detection: The correct parsing format is chosen based on the provider
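The two payload shapes differ roughly as follows. The field names mirror the formats described above; the normalizer itself is only an illustrative sketch, not the library's parser:
```python
import json

# OpenAI style: tool_calls -> function.name / function.arguments (JSON string)
openai_call = {"tool_calls": [{"function": {
    "name": "calculator",
    "arguments": '{"expression": "5 * 7"}',
}}]}

# Google style: function_call -> name / args (already structured)
google_call = {"function_call": {"name": "calculator",
                                 "args": {"expression": "5 * 7"}}}

def normalize_tool_call(chunk: dict, provider: str) -> tuple:
    """Return (tool_name, arguments) for either provider format."""
    if provider == "openai":
        fn = chunk["tool_calls"][0]["function"]
        return fn["name"], json.loads(fn["arguments"])
    fn = chunk["function_call"]                # google_gemini
    return fn["name"], dict(fn["args"])
```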
Core Components
Data Structures
```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class LLMRequest:
    messages: List[Dict[str, str]]
    model: Optional[str] = None
    temperature: float = 0.7
    max_tokens: Optional[int] = None
    tools_enabled: bool = True
    metadata: Dict[str, Any] = None

@dataclass
class ProviderConfig:
    name: str
    priority: int
    status: ProviderStatus = ProviderStatus.AVAILABLE
    retry_count: int = 0
    max_retries: int = 3
    backoff_seconds: int = 60
```
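ProviderStatus is referenced above but not shown in this excerpt. Only AVAILABLE appears in the examples; a plausible (assumed) shape would be an enum along these lines:
```python
from enum import Enum

class ProviderStatus(Enum):
    AVAILABLE = "available"      # appears in the examples above
    UNAVAILABLE = "unavailable"  # assumed: set after repeated failures / during backoff
```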
Core Functions (Reduced Set)
1. stream_llm_response()
Stream from a single provider with proper tool call parsing.
```python
async for chunk_type, content in stream_llm_response(request, "openai"):
    if chunk_type == "t":        # "t" = text chunk
        print(content)
    elif chunk_type == "f":      # "f" = function/tool call chunk
        print(f"Function: {content}")
```
2. stream_with_router()
Stream with automatic failover and routing strategies.
```python
router = create_router(["openai", "google_gemini"], strategy=RoutingStrategy.PRIORITY)

async for chunk_type, content, provider in stream_with_router(request, router):
    print(f"[{provider}]: {content}")
```
3. chat_with_tools()
Single provider chat with automatic tool execution.
```python
async for chunk_type, content in chat_with_tools(request, "openai", max_iterations=5):
    print(content)
```
4. chat_with_tools_and_router()
Multi-provider chat with tools and failover.
```python
async for chunk_type, content, provider in chat_with_tools_and_router(request, router):
    print(f"[{provider}]: {content}")
```
Convenience Functions
1. quick_chat()
Simple single-provider chat.
```python
response = await quick_chat("Hello!", "openai", tools_enabled=True)
```
2. quick_chat_with_router()
Simple chat with routing strategies.
```python
response, provider = await quick_chat_with_router(
    "Hello!",
    ["openai", "google_gemini"],
    strategy=RoutingStrategy.RANDOM
)
```
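These snippets use await directly, so they assume an existing event loop (for example a notebook or an async web handler). In a standalone script, wrap them with asyncio.run:
```python
import asyncio
from utils.llm_client import quick_chat

async def main():
    response = await quick_chat("Hello!", "openai", tools_enabled=True)
    print(response)

asyncio.run(main())
```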
Routing Strategies
PRIORITY Strategy (Default)
```python
router = create_router(
    ["openai", "google_gemini", "anthropic"],
    strategy=RoutingStrategy.PRIORITY
)
```
RANDOM Strategy
```python
router = create_router(
    ["openai", "google_gemini", "anthropic"],
    strategy=RoutingStrategy.RANDOM
)
```
CYCLE Strategy
```python
router = create_router(
    ["openai", "google_gemini", "anthropic"],
    strategy=RoutingStrategy.CYCLE
)

async for chunk_type, content, provider in stream_with_router(request, router, max_cycles=3):
    print(f"[{provider}]: {content}")
```
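As a rough mental model (an illustrative sketch, not the router's actual implementation), the first pass through the providers looks like this under each strategy:
```python
import random

def first_pass_order(names: list, strategy: str) -> list:
    """Illustration only: the order providers are attempted on the first pass."""
    if strategy == "priority":
        return list(names)                                  # strict priority order
    if strategy == "random":
        first = random.choice(names)                        # random entry point...
        return [first] + [n for n in names if n != first]   # ...then priority order
    if strategy == "cycle":
        return list(names)                                  # repeated up to max_cycles times
    raise ValueError(f"unknown strategy: {strategy}")
```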
Tool System Integration
Tool Registration
```python
from utils.llm_client import Tool

@Tool
def calculator(expression: str) -> str:
    """Calculate a mathematical expression safely."""
    try:
        # NOTE: eval is used here only for illustration; sandbox or whitelist in real use.
        result = eval(expression)
        return f"Result: {result}"
    except Exception as e:
        return f"Error: {str(e)}"
```
Automatic Tool Detection
```python
request = LLMRequest(
    messages=[{"role": "user", "content": "What's 5 * 7?"}],
    tools_enabled=True
)
```
Tool Call Execution
```python
async for chunk_type, content in chat_with_tools(request, "openai"):
    print(content)
```
Provider Support
Supported Providers
- OpenAI (GPT models)
- Google Gemini (Gemini models)
- Anthropic (Claude models)
- OpenRouter (Multiple models)
Provider Configuration
```python
providers = [
    ProviderConfig(
        name="openai",
        priority=0,
        status=ProviderStatus.AVAILABLE,
        max_retries=2,
        backoff_seconds=30
    ),
    ProviderConfig(
        name="google_gemini",
        priority=1,
        status=ProviderStatus.AVAILABLE,
        max_retries=3,
        backoff_seconds=60
    )
]

router = ProviderRouter(providers, RoutingStrategy.PRIORITY)
```
Error Handling & Resilience
Automatic Failover
- If a provider fails, automatically try the next available provider
- Configurable retry limits and backoff periods
- Status tracking for each provider
Tool Call Error Handling
- Graceful handling of malformed tool calls
- Error messages returned to LLM for context
- Exception handling for tool execution failures
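A hedged sketch of the shape this takes around execute_tool_call (the function is named in the architecture diagram; the body and signature here are assumptions):
```python
# Hypothetical sketch - assumes ToolMeta.registry maps tool names to callables.
async def execute_tool_call(name: str, arguments: dict) -> str:
    tool = ToolMeta.registry.get(name)
    if tool is None:
        return f"Error: unknown tool '{name}'"   # malformed / unknown tool call
    try:
        return str(tool(**arguments))
    except Exception as e:
        return f"Error: {e}"                     # returned to the LLM as context
```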
Retry Logic
```python
provider = ProviderConfig(
    name="openai",
    max_retries=3,       # retry limit for this provider
    backoff_seconds=60   # cool-down period after a failure
)
```
Usage Examples
Basic Usage
```python
from utils.llm_client import quick_chat, quick_chat_with_router, RoutingStrategy

# Single-provider chat
response = await quick_chat("Hello!", "openai")

# Multi-provider chat with automatic failover
response, provider = await quick_chat_with_router(
    "Hello!",
    ["openai", "google_gemini"]
)
```
Advanced Usage
```python
from utils.llm_client import (
    LLMRequest, create_router, stream_with_router,
    RoutingStrategy, Tool
)

@Tool
def get_weather(city: str) -> str:
    """Get weather for a city."""
    return f"Weather in {city}: Sunny, 72°F"

request = LLMRequest(
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools_enabled=True
)

router = create_router(
    ["openai", "google_gemini"],
    strategy=RoutingStrategy.RANDOM
)

async for chunk_type, content, provider in stream_with_router(request, router):
    if chunk_type == "t":
        print(f"[{provider}]: {content}")
```
Custom Pipeline
```python
async def my_pipeline(request):
    router = create_router(["openai", "google_gemini"])
    async for chunk_type, content, provider in stream_with_router(request, router):
        processed_content = process_chunk(content)  # process_chunk: your own transformation
        yield chunk_type, processed_content, provider

async for chunk_type, content, provider in my_pipeline(request):
    print(f"[{provider}]: {content}")
```
Benefits
- Simplified API: Just 6 core functions (4 orchestration functions plus 2 convenience helpers)
- Automatic Detection: Tools are automatically detected and included
- Flexible Routing: Multiple routing strategies for different use cases
- Proper Parsing: Correct tool call parsing for different providers
- Cycle Support: Can cycle through providers for load balancing
- Error Resilience: Automatic failover and retry logic
- Functional Composition: Easy to compose and extend
- Provider Agnostic: Works with multiple LLM providers
- Tool Integration: Seamless tool calling with automatic execution
- Performance: Efficient streaming and concurrent tool execution
Migration from OOP Approach
Before (OOP)
```python
client = LLMClient("openai")
client.enable_tools()

async for chunk in client.chat_with_tools(messages):
    print(chunk)
```
After (Functional)
```python
request = LLMRequest(messages=messages, tools_enabled=True)

async for chunk_type, content in chat_with_tools(request, "openai"):
    if chunk_type == "t":
        print(content)
```
The functional approach is more flexible, composable, and provides better separation of concerns while maintaining simplicity.