LLM Client - Enhanced Functional Programming Architecture
A powerful, functional programming approach to LLM orchestration with tool calling, provider routing, and automatic failover capabilities.
Architecture Overview
The LLM Client follows a functional programming architecture with clear separation of concerns:
```
┌────────────────────────────────────────────────────────────┐
│ LLM Client Architecture                                    │
├────────────────────────────────────────────────────────────┤
│ Core Layer (orchestrator_v2.py)                            │
│ ├── Data Structures (LLMRequest, LLMResponse, etc.)        │
│ ├── Tool Functions (execute_tool_call, get_available_tools)│
│ ├── Router Service (ProviderRouter, RoutingStrategy)       │
│ └── Core Orchestrator Functions (4 main functions)         │
├────────────────────────────────────────────────────────────┤
│ Integration Layer (__init__.py)                            │
│ ├── Low-level Functions (requests.py, serialization.py)    │
│ ├── Tool System (tools/)                                   │
│ └── Public API (6 core functions)                          │
├────────────────────────────────────────────────────────────┤
│ Provider Layer (requests.py, serialization.py)             │
│ ├── HTTP Request Building                                  │
│ ├── Response Parsing (OpenAI, Google formats)              │
│ └── Streaming Support                                      │
└────────────────────────────────────────────────────────────┘
```
Key Design Principles
1. Functional Programming
- Pure Functions: No side effects, predictable behavior
- Composability: Functions can be easily combined and pipelined
- Immutability: Data structures are immutable, no hidden state
- Stateless: No instance variables to manage
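To make the immutability and composition points concrete, here is a small sketch (not library code) of the copy-on-modify style this enables; dataclasses.replace works on any dataclass, including LLMRequest:
```python
from dataclasses import replace
from utils.llm_client import LLMRequest

base = LLMRequest(messages=[{"role": "user", "content": "Hello!"}])

# Copy-on-modify: `replace` returns a new request with the changed field,
# leaving `base` untouched - no hidden state, easy to compose.
low_temp = replace(base, temperature=0.2)
```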
2. Automatic Tool Detection
- Tools are automatically detected from ToolMeta.registry
- No manual tool management required
- Smart inclusion based on the tools_enabled flag
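A minimal sketch of what this detection could look like; ToolMeta.registry and get_available_tools are named in the architecture above, but the import path, registry layout, and schema attribute used here are assumptions:
```python
from utils.llm_client.tools import ToolMeta  # assumed import path

# Hypothetical sketch - assumes ToolMeta.registry maps tool names to tool
# objects exposing a JSON-schema-style `schema` attribute.
def get_available_tools() -> list:
    return [tool.schema for tool in ToolMeta.registry.values()]

def build_payload(request) -> dict:
    payload = {"messages": request.messages}
    if request.tools_enabled:                  # smart inclusion
        payload["tools"] = get_available_tools()
    return payload
```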
3. Multiple Routing Strategies
- PRIORITY: Try providers in priority order (default)
- RANDOM: Start with random provider, then priority
- CYCLE: Cycle through providers continuously
4. Proper Tool Call Parsing
- OpenAI Format: Parses tool_calls with function.name and function.arguments
- Google Format: Parses function_call with name and args
- Automatic Detection: The correct parsing format is chosen based on the provider
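The two payload shapes differ roughly as follows. The field names mirror the formats described above; the normalizer itself is only an illustrative sketch, not the library's parser:
```python
import json

# OpenAI style: tool_calls -> function.name / function.arguments (JSON string)
openai_call = {"tool_calls": [{"function": {
    "name": "calculator",
    "arguments": '{"expression": "5 * 7"}',
}}]}

# Google style: function_call -> name / args (already structured)
google_call = {"function_call": {"name": "calculator",
                                 "args": {"expression": "5 * 7"}}}

def normalize_tool_call(chunk: dict, provider: str) -> tuple:
    """Return (tool_name, arguments) for either provider format."""
    if provider == "openai":
        fn = chunk["tool_calls"][0]["function"]
        return fn["name"], json.loads(fn["arguments"])
    fn = chunk["function_call"]                # google_gemini
    return fn["name"], dict(fn["args"])
```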
Core Components
Data Structures
```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class LLMRequest:
    messages: List[Dict[str, str]]
    model: Optional[str] = None
    temperature: float = 0.7
    max_tokens: Optional[int] = None
    tools_enabled: bool = True
    metadata: Dict[str, Any] = None

@dataclass
class ProviderConfig:
    name: str
    priority: int
    status: ProviderStatus = ProviderStatus.AVAILABLE
    retry_count: int = 0
    max_retries: int = 3
    backoff_seconds: int = 60
```
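ProviderStatus is referenced above but not shown in this excerpt. Only AVAILABLE appears in the examples; a plausible (assumed) shape would be an enum along these lines:
```python
from enum import Enum

class ProviderStatus(Enum):
    AVAILABLE = "available"      # appears in the examples above
    UNAVAILABLE = "unavailable"  # assumed: set after repeated failures / during backoff
```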
Core Functions (Reduced Set)
1. stream_llm_response()
Stream from a single provider with proper tool call parsing.
```python
async for chunk_type, content in stream_llm_response(request, "openai"):
    if chunk_type == "t":        # "t" = text chunk
        print(content)
    elif chunk_type == "f":      # "f" = function/tool call chunk
        print(f"Function: {content}")
```
2. stream_with_router()
Stream with automatic failover and routing strategies.
```python
router = create_router(["openai", "google_gemini"], strategy=RoutingStrategy.PRIORITY)

async for chunk_type, content, provider in stream_with_router(request, router):
    print(f"[{provider}]: {content}")
```
3. chat_with_tools()
Single provider chat with automatic tool execution.
```python
async for chunk_type, content in chat_with_tools(request, "openai", max_iterations=5):
    print(content)
```
4. chat_with_tools_and_router()
Multi-provider chat with tools and failover.
```python
async for chunk_type, content, provider in chat_with_tools_and_router(request, router):
    print(f"[{provider}]: {content}")
```
Convenience Functions
1. quick_chat()
Simple single-provider chat.
```python
response = await quick_chat("Hello!", "openai", tools_enabled=True)
```
2. quick_chat_with_router()
Simple chat with routing strategies.
```python
response, provider = await quick_chat_with_router(
    "Hello!",
    ["openai", "google_gemini"],
    strategy=RoutingStrategy.RANDOM
)
```
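These snippets use await directly, so they assume an existing event loop (for example a notebook or an async web handler). In a standalone script, wrap them with asyncio.run:
```python
import asyncio
from utils.llm_client import quick_chat

async def main():
    response = await quick_chat("Hello!", "openai", tools_enabled=True)
    print(response)

asyncio.run(main())
```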
Routing Strategies
PRIORITY Strategy (Default)
```python
router = create_router(
    ["openai", "google_gemini", "anthropic"],
    strategy=RoutingStrategy.PRIORITY
)
```
RANDOM Strategy
```python
router = create_router(
    ["openai", "google_gemini", "anthropic"],
    strategy=RoutingStrategy.RANDOM
)
```
CYCLE Strategy
```python
router = create_router(
    ["openai", "google_gemini", "anthropic"],
    strategy=RoutingStrategy.CYCLE
)

async for chunk_type, content, provider in stream_with_router(request, router, max_cycles=3):
    print(f"[{provider}]: {content}")
```
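As a rough mental model (an illustrative sketch, not the router's actual implementation), the first pass through the providers looks like this under each strategy:
```python
import random

def first_pass_order(names: list, strategy: str) -> list:
    """Illustration only: the order providers are attempted on the first pass."""
    if strategy == "priority":
        return list(names)                                  # strict priority order
    if strategy == "random":
        first = random.choice(names)                        # random entry point...
        return [first] + [n for n in names if n != first]   # ...then priority order
    if strategy == "cycle":
        return list(names)                                  # repeated up to max_cycles times
    raise ValueError(f"unknown strategy: {strategy}")
```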
Tool System Integration
Tool Registration
```python
from utils.llm_client import Tool

@Tool
def calculator(expression: str) -> str:
    """Calculate a mathematical expression safely."""
    try:
        # NOTE: eval is used here only for illustration; sandbox or whitelist in real use.
        result = eval(expression)
        return f"Result: {result}"
    except Exception as e:
        return f"Error: {str(e)}"
```
Automatic Tool Detection
```python
request = LLMRequest(
    messages=[{"role": "user", "content": "What's 5 * 7?"}],
    tools_enabled=True
)
```
Tool Call Execution
```python
async for chunk_type, content in chat_with_tools(request, "openai"):
    print(content)
```
Provider Support
Supported Providers
- OpenAI (GPT models)
- Google Gemini (Gemini models)
- Anthropic (Claude models)
- OpenRouter (Multiple models)
Provider Configuration
```python
providers = [
    ProviderConfig(
        name="openai",
        priority=0,
        status=ProviderStatus.AVAILABLE,
        max_retries=2,
        backoff_seconds=30
    ),
    ProviderConfig(
        name="google_gemini",
        priority=1,
        status=ProviderStatus.AVAILABLE,
        max_retries=3,
        backoff_seconds=60
    )
]

router = ProviderRouter(providers, RoutingStrategy.PRIORITY)
```
Error Handling & Resilience
Automatic Failover
- If a provider fails, automatically try the next available provider
- Configurable retry limits and backoff periods
- Status tracking for each provider
Tool Call Error Handling
- Graceful handling of malformed tool calls
- Error messages returned to LLM for context
- Exception handling for tool execution failures
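A hedged sketch of the shape this takes around execute_tool_call (the function is named in the architecture diagram; the body and signature here are assumptions):
```python
# Hypothetical sketch - assumes ToolMeta.registry maps tool names to callables.
async def execute_tool_call(name: str, arguments: dict) -> str:
    tool = ToolMeta.registry.get(name)
    if tool is None:
        return f"Error: unknown tool '{name}'"   # malformed / unknown tool call
    try:
        return str(tool(**arguments))
    except Exception as e:
        return f"Error: {e}"                     # returned to the LLM as context
```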
Retry Logic
```python
provider = ProviderConfig(
    name="openai",
    max_retries=3,       # retry limit for this provider
    backoff_seconds=60   # cool-down period after a failure
)
```
Usage Examples
Basic Usage
```python
from utils.llm_client import quick_chat, quick_chat_with_router, RoutingStrategy

# Single-provider chat
response = await quick_chat("Hello!", "openai")

# Multi-provider chat with automatic failover
response, provider = await quick_chat_with_router(
    "Hello!",
    ["openai", "google_gemini"]
)
```
Advanced Usage
```python
from utils.llm_client import (
    LLMRequest, create_router, stream_with_router,
    RoutingStrategy, Tool
)

@Tool
def get_weather(city: str) -> str:
    """Get weather for a city."""
    return f"Weather in {city}: Sunny, 72°F"

request = LLMRequest(
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools_enabled=True
)

router = create_router(
    ["openai", "google_gemini"],
    strategy=RoutingStrategy.RANDOM
)

async for chunk_type, content, provider in stream_with_router(request, router):
    if chunk_type == "t":
        print(f"[{provider}]: {content}")
```
Custom Pipeline
```python
async def my_pipeline(request):
    router = create_router(["openai", "google_gemini"])
    async for chunk_type, content, provider in stream_with_router(request, router):
        processed_content = process_chunk(content)  # process_chunk: your own transformation
        yield chunk_type, processed_content, provider

async for chunk_type, content, provider in my_pipeline(request):
    print(f"[{provider}]: {content}")
```
Benefits
- Simplified API: Just 6 core functions (4 orchestration functions plus 2 convenience helpers)
- Automatic Detection: Tools are automatically detected and included
- Flexible Routing: Multiple routing strategies for different use cases
- Proper Parsing: Correct tool call parsing for different providers
- Cycle Support: Can cycle through providers for load balancing
- Error Resilience: Automatic failover and retry logic
- Functional Composition: Easy to compose and extend
- Provider Agnostic: Works with multiple LLM providers
- Tool Integration: Seamless tool calling with automatic execution
- Performance: Efficient streaming and concurrent tool execution
Migration from OOP Approach
Before (OOP)
```python
client = LLMClient("openai")
client.enable_tools()

async for chunk in client.chat_with_tools(messages):
    print(chunk)
```
After (Functional)
```python
request = LLMRequest(messages=messages, tools_enabled=True)

async for chunk_type, content in chat_with_tools(request, "openai"):
    if chunk_type == "t":
        print(content)
```
The functional approach is more flexible, composable, and provides better separation of concerns while maintaining simplicity.