# glim-llm

Lightweight multi-provider LLM wrapper for Node/Edge (ESM + TypeScript).
## Features
- OpenAI, Groq (OpenAI-compatible), Gemini (Google) initial support
- Unified `generate` API
- Optional caching (LRU in-memory)
- Rate limiting (concurrency + window request limiting)
- Automatic retries with exponential backoff (configurable / disable)
- Prompt sanitization helper (pluggable)
- Simple streaming interface placeholder (upgrade later to real streaming)
- ESM + CommonJS builds, type declarations
- Zero heavy dependencies; only focused utilities
## Install

```sh
npm install glim-llm
```

Install the SDKs for the providers you plan to use:

```sh
npm install openai @google/generative-ai
```
## Quick Start

```ts
import { createLLMClient, SUPPORTED_PROVIDERS } from 'glim-llm';

const openaiClient = createLLMClient({
  provider: 'openai',
  config: { apiKey: process.env.OPENAI_KEY!, model: 'gpt-4o-mini' },
  rateLimit: { concurrency: 2, requestsPerInterval: 60, intervalMs: 60_000 },
  cache: { ttlMs: 300_000, max: 1000 },
  retry: { retries: 3 },
  sanitize: true,
});

const result = await openaiClient.generate({ prompt: 'Explain edge computing in 2 sentences.' });
console.log(result.text);

console.log('Providers available:', SUPPORTED_PROVIDERS);
```
## API

### createLLMClient(options)

Returns an object with:

- `name`: the provider name
- `generate(params)`: returns a `Promise`
- `stream(params)`: returns an `AsyncGenerator` (currently one-shot; future: true streaming)
### Options

- `provider`: `'openai' | 'groq' | 'gemini'`
- `config`: `{ apiKey, model, maxOutputTokens?, temperature?, extra? }`
- `rateLimit` (optional): `{ concurrency?, requestsPerInterval?, intervalMs?, throwOnLimit? }`
- `cache` (optional | `false`): `{ ttlMs?, max?, namespace? }`
- `retry` (optional | `false`): `{ retries?, factor?, minTimeoutMs?, maxTimeoutMs? }`
- `sanitize` (`boolean` | function): enables basic prompt cleaning.
### Generate Params

- `prompt` (string)
- `systemPrompt?`
- `model?`: per-call override
- `temperature?`
- `maxOutputTokens?`
- `streaming?` (future use)
- `signal?`: `AbortSignal` (future wiring)
- `cacheKey?`: custom key, or `false` to bypass the cache
## Caching

LRU in-memory; suited to a single runtime instance. An external cache (Redis, KV) can be plugged in by replacing the `ResponseCache` logic (PRs welcome).
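The package's `ResponseCache` internals are not shown here, but the TTL + max-size LRU idea can be sketched in a few lines (illustrative only; names like `LruTtlCache` are hypothetical):

```ts
type Entry<V> = { value: V; expiresAt: number };

// Minimal TTL + LRU cache sketch. A JavaScript Map preserves insertion
// order, so re-inserting on read keeps the most recently used entry last.
class LruTtlCache<V> {
  private map = new Map<string, Entry<V>>();
  constructor(private ttlMs: number, private max: number) {}

  get(key: string, now = Date.now()): V | undefined {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.map.delete(key); // expired
      return undefined;
    }
    // Refresh recency by re-inserting.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V, now = Date.now()): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, { value, expiresAt: now + this.ttlMs });
    if (this.map.size > this.max) {
      // Evict the least recently used entry (first key in insertion order).
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }
}
```

Passing `now` explicitly keeps the logic deterministic and testable without clock mocking.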
## Rate Limiting

Two layers: concurrency (maximum parallel tasks) and `requestsPerInterval` within a sliding window of `intervalMs`. By default, excess requests wait; set `throwOnLimit: true` to throw an error instead.
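The sliding-window layer can be sketched as follows (illustrative, not the package's actual implementation; timestamps are passed in explicitly so the logic is deterministic):

```ts
// Sliding-window request limiter: allow at most `requestsPerInterval`
// requests within any trailing window of `intervalMs`.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  constructor(
    private requestsPerInterval: number,
    private intervalMs: number,
  ) {}

  // Returns true if a request at time `now` is allowed, and records it.
  tryAcquire(now: number): boolean {
    // Drop timestamps that have fallen out of the trailing window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.intervalMs);
    if (this.timestamps.length >= this.requestsPerInterval) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

A real limiter would `await` a timer instead of returning `false` when the default wait-on-limit behavior is configured.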
## Retries

Uses exponential backoff. Disable with `retry: false`.
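The delay schedule implied by the retry options (`{ retries, factor, minTimeoutMs, maxTimeoutMs }`) can be sketched like this; the defaults shown are assumptions, and the package's exact schedule (e.g. jitter) may differ:

```ts
// Exponential backoff: delay grows by `factor` per attempt, starting at
// `minTimeoutMs` and capped at `maxTimeoutMs`. Defaults are illustrative.
function backoffDelay(
  attempt: number, // 0-based retry attempt
  { factor = 2, minTimeoutMs = 500, maxTimeoutMs = 10_000 } = {},
): number {
  return Math.min(maxTimeoutMs, minTimeoutMs * factor ** attempt);
}
```

With these assumed defaults, attempts 0, 1, 2, ... wait 500 ms, 1000 ms, 2000 ms, and so on, until the cap is reached.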
## Sanitization

Naive removal of control characters and common prompt-injection phrases; override with a custom function.
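A sanitizer in this spirit might look like the sketch below. The specific injection patterns here are hypothetical examples, not the package's built-in list:

```ts
// Hypothetical injection-phrase patterns for illustration only.
const INJECTION_PATTERNS = [
  /ignore (all )?previous instructions/gi,
  /disregard (the )?system prompt/gi,
];

// Strip ASCII control characters (keeping \n and \t) and redact
// matched injection phrases.
function sanitizePrompt(prompt: string): string {
  let out = prompt.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, '');
  for (const pattern of INJECTION_PATTERNS) {
    out = out.replace(pattern, '[removed]');
  }
  return out.trim();
}
```

Passing a function as `sanitize` replaces this behavior entirely, so stricter policies (allowlists, length caps) can be swapped in.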
## Edge / Serverless

All dependencies are ESM-friendly. Avoid Node-specific APIs if targeting strict edge runtimes (replace the `crypto` hash with Web Crypto; TODO: auto-detect later).
## Roadmap
- True streaming per provider
- Tool / function calling abstraction
- Token usage normalization
- Pluggable logging hooks
- Web Crypto fallback
- Middleware pipeline
## License

MIT