@mostafa.hanafy/prompt-cache

Reliable caching layer for LLM API calls to reduce cost, avoid duplicate requests, and improve latency.
Why
LLM applications often repeat the same requests:
- retries from clients
- repeated user actions
- identical prompts within short time windows
- shared context across workflows
Without caching, you pay for the same request multiple times.
@mostafa.hanafy/prompt-cache helps you:
- avoid duplicate API calls
- reduce token usage cost
- improve response latency
- keep caching logic simple and deterministic
Install
npm install @mostafa.hanafy/prompt-cache
pnpm add @mostafa.hanafy/prompt-cache
bun add @mostafa.hanafy/prompt-cache
Key capabilities
- deterministic cache keys (key / keyParts)
- TTL-based expiration
- in-memory cache adapter
- in-flight request deduplication (prevents duplicate concurrent calls)
- pluggable cache adapters
- lifecycle hooks
Quick start
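import OpenAI from "openai";
import { withPromptCache } from "@mostafa.hanafy/prompt-cache";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// `prompt` and `context` are your own request inputs, defined elsewhere.
const response = await withPromptCache({
  keyParts: ["openai", "gpt-4o-mini", prompt, context],
  ttlSeconds: 60,
  call: () =>
    openai.responses.create({
      model: "gpt-4o-mini",
      input: prompt,
    }),
});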
Example use case
Avoid duplicate AI calls in high-traffic APIs:
await withPromptCache({
  keyParts: ["chat", userId, prompt],
  ttlSeconds: 30,
  call: () => generateResponse(prompt),
});
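Because userId is part of the key, deduplication applies per user; drop it from keyParts only when a response is safe to share across users.
If the same user triggers the same request 10 times at once (for example, through client retries or double submits):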
- without cache → 10 API calls ❌
- with @mostafa.hanafy/prompt-cache → 1 API call ✅
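The drop from 10 calls to 1 comes from in-flight deduplication. Below is a minimal sketch of that idea, assuming a plain Map of pending promises keyed by cache key. The names `dedupe`, `inFlight`, and `demo` are illustrative and are not exports of the package, whose internals may differ.

```typescript
// Illustrative sketch only: `dedupe` and `inFlight` are hypothetical names,
// not exports of @mostafa.hanafy/prompt-cache.
const inFlight = new Map<string, Promise<unknown>>();

function dedupe<T>(key: string, call: () => Promise<T>): Promise<T> {
  const pending = inFlight.get(key);
  if (pending) {
    // A call for this key is already running: share its promise.
    return pending as Promise<T>;
  }
  const started = call().finally(() => {
    // Clear the entry once settled so later requests trigger a fresh call.
    inFlight.delete(key);
  });
  inFlight.set(key, started);
  return started;
}

// Ten concurrent requests for the same key run the underlying call once.
async function demo(): Promise<{ calls: number; results: string[] }> {
  let calls = 0;
  const slowCall = async (): Promise<string> => {
    calls += 1;
    await new Promise((resolve) => setTimeout(resolve, 10));
    return "answer";
  };
  const results = await Promise.all(
    Array.from({ length: 10 }, () => dedupe("chat:hello", slowCall)),
  );
  return { calls, results };
}
```

Every caller receives the same settled value, and a rejected call rejects for all waiting callers. This sketch only covers concurrent requests; serving later requests from stored results is the job of the TTL cache.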
How it works
request
↓
generate key
↓
cache lookup
↓
hit → return cached result
miss → execute call
↓
store result
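The flow above can be sketched as a single function. This is a simplified illustration, assuming an in-memory Map with expiry timestamps. The names `cachedCall`, `CacheEntry`, and `store` are hypothetical, and the real implementation also covers deduplication and hooks.

```typescript
// Hypothetical sketch of the lookup flow; not the package's actual source.
type CacheEntry = { value: unknown; expiresAt: number };
const store = new Map<string, CacheEntry>();

async function cachedCall<T>(
  key: string,
  ttlSeconds: number,
  call: () => Promise<T>,
  now: () => number = Date.now, // injectable clock, handy for tests
): Promise<T> {
  const entry = store.get(key);
  if (entry && entry.expiresAt > now()) {
    return entry.value as T; // hit: return cached result
  }
  const value = await call(); // miss: execute call
  store.set(key, { value, expiresAt: now() + ttlSeconds * 1000 }); // store result
  return value;
}
```

An expired entry is treated as a miss and overwritten by the next successful call, which is one simple way to get TTL-based expiration without a background sweeper.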
API
withPromptCache(options)
Wraps an async LLM call with cache lookup, in-flight deduplication, and optional hooks.
await withPromptCache({
  key?: string,
  keyParts?: unknown[],
  ttlSeconds?: number,
  cache?: CacheAdapter,
  shouldCache?: (value) => boolean,
  onHit?: (meta) => void,
  onMiss?: (meta) => void,
  onSet?: (meta) => void,
  onError?: (error, meta) => void,
  call: () => Promise<T>,
});
Options:
- key: explicit cache key (takes precedence over keyParts)
- keyParts: parts that are deterministically hashed into a key
- ttlSeconds: cache TTL in seconds (default 60)
- cache: custom adapter (memoryCache is the default)
- shouldCache: return false to skip writing a result
- onHit / onMiss / onSet / onError: lifecycle hooks
- call: async operation to execute on a cache miss
createCacheKey(parts)
Creates a deterministic key by stable-stringifying input and hashing it.
Use this when you want to precompute or inspect keys directly.
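To make "stable-stringify, then hash" concrete, here is a rough sketch of the technique. It assumes object keys are sorted recursively and the result is hashed with SHA-256. The helper names `stableStringify` and `sketchCacheKey` are hypothetical, and the package's actual serialization and hash algorithm may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: the package's real serialization and hash may differ.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    // Arrays keep their order: position is significant.
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    // Sort keys so that property order never changes the output.
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  // Primitives; values without a JSON form (such as undefined) become null.
  const json = JSON.stringify(value);
  return json === undefined ? "null" : json;
}

function sketchCacheKey(parts: unknown[]): string {
  return createHash("sha256").update(stableStringify(parts)).digest("hex");
}

// Same inputs with different property order yield the same key.
const a = sketchCacheKey(["gpt-4o-mini", { temperature: 0, top_p: 1 }]);
const b = sketchCacheKey(["gpt-4o-mini", { top_p: 1, temperature: 0 }]);
console.log(a === b); // true
```

Sorting object keys is what makes the key deterministic: two logically identical option objects built in different orders would otherwise hash differently and miss the cache.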
createMemoryCache() and memoryCache
- createMemoryCache(): creates a new, isolated in-memory cache adapter
- memoryCache: shared default singleton adapter used by withPromptCache
Lifecycle hooks
await withPromptCache({
  keyParts: [prompt],
  onHit: (meta) => console.log("cache hit", meta),
  onMiss: (meta) => console.log("cache miss", meta),
  onSet: (meta) => console.log("stored in cache", meta),
  onError: (error, meta) => console.error("cache error", error, meta),
  call: async () => aiCall(),
});
Key and TTL guidance
- Include all request inputs in keyParts (model, prompt, context, options).
- In-memory cache is suitable for single-instance apps.
- Use a custom adapter (for example Redis) in distributed deployments.
- Set ttlSeconds to match freshness requirements.
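The first guideline matters because an incomplete key lets different requests share one cache entry. The sketch below illustrates this, with JSON.stringify standing in for the package's key hashing. The names `keyOf`, `LlmRequest`, `looseKey`, and `fullKey` are invented for this example.

```typescript
// Illustration only: JSON.stringify stands in for the package's key hashing.
const keyOf = (parts: unknown[]): string => JSON.stringify(parts);

type LlmRequest = { model: string; text: string; temperature: number };

// Incomplete key: ignores model and temperature.
const looseKey = (r: LlmRequest): string => keyOf([r.text]);
// Complete key: covers every input that changes the response.
const fullKey = (r: LlmRequest): string => keyOf([r.model, r.text, r.temperature]);

const small: LlmRequest = { model: "gpt-4o-mini", text: "Summarize this ticket", temperature: 0 };
const large: LlmRequest = { model: "gpt-4o", text: "Summarize this ticket", temperature: 0 };

console.log(looseKey(small) === looseKey(large)); // true: both requests would share one entry
console.log(fullKey(small) === fullKey(large)); // false: each request gets its own entry
```

With the loose key, whichever model answers first would have its response served for both requests until the TTL expires.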
Testing and verification
bun test
bun run typecheck
bun run lint
bun run build
Performance verification (PRD §10)
Run the benchmark harness with either Bun or tsx:
bun bench/overhead.ts
tsx bench/overhead.ts
It reports avg/p50/p95/max latency for:
- cache-hit path
- miss path
- concurrent dedup path
Pass criterion: cache-hit path p95 < 1ms.
The benchmark exits with a non-zero code if this criterion fails.
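For readers unfamiliar with these statistics, here is a sketch of how avg/p50/p95/max could be derived from raw latency samples and checked against the pass criterion. It assumes the nearest-rank percentile method. The names `summarize`, `passes`, and `LatencySummary` are hypothetical, and bench/overhead.ts may compute its numbers differently.

```typescript
// Hypothetical sketch of summarizing latency samples; bench/overhead.ts may differ.
type LatencySummary = { avg: number; p50: number; p95: number; max: number };

function summarize(samplesMs: number[]): LatencySummary {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((x, y) => x - y);
  // Nearest-rank percentile: the smallest sample with at least p% of
  // all samples at or below it.
  const pct = (p: number): number =>
    sorted[Math.max(0, Math.ceil((p * sorted.length) / 100) - 1)];
  const avg = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
  return { avg, p50: pct(50), p95: pct(95), max: sorted[sorted.length - 1] };
}

// Pass criterion from this README: cache-hit p95 below 1 ms.
function passes(summary: LatencySummary, p95LimitMs = 1): boolean {
  return summary.p95 < p95LimitMs;
}
```

A harness built this way could call process.exit(1) when `passes` returns false for the cache-hit samples, which gives CI a non-zero exit code to act on.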
Limitations
- The in-memory cache is not shared across instances; each process keeps its own entries
- TTL-based invalidation only (no advanced invalidation yet)
- The cache key must include all relevant inputs; any input left out can cause incorrect cache hits
Related packages
Part of a small AI developer toolkit:
- token-budget-guard — enforce token budgets for LLM calls
- llm-retry-guard — safe retry wrapper for LLM APIs
- ai-request-logger — structured logging for AI requests
- @mostafa.hanafy/prompt-cache — avoid duplicate LLM calls
License
MIT