@mostafa.hanafy/prompt-cache

Lightweight caching layer for LLM API calls with deterministic keys and in-flight deduplication.

Latest version: 0.1.0 (npm), 1 maintainer

Reliable caching layer for LLM API calls to reduce cost, avoid duplicate requests, and improve latency.

Why

LLM applications often repeat the same requests:

  • retries from clients
  • repeated user actions
  • identical prompts within short time windows
  • shared context across workflows

Without caching, you pay for the same request multiple times.

@mostafa.hanafy/prompt-cache helps you:

  • avoid duplicate API calls
  • reduce token usage cost
  • improve response latency
  • keep caching logic simple and deterministic

Install

npm install @mostafa.hanafy/prompt-cache
# or
pnpm add @mostafa.hanafy/prompt-cache
# or
bun add @mostafa.hanafy/prompt-cache

Key capabilities

  • deterministic cache keys (key / keyParts)
  • TTL-based expiration
  • in-memory cache adapter
  • in-flight request deduplication (prevents duplicate concurrent calls)
  • pluggable cache adapters
  • lifecycle hooks

Quick start

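import OpenAI from "openai";
import { withPromptCache } from "@mostafa.hanafy/prompt-cache";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// placeholders for your own inputs; context stands for anything else that shapes the answer
const prompt = "Summarize our refund policy.";
const context = "support-docs-v1";

const response = await withPromptCache({
  keyParts: ["openai", "gpt-4o-mini", prompt, context],
  ttlSeconds: 60,
  call: () =>
    openai.responses.create({
      model: "gpt-4o-mini",
      input: prompt,
    }),
});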

Example use case

Avoid duplicate AI calls in high-traffic APIs:

await withPromptCache({
  keyParts: ["chat", userId, prompt],
  ttlSeconds: 30,
  call: () => generateResponse(prompt),
});

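Because userId is part of the key, each user gets a separate cache entry. If the same user triggers the same request 10 times at once (for example, through client retries or repeated clicks):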

  • without cache → 10 API calls ❌
  • with @mostafa.hanafy/prompt-cache → 1 API call ✅
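The single-call outcome relies on in-flight deduplication: concurrent callers with the same key share one pending promise instead of each starting their own call. A minimal sketch of that idea, assuming a plain Map of pending promises (the package's actual internals may differ); dedupe and fakeLLM are hypothetical names used only for illustration:

```typescript
// Pending calls keyed by cache key (sketch only; not the package's internals).
const inFlight = new Map<string, Promise<unknown>>();

function dedupe<T>(key: string, call: () => Promise<T>): Promise<T> {
  const pending = inFlight.get(key);
  if (pending) return pending as Promise<T>; // join the call that is already running

  // First caller starts the call; the entry is removed once it settles,
  // so later requests go through the normal cache path again.
  const promise = call().finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}

// Hypothetical stand-in for an LLM call that counts real executions.
let apiCalls = 0;
function fakeLLM(prompt: string): Promise<string> {
  apiCalls++;
  return new Promise((resolve) => setTimeout(() => resolve(`answer to ${prompt}`), 10));
}

// 10 concurrent requests with the same userId and prompt share one key.
const results = Promise.all(
  Array.from({ length: 10 }, () => dedupe("chat:user-1:hello", () => fakeLLM("hello"))),
);
```

All 10 callers receive the same result while fakeLLM runs once. Because the entry is deleted when the promise settles, a failed call is not reused by later requests; they start a fresh call.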

How it works

request
  ↓
generate key
  ↓
cache lookup
  ↓
hit  → return cached result
miss → execute call
  ↓
store result
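The flow above can be sketched as a small function over a Map with per-entry expiry times. This illustrates the diagram only, assuming an in-memory store and a key that has already been generated; cachedCall and Entry are hypothetical names, and in-flight deduplication is left out to keep the steps visible.

```typescript
type Entry = { value: unknown; expiresAt: number };

const store = new Map<string, Entry>();

async function cachedCall<T>(key: string, ttlSeconds: number, call: () => Promise<T>): Promise<T> {
  // cache lookup
  const entry = store.get(key);
  if (entry && entry.expiresAt > Date.now()) {
    return entry.value as T; // hit: return cached result
  }
  if (entry) store.delete(key); // an expired entry counts as a miss

  // miss: execute call
  const value = await call();

  // store result together with its expiry time
  store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  return value;
}
```

Sequential identical requests within ttlSeconds run call only once; concurrent requests would all miss here, which is the gap in-flight deduplication closes.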

API

withPromptCache(options)

Wraps an async LLM call with cache lookup, in-flight deduplication, and optional hooks.

withPromptCache<T>(options: {
  key?: string;
  keyParts?: unknown[];
  ttlSeconds?: number;
  cache?: CacheAdapter;
  shouldCache?: (value: T) => boolean;
  onHit?: (meta) => void;
  onMiss?: (meta) => void;
  onSet?: (meta) => void;
  onError?: (error, meta) => void;
  call: () => Promise<T>;
}): Promise<T>

Options:

  • key: explicit cache key (takes precedence over keyParts)
  • keyParts: parts that are deterministically hashed into a key
  • ttlSeconds: cache TTL in seconds (default 60)
  • cache: custom adapter (memoryCache is default)
  • shouldCache: return false to skip writing a result
  • onHit / onMiss / onSet / onError: lifecycle hooks
  • call: async operation to execute on cache miss

createCacheKey(parts)

Creates a deterministic key by stable-stringifying input and hashing it.

Use this when you want to precompute or inspect keys directly.
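A minimal sketch of the stable-stringify-then-hash approach, assuming SHA-256 from node:crypto (the hash function the package actually uses is not documented here). stableStringify and sketchCacheKey are hypothetical names; object keys are sorted so that property order does not change the key.

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys sorted, so { a, b } and { b, a } give the same string.
// Dates, Maps and other class instances are not handled specially in this sketch.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => stableStringify(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  // Primitives; undefined (which JSON.stringify turns into undefined) becomes "null".
  return JSON.stringify(value) ?? "null";
}

// Hash the stable serialization into a fixed-length hex key.
function sketchCacheKey(parts: unknown[]): string {
  return createHash("sha256").update(stableStringify(parts)).digest("hex");
}
```

With this approach, ["openai", { model, temperature }] and the same object with its properties in a different order map to the same key, while changing any value produces a different key.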

createMemoryCache() and memoryCache

  • createMemoryCache(): creates a new isolated in-memory cache adapter
  • memoryCache: shared default singleton adapter used by withPromptCache
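The CacheAdapter interface is not spelled out on this page. As a sketch, assuming an adapter only needs async get and set with a TTL, an isolated in-memory adapter with lazy expiry could look like this; SketchAdapter, createSketchMemoryCache and the injectable now clock are hypothetical and exist only for illustration and testing.

```typescript
// Assumed adapter shape; the real CacheAdapter interface may differ.
interface SketchAdapter {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown, ttlSeconds: number): Promise<void>;
}

// Each call returns a new adapter with its own Map, so instances are isolated.
function createSketchMemoryCache(now: () => number = Date.now): SketchAdapter {
  const entries = new Map<string, { value: unknown; expiresAt: number }>();
  return {
    async get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (entry.expiresAt <= now()) {
        entries.delete(key); // lazy expiry: evict on read
        return undefined;
      }
      return entry.value;
    },
    async set(key, value, ttlSeconds) {
      entries.set(key, { value, expiresAt: now() + ttlSeconds * 1000 });
    },
  };
}
```

A shared default in the spirit of memoryCache would simply be one such instance created at module level; calling the factory again gives an isolated store, which is useful in tests.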

Lifecycle hooks

await withPromptCache({
  keyParts: [prompt],
  onHit: (meta) => console.log("cache hit", meta),
  onMiss: (meta) => console.log("cache miss", meta),
  onSet: (meta) => console.log("stored in cache", meta),
  onError: (error, meta) => console.error("cache error", error, meta),
  call: async () => aiCall(),
});

Key and TTL guidance

  • Include all request inputs in keyParts (model, prompt, context, options).
  • In-memory cache is suitable for single-instance apps.
  • Use a custom adapter (for example Redis) in distributed deployments.
  • Set ttlSeconds to match freshness requirements.
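To see why every input belongs in the key, here is a self-contained sketch in which two requests differ only in temperature. It uses a plain Map and JSON.stringify as a stand-in for createCacheKey, and a hypothetical fakeModel instead of a real API.

```typescript
type ChatRequest = { model: string; prompt: string; temperature: number };

const cache = new Map<string, string>();

// Stand-in for withPromptCache: look up by serialized keyParts, otherwise call and store.
async function cached(keyParts: unknown[], call: () => Promise<string>): Promise<string> {
  const key = JSON.stringify(keyParts);
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await call();
  cache.set(key, value);
  return value;
}

// Fake model whose output depends on every field of the request.
async function fakeModel(req: ChatRequest): Promise<string> {
  return `${req.model}/${req.prompt}/t=${req.temperature}`;
}

const cold: ChatRequest = { model: "gpt-4o-mini", prompt: "hi", temperature: 0 };
const warm: ChatRequest = { ...cold, temperature: 1 };

async function demo() {
  // Under-keyed: temperature is missing, so warm wrongly receives cold's answer.
  await cached(["bad", cold.model, cold.prompt], () => fakeModel(cold));
  const wrong = await cached(["bad", warm.model, warm.prompt], () => fakeModel(warm));

  // Fully keyed: all inputs are in keyParts, so each request gets its own entry.
  await cached(["good", cold.model, cold.prompt, cold.temperature], () => fakeModel(cold));
  const right = await cached(["good", warm.model, warm.prompt, warm.temperature], () => fakeModel(warm));

  return { wrong, right };
}
```

The under-keyed lookup returns a cached answer for a request it never saw, which is the silent failure that "Include all request inputs in keyParts" guards against.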

Testing and verification

bun test
bun run typecheck
bun run lint
bun run build

Performance verification (PRD §10)

Run benchmark harness:

# preferred
bun bench/overhead.ts

# fallback
tsx bench/overhead.ts

It reports avg/p50/p95/max latency for:

  • cache-hit path
  • miss path
  • concurrent dedup path

Pass criterion: cache-hit path p95 < 1ms. The benchmark exits with a non-zero code if this criterion fails.
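How bench/overhead.ts computes its percentiles is not shown here. As a sketch, assuming nearest-rank percentiles over per-iteration timings in milliseconds, the summary and pass check could be written as follows; percentile, summarize and checkHitPath are hypothetical names.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Summarize one path the way the harness reports it: avg, p50, p95, max.
function summarize(samplesMs: number[]) {
  const avg = samplesMs.reduce((sum, x) => sum + x, 0) / samplesMs.length;
  return {
    avg,
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    max: Math.max(...samplesMs),
  };
}

// Pass criterion: cache-hit p95 strictly under 1 ms.
function checkHitPath(hitSamplesMs: number[]): boolean {
  return summarize(hitSamplesMs).p95 < 1;
}
```

In a benchmark script, calling process.exit(1) when checkHitPath returns false would produce the non-zero exit code on failure.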

Limitations

  • In-memory cache is local to one process and is not shared across multiple instances
  • TTL-based invalidation only (no advanced invalidation yet)
  • Cache key must include all relevant inputs

Part of a small AI developer toolkit:

  • token-budget-guard — enforce token budgets for LLM calls
  • llm-retry-guard — safe retry wrapper for LLM APIs
  • ai-request-logger — structured logging for AI requests
  • @mostafa.hanafy/prompt-cache — avoid duplicate LLM calls

License

MIT

Keywords

ai

Package last updated on 20 Mar 2026
