
@mostafa.hanafy/prompt-cache


Lightweight caching layer for LLM API calls with deterministic keys and in-flight deduplication.


Version: 0.1.0

Version published: 2 months ago

Weekly downloads: 2

Maintainers: 1


Created: 2 months ago


Reliable caching layer for LLM API calls to reduce cost, avoid duplicate requests, and improve latency.

Why

LLM applications often repeat the same requests:

retries from clients
repeated user actions
identical prompts within short time windows
shared context across workflows

Without caching, you pay for the same request multiple times.

@mostafa.hanafy/prompt-cache helps you:

avoid duplicate API calls
reduce token usage cost
improve response latency
keep caching logic simple and deterministic

Install

```sh
npm install @mostafa.hanafy/prompt-cache
# or
pnpm add @mostafa.hanafy/prompt-cache
# or
bun add @mostafa.hanafy/prompt-cache
```

Key capabilities

deterministic cache keys (key / keyParts)
TTL-based expiration
in-memory cache adapter
in-flight request deduplication (prevents duplicate concurrent calls)
pluggable cache adapters
lifecycle hooks

Quick start

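```ts
import OpenAI from "openai";
import { withPromptCache } from "@mostafa.hanafy/prompt-cache";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Example inputs: everything that affects the output belongs in keyParts
const prompt = "Summarize this support ticket in one sentence.";
const context = { locale: "en-US" };

const response = await withPromptCache({
  keyParts: ["openai", "gpt-4o-mini", prompt, context],
  ttlSeconds: 60,
  // Runs only on a cache miss; concurrent callers with the same key share one call
  call: () =>
    openai.responses.create({
      model: "gpt-4o-mini",
      input: prompt,
    }),
});
```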

Example use case

Avoid duplicate AI calls in high-traffic APIs:

```ts
await withPromptCache({
  keyParts: ["chat", userId, prompt],
  ttlSeconds: 30,
  call: () => generateResponse(prompt),
});
```

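If the same request (same userId and prompt, since both are part of the key) is triggered 10 times at once, for example by client retries or repeated clicks: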

without cache → 10 API calls ❌
with @mostafa.hanafy/prompt-cache → 1 API call ✅
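The single API call in this scenario comes from in-flight deduplication: callers that request the same key while a call is still pending share that call's promise. The sketch below illustrates the technique in isolation, assuming a plain `Map` of pending promises. It is not the package's implementation; `dedupe`, `inFlight`, and `fakeLlmCall` are hypothetical names, and there is no TTL caching here.

```ts
// Hypothetical sketch of in-flight deduplication; not the package's source.
// Callers that ask for the same key while a call is pending share its promise.
const inFlight = new Map<string, Promise<unknown>>();

function dedupe<T>(key: string, call: () => Promise<T>): Promise<T> {
  const pending = inFlight.get(key);
  if (pending) return pending as Promise<T>; // join the call already running

  const promise = call().finally(() => {
    inFlight.delete(key); // settled: the next request starts a fresh call
  });
  inFlight.set(key, promise);
  return promise;
}

// Usage: 10 concurrent identical requests, one upstream call.
async function demo(): Promise<{ upstreamCalls: number; results: string[] }> {
  let upstreamCalls = 0;
  const fakeLlmCall = async (): Promise<string> => {
    upstreamCalls++;
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulate latency
    return "response";
  };
  const results = await Promise.all(
    Array.from({ length: 10 }, () => dedupe("chat:user-1:hello", fakeLlmCall)),
  );
  return { upstreamCalls, results };
}

const demoRun = demo();
demoRun.then(({ upstreamCalls }) => console.log(`upstream calls: ${upstreamCalls}`)); // prints "upstream calls: 1"
```

Removing the entry in `finally` matters: without it, a failed call would stay in the map and every later caller would receive the same rejection.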

How it works

```
request
  ↓
generate key
  ↓
cache lookup
  ↓
hit  → return cached result
miss → execute call
  ↓
store result
```
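The flow above is the read-through (cache-aside) pattern with TTL expiry. A minimal sketch of just those steps, assuming a plain `Map` as the store and an injectable clock so expiry can be tested; `getOrCompute` and `Entry` are hypothetical names, and in-flight deduplication is deliberately left out.

```ts
// Hypothetical read-through cache sketch; not the package's source.
interface Entry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

async function getOrCompute<T>(
  store: Map<string, Entry<T>>,
  key: string,
  ttlSeconds: number,
  call: () => Promise<T>,
  now: () => number = Date.now,
): Promise<{ value: T; hit: boolean }> {
  const entry = store.get(key); // cache lookup
  if (entry && entry.expiresAt > now()) {
    return { value: entry.value, hit: true }; // hit → return cached result
  }
  const value = await call(); // miss (or expired) → execute call
  store.set(key, { value, expiresAt: now() + ttlSeconds * 1000 }); // store result
  return { value, hit: false };
}

// Usage with a fake clock so expiry is deterministic.
let fakeNow = 0;
let calls = 0;
const store = new Map<string, Entry<string>>();
const call = async () => `answer #${++calls}`;

async function demo() {
  const first = await getOrCompute(store, "k", 60, call, () => fakeNow); // miss
  const second = await getOrCompute(store, "k", 60, call, () => fakeNow); // hit
  fakeNow = 61_000; // advance past the 60-second TTL
  const third = await getOrCompute(store, "k", 60, call, () => fakeNow); // expired → miss
  return [first, second, third];
}
const demoRun = demo();
```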

API

`withPromptCache(options)`

Wraps an async LLM call with cache lookup, in-flight deduplication, and optional hooks.

```ts
withPromptCache<T>(options: {
  key?: string;
  keyParts?: unknown[];
  ttlSeconds?: number;
  cache?: CacheAdapter;
  shouldCache?: (value: T) => boolean;
  onHit?: (meta) => void;
  onMiss?: (meta) => void;
  onSet?: (meta) => void;
  onError?: (error, meta) => void;
  call: () => Promise<T>;
}): Promise<T>
```

Options:

key: explicit cache key (takes precedence over keyParts)
keyParts: values that are deterministically hashed into a key
ttlSeconds: cache TTL in seconds (default: 60)
cache: cache adapter to use (default: the shared memoryCache)
shouldCache: return false to skip writing a result to the cache
onHit / onMiss / onSet / onError: lifecycle hooks
call: async operation executed on a cache miss
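As an illustration of `shouldCache`, a predicate can refuse to store results that should not be replayed to later callers, such as truncated or empty completions. The `LlmResult` shape below is hypothetical; adapt the checks to whatever your `call` actually returns.

```ts
// Hypothetical result shape returned by your `call` function.
interface LlmResult {
  text: string;
  finishReason: "stop" | "length" | "content_filter";
}

// Cache only complete, non-empty answers. Truncated or filtered output
// is still returned to the caller, just not written to the cache.
function shouldCacheResult(value: LlmResult): boolean {
  return value.finishReason === "stop" && value.text.trim().length > 0;
}
```

It would be passed as `shouldCache: shouldCacheResult` alongside `keyParts` and `call`.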

`createCacheKey(parts)`

Creates a deterministic key by stable-stringifying the input and hashing the result.

Use this when you want to precompute or inspect keys directly.
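A stable stringify sorts object keys so that `{ a: 1, b: 2 }` and `{ b: 2, a: 1 }` serialize identically, while array order is preserved. The sketch below shows that technique with SHA-256 from `node:crypto`. The package's actual serialization rules and hash algorithm are not documented here, so `stableStringify` and `sketchCacheKey` are hypothetical stand-ins, not `createCacheKey` itself.

```ts
import { createHash } from "node:crypto";

// Serialize JSON-like data with object keys sorted, so key order does not matter.
// Not intended for Dates, Maps, class instances, or cyclic structures.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") {
    // JSON.stringify returns undefined for undefined and functions
    return JSON.stringify(value) ?? "null";
  }
  if (Array.isArray(value)) {
    // Array order is significant, so keep it
    return `[${value.map(stableStringify).join(",")}]`;
  }
  const obj = value as Record<string, unknown>;
  const body = Object.keys(obj)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
    .join(",");
  return `{${body}}`;
}

// Hash the canonical string so keys have a fixed length (64 hex chars).
function sketchCacheKey(parts: unknown[]): string {
  return createHash("sha256").update(stableStringify(parts)).digest("hex");
}
```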

`createMemoryCache()` and `memoryCache`

createMemoryCache(): creates a new isolated in-memory cache adapter
memoryCache: shared default singleton adapter used by withPromptCache
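The README does not spell out the `CacheAdapter` interface, so the adapter shape below (`get`/`set` with a TTL in seconds) is an assumption. The sketch shows the difference between an isolated adapter, where each factory call owns a private `Map`, and a shared module-level singleton; `createSketchMemoryCache` and `sharedSketchCache` are hypothetical names.

```ts
// Assumed adapter shape; check the package's exported types for the real one.
interface SketchCacheAdapter {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown, ttlSeconds: number): Promise<void>;
}

// Each call returns an adapter with its own private Map (isolated state).
function createSketchMemoryCache(now: () => number = Date.now): SketchCacheAdapter {
  const entries = new Map<string, { value: unknown; expiresAt: number }>();
  return {
    async get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (entry.expiresAt <= now()) {
        entries.delete(key); // lazily evict expired entries on read
        return undefined;
      }
      return entry.value;
    },
    async set(key, value, ttlSeconds) {
      entries.set(key, { value, expiresAt: now() + ttlSeconds * 1000 });
    },
  };
}

// One shared instance for the whole process, analogous to a default singleton.
const sharedSketchCache = createSketchMemoryCache();
```

Isolated adapters are useful in tests, where a shared singleton would leak cached values between test cases.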

Lifecycle hooks

```ts
await withPromptCache({
  keyParts: [prompt],
  onHit: (meta) => console.log("cache hit", meta),
  onMiss: (meta) => console.log("cache miss", meta),
  onSet: (meta) => console.log("stored in cache", meta),
  onError: (error, meta) => console.error("cache error", error, meta),
  call: async () => aiCall(),
});
```
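Because the hooks fire on cache hits, misses, writes, and errors, they are a natural place to collect metrics such as hit rate. A minimal sketch, assuming the hook signatures shown above; `createCacheStats` is a hypothetical helper, `meta` is typed as `unknown` because its shape is not documented here, and how deduplicated concurrent callers are counted depends on when the package fires each hook.

```ts
// Hypothetical metrics helper; hook signatures mirror the README, meta left untyped.
function createCacheStats() {
  const counts = { hits: 0, misses: 0, sets: 0, errors: 0 };
  const hooks = {
    onHit: (_meta: unknown): void => { counts.hits++; },
    onMiss: (_meta: unknown): void => { counts.misses++; },
    onSet: (_meta: unknown): void => { counts.sets++; },
    onError: (_error: unknown, _meta: unknown): void => { counts.errors++; },
  };
  // Current counters plus hit rate over all lookups (0 before any lookup)
  const snapshot = () => {
    const lookups = counts.hits + counts.misses;
    return { ...counts, hitRate: lookups === 0 ? 0 : counts.hits / lookups };
  };
  return { hooks, snapshot };
}
```

With `const stats = createCacheStats()`, the hooks can be spread into the options as `...stats.hooks` next to `keyParts` and `call`, and `stats.snapshot()` read periodically.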

Key and TTL guidance

Include all request inputs in keyParts (model, prompt, context, options).
In-memory cache is suitable for single-instance apps.
Use a custom adapter (for example Redis) in distributed deployments.
Set ttlSeconds to match freshness requirements.
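In a distributed deployment, a custom adapter typically delegates to a shared store and passes `ttlSeconds` through as the store's own expiry. The sketch below assumes a hypothetical adapter shape (`get`/`set` with a TTL in seconds) and a hypothetical `RemoteStore` interface standing in for a Redis-like client; it is not tied to any real Redis client API. An in-memory fake makes it runnable locally.

```ts
// Hypothetical minimal client interface for a shared store (Redis-like).
interface RemoteStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Assumed adapter shape; the package's real CacheAdapter type may differ.
interface SketchCacheAdapter {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown, ttlSeconds: number): Promise<void>;
}

// Values are JSON-encoded so any instance sharing the store can decode them.
function createRemoteAdapter(store: RemoteStore, prefix = "prompt-cache:"): SketchCacheAdapter {
  return {
    async get(key) {
      const raw = await store.get(prefix + key);
      return raw === null ? undefined : JSON.parse(raw);
    },
    async set(key, value, ttlSeconds) {
      await store.set(prefix + key, JSON.stringify(value), ttlSeconds);
    },
  };
}

// In-memory fake for local testing only (ignores TTL).
function createFakeStore(): RemoteStore & { data: Map<string, string> } {
  const data = new Map<string, string>();
  return {
    data,
    async get(key) { return data.get(key) ?? null; },
    async set(key, value) { data.set(key, value); },
  };
}
```

Two adapters built on the same store behave like two app instances: a value written by one is visible to the other, which is exactly what the in-memory cache cannot provide.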

Testing and verification

```sh
bun test
bun run typecheck
bun run lint
bun run build
```

Performance verification (PRD §10)

Run the benchmark harness:

```sh
# preferred
bun bench/overhead.ts

# fallback
tsx bench/overhead.ts
```

It reports avg/p50/p95/max latency for:

cache-hit path
miss path
concurrent dedup path

Pass criterion: cache-hit path p95 < 1ms. The benchmark exits with a non-zero code if this criterion fails.
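For reference, the reported statistics and the pass criterion can be computed from raw latency samples as below. This is a hypothetical helper using the nearest-rank percentile definition and `performance.now()` for timing; the actual harness in `bench/overhead.ts` may measure and compute percentiles differently.

```ts
interface LatencySummary { avg: number; p50: number; p95: number; max: number; }

// Time `iterations` sequential runs of fn, in milliseconds.
async function measure(fn: () => Promise<unknown>, iterations: number): Promise<number[]> {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await fn();
    samples.push(performance.now() - start);
  }
  return samples;
}

// Nearest-rank percentile: smallest sample with at least p% of samples <= it.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function summarize(samplesMs: number[]): LatencySummary {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const avg = sorted.reduce((sum, x) => sum + x, 0) / sorted.length;
  return { avg, p50: percentile(sorted, 50), p95: percentile(sorted, 95), max: sorted[sorted.length - 1] };
}

// Pass criterion: cache-hit path p95 must be under 1 ms.
function hitPathPasses(summary: LatencySummary): boolean {
  return summary.p95 < 1;
}
```

A harness built this way would set a non-zero exit code whenever `hitPathPasses` returns false for the cache-hit samples.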

Limitations

In-memory cache is not shared across multiple instances
TTL-based invalidation only (no advanced invalidation yet)
Cache key must include all relevant inputs

Related packages

Part of a small AI developer toolkit:

token-budget-guard — enforce token budgets for LLM calls
llm-retry-guard — safe retry wrapper for LLM APIs
ai-request-logger — structured logging for AI requests
@mostafa.hanafy/prompt-cache — avoid duplicate LLM calls

License

MIT


Package last updated on 20 Mar 2026
