llm-spend-guard

Stop your LLM API bills from spiraling out of control.
A lightweight Node.js package that enforces real-time token budgets for OpenAI, Anthropic, and Google Gemini API calls.


The Problem

A single runaway loop, an uncapped user session, or one oversized prompt can burn through your entire LLM budget in minutes. There is no built-in way to set spending limits across OpenAI, Anthropic, or Gemini SDKs.

llm-spend-guard wraps your existing LLM SDK calls and enforces token budgets before any request is sent to the API. If a request would exceed your budget, it gets blocked instantly — no money wasted.

Why llm-spend-guard?

  • Pre-request blocking — Stops overspending before the API call, not after
  • Multi-provider — Single API for OpenAI, Anthropic Claude, and Google Gemini
  • Multi-scope budgets — Global, per-user, per-session, and per-route limits
  • Zero config — Works with 3 lines of code, no infrastructure needed
  • Production-ready — Redis storage, Express/Next.js middleware, TypeScript-first
  • Lightweight — zero runtime dependencies beyond tiktoken

How It Works

Your Code --> llm-spend-guard --> LLM API (OpenAI / Anthropic / Gemini)
                  |
                  |-- 1. Estimates tokens BEFORE the request
                  |-- 2. Checks all budget scopes (global, user, session, route)
                  |-- 3. If over budget --> BLOCKS the request (throws BudgetExceededError)
                  |-- 4. If auto-truncate enabled --> trims prompt to fit
                  |-- 5. Sends request to LLM API
                  |-- 6. Records actual token usage from response
                  |-- 7. Fires alert callbacks at 50%, 80%, 100% thresholds

Key principle: The guard sits between your code and the LLM SDK. It estimates cost before sending, blocks if over budget, and tracks actual usage after the response.
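The "estimate before sending" step can be approximated like this. This is an illustrative sketch, not the package's internal code: the package uses tiktoken for OpenAI models, while a roughly 4-characters-per-token heuristic is a common fallback for other providers.

```typescript
// Hypothetical sketch of pre-request token estimation.
type ChatMessage = { role: string; content: string };

function estimateRequestTokens(messages: ChatMessage[], maxTokens: number): number {
  // ~4 characters per token is a common heuristic for English text
  // when an exact tokenizer is unavailable.
  const promptChars = messages.reduce((sum, m) => sum + m.content.length, 0);
  const promptTokens = Math.ceil(promptChars / 4);
  // Expected output tokens (max_tokens) are counted up front, since the
  // guard must reserve budget for the response before the request is sent.
  return promptTokens + maxTokens;
}
```

Counting `max_tokens` toward the estimate is what makes blocking conservative: a request is only allowed if both its prompt and its worst-case response fit in the remaining budget.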

Compatible Tech Stacks

| Category | Supported |
| --- | --- |
| Runtime | Node.js >= 18, Bun, Deno (with Node compat) |
| Language | TypeScript, JavaScript (CommonJS and ESM) |
| LLM Providers | OpenAI, Anthropic (Claude), Google Gemini |
| Frameworks | Express.js, Next.js, Fastify, Koa, Hono, NestJS, or any Node.js server |
| Storage | In-memory (default), Redis, or any custom adapter |
| Use Cases | REST APIs, SaaS backends, chatbots, AI agents, CLI tools, serverless functions |

Not compatible with: Browser/frontend code (this is a server-side package), Python, or non-Node runtimes without Node compatibility.

Installation

npm install llm-spend-guard

Then install the provider SDK(s) you use:

# Pick one or more
npm install openai                  # For OpenAI (GPT-4o, GPT-4, etc.)
npm install @anthropic-ai/sdk       # For Anthropic (Claude)
npm install @google/generative-ai   # For Google Gemini

# Optional: Redis storage
npm install ioredis

Quick Start

import { LLMGuard } from 'llm-spend-guard';
import OpenAI from 'openai';

// 1. Create the guard with your budget
const guard = new LLMGuard({
  dailyBudgetTokens: 100_000,       // 100K tokens per day
  maxTokensPerRequest: 10_000,      // No single request can use more than 10K
  onBudgetWarning(level, stats) {
    console.log(`Budget alert [${level}]: ${stats.percentage.toFixed(1)}% used`);
  },
});

// 2. Wrap your existing SDK client
const openai = new OpenAI();
guard.wrapOpenAI(openai);

// 3. Use guard.openai instead of openai directly
const response = await guard.openai.chat({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is the meaning of life?' }],
  max_tokens: 500,
});

console.log(response.choices[0].message.content);

// 4. Check your budget anytime
const remaining = await guard.getRemainingBudget();
console.log(`Tokens remaining today: ${remaining}`);

That's it. If any request would exceed the budget, it throws BudgetExceededError and the API is never called.

Configuration Options

const guard = new LLMGuard({
  // --- Budget Limits ---
  dailyBudgetTokens: 100_000,       // Max tokens per day (resets at midnight)
  globalBudgetTokens: 1_000_000,    // Lifetime global cap
  userBudgetTokens: 10_000,         // Max per user
  sessionBudgetTokens: 5_000,       // Max per session
  maxTokensPerRequest: 10_000,      // Max per single request

  // --- Behavior ---
  autoTruncate: true,               // Auto-trim prompts to fit budget

  // --- Storage ---
  storage: new MemoryStorage(),     // Default. Use RedisStorage for production.

  // --- Monitoring ---
  onBudgetWarning(level, stats) {
    // level: 'warning_50' | 'warning_80' | 'exceeded'
    // stats: { scope, scopeKey, used, limit, remaining, percentage }
  },
});

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| dailyBudgetTokens | number | undefined | Max tokens per day. Auto-resets at midnight. |
| globalBudgetTokens | number | undefined | Lifetime total token cap. |
| userBudgetTokens | number | undefined | Max tokens per unique user. |
| sessionBudgetTokens | number | undefined | Max tokens per session. |
| maxTokensPerRequest | number | undefined | Hard cap on a single request. |
| autoTruncate | boolean | false | Automatically shorten prompts to fit remaining budget. |
| storage | StorageAdapter | MemoryStorage | Where usage data is stored. |
| onBudgetWarning | function | undefined | Called at 50%, 80%, and 100% usage. |

Usage By Provider

OpenAI

import { LLMGuard } from 'llm-spend-guard';
import OpenAI from 'openai';

const guard = new LLMGuard({ dailyBudgetTokens: 100_000 });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
guard.wrapOpenAI(openai);

const res = await guard.openai.chat({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
  max_tokens: 500,
});

Anthropic (Claude)

import { LLMGuard } from 'llm-spend-guard';
import Anthropic from '@anthropic-ai/sdk';

const guard = new LLMGuard({ dailyBudgetTokens: 100_000 });
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
guard.wrapAnthropic(anthropic);

const res = await guard.anthropic.chat({
  model: 'claude-sonnet-4-20250514',
  messages: [{ role: 'user', content: 'Hello!' }],
  max_tokens: 500,
  system: 'You are a helpful assistant.',  // Anthropic system prompt
});

Google Gemini

import { LLMGuard } from 'llm-spend-guard';
import { GoogleGenerativeAI } from '@google/generative-ai';

const guard = new LLMGuard({ dailyBudgetTokens: 100_000 });
const gemini = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
guard.wrapGemini(gemini);

const res = await guard.gemini.chat({
  model: 'gemini-1.5-pro',
  messages: [{ role: 'user', content: 'Hello!' }],
  max_tokens: 500,
});

Budget Scopes

You can enforce budgets at multiple levels simultaneously:

                    +-------------------+
                    |   Global Budget   |  <-- total across everything
                    +-------------------+
                     /        |         \
            +--------+  +--------+  +--------+
            | User A |  | User B |  | User C |  <-- per-user limit
            +--------+  +--------+  +--------+
               |            |
          +---------+  +---------+
          | Session |  | Session |  <-- per-session limit
          +---------+  +---------+
               |
          +---------+
          |  Route  |  <-- per-route limit
          +---------+

Pass context with every request to activate scopes:

await guard.openai.chat(
  {
    model: 'gpt-4o',
    messages: [...],
    max_tokens: 500,
  },
  {
    userId: 'user-123',       // activates per-user budget
    sessionId: 'sess-abc',    // activates per-session budget
    route: '/api/chat',       // activates per-route budget
  }
);

All applicable scopes are checked. If any scope is exceeded, the request is blocked.

How Guarding Works (Request Lifecycle)

Here is exactly what happens on every .chat() call:

Step 1: ESTIMATE
   |  Count tokens in all messages using tiktoken (OpenAI) or heuristic (others)
   |  Add max_tokens (expected output) to get total estimated cost
   v
Step 2: CHECK BUDGET
   |  For each active scope (global, daily, user, session, route):
   |    - Load current usage from storage
   |    - Compare: estimated tokens vs remaining budget
   |    - If over budget --> throw BudgetExceededError (request NEVER sent)
   v
Step 3: AUTO-TRUNCATE (if enabled)
   |  If prompt is too large but truncation is on:
   |    - Keep system message intact
   |    - Keep most recent messages
   |    - Drop oldest messages first
   |    - Truncate text of last message if still too large
   v
Step 4: SEND REQUEST
   |  Forward to actual LLM API (OpenAI/Anthropic/Gemini)
   v
Step 5: RECORD USAGE
   |  Read actual token counts from API response
   |  Update all scope counters in storage
   v
Step 6: FIRE ALERTS
   |  If any scope crosses 50% --> onBudgetWarning('warning_50', stats)
   |  If any scope crosses 80% --> onBudgetWarning('warning_80', stats)
   |  If any scope crosses 100% --> onBudgetWarning('exceeded', stats)
   v
Step 7: RETURN RESPONSE
      Return the original API response to your code
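Steps 1 and 2 can be sketched as follows. This is an illustrative re-implementation, not the package's internal code; the type and error-class names here are hypothetical stand-ins for the real ones.

```typescript
// Hypothetical sketch of the Step 2 budget check: every active scope is
// checked, and any scope that would be pushed past its limit blocks the
// request before it is sent.
type ScopeState = { scopeKey: string; used: number; limit: number };

class BudgetExceededSketchError extends Error {
  scopeKey: string;
  constructor(scopeKey: string) {
    super(`Token budget exceeded for ${scopeKey}`);
    this.scopeKey = scopeKey;
    this.name = 'BudgetExceededError';
  }
}

function checkBudgets(scopes: ScopeState[], estimatedTokens: number): void {
  for (const s of scopes) {
    if (s.used + estimatedTokens > s.limit) {
      // Request is never forwarded to the LLM API.
      throw new BudgetExceededSketchError(s.scopeKey);
    }
  }
}
```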

Viewing Reports and Stats

Get Budget Stats

// Global stats (all scopes)
const stats = await guard.getStats();
console.log(stats);

Output:

[
  {
    "scope": "global",
    "scopeKey": "daily",
    "used": 45200,
    "limit": 100000,
    "remaining": 54800,
    "percentage": 45.2
  }
]

Get Per-User Stats

const userStats = await guard.getStats({ userId: 'user-123' });
console.log(userStats);

Output:

[
  {
    "scope": "global",
    "scopeKey": "daily",
    "used": 45200,
    "limit": 100000,
    "remaining": 54800,
    "percentage": 45.2
  },
  {
    "scope": "user",
    "scopeKey": "user:user-123",
    "used": 8300,
    "limit": 10000,
    "remaining": 1700,
    "percentage": 83.0
  }
]

Get Remaining Token Count

const remaining = await guard.getRemainingBudget({ userId: 'user-123' });
console.log(`Tokens left: ${remaining}`);
// Output: "Tokens left: 1700"

This returns the minimum remaining across all active scopes. If the user has 1700 left on their user budget but 54800 left on the daily budget, it returns 1700 (the tightest constraint).
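That "tightest constraint" reduction amounts to taking the minimum of `remaining` across the returned stats objects. A minimal sketch, assuming the `BudgetStats` shape shown above:

```typescript
// Sketch of the minimum-remaining reduction behind getRemainingBudget.
type Stats = { remaining: number };

function minRemaining(stats: Stats[]): number {
  return stats.reduce((min, s) => Math.min(min, s.remaining), Infinity);
}
```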

Build a Usage Dashboard Endpoint

app.get('/api/usage', async (req, res) => {
  const userId = req.headers['x-user-id'] as string;

  const stats = await guard.getStats({ userId });
  const remaining = await guard.getRemainingBudget({ userId });

  res.json({
    budgets: stats.map(s => ({
      scope: s.scope,
      key: s.scopeKey,
      used: s.used,
      limit: s.limit,
      remaining: s.remaining,
      percentUsed: `${s.percentage.toFixed(1)}%`,
    })),
    totalRemaining: remaining,
  });
});

Response:

{
  "budgets": [
    {
      "scope": "global",
      "key": "daily",
      "used": 45200,
      "limit": 100000,
      "remaining": 54800,
      "percentUsed": "45.2%"
    },
    {
      "scope": "user",
      "key": "user:user-123",
      "used": 8300,
      "limit": 10000,
      "remaining": 1700,
      "percentUsed": "83.0%"
    }
  ],
  "totalRemaining": 1700
}

Reset Budgets

// Reset all budgets
await guard.reset();

// Reset for a specific user
await guard.reset({ userId: 'user-123' });

Alert Callbacks (Monitoring)

Get notified as budgets are consumed:

const guard = new LLMGuard({
  dailyBudgetTokens: 100_000,
  userBudgetTokens: 10_000,
  onBudgetWarning(level, stats) {
    switch (level) {
      case 'warning_50':
        console.log(`[WARN] ${stats.scopeKey} is 50% used (${stats.used}/${stats.limit})`);
        break;
      case 'warning_80':
        console.warn(`[CRITICAL] ${stats.scopeKey} is 80% used!`);
        // Send Slack notification, email alert, etc.
        break;
      case 'exceeded':
        console.error(`[EXCEEDED] ${stats.scopeKey} has exceeded the budget!`);
        // Page on-call, disable feature flag, etc.
        break;
    }
  },
});

Alert levels fire once per scope per threshold — you won't get spammed with duplicate alerts.

| Level | Fires When | Typical Action |
| --- | --- | --- |
| warning_50 | 50% budget consumed | Log it, update dashboard |
| warning_80 | 80% budget consumed | Alert team via Slack/email |
| exceeded | 100% budget consumed | Block requests, page on-call |
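The once-per-threshold behavior implies the guard remembers which alerts have already fired per scope. A hypothetical sketch of that deduplication (not the package's internal code):

```typescript
// Sketch of once-per-threshold alert firing with deduplication.
type Level = 'warning_50' | 'warning_80' | 'exceeded';

const thresholds: Array<[number, Level]> = [
  [100, 'exceeded'],
  [80, 'warning_80'],
  [50, 'warning_50'],
];

function alertsToFire(percentage: number, alreadyFired: Set<Level>): Level[] {
  const due: Level[] = [];
  for (const [pct, level] of thresholds) {
    if (percentage >= pct && !alreadyFired.has(level)) {
      alreadyFired.add(level); // remember so the same alert never repeats
      due.push(level);
    }
  }
  return due;
}
```

Note that a single jump in usage (say from 40% to 85%) fires both the 50% and 80% alerts at once, each exactly one time.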

What Happens When Budget Is Exceeded

When a request would exceed any budget scope, the guard throws BudgetExceededError:

import { BudgetExceededError } from 'llm-spend-guard';

try {
  await guard.openai.chat({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Tell me everything about the universe' }],
    max_tokens: 50_000,
  });
} catch (err) {
  if (err instanceof BudgetExceededError) {
    console.log(err.message);
    // "Token budget exceeded for global:daily. Used 95000/100000 tokens (95.0%)"

    console.log(err.stats);
    // {
    //   scope: 'global',
    //   scopeKey: 'daily',
    //   used: 95000,
    //   limit: 100000,
    //   remaining: 5000,
    //   percentage: 95.0
    // }
  }
}

The LLM API is NEVER called. No money is spent. The request is blocked locally before it leaves your server.

Auto Truncation

When autoTruncate: true, instead of rejecting oversized prompts, the guard intelligently trims them:

const guard = new LLMGuard({
  dailyBudgetTokens: 5_000,
  autoTruncate: true,  // Enable smart truncation
});

Truncation strategy:

  • System messages are always preserved
  • Most recent messages are kept first
  • Oldest messages are dropped
  • If the last message is still too large, its text is trimmed with ... appended

This is useful for chatbots with long conversation histories — the guard keeps the most relevant context while staying within budget.
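The four-step strategy above can be sketched as a pure function. This is an illustrative approximation using a ~4 chars/token heuristic, not the package's exact algorithm:

```typescript
// Sketch of budget-aware truncation: preserve system messages, keep the
// newest messages, drop the oldest, and trim the last message if needed.
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

const approxTokens = (m: Msg) => Math.ceil(m.content.length / 4);

function truncateToBudget(messages: Msg[], budgetTokens: number): Msg[] {
  const system = messages.filter(m => m.role === 'system'); // always preserved
  const rest = messages.filter(m => m.role !== 'system');
  let budget = budgetTokens - system.reduce((s, m) => s + approxTokens(m), 0);

  const kept: Msg[] = [];
  // Walk from newest to oldest, keeping messages while budget allows.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = approxTokens(rest[i]);
    if (cost <= budget) {
      kept.unshift(rest[i]);
      budget -= cost;
    } else if (kept.length === 0 && budget > 0) {
      // The newest message alone is too large: trim its text to fit.
      kept.unshift({ ...rest[i], content: rest[i].content.slice(0, budget * 4) + '...' });
      budget = 0;
    } else {
      break; // drop this message and everything older
    }
  }
  return [...system, ...kept];
}
```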

Storage Backends

In-Memory (Default)

import { LLMGuard, MemoryStorage } from 'llm-spend-guard';

const guard = new LLMGuard({
  storage: new MemoryStorage(),  // This is the default, no need to specify
  dailyBudgetTokens: 100_000,
});

Good for: single-process apps, development, and testing. Limitations: data is lost on restart and is not shared across processes.

Redis (Production)

import { LLMGuard, RedisStorage } from 'llm-spend-guard';
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL);

const guard = new LLMGuard({
  storage: new RedisStorage(redis, 'myapp:budget:'),  // optional key prefix
  dailyBudgetTokens: 100_000,
});

Good for: production, multi-instance, serverless. Keys auto-expire at midnight (daily reset built-in).
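One way a Redis adapter can implement the midnight expiry is to set each daily key's TTL to the seconds remaining in the current day. A sketch of that computation (hypothetical; the package's actual adapter internals are not shown here):

```typescript
// Sketch: compute a TTL (in seconds) that expires at the next local midnight,
// suitable for passing to a Redis EXPIRE on a daily-usage key.
function secondsUntilMidnight(now: Date = new Date()): number {
  const midnight = new Date(now);
  midnight.setHours(24, 0, 0, 0); // start of the next day, local time
  return Math.ceil((midnight.getTime() - now.getTime()) / 1000);
}
```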

Custom Adapter

Implement the StorageAdapter interface for any backend (PostgreSQL, DynamoDB, file system, etc.):

import { LLMGuard, StorageAdapter, ScopeUsage } from 'llm-spend-guard';

const myStorage: StorageAdapter = {
  async get(key: string): Promise<ScopeUsage | null> {
    // Read from your database
    return db.get(key);
  },
  async set(key: string, value: ScopeUsage): Promise<void> {
    // Write to your database
    await db.set(key, value);
  },
  async increment(key: string, tokens: number): Promise<ScopeUsage> {
    // Atomically increment and return updated value
    const existing = await this.get(key) ?? { totalTokens: 0, date: new Date().toISOString().slice(0, 10) };
    existing.totalTokens += tokens;
    await this.set(key, existing);
    return existing;
  },
  async reset(key: string): Promise<void> {
    await db.delete(key);
  },
};

const guard = new LLMGuard({ storage: myStorage, dailyBudgetTokens: 100_000 });

Framework Integration

Express.js

import express from 'express';
import OpenAI from 'openai';
import { LLMGuard, expressMiddleware, budgetErrorHandler } from 'llm-spend-guard';

const app = express();
app.use(express.json());

const guard = new LLMGuard({
  dailyBudgetTokens: 500_000,
  userBudgetTokens: 50_000,
  maxTokensPerRequest: 10_000,
  onBudgetWarning(level, stats) {
    console.warn(`[${level}] ${stats.scopeKey}: ${stats.percentage.toFixed(1)}%`);
  },
});

const openai = new OpenAI();
guard.wrapOpenAI(openai);

// Middleware auto-extracts userId, sessionId, route from request
// userId from: x-user-id header or req.user.id (passport)
// sessionId from: x-session-id header or req.sessionID (express-session)
// route from: req.path
app.use(expressMiddleware(guard));

app.post('/api/chat', async (req, res, next) => {
  try {
    const response = await guard.openai.chat(
      {
        model: 'gpt-4o',
        messages: req.body.messages,
        max_tokens: 1000,
      },
      req.llmBudgetContext,  // Automatically populated by middleware
    );
    res.json(response);
  } catch (err) {
    next(err);
  }
});

// Returns HTTP 429 with error details when budget exceeded
app.use(budgetErrorHandler);

app.listen(3000);

When budget is exceeded, the client gets:

HTTP 429 Too Many Requests

{
  "error": "Token budget exceeded",
  "details": {
    "scope": "user",
    "scopeKey": "user:user-123",
    "used": 48500,
    "limit": 50000,
    "remaining": 1500,
    "percentage": 97.0
  }
}

Next.js API Routes

// pages/api/chat.ts (or app/api/chat/route.ts)
import OpenAI from 'openai';
import { LLMGuard, withBudgetGuard } from 'llm-spend-guard';

const guard = new LLMGuard({
  dailyBudgetTokens: 200_000,
  userBudgetTokens: 20_000,
  autoTruncate: true,
});

const openai = new OpenAI();
guard.wrapOpenAI(openai);

async function handler(req: any, res: any) {
  const response = await guard.openai.chat(
    {
      model: 'gpt-4o',
      messages: req.body.messages,
      max_tokens: 1000,
    },
    req.llmBudgetContext,  // Auto-populated by withBudgetGuard
  );
  res.status(200).json(response);
}

// Wraps handler with budget enforcement + auto 429 on exceeded
export default withBudgetGuard(guard, handler);

Fastify / Koa / Hono

No built-in middleware ships for these frameworks, but integration is straightforward because the guard itself is framework-agnostic:

// Fastify example
import { BudgetExceededError } from 'llm-spend-guard';

fastify.post('/api/chat', async (request, reply) => {
  try {
    const response = await guard.openai.chat(
      {
        model: 'gpt-4o',
        messages: request.body.messages,
        max_tokens: 1000,
      },
      {
        userId: request.headers['x-user-id'] as string,
        sessionId: request.headers['x-session-id'] as string,
        route: request.url,
      },
    );
    return response;
  } catch (err) {
    if (err instanceof BudgetExceededError) {
      reply.status(429).send({ error: 'Budget exceeded', details: err.stats });
      return;
    }
    throw err;
  }
});

SaaS Per-User Budget Example

For multi-tenant SaaS apps where each user has their own token budget:

import { LLMGuard, RedisStorage } from 'llm-spend-guard';
import Anthropic from '@anthropic-ai/sdk';
import Redis from 'ioredis';

const guard = new LLMGuard({
  userBudgetTokens: 10_000,          // 10K tokens per user per day
  dailyBudgetTokens: 1_000_000,      // 1M total across all users
  maxTokensPerRequest: 5_000,
  autoTruncate: true,
  storage: new RedisStorage(new Redis()),
  onBudgetWarning(level, stats) {
    if (stats.scope === 'user' && level === 'warning_80') {
      // Notify user they're running low
      notifyUser(stats.scopeKey.replace('user:', ''), {
        message: `You've used ${stats.percentage.toFixed(0)}% of your daily AI quota.`,
        remaining: stats.remaining,
      });
    }
  },
});

const anthropic = new Anthropic();
guard.wrapAnthropic(anthropic);

// In your API handler:
async function handleChat(userId: string, messages: any[]) {
  return guard.anthropic.chat(
    {
      model: 'claude-sonnet-4-20250514',
      messages,
      max_tokens: 1000,
    },
    { userId },
  );
}

Full API Reference

LLMGuard

| Method | Returns | Description |
| --- | --- | --- |
| new LLMGuard(config) | LLMGuard | Create a guard instance |
| wrapOpenAI(client) | OpenAIProvider | Wrap an OpenAI SDK client |
| wrapAnthropic(client) | AnthropicProvider | Wrap an Anthropic SDK client |
| wrapGemini(client) | GeminiProvider | Wrap a Google Generative AI client |
| guard.openai | OpenAIProvider | Access the wrapped OpenAI provider |
| guard.anthropic | AnthropicProvider | Access the wrapped Anthropic provider |
| guard.gemini | GeminiProvider | Access the wrapped Gemini provider |
| getStats(ctx?) | Promise<BudgetStats[]> | Get usage stats for all applicable scopes |
| getRemainingBudget(ctx?) | Promise<number> | Get minimum remaining tokens across scopes |
| reset(ctx?) | Promise<void> | Reset usage counters |
| getBudgetManager() | BudgetManager | Access the underlying budget manager |

Provider .chat() Method

All providers (OpenAI, Anthropic, Gemini) have the same interface:

await guard.openai.chat(params, context?)

| Parameter | Type | Description |
| --- | --- | --- |
| params.model | string | Model name (e.g. 'gpt-4o', 'claude-sonnet-4-20250514') |
| params.messages | ChatMessage[] | Array of { role, content } messages |
| params.max_tokens | number | Max output tokens (default: 4096) |
| context.userId | string? | User identifier for per-user budgets |
| context.sessionId | string? | Session identifier for per-session budgets |
| context.route | string? | Route/endpoint for per-route budgets |

BudgetStats Object

{
  scope: 'global' | 'user' | 'session' | 'route',
  scopeKey: string,     // e.g. "daily", "user:user-123"
  used: number,         // tokens consumed
  limit: number,        // budget cap
  remaining: number,    // tokens left
  percentage: number    // 0-100+
}

BudgetExceededError

err.message   // Human-readable error string
err.stats     // BudgetStats object with full details
err.name      // 'BudgetExceededError'

Exports

// Core
import { LLMGuard, BudgetManager, BudgetExceededError } from 'llm-spend-guard';

// Providers
import { OpenAIProvider, AnthropicProvider, GeminiProvider } from 'llm-spend-guard';

// Storage
import { MemoryStorage, RedisStorage } from 'llm-spend-guard';

// Middleware
import { expressMiddleware, budgetErrorHandler, withBudgetGuard } from 'llm-spend-guard';

// Utilities
import { estimateTokens, estimateMessagesTokens, truncateMessages } from 'llm-spend-guard';

// Types
import type {
  GuardConfig, BudgetConfig, BudgetStats, BudgetScope,
  AlertLevel, StorageAdapter, ScopeUsage, RequestContext,
  ChatMessage, TokenEstimatorFn,
} from 'llm-spend-guard';

Comparison with Alternatives

| Feature | llm-spend-guard | Manual tracking | OpenAI Usage Limits |
| --- | --- | --- | --- |
| Pre-request blocking | Yes | No | No (post-hoc only) |
| Multi-provider support | OpenAI + Claude + Gemini | Manual per SDK | OpenAI only |
| Per-user budgets | Built-in | Build yourself | No |
| Per-session / per-route scopes | Built-in | Build yourself | No |
| Auto-truncation | Yes | No | No |
| Express/Next.js middleware | Built-in | Build yourself | No |
| Redis support | Built-in | Build yourself | No |
| Self-hosted | Yes | Yes | No (vendor dashboard) |

Running Tests

git clone <repo-url>
cd llm-spend-guard
npm install
npm test

108 tests (99% coverage) covering:

  • Budget overflow and enforcement (global, daily, per-request limits)
  • Per-user, per-session, per-route scopes
  • Token estimation accuracy (tiktoken + heuristic)
  • Context truncation logic (system messages, binary search trimming)
  • All provider wrappers — OpenAI, Anthropic, Gemini (mocked, no API keys needed)
  • Auto-truncation across all providers
  • Alert callback firing and deduplication
  • Guard lifecycle (create, wrap, reset)
  • Express middleware and Next.js wrapper
  • Error handling (BudgetExceededError, budget error handler)
  • Storage backends (MemoryStorage, RedisStorage with mock)

Contributing

We welcome contributions! Please read the Contributing Guide before submitting a PR.

Look for issues labeled good first issue to get started.

Security

To report vulnerabilities, please see our Security Policy.

Support

If this package helps you, consider supporting its development:

GitHub Sponsors Buy Me a Coffee

Contributors


License

MIT — Made by Ali Raza

Package last updated on 04 Apr 2026
