
tokenometer
Tokenometer CLI — LLM token cost + latency benchmarking across Claude, GPT-4o, Gemini, Mistral, and Cohere. Multi-format, empirical mode, vision tokens, SARIF output.
Empirical token-cost + latency benchmarking for LLM prompts. Tells you what your prompt actually costs and how fast each provider responds across Claude, GPT-4o, Gemini, Mistral, and Cohere — in every format.
See the root README for findings, methodology, and the full project overview.
Live playground: tokenometer.dev · Source · MIT
npx tokenometer ./prompt.md --model claude-opus-4-7,gpt-4o
model format tokens est. cost tokenizer
--------------- -------- ------ --------- --------------
claude-opus-4-7 json ~78 $0.001170 cl100k_base
claude-opus-4-7 yaml ~84 $0.001260 cl100k_base
gpt-4o json 77 $0.000192 o200k_base
gpt-4o yaml 83 $0.000208 o200k_base
Cheapest: gpt-4o as json ($0.000192)
Priciest: claude-opus-4-7 as yaml ($0.001260, 6.74x more)
A leading ~ marks an approximate count (offline mode for Claude / Gemini / Mistral-Tekken / Cohere, since none of those vendors publishes a public production tokenizer that ships in JS).
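The Cheapest / Priciest summary lines reduce to simple arithmetic: each cell's cost is its token count times the model's per-token input price, and the ratio compares the two extremes. Below is a minimal sketch of that calculation. It assumes a hypothetical `Cell` shape and per-million input rates of $15 (Opus) and $2.50 (gpt-4o) back-computed from the sample table above. These rates are illustrative only and do not come from tokenometer's pricing registry.

```typescript
// Hypothetical cell shape for illustration; not tokenometer's actual type.
interface Cell {
  model: string;
  format: string;
  tokens: number;
  usdPerMillionInput: number; // assumed input price in USD per 1M tokens
}

// Input cost = tokens × (price per million / 1,000,000).
function inputCost(cell: Cell): number {
  return (cell.tokens * cell.usdPerMillionInput) / 1_000_000;
}

// Scan all cells once, tracking the cheapest and priciest by input cost.
function extremes(cells: Cell[]): { cheapest: Cell; priciest: Cell; ratio: number } {
  if (cells.length === 0) throw new Error("no cells to compare");
  let cheapest = cells[0];
  let priciest = cells[0];
  for (const c of cells) {
    if (inputCost(c) < inputCost(cheapest)) cheapest = c;
    if (inputCost(c) > inputCost(priciest)) priciest = c;
  }
  return { cheapest, priciest, ratio: inputCost(priciest) / inputCost(cheapest) };
}

// Token counts from the sample output; prices are the back-computed assumptions above.
const cells: Cell[] = [
  { model: "claude-opus-4-7", format: "json", tokens: 78, usdPerMillionInput: 15 },
  { model: "claude-opus-4-7", format: "yaml", tokens: 84, usdPerMillionInput: 15 },
  { model: "gpt-4o", format: "json", tokens: 77, usdPerMillionInput: 2.5 },
  { model: "gpt-4o", format: "yaml", tokens: 83, usdPerMillionInput: 2.5 },
];

const { cheapest, priciest, ratio } = extremes(cells);
console.log(`Cheapest: ${cheapest.model} as ${cheapest.format} ($${inputCost(cheapest).toFixed(6)})`);
console.log(`Priciest: ${priciest.model} as ${priciest.format} (${ratio.toFixed(2)}x more)`);
```

Because the rates are back-computed, the ratio this prints is close to the one in the sample output but is not guaranteed to match it.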
| Flag | Default | Notes |
|---|---|---|
| --model <id[,id…]> | claude-opus-4-7 (or auto-detected) | Any registered model id (63 across 5 providers). |
| --format <fmt[,fmt…]> | json,yaml,xml,markdown,text | Subset of supported formats. |
| --output <fmt> | table | One of table, json, or sarif. |
| --by-file | off | Append a per-file token/USD table (multi-file only). |
| --image <path> | none | Add vision-token cost for the image (repeatable). |
| --config <path> | none | Load this exact config file (skips walk-up). |
| --no-config | off | Skip .tokenometer.yml loading entirely. |
| --empirical | off | Use provider countTokens APIs (free, exact). |
| --latency | off | Measure real generation latency (TTFT, total ms, tokens/sec). Implies --empirical. |
| --latency-trials <n> | 3 | Trials per cell when --latency is set (1–10). |
| --max-spend <usd> | 0.05 (or 0.25 with --latency) | Hard ceiling for empirical / latency mode. |
| --offline | off | Force offline path (overrides --empirical). |
| -h, --help | | Print help. |
| -v, --version | | Print version. |
tokenometer <file> [options]
echo "prompt" | tokenometer - [options]
63 models across 5 providers. Run tokenometer --help for the full list at runtime, or browse the Cost Atlas for sortable per-model pages.
| Provider | Examples | Offline tokenizer | Empirical |
|---|---|---|---|
| Anthropic | claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5, Claude 3.x family | gpt-tokenizer cl100k_base (approximate) | messages.countTokens (free, exact) |
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, o1 family | gpt-tokenizer o200k_base (exact) | same o200k_base (matches production) |
| Google | gemini-2.5-pro, gemini-2.5-flash, gemini-1.5-pro, gemini-1.5-flash | chars / 4 (approximate) | model.countTokens (free, exact) |
| Mistral (19 models) | open-mistral-7b, open-mixtral-8x22b, mistral-large-latest, codestral-latest, mistral-nemo, pixtral-large-latest, mistral-medium-2505, magistral-small, ministral-3b-latest, devstral-small-2505 | mistral-tokenizer-js for SentencePiece V1/V2/V3 (exact); chars/4 for Tekken (approximate) | unsupported (no public token-count API) |
| Cohere | command-r-08-2024, command-r-plus-08-2024 | chars / 4 (approximate) | POST /v1/tokenize (free, exact, requires COHERE_API_KEY) |
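Several rows above fall back to a chars / 4 heuristic offline, and those counts are shown with a leading ~. The following is a minimal sketch of that heuristic and the display rule. Rounding up, and measuring "chars" as JavaScript string length (UTF-16 code units), are assumptions rather than documented tokenometer behavior.

```typescript
// Offline fallback: roughly 4 characters per token.
// Assumption: rounds up, and "chars" means UTF-16 code units (string.length).
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Display rule from the output table: a leading "~" marks an approximate count.
function renderCount(tokens: number, approximate: boolean): string {
  return approximate ? `~${tokens}` : String(tokens);
}

const prompt = "Summarize this document in three bullet points.";
console.log(renderCount(approxTokens(prompt), true));
```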
Pricing comes from the tokenlens registry with a small set of local overrides for bleeding-edge models. Cohere pricing lives entirely in LOCAL_OVERRIDES because @tokenlens/models doesn't yet ship a Cohere catalog at v1.3.0.
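The pricing lookup described above amounts to "local override first, registry second." Below is a minimal sketch of that precedence. The `Price` shape and the `resolvePrice` helper are hypothetical (they are not tokenometer's or tokenlens's types), and the numbers are placeholders, not real prices.

```typescript
// Hypothetical price record; field names are illustrative only.
interface Price {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Own-property lookup so ids like "constructor" don't hit Object.prototype.
function lookup(table: Record<string, Price>, model: string): Price | undefined {
  return Object.prototype.hasOwnProperty.call(table, model) ? table[model] : undefined;
}

// LOCAL_OVERRIDES win; otherwise fall back to the registry entry, if any.
function resolvePrice(
  model: string,
  registry: Record<string, Price>,
  overrides: Record<string, Price>,
): Price | undefined {
  return lookup(overrides, model) ?? lookup(registry, model);
}

// Placeholder numbers for illustration only; these are NOT real prices.
const registry: Record<string, Price> = {
  "gpt-4o": { inputPerMillion: 1, outputPerMillion: 2 },
};
const LOCAL_OVERRIDES: Record<string, Price> = {
  "command-r-plus-08-2024": { inputPerMillion: 3, outputPerMillion: 4 },
};

console.log(resolvePrice("command-r-plus-08-2024", registry, LOCAL_OVERRIDES));
```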
For exact, vendor-billed counts on Claude, Gemini, and Cohere, set the right env var and pass --empirical. The tool calls each provider's free countTokens-equivalent endpoint — no charge.
ANTHROPIC_API_KEY=… GOOGLE_API_KEY=… COHERE_API_KEY=… \
npx tokenometer ./prompt.md --empirical --model claude-opus-4-7,gemini-2.5-pro,command-r-plus-08-2024
OpenAI's empirical path runs the o200k_base encoding locally; that encoding matches OpenAI's production count exactly, so no API call is needed. Mistral has no public token-count endpoint, so the offline path (mistral-tokenizer-js, or chars / 4 for Tekken models) is used regardless.
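For reference, the two HTTP count endpoints mentioned above can be described as plain request objects. This sketch follows the vendors' public API docs (Anthropic `POST /v1/messages/count_tokens`, which returns `{ input_tokens }`; Cohere `POST /v1/tokenize`, which returns `{ tokens, token_strings }`). It is not tokenometer's own code. The helper names and the `HttpRequest` shape are hypothetical, and the model ids are simply the ones this README uses.

```typescript
// Minimal request description; deliberately not RequestInit so no DOM typings are needed.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Anthropic token counting: free endpoint, response body is { input_tokens: number }.
function anthropicCountTokensRequest(apiKey: string, model: string, prompt: string): HttpRequest {
  return {
    url: "https://api.anthropic.com/v1/messages/count_tokens",
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  };
}

// Cohere tokenize: the token count is the length of the returned `tokens` array.
function cohereTokenizeRequest(apiKey: string, model: string, prompt: string): HttpRequest {
  return {
    url: "https://api.cohere.com/v1/tokenize",
    method: "POST",
    headers: {
      authorization: `Bearer ${apiKey}`,
      "content-type": "application/json",
    },
    body: JSON.stringify({ text: prompt, model }),
  };
}

const req = anthropicCountTokensRequest("sk-ant-placeholder", "claude-opus-4-7", "Hello");
// With Node 18+ global fetch, sending it would look like:
//   const res = await fetch(req.url, req);
//   const { input_tokens } = await res.json();
```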
When --model is omitted, tokenometer picks a default based on which provider key is set in your environment:
- ANTHROPIC_API_KEY only → claude-opus-4-7
- OPENAI_API_KEY only → gpt-4o
- GOOGLE_API_KEY / GEMINI_API_KEY only → first known gemini-* model (falls back to gemini-2.5-pro)
- MISTRAL_API_KEY only → first known mistral-* model
- COHERE_API_KEY only → command-r-plus-08-2024
- More than one key set → claude-opus-4-7, with a stderr note. Pass --model to disambiguate.
- No key set → the built-in default (claude-opus-4-7).
This means npx tokenometer prompt.md does the right thing in any of those environments without you having to remember model names.
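The auto-detection rules above can be sketched as a pure function over the environment. The function name, the `knownModels` parameter, and the note text are hypothetical. The multiple-key and no-key cases fall back to claude-opus-4-7 as described. The README gives no fallback for a Mistral key when no mistral-* model is known, so skipping that provider in that case is this sketch's own assumption. In Node you would pass `process.env`.

```typescript
type Env = Record<string, string | undefined>;

interface DefaultChoice {
  model: string;
  note?: string; // would be printed to stderr
}

const FALLBACK_MODEL = "claude-opus-4-7";

function detectDefaultModel(env: Env, knownModels: string[]): DefaultChoice {
  const has = (key: string): boolean => Boolean(env[key]);
  const firstWithPrefix = (prefix: string): string | undefined =>
    knownModels.find((m) => m.startsWith(prefix));

  // Collect one candidate model per provider whose key is present.
  const candidates: string[] = [];
  if (has("ANTHROPIC_API_KEY")) candidates.push("claude-opus-4-7");
  if (has("OPENAI_API_KEY")) candidates.push("gpt-4o");
  if (has("GOOGLE_API_KEY") || has("GEMINI_API_KEY")) {
    candidates.push(firstWithPrefix("gemini-") ?? "gemini-2.5-pro");
  }
  if (has("MISTRAL_API_KEY")) {
    // Assumption: no documented fallback, so skip when no mistral-* model is known.
    const mistral = firstWithPrefix("mistral-");
    if (mistral !== undefined) candidates.push(mistral);
  }
  if (has("COHERE_API_KEY")) candidates.push("command-r-plus-08-2024");

  if (candidates.length === 1) return { model: candidates[0] };
  if (candidates.length > 1) {
    return { model: FALLBACK_MODEL, note: "multiple provider keys set; pass --model to disambiguate" };
  }
  return { model: FALLBACK_MODEL };
}
```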
.tokenometer.yml config
Drop a .tokenometer.yml (or .yaml) at the project root and tokenometer will pick it up automatically (it walks up from the cwd, stopping at .git):
models: [claude-opus-4-7, gpt-4o, mistral-large-latest]
formats: [json, yaml, markdown]
paths: [prompts/**/*.md]
budgets:
  total: 0.50
  per-file: 0.10
User-passed CLI flags always win over config defaults. Use --config <path> to load an explicit file (skips the walk-up). Use --no-config to skip config loading entirely.
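The lookup and precedence rules above can be sketched as two small functions: a walk-up search that checks each directory for .tokenometer.yml / .yaml and stops once it has checked the directory containing .git, and a merge in which CLI values override config values. Both helper names and the `TokenometerOptions` shape are hypothetical. The sketch assumes POSIX-style absolute paths and takes an injected `exists` callback so it stays pure. In Node you could pass `fs.existsSync`.

```typescript
// Walk up from cwd; return the first config found, or undefined once the repo root
// (a directory containing .git) or the filesystem root has been checked.
function findConfig(cwd: string, exists: (p: string) => boolean): string | undefined {
  let dir = cwd.replace(/\/+$/, "") || "/";
  const join = (d: string, name: string): string => (d === "/" ? `/${name}` : `${d}/${name}`);
  for (;;) {
    for (const name of [".tokenometer.yml", ".tokenometer.yaml"]) {
      const candidate = join(dir, name);
      if (exists(candidate)) return candidate;
    }
    if (exists(join(dir, ".git")) || dir === "/") return undefined;
    dir = dir.slice(0, dir.lastIndexOf("/")) || "/"; // move to the parent directory
  }
}

// Hypothetical subset of options; any value passed on the CLI wins over the config file.
interface TokenometerOptions {
  models?: string[];
  formats?: string[];
}

function mergeOptions(config: TokenometerOptions, cli: TokenometerOptions): TokenometerOptions {
  return {
    models: cli.models ?? config.models,
    formats: cli.formats ?? config.formats,
  };
}
```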
The --output flag picks the display format (separate from --format, which controls how the prompt body is converted before tokenization):
- --output table (default): the human-readable per-cell table shown above.
- --output json: emits a TokenometerResult JSON shape, { files: [{ path, results: [...] }] }, with one entry per input file. Pipe to jq for filtering.
- --output sarif: emits SARIF 2.1.0 with one result per (file, model, format) cell. Drop the file into GitHub Code Scanning or any SARIF viewer.
npx tokenometer ./prompt.md --output sarif > tokenometer.sarif
npx tokenometer ./prompt.md --output json | jq '.files[].results | map(.inputCost) | add'
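The jq filter above prints one summed input cost per file. The same roll-up in TypeScript looks like the sketch below. It assumes a minimal slice of the TokenometerResult shape. Only `files`, `path`, `results`, and `inputCost` appear in this README; `model` and `format` are assumed field names. Feed it `JSON.parse` of the `--output json` stdout.

```typescript
// Assumed minimal slice of TokenometerResult; only the fields used here.
interface ResultCell {
  model: string; // assumed field name
  format: string; // assumed field name
  inputCost: number;
}
interface TokenometerResult {
  files: { path: string; results: ResultCell[] }[];
}

// Equivalent of: jq '.files[].results | map(.inputCost) | add', keyed by file path.
function totalInputCostPerFile(result: TokenometerResult): Map<string, number> {
  const totals = new Map<string, number>();
  for (const file of result.files) {
    totals.set(file.path, file.results.reduce((sum, r) => sum + r.inputCost, 0));
  }
  return totals;
}

// Made-up sample data for illustration.
const sample: TokenometerResult = {
  files: [
    { path: "a.md", results: [
      { model: "gpt-4o", format: "json", inputCost: 0.25 },
      { model: "gpt-4o", format: "yaml", inputCost: 0.5 },
    ] },
    { path: "b.md", results: [{ model: "gpt-4o", format: "json", inputCost: 0.125 }] },
  ],
};
const perFile = totalInputCostPerFile(sample);
```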
--latency measures real generation latency in addition to token cost. For each (model, format) cell, tokenometer streams n real chat completions (default n=3, override with --latency-trials 1..10) capped at max_tokens=200, and reports:
- TTFT: time to first token (ms)
- total: total generation time (ms)
- tokens/sec: output_tokens / (total - ttft)
Numbers are reported as p50 / p95 / mean over the trials. Full per-trial data is included in --output json.
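The per-cell latency statistics can be sketched as follows. Two things here are assumptions: the nearest-rank percentile method (the README does not say how p50 / p95 are computed) and the conversion of the (total - ttft) interval from milliseconds to seconds. The trial numbers are made up for illustration.

```typescript
interface Trial {
  ttftMs: number; // time to first token
  totalMs: number; // total generation time
  outputTokens: number;
}

// tokens/sec = output_tokens / (total - ttft), with the interval converted to seconds.
function tokensPerSec(t: Trial): number {
  const generationSeconds = (t.totalMs - t.ttftMs) / 1000;
  return generationSeconds > 0 ? t.outputTokens / generationSeconds : 0;
}

// Nearest-rank percentile (assumed method).
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function summarize(values: number[]): { p50: number; p95: number; mean: number } {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  return { p50: percentile(values, 50), p95: percentile(values, 95), mean };
}

// Three made-up trials (the default --latency-trials is 3).
const trials: Trial[] = [
  { ttftMs: 200, totalMs: 2200, outputTokens: 200 },
  { ttftMs: 300, totalMs: 1300, outputTokens: 200 },
  { ttftMs: 250, totalMs: 4250, outputTokens: 200 },
];
const ttftStats = summarize(trials.map((t) => t.ttftMs));
const tpsStats = summarize(trials.map(tokensPerSec));
```

With only three trials, the nearest-rank p95 is simply the slowest (or, for tokens/sec, the highest) trial, so p95 becomes more meaningful as --latency-trials grows.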
ANTHROPIC_API_KEY=… OPENAI_API_KEY=… \
npx tokenometer ./prompt.md --latency --model claude-opus-4-7,gpt-4o
--latency implies --empirical (offline mode can't measure real latency). The default --max-spend ceiling is bumped from $0.05 to $0.25 to cover the n × 200-token generations; pass --max-spend explicitly to override.
Supported providers: Anthropic (messages.stream), OpenAI (/v1/chat/completions SSE), Google (generateContentStream), Cohere (/v1/chat NDJSON), Mistral (/v1/chat/completions SSE). Each trial retries once on transient failures.
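Latency mode spends real money because every trial generates up to 200 output tokens. One way to sanity-check a run against --max-spend is a worst-case bound: cells × trials × (input tokens at the input rate + 200 tokens at the output rate). This is a hypothetical sketch, not necessarily how tokenometer enforces its ceiling. The helper name is invented, and the prices in the example are placeholders.

```typescript
interface LatencyCell {
  inputTokens: number;
  inputPerMillion: number; // USD per 1M input tokens (placeholder values below)
  outputPerMillion: number; // USD per 1M output tokens
}

// Upper bound: assumes every trial hits the max_tokens cap on output.
function worstCaseLatencySpend(cells: LatencyCell[], trials: number, maxOutputTokens = 200): number {
  let usd = 0;
  for (const c of cells) {
    const perTrial = (c.inputTokens * c.inputPerMillion + maxOutputTokens * c.outputPerMillion) / 1_000_000;
    usd += perTrial * trials;
  }
  return usd;
}

// Two cells, 100 input tokens each, placeholder prices; compare against the $0.25 latency default.
const latencyCells: LatencyCell[] = [
  { inputTokens: 100, inputPerMillion: 3, outputPerMillion: 15 },
  { inputTokens: 100, inputPerMillion: 3, outputPerMillion: 15 },
];
const worstCase = worstCaseLatencySpend(latencyCells, 3);
console.log(worstCase <= 0.25 ? "within --max-spend" : "would exceed --max-spend");
```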
--by-file appends a per-file token + USD summary table when you pass multiple input files (single-file inputs are a no-op):
By file:
File Tokens USD
──────────────── ─────── ───────
prompts/agent.md 1,243 $0.0186
prompts/router.md 872 $0.0131
Useful for figuring out which prompt files dominate the cost of a multi-file pipeline. The aggregator that produces this table is also what powers the GitHub Action's per-file Δ comment, and is unit-tested in packages/action.
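The per-file roll-up is a group-by-and-sum over cells. Below is a minimal sketch that mirrors the documented no-op for single-file inputs. The row shape, the function name, and the descending-USD sort order are assumptions, not the aggregator in packages/action.

```typescript
// Hypothetical row shape: one entry per (file, model, format) cell you want to roll up.
interface CellRow {
  file: string;
  tokens: number;
  usd: number;
}

interface FileSummary {
  file: string;
  tokens: number;
  usd: number;
}

function byFile(rows: CellRow[]): FileSummary[] {
  const acc = new Map<string, FileSummary>();
  for (const r of rows) {
    const s = acc.get(r.file) ?? { file: r.file, tokens: 0, usd: 0 };
    s.tokens += r.tokens;
    s.usd += r.usd;
    acc.set(r.file, s);
  }
  // Single-file input: the table is a no-op.
  if (acc.size < 2) return [];
  // Assumed ordering: most expensive file first.
  return Array.from(acc.values()).sort((a, b) => b.usd - a.usd);
}

const summary = byFile([
  { file: "a.md", tokens: 100, usd: 0.25 },
  { file: "a.md", tokens: 50, usd: 0.5 },
  { file: "b.md", tokens: 10, usd: 0.125 },
]);
```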
Pass --image <path> (repeatable) to factor image-based vision tokens into the cost estimate alongside your prompt text:
npx tokenometer ./prompt.md --image ./screenshot.png --image ./diagram.jpg
Each image's dimensions are read with image-size (no native deps), then dispatched to the provider-specific vision-token estimator:
- Anthropic: (width × height) / 750, capped at 1600 tokens.
- OpenAI: 85 + 170 × ceil(w/512) × ceil(h/512) after the 2048/768 resize step.
- Google: 258 × ceil(w/768) × ceil(h/768) (with a flat 258 for ≤384×384 images).
Mistral and Cohere don't have published vision-token formulas, so images are skipped for those providers (with a stderr note). Vision-token cells are always marked approximate: true since they're formula-derived. Each image also gets its own row in the --by-file table as a virtual file <image-path> [vision].
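The three estimators transcribe directly into code. In this sketch, the rounding choices (ceil for Anthropic, floor after each OpenAI resize step) and the exact resize behavior (downscale to fit 2048 × 2048, then shrink so the shortest side is at most 768) are assumptions. The function names are hypothetical. Image dimensions would come from image-size, which is not shown here.

```typescript
// Anthropic: (w × h) / 750, capped at 1600 (rounding up is an assumption).
function anthropicVisionTokens(w: number, h: number): number {
  return Math.min(1600, Math.ceil((w * h) / 750));
}

// OpenAI: fit within 2048 × 2048, shrink shortest side to 768, then 85 + 170 per 512px tile.
function openaiVisionTokens(w: number, h: number): number {
  const fit = Math.min(1, 2048 / Math.max(w, h)); // downscale only
  const fw = w * fit;
  const fh = h * fit;
  const shrink = Math.min(1, 768 / Math.min(fw, fh)); // downscale only
  const sw = Math.floor(fw * shrink);
  const sh = Math.floor(fh * shrink);
  return 85 + 170 * Math.ceil(sw / 512) * Math.ceil(sh / 512);
}

// Google: flat 258 for images up to 384 × 384, else 258 per 768px tile.
function googleVisionTokens(w: number, h: number): number {
  if (w <= 384 && h <= 384) return 258;
  return 258 * Math.ceil(w / 768) * Math.ceil(h / 768);
}

// Dispatch by provider; undefined means "skip this image" (Mistral, Cohere).
function visionTokens(provider: string, w: number, h: number): number | undefined {
  switch (provider) {
    case "anthropic":
      return anthropicVisionTokens(w, h);
    case "openai":
      return openaiVisionTokens(w, h);
    case "google":
      return googleVisionTokens(w, h);
    default:
      return undefined;
  }
}
```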
Why not tiktoken?
tiktoken's cl100k_base (the encoding most "Claude tokenizer" libraries fall back on) under-counts Opus 4.7 by a median of 62% across a 10-prompt benchmark. Sonnet 4.6 and Haiku 4.5 are closer (~17%). Format choice is a wash; model choice swings cost by 12×. See the root README for the dataset findings.
MIT