
@amplitude/ai
Agent analytics for Amplitude. Track every LLM call, user message, tool call, and quality signal as events in your Amplitude project — then build funnels, cohorts, and retention charts across AI and product behavior.
npm install @amplitude/ai @amplitude/analytics-node
import { AmplitudeAI, OpenAI } from '@amplitude/ai';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const openai = new OpenAI({ amplitude: ai, apiKey: process.env.OPENAI_API_KEY });
const agent = ai.agent('my-agent');
app.post('/chat', async (req, res) => {
  const session = agent.session({ userId: req.userId, sessionId: req.sessionId });
  const result = await session.run(async (s) => {
    s.trackUserMessage(req.body.message);
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: req.body.messages,
    });
    return response.choices[0].message.content;
  });
  await ai.flush();
  res.json({ response: result });
});
// Events: [Agent] User Message, [Agent] AI Response (with model, tokens, cost, latency),
// [Agent] Session Start, [Agent] Session End — all tied to userId and sessionId
npm install @amplitude/ai
npx amplitude-ai
The CLI prints a prompt to paste into any AI coding agent (Cursor, Claude Code, Windsurf, Copilot, Codex, etc.):
Instrument this app with @amplitude/ai. Follow node_modules/@amplitude/ai/amplitude-ai.md
The agent reads the guide, scans your project, discovers your agents and LLM call sites, and instruments everything — provider wrappers, session lifecycle, multi-agent delegation, tool tracking, scoring, and a verification test. You review and approve each step.
Whether you use a coding agent or set up manually, the goal is the same: full instrumentation — agents + sessions + provider wrappers. This gives you every event type, per-user analytics, and server-side enrichment.
Follow the code example above to get started. The pattern is:
- `import { OpenAI } from '@amplitude/ai'` (or `Anthropic`, `Gemini`, etc.)
- `ai.agent('my-agent')` to name and track your AI component
- `agent.session({ userId, sessionId }).run(async (s) => { ... })` for per-user analytics, funnels, cohorts, and server-side enrichment
- `s.trackUserMessage(...)` for conversation context
- `s.score(...)` for quality measurement
`patch()` exists for quick verification or legacy codebases where you can't modify call sites, but it only captures `[Agent] AI Response` without user identity — no funnels, no cohorts, no retention. Start with full instrumentation; fall back to `patch()` only if you can't modify call sites.
| Property | Value |
|---|---|
| Name | @amplitude/ai |
| Version | 0.3.9 |
| Runtime | Node.js |
| Peer dependency | @amplitude/analytics-node >= 1.3.0 |
| Optional peers | openai, @anthropic-ai/sdk, @google/generative-ai, @mistralai/mistralai, @aws-sdk/client-bedrock-runtime, @pydantic/genai-prices (cost), tiktoken or js-tiktoken (token counting) |
npm install @amplitude/ai @amplitude/analytics-node
Install provider SDKs based on what you use (for example: openai, @anthropic-ai/sdk, @google/generative-ai, @mistralai/mistralai, @aws-sdk/client-bedrock-runtime).
1. `npm install @amplitude/ai @amplitude/analytics-node`
2. Run `npx amplitude-ai` and paste the printed prompt into your AI coding agent. Or follow the manual setup steps — the goal is the same: agents + sessions + provider wrappers.
3. Add your API keys to a `.env` file and replace the placeholder `userId`/`sessionId`.
4. Look for `[Agent] User Message`, `[Agent] AI Response`, and `[Agent] Session End` within 30 seconds.

To verify locally before checking Amplitude, add `debug: true`:
const ai = new AmplitudeAI({
  apiKey: process.env.AMPLITUDE_AI_API_KEY!,
  config: new AIConfig({ debug: true }),
});
// Prints: [amplitude-ai] [Agent] AI Response | model=gpt-4o | tokens=847 | cost=$0.0042 | latency=1,203ms
Tip: Call `enableLivePriceUpdates()` at startup so cost tracking stays accurate when new models are released. See Cache-Aware Cost Tracking.
| Area | Status |
|---|---|
| Runtime | Node.js only (no browser). Python SDK available separately (amplitude-ai on PyPI). |
| Zero-code patching | OpenAI, Anthropic, Azure OpenAI, Gemini, Mistral, Bedrock (Converse/ConverseStream only). |
| CrewAI | Python-only; the Node.js export throws ProviderError by design. Use LangChain or OpenTelemetry integrations instead. |
| OTEL scope filtering | Not yet supported (Python SDK has allowed_scopes/blocked_scopes). |
| Streaming cost tracking | Automatic for OpenAI and Anthropic. Manual token counts required for other providers' streamed responses. |
Yes, if you're building an AI-powered feature (chatbot, copilot, agent, RAG pipeline) and you want to measure how it impacts real user behavior. AI events land in the same Amplitude project as your product events, so you can build funnels from "user asks a question" to "user converts," create cohorts of users with low AI quality scores, and measure retention without stitching data across tools.
Already using an LLM observability tool? Keep it. The OTEL bridge adds Amplitude as a second destination in one line. Your existing traces stay, and you get product analytics on top.
Most AI observability tools give you traces. This SDK gives you per-turn events that live in your product analytics so you can:
The structural difference is the event model. Trace-centric tools typically produce spans per LLM call. This SDK produces one event per conversation turn with 40+ properties: model, tokens, cost, latency, reasoning, implicit feedback signals (regeneration, copy, abandonment), cache breakdowns, agent hierarchy, and experiment context. Each event is independently queryable in Amplitude's charts, cohorts, funnels, and retention analysis.
Every AI event carries your product user_id. No separate identity system, no data joining required. Build a funnel from "user opens chat" to "AI responds" to "user upgrades" directly in Amplitude.
Server-side enrichment does the evals for you. When content is available (contentMode: 'full'), Amplitude's enrichment pipeline runs automatically on every session after it closes. You get topic classifications, quality rubrics, behavioral flags, and session outcomes without writing or maintaining any eval code. Define your own topics and scoring rubrics; the pipeline applies them to every session automatically. Results appear as [Agent] Score events with rubric scores, [Agent] Topic Classification events with category labels, and [Agent] Session Evaluation summaries, all queryable in charts, cohorts, and funnels alongside your product events.
Quality signals from every source in one event type. User thumbs up/down (source: 'user'), automated rubric scores from the enrichment pipeline (source: 'ai'), and reviewer assessments (source: 'reviewer') all produce [Agent] Score events differentiated by [Agent] Evaluation Source. One chart shows all three side by side. Filter by source or view them together. Filter by [Agent] Agent ID for per-agent quality attribution.
Three content-control tiers. full sends content and Amplitude runs enrichments for you. metadata_only sends zero content (you still get cost, latency, tokens, session grouping). customer_enriched sends zero content but lets you provide your own structured labels via trackSessionEnrichment().
Cache-aware cost tracking. Pass cacheReadTokens and cacheCreationTokens for accurate blended costs. Without this breakdown, naive cost calculation can overestimate by 2-5x for cache-heavy workloads.
Once AI events are in Amplitude alongside your product events, you can build funnels, cohorts, and retention charts that combine AI and product behavior, and measure success through `[Agent] Overall Outcome` or a task completion score.

The SDK captures quality signals at three layers, from most direct to most comprehensive:
1. Explicit user feedback — Instrument thumbs up/down, star ratings, or CSAT scores via trackScore(). Each call produces an [Agent] Score event with source: 'user':
ai.trackScore({
  userId: 'u1', name: 'user-feedback', value: 1,
  targetId: aiMessageId, targetType: 'message', source: 'user',
});
2. Implicit behavioral signals — The SDK auto-tracks behavioral proxies for quality on every turn, with zero additional instrumentation:
| Signal | Property | Event | Interpretation |
|---|---|---|---|
| Copy | [Agent] Was Copied | [Agent] AI Response | User copied the output — positive |
| Regeneration | [Agent] Is Regeneration | [Agent] User Message | User asked for a redo — negative |
| Edit | [Agent] Is Edit | [Agent] User Message | User refined their prompt — friction |
| Abandonment | [Agent] Abandonment Turn | [Agent] Session End | User left after N turns — potential failure |
3. Automated server-side evaluation — When contentMode: 'full', Amplitude's enrichment pipeline runs LLM-as-judge evaluators on every session after it closes. No eval code to write or maintain:
| Rubric | What it measures | Scale |
|---|---|---|
| `task_completion` | Did the agent accomplish what the user asked? | 0–2 |
| `response_quality` | Was the response clear, accurate, and helpful? | 0–2 |
| `user_satisfaction` | Did the user seem satisfied based on conversation signals? | 0–2 |
| `agent_confusion` | Did the agent misunderstand or go off track? | 0–2 |
Plus boolean detectors: negative_feedback (frustration phrases), task_failure (agent failed to deliver), data_quality_issues, and behavioral_patterns (clarification loops, topic drift). All results are emitted as [Agent] Score events with source: 'ai'.
All three layers use the same [Agent] Score event type, differentiated by [Agent] Evaluation Source ('user', 'ai', or 'reviewer'). One chart shows user feedback alongside automated evals. No joins, no separate tables.
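Since `trackScore()` is the shared entry point for all sources, a reviewer-sourced score is simply the user-feedback call shown above with `source` switched (the `name` and `value` here are illustrative):

```typescript
// Hypothetical reviewer assessment — same trackScore() shape as the
// source: 'user' example, differentiated by [Agent] Evaluation Source.
ai.trackScore({
  userId: 'u1', name: 'human-review', value: 2,
  targetId: aiMessageId, targetType: 'message', source: 'reviewer',
});
```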
| You set | Where it comes from | What you unlock |
|---|---|---|
| API key | Amplitude project settings | Events reach Amplitude |
| userId | Your auth layer (JWT, session cookie, API token) | Per-user analytics, cohorts, retention |
| agentId | Your choice (e.g. 'chat-handler') | Per-agent cost, latency, quality dashboards |
| sessionId | Your conversation/thread/ticket ID | Multi-turn analysis, session enrichment, quality scores |
| description | Your choice (e.g. 'Handles support queries via GPT-4o') | Human-readable agent registry from event streams |
| contentMode + redactPii | Config (defaults work) | Server enrichment (automatic), PII scrubbing |
| model, tokens, cost | Auto-captured by provider wrappers | Cost analytics, latency monitoring |
| parentAgentId | Auto via child()/runAs() | Multi-agent hierarchy |
| env, agentVersion, context | Your deploy pipeline | Segmentation, regression detection |
The `contentMode` + `redactPii`, `model`/`tokens`/`cost`, and `parentAgentId` rows require zero developer effort — they're automatic or have sensible defaults.
The minimum viable setup is 4 fields: API key, userId, agentId, sessionId. Everything else is either automatic or a progressive enhancement.
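A minimal sketch of that four-field setup, using only the APIs shown in this document (the identifiers are placeholders):

```typescript
import { AmplitudeAI } from '@amplitude/ai';

// 1. API key — from Amplitude project settings
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });

// 2. agentId — your choice of name
const agent = ai.agent('chat-handler');

// 3 + 4. userId and sessionId — from your auth layer and conversation ID
const session = agent.session({ userId: 'user-123', sessionId: 'thread-456' });
await session.run(async (s) => {
  s.trackUserMessage('Hello');
});
```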
The coding agent workflow defaults to full instrumentation — the top row below. Lower levels exist as fallbacks, not as recommended starting points.
| Level | Events you get | What it unlocks in Amplitude |
|---|---|---|
| Full (agents + sessions + wrappers) | User Message, AI Response, Tool Call, Session Start/End, Score, Enrichments | Per-user funnels, cohorts, retention, session replay linking, quality scoring |
| Wrappers only (no sessions) | AI Response (with cost, tokens, latency) | Aggregate cost monitoring, model comparison |
| `patch()` only (no wrappers, no sessions) | AI Response (basic) | Aggregate call counts — useful for verification only |
Caveats:

- `patch()` is best-effort by installed SDK and provider surface; OpenAI Agents tracing depends on incoming span payload shape from the host SDK.
- `AmplitudeCrewAIHooks` is Python-only and throws in Node.js.

This section is the source of truth for behavior that is intentionally different from Python due to runtime constraints:

- `AmplitudeCrewAIHooks` is unsupported in Node.js (CrewAI is Python-only).
- `tool()` does not auto-generate JSON Schema from runtime type hints; pass `inputSchema` explicitly.
- Timeouts are `Promise.race`-based and cannot preempt synchronous CPU-bound code.
- Process-start auto-instrumentation hooks in differently (`node --import` in Node vs `sitecustomize` in Python).

`patch()` monkey-patches provider SDKs so existing LLM calls are tracked without code changes. This is useful for verifying the SDK works or for legacy codebases where you can't modify call sites. It only captures `[Agent] AI Response` without user identity — for the full event model, use agents + sessions (see Quick Start).
import { AmplitudeAI, patch } from '@amplitude/ai';
// OpenAI/Azure OpenAI chat completions (+ parse), OpenAI Responses, Anthropic, Gemini, Mistral,
// and Bedrock Converse calls are tracked when patching succeeds.
// No changes to your existing code needed.
import OpenAI from 'openai';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
patch({ amplitudeAI: ai });
const openai = new OpenAI();
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});
// ^ automatically tracked as [Agent] AI Response
Warning: Patched calls that fire outside an active session context are silently dropped — no event is emitted and no error is thrown. If you instrument with `patch()` but see no events, this is the most likely cause. Wrap your LLM calls in `session.run()`, use the Express middleware, or pass context explicitly. See Session and Middleware.
Or use the CLI to auto-patch at process start without touching application code:
AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true amplitude-ai-instrument node app.js
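If patched calls appear to vanish, the warning above applies: they must run inside session context. A minimal sketch combining `patch()` with `session.run()`:

```typescript
import { AmplitudeAI, patch } from '@amplitude/ai';
import OpenAI from 'openai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
patch({ amplitudeAI: ai });
const openai = new OpenAI();

// Inside session.run(), the patched call emits [Agent] AI Response;
// the same call outside any session context would be silently dropped.
const session = ai.agent('my-agent').session({ userId: 'user-123' });
await session.run(async () => {
  await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
  });
});
```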
Replace the provider constructor with the Amplitude-instrumented version for automatic tracking with full control over options per call:
import { AmplitudeAI, OpenAI } from '@amplitude/ai';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const openai = new OpenAI({
  amplitude: ai,
  apiKey: process.env.OPENAI_API_KEY,
});
const agent = ai.agent('my-agent', { userId: 'user-123' });
const session = agent.session();
await session.run(async () => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
  });
  // AI response tracked automatically via wrapper

  const responseV2 = await openai.responses.create({
    model: 'gpt-4.1',
    instructions: 'You are concise.',
    input: [{ role: 'user', content: 'Summarize this in one sentence.' }],
  });
  // OpenAI Responses API is also tracked automatically
});
Or wrap an existing client instance (supports OpenAI, Azure OpenAI, and Anthropic):
import { wrap } from '@amplitude/ai';
import OpenAI from 'openai';
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const instrumented = wrap(client, ai);
All provider constructors and wrap() accept either an AmplitudeAI instance or a raw Amplitude client — both work:
new OpenAI({ amplitude: ai }); // AmplitudeAI instance
new OpenAI({ amplitude: ai.amplitude }); // raw Amplitude client
wrap(client, ai); // AmplitudeAI instance
wrap(client, ai.amplitude); // raw Amplitude client
Note: `wrap()` only supports OpenAI, Azure OpenAI, and Anthropic clients. For Gemini, Mistral, and Bedrock, use the SDK's provider classes directly (e.g., `new Gemini({ amplitude: ai })`).
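For those providers the pattern mirrors the other wrappers; a sketch for Gemini (the constructor options are assumed to match the OpenAI wrapper shown earlier):

```typescript
import { AmplitudeAI, Gemini } from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });

// Instrumented Gemini client — calls made through it inside a session
// are tracked like any other provider wrapper.
const gemini = new Gemini({
  amplitude: ai,
  apiKey: process.env.GOOGLE_API_KEY,
});
```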
Call tracking methods directly for maximum flexibility. Works with any LLM provider, including custom or self-hosted models:
import { AmplitudeAI } from '@amplitude/ai';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const agent = ai.agent('my-agent', { userId: 'user-123' });
const session = agent.session({ userId: 'user-123' });
await session.run(async (s) => {
  s.trackUserMessage('Summarize this document');
  const start = performance.now();
  const response = await myCustomLLM.generate('Summarize this document');
  const latencyMs = performance.now() - start;
  s.trackAiMessage(response.text, 'my-model-v2', 'custom', latencyMs, {
    inputTokens: response.usage.input,
    outputTokens: response.usage.output,
  });
});
Main client that wraps Amplitude analytics-node. Create it with an API key or an existing Amplitude instance:
const ai = new AmplitudeAI({ apiKey: 'YOUR_API_KEY' });
// Or with existing client:
const ai = new AmplitudeAI({ amplitude: existingAmplitudeClient });
Agent with pre-bound defaults (agentId, description, userId, env, etc.). Use agent() to create:
const agent = ai.agent('support-bot', {
  description: 'Handles customer support queries via OpenAI GPT-4o',
  userId: 'user-123',
  env: 'production',
  customerOrgId: 'org-456',
});
Child agents inherit context from their parent and automatically set parentAgentId (note: description is agent-specific and is not inherited — pass it explicitly if needed):
const orchestrator = ai.agent('orchestrator', {
  description: 'Routes queries to specialized child agents',
  userId: 'user-123',
});
const researcher = orchestrator.child('researcher');
const writer = orchestrator.child('writer', {
  description: 'Drafts responses using retrieved context',
});
// researcher.parentAgentId === 'orchestrator'
// researcher has no description (descriptions are not inherited); writer has its own
Multi-tenant helper that pre-binds customerOrgId for all agents created from it:
const tenant = ai.tenant('org-456', { env: 'production' });
const agent = tenant.agent('support-bot', { userId: 'user-123' });
User identity flows through the session, per-call, or middleware — not at agent creation or patch time. This keeps the agent reusable across users.
Via sessions (recommended): pass userId when opening a session:
const agent = ai.agent('support-bot', { env: 'production' });
const session = agent.session({ userId: 'user-42' });
await session.run(async (s) => {
  s.trackUserMessage('Hello');
  // userId inherited from session context
});
Per-call: pass userId on each tracking call (useful with the zero-code tier):
agent.trackUserMessage('Hello', {
  userId: 'user-42',
  sessionId: 'sess-1',
});
Via middleware: createAmplitudeAIMiddleware extracts user identity from the request (see Middleware):
app.use(
  createAmplitudeAIMiddleware({
    amplitudeAI: ai,
    userIdResolver: (req) => req.headers['x-user-id'] ?? null,
  }),
);
Async context manager using AsyncLocalStorage. Use session.run() to execute a callback within session context; session end is tracked automatically on exit:
const session = agent.session({ userId: 'user-123' });
await session.run(async (s) => {
  s.trackUserMessage('Hello');
  s.trackAiMessage(response.content, 'gpt-4', 'openai', latencyMs);
});
Start a new trace within an ongoing session to group related operations:
await session.run(async (s) => {
  const traceId = s.newTrace();
  s.trackUserMessage('Follow-up question');
  s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs);
});
For sessions where gaps between messages may exceed 30 minutes (e.g., coding assistants, support agents waiting on customer replies), pass idleTimeoutMinutes so Amplitude knows the session is still active:
const session = agent.session({
  userId: 'user-123',
  idleTimeoutMinutes: 240, // expect up to 4-hour gaps
});
Without this, sessions with long idle periods may be closed and enrichment may run earlier than expected. The default is 30 minutes.
Session lifecycle and enrichment. You do not need to call trackSessionEnd() for sessions to work. Amplitude's server automatically closes sessions after 30 minutes of inactivity and queues them for enrichment (topic classification, quality scoring, session evaluation) at that point. The only reason to call trackSessionEnd() is to trigger enrichment sooner — for example, if you know the conversation is over and want evaluation results immediately rather than waiting for the idle timeout.
"Closed" is a server-side concept meaning "queued for enrichment" — it does not prevent new events from flowing into the same session. If the user resumes a conversation after session end, new messages with the same sessionId are still associated with that session.
If you use session.run(), session end is tracked automatically when the callback completes. For long-lived conversations (chatbots, support agents), you can skip explicit session end entirely and let the server handle it.
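When you do want enrichment immediately — say, the user clicked "End chat" — a sketch of the explicit path (the `trackSessionEnd()` options shape here is an assumption, mirroring the per-call tracking style shown under identity):

```typescript
// Per-call style, without session.run(): close the session explicitly so
// enrichment is queued now instead of after the 30-minute idle timeout.
agent.trackUserMessage('Thanks, that solved it!', {
  userId: 'user-42',
  sessionId: 'sess-1',
});
agent.trackSessionEnd({ userId: 'user-42', sessionId: 'sess-1' }); // assumed signature
await ai.flush();
```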
Link to Session Replay: If your frontend uses Amplitude's Session Replay, pass the browser's deviceId and browserSessionId to link AI sessions to browser recordings:
const session = agent.session({
  userId: 'user-123',
  deviceId: req.headers['x-amp-device-id'],
  browserSessionId: req.headers['x-amp-session-id'],
});
await session.run(async (s) => {
  s.trackUserMessage('What is retention?');
  // All events now carry [Amplitude] Session Replay ID = deviceId/browserSessionId
});
Higher-order function wrapping functions to auto-track as [Agent] Tool Call events:
import { tool } from '@amplitude/ai';
const searchDb = tool(
  async (query: { q: string }) => {
    return await db.search(query.q);
  },
  {
    name: 'search_db',
    inputSchema: { type: 'object', properties: { q: { type: 'string' } } },
  },
);
Note on inputSchema: Unlike the Python SDK which accepts a Pydantic model class and extracts the JSON Schema automatically, the TypeScript SDK accepts a raw JSON Schema object. For type-safe schema generation, consider using Zod with zod-to-json-schema:
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
const QuerySchema = z.object({ q: z.string(), limit: z.number().optional() });
const searchDb = tool(mySearchFn, {
  name: 'search_db',
  inputSchema: zodToJsonSchema(QuerySchema),
});
Higher-order function wrapping functions to auto-track as [Agent] Span events:
import { observe } from '@amplitude/ai';
const processRequest = observe(
  async (input: Request) => {
    return await handleRequest(input);
  },
  { name: 'process_request' },
);
import { AIConfig, AmplitudeAI, ContentMode } from '@amplitude/ai';
const config = new AIConfig({
  contentMode: ContentMode.FULL, // FULL | METADATA_ONLY | CUSTOMER_ENRICHED — both ContentMode.FULL and 'full' work
  redactPii: true,
  customRedactionPatterns: ['sensitive-\\d+'],
  debug: false,
  dryRun: false,
});
const ai = new AmplitudeAI({ apiKey: 'YOUR_API_KEY', config });
| Option | Description |
|---|---|
| `contentMode` | `'full'` (default), `'metadata_only'`, or `'customer_enriched'`. Both `ContentMode.FULL` and `'full'` work. |
| `redactPii` | Redact email, phone, SSN, credit card patterns |
| `customRedactionPatterns` | Additional regex patterns for redaction |
| `debug` | Log events to stderr |
| `dryRun` | Log without sending to Amplitude |
| `validate` | Enable strict validation of required fields |
| `onEventCallback` | Callback invoked after every tracked event: `(event, statusCode, message) => void` |
| `propagateContext` | Enable cross-service context propagation |
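As one useful combination, `dryRun` plus `onEventCallback` lets you validate instrumentation locally or in CI without sending anything to Amplitude (a sketch using the callback signature from the table above):

```typescript
import { AIConfig, AmplitudeAI } from '@amplitude/ai';

const config = new AIConfig({
  dryRun: true, // log events without sending them to Amplitude
  onEventCallback: (event, statusCode, message) => {
    // Inspect every tracked event, e.g. assert required properties exist.
    console.log(event, statusCode, message);
  },
});

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY!, config });
```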
The context parameter on ai.agent() accepts an arbitrary Record<string, unknown> that is JSON-serialized and attached to every event as [Agent] Context. This is the recommended way to add segmentation dimensions without requiring new global properties.
Recommended keys:
| Key | Example Values | Use Case |
|---|---|---|
| `agent_type` | "planner", "executor", "retriever", "router" | Filter/group analytics by agent role in multi-agent systems. |
| `experiment_variant` | "control", "treatment-v2", "prompt-rewrite-a" | Segment AI sessions by A/B test variant. Compare quality scores, abandonment rates, or cost across experiment arms. |
| `feature_flag` | "new-rag-pipeline", "reasoning-model-enabled" | Track which feature flags were active during the session. |
| `surface` | "chat", "search", "copilot", "email-draft" | Identify which UI surface or product area triggered the AI interaction. |
| `prompt_revision` | "v7", "abc123", "2026-02-15" | Track which prompt version was used. Detect prompt regression when combined with `agentVersion`. |
| `deployment_region` | "us-east-1", "eu-west-1" | Segment by deployment region for latency analysis or compliance tracking. |
| `canary_group` | "canary", "stable" | Identify canary vs. stable deployments for progressive rollout monitoring. |
Example:
const agent = ai.agent('support-bot', {
  userId: 'u1',
  description: 'Handles customer support queries via OpenAI GPT-4o',
  agentVersion: '4.2.0',
  context: {
    agent_type: 'executor',
    experiment_variant: 'reasoning-enabled',
    surface: 'chat',
    feature_flag: 'new-rag-pipeline',
    prompt_revision: 'v7',
  },
});
// All events from this agent (and its sessions, child agents, and provider
// wrappers) will include [Agent] Context with these keys.
Context merging in child agents:
const parent = ai.agent('orchestrator', {
  context: { experiment_variant: 'treatment', surface: 'chat' },
});
const child = parent.child('researcher', {
  context: { agent_type: 'retriever' },
});
// child context = { experiment_variant: 'treatment', surface: 'chat', agent_type: 'retriever' }
// Child keys override parent keys; parent keys absent from the child are preserved.
Querying in Amplitude: The [Agent] Context property is a JSON string. Use Amplitude's JSON property parsing to extract individual keys for charts, cohorts, and funnels. For example, group by [Agent] Context.agent_type to see metrics by agent role.
Note on `experiment_variant` and server-generated events: Context keys appear on all SDK-emitted events (`[Agent] User Message`, `[Agent] AI Response`, etc.). Server-generated events (`[Agent] Session Evaluation`, `[Agent] Score` with `source="ai"`) do not yet inherit context keys. To segment server-generated quality scores by experiment arm, use Amplitude Derived Properties to extract from `[Agent] Context` on SDK events.
Three content modes control what data is sent to Amplitude:
| Mode | Message Content | Token/Cost/Latency | Session Grouping | Server Enrichments |
|---|---|---|---|---|
| `FULL` | Sent (with PII redaction) | Yes | Yes | Yes (auto) |
| `METADATA_ONLY` | Not sent | Yes | Yes | No |
| `CUSTOMER_ENRICHED` | Not sent | Yes | Yes | Yes (you provide) |
Message content is captured and sent to Amplitude. When you opt in with redactPii: true, built-in PII redaction patterns scrub emails, phone numbers, SSNs, credit card numbers, and base64 image data before the event leaves your process:
const config = new AIConfig({
  contentMode: ContentMode.FULL,
  redactPii: true,
});
With redactPii: true, a message like "Contact me at john@example.com or 555-123-4567" is sanitized to "Contact me at [email] or [phone]" before being sent.
Built-in phone and SSN detection are currently tuned for common US formats. If you need broader international coverage, add explicit customRedactionPatterns for your locales.
Add custom redaction patterns for domain-specific PII:
const config = new AIConfig({
  contentMode: ContentMode.FULL,
  redactPii: true,
  customRedactionPatterns: ['ACCT-\\d{6,}', 'internal-key-[a-f0-9]+'],
});
Custom redaction patterns are your responsibility: avoid expensive or catastrophic regexes in performance-sensitive paths.
Message content is stored at full length with no truncation or size limits. The $llm_message property is whitelisted server-side, and the Node SDK does not apply per-property string truncation.
No message content is sent. You still get token counts, cost, latency, model name, and session grouping — everything needed for cost analytics and performance monitoring:
const config = new AIConfig({
  contentMode: ContentMode.METADATA_ONLY,
});
Use this when you cannot send user content to a third-party analytics service (e.g., regulated industries, sensitive data).
Like METADATA_ONLY (no content sent), but designed for workflows where you enrich sessions with your own classifications, quality scores, and topic labels via the SessionEnrichments API:
const config = new AIConfig({
  contentMode: ContentMode.CUSTOMER_ENRICHED,
});

// Later, after running your own classification pipeline:
const enrichments = new SessionEnrichments({
  qualityScore: 0.85,
  overallOutcome: 'resolved',
});
session.setEnrichments(enrichments);
PrivacyConfig is derived from AIConfig via config.toPrivacyConfig(). For advanced use, create directly:
import { PrivacyConfig } from '@amplitude/ai';
const privacy = new PrivacyConfig({
  privacyMode: true,
  redactPii: true,
  customRedactionPatterns: ['sensitive-\\d+'],
});
When using provider prompt caching (Anthropic's cache, OpenAI's cached completions, etc.), pass cache token breakdowns for accurate cost calculation:
s.trackAiMessage(
  response.content,
  'claude-3.5-sonnet',
  'anthropic',
  latencyMs,
  {
    inputTokens: response.usage.input_tokens,
    outputTokens: response.usage.output_tokens,
    cacheReadTokens: response.usage.cache_read_input_tokens,
    cacheCreationTokens: response.usage.cache_creation_input_tokens,
  },
);
Without cache breakdowns, cost calculation treats all input tokens at the standard rate. With caching enabled, cache-read tokens are typically 10x cheaper than standard input tokens and cache-creation tokens are ~25% more expensive. Naive cost calculation without this breakdown can overestimate costs by 2-5x for cache-heavy workloads.
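The arithmetic behind that claim can be sketched standalone (the rates below are illustrative round numbers, not the SDK's actual pricing tables):

```typescript
// Blended input cost: cache reads billed at ~10% of the standard input
// rate, cache writes at ~125%, per the ratios described above.
function blendedInputCost(
  inputTokens: number,
  cacheReadTokens: number,
  cacheCreationTokens: number,
  usdPerInputToken: number,
): number {
  return (
    inputTokens * usdPerInputToken +
    cacheReadTokens * usdPerInputToken * 0.1 +
    cacheCreationTokens * usdPerInputToken * 1.25
  );
}

// A cache-heavy request: 1k fresh input tokens, 20k cache reads, at a
// hypothetical $3 per million input tokens.
const rate = 3 / 1_000_000;
const naive = blendedInputCost(21_000, 0, 0, rate); // everything at standard rate
const accurate = blendedInputCost(1_000, 20_000, 0, rate);
// naive is several times higher than accurate for this workload
```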
The SDK tracks four token categories:
- `[Agent] Input Tokens` — standard (non-cached) input tokens
- `[Agent] Output Tokens` — generated output tokens
- `[Agent] Cache Read Tokens` — tokens read from provider cache (cheap)
- `[Agent] Cache Creation Tokens` — tokens written to provider cache (slightly more expensive)

Cost is auto-calculated when token counts are provided and the `@pydantic/genai-prices` package is installed. When genai-prices is not available, `calculateCost()` returns 0 (never null). You can also pass `totalCostUsd` directly if you compute cost yourself:
s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs, {
  totalCostUsd: 0.0034,
});
Note — pricing data freshness. Cost calculation relies on pricing data bundled in the installed `@pydantic/genai-prices` package. Newly released models may return $0 until the package is updated. To get the latest pricing between package releases, opt in to live updates at startup:

import { enableLivePriceUpdates } from '@amplitude/ai';
enableLivePriceUpdates(); // fetches latest prices from the genai-prices GitHub repo hourly

This makes periodic HTTPS requests to raw.githubusercontent.com (~26 KB each). Only enable it in environments where outbound network access is permitted.
Track full-response semantic cache hits (distinct from token-level prompt caching above):
s.trackAiMessage(cachedResponse.content, 'gpt-4o', 'openai', latencyMs, {
  wasCached: true, // served from Redis/semantic cache
});
Maps to [Agent] Was Cached. Enables "cache hit rate" charts and cost optimization analysis. Only emitted when true; omitted (not false) when the response was not cached.
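A fuller sketch of that flow, with the cache itself as a stand-in `Map` (a Redis or semantic-vector lookup slots into the same place):

```typescript
// Hypothetical response cache keyed by the exact prompt text.
const responseCache = new Map<string, string>();

await session.run(async (s) => {
  const prompt = 'What is retention?';
  s.trackUserMessage(prompt);

  const cached = responseCache.get(prompt);
  if (cached) {
    // No LLM call — flag the event so [Agent] Was Cached powers
    // cache-hit-rate charts.
    s.trackAiMessage(cached, 'gpt-4o', 'openai', 0, { wasCached: true });
    return;
  }

  const start = performance.now();
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
  });
  const text = response.choices[0].message.content ?? '';
  responseCache.set(prompt, text);
  s.trackAiMessage(text, 'gpt-4o', 'openai', performance.now() - start);
});
```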
Models are automatically classified into tiers for cost/performance analysis:
| Tier | Examples | When to Use |
|---|---|---|
| `fast` | gpt-4o-mini, claude-3-haiku, gemini-flash, gpt-3.5-turbo | High-volume, latency-sensitive |
| `standard` | gpt-4o, claude-3.5-sonnet, gemini-pro, llama, command | General purpose |
| `reasoning` | o1, o3-mini, deepseek-r1, claude with extended thinking | Complex reasoning tasks |
The tier is inferred automatically from the model name and attached as [Agent] Model Tier on every [Agent] AI Response event:
import {
  inferModelTier,
  TIER_FAST,
  TIER_REASONING,
  TIER_STANDARD,
} from '@amplitude/ai';
inferModelTier('gpt-4o-mini'); // 'fast'
inferModelTier('claude-3.5-sonnet'); // 'standard'
inferModelTier('o1-preview'); // 'reasoning'
Override the auto-inferred tier for custom or fine-tuned models:
s.trackAiMessage(
  response.content,
  'ft:gpt-4o:my-org:custom',
  'openai',
  latencyMs,
  {
    modelTier: 'standard',
    inputTokens: response.usage.prompt_tokens,
    outputTokens: response.usage.completion_tokens,
  },
);
Use instrumented provider wrappers for automatic tracking:
| Provider | Class | Package |
|---|---|---|
| OpenAI | OpenAI | openai |
| Anthropic | Anthropic | @anthropic-ai/sdk |
| Gemini | Gemini | @google/generative-ai |
| AzureOpenAI | AzureOpenAI | openai |
| Bedrock | Bedrock | @aws-sdk/client-bedrock-runtime |
| Mistral | Mistral | @mistralai/mistralai |
Feature coverage by provider:
| Feature | OpenAI | Anthropic | Gemini | AzureOpenAI | Bedrock | Mistral |
|---|---|---|---|---|---|---|
| Streaming | Yes | Yes | Yes | Yes | Yes | Yes |
| Tool call tracking | Yes | Yes | No | Yes | Yes | No |
| TTFB measurement | Yes | Yes | No | Yes | No | No |
| Cache token stats | Yes | Yes | No | No | No | No |
| Responses API | Yes | - | - | - | - | - |
| Reasoning content | Yes | Yes | No | Yes | No | No |
| System prompt capture | Yes | Yes | Yes | Yes | Yes | Yes |
| Cost estimation | Yes | Yes | Yes | Yes | Yes | Yes |
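Cost estimation reduces to token counts multiplied by per-million-token prices. A minimal sketch, assuming placeholder prices (the real figures come from the genai-prices database, keyed by normalized model name):

```typescript
// Hypothetical price table in USD per million tokens — illustrative values only.
const PRICES_PER_MTOK: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10.0 },
};

function estimateCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number | undefined {
  const p = PRICES_PER_MTOK[model];
  // Unknown model: no estimate — the caller should pass totalCostUsd explicitly.
  if (!p) return undefined;
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```

When the model string cannot be resolved, `[Agent] Cost USD` stays unset unless you supply `totalCostUsd` yourself.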
Provider wrappers use injected TrackFn callbacks instead of class hierarchy casts, enabling easier composition and custom tracking logic.
Bedrock model IDs like us.anthropic.claude-3-5-sonnet are automatically normalized for price lookup (e.g., to claude-3-5-sonnet).
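The normalization amounts to dropping the region/vendor prefixes and any trailing version qualifier. An illustrative implementation (not the SDK's actual code):

```typescript
// Keep the last dot-segment (drops 'us.' / 'anthropic.'-style prefixes), then
// strip any ':0'-style version suffix, yielding a price-lookup-friendly name.
function normalizeBedrockModelId(modelId: string): string {
  const lastSegment = modelId.split('.').pop() ?? modelId;
  return lastSegment.split(':')[0];
}
```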
OpenAI example:
import { AmplitudeAI, OpenAI } from '@amplitude/ai';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const openai = new OpenAI({
amplitude: ai,
apiKey: process.env.OPENAI_API_KEY,
});
const agent = ai.agent('my-agent', { userId: 'user-123' });
const session = agent.session();
await session.run(async (s) => {
const resp = await openai.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: 'Hello' }],
});
// AI response tracked automatically via wrapper
});
Or wrap an existing client:
import { wrap } from '@amplitude/ai';
import OpenAI from 'openai';
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const instrumented = wrap(client, ai);
Provider wrappers (OpenAI, AzureOpenAI, Anthropic, Gemini, Mistral, Bedrock) automatically detect supported streaming responses and track them transparently. The wrapper intercepts the AsyncIterable, accumulates chunks, measures TTFB, and emits an [Agent] AI Response event after the stream is fully consumed:
const openai = new OpenAI({ amplitude: ai, apiKey: '...' });
// Streaming is handled automatically — just iterate the result
const stream = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Hello' }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
// ^ AI Response event emitted automatically after loop ends
Track streaming responses manually with time-to-first-byte (TTFB) for latency analysis:
s.trackAiMessage(fullContent, 'gpt-4o', 'openai', totalMs, {
isStreaming: true,
ttfbMs: timeToFirstByte,
inputTokens: usage.prompt_tokens,
outputTokens: usage.completion_tokens,
});
The SDK tracks two timing properties for streaming:
- [Agent] Latency Ms — total wall-clock time from request to final chunk
- [Agent] TTFB Ms — time-to-first-byte, the delay before the first token arrives

For manual streaming, use StreamingAccumulator to collect chunks and automatically measure TTFB:
import { StreamingAccumulator } from '@amplitude/ai';
const accumulator = new StreamingAccumulator();
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
accumulator.addContent(content);
}
}
accumulator.setUsage({
inputTokens: finalUsage.prompt_tokens,
outputTokens: finalUsage.completion_tokens,
});
s.trackAiMessage(
accumulator.content,
'gpt-4o',
'openai',
accumulator.elapsedMs,
{
isStreaming: true,
ttfbMs: accumulator.ttfbMs,
inputTokens: accumulator.inputTokens,
outputTokens: accumulator.outputTokens,
finishReason: accumulator.finishReason,
},
);
The accumulator automatically records TTFB when addContent() is called for the first time, and tracks total elapsed time via elapsedMs. For streaming errors, call setError(message) to set isError and errorMessage, which are included on the tracked AI Response event.
Track files sent with user messages (images, PDFs, URLs):
s.trackUserMessage('Analyze this document', {
attachments: [
{ type: 'image', name: 'chart.png', size_bytes: 102400 },
{ type: 'pdf', name: 'report.pdf', size_bytes: 2048576 },
],
});
The SDK automatically derives aggregate properties from the attachment array:
- [Agent] Has Attachments — boolean, true when attachments are present
- [Agent] Attachment Count — number of attachments
- [Agent] Attachment Types — deduplicated list of attachment types (e.g., ["image", "pdf"])
- [Agent] Total Attachment Size Bytes — sum of all size_bytes values
- [Agent] Attachments — serialized JSON of the full attachment metadata

Attachments can also be tracked on AI responses (e.g., when the model generates images or files):
s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs, {
attachments: [{ type: 'image', name: 'generated.png', size_bytes: 204800 }],
});
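The aggregate derivation is straightforward to reproduce; a sketch of what the SDK computes from the raw array (illustrative, not the SDK source):

```typescript
interface Attachment {
  type: string;
  name: string;
  size_bytes: number;
}

// Derive the aggregate [Agent] Attachment* properties from the attachment list.
function deriveAttachmentProps(attachments: Attachment[]) {
  return {
    hasAttachments: attachments.length > 0,
    attachmentCount: attachments.length,
    attachmentTypes: [...new Set(attachments.map((a) => a.type))], // deduplicated
    totalAttachmentSizeBytes: attachments.reduce((sum, a) => sum + a.size_bytes, 0),
  };
}
```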
Track behavioral signals that indicate whether a response met the user's need, without requiring explicit ratings:
// User asks a question
s.trackUserMessage('How do I create a funnel?');
// AI responds — user copies the answer (positive signal)
s.trackAiMessage('To create a funnel, go to...', 'gpt-4o', 'openai', latencyMs, {
wasCopied: true,
});
// User regenerates (negative signal — first response wasn't good enough)
s.trackUserMessage('How do I create a funnel?', {
isRegeneration: true,
});
// User edits their question (refining intent)
s.trackUserMessage('How do I create a conversion funnel for signups?', {
isEdit: true,
editedMessageId: originalMsgId, // links the edit to the original
});
Track abandonment at session end — a low abandonmentTurn (e.g., 1) strongly signals first-response dissatisfaction:
agent.trackSessionEnd({
sessionId: 'sess-1',
abandonmentTurn: 1, // user left after first AI response
});
These signals map to [Agent] Was Copied, [Agent] Is Regeneration, [Agent] Is Edit, [Agent] Edited Message ID, and [Agent] Abandonment Turn. Use them in Amplitude to build quality dashboards without requiring user surveys.
Wraps an async function to track as [Agent] Tool Call:
import { tool, ToolCallTracker } from '@amplitude/ai';
ToolCallTracker.setAmplitude(ai.amplitude, 'user-123', {
sessionId: 'sess-1',
traceId: 'trace-1',
agentId: 'my-agent',
privacyConfig: ai.config.toPrivacyConfig(),
});
const fetchWeather = tool(
async (args: { city: string }) => {
return await weatherApi.get(args.city);
},
{
name: 'fetch_weather',
inputSchema: { type: 'object', properties: { city: { type: 'string' } } },
timeoutMs: 5000,
onError: (err, name) => console.error(`Tool ${name} failed:`, err),
},
);
Wraps a function to track as [Agent] Span:
import { observe } from '@amplitude/ai';
const enrichData = observe(async (data: unknown) => transform(data), {
name: 'enrich_data',
agentId: 'enricher',
});
Track quality feedback from multiple sources using the score() method. Scores are emitted as [Agent] Score events.
s.score('thumbs-up', 1, messageId, { source: 'user' });
s.score('thumbs-down', 0, messageId, { source: 'user' });
s.score('rating', 4, messageId, {
source: 'user',
comment: 'Very helpful but slightly verbose',
});
s.score('quality', 0.85, messageId, {
source: 'ai',
comment: 'Clear and accurate response with proper citations',
});
Score an entire session rather than a single message by setting targetType to 'session':
s.score('session-quality', 0.9, session.sessionId, {
targetType: 'session',
source: 'ai',
});
Each [Agent] Score event includes:
- [Agent] Score Name — the name you provide (e.g., "thumbs-up", "quality")
- [Agent] Score Value — numeric value
- [Agent] Target ID — the message ID or session ID being scored
- [Agent] Target Type — "message" (default) or "session"
- [Agent] Evaluation Source — "user" (default) or "ai"
- [Agent] Comment — optional free-text comment (respects content mode)

Attach structured metadata to sessions for analytics. Enrichments are included when the session auto-ends:
import {
RubricScore,
SessionEnrichments,
TopicClassification,
} from '@amplitude/ai';
const enrichments = new SessionEnrichments({
qualityScore: 0.85,
sentimentScore: 0.7,
overallOutcome: 'resolved',
topicClassifications: {
intent: new TopicClassification({
l1: 'billing',
primary: 'billing',
values: ['billing', 'refund'],
subcategories: ['REFUND_REQUEST', 'PRICING_QUESTION'],
}),
},
rubrics: [
new RubricScore({
name: 'helpfulness',
score: 4,
rationale: 'Provided clear step-by-step instructions',
}),
new RubricScore({
name: 'accuracy',
score: 5,
rationale: 'All information was factually correct',
}),
],
agentChain: ['orchestrator', 'researcher', 'writer'],
rootAgentName: 'orchestrator',
requestComplexity: 'medium',
});
session.setEnrichments(enrichments);
// Enrichments are included automatically when session.run() completes
Send enrichments as a standalone event without ending the session:
agent.trackSessionEnrichment(enrichments, {
sessionId: 'sess-abc123',
});
customer_enriched Mode

This mode is for teams that run their own evaluation pipeline (or can't send message content to Amplitude) but still want rich session-level analytics. Here's a complete workflow:
import {
AIConfig,
AmplitudeAI,
ContentMode,
MessageLabel,
RubricScore,
SessionEnrichments,
TopicClassification,
} from '@amplitude/ai';
// 1. Configure: no content sent to Amplitude
const ai = new AmplitudeAI({
apiKey: process.env.AMPLITUDE_AI_API_KEY!,
config: new AIConfig({
contentMode: ContentMode.CUSTOMER_ENRICHED,
}),
});
const agent = ai.agent('support-bot', {
description: 'Handles support conversations in metadata-only mode',
agentVersion: '2.1.0',
});
// 2. Run the conversation — content is NOT sent (metadata only)
const session = agent.session({ userId: 'user-42' });
const { sessionId, messageIds } = await session.run(async (s) => {
const msgIds: string[] = [];
msgIds.push(s.trackUserMessage('Why was I charged twice?'));
msgIds.push(
s.trackAiMessage(
aiResponse.content,
'gpt-4o',
'openai',
latencyMs,
),
);
return { sessionId: s.sessionId, messageIds: msgIds };
});
// 3. Run your eval pipeline on the raw messages (e.g., your own LLM judge)
const evalResults = await myEvalPipeline(conversationHistory);
// 4. Ship enrichments back to Amplitude
const enrichments = new SessionEnrichments({
qualityScore: evalResults.quality,
sentimentScore: evalResults.sentiment,
overallOutcome: evalResults.outcome,
topicClassifications: {
'billing': new TopicClassification({
topic: 'billing-dispute',
confidence: 0.92,
}),
},
rubricScores: [
new RubricScore({ name: 'accuracy', score: 4, maxScore: 5 }),
new RubricScore({ name: 'helpfulness', score: 5, maxScore: 5 }),
],
messageLabels: {
[messageIds[0]]: [
new MessageLabel({ key: 'intent', value: 'billing-dispute', confidence: 0.94 }),
],
},
customMetadata: { eval_model: 'gpt-4o-judge-v2' },
});
agent.trackSessionEnrichment(enrichments, { sessionId });
This produces the same Amplitude event properties as Amplitude's built-in server-side enrichment (topics, rubrics, outcomes, message labels), but sourced from your pipeline. Use it when compliance requires zero-content transmission, or when you need custom evaluation logic beyond what the built-in enrichment provides.
- qualityScore, sentimentScore
- overallOutcome, hasTaskFailure, taskFailureType, taskFailureReason
- topicClassifications — a map of taxonomy name to TopicClassification
- rubrics — array of RubricScore with name, score, rationale, and evidence
- hasNegativeFeedback, hasDataQualityIssues, hasTechnicalFailure
- errorCategories, technicalErrorCount
- behavioralPatterns, negativeFeedbackPhrases, dataQualityIssues
- agentChain, rootAgentName
- requestComplexity
- messageLabels — per-message labels keyed by message ID
- customMetadata — arbitrary key/value data for your own analytics

Attach classification labels to individual messages within a session. Labels are flexible key-value pairs for filtering and segmentation in Amplitude.
Common use cases: routing tags (flow, surface), classifier output (intent, sentiment, toxicity), business context (tier, plan).
Inline labels (at tracking time):
import { MessageLabel } from '@amplitude/ai';
s.trackUserMessage('I want to cancel my subscription', {
labels: [
new MessageLabel({
key: 'intent',
value: 'cancellation',
confidence: 0.95,
}),
new MessageLabel({
key: 'sentiment',
value: 'frustrated',
confidence: 0.8,
}),
],
});
Retrospective labels (after the session, from a background pipeline):
When classifier results arrive after the session ends, attach them via SessionEnrichments.messageLabels, keyed by the messageId returned from tracking calls:
import { MessageLabel, SessionEnrichments } from '@amplitude/ai';
const enrichments = new SessionEnrichments({
messageLabels: {
[userMsgId]: [
new MessageLabel({ key: 'intent', value: 'cancellation', confidence: 0.94 }),
],
[aiMsgId]: [
new MessageLabel({ key: 'quality', value: 'good', confidence: 0.91 }),
],
},
});
agent.trackSessionEnrichment(enrichments, { sessionId: 'sess-abc123' });
Labels are emitted as [Agent] Message Labels on the event. In Amplitude, filter or group by label key/value to build charts like "messages by intent" or "sessions where flow=onboarding".
Prints a colored (ANSI) summary of every tracked event to stderr. All 8 event types (User Message, AI Response, Tool Call, Embedding, Span, Session End, Session Enrichment, Score) are formatted. Events are still sent to Amplitude:
const ai = new AmplitudeAI({
apiKey: 'xxx',
config: new AIConfig({ debug: true }),
});
// stderr output for each event:
// [amplitude-ai] [Agent] AI Response | user=user-123 session=sess-abc agent=my-agent model=gpt-4o latency=1203ms tokens=150→847 cost=$0.0042
// [amplitude-ai] [Agent] Tool Call | user=user-123 session=sess-abc agent=my-agent tool=search_db success=true latency=340ms
// [amplitude-ai] [Agent] User Message | user=user-123 session=sess-abc agent=my-agent
Logs the full event JSON to stderr WITHOUT sending to Amplitude. Events are never transmitted:
const ai = new AmplitudeAI({
apiKey: 'xxx',
config: new AIConfig({ dryRun: true }),
});
// stderr: full JSON of each event
// Useful for local development, CI pipelines, and validating event shape
Both modes can be enabled via environment variables when using auto-instrumentation:
AMPLITUDE_AI_DEBUG=true amplitude-ai-instrument node app.js
Monkey-patch provider SDKs to auto-track without changing call sites. This is useful for quick verification that the SDK is connected, or for legacy codebases where modifying call sites is impractical. For the full event model (user messages, sessions, scoring, enrichments), use agents + sessions as shown in Quick Start.
import {
AmplitudeAI,
patch,
patchOpenAI,
unpatch,
unpatchOpenAI,
} from '@amplitude/ai';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
// Patch installed/available providers (OpenAI, Anthropic, Gemini, Mistral, Bedrock)
patch({ amplitudeAI: ai });
// Or patch specific provider
patchOpenAI({ amplitudeAI: ai });
// Unpatch
unpatch();
unpatchOpenAI();
Available patch functions: patchOpenAI, patchAnthropic, patchAzureOpenAI, patchGemini, patchMistral, patchBedrock. Corresponding unpatch for each: unpatchOpenAI, unpatchAnthropic, unpatchAzureOpenAI, unpatchGemini, unpatchMistral, unpatchBedrock.
patch() returns a string[] of providers where at least one supported surface was successfully patched (e.g., ['openai', 'anthropic']), matching the Python SDK's return signature.
Patch surface notes:
- chat.completions.create, chat.completions.parse, and Responses APIs are instrumented (including streaming shapes where exposed by the SDK).
- ConverseCommand and ConverseStreamCommand are instrumented when patching client.send.

Preload the register module to auto-patch providers at process start:
AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true amplitude-ai-instrument node app.js
Or directly with Node's ESM preload flag:
AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true node --import @amplitude/ai/register app.js
Environment variables:
| Variable | Description |
|---|---|
| AMPLITUDE_AI_API_KEY | Required for auto-patch |
| AMPLITUDE_AI_AUTO_PATCH | Must be "true" to enable |
| AMPLITUDE_AI_CONTENT_MODE | full, metadata_only, or customer_enriched |
| AMPLITUDE_AI_DEBUG | "true" for debug output to stderr |
Validate setup (env, provider deps, mock event capture, mock flush path):
amplitude-ai doctor
Useful flags:
amplitude-ai doctor --no-mock-check

Show the installed SDK version, detected provider packages, and environment variable configuration at a glance:
amplitude-ai status
Enable tab-completion for all CLI commands and flags:
# bash
eval "$(amplitude-ai-completions bash)"
# zsh
eval "$(amplitude-ai-completions zsh)"
Run the SDK-local MCP server over stdio:
amplitude-ai mcp
MCP surface:
| Tool | Description |
|---|---|
scan_project | Scan project structure, detect providers, frameworks, and multi-agent patterns |
validate_file | Analyze a source file to detect uninstrumented LLM call sites |
instrument_file | Apply instrumentation transforms to a source file |
generate_verify_test | Generate a dry-run verification test using MockAmplitudeAI |
get_event_schema | Return the full event schema and property definitions |
get_integration_pattern | Return canonical instrumentation code patterns |
validate_setup | Check env vars and dependency presence |
suggest_instrumentation | Context-aware next steps based on your framework and provider |
search_docs | Full-text search across SDK documentation (README, llms-full.txt) |
Resources: amplitude-ai://event-schema, amplitude-ai://integration-patterns, amplitude-ai://instrument-guide
Prompt: instrument_app — guided walkthrough for instrumenting an application
- amplitude-ai.md — self-contained instrumentation guide for any AI coding agent (Cursor, Claude Code, Windsurf, Copilot, Codex, etc.). Run npx amplitude-ai to see the prompt that points your agent to this file.
- examples/zero-code.ts
- examples/wrap-openai.ts
- examples/multi-agent.ts
- examples/framework-integration.ts
- examples/real-openai.ts — end-to-end OpenAI integration with session tracking and flush
- examples/real-anthropic.ts — end-to-end Anthropic integration with session tracking and flush

import { AmplitudeAI, AmplitudeCallbackHandler } from '@amplitude/ai';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const handler = new AmplitudeCallbackHandler({
amplitudeAI: ai,
userId: 'user-123',
sessionId: 'sess-1',
});
// Pass handler to LangChain callbacks
Two exporters add Amplitude as a destination alongside your existing trace backend (Datadog, Honeycomb, Jaeger, etc.):
import {
AmplitudeAgentExporter,
AmplitudeGenAIExporter,
} from '@amplitude/ai';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import {
BatchSpanProcessor,
SimpleSpanProcessor,
} from '@opentelemetry/sdk-trace-base';
const provider = new NodeTracerProvider();
// GenAI exporter — converts gen_ai.* spans into Amplitude AI events
provider.addSpanProcessor(
new BatchSpanProcessor(
new AmplitudeGenAIExporter({
apiKey: process.env.AMPLITUDE_AI_API_KEY!,
}),
),
);
// Agent exporter — converts agent.* spans into Amplitude session events
provider.addSpanProcessor(
new SimpleSpanProcessor(
new AmplitudeAgentExporter({
apiKey: process.env.AMPLITUDE_AI_API_KEY!,
}),
),
);
provider.register();
Only spans with gen_ai.provider.name or gen_ai.system attributes are processed; all other spans are silently ignored. This means it's safe to add the exporter to a pipeline that produces mixed (GenAI + HTTP + DB) spans.
Attribute mapping reference:
| OTEL Span Attribute | Amplitude Event Property | Notes |
|---|---|---|
| gen_ai.response.model / gen_ai.request.model | [Agent] Model | Response model preferred |
| gen_ai.system / gen_ai.provider.name | [Agent] Provider | |
| gen_ai.usage.input_tokens | [Agent] Input Tokens | |
| gen_ai.usage.output_tokens | [Agent] Output Tokens | |
| gen_ai.usage.total_tokens | [Agent] Total Tokens | Derived if not present |
| gen_ai.usage.cache_read.input_tokens | [Agent] Cache Read Tokens | |
| gen_ai.usage.cache_creation.input_tokens | [Agent] Cache Creation Tokens | |
| gen_ai.request.temperature | [Agent] Temperature | |
| gen_ai.request.top_p | [Agent] Top P | |
| gen_ai.request.max_output_tokens | [Agent] Max Output Tokens | |
| gen_ai.response.finish_reasons | [Agent] Finish Reason | |
| gen_ai.input.messages | [Agent] LLM Message | Only if content mode allows |
| Span duration | [Agent] Latency Ms | |
| Span status ERROR | [Agent] Is Error, [Agent] Error Message | |
Not available via OTEL (use native wrappers): reasoning content/tokens, TTFB, streaming detection, implicit feedback, file attachments, event graph linking (parent_message_id).
When to use OTEL vs. native wrappers: If you already have @opentelemetry/instrumentation-openai or similar producing GenAI spans, the OTEL bridge gives you Amplitude analytics with zero code changes. For richer tracking (implicit feedback, streaming metrics, attachments), use the native wrapOpenAI()/wrapAnthropic() wrappers alongside OTEL.
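The span-to-event translation can be pictured as a filter plus an attribute map. A simplified sketch of the idea (not the exporter's actual source):

```typescript
type Attrs = Record<string, string | number | undefined>;

// Only GenAI spans are considered; all other spans are passed over silently.
function isGenAiSpan(attrs: Attrs): boolean {
  return (
    attrs['gen_ai.provider.name'] !== undefined ||
    attrs['gen_ai.system'] !== undefined
  );
}

// Map gen_ai.* attributes onto Amplitude event properties.
function toAmplitudeProps(attrs: Attrs) {
  const input = Number(attrs['gen_ai.usage.input_tokens'] ?? 0);
  const output = Number(attrs['gen_ai.usage.output_tokens'] ?? 0);
  return {
    '[Agent] Model': attrs['gen_ai.response.model'] ?? attrs['gen_ai.request.model'],
    '[Agent] Provider': attrs['gen_ai.system'] ?? attrs['gen_ai.provider.name'],
    '[Agent] Input Tokens': input,
    '[Agent] Output Tokens': output,
    // Total is derived when gen_ai.usage.total_tokens is absent.
    '[Agent] Total Tokens': Number(attrs['gen_ai.usage.total_tokens'] ?? input + output),
  };
}
```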
import {
AmplitudeLlamaIndexHandler,
createAmplitudeLlamaIndexHandler,
} from '@amplitude/ai';
import { AmplitudeTracingProcessor } from '@amplitude/ai';
import { AmplitudeToolLoop } from '@amplitude/ai';
import { AmplitudeCrewAIHooks } from '@amplitude/ai';
In Node.js, AmplitudeCrewAIHooks throws a ProviderError by design. Use LangChain or OpenTelemetry integrations instead.
How events flow from your application to Amplitude charts:
Your Application
├── wrapOpenAI() / wrapAnthropic() ─── auto-emits ──┐
├── session.trackUserMessage() ─── manual ──────┤
├── session.trackAiMessage() ─── manual ──────┤
├── agent.trackToolCall() ─── manual ──────┤
├── agent.trackSessionEnrichment() ─── manual ──────┤
└── OTEL exporter (AmplitudeGenAI...) ─── bridge ──────┤
│
AmplitudeAI client ◄──────┘
│
├── validate (if enabled)
├── apply middleware chain
├── batch events
│
▼
Amplitude HTTP API
│
┌─────────────┴──────────────┐
│ │
Amplitude Charts LLM Enrichment
(immediate querying) Pipeline (async)
│
▼
[Agent] Session Evaluation
[Agent] Score events
(topic, rubric, outcome)
Key points:
- Every tracking path feeds the AmplitudeAI client, which batches and sends events.
- The LLM enrichment pipeline runs asynchronously and requires contentMode: 'full'. It produces server-side events like [Agent] Session Evaluation and [Agent] Score.
- With contentMode: 'customer_enriched', the enrichment pipeline is skipped — you provide your own enrichments via trackSessionEnrichment().

Start with full instrumentation. Use agents + sessions + provider wrappers. This is the recommended approach for both coding agent and manual workflows — it gives you every event type, per-user analytics, and server-side enrichment.
| Approach | When to use | What you get |
|---|---|---|
| Full control (recommended) | Any project, new or existing | BoundAgent + session.run() + provider wrappers — all event types, per-user funnels, cohorts, retention, quality scoring, enrichments |
| Express/Fastify middleware | Web app, auto-session per request | Same as full control with automatic session lifecycle via createAmplitudeAIMiddleware |
| Swap import | Existing codebase, incremental adoption | new OpenAI({ amplitude: ai }) — auto-tracking per call, add sessions when ready |
| Wrap | You've already created a client | wrap(client, ai) — instruments an existing client instance |
| Zero-code / patch() | Verification or legacy codebases only | patch({ amplitudeAI: ai }) — [Agent] AI Response only, no user identity, no funnels |
| OTEL Bridge | Third-party framework exports OTEL spans | Add exporter to existing OTEL pipeline — limited to OTEL attributes |
The first four approaches all support the full event model. Choose based on how you want to integrate — the analytics capabilities are the same.
patch() is the exception: it only captures aggregate [Agent] AI Response events without user identity, useful only for verifying the SDK works or for codebases where you can't modify call sites.
These rules match the Python amplitude-ai agent guide and affect how Agent Analytics labels sessions and computes costs:
- trackUserMessage(content, opts?) — The content string becomes $llm_message.text. Use a short, human-readable line for the real user intent (or a headless summary). Put large JSON, RAG packs, or pipeline state in opts.context or opts.eventProperties, not as the only content, or session titles and segmentation will show raw JSON.
- [Agent] User Message and [Agent] AI Response (with session + turn ids) drive turn counts and conversation views. observe() / trackSpan() add trace detail but do not replace those turn events; keep a user + AI pair for each user-visible cycle unless you intentionally document otherwise.
- baseURL — If you use stock openai (or another client) against a proxy, the SDK may not auto-wrap that path. Call trackAiMessage with usage token fields from the response (or stream end), pass the actual routed model id as the model argument, and set totalCostUsd if genai-prices cannot resolve the model string. Install @pydantic/genai-prices for automatic USD estimates when model + tokens are known.

For serverless functions or API endpoints that handle one request at a time, the key requirement is flushing events before the handler returns:
import { AmplitudeAI } from '@amplitude/ai';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
app.post('/chat', async (req, res) => {
const agent = ai.agent('api-handler', { userId: req.userId });
const session = agent.session({ sessionId: req.sessionId });
const result = await session.run(async (s) => {
s.trackUserMessage(req.body.message);
const start = performance.now();
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: req.body.messages,
});
const latencyMs = performance.now() - start;
s.trackAiMessage(
response.choices[0].message.content ?? '',
'gpt-4o',
'openai',
latencyMs,
{
inputTokens: response.usage?.prompt_tokens,
outputTokens: response.usage?.completion_tokens,
},
);
return response.choices[0].message.content;
});
await ai.flush();
res.json({ response: result });
});
For multi-turn conversations where the session spans many request/response cycles. Create the session once and reuse it across turns:
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const agent = ai.agent('chatbot', { userId: 'user-123', env: 'production' });
// Session persists across multiple turns
const session = agent.session({ sessionId: conversationId });
await session.run(async (s) => {
// Turn 1
s.trackUserMessage('What is Amplitude?');
const resp1 = await llm.chat('What is Amplitude?');
s.trackAiMessage(resp1.content, 'gpt-4o', 'openai', resp1.latencyMs, {
inputTokens: resp1.usage.input,
outputTokens: resp1.usage.output,
});
// Turn 2
s.trackUserMessage('How does it track events?');
const resp2 = await llm.chat('How does it track events?');
s.trackAiMessage(resp2.content, 'gpt-4o', 'openai', resp2.latencyMs, {
inputTokens: resp2.usage.input,
outputTokens: resp2.usage.output,
});
// Score the conversation
s.score('helpfulness', 0.9, session.sessionId, {
targetType: 'session',
source: 'ai',
});
});
// Session auto-ends here with all enrichments
For architectures where a parent agent delegates to specialized child agents. Use session.runAs() to automatically propagate the child agent's identity to both manual tracking calls and provider wrappers:
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY!, amplitude: ai });
const orchestrator = ai.agent('orchestrator', {
userId: 'user-123',
env: 'production',
});
const researcher = orchestrator.child('researcher');
const writer = orchestrator.child('writer');
const session = orchestrator.session({ userId: 'user-123' });
await session.run(async (s) => {
s.trackUserMessage('Write a blog post about TypeScript generics');
// Research phase — provider calls automatically tagged with agentId='researcher'
const researchResult = await s.runAs(researcher, async (rs) => {
const completion = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Research TypeScript generics' }],
});
return completion.choices[0].message.content;
});
// Writing phase — provider calls automatically tagged with agentId='writer'
const draft = await s.runAs(writer, async (ws) => {
const completion = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: `Write a post using: ${researchResult}` }],
});
return completion.choices[0].message.content;
});
s.trackAiMessage(draft ?? '', 'gpt-4o', 'openai', totalLatencyMs, {
inputTokens: totalInput,
outputTokens: totalOutput,
});
});
// Events emitted:
// [Agent] User Message → agentId='orchestrator'
// [Agent] AI Response → agentId='researcher', parentAgentId='orchestrator'
// [Agent] AI Response → agentId='writer', parentAgentId='orchestrator'
// [Agent] AI Response → agentId='orchestrator'
// [Agent] Session End → agentId='orchestrator' (one session end, not per-child)
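The identity propagation behind this pattern can be sketched with Node's AsyncLocalStorage (illustrative of the technique, not the SDK's internals):

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface AgentScope {
  agentId: string;
  parentAgentId?: string;
}

const scopeStore = new AsyncLocalStorage<AgentScope>();

// Run fn under a child agent identity; anything reading the store inside fn
// (e.g. a provider wrapper tagging events) sees the child's agentId, with the
// enclosing agent recorded as parentAgentId.
function runAs<T>(agentId: string, fn: () => T): T {
  const parent = scopeStore.getStore();
  return scopeStore.run({ agentId, parentAgentId: parent?.agentId }, fn);
}

const currentScope = () => scopeStore.getStore();
```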
How runAs works:
- The child shares the parent session's sessionId, traceId, and turn counter
- It sets agentId and parentAgentId in AsyncLocalStorage for the callback's duration, so provider wrappers pick up the child identity with no amplitudeOverrides needed
- Child scopes do not emit their own [Agent] Session End (the child operates within the parent session)
- Nesting is supported: s.runAs(child, (cs) => cs.runAs(grandchild, ...))

The SDK auto-detects serverless environments (Vercel, AWS Lambda, Netlify, Google Cloud Functions, Azure Functions, Cloudflare Pages). When detected, session.run() automatically flushes all pending events before the promise resolves — no explicit ai.flush() needed. You can also control this explicitly via the autoFlush option on session():
// Auto-detected: flushes automatically in serverless, skips in long-running servers
agent.session({ userId, sessionId });
// Explicit control:
agent.session({ userId, sessionId, autoFlush: true }); // always flush
agent.session({ userId, sessionId, autoFlush: false }); // never flush
If you track events outside of session.run(), you still need await ai.flush() before your handler returns:
export async function handler(event: APIGatewayEvent) {
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const agent = ai.agent('api-handler', {
userId: event.requestContext.authorizer?.userId,
});
const session = agent.session();
const result = await session.run(async (s) => {
s.trackUserMessage(JSON.parse(event.body ?? '{}').message ?? '');
const start = performance.now();
const response = await callLLM(JSON.parse(event.body ?? '{}').message);
const latencyMs = performance.now() - start;
s.trackAiMessage(response.content, response.model, 'openai', latencyMs, {
inputTokens: response.usage.prompt_tokens,
outputTokens: response.usage.completion_tokens,
});
return response.content;
});
await ai.flush(); // Without this, events may be lost
return { statusCode: 200, body: JSON.stringify({ response: result }) };
}
- ai.flush() — sends all buffered events and returns a promise. Use in serverless handlers and API endpoints where you need to ensure delivery before responding.
- ai.shutdown() — flushes and closes the underlying Amplitude client. Only needed if you created the client via apiKey (not when passing your own instance). Call on process exit (e.g., SIGTERM handler).

process.on('SIGTERM', () => {
ai.shutdown();
process.exit(0);
});
- track* methods catch and log errors internally. Your application code is never interrupted by tracking failures.
- Delivery, batching, and retries are handled by the underlying @amplitude/analytics-node SDK.
- Set validate: true in AIConfig to get early validation errors for missing required fields (userId, sessionId, etc.). Validation errors throw ValidationError so you can catch them during development.

import { AIConfig, AmplitudeAI, ValidationError } from '@amplitude/ai';
const ai = new AmplitudeAI({
apiKey: 'xxx',
config: new AIConfig({ validate: true }),
});
try {
ai.trackUserMessage({ userId: '', content: 'Hello', sessionId: 'sess-1' });
} catch (e) {
if (e instanceof ValidationError) {
console.error('Invalid tracking call:', e.message);
// "userId must be a non-empty string, got "
}
}
Use MockAmplitudeAI for unit tests:
import { MockAmplitudeAI } from '@amplitude/ai';
const mock = new MockAmplitudeAI();
const agent = mock.agent('test-agent', { userId: 'user-1' });
const session = agent.session({ sessionId: 'sess-1', userId: 'user-1' });
await session.run(async (s) => {
s.trackUserMessage('Hello');
s.trackAiMessage('Hi!', 'gpt-4', 'openai', 100);
});
mock.assertEventTracked('[Agent] User Message', { userId: 'user-1' });
mock.assertEventTracked('[Agent] AI Response', { userId: 'user-1' });
mock.assertSessionClosed('sess-1');
mock.reset();
| Symptom | Cause | Fix |
|---|---|---|
| No events in Amplitude | API key not set or incorrect | Run amplitude-ai doctor — it checks AMPLITUDE_AI_API_KEY and reports a fix command |
| Events tracked but [Agent] Cost USD is $0 | Model not in the pricing database, or total_cost_usd not passed | Pass totalCostUsd explicitly, or check that @pydantic/genai-prices / genai-prices is installed |
| patch() doesn't instrument calls | patch() called after the provider client was created | Call patch() before importing or instantiating provider clients |
| Session context missing on events | LLM calls made outside session.run() | Wrap your LLM calls inside session.run(async () => { ... }) |
| flush() hangs or times out in serverless | Process exits before flush completes | Use await ai.flush() before returning from your Lambda/Cloud Function handler |
| wrap() TypeScript type errors | Passing a non-supported client type | wrap() only supports OpenAI, AzureOpenAI, and Anthropic clients; use provider classes for others |
| MockAmplitudeAI events are empty | Tracking calls not inside a session context | Use mock.agent(...).session(...).run(...) to wrap tracked calls |
| Cannot find module 'openai' in Turbopack/Webpack | Bundler rewrites import.meta.url, breaking dynamic require() | Pass the provider module directly: new OpenAI({ amplitude: ai, apiKey, openaiModule: OpenAISDK }). Same pattern for Anthropic, Gemini, etc. See each provider's <name>Module option. |
Run amplitude-ai doctor for automated environment diagnostics with fix suggestions.
For distributed tracing, inject context into outgoing request headers and extract on the receiving side:
import { randomUUID } from 'node:crypto';
import {
  extractContext,
  injectContext,
  runWithContextAsync,
  SessionContext,
} from '@amplitude/ai';
// Outgoing request
const headers = injectContext();
fetch(url, { headers });
// Receiving side
const extracted = extractContext(req.headers);
const ctx = new SessionContext({
  sessionId: extracted.sessionId ?? randomUUID(),
  traceId: extracted.traceId ?? null,
  userId: extracted.userId ?? null,
});
await runWithContextAsync(ctx, async () => {
  // Context available via getActiveContext()
});
Express-compatible middleware for automatic session tracking:
import { randomUUID } from 'node:crypto';
import { AmplitudeAI, createAmplitudeAIMiddleware } from '@amplitude/ai';
import express from 'express';
const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const app = express();
app.use(
  createAmplitudeAIMiddleware({
    amplitudeAI: ai,
    userIdResolver: (req) =>
      (req as { headers: { 'x-user-id'?: string } }).headers['x-user-id'] ?? null,
    sessionIdResolver: (req) =>
      (req as { headers: { 'x-session-id'?: string } }).headers['x-session-id'] ??
      randomUUID(),
    agentId: 'api-server',
    env: process.env.NODE_ENV ?? 'development',
  }),
);
app.post('/chat', async (req, res) => {
  // Session context available; trackUserMessage/trackAiMessage inherit sessionId, traceId
});
Use trackConversation() to import an entire conversation history in one call. Each message in the array is tracked as an [Agent] User Message or an [Agent] AI Response event, with turn IDs auto-incremented:
import { trackConversation } from '@amplitude/ai';
import * as amplitude from '@amplitude/analytics-node';
trackConversation({
  amplitude,
  userId: 'user-123',
  sessionId: 'sess-abc',
  agentId: 'support-bot',
  messages: [
    { role: 'user', content: 'How do I reset my password?' },
    {
      role: 'assistant',
      content: 'Go to Settings > Security > Reset Password.',
      model: 'gpt-4o',
      provider: 'openai',
      latency_ms: 1200,
      input_tokens: 15,
      output_tokens: 42,
      total_cost_usd: 0.002,
    },
    { role: 'user', content: 'Thanks, that worked!' },
    {
      role: 'assistant',
      content: 'Glad I could help!',
      model: 'gpt-4o',
      provider: 'openai',
      latency_ms: 800,
      input_tokens: 10,
      output_tokens: 8,
    },
  ],
});
This is useful for backfilling historical conversations or importing data from external systems. The function accepts all the same context fields (agentId, env, customerOrgId, etc.) as the individual tracking methods.
| Event Type | Source | Description |
|---|---|---|
[Agent] User Message | SDK | User sent a message |
[Agent] AI Response | SDK | AI model returned a response |
[Agent] Tool Call | SDK | Tool/function was invoked |
[Agent] Embedding | SDK | Embedding was generated |
[Agent] Span | SDK | Span (e.g. RAG step, transform) |
[Agent] Session End | SDK | Session ended |
[Agent] Session Enrichment | SDK | Session-level enrichment data |
[Agent] Score | Both | Evaluation score (quality, sentiment, etc.) |
[Agent] Session Evaluation | Server | Session-level summary: outcome, turn count, flags, cost. Emitted automatically. |
[Agent] Topic Classification | Server | One event per topic model per session. Emitted automatically. |
All event properties are prefixed with [Agent] (except [Amplitude] Session Replay ID). This reference is auto-generated and matches what gets registered in Amplitude's data catalog via the amplitude-ai-register-catalog CLI.
| Property | Type | Required | Description |
|---|---|---|---|
[Agent] Session ID | string | Yes | Unique session identifier. All events in one conversation share the same session ID. |
[Agent] Trace ID | string | No | Identifies one user-message-to-AI-response cycle within a session. |
[Agent] Turn ID | number | No | Monotonically increasing counter for event ordering within a session. |
[Agent] Agent ID | string | No | Identifies which AI agent handled the interaction (e.g., 'support-bot', 'houston'). |
[Agent] Parent Agent ID | string | No | For multi-agent orchestration: the agent that delegated to this agent. |
[Agent] Customer Org ID | string | No | Organization ID for multi-tenant platforms. Enables account-level group analytics. |
[Agent] Agent Version | string | No | Agent code version (e.g., 'v4.2'). Enables version-over-version quality comparison. |
[Agent] Agent Description | string | No | Human-readable description of the agent's purpose (e.g., 'Handles user chat requests via OpenAI GPT-4o'). Enables observability-driven agent registry from event streams. |
[Agent] Context | string | No | Serialized JSON dict of arbitrary segmentation dimensions (experiment_variant, surface, feature_flag, prompt_revision, etc.). |
[Agent] Env | string | No | Deployment environment: 'production', 'staging', or 'dev'. |
[Agent] SDK Version | string | Yes | Version of the amplitude-ai SDK that produced this event. |
[Agent] Runtime | string | Yes | SDK runtime: 'python' or 'node'. |
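For DIY senders, the required/optional split in the table above can be sketched as a TypeScript shape. This is a hypothetical type for illustration, not an export of @amplitude/ai:

```typescript
// Hypothetical type mirroring the common-property table above (not an SDK export).
// The three required fields must appear on every [Agent] event.
interface AgentCommonProps {
  '[Agent] Session ID': string;          // required: shared by all events in one conversation
  '[Agent] SDK Version': string;         // required: SDK version that produced the event
  '[Agent] Runtime': 'python' | 'node';  // required
  '[Agent] Trace ID'?: string;
  '[Agent] Turn ID'?: number;
  '[Agent] Agent ID'?: string;
  '[Agent] Parent Agent ID'?: string;
  '[Agent] Customer Org ID'?: string;
  '[Agent] Agent Version'?: string;
  '[Agent] Agent Description'?: string;
  '[Agent] Context'?: string;            // serialized JSON dict of segmentation dimensions
  '[Agent] Env'?: string;
}

const example: AgentCommonProps = {
  '[Agent] Session ID': 'sess-abc123',
  '[Agent] SDK Version': '0.1.0',
  '[Agent] Runtime': 'node',
  '[Agent] Turn ID': 1,
};
```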
Event-specific properties for [Agent] User Message (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
[Agent] Message ID | string | Yes | Unique identifier for this message event (UUID). Used to link scores and tool calls back to specific messages. |
[Agent] Component Type | string | Yes | Type of component that produced this event: 'user_input', 'llm', 'tool', 'embedding'. |
[Agent] Locale | string | No | User locale (e.g., 'en-US'). |
[Amplitude] Session Replay ID | string | No | Links to Amplitude Session Replay (format: device_id/session_id). Enables one-click navigation from AI session to browser replay. |
[Agent] Is Regeneration | boolean | No | Whether the user requested the AI regenerate a previous response. |
[Agent] Is Edit | boolean | No | Whether the user edited a previous message and resubmitted. |
[Agent] Edited Message ID | string | No | The message_id of the original message that was edited (links the edit to the original). |
[Agent] Has Attachments | boolean | No | Whether this message includes file attachments (uploads, images, etc.). |
[Agent] Attachment Types | string[] | No | Distinct attachment types (e.g., 'pdf', 'image', 'csv'). Serialized JSON array. |
[Agent] Attachment Count | number | No | Number of file attachments included with this message. |
[Agent] Total Attachment Size Bytes | number | No | Total size of all attachments in bytes. |
[Agent] Attachments | string | No | Serialized JSON array of attachment metadata (type, name, size_bytes, mime_type). Only metadata, never file content. |
[Agent] Message Labels | string | No | Serialized JSON array of MessageLabel objects (key-value pairs with optional confidence). Used for routing tags, classifier output, business context. |
[Agent] Message Source | string | No | Origin of the user message: 'user' for real end-user input, 'agent' for inter-agent delegation (parent agent sending instructions to a child agent). Automatically set by provider wrappers based on parent_agent_id context. |
Event-specific properties for [Agent] AI Response (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
[Agent] Message ID | string | Yes | Unique identifier for this message event (UUID). Used to link scores and tool calls back to specific messages. |
[Agent] Component Type | string | Yes | Type of component that produced this event: 'user_input', 'llm', 'tool', 'embedding'. |
[Agent] Model Name | string | Yes | LLM model identifier (e.g., 'gpt-4o', 'claude-sonnet-4-20250514'). |
[Agent] Provider | string | Yes | LLM provider name (e.g., 'openai', 'anthropic', 'google', 'mistral', 'bedrock'). |
[Agent] Latency Ms | number | Yes | Total wall-clock latency in milliseconds for this operation. |
[Agent] Is Error | boolean | Yes | Whether this event represents an error condition. |
[Agent] Error Message | string | No | Error message text when Is Error is true. |
[Agent] Locale | string | No | User locale (e.g., 'en-US'). |
[Agent] Span Kind | string | No | Classification of the span type for OTEL bridge compatibility. |
[Amplitude] Session Replay ID | string | No | Links to Amplitude Session Replay (format: device_id/session_id). Enables one-click navigation from AI session to browser replay. |
[Agent] TTFB Ms | number | No | Time to first byte/token in milliseconds. Measures perceived responsiveness for streaming. |
[Agent] Input Tokens | number | No | Number of input/prompt tokens consumed by this LLM call. |
[Agent] Output Tokens | number | No | Number of output/completion tokens generated by this LLM call. |
[Agent] Total Tokens | number | No | Total tokens consumed (input + output). |
[Agent] Reasoning Tokens | number | No | Tokens consumed by reasoning/thinking (o1, o3, extended thinking models). |
[Agent] Cache Read Tokens | number | No | Input tokens served from the provider's prompt cache (cheaper rate). Used for cache-aware cost calculation. |
[Agent] Cache Creation Tokens | number | No | Input tokens that created new prompt cache entries. |
[Agent] Cost USD | number | No | Estimated cost in USD for this LLM call. Cache-aware when cache token counts are provided. |
[Agent] Finish Reason | string | No | Why the model stopped generating: 'stop', 'end_turn', 'tool_use', 'length', 'content_filter', etc. |
[Agent] Tool Calls | string | No | Serialized JSON array of tool call requests made by the AI in this response. |
[Agent] Has Reasoning | boolean | No | Whether the AI response included reasoning/thinking content. |
[Agent] Reasoning Content | string | No | The AI's reasoning/thinking content (when available and content_mode permits). |
[Agent] System Prompt | string | No | The system prompt used for this LLM call (when content_mode permits). Chunked for long prompts. |
[Agent] System Prompt Length | number | No | Character length of the system prompt. |
[Agent] Tool Definitions | string | No | Normalized JSON array of tool definitions sent to the LLM (when content_mode permits). Each entry contains name, description, and parameters schema. |
[Agent] Tool Definitions Count | number | No | Number of tool definitions in the LLM request. |
[Agent] Tool Definitions Hash | string | No | Stable SHA-256 hash of the normalized tool definitions. Always present regardless of content_mode; enables toolset change detection without exposing schemas. |
[Agent] Temperature | number | No | Temperature parameter used for this LLM call. |
[Agent] Max Output Tokens | number | No | Maximum output tokens configured for this LLM call. |
[Agent] Top P | number | No | Top-p (nucleus sampling) parameter used for this LLM call. |
[Agent] Is Streaming | boolean | No | Whether this response was generated via streaming. |
[Agent] Prompt ID | string | No | Identifier for the prompt template or version used. |
[Agent] Was Copied | boolean | No | Whether the user copied this AI response content. An implicit positive quality signal. |
[Agent] Was Cached | boolean | No | Whether this response was served from a semantic/full-response cache (distinct from token-level prompt caching). |
[Agent] Model Tier | string | No | Model tier classification: 'fast' (GPT-4o-mini, Haiku, Flash), 'standard' (GPT-4o, Sonnet, Pro), or 'reasoning' (o1, o3, DeepSeek-R1). Auto-inferred from model name. |
[Agent] Has Attachments | boolean | No | Whether this AI response includes generated attachments (images, charts, files). |
[Agent] Attachment Types | string[] | No | Distinct attachment types in this AI response. Serialized JSON array. |
[Agent] Attachment Count | number | No | Number of attachments generated by the AI in this response. |
[Agent] Total Attachment Size Bytes | number | No | Total size of all AI-generated attachments in bytes. |
[Agent] Attachments | string | No | Serialized JSON array of AI-generated attachment metadata. |
[Agent] Message Labels | string | No | Serialized JSON array of MessageLabel objects attached to this AI response. |
[Agent] Message Label Map | string | No | Serialized JSON map of label key to value for quick lookup. |
Event-specific properties for [Agent] Tool Call (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
[Agent] Component Type | string | Yes | Type of component that produced this event: 'user_input', 'llm', 'tool', 'embedding'. |
[Agent] Latency Ms | number | Yes | Total wall-clock latency in milliseconds for this operation. |
[Agent] Is Error | boolean | Yes | Whether this event represents an error condition. |
[Agent] Error Message | string | No | Error message text when Is Error is true. |
[Agent] Locale | string | No | User locale (e.g., 'en-US'). |
[Agent] Span Kind | string | No | Classification of the span type for OTEL bridge compatibility. |
[Amplitude] Session Replay ID | string | No | Links to Amplitude Session Replay (format: device_id/session_id). Enables one-click navigation from AI session to browser replay. |
[Agent] Invocation ID | string | Yes | Unique identifier for this tool invocation (UUID). Used to link tool calls to parent messages. |
[Agent] Tool Name | string | Yes | Name of the tool/function that was invoked (e.g., 'search_docs', 'web_search'). |
[Agent] Tool Success | boolean | Yes | Whether the tool call completed successfully. |
[Agent] Tool Input | string | No | Serialized JSON of the tool's input arguments. Only sent when content_mode='full'. |
[Agent] Tool Output | string | No | Serialized JSON of the tool's output/return value. Only sent when content_mode='full'. |
[Agent] Parent Message ID | string | No | The message_id of the user message that triggered this tool call. Links the tool call into the event graph. |
Event-specific properties for [Agent] Embedding (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
[Agent] Component Type | string | Yes | Type of component that produced this event: 'user_input', 'llm', 'tool', 'embedding'. |
[Agent] Model Name | string | Yes | LLM model identifier (e.g., 'gpt-4o', 'claude-sonnet-4-20250514'). |
[Agent] Provider | string | Yes | LLM provider name (e.g., 'openai', 'anthropic', 'google', 'mistral', 'bedrock'). |
[Agent] Latency Ms | number | Yes | Total wall-clock latency in milliseconds for this operation. |
[Agent] Span ID | string | Yes | Unique identifier for this embedding operation (UUID). |
[Agent] Input Tokens | number | No | Number of input tokens processed by the embedding model. |
[Agent] Embedding Dimensions | number | No | Dimensionality of the output embedding vector. |
[Agent] Cost USD | number | No | Estimated cost in USD for this embedding operation. |
Event-specific properties for [Agent] Span (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
[Agent] Latency Ms | number | Yes | Total wall-clock latency in milliseconds for this operation. |
[Agent] Is Error | boolean | Yes | Whether this event represents an error condition. |
[Agent] Error Message | string | No | Error message text when Is Error is true. |
[Agent] Span ID | string | Yes | Unique identifier for this span (UUID). |
[Agent] Span Name | string | Yes | Name of the operation (e.g., 'rag_pipeline', 'vector_search', 'rerank'). |
[Agent] Parent Span ID | string | No | Span ID of the parent span for nested pipeline steps. |
[Agent] Input State | string | No | Serialized JSON of the span's input state. Only sent when content_mode='full'. |
[Agent] Output State | string | No | Serialized JSON of the span's output state. Only sent when content_mode='full'. |
Event-specific properties for [Agent] Session End (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
[Agent] Enrichments | string | No | Serialized JSON of SessionEnrichments (topic classifications, rubric scores, outcome, flags). Attached when enrichments are provided at session close. |
[Agent] Abandonment Turn | number | No | Turn ID of the last user message that received an AI response before the user left. Low values (e.g., 1) strongly signal first-response dissatisfaction. |
[Agent] Session Idle Timeout Minutes | number | No | Custom idle timeout for this session (default 30 min). Tells the server how long to wait before auto-closing. |
Event-specific properties for [Agent] Session Enrichment (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
[Agent] Enrichments | string | Yes | Serialized JSON of SessionEnrichments: topic_classifications, rubrics, overall_outcome, quality_score, sentiment_score, boolean flags, agent chain metadata, and message labels. |
Event-specific properties for [Agent] Score (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
[Agent] Score Name | string | Yes | Name of the score (e.g., 'user-feedback', 'task_completion', 'accuracy', 'groundedness'). |
[Agent] Score Value | number | Yes | Numeric score value. Binary (0/1), continuous (0.0-1.0), or rating scale (1-5). |
[Agent] Target ID | string | Yes | The message_id or session_id being scored. |
[Agent] Target Type | string | Yes | What is being scored: 'message' or 'session'. |
[Agent] Evaluation Source | string | Yes | Source of the evaluation: 'user' (end-user feedback), 'ai' (automated/server pipeline), or 'reviewer' (human expert). |
[Agent] Comment | string | No | Optional text explanation for the score (respects content_mode). |
[Agent] Taxonomy Version | string | No | Which taxonomy config version produced this enrichment (from ai_category_config.config_version_id). |
[Agent] Evaluated At | number | No | Epoch milliseconds when this enrichment/evaluation was computed. |
[Agent] Score Label | string | No | Direction-neutral magnitude label derived from score value. Default 5-tier: very_high (>=0.8), high (>=0.6), moderate (>=0.4), low (>=0.2), very_low (>=0.0). Server-side only. |
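The default tier thresholds above amount to a small mapping function. A sketch of the documented derivation (this label is computed server-side; your code never sends it):

```typescript
// Maps a score value to the documented default [Agent] Score Label tiers.
// Sketch of the server-side behavior described above, not SDK code.
function scoreLabel(value: number): string {
  if (value >= 0.8) return 'very_high';
  if (value >= 0.6) return 'high';
  if (value >= 0.4) return 'moderate';
  if (value >= 0.2) return 'low';
  return 'very_low';
}
```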
[Agent] Session Evaluation is emitted automatically by the server-side enrichment pipeline — do not send this event from your code.
| Property | Type | Required | Description |
|---|---|---|---|
[Agent] Session ID | string | Yes | Unique session identifier. All events in one conversation share the same session ID. |
[Agent] Agent ID | string | Yes | Identifies which AI agent handled the interaction (e.g., 'support-bot', 'houston'). |
[Agent] Customer Org ID | string | Yes | Organization ID for multi-tenant platforms. Enables account-level group analytics. |
[Agent] Evaluation Source | string | Yes | Source of the evaluation: 'user' (end-user feedback), 'ai' (automated/server pipeline), or 'reviewer' (human expert). |
[Agent] Taxonomy Version | string | Yes | Which taxonomy config version produced this enrichment (from ai_category_config.config_version_id). |
[Agent] Evaluated At | number | Yes | Epoch milliseconds when this enrichment/evaluation was computed. |
[Agent] Overall Outcome | string | Yes | Session outcome classification: 'success', 'partial_success', 'failure', 'abandoned', 'response_provided', etc. |
[Agent] Turn Count | number | Yes | Number of conversation turns in this session. |
[Agent] Session Total Tokens | number | No | Total LLM tokens consumed across all turns in this session. |
[Agent] Session Avg Latency Ms | number | No | Average AI response latency in milliseconds across the session. |
[Agent] Request Complexity | string | No | Complexity classification of the user's request: 'simple', 'moderate', 'complex', or 'ambiguous'. |
[Agent] Has Task Failure | boolean | Yes | Whether the agent failed to complete the user's request. |
[Agent] Has Negative Feedback | boolean | Yes | Whether the user expressed dissatisfaction during the session. |
[Agent] Has Technical Failure | boolean | Yes | Whether technical errors occurred (tool timeouts, API failures, etc.). |
[Agent] Has Data Quality Issues | boolean | Yes | Whether the AI output had data quality problems (wrong data, hallucinations, etc.). |
[Agent] Models Used | string[] | No | LLM models used in this session. JSON array of strings. |
[Agent] Root Agent Name | string | No | Entry-point agent in multi-agent flows. |
[Agent] Agent Chain Depth | number | No | Number of agents in the delegation chain. |
[Agent] Task Failure Type | string | No | Specific failure type when has_task_failure is true (e.g., 'wrong_answer', 'unable_to_complete'). |
[Agent] Technical Error Count | number | No | Count of technical errors that occurred during the session. |
[Agent] Error Categories | string[] | No | Categorized error types (e.g., 'chart_not_found', 'timeout'). JSON array of strings. |
[Agent] Behavioral Patterns | string[] | No | Detected behavioral anti-patterns (e.g., 'retry_storm', 'clarification_loop', 'early_abandonment'). JSON array of strings. |
[Agent] Session Cost USD | number | No | Total LLM cost in USD for this AI session (aggregated from per-message costs). |
[Agent] Enrichment Cost USD | number | No | Cost in USD of running the enrichment pipeline's LLM inference for this session. Distinct from the session's own LLM cost. |
[Agent] Quality Score | number | No | Overall quality score (0.0-1.0) computed by the enrichment pipeline for this session. |
[Agent] Sentiment Score | number | No | User sentiment score (0.0-1.0) inferred from the conversation by the enrichment pipeline. |
[Agent] Task Failure Reason | string | No | Explanation of why the task failed when has_task_failure is true (e.g., 'chart data source unavailable'). |
[Agent] Agent Chain | string[] | No | Serialized JSON array of agent IDs representing the delegation chain in multi-agent flows. |
[Agent] Project ID | string | No | Amplitude project ID that owns the AI session being evaluated. |
[Agent] Has User Feedback | boolean | Yes | Whether the session received explicit user feedback (thumbs up/down, rating). |
[Agent] User Score | number | No | Aggregate user feedback score for the session (0.0-1.0). Present only when has_user_feedback is true. |
[Agent] Agent Version | string | No | Agent code version (e.g., 'v4.2'). Enables version-over-version quality comparison. |
[Agent] Agent Description | string | No | Human-readable description of the agent's purpose (e.g., 'Handles user chat requests via OpenAI GPT-4o'). Enables observability-driven agent registry from event streams. |
[Agent] Topic Classification is emitted automatically by the server-side enrichment pipeline — do not send this event from your code.
| Property | Type | Required | Description |
|---|---|---|---|
[Agent] Session ID | string | Yes | Unique session identifier. All events in one conversation share the same session ID. |
[Agent] Agent ID | string | Yes | Identifies which AI agent handled the interaction (e.g., 'support-bot', 'houston'). |
[Agent] Customer Org ID | string | Yes | Organization ID for multi-tenant platforms. Enables account-level group analytics. |
[Agent] Evaluation Source | string | Yes | Source of the evaluation: 'user' (end-user feedback), 'ai' (automated/server pipeline), or 'reviewer' (human expert). |
[Agent] Taxonomy Version | string | Yes | Which taxonomy config version produced this enrichment (from ai_category_config.config_version_id). |
[Agent] Evaluated At | number | Yes | Epoch milliseconds when this enrichment/evaluation was computed. |
[Agent] Topic | string | Yes | Which topic model this classification is for (e.g., 'product_area', 'query_intent', 'error_domain'). |
[Agent] Selection Mode | string | Yes | Whether this topic model uses 'single' (MECE) or 'multiple' (multi-label) selection. |
[Agent] Primary | string | No | Primary classification value (e.g., 'charts', 'billing_issues'). |
[Agent] Secondary | string[] | No | Secondary classifications for multi-label topics. JSON array of strings. |
[Agent] Subcategories | string[] | No | Subcategories for finer classification within the primary topic (e.g., 'TREND_ANALYSIS', 'WRONG_EVENT'). JSON array of strings. |
A realistic example of what gets sent to Amplitude for an AI response:
{
  "event_type": "[Agent] AI Response",
  "user_id": "user-42",
  "event_properties": {
    "[Agent] Session ID": "sess-abc123",
    "[Agent] Trace ID": "trace-def456",
    "[Agent] Turn ID": 2,
    "[Agent] Message ID": "msg-789xyz",
    "[Agent] Model Name": "gpt-4o",
    "[Agent] Provider": "openai",
    "[Agent] Model Tier": "standard",
    "[Agent] Latency Ms": 1203,
    "[Agent] Input Tokens": 150,
    "[Agent] Output Tokens": 847,
    "[Agent] Total Tokens": 997,
    "[Agent] Cost USD": 0.0042,
    "[Agent] Is Error": false,
    "[Agent] Finish Reason": "stop",
    "[Agent] Is Streaming": false,
    "[Agent] Component Type": "llm",
    "[Agent] Agent ID": "support-bot",
    "[Agent] Env": "production",
    "[Agent] SDK Version": "0.1.0",
    "[Agent] Runtime": "node"
  }
}
{
  "event_type": "[Agent] User Message",
  "user_id": "user-42",
  "event_properties": {
    "[Agent] Session ID": "sess-abc123",
    "[Agent] Turn ID": 1,
    "[Agent] Message ID": "msg-123abc",
    "[Agent] Component Type": "user_input",
    "[Agent] Agent ID": "support-bot",
    "[Agent] Env": "production",
    "[Agent] SDK Version": "0.1.0",
    "[Agent] Runtime": "node",
    "$llm_message": {
      "text": "How do I reset my password?"
    }
  }
}
{
  "event_type": "[Agent] Tool Call",
  "user_id": "user-42",
  "event_properties": {
    "[Agent] Session ID": "sess-abc123",
    "[Agent] Turn ID": 3,
    "[Agent] Invocation ID": "inv-456def",
    "[Agent] Tool Name": "search_knowledge_base",
    "[Agent] Tool Success": true,
    "[Agent] Is Error": false,
    "[Agent] Latency Ms": 340,
    "[Agent] Component Type": "tool",
    "[Agent] Agent ID": "support-bot",
    "[Agent] Tool Input": "{\"query\":\"password reset instructions\"}",
    "[Agent] Tool Output": "{\"results\":[{\"title\":\"Password Reset Guide\"}]}",
    "[Agent] SDK Version": "0.1.0",
    "[Agent] Runtime": "node"
  }
}
{
  "event_type": "[Agent] Score",
  "user_id": "user-42",
  "event_properties": {
    "[Agent] Score Name": "thumbs-up",
    "[Agent] Score Value": 1,
    "[Agent] Target ID": "msg-789xyz",
    "[Agent] Target Type": "message",
    "[Agent] Evaluation Source": "user",
    "[Agent] Session ID": "sess-abc123",
    "[Agent] Agent ID": "support-bot",
    "[Agent] SDK Version": "0.1.0",
    "[Agent] Runtime": "node"
  }
}
The [Agent] event schema is not tied to this SDK. If your stack doesn't have an Amplitude AI SDK, you can send the same events directly via Amplitude's ingestion APIs.
When you use this SDK, the following are managed automatically. If you send events directly, you are responsible for these:
| Concern | SDK behavior | DIY equivalent |
|---|---|---|
| Session ID | Generated once per session() and propagated to every event | Generate a UUID per conversation and include it as [Agent] Session ID on every event |
| Deduplication | Automatic insert_id on each event | Set a unique insert_id per event to prevent duplicates on retry |
| Property prefixing | All properties are prefixed with [Agent] | You must include the [Agent] prefix in every property name |
| Cost / token calculation | Auto-computed from model and token counts | Compute and send [Agent] Cost USD, [Agent] Input Tokens, etc. yourself |
| Server-side enrichment | [Agent] Session Evaluation, [Agent] Topic Classification, and [Agent] Score events are emitted automatically by the enrichment pipeline after [Agent] Session End | These fire automatically — you do not need to send them. Just send the SDK-level events and close the session with [Agent] Session End. |
| Method | Best for | Docs |
|---|---|---|
| HTTP V2 API | Real-time, low-to-medium volume | HTTP V2 API docs |
| Batch Event Upload API | High volume, backfills | Batch API docs |
| Amazon S3 Import | Bulk historical import, warehouse-first workflows | S3 Import docs |
curl -X POST https://api2.amplitude.com/2/httpapi \
  -H 'Content-Type: application/json' \
  -d '{
    "api_key": "YOUR_API_KEY",
    "events": [
      {
        "event_type": "[Agent] User Message",
        "user_id": "user-42",
        "insert_id": "evt-unique-id-1",
        "event_properties": {
          "[Agent] Session ID": "sess-abc123",
          "[Agent] Trace ID": "trace-def456",
          "[Agent] Turn ID": 1,
          "[Agent] Agent ID": "support-bot",
          "[Agent] Message ID": "msg-001"
        }
      },
      {
        "event_type": "[Agent] AI Response",
        "user_id": "user-42",
        "insert_id": "evt-unique-id-2",
        "event_properties": {
          "[Agent] Session ID": "sess-abc123",
          "[Agent] Trace ID": "trace-def456",
          "[Agent] Turn ID": 1,
          "[Agent] Message ID": "msg-002",
          "[Agent] Agent ID": "support-bot",
          "[Agent] Model Name": "gpt-4o",
          "[Agent] Provider": "openai",
          "[Agent] Latency Ms": 1203,
          "[Agent] Input Tokens": 150,
          "[Agent] Output Tokens": 420,
          "[Agent] Cost USD": 0.0042
        }
      }
    ]
  }'
Refer to the Event Schema and Event Property Reference tables above for required and optional properties per event type.
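When sending events directly from Node, the bookkeeping from the DIY table (a shared session ID, a unique insert_id per event, and [Agent]-prefixed property names) can be wrapped in a small helper. A sketch: buildAgentEvent and sendEvents are hypothetical names for illustration, while the endpoint and payload shape are those of the HTTP V2 API shown in the curl example above.

```typescript
import { randomUUID } from 'node:crypto';

type AgentEvent = {
  event_type: string;
  user_id: string;
  insert_id: string;
  event_properties: Record<string, unknown>;
};

// Build one [Agent] event with the DIY concerns handled:
// a unique insert_id (deduplication on retry) and the shared session ID.
function buildAgentEvent(
  eventType: string,
  userId: string,
  sessionId: string,
  props: Record<string, unknown> = {},
): AgentEvent {
  return {
    event_type: eventType,
    user_id: userId,
    insert_id: randomUUID(),
    event_properties: { '[Agent] Session ID': sessionId, ...props },
  };
}

// POST a batch to the HTTP V2 ingestion endpoint.
async function sendEvents(apiKey: string, events: AgentEvent[]): Promise<void> {
  const res = await fetch('https://api2.amplitude.com/2/httpapi', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ api_key: apiKey, events }),
  });
  if (!res.ok) throw new Error(`Amplitude ingestion failed: ${res.status}`);
}
```

Build the user-message/AI-response pair for one turn with buildAgentEvent, then pass both objects to a single sendEvents call so the turn arrives in one request.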
Amplitude's Data Catalog documents events and properties with descriptions, types, and required flags. The @amplitude/ai package includes a tool to generate all the Taxonomy API calls for you.
The bundled CLI reads data/agent_event_catalog.json and prints executable curl commands — it makes no network requests itself.
# Preview the curl commands (uses placeholder keys)
npx amplitude-ai-register-catalog
# Generate with your real keys
npx amplitude-ai-register-catalog --api-key YOUR_KEY --secret-key YOUR_SECRET
# Pipe to bash to execute immediately
npx amplitude-ai-register-catalog --api-key YOUR_KEY --secret-key YOUR_SECRET | bash
# EU data residency
npx amplitude-ai-register-catalog --api-key YOUR_KEY --secret-key YOUR_SECRET --eu | bash
If you have Python available, the amplitude-ai package provides a CLI that calls the Taxonomy API directly with retry logic and a progress summary:
pip install amplitude-ai
amplitude-ai-register-catalog --api-key YOUR_KEY --secret-key YOUR_SECRET
All 10 [Agent] event types and their properties (see Event Property Reference above), organized under the "Agent Analytics" category. The commands are idempotent — safe to re-run. They create missing events/properties and update existing ones.
calculateCost() — Returns cost in USD when @pydantic/genai-prices is installed; otherwise returns 0 (never null).
countTokens(text, model?) — Uses tiktoken when available. For unknown models, tries o200k_base encoding before falling back to cl100k_base (matching the Python SDK).
estimateTokens(text) — Heuristic fallback: ceil(chars/3.5 + words*0.1) (matching the Python SDK).
stripProviderPrefix(modelName) — Splits on : (e.g., openai:gpt-4o → gpt-4o). Use for normalizing model IDs before cost lookup. Import from @amplitude/ai/internals.
The package exports structural interfaces for provider shapes from @amplitude/ai and @amplitude/ai/types: ChatCompletionParams, ChatCompletionResponse, AnthropicParams, AnthropicResponse, BedrockConverseParams, BedrockConverseResponse, MistralChatParams, MistralChatResponse, TrackFn, TrackCallOptions, and related types. Use these for typing provider integrations without depending on the underlying SDK types.
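To make the heuristic concrete, here is the documented formula written out. A sketch only (it assumes chars is raw string length and words is the whitespace-split count); use the estimateTokens export from the package rather than reimplementing it:

```typescript
// Sketch of the documented fallback heuristic: ceil(chars / 3.5 + words * 0.1).
// Assumptions: chars = text.length, words = whitespace-separated token count.
function estimateTokensSketch(text: string): number {
  const chars = text.length;
  const words = text.trim() === '' ? 0 : text.trim().split(/\s+/).length;
  return Math.ceil(chars / 3.5 + words * 0.1);
}
```

For 'hello world': 11 chars and 2 words give ceil(11/3.5 + 0.2) = 4 estimated tokens.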
All PROP_* and EVENT_* constants are exported for advanced use:
import {
EVENT_AI_RESPONSE,
EVENT_EMBEDDING,
EVENT_SCORE,
EVENT_SESSION_END,
EVENT_SESSION_ENRICHMENT,
EVENT_SPAN,
EVENT_TOOL_CALL,
EVENT_USER_MESSAGE,
PROP_MODEL_NAME,
PROP_SESSION_ID,
PROP_TRACE_ID,
// ... etc
} from '@amplitude/ai';
See src/core/tracking.ts and src/core/constants.ts for the full list.
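One use for the constants is keeping hand-rolled analysis code aligned with the SDK's event names instead of hard-coding strings. A sketch — the values are inlined here as stand-ins (the event name matches the documented "[Agent] AI Response"; the property key is an assumption), whereas real code would import the constants as shown above:

```typescript
// Inlined stand-ins for the exported constants (illustrative only; real
// code imports EVENT_AI_RESPONSE, PROP_MODEL_NAME from '@amplitude/ai').
const EVENT_AI_RESPONSE = '[Agent] AI Response';
const PROP_MODEL_NAME = 'model_name'; // assumed property key

// e.g. filtering raw exported events by the SDK's event type:
type RawEvent = { event_type: string; event_properties: Record<string, string> };
const events: RawEvent[] = [
  { event_type: '[Agent] AI Response', event_properties: { model_name: 'gpt-4o' } },
  { event_type: '[Agent] User Message', event_properties: {} },
];
const aiResponses = events.filter((e) => e.event_type === EVENT_AI_RESPONSE);
console.log(aiResponses.length, aiResponses[0].event_properties[PROP_MODEL_NAME]);
```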
This SDK is designed to be discovered and used by any AI coding agent — Cursor, Claude Code, Windsurf, Copilot, Codex, Cline, or any agent that can read files.
The fastest path:
npm install @amplitude/ai
npx amplitude-ai
The CLI prints a prompt to paste into your agent:
Instrument this app with @amplitude/ai. Follow node_modules/@amplitude/ai/amplitude-ai.md
The agent reads the guide, scans your project, and instruments everything in 4 phases: Detect, Discover, Instrument, Verify.
Files shipped with the package:
| File | Purpose |
|---|---|
amplitude-ai.md | Primary guide — self-contained 4-phase instrumentation workflow and full API reference |
AGENTS.md | Concise index with canonical patterns, MCP surface, gotchas, and CLI reference |
llms.txt | Compact discovery file listing tools, resources, and event names |
llms-full.txt | Extended reference with full API signatures, provider coverage matrix, and common error resolutions |
mcp.schema.json | Structured JSON describing the MCP server's tools, resources, and prompt |
Optional: MCP server for advanced tooling. Run amplitude-ai mcp to start the MCP server (standard stdio protocol). MCP-compatible agents can call tools like scan_project, instrument_file, validate_file, and generate_verify_test for deeper analysis. The MCP server is not required for the core instrumentation workflow — amplitude-ai.md is self-contained.
If you're moving from amplitude_ai (Python) to @amplitude/ai (TypeScript/Node), the core event model is the same, but ergonomics differ to match the runtime:
| Area | Python (amplitude_ai) | TypeScript (@amplitude/ai) |
|---|---|---|
| Session scope | with session as s: | await session.run(async (s) => { ... }) |
| Tool/observe wrappers | @tool, @observe decorators | tool(), observe() HOFs |
| Context propagation | contextvars | AsyncLocalStorage |
| Tool input schema | Optional auto-schema from Python type hints | Explicit inputSchema object (recommended: define with Zod, pass JSON Schema) |
| Sync behavior | Native sync + async wrappers | Wrappers return async (Promise<T>) |
| Middleware | Starlette/FastAPI middleware | Express-compatible middleware |
| Bootstrap/preload | sitecustomize.py + PYTHONPATH patterns | NODE_OPTIONS=--import preload patterns |
| Provider patching model | Python class replacement | Prototype patching + Proxy fallback for lazy getters |
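The tool input schema row deserves a concrete shape. A minimal sketch with the JSON Schema hand-written — in practice the table recommends defining a Zod schema and converting it (e.g. with zod-to-json-schema); the inputSchema option name is taken from the table:

```typescript
// Hand-written JSON Schema for a search tool's input (normally generated
// from a Zod schema rather than written by hand).
const searchInputSchema = {
  type: 'object',
  properties: {
    query: { type: 'string', description: 'Search terms' },
  },
  required: ['query'],
};

// Passed to the tool() wrapper (option name per the migration table):
// const search = tool(async (args: { query: string }) => db.search(args.query), {
//   name: 'search',
//   inputSchema: searchInputSchema,
// });
console.log(JSON.stringify(searchInputSchema.required));
```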
Features that do not map 1:1 because of platform/runtime constraints:
# Python
from amplitude_ai import AmplitudeAI, tool, observe
ai = AmplitudeAI(api_key="xxx")
agent = ai.agent("my-agent", user_id="u1")
with agent.session(user_id="u1") as s:
s.track_user_message("Hello")
s.track_ai_message("Hi!", model="gpt-4", provider="openai", latency_ms=100)
@tool(name="search")
def search(query: str) -> str:
return db.search(query)
// TypeScript
import { AmplitudeAI, tool } from '@amplitude/ai';
const ai = new AmplitudeAI({ apiKey: 'xxx' });
const agent = ai.agent('my-agent', { userId: 'u1' });
const session = agent.session({ userId: 'u1' });
await session.run(async (s) => {
s.trackUserMessage('Hello');
s.trackAiMessage('Hi!', 'gpt-4', 'openai', 100);
});
const search = tool(async (args: { query: string }) => db.search(args.query), {
name: 'search',
});
Contributions are welcome! Please open an issue first to discuss what you'd like to change, then submit a pull request.
To get set up: create a feature branch (git checkout -b my-feature), install dependencies (pnpm install), and make sure tests pass (pnpm run test:coverage) and TypeScript compiles (pnpm run test:typescript).