clavue-agent-sdk

In-process TypeScript agent runtime with multi-agent graph DSL, live tracing, 4-scope guardrails, capability-token sandbox, RAG, framework-agnostic Generative UI, voice adapters, worker_thread subagent isolation, Anthropic prompt caching, streaming, and s

Version: 2.2.0 (latest, source: npm)
Maintainers: 1

Clavue Agent SDK

npm version Node.js License: MIT Tests

Production-grade TypeScript agent runtime. Embed run(), query(), or createAgent() directly in your Node.js process — no daemon, no subprocess, no local CLI dependency. Multi-provider (Anthropic + OpenAI-compatible), streaming-first, prompt-cache-aware, with a v3 capability layer that ships multi-agent graph, live tracing, 4-scope guardrails, capability-token sandbox, RAG, framework-agnostic Generative UI, and provider-agnostic voice.

生产级 TypeScript agent runtime。在你的 Node.js 进程内直接运行 run() / query() / createAgent() —— 无守护进程、无子进程、无本地 CLI 依赖。 多 provider(Anthropic + OpenAI 兼容),流式优先,自动 prompt cache, v3 能力层提供:多 agent graph、实时 tracing、4 scope guardrails、 capability-token sandbox、RAG、框架无关的 Generative UI 与 provider 无关的 voice 接入。

npm install clavue-agent-sdk             # latest = 1.0.5

Also available in Go: clavue-agent-sdk-go

What's in 1.0.3 / 1.0.3 速览

The 1.0.x line closes the 17-item v2 audit, ships the v3 seven-axis capability layer (1.0.1), and adds the Tier A performance landings plus the Tier B list-cache family (1.0.3). Hard numbers (reproduce with npm run bench):

1.0.x 关闭了 v2 审计的 17 项问题,发布 v3 七轴能力层(1.0.1),并在 1.0.3 补齐 Tier A 性能项与 Tier B list-cache 家族。下面这些数字都可以用 npm run bench 复现:

| Metric / 指标 | 0.7.5 baseline | 1.0.3 | Δ |
| --- | --- | --- | --- |
| src/engine.ts lines | 1537 | 1041 | -32% |
| Engine helper modules | 0 | 15 | +15 files |
| Tests passing | 272 | 720 | +165% |
| Token estimator content classes separated | 1 (constant 0.25) | 4 (0.25 / 0.33 / 0.39 / 0.63) | |
| Worker_thread spawn p50 | n/a (stub) | ~60 ms | runtime real |
| Worker_thread preflight abort p95 | n/a | <0.1 ms | short-circuit |
| listMemories / listAgentJobs / listSessions / listIssueWorkflowRuns warm calls | O(N) readdir + readFile | O(0) (cached, invalidated on write) | new in 1.0.3 |

See docs/v2_benchmark_report.md for the v2 methodology and raw output, and docs/tier-a-summary.md for the 1.0.3 Tier A / Tier B landings (per-item goal, files, public API, trace surface, tests).
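
The warm-call row in the table describes a read-through cache that is dropped whenever the underlying store is written. The following is a minimal sketch of that pattern only. `ListCache`, `load`, and `invalidate` are hypothetical names, the store is simulated in memory rather than on disk, and nothing here is taken from the SDK's source.

```typescript
// Hypothetical read-through list cache; not the SDK's implementation.
class ListCache<T> {
  private cached: T[] | undefined;
  public loads = 0; // counts cold reads, for demonstration only

  constructor(private readonly load: () => T[]) {}

  // Warm calls return the cached array without touching the store.
  list(): T[] {
    if (this.cached === undefined) {
      this.cached = this.load();
      this.loads += 1;
    }
    return this.cached;
  }

  // Every write path must call this so the next list() re-reads the store.
  invalidate(): void {
    this.cached = undefined;
  }
}

// Simulated store standing in for readdir + readFile.
const store: string[] = ["memory-a", "memory-b"];
const memories = new ListCache<string>(() => [...store]);

memories.list(); // cold: reads the store
memories.list(); // warm: served from the cache
store.push("memory-c"); // a write...
memories.invalidate(); // ...drops the cached list
const afterWrite = memories.list(); // cold again, sees the new entry
```

The design hinges on every write path calling `invalidate()`; a write that skips it would leave warm callers with a stale list.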

Headline capabilities at 1.0.3

  • Streaming first. client.messages.stream(...) for Anthropic, SSE for OpenAI Chat. Set includePartialMessages: true and consume partial_message events for sub-second first-token latency in your UI.
  • Prompt caching on by default. Anthropic cache_control: ephemeral is applied to system and the trailing tool — multi-turn input cost drops 70-80% on cache hits, with no code change required from hosts.
  • Real subagent isolation. runtime: 'worker_thread' spawns a fresh V8 isolate per subagent. Pre-flight abort short-circuit, hard timeout, curated env forwarding, no shared registries.
  • Structured outputs that actually work. outputSchema synthesizes an _output tool with tool_choice (Anthropic) or response_format: { type: 'json_schema' } (OpenAI). The 0.7.x jsonSchema field was wired but never read; in 1.0.1 it's a deprecated alias that's actually plumbed through.
  • Real issue-workflow loop. runIssueWorkflowWithAgent(input) now drives a build → verify → review → fix loop with an Agent and a Verifier you supply (default CommandVerifier shells npm test / npm run lint). The legacy runIssueWorkflow is kept for back-compat but only records what the host evaluates.
  • Subpath imports. clavue-agent-sdk/core, /tools, /contracts, /workflow, /retro, /testing, plus the v3 axes (/graph, /guardrails, /tracing, /sandbox, /rag, /genui, /voice). Root barrel still re-exports everything for back-compat.
  • Honest deprecations. jsonSchema, maxThinkingTokens, runIssueWorkflow, and the cost result field have ≥90 days of overlap before removal in 1.1.0. See docs/v1_to_v2_migration.md.
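
To make the prompt-caching bullet concrete, here is a rough sketch of what marking the system prompt and the trailing tool with `cache_control: { type: "ephemeral" }` looks like on an Anthropic Messages request. The local types and the `withPromptCaching` helper are assumptions made for illustration; how the SDK builds its requests internally is not shown in this README.

```typescript
// Minimal local types for the request parts used here (illustrative only).
type CacheControl = { type: "ephemeral" };

interface TextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

interface ToolDef {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  cache_control?: CacheControl;
}

interface RequestParts {
  system: TextBlock[];
  tools: ToolDef[];
}

// Hypothetical helper: mark the last system block and the trailing tool
// as cache breakpoints, leaving every other block untouched.
function withPromptCaching(parts: RequestParts): RequestParts {
  const ephemeral: CacheControl = { type: "ephemeral" };
  const system = parts.system.map((block, i) =>
    i === parts.system.length - 1 ? { ...block, cache_control: ephemeral } : block,
  );
  const tools = parts.tools.map((tool, i) =>
    i === parts.tools.length - 1 ? { ...tool, cache_control: ephemeral } : tool,
  );
  return { system, tools };
}

const cachedParts = withPromptCaching({
  system: [{ type: "text", text: "You are a careful code reviewer." }],
  tools: [
    { name: "Read", description: "Read a file", input_schema: { type: "object" } },
    { name: "Grep", description: "Search files", input_schema: { type: "object" } },
  ],
});
```

Only the final tool carries the marker, so the whole stable prefix (system prompt plus tool definitions) can be reused across turns.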

v3 capability layer / v3 能力层

Every axis is exercised by tests in this repo. Line numbers below point at live source so you can verify, not just trust the README.

每个能力轴都有测试覆盖,下表行号指向源码本身。

| Capability / 能力 | Where it lives / 实现位置 | Tests | Example |
| --- | --- | --- | --- |
| Multi-agent graph DSL (6 node kinds: agent / verifier / router / parallel / human / retriever) | src/graph/runtime.ts, src/graph/types.ts | tests/graph.test.ts (+ retriever) | examples/19-graph-dsl.ts, 28-rag-graph.ts |
| 4-scope guardrails (input / output / tool_input / tool_output) with 'abort' / 'skip' / 'continue' policy hook | src/guardrails/runtime.ts, src/engine.ts | tests/guardrails.test.ts (×3) | examples/20-guardrails.ts |
| Live tracing + replay + OTel SDK bridge | src/tracing/runtime.ts, src/tracing/otel-shim.ts | tests/tracing*.test.ts (×3) | examples/21-tracing-replay.ts, 27-trace-exporter.ts, 29-otel-shim.ts |
| Capability tokens + sandbox primitives | src/sandbox/runtime.ts | tests/sandbox.test.ts | examples/22-capability-tokens.ts |
| RAG: RetrieverInterface + InMemoryRetriever + PgvectorRetriever + retriever graph node | src/rag/runtime.ts, src/rag/pgvector.ts | tests/rag*.test.ts (×3) | examples/23-rag-retriever.ts, 28-rag-graph.ts |
| Framework-agnostic Generative UI stream (UiStreamSink / UiStreamSource) | src/genui/index.ts | tests/genui.test.ts | examples/24-generative-ui.ts |
| Provider-agnostic voice (ASR + TTS) with Deepgram / Whisper-OpenAI / ElevenLabs stub adapters | src/voice/runtime.ts, src/voice/adapters.ts | tests/voice*.test.ts (×2) | examples/25-voice.ts, 30-voice-adapters.ts |
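
As an illustration of the guardrails row, the sketch below shows one way four scopes and an abort / skip / continue policy could fit together. It is not the SDK's API: the `Guardrail` shape, `evaluateGuardrails`, and the precedence rule (abort wins, then skip, then continue) are all assumptions made for this example; see examples/20-guardrails.ts for the real surface.

```typescript
type GuardrailScope = "input" | "output" | "tool_input" | "tool_output";
type ViolationPolicy = "abort" | "skip" | "continue";

// Hypothetical guardrail shape, for illustration only.
interface Guardrail {
  name: string;
  scope: GuardrailScope;
  check: (payload: string) => boolean; // true means the payload is acceptable
  onViolation: ViolationPolicy;
}

interface GuardrailOutcome {
  action: "pass" | ViolationPolicy;
  violations: string[];
}

// Run every guardrail registered for one scope against a payload.
// Assumed precedence: "abort" stops immediately, "skip" outranks "continue".
function evaluateGuardrails(
  guardrails: Guardrail[],
  scope: GuardrailScope,
  payload: string,
): GuardrailOutcome {
  const violations: string[] = [];
  let action: GuardrailOutcome["action"] = "pass";
  for (const g of guardrails) {
    if (g.scope !== scope || g.check(payload)) continue;
    violations.push(g.name);
    if (g.onViolation === "abort") return { action: "abort", violations };
    if (g.onViolation === "skip") action = "skip";
    else if (action === "pass") action = "continue";
  }
  return { action, violations };
}

const rails: Guardrail[] = [
  { name: "no-secrets", scope: "tool_output", check: (p) => !p.includes("sk-"), onViolation: "skip" },
  { name: "no-rm-rf", scope: "tool_input", check: (p) => !p.includes("rm -rf"), onViolation: "abort" },
];

const clean = evaluateGuardrails(rails, "input", "hello");
const blocked = evaluateGuardrails(rails, "tool_input", "rm -rf /tmp/x");
const dropped = evaluateGuardrails(rails, "tool_output", "token sk-123");
```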

Run any axis offline with no API key:

npx tsx examples/19-graph-dsl.ts          # 6-node-kind graph
npx tsx examples/20-guardrails.ts         # 4-scope guardrails
npx tsx examples/21-tracing-replay.ts     # TraceStore + replay
npx tsx examples/22-capability-tokens.ts  # sandbox capability tokens
npx tsx examples/23-rag-retriever.ts      # InMemoryRetriever
npx tsx examples/24-generative-ui.ts      # framework-agnostic GenUI stream
npx tsx examples/27-trace-exporter.ts     # console + JSONL exporters
npx tsx examples/28-rag-graph.ts          # retriever node kind
npx tsx examples/29-otel-shim.ts          # OtelTraceExporter bridge
npx tsx examples/30-voice-adapters.ts     # Deepgram / Whisper / ElevenLabs (stub fetch)
npx tsx examples/32-tier-a-performance.ts # tool cache + adaptive concurrency + fallback chain + memory consolidation

examples/31-worker-thread-subagent.ts needs a real API key (it spawns a Worker that constructs its own Agent from env credentials):

CLAVUE_AGENT_API_KEY=... npx tsx examples/31-worker-thread-subagent.ts

See docs/USAGE.md for detailed usage of each capability axis: when to use it, how to use it, examples, and caveats.

Documentation map / 文档地图

Get started

  • docs/USAGE.md — feature-by-feature usage guide (when, how, gotchas) for every v3 capability + worker_thread + caching + structured outputs.
  • docs/programmatic-integration-guide.md — embedding patterns: services, CI, workers, internal platforms.
  • docs/desktop-im-archiver-integration.md — handbook for embedding the SDK into a desktop app that archives WeChat / Feishu / QQ / DingTalk messages via clipboard + screenshots (no client internals touched). Pairs with examples/33-im-archiver.ts.
  • docs/desktop-tools-integration.md — opt-in clavue-agent-sdk/desktop subpath: clipboard, screen capture, file search, open folders/apps/URLs, draft emails, system notifications. Per-app recipes for WeChat / Lark / Office / WPS / Gmail. Pairs with examples/34-desktop-tools.ts.

Upgrade & contracts

  • docs/v1_to_v2_migration.md — upgrade path from 0.7.x to 1.0.x with deprecation table.
  • docs/v2_audit_report.md — 17-item audit of 0.7.5 with reconciliation showing what's fixed in 1.0.1.
  • docs/v2_benchmark_report.md — every improvement paired with a reproducible measurement.
  • docs/tier-a-summary.md — Tier A performance landings (tool-result cache, adaptive concurrency, fallback chain, prompt_cache_key, memory list cache, memory consolidation) and the Tier B list-cache family (listMemories, listAgentJobs, listSessions, listIssueWorkflowRuns).

Architecture & roadmap

Why Clavue / 为什么选择 Clavue

  • Library-first agent runtime: embed run(), query(), or createAgent() directly in Node.js services, CI, workers, web backends, and internal platforms. No subprocess.

  • Production controls baked in: named toolsets, allow/deny filters, hooks, koa-style middleware, permission modes, workspace path guards, schema-versioned events, policy traces, quality gates, and budget controls.

  • Durable workflow contracts: background AgentJobs, real runIssueWorkflowWithAgent loop, WORKFLOW.md parsing, proof-of-work artifacts, orchestration policy helpers, runtime namespaces, session persistence, memory injection, self-improvement capture, retro/eval loops.

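  • Provider portability: Anthropic Messages and OpenAI-compatible providers share the same tool, memory, event, and result contracts. Models from third-party gateways (OpenRouter, etc.) are first-class. fallbackModel accepts a single string or an ordered array; when the primary provider fails, the entries are tried in order (for example GPT → Claude → GLM).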

  • Honest measurements: npm run bench produces reproducible numbers. Every README claim points at code or a measurement, not marketing.

  • 库优先的 agent runtime: 直接在 Node.js 服务、CI、worker、Web 后端 和内部平台里嵌入 run() / query() / createAgent()。无子进程。

  • 生产级控制内建: 命名工具集、allow/deny、hooks、koa 风格 middleware、 权限模式、workspace 路径防护、schema-versioned 事件、policy trace、 quality gates 与预算控制。

  • 持久化工作流契约: background AgentJobs、真实的 runIssueWorkflowWithAgent 闭环、WORKFLOW.md 解析、proof-of-work artifact、orchestration policy helper、runtime namespace、session persistence、memory 注入、self-improvement、retro/eval。

  • 多 provider 可移植: Anthropic Messages 与 OpenAI 兼容 provider 共用同一套工具、记忆、事件与结果协议;第三方 gateway(OpenRouter 等) 是一等公民。fallbackModel 支持单字符串或有序数组,在主 provider 失败时按链路依次尝试(如 GPT → Claude → GLM)。

  • 诚实的可量化能力: npm run bench 输出可复现数字。README 里每条 断言都指向源码或测量结果,不是营销话术。
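
The fallbackModel behaviour mentioned in the provider-portability bullet (a single string or an ordered array, tried in sequence when the primary fails) can be pictured with a small standalone sketch. `callWithFallback` and the model names are placeholders invented for this example; the SDK's own retry rules, such as which errors trigger a fallback, are not documented here.

```typescript
type ModelCall<T> = (model: string) => Promise<T>;

// Hypothetical helper: try the primary model, then each fallback in order.
async function callWithFallback<T>(
  primary: string,
  fallbackModel: string | string[] | undefined,
  call: ModelCall<T>,
): Promise<{ model: string; value: T }> {
  const fallbacks =
    fallbackModel === undefined
      ? []
      : Array.isArray(fallbackModel)
        ? fallbackModel
        : [fallbackModel];
  const failures: string[] = [];
  for (const model of [primary, ...fallbacks]) {
    try {
      return { model, value: await call(model) };
    } catch (err) {
      failures.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All models failed:\n${failures.join("\n")}`);
}

// Simulated provider: only the first backup is "up".
const attempted: string[] = [];
const demo = callWithFallback(
  "primary-model",
  ["backup-model-1", "backup-model-2"],
  async (model) => {
    attempted.push(model);
    if (model !== "backup-model-1") throw new Error("simulated outage");
    return `answered by ${model}`;
  },
);
```

The chain stops at the first success, so later fallbacks are never called (and never billed) once an earlier model answers.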

Quick start / 快速开始

Use directly with npx / 直接用 npx 运行

No local install is required for quick automation from a terminal or CI job.

终端或 CI 里可以直接用 npx 运行,不需要先安装到项目里。

export CLAVUE_AGENT_API_KEY=your-api-key
npx clavue-agent-sdk "Read package.json and summarize this project"

# Safer read-only review / 更安全的只读审查
npx clavue-agent-sdk "Review src for obvious bugs" --toolset repo-readonly

# Combine named toolsets / 组合命名工具集
npx clavue-agent-sdk "Research and review this repo" --toolset repo-readonly,research

# OpenAI-compatible model / OpenAI 兼容模型
npx clavue-agent-sdk \
  --api-type openai-completions \
  --model gpt-5.4 \
  --base-url https://api.openai.com/v1 \
  "Explain the repository structure"

# Opt-in run learning / 可选开启 run 自学习
npx clavue-agent-sdk \
  --self-improvement \
  --allow Read,Glob,Grep \
  "Review package.json for release readiness risks"

# Or enable it from CI/env / 也可以通过 CI/env 开启
CLAVUE_AGENT_SELF_IMPROVEMENT=true \
  npx clavue-agent-sdk --allow Read,Glob,Grep "Review package.json"

CLI options: --prompt, --model, --api-type, --api-key, --base-url, --cwd, --max-turns, --autonomy, --permission-mode, --allow, --toolset, --deny, --self-improvement, --json. Issue subcommand only: --max-iterations, --passing-score, --require-gate.

Environment variables: CLAVUE_AGENT_API_KEY, CLAVUE_AGENT_AUTH_TOKEN, CLAVUE_AGENT_API_TYPE, CLAVUE_AGENT_MODEL, CLAVUE_AGENT_BASE_URL, CLAVUE_AGENT_AUTONOMY, CLAVUE_AGENT_PERMISSION_MODE, CLAVUE_AGENT_SELF_IMPROVEMENT, AGENT_SDK_MAX_TOOL_CONCURRENCY.

命令行参数:--prompt、--model、--api-type、--api-key、--base-url、--cwd、--max-turns、--autonomy、--permission-mode、--allow、--toolset、--deny、--self-improvement、--json。仅 issue 子命令:--max-iterations、--passing-score、--require-gate。

环境变量:CLAVUE_AGENT_API_KEY、CLAVUE_AGENT_AUTH_TOKEN、CLAVUE_AGENT_API_TYPE、CLAVUE_AGENT_MODEL、CLAVUE_AGENT_BASE_URL、CLAVUE_AGENT_AUTONOMY、CLAVUE_AGENT_PERMISSION_MODE、CLAVUE_AGENT_SELF_IMPROVEMENT、AGENT_SDK_MAX_TOOL_CONCURRENCY。
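
For hosts that wrap the CLI, it can help to see how flags and environment variables might be merged into one config object. The sketch below uses the variable names listed above, but `resolveConfig`, the `CliConfig` shape, and the flags-over-environment precedence are assumptions for illustration, not documented SDK behaviour.

```typescript
// Hypothetical config shape covering a subset of the documented variables.
interface CliConfig {
  apiKey?: string;
  apiType?: string;
  model?: string;
  baseURL?: string;
  selfImprovement: boolean;
  maxToolConcurrency?: number;
}

type Env = Record<string, string | undefined>;

// Assumed precedence: explicit flags win, environment variables fill the gaps.
function resolveConfig(env: Env, flags: Partial<CliConfig> = {}): CliConfig {
  const concurrency = Number(env.AGENT_SDK_MAX_TOOL_CONCURRENCY);
  return {
    apiKey: flags.apiKey ?? env.CLAVUE_AGENT_API_KEY,
    apiType: flags.apiType ?? env.CLAVUE_AGENT_API_TYPE,
    model: flags.model ?? env.CLAVUE_AGENT_MODEL,
    baseURL: flags.baseURL ?? env.CLAVUE_AGENT_BASE_URL,
    selfImprovement:
      flags.selfImprovement ?? (env.CLAVUE_AGENT_SELF_IMPROVEMENT === "true"),
    // Ignore missing or non-positive values instead of passing NaN along.
    maxToolConcurrency:
      Number.isInteger(concurrency) && concurrency > 0 ? concurrency : undefined,
  };
}

const resolved = resolveConfig(
  {
    CLAVUE_AGENT_API_KEY: "example-key",
    CLAVUE_AGENT_MODEL: "claude-sonnet-4-6",
    CLAVUE_AGENT_SELF_IMPROVEMENT: "true",
    AGENT_SDK_MAX_TOOL_CONCURRENCY: "4",
  },
  { model: "gpt-4o" }, // simulates passing --model on the command line
);
```

In a real host, `process.env` would be passed as the first argument; it is taken as a parameter here so the function stays easy to test.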

Best practices / 最佳使用实践

Pick the right integration mode / 选择合适的集成方式

  • Use npx clavue-agent-sdk ... for quick terminal automation, CI checks, and one-off repository analysis.

  • Use run() for backend jobs where you want one prompt in, one typed AgentRunResult out.

  • Use query() for streaming UIs, logs, dashboards, and integrations that need live assistant/tool events.

  • Use createAgent() for long-lived apps that need multi-turn state, sessions, hooks, MCP servers, custom subagents, or repeated prompts.

  • 快速终端自动化、CI 检查、一次性仓库分析:使用 npx clavue-agent-sdk ...

  • 后端任务只需要“一次输入、一次结构化结果”:使用 run()

  • 前端 UI、日志面板、实时事件流:使用 query()

  • 长生命周期应用、多轮会话、hooks、MCP、自定义 subagent 或重复调用:使用 createAgent()

Start narrow, then expand tools / 先收窄权限,再逐步扩展工具

Prefer the smallest tool surface that can complete the task. Start with read-only tools for review and analysis, then add write or shell tools only when the workflow needs them.

优先使用能完成任务的最小工具权限。审查和分析先从只读工具开始,只有在确实需要修改文件或执行命令时再增加写入或 shell 工具。

# Read-only repository review / 只读仓库审查
npx clavue-agent-sdk "Review this repo for release risks" \
  --toolset repo-readonly \
  --max-turns 6

# Focused code change with explicit tools / 明确授权工具的定向修改
npx clavue-agent-sdk "Fix the failing package payload test" \
  --allow Read,Glob,Grep,Edit,Bash \
  --permission-mode trustedAutomation \
  --autonomy autonomous \
  --max-turns 10

# Safer low-confirmation local edits / 更安全的低确认本地编辑
npx clavue-agent-sdk "Update usage docs and run verification" \
  --toolset repo-edit \
  --permission-mode acceptEdits \
  --autonomy autonomous \
  --max-turns 8

# CI-friendly JSON output / 适合 CI 的 JSON 输出
npx clavue-agent-sdk "Check whether package.json is release-ready" \
  --toolset repo-readonly \
  --json
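
The interplay of --toolset, --allow, and --deny in the commands above can be modelled as set arithmetic. The sketch below is one plausible reading, assuming toolsets and --allow are unioned and --deny always wins. The toolset contents and the `resolveTools` helper are made up for this example and may not match the SDK's named toolsets.

```typescript
// Hypothetical toolset contents; the SDK's real definitions may differ.
const TOOLSETS: Record<string, string[]> = {
  "repo-readonly": ["Read", "Glob", "Grep"],
  "repo-edit": ["Read", "Glob", "Grep", "Edit"],
};

interface ToolSelection {
  toolsets?: string[];
  allow?: string[];
  deny?: string[];
}

// Union the named toolsets and the allow list, then remove denied tools.
function resolveTools(selection: ToolSelection): string[] {
  const selected = new Set<string>();
  for (const name of selection.toolsets ?? []) {
    const tools = TOOLSETS[name];
    if (!tools) throw new Error(`Unknown toolset: ${name}`);
    tools.forEach((tool) => selected.add(tool));
  }
  for (const tool of selection.allow ?? []) selected.add(tool);
  for (const tool of selection.deny ?? []) selected.delete(tool);
  return Array.from(selected).sort();
}

const effective = resolveTools({
  toolsets: ["repo-readonly"],
  allow: ["Bash"],
  deny: ["Grep"],
});
```

Applying the deny list last means a tool can never be re-enabled by accident through a broader toolset.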

Set cwd, model, and budgets explicitly / 显式设置 cwd、模型和预算

For automation, set cwd, model, maxTurns, and tool permissions explicitly so runs are reproducible and bounded.

自动化场景建议显式设置 cwd、model、maxTurns 和工具权限,让运行结果更可复现、成本和轮次更可控。

import { run } from "clavue-agent-sdk";

const result = await run({
  prompt: "Review the package for publish-readiness and return concise findings.",
  options: {
    cwd: process.cwd(),
    model: "claude-sonnet-4-6",
    toolsets: ["repo-readonly"],
    maxTurns: 6,
  },
});

if (result.status !== "completed") {
  throw new Error(result.errors?.join("\n") || result.subtype);
}

console.log(result.text);

Use structured outputs in automation / 自动化中使用结构化结果

In CI or services, prefer run() or CLI --json instead of scraping assistant text from stdout. Check status, subtype, errors, usage, and total_cost_usd before deciding whether a job passed.

在 CI 或服务端集成里,优先使用 run() 或 CLI --json,不要依赖解析普通文本输出。根据 status、subtype、errors、usage 和 total_cost_usd 判断任务是否成功。
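
A CI step might turn those fields into a pass/fail decision along the following lines. This is a sketch under assumptions: `RunSummary` is a hand-written subset of the result fields named above rather than the SDK's exported type, `evaluateRun` and the cost budget are hypothetical, and the sample values are illustrative.

```typescript
// Hand-written subset of the result fields named in this section.
interface RunSummary {
  status: string;
  subtype: string;
  errors?: string[];
  usage: { input_tokens: number; output_tokens: number };
  total_cost_usd: number;
}

interface Verdict {
  ok: boolean;
  reasons: string[];
}

// Hypothetical CI gate: fail on a non-completed status, reported errors,
// or a run that cost more than the budget allows.
function evaluateRun(result: RunSummary, maxCostUsd: number): Verdict {
  const reasons: string[] = [];
  if (result.status !== "completed") {
    reasons.push(`status=${result.status} (${result.subtype})`);
  }
  if (result.errors && result.errors.length > 0) {
    reasons.push(...result.errors);
  }
  if (result.total_cost_usd > maxCostUsd) {
    reasons.push(`cost $${result.total_cost_usd.toFixed(2)} exceeds budget $${maxCostUsd.toFixed(2)}`);
  }
  return { ok: reasons.length === 0, reasons };
}

// Sample values are illustrative, not captured from a real run.
const withinBudget = evaluateRun(
  { status: "completed", subtype: "success", usage: { input_tokens: 1200, output_tokens: 300 }, total_cost_usd: 0.04 },
  0.5,
);
const overBudget = evaluateRun(
  { status: "completed", subtype: "success", usage: { input_tokens: 90000, output_tokens: 8000 }, total_cost_usd: 1.2 },
  0.5,
);
```

Collecting every reason, rather than stopping at the first, gives the CI log a complete picture of why a job was rejected.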

Enforce production controls / 启用生产控制能力

For production hosts, combine narrow toolsets, permissionMode, qualityGatePolicy, memory policy, doctor(), and runBenchmarks() instead of relying only on prompt instructions.

生产宿主应组合使用最小工具集、permissionMode、qualityGatePolicy、memory policy、doctor() 和 runBenchmarks(),不要只依赖 prompt 约束。

import { doctor, run, runBenchmarks } from "clavue-agent-sdk";

const health = await doctor({
  toolsets: ["repo-readonly"],
  memory: { enabled: true },
});
if (health.status === "error") throw new Error("SDK runtime is not ready");

const result = await run({
  prompt: "Review the current package and report release blockers.",
  options: {
    toolsets: ["repo-readonly"],
    permissionMode: "default",
    memory: { enabled: true, policy: { mode: "brainFirst" } },
    quality_gates: [{ name: "release-review", status: "passed" }],
    qualityGatePolicy: { required: ["release-review"] },
    maxTurns: 6,
  },
});

if (result.subtype === "error_quality_gate_failed") {
  throw new Error(result.errors?.join("\n") || "Required quality gate failed");
}

const benchmarks = await runBenchmarks({ iterations: 3 });
console.log(benchmarks.metrics);

Current memory trace records policy, query, repo path, selected memory IDs, selected memory score/reason metadata, source/scope/confidence, validation state, retrieval steps, injected count, and whether retrieval happened before the first model call.

当前 memory trace 会记录 policy、query、repo path、selected memory IDs、被选记忆的分数和原因、source/scope/confidence、validation state、retrieval steps、injected count,以及是否在首次模型调用前完成检索。

The current capability upgrade program is tracked in docs/agent-sdk-capability-upgrade-program.md. It expands the SDK beyond coding automation into collection, organization, planning, problem solving, memory intelligence, skill creation, self-learning, reusable agents, and workflow templates.

当前能力升级计划见 docs/agent-sdk-capability-upgrade-program.md。它会把 SDK 从代码自动化扩展到资料收集、整理、规划、问题解决、记忆智能、技能创建、自学习、可复用 agent 和工作流模板。

Keep prompts operational / 让 Prompt 面向执行

Good prompts specify the goal, boundaries, expected output format, and verification command. Avoid broad prompts that mix unrelated work.

好的 prompt 应包含目标、边界、期望输出格式和验证命令。避免把多个无关任务混在一个过大的 prompt 里。

Good: Review src/providers/openai.ts for cancellation bugs. Do not edit files. Return findings with file:line references.
Good: Update README quick-start examples only. Run npm run build after editing.
Avoid: Make the project better.

  • Store credentials in environment variables, not source code.

  • Pin CLAVUE_AGENT_MODEL or pass model in code for predictable behavior.

  • Use allowedTools or toolsets for every automated workflow.

  • Set maxTurns for bounded execution.

  • Log the final AgentRunResult metadata: status, subtype, num_turns, usage, duration_ms, and total_cost_usd.

  • Enable selfImprovement only for workflows where persisting run lessons is expected.

  • Close reusable agents with await agent.close() so sessions, MCP connections, and memory hooks flush cleanly.

  • 凭证放在环境变量中,不要写进源码。

  • 通过 CLAVUE_AGENT_MODEL 或代码里的 model 固定模型,保证行为可预测。

  • 每个自动化流程都设置 allowedTools 或 toolsets。

  • 设置 maxTurns,避免无界运行。

  • 记录 AgentRunResult 元数据:status、subtype、num_turns、usage、duration_ms 和 total_cost_usd。

  • 只有在确实希望持久化运行经验时才开启 selfImprovement

  • 可复用 agent 使用完后调用 await agent.close(),确保 session、MCP 连接和 memory hooks 正常收尾。

Common recipes / 常用方法

# Explain a repository / 解释仓库结构
npx clavue-agent-sdk "Explain this repository architecture" --toolset repo-readonly

# Review a pull-request checkout / 审查当前 PR 工作区
npx clavue-agent-sdk "Review the current diff for bugs and release risks" --toolset repo-readonly

# Generate a machine-readable report / 生成机器可读报告
npx clavue-agent-sdk "Return JSON listing package release blockers" --toolset repo-readonly --json

1. Install as a library / 作为库安装

npm install clavue-agent-sdk

2. Configure / 配置

Set the environment variables once, then start using the SDK immediately.

先设置环境变量,然后就可以直接开始调用 SDK。

export CLAVUE_AGENT_API_KEY=your-api-key
# Optional / 可选
# export CLAVUE_AGENT_MODEL=claude-sonnet-4-6

OpenAI-compatible setup / OpenAI 兼容模型配置

export CLAVUE_AGENT_API_TYPE=openai-completions
export CLAVUE_AGENT_API_KEY=sk-...
export CLAVUE_AGENT_BASE_URL=https://api.openai.com/v1
export CLAVUE_AGENT_MODEL=gpt-4o

Anthropic-compatible gateway setup / Anthropic 兼容网关配置

export CLAVUE_AGENT_BASE_URL=https://openrouter.ai/api
export CLAVUE_AGENT_API_KEY=sk-or-...
export CLAVUE_AGENT_MODEL=anthropic/claude-sonnet-4

3. Easiest integration for another program / 其他程序最简单集成方式

If another Node.js service just needs one clear call, use run(). It creates an agent, executes the prompt, closes the agent, and returns a complete typed artifact.

如果其他 Node.js 服务只想用最简单的一次调用,使用 run()。它会创建 agent、执行 prompt、关闭 agent,并返回完整的类型化结果。

import { run } from "clavue-agent-sdk";

const result = await run({
  prompt: "Read package.json and return the name and version as JSON.",
  options: {
    cwd: process.cwd(),
    allowedTools: ["Read"],
    maxTurns: 3,
  },
});

if (result.status !== "completed") {
  throw new Error(result.errors?.join("\n") || result.subtype);
}

console.log(result.text);

run() returns AgentRunResult: status, subtype, final text, events, messages, usage, num_turns, duration_ms, duration_api_ms, total_cost_usd, timestamps, optional errors, and optional self_improvement artifacts when enabled.

run() 返回 AgentRunResult:包含 status、subtype、最终 text、events、messages、usage、num_turns、耗时、费用、时间戳、可选 errors,以及启用时返回的可选 self_improvement 结果。

4. Streaming events / 流式事件

Use query() when your program wants live events: assistant text, tool calls, tool results, and the final result.

当你的程序需要实时事件流时使用 query():包括 assistant 文本、工具调用、工具结果和最终结果。

import { query } from "clavue-agent-sdk";

for await (const message of query({
  prompt: "Read package.json and tell me the project name.",
  options: {
    allowedTools: ["Read", "Glob"],
  },
})) {
  if (message.type === "assistant") {
    for (const block of message.message.content) {
      if ("text" in block) console.log(block.text);
    }
  }

  if (message.type === "result") {
    console.log(`Done in ${message.num_turns} turns`);
  }
}

Partial-message streaming (live text deltas) / 字符级流式(TTFT 体感)

Pass includePartialMessages: true to receive partial_message events with each text delta as the model emits it — useful for character-by-character UI rendering. The aggregated assistant event still arrives after all partials. Defaults to false so existing callers are unaffected.

includePartialMessages: true 即可拿到模型逐 token 返回的 partial_message 事件,适合做字符级 UI 渲染。最终聚合的 assistant 事件仍然在所有 partial 之后到达。默认关闭,不影响现有调用方。

import { createAgent } from "clavue-agent-sdk";

const agent = createAgent({
  model: "claude-sonnet-4-6",
  includePartialMessages: true,
});

for await (const event of agent.query("Count from 1 to 5.")) {
  if (event.type === "partial_message" && event.partial?.type === "text") {
    process.stdout.write(event.partial.text); // live deltas
  }
  if (event.type === "assistant") {
    process.stdout.write("\n"); // aggregated message arrives last
  }
}

See examples/18-streaming.ts for a runnable demo.

5. Reusable agent / 可复用 Agent

Use createAgent() when your application needs multi-turn state, session persistence, MCP connections, hooks, or repeated calls.

当你的应用需要多轮上下文、会话持久化、MCP 连接、hooks 或重复调用时,使用 createAgent()

import { createAgent } from "clavue-agent-sdk";

const agent = createAgent({ model: "claude-sonnet-4-6" });
try {
  const result = await agent.prompt("What files are in this project?");

  console.log(result.text);
  console.log(
    `Turns: ${result.num_turns}, Tokens: ${result.usage.input_tokens + result.usage.output_tokens}`,
  );
} finally {
  await agent.close();
}

6. OpenAI / GPT models

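import { createAgent } from "clavue-agent-sdk";

const agent = createAgent({
  apiType: "openai-completions",
  model: "gpt-4o",
  // Read the key from the environment instead of hard-coding it in source.
  apiKey: process.env.CLAVUE_AGENT_API_KEY,
  baseURL: "https://api.openai.com/v1",
});

try {
  const result = await agent.prompt("What files are in this project?");
  console.log(result.text);
} finally {
  // Close the agent so sessions and connections flush cleanly.
  await agent.close();
}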

The apiType is auto-detected from model name — models containing gpt-, o1, o3, o4, deepseek, qwen, glm, grok, kimi, moonshot, gemini, mistral, gemma, yi-, etc. automatically use openai-completions.

apiType 也可以根据模型名自动推断:包含 gpt-、o1、o3、o4、deepseek、qwen、glm、grok、kimi、moonshot、gemini、mistral、gemma、yi- 等关键字时,会自动选择 openai-completions。
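
The detection rule described above amounts to substring matching on the model name. A minimal sketch, assuming the markers listed in this section (the README's "etc." means the real list is longer) and using "anthropic-messages" as a hypothetical label for the non-OpenAI case:

```typescript
// Markers copied from the list above; the SDK's real list may contain more.
const OPENAI_STYLE_MARKERS = [
  "gpt-", "o1", "o3", "o4", "deepseek", "qwen", "glm", "grok",
  "kimi", "moonshot", "gemini", "mistral", "gemma", "yi-",
];

// "anthropic-messages" is a placeholder name for the default API type.
type ApiType = "openai-completions" | "anthropic-messages";

// An explicit apiType always wins; otherwise fall back to name matching.
function detectApiType(model: string, explicit?: ApiType): ApiType {
  if (explicit) return explicit;
  const name = model.toLowerCase();
  return OPENAI_STYLE_MARKERS.some((marker) => name.includes(marker))
    ? "openai-completions"
    : "anthropic-messages";
}

const detected = {
  gpt: detectApiType("gpt-4o"),
  deepseek: detectApiType("deepseek-chat"),
  claude: detectApiType("claude-sonnet-4-6"),
  gateway: detectApiType("anthropic/claude-sonnet-4"),
  forced: detectApiType("gpt-4o", "anthropic-messages"),
};
```

Because short markers such as o1 or o3 can appear inside unrelated model names, passing apiType explicitly is the safer choice whenever a name is ambiguous.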

7. Web demo / Web 演示

npm run web
# Open http://localhost:8081

Use this when you want a fast local sandbox for prompt-tool behavior and event streaming.

如果你想快速验证 prompt、tool 调用和事件流,这个本地 Web 演示是最快的入口。

More examples / 更多示例

Multi-turn conversation

import { createAgent } from "clavue-agent-sdk";

const agent = createAgent({ maxTurns: 5 });

const r1 = await agent.prompt(
  'Create a file /tmp/hello.txt with "Hello World"',
);
console.log(r1.text);

const r2 = await agent.prompt("Read back the file you just created");
console.log(r2.text);

console.log(`Session messages: ${agent.getMessages().length}`);

Custom tools (Zod schema)

import { z } from "zod";
import { query, tool, createSdkMcpServer } from "clavue-agent-sdk";

const getWeather = tool(
  "get_weather",
  "Get the temperature for a city",
  { city: z.string().describe("City name") },
  async ({ city }) => ({
    content: [{ type: "text", text: `${city}: 22°C, sunny` }],
  }),
);

const server = createSdkMcpServer({ name: "weather", tools: [getWeather] });

for await (const msg of query({
  prompt: "What is the weather in Tokyo?",
  options: { mcpServers: { weather: server } },
})) {
  if (msg.type === "result")
    console.log(`Done: $${msg.total_cost_usd?.toFixed(4)}`);
}

Custom tools (low-level)

import {
  createAgent,
  getAllBaseTools,
  defineTool,
} from "clavue-agent-sdk";

const calculator = defineTool({
  name: "Calculator",
  description: "Evaluate a math expression",
  inputSchema: {
    type: "object",
    properties: { expression: { type: "string" } },
    required: ["expression"],
  },
  isReadOnly: true,
  async call(input) {
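    // Demo only: Function() evaluates arbitrary JavaScript. Do not use it on
    // untrusted expressions; substitute a real math parser in production.
    const result = Function(`'use strict'; return (${input.expression})`)();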
    return `${input.expression} = ${result}`;
  },
});

const agent = createAgent({ tools: [...getAllBaseTools(), calculator] });
const r = await agent.prompt("Calculate 2**10 * 3");
console.log(r.text);

Skills

Skills are reusable executable workflows that extend agent capabilities. Bundled skills include coding/review helpers such as simplify, commit, review, debug, and test, plus lifecycle workflows such as define, plan, build, verify, workflow-review, ship, and repair.

import {
  createAgent,
  registerSkill,
  getAllSkills,
} from "clavue-agent-sdk";

// Register a custom skill
registerSkill({
  name: "explain",
  description: "Explain a concept in simple terms",
  userInvocable: true,
  async getPrompt(args) {
    return [
      {
        type: "text",
        text: `Explain in simple terms: ${args || "Ask what to explain."}`,
      },
    ];
  },
});

console.log(`${getAllSkills().length} skills registered`);

// The model can invoke skills via the Skill tool
const agent = createAgent();
const result = await agent.prompt('Use the "explain" skill to explain git rebase');
console.log(result.text);

Skills can also run in a forked subagent context by setting context: "fork". Forked skills create durable background AgentJobs, inherit the parent provider and permission policy, apply skill-level model and allowedTools, and preserve the subagent trace, evidence, and quality_gates on the final job record.

import {
  SkillTool,
  getAgentJob,
  registerAgents,
  registerSkill,
} from "clavue-agent-sdk";

registerAgents({
  reviewer: {
    description: "Specialized review agent",
    prompt: "Review carefully and produce concise findings.",
    tools: ["Read", "Glob", "Grep"],
  },
}, { runtimeNamespace: "docs-forked-skill" });

registerSkill({
  name: "deep-review",
  description: "Run a durable background code review",
  context: "fork",
  agent: "reviewer",
  allowedTools: ["Read", "Glob", "Grep"],
  model: "gpt-5.4",
  userInvocable: true,
  async getPrompt(args) {
    return [{ type: "text", text: `Review this target: ${args}` }];
  },
}, { runtimeNamespace: "docs-forked-skill" });

const result = await SkillTool.call(
  { skill: "deep-review", args: "src/agent.ts" },
  {
    cwd: process.cwd(),
    runtimeNamespace: "docs-forked-skill",
    model: "gpt-5.4",
    provider, // the host's provider instance, constructed elsewhere (not shown here)
  },
);

const { job_id } = JSON.parse(String(result.content));
const job = await getAgentJob(job_id, { runtimeNamespace: "docs-forked-skill" });
console.log(job?.status, job?.trace, job?.evidence, job?.quality_gates);

Self-improvement memory

Enable selfImprovement when you want each structured run to capture reusable operational lessons for future runs. It is opt-in and stores bounded improvement memories after Agent.run() / top-level run() completes.

import { createAgent, queryMemories } from "clavue-agent-sdk";

const agent = createAgent({
  cwd: process.cwd(),
  memory: {
    enabled: true,
    autoInject: true,
    repoPath: process.cwd(),
  },
  selfImprovement: {
    memory: {
      repoPath: process.cwd(),
      maxEntriesPerRun: 4,
    },
  },
});

try {
  const run = await agent.run("Verify the package release is ready.");
  console.log(run.self_improvement?.savedMemories.length ?? 0);

  const lessons = await queryMemories({
    repoPath: process.cwd(),
    type: "improvement",
    text: "package release verification",
    limit: 5,
  });
  console.log(lessons.map((lesson) => lesson.title));
} finally {
  await agent.close();
}

By default this captures failed tool-result signals and terminal run failures. Successful run patterns are only saved when selfImprovement.memory.captureSuccessfulRuns is explicitly enabled. Captured text is trimmed, common API keys and bearer tokens are redacted, and future runs must still verify current repo state before applying a remembered lesson.

默认只捕获工具失败信号和 run 终态失败;只有显式设置 captureSuccessfulRuns 时才会记录成功模式。记录内容会裁剪并脱敏常见 API key / bearer token,未来 run 使用这些经验前仍需要验证当前仓库状态。
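
To give a feel for the trimming and redaction step, here is a standalone sketch. The regular expressions, the 200-character limit, and the `sanitizeLesson` name are assumptions chosen for this example; the SDK's actual patterns and limits are not documented in this README.

```typescript
const MAX_LESSON_CHARS = 200; // hypothetical limit

// Illustrative patterns only; a real redactor would cover more key formats.
const SECRET_PATTERNS: RegExp[] = [
  /\bsk-[A-Za-z0-9_-]{8,}/g, // keys with an "sk-" prefix
  /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/gi, // HTTP bearer tokens
];

// Redact likely secrets first, then trim and cap the length.
function sanitizeLesson(text: string): string {
  let out = text;
  for (const pattern of SECRET_PATTERNS) {
    out = out.replace(pattern, "[REDACTED]");
  }
  out = out.trim();
  return out.length > MAX_LESSON_CHARS
    ? `${out.slice(0, MAX_LESSON_CHARS)}...`
    : out;
}

const lesson = sanitizeLesson(
  "  Request failed: Authorization: Bearer abc.def-123 with key sk-live_1234567890  ",
);
const longLesson = sanitizeLesson("x".repeat(500));
```

Redacting before truncating matters: cutting first could split a key in half and leave a fragment that no pattern recognises.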

You can combine run learning with the deterministic retro/eval cycle, and optionally allow a bounded retry loop guarded by verification gates:

const run = await agent.run("Improve this SDK safely.", {
  selfImprovement: {
    memory: { repoPath: process.cwd() },
    retro: {
      enabled: true,
      targetName: "clavue-agent-sdk",
      gates: [
        { name: "build", command: "npm", args: ["run", "build"] },
        { name: "test", command: "npm", args: ["test"] },
      ],
      loop: {
        enabled: true,
        maxAttempts: 3,
        retryPrompt: "Fix the highest-priority verified issue, then stop.",
      },
    },
  },
});

console.log(run.self_improvement?.retroLoop?.summary.completedAttempts);
console.log(run.self_improvement?.retroCycle?.summary.statusLine);

Nested retry runs automatically disable nested selfImprovement capture to keep the loop bounded. retroCycle always points at the final cycle for compatibility; retroLoop contains every cycle and retry lineage when loop mode is enabled.
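
The bounded retry loop can be reduced to a few lines of control flow. The sketch below is a synchronous model with in-memory gates, written to show why maxAttempts keeps the loop finite. `runBoundedLoop`, `Gate`, and `LoopOutcome` are hypothetical names; the real loop runs shell commands such as npm run build and re-prompts the agent between attempts.

```typescript
interface GateResult {
  name: string;
  passed: boolean;
}

interface Gate {
  name: string;
  run: () => boolean; // stands in for running a verification command
}

interface LoopOutcome {
  disposition: "accepted" | "rejected";
  attempts: number;
  history: GateResult[][]; // one entry per attempt
}

// Verify, and if any gate fails, apply a fix and verify again,
// never exceeding maxAttempts verification passes.
function runBoundedLoop(
  gates: Gate[],
  fix: (failing: string[]) => void,
  maxAttempts: number,
): LoopOutcome {
  const history: GateResult[][] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const results = gates.map((gate) => ({ name: gate.name, passed: gate.run() }));
    history.push(results);
    const failing = results.filter((r) => !r.passed).map((r) => r.name);
    if (failing.length === 0) {
      return { disposition: "accepted", attempts: attempt, history };
    }
    if (attempt < maxAttempts) fix(failing);
  }
  return { disposition: "rejected", attempts: history.length, history };
}

// Simulated project whose test gate fails until one fix is applied.
let testsBroken = true;
const outcome = runBoundedLoop(
  [
    { name: "build", run: () => true },
    { name: "test", run: () => !testsBroken },
  ],
  () => {
    testsBroken = false;
  },
  3,
);
```

Keeping the full `history` mirrors the idea that the loop record holds every cycle, while callers that only need the final state can read the last entry.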

Exported helpers: extractRunImprovementCandidates(run, config, options) for dry-run extraction and runSelfImprovement(run, config, options) for direct persistence/retro orchestration.

Retro / eval core

Run a deterministic engine-level evaluation loop and get structured findings, scores, and upgrade workstreams. createDefaultRetroEvaluators() inspects package/import/build/test/onboarding readiness across the four core dimensions:

import {
  createDefaultRetroEvaluators,
  runRetroEvaluation,
} from "clavue-agent-sdk";

const evaluators = createDefaultRetroEvaluators();

const result = await runRetroEvaluation({
  target: { name: "my-project", cwd: process.cwd() },
  evaluators,
});

console.log(result.scores.overall.score);
console.log(result.proposed_workstreams);

Run the full retro cycle in one call:

import {
  createDefaultRetroEvaluators,
  runRetroCycle,
} from "clavue-agent-sdk";

const cycle = await runRetroCycle({
  target: { name: "my-project", cwd: process.cwd() },
  evaluators: createDefaultRetroEvaluators(),
  gates: [
    { name: "build", command: "npm", args: ["run", "build"] },
    { name: "test", command: "npm", args: ["test"] },
  ],
  runId: "run-current",
  previousRunId: "run-previous",
  policy: { maxAttempts: 3 },
});

console.log(cycle.run.summary);
console.log(cycle.verification?.summary);
console.log(cycle.action.kind);
console.log(cycle.decision.disposition); // accepted | rejected | retry
console.log(cycle.summary.statusLine);
console.log(cycle.summary.text);

Or use the built-in defaults with just a target:

import { runRetroCycle } from "clavue-agent-sdk";

const cycle = await runRetroCycle({
  target: { name: "my-project", cwd: process.cwd() },
});

console.log(cycle.verification?.gates.map((gate) => gate.name)); // ["build", "test"]

Persist a run for later comparison:

import {
  compareRetroRuns,
  loadRetroCycle,
  loadRetroRun,
  saveRetroCycle,
  saveRetroRun,
} from "clavue-agent-sdk";

await saveRetroRun("run-2026-04-14", result);
await saveRetroCycle("cycle-2026-04-14", cycle);
const previous = await loadRetroRun("run-2026-04-13");
const previousCycle = await loadRetroCycle("cycle-2026-04-13");

if (previous) {
  const drift = compareRetroRuns(previous, result);
  console.log(drift.scoreDeltas.overall.delta);
  console.log(drift.newFindings);
}

console.log(previousCycle?.decision.disposition);

Run fixed quality gates before or after a retro pass:

import { runRetroVerification } from "clavue-agent-sdk";

const verification = await runRetroVerification({
  target: { name: "my-project", cwd: process.cwd() },
  gates: [
    { name: "build", command: "npm", args: ["run", "build"] },
    { name: "test", command: "npm", args: ["test"] },
  ],
});

console.log(verification.passed);
console.log(verification.gates);

Decide the next machine action from retro state:

import {
  compareRetroRuns,
  createDefaultRetroEvaluators,
  decideRetroAction,
  loadRetroRun,
  runRetroEvaluation,
  runRetroVerification,
  saveRetroRun,
} from "clavue-agent-sdk";

const verification = await runRetroVerification({
  target: { name: "my-project", cwd: process.cwd() },
});

const current = await runRetroEvaluation({
  target: { name: "my-project", cwd: process.cwd() },
  evaluators: createDefaultRetroEvaluators(),
});

const previous = await loadRetroRun("run-previous");
const comparison = previous ? compareRetroRuns(previous, current) : undefined;
const action = decideRetroAction({
  run: current,
  verification,
  previousRun: previous ?? undefined,
  comparison,
  attemptCount: 0,
  policy: { maxAttempts: 3 },
});

await saveRetroRun("run-current", current);
console.log(verification.summary);
console.log(action.kind);

Hooks (lifecycle events)

import { createAgent, createHookRegistry } from "clavue-agent-sdk";

const hooks = createHookRegistry({
  PreToolUse: [
    {
      handler: async (input) => {
        console.log(`About to use: ${input.toolName}`);
        // Return { block: true } to prevent tool execution
      },
    },
  ],
  PostToolUse: [
    {
      handler: async (input) => {
        console.log(`Tool ${input.toolName} completed`);
      },
    },
  ],
});

20 lifecycle events: PreToolUse, PostToolUse, PostToolUseFailure, SessionStart, SessionEnd, Stop, SubagentStart, SubagentStop, UserPromptSubmit, PermissionRequest, PermissionDenied, TaskCreated, TaskCompleted, ConfigChange, CwdChanged, FileChanged, Notification, PreCompact, PostCompact, TeammateIdle.
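
The blocking behavior mentioned in the PreToolUse comment can be pictured with a small standalone sketch. It does not import the SDK: HookInput, PreToolUseHandler, and isBlockedByPreToolUse are hypothetical names, and the assumption that handlers run in order and the first { block: true } wins is illustrative, not documented behavior.

```typescript
// Hypothetical shapes for illustration; the SDK's real hook types may differ.
type HookInput = { toolName: string; input: unknown };
type HookOutput = { block?: boolean } | void;
type PreToolUseHandler = (input: HookInput) => HookOutput | Promise<HookOutput>;

// Runs handlers in registration order and reports whether any of them
// asked to block the tool call (assumed semantics: first block wins).
async function isBlockedByPreToolUse(
  handlers: PreToolUseHandler[],
  input: HookInput,
): Promise<boolean> {
  for (const handler of handlers) {
    const out = await handler(input);
    if (typeof out === "object" && out !== null && out.block === true) {
      return true;
    }
  }
  return false;
}
```

A handler that only logs returns nothing and never blocks; a handler that returns { block: true } for a given tool name stops that call before the tool runs.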

MCP server integration

import { createAgent } from "clavue-agent-sdk";

const agent = createAgent({
  mcpServers: {
    filesystem: {
      command: "npx",
      args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    },
  },
});

const result = await agent.prompt("List files in /tmp");
console.log(result.text);
await agent.close();

Subagents

import { query } from "clavue-agent-sdk";

for await (const msg of query({
  prompt: "Use the code-reviewer agent to review src/index.ts",
  options: {
    agents: {
      "code-reviewer": {
        description: "Expert code reviewer",
        prompt: "Analyze code quality. Focus on security and performance.",
        tools: ["Read", "Glob", "Grep"],
      },
    },
  },
})) {
  if (msg.type === "result") console.log("Done");
}

Durable background AgentJobs

Use AgentTool with run_in_background: true when a subagent should continue without blocking the parent turn. The tool returns a durable job envelope immediately:

{
  "success": true,
  "type": "clavue.agent.job",
  "version": 1,
  "job_id": "agent_job_...",
  "status": "queued"
}

The job is persisted under the current runtime namespace. It stores the final output, trace, evidence, quality gates, errors, and heartbeat status, and it can be inspected or cancelled through tools or SDK APIs.

import {
  AgentTool,
  AgentJobListTool,
  AgentJobGetTool,
  AgentJobStopTool,
  getAgentJob,
  listAgentJobs,
} from "clavue-agent-sdk";

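// `provider` is assumed to be an LLM provider instance created elsewhere,
// for example with createProvider(apiType, opts).
const context = {
  cwd: process.cwd(),
  runtimeNamespace: "docs-background-demo",
  model: "gpt-5.4",
  provider,
};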

const started = await AgentTool.call({
  prompt: "Review src/ for security risks.",
  description: "security review",
  subagent_type: "Explore",
  run_in_background: true,
}, context);

const { job_id } = JSON.parse(String(started.content));
console.log(await listAgentJobs({ runtimeNamespace: context.runtimeNamespace }));
console.log(await getAgentJob(job_id, { runtimeNamespace: context.runtimeNamespace }));

await AgentJobListTool.call({}, context);
await AgentJobGetTool.call({ id: job_id }, context);
await AgentJobStopTool.call({ id: job_id, reason: "no longer needed" }, context);

Exported helpers include createAgentJob(), getAgentJob(), listAgentJobs(), stopAgentJob(), clearAgentJobs(), and the public types AgentJobRecord, AgentJobStatus, AgentJobKind, AgentJobCompletion, AgentJobStoreOptions, and CreateAgentJobInput.

AgentJob storage defaults to ~/.clavue-agent-sdk/agent-jobs; set CLAVUE_AGENT_JOBS_DIR or pass AgentJobStoreOptions.dir to isolate stores in tests or multi-tenant hosts.
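
The example above parses the envelope with a bare JSON.parse. A host that wants to fail fast on unexpected tool output could validate the fields first. The sketch below is standalone and checks only the fields shown in the envelope example; parseAgentJobEnvelope and the AgentJobEnvelope interface are hypothetical names, not SDK exports.

```typescript
// Mirrors the envelope fields shown in this README; not an SDK type.
interface AgentJobEnvelope {
  success: boolean;
  type: "clavue.agent.job";
  version: number;
  job_id: string;
  status: string;
}

// Parses the tool's text content and rejects anything that is not a job envelope.
function parseAgentJobEnvelope(content: string): AgentJobEnvelope {
  const value: unknown = JSON.parse(content);
  if (typeof value !== "object" || value === null) {
    throw new Error("AgentJob envelope must be a JSON object");
  }
  const record = value as Record<string, unknown>;
  if (record["type"] !== "clavue.agent.job") {
    throw new Error("Unexpected envelope type");
  }
  const version = record["version"];
  const jobId = record["job_id"];
  const status = record["status"];
  if (typeof version !== "number") throw new Error("Missing envelope version");
  if (typeof jobId !== "string" || jobId.length === 0) throw new Error("Missing job_id");
  if (typeof status !== "string") throw new Error("Missing status");
  return {
    success: record["success"] === true,
    type: "clavue.agent.job",
    version,
    job_id: jobId,
    status,
  };
}
```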

Permissions and tool execution safety

import { query } from "clavue-agent-sdk";

// Trusted automation is the default; restrict tools for a read-only agent.
for await (const msg of query({
  prompt: "Review the code in src/ for best practices.",
  options: {
    toolsets: ["repo-readonly"],
    disallowedTools: ["WebSearch"],
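    canUseTool: async (tool, input) => {
      // Read is allowed with its input unchanged.
      if (tool.name === "Read") return { behavior: "allow" };
      // Every other tool is also allowed here; updatedInput is where a host
      // could pass a rewritten input instead of the original.
      return { behavior: "allow", updatedInput: input };
    },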
  },
})) {
  // ...
}

Tool access is controlled in layers: toolsets and allowedTools choose the available tool names, disallowedTools removes names last, canUseTool can deny or rewrite a specific tool input, and hooks can block lifecycle events. Subagents inherit the parent permission policy.

工具访问按层控制:toolsets 和 allowedTools 选择可用工具名,disallowedTools 最后移除工具名,canUseTool 可以拒绝或改写单次工具输入,hooks 可以拦截生命周期事件。Subagent 会继承父 agent 的权限策略。

permissionMode also has built-in semantics. The mode named default allows read-only tools only; note that the permissionMode option itself defaults to trustedAutomation, not to this mode. plan freezes mutating tools while still allowing planning and read tools. acceptEdits allows local file edits but blocks shell, network, external-state, destructive, and approval-required tools. trustedAutomation and bypassPermissions are high-trust modes; even with them, use allowedTools, disallowedTools, and canUseTool to enforce least privilege.

permissionMode 也有内置语义。default 只允许只读工具。plan 会冻结修改型工具,同时允许规划和读取工具。acceptEdits 允许本地文件编辑,但会阻止 shell、网络、外部状态、破坏性或需要审批的工具。trustedAutomation 和 bypassPermissions 是高信任模式;生产环境仍建议配合 allowedTools、disallowedTools 和 canUseTool 做最小权限控制。
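
The mode semantics described above can be read as a lookup from mode to permitted tool categories. The following standalone sketch encodes that reading. The ToolKind categories and the modeAllows helper are hypothetical, the assumption that acceptEdits also keeps planning tools is a guess, and the auto and dontAsk modes are left out because this README does not describe them.

```typescript
type PermissionMode =
  | "default"
  | "plan"
  | "acceptEdits"
  | "trustedAutomation"
  | "bypassPermissions";

// Hypothetical tool categories; the SDK classifies tools internally.
type ToolKind =
  | "read"
  | "planning"
  | "edit"
  | "shell"
  | "network"
  | "externalState"
  | "destructive";

// "all" marks the high-trust modes, which do not filter by category.
const ALLOWED_KINDS: Record<PermissionMode, ReadonlySet<ToolKind> | "all"> = {
  default: new Set<ToolKind>(["read"]),
  plan: new Set<ToolKind>(["read", "planning"]),
  acceptEdits: new Set<ToolKind>(["read", "planning", "edit"]),
  trustedAutomation: "all",
  bypassPermissions: "all",
};

function modeAllows(mode: PermissionMode, kind: ToolKind): boolean {
  const allowed = ALLOWED_KINDS[mode];
  return allowed === "all" || allowed.has(kind);
}
```

Tool filters, canUseTool, and hooks still apply on top of whatever the mode permits, so a high-trust mode is a ceiling rather than a grant.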

Low-confirmation development mode

Use autonomyMode: "autonomous" when the user has already authorized a development task and wants the agent to inspect, edit, verify, and repair without routine confirmation prompts. This changes initiative and question-asking behavior only; it does not bypass permissionMode, tool filters, hooks, or host canUseTool.

import { run } from "clavue-agent-sdk";

const result = await run({
  prompt: "Resolve the P0-P3 todo list, fix failures, and run verification.",
  options: {
    cwd: process.cwd(),
    model: "gpt-5.5",
    toolsets: ["repo-edit"],
    allowedTools: ["Bash"],
    permissionMode: "trustedAutomation",
    autonomyMode: "autonomous",
    maxTurns: 16,
  },
});

console.log(result.trace?.policy_decisions);

CLI equivalent:

CLAVUE_AGENT_AUTONOMY=autonomous \
CLAVUE_AGENT_PERMISSION_MODE=trustedAutomation \
npx clavue-agent-sdk "Fix the P0-P3 todo list and verify" \
  --toolset repo-edit \
  --allow Bash \
  --json

For safer local-edit-only automation, combine autonomyMode: "autonomous" with permissionMode: "acceptEdits" and omit shell/network tools. Run traces include policy_decisions for both allows and denials, with a safe input summary instead of raw tool input, plus the backward-compatible permission_denials list.

Local issue workflows

Use the issue workflow when you want a bounded builder, reviewer, fixer, and verifier loop around a concrete bug report or todo item. issue run creates the workflow record and background jobs without executing the full loop. issue execute runs the local workflow loop immediately.

# Create a workflow from inline text / 从内联文本创建 workflow
npx clavue-agent-sdk issue run "Fix provider retry handling for 429 responses" \
  --passing-score 85 \
  --require-gate tests \
  --json

# Execute from a local markdown issue / 从本地 markdown issue 执行
npx clavue-agent-sdk issue execute .clavue/issues/p0-provider-retry.md \
  --max-iterations 3 \
  --passing-score 90 \
  --require-gate build,tests \
  --json

# Inspect and stop workflow runs / 查看和停止 workflow run
npx clavue-agent-sdk issue list --json
npx clavue-agent-sdk issue get issue_run_... --json
npx clavue-agent-sdk issue stop issue_run_... --json

Programmatic usage:

import { normalizeIssueInput, runIssueWorkflow } from "clavue-agent-sdk";

const workflow = await runIssueWorkflow({
  cwd: process.cwd(),
  issue: normalizeIssueInput("Fix flaky package payload verification."),
  maxIterations: 3,
  passingScore: 90,
  requiredGates: ["build", "tests"],
});

console.log(workflow.status, workflow.finalScore);
console.log(workflow.proof_of_work.status, workflow.proof_of_work.verification);

Issue workflow records are stored under ~/.clavue-agent-sdk/issue-runs by default. Use the SDK store options to isolate runs for tests, CI, or multi-tenant hosts. runIssueWorkflow() returns proof_of_work, so hosts get a standard handoff artifact without the SDK owning GitHub, PR, CI, Linear, or Jira integrations.
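
To make the bounded loop concrete, here is a standalone sketch of the stopping rule implied by maxIterations, passingScore, and requiredGates: stop as soon as one iteration meets the score and passes every required gate, otherwise give up after the iteration budget. The names runBoundedLoop, IterationOutcome, and the passed/failed status values are hypothetical; the real workflow also creates role-based AgentJobs and a proof_of_work artifact, which this sketch omits.

```typescript
interface IterationOutcome {
  score: number;
  gates: Record<string, "passed" | "failed">;
}

interface LoopResult {
  status: "passed" | "failed";
  iterations: number;
  finalScore: number;
}

// `attempt` stands in for one build/review/fix/verify pass.
async function runBoundedLoop(
  attempt: (iteration: number) => Promise<IterationOutcome>,
  opts: { maxIterations: number; passingScore: number; requiredGates: string[] },
): Promise<LoopResult> {
  let finalScore = 0;
  for (let iteration = 1; iteration <= opts.maxIterations; iteration++) {
    const outcome = await attempt(iteration);
    finalScore = outcome.score;
    const gatesOk = opts.requiredGates.every(
      (gate) => outcome.gates[gate] === "passed",
    );
    if (gatesOk && outcome.score >= opts.passingScore) {
      return { status: "passed", iterations: iteration, finalScore };
    }
  }
  return { status: "failed", iterations: opts.maxIterations, finalScore };
}
```

A missing gate counts as not passed in this sketch, so a required gate that never ran keeps the loop from reporting success.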

Workflow contracts, proof of work, and orchestration policy

For host applications that want Symphony-style discipline without coupling the SDK to a tracker or daemon, use the SDK-core workflow primitives:

import {
  createProofOfWork,
  loadWorkflowDefinition,
  renderWorkflowPrompt,
  resolveWorkflowServiceConfig,
  selectDispatchCandidates,
  validateWorkflowDispatchConfig,
} from "clavue-agent-sdk";

const definition = await loadWorkflowDefinition({ cwd: repoPath });
const config = resolveWorkflowServiceConfig(definition);
const configIssues = validateWorkflowDispatchConfig(config, { requireTracker: false });
if (configIssues.length > 0) throw new Error(configIssues[0]!.message);

const selection = selectDispatchCandidates({
  config,
  issues: [{
    id: "issue-42",
    identifier: "SDK-42",
    title: "Fix autonomous workflow handoff",
    state: "Todo",
    priority: 1,
  }],
});

const prompt = renderWorkflowPrompt(definition, {
  issue: {
    identifier: selection.selected[0]?.identifier,
    title: selection.selected[0]?.title,
    description: "Produce a tested SDK-core implementation and proof of work.",
  },
});

const proof = createProofOfWork({
  target: { kind: "issue", id: "SDK-42", title: "Fix autonomous workflow handoff" },
  evidence: [{ type: "test", summary: "Focused verification passed", source: "external" }],
  quality_gates: [{ name: "tests", status: "passed" }],
  required_gates: ["tests"],
  references: [{ type: "issue", label: "Host issue", url: "https://tracker.example/SDK-42" }],
});

console.log(prompt);
console.log(proof.status, proof.handoff);

The SDK standardizes the contract, proof, and policy layers. Your host application still owns task polling, external tracker updates, PR creation, CI execution, dashboards, and worker lifecycle.

Runtime profiles

Runtime profiles turn a high-level workflow mode into concrete toolsets, permission mode, memory policy, autonomy mode, prompt guidance, and quality-gate behavior. This is the recommended path for hosts that want consistent behavior across collect, organize, plan, solve, build, verify, review, and ship flows.

import { getAllRuntimeProfiles, run } from "clavue-agent-sdk";

console.log(getAllRuntimeProfiles().map((profile) => profile.name));

const result = await run({
  prompt: "Verify this package is ready to publish.",
  options: {
    workflowMode: "verify",
    cwd: process.cwd(),
    maxTurns: 6,
  },
});

console.log(result.status, result.trace?.policy_decisions);

The engine only parallelizes tool calls when a tool declares both isReadOnly() and isConcurrencySafe(). Mutating tools and read-only tools that are not concurrency-safe run serially. Set maxToolConcurrency per run to cap safe parallel batches; when omitted, AGENT_SDK_MAX_TOOL_CONCURRENCY is used as the fallback. Invalid, zero, or negative values fall back to 10 so runs do not hang. Run traces include tool_concurrency_limit, tool_concurrency_source, and the existing concurrency_batches.

引擎只会并行执行同时声明 isReadOnly() 和 isConcurrencySafe() 的工具调用。会修改状态的工具,以及只读但非并发安全的工具,会串行执行。可通过每次运行的 maxToolConcurrency 限制安全并行批次;未设置时回退使用 AGENT_SDK_MAX_TOOL_CONCURRENCY。无效、零或负数会回退到 10,避免运行卡住。运行 trace 会包含 tool_concurrency_limit、tool_concurrency_source 和已有的 concurrency_batches。
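
The fallback order described above (per-run option, then environment variable, then 10) can be sketched as a small standalone function. resolveToolConcurrency and the source labels are hypothetical, and the sketch assumes that an invalid value from whichever source is consulted resolves straight to 10 rather than falling through to the next source.

```typescript
type ConcurrencySource = "option" | "env" | "default";

const DEFAULT_TOOL_CONCURRENCY = 10;

// Accepts only positive integers; everything else is treated as invalid.
function parsePositiveInt(value: number | string | undefined): number | undefined {
  const n = typeof value === "string" ? Number(value) : value;
  return typeof n === "number" && Number.isInteger(n) && n > 0 ? n : undefined;
}

function resolveToolConcurrency(
  option: number | undefined,
  envValue: string | undefined,
): { limit: number; source: ConcurrencySource } {
  if (option !== undefined) {
    const parsed = parsePositiveInt(option);
    return parsed !== undefined
      ? { limit: parsed, source: "option" }
      : { limit: DEFAULT_TOOL_CONCURRENCY, source: "default" };
  }
  if (envValue !== undefined) {
    const parsed = parsePositiveInt(envValue);
    return parsed !== undefined
      ? { limit: parsed, source: "env" }
      : { limit: DEFAULT_TOOL_CONCURRENCY, source: "default" };
  }
  return { limit: DEFAULT_TOOL_CONCURRENCY, source: "default" };
}
```

A host would call it as resolveToolConcurrency(options.maxToolConcurrency, process.env.AGENT_SDK_MAX_TOOL_CONCURRENCY) and could compare the result with tool_concurrency_limit in the run trace.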

Tool result memoization (turn-scoped)

When a turn dispatches multiple tool_use blocks for the same read-only concurrency-safe tool with the same input — including duplicates within a single concurrent batch — the engine reuses one tool.call() instead of running it once per block. Permission checks, PreToolUse / PostToolUse hooks, and tool_input / tool_output guardrails still run on every block; only the tool's own work is elided. The cache resets between turns, so model state never sees a stale read. is_error: true results are never retained. Aggregate counters land in trace.tool_cache = { hits, misses }.

每个 turn 内对同一个只读且并发安全工具、相同输入的重复 tool_use 块(含同一并发批次内的重复请求)只会执行一次 tool.call(),其它块直接复用已有结果。权限检查、PreToolUse / PostToolUse hooks 和 tool_input / tool_output guardrails 仍按块执行;只是跳过工具本体的副作用。每个 turn 结束后缓存清空,确保模型状态不会读到陈旧数据。is_error: true 结果不会被缓存。聚合计数会写到 trace.tool_cache = { hits, misses }
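
A minimal standalone sketch of turn-scoped memoization follows. It is not the engine's implementation: TurnToolCache is a hypothetical class, the cache key is a plain JSON.stringify of the input (so key order matters, which a real implementation would likely normalize), and permission checks, hooks, and guardrails are outside its scope. It does show the three documented properties: duplicates in one concurrent batch share a single call, error results are not retained, and the cache is cleared between turns.

```typescript
type ToolResult = { content: string; is_error?: boolean };
type ToolCall = (input: unknown) => Promise<ToolResult>;

class TurnToolCache {
  private readonly entries = new Map<string, Promise<ToolResult>>();
  hits = 0;
  misses = 0;

  // Callers are expected to use this only for read-only, concurrency-safe tools.
  async run(toolName: string, input: unknown, call: ToolCall): Promise<ToolResult> {
    const key = `${toolName}:${JSON.stringify(input)}`;
    const cached = this.entries.get(key);
    if (cached) {
      this.hits += 1;
      return cached; // shares an in-flight or completed call
    }
    this.misses += 1;
    const pending = call(input);
    this.entries.set(key, pending);
    try {
      const result = await pending;
      if (result.is_error) this.entries.delete(key); // never retain errors
      return result;
    } catch (error) {
      this.entries.delete(key);
      throw error;
    }
  }

  // Call at the end of every turn so the next turn starts cold.
  reset(): void {
    this.entries.clear();
  }
}
```

Storing the promise rather than the resolved value is what lets two identical blocks dispatched in the same batch share one call.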

OpenAI prompt prefix caching

Both Chat Completions and Responses requests now include a stable prompt_cache_key derived from (model, system prompt, tool schema). When the prefix is unchanged across turns, OpenAI's prompt cache reuses the cached prefix and bills only the suffix tokens. The key intentionally excludes the conversation input, since that grows every turn. Gateways that don't recognize the field simply ignore it. Anthropic's cache_control: ephemeral markers continue to be applied automatically in the Anthropic provider.

每个 OpenAI 请求(Chat Completions 与 Responses 同时支持)会附带一个稳定的 prompt_cache_key,由 (model, system prompt, tool schema) 派生。当前缀稳定时 OpenAI 仅按差异部分计费。该 key 不包含会话 input,避免随每个 turn 变动。不识别此字段的网关会自动忽略。Anthropic 端继续自动添加 cache_control: ephemeral 标记。
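
One way to derive a key that is stable across turns is to hash only the prefix inputs. The sketch below uses SHA-256 from node:crypto. This README does not document the SDK's actual hash or key format, so derivePromptCacheKey, the ToolSchema shape, and the 32-character truncation are all assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical tool schema shape for the sketch.
interface ToolSchema {
  name: string;
  description?: string;
  input_schema: unknown;
}

// The conversation input is deliberately not part of the hashed prefix,
// so the key stays the same while the conversation grows.
function derivePromptCacheKey(
  model: string,
  systemPrompt: string,
  tools: ToolSchema[],
): string {
  const prefix = JSON.stringify({ model, systemPrompt, tools });
  return createHash("sha256").update(prefix).digest("hex").slice(0, 32);
}
```

Any change to the model, system prompt, or tool schema yields a different key, which matches the intent that the key identifies the cacheable prefix.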

Provider retries and tolerance

Provider calls automatically retry transient API and network failures with exponential backoff. Retryable conditions include rate limits, common 5xx/overload statuses, fetch/socket failures, and responses that carry a Retry-After header; abort signals are honored during backoff.

Provider 调用会对临时 API 和网络失败自动指数退避重试。可重试场景包括限流、常见 5xx/overload 状态、fetch/socket 失败以及 Retry-After 响应头;退避等待期间会响应 abort signal。
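
The retry timing can be sketched without the SDK. The example below computes a capped exponential delay, prefers a Retry-After value when one is present, and waits in a way that an AbortSignal can interrupt. The function names, the 500 ms base, and the 30 s cap are illustrative assumptions, as is the choice to let Retry-After override the exponential delay; jitter is omitted to keep the output deterministic.

```typescript
// Retry-After may be a number of seconds or an HTTP date.
function parseRetryAfterMs(
  header: string | null | undefined,
  now: number = Date.now(),
): number | undefined {
  if (!header) return undefined;
  const seconds = Number(header);
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  const date = Date.parse(header);
  return Number.isNaN(date) ? undefined : Math.max(0, date - now);
}

// attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
function backoffDelayMs(
  attempt: number,
  retryAfter?: string | null,
  baseMs = 500,
  maxMs = 30000,
): number {
  const fromHeader = parseRetryAfterMs(retryAfter);
  if (fromHeader !== undefined) return Math.min(fromHeader, maxMs);
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Resolves after `ms`, or rejects as soon as the signal aborts.
function abortableSleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(new Error("aborted"));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}
```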

For OpenAI-compatible GPT-5 models, the SDK uses the Responses API by default and falls back to Chat Completions when a gateway does not support /responses. Incomplete Responses output caused by output-token limits maps to max_tokens so the engine can continue; failed or cancelled Responses runs surface as errors instead of empty text.

对于 OpenAI 兼容的 GPT-5 模型,SDK 默认使用 Responses API;如果网关不支持 /responses,会回退到 Chat Completions。因输出 token 限制导致的 incomplete Responses 会映射为 max_tokens,方便引擎继续;failed 或 cancelled 的 Responses 会以错误暴露,而不是返回空文本。
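
The status mapping described above can be illustrated with a standalone function. The status and incomplete_details.reason values follow the OpenAI Responses API; the StopReason names and mapResponsesStatus are hypothetical stand-ins for whatever the engine uses internally.

```typescript
// Subset of the Responses API result that matters for the mapping.
interface ResponsesResult {
  status: "completed" | "incomplete" | "failed" | "cancelled";
  incomplete_details?: { reason?: string } | null;
  error?: { message?: string } | null;
}

// Hypothetical engine-side stop reasons.
type StopReason = "end_turn" | "max_tokens";

function mapResponsesStatus(result: ResponsesResult): StopReason {
  switch (result.status) {
    case "completed":
      return "end_turn";
    case "incomplete": {
      const reason = result.incomplete_details?.reason;
      // Hitting the output-token limit is recoverable: the engine can continue.
      if (reason === "max_output_tokens") return "max_tokens";
      throw new Error(`Responses run incomplete: ${reason ?? "unknown reason"}`);
    }
    case "failed":
    case "cancelled":
      // Surface an error instead of returning empty text.
      throw new Error(
        `Responses run ${result.status}: ${result.error?.message ?? "no details"}`,
      );
    default:
      throw new Error("Unknown Responses status");
  }
}
```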

Web UI

A built-in web chat interface is included for testing:

npx tsx examples/web/server.ts
# Open http://localhost:8081

API reference

Which API should I use? / 应该使用哪个 API?

| Need / 需求 | Use / 使用 |
| --- | --- |
| Terminal or CI one-off task / 终端或 CI 一次性任务 | npx clavue-agent-sdk "prompt" |
| Simplest Node.js integration / 最简单 Node.js 集成 | run({ prompt, options }) |
| Streaming UI or progress logs / 流式 UI 或进度日志 | query({ prompt, options }) |
| Multi-turn service, sessions, MCP, hooks / 多轮服务、会话、MCP、hooks | createAgent(options) |

Program logic / 程序逻辑

  • Your app calls run(), query(), or a reusable agent.prompt() / agent.query().
  • The SDK builds the system context from options, repo context files, git status, tools, MCP servers, skills, hooks, and permission policy.
  • The provider layer sends normalized messages and tool schemas to Anthropic Messages or an OpenAI-compatible chat endpoint.
  • When the model requests a tool, the engine applies allow/deny filters, canUseTool, permission mode, and hooks, then executes the tool.
  • Tool results are appended to the conversation and the engine repeats until the provider returns a final answer or the run reaches limits.
  • The SDK returns either streaming SDKMessage events or a structured AgentRunResult artifact. Reusable agents can persist sessions under ~/.clavue-agent-sdk, and background AgentJobs persist under ~/.clavue-agent-sdk/agent-jobs.
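
The steps above amount to a loop around the provider. The following standalone sketch shows that loop with a fake provider and a single allow-list check. Every type and name in it (Message, ModelTurn, Provider, agentLoop) is hypothetical, and it leaves out context building, hooks, permission modes, streaming, and compaction.

```typescript
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ModelTurn = { text: string; toolCalls: { name: string; input: unknown }[] };
type Provider = (messages: Message[]) => Promise<ModelTurn>;
type Tool = (input: unknown) => Promise<string>;

async function agentLoop(
  prompt: string,
  provider: Provider,
  tools: Record<string, Tool>,
  opts: { allowed: string[]; maxTurns: number },
): Promise<{ text: string; turns: number; status: "completed" | "max_turns" }> {
  const messages: Message[] = [{ role: "user", content: prompt }];
  for (let turn = 1; turn <= opts.maxTurns; turn++) {
    const reply = await provider(messages);
    messages.push({ role: "assistant", content: reply.text });
    // No tool requests means the provider returned its final answer.
    if (reply.toolCalls.length === 0) {
      return { text: reply.text, turns: turn, status: "completed" };
    }
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      // Apply the allow filter before executing; denied calls are reported back.
      const output =
        tool && opts.allowed.includes(call.name)
          ? await tool(call.input)
          : `Tool ${call.name} is not permitted`;
      messages.push({ role: "tool", content: output });
    }
  }
  return { text: "", turns: opts.maxTurns, status: "max_turns" };
}
```

Reporting a denied call back to the model as a tool result, instead of throwing, lets the model choose another approach within the same run.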

Top-level functions

| Function | Description |
| --- | --- |
| run({ prompt, options }) | One-shot blocking run, returns Promise<AgentRunResult> |
| query({ prompt, options }) | One-shot streaming query, returns AsyncGenerator<SDKMessage> |
| createAgent(options) | Create a reusable agent with session persistence |
| tool(name, desc, schema, handler) | Create a tool with Zod schema validation |
| createSdkMcpServer({ name, tools }) | Bundle tools into an in-process MCP server |
| defineTool(config) | Low-level tool definition helper |
| doctor(options) | Run structured provider, tool, skill, MCP, storage, and package checks |
| runBenchmarks(options) | Run offline benchmark metrics without live model calls |
| getAllBaseTools() | Get all 35+ built-in tools |
| registerSkill(definition) | Register a custom skill |
| getAllSkills() | Get all registered skills |
| createAgentJob(input, opts) | Create a durable background agent job record |
| getAgentJob(id, opts) | Read a durable background job by ID |
| listAgentJobs(opts) | List durable background jobs in a runtime namespace |
| stopAgentJob(id, reason, opts) | Cancel a queued or running background job |
| clearAgentJobs(opts) | Clear background jobs for a runtime namespace |
| runSelfImprovement(run, config, opts) | Persist bounded improvement memories and optionally run retro/eval feedback |
| extractRunImprovementCandidates(run, config, opts) | Inspect which improvement memories a run would generate |
| runRetroEvaluation(input) | Run deterministic retro/eval orchestration and return typed results |
| createDefaultRetroEvaluators() | Inspect package/import/build/test/onboarding readiness across the core dimensions |
| compareRetroRuns(previous, current) | Compare two retro runs for score deltas and finding drift |
| decideRetroAction(input) | Decide the next machine action from current retro state |
| runRetroVerification(input) | Run fixed quality gates and return pass/fail command results |
| runRetroCycle(input) | Run evaluation, verification, policy, comparison, and optional persistence in one call |
| saveRetroRun(runId, result, opts) | Persist a retro run result to the run ledger |
| loadRetroRun(runId, opts) | Load a persisted retro run result from the run ledger |
| saveRetroCycle(cycleId, result, opts) | Persist a full retro cycle result including decision and summary |
| loadRetroCycle(cycleId, opts) | Load a persisted retro cycle result from the run ledger |
| normalizeIssueInput(input, source?) | Normalize inline or file-backed issue text into a workflow record |
| createIssueWorkflowRun(input, opts) | Create a durable local issue workflow with role-based AgentJobs |
| runIssueWorkflow(input, opts) | Execute a bounded local builder/reviewer/fixer/verifier loop and return proof_of_work |
| listIssueWorkflowRuns(opts) | List persisted issue workflow runs |
| loadIssueWorkflowRun(id, opts) | Load one persisted issue workflow run |
| stopIssueWorkflowRun(id, reason, opts) | Stop an issue workflow run and cancel its associated jobs |
| loadWorkflowDefinition(opts) | Load a repository-owned WORKFLOW.md contract |
| renderWorkflowPrompt(def, input) | Strictly render an issue/task prompt from a workflow contract |
| resolveWorkflowServiceConfig(def) | Resolve workflow defaults, env indirection, workspaces, and runtime settings |
| validateWorkflowDispatchConfig(config) | Validate workflow config before dispatch |
| selectDispatchCandidates(input) | Select eligible issues under active/terminal state and concurrency policy |
| calculateRetryDelayMs(input) | Compute continuation or capped exponential retry delay |
| shouldReleaseIssueForState(state, config) | Decide whether an issue state should release a claim |
| createProofOfWork(input) | Create a standard proof-of-work handoff artifact |
| getRuntimeProfile(mode) | Read a built-in workflow profile |
| getAllRuntimeProfiles() | List built-in workflow profiles |
| applyRuntimeProfile(options) | Expand workflowMode into concrete runtime options |
| normalizeFindings(findings) | Normalize retro findings into a stable schema |
| scoreFindings(findings) | Compute per-dimension and overall retro scores |
| planUpgrades(findings) | Turn retro findings into prioritized workstreams |
| createProvider(apiType, opts) | Create an LLM provider directly |
| createHookRegistry(config) | Create a hook registry for lifecycle events |
| listSessions() | List persisted sessions |
| forkSession(id) | Fork a session for branching |

Agent methods

| Method | Description |
| --- | --- |
| agent.query(prompt) | Streaming query, returns AsyncGenerator<SDKMessage> |
| agent.run(text, overrides) | Blocking run, returns full AgentRunResult including self_improvement when enabled |
| agent.prompt(text) | Blocking query, returns Promise<QueryResult> |
| agent.getMessages() | Get conversation history |
| agent.clear() | Reset session |
| agent.interrupt() | Abort current query |
| agent.setModel(model) | Change model mid-session |
| agent.setPermissionMode(mode) | Change permission mode |
| agent.stopTask(id) | Stop a durable AgentJob by ID, then fall back to legacy task cancellation |
| agent.getApiType() | Get current API type |
| agent.close() | Close MCP connections, persist session |

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| apiType | string | auto-detected | 'anthropic-messages' or 'openai-completions' |
| model | string | claude-sonnet-4-6 | LLM model ID |
| apiKey | string | CLAVUE_AGENT_API_KEY | API key |
| baseURL | string | | Custom API endpoint |
| cwd | string | process.cwd() | Working directory |
| systemPrompt | string | | System prompt override |
| appendSystemPrompt | string | | Append to default system prompt |
| tools | ToolDefinition[] | All built-in | Available tools |
| toolsets | ToolsetName[] | | Named built-in tool groups |
| allowedTools | string[] | | Tool allow-list |
| disallowedTools | string[] | | Tool deny-list |
| permissionMode | string | trustedAutomation | trustedAutomation / auto / default / acceptEdits / dontAsk / bypassPermissions / plan |
| autonomyMode | string | inferred from permission/profile | supervised / proactive / autonomous; controls initiative and confirmations without bypassing permissions |
| canUseTool | function | allow all | Custom tool guard or input modifier |
| qualityGatePolicy | QualityGatePolicy | | Mark a successful run as failed when required quality gates fail or are missing |
| maxTurns | number | 10 | Max agentic turns |
| maxToolConcurrency | number | env or 10 | Max concurrent read-only concurrency-safe tool calls per batch |
| maxBudgetUsd | number | | Spending cap |
| thinking | ThinkingConfig | { type: 'adaptive' } | Extended thinking |
| effort | string | high | Reasoning effort: low / medium / high / max |
| mcpServers | Record<string, McpServerConfig> | | MCP server connections |
| agents | Record<string, AgentDefinition> | | Subagent definitions |
| hooks | Record<string, HookCallbackMatcher[]> | | Lifecycle hooks |
| memory | MemoryConfig | | Structured memory injection, off / autoInject / brainFirst policy, and session-summary persistence |
| selfImprovement | boolean \| SelfImprovementConfig | false | Opt-in run learning via improvement memories and optional retro cycle |
| resume | string | | Resume session by ID |
| continue | boolean | false | Continue most recent session |
| persistSession | boolean | true | Persist session to disk |
| sessionId | string | auto | Explicit session ID |
| outputFormat | { type: 'json_schema', schema } | | Structured output |
| sandbox | SandboxSettings | | Filesystem/network sandbox |
| settingSources | SettingSource[] | | Load AGENT.md, project settings |
| env | Record<string, string> | | Environment variables |
| abortController | AbortController | | Cancellation controller |

Named toolsets

Use toolsets in the SDK or --toolset in the CLI to enable named groups of built-in tools without listing every tool name. The SDK also exports TOOLSET_NAMES, isToolsetName(), and getToolsetTools() for validation and UI generation.

在 SDK 中使用 toolsets,或在 CLI 中使用 --toolset,可以启用命名的内置工具组,而不必逐个列出工具名。SDK 也导出 TOOLSET_NAMES、isToolsetName() 和 getToolsetTools(),方便做校验或生成 UI。

import { TOOLSET_NAMES, getToolsetTools, isToolsetName, run } from "clavue-agent-sdk";

const selected = "repo-readonly";
if (!isToolsetName(selected)) throw new Error("Unknown toolset");

const result = await run({
  prompt: "Review this repository and check current docs.",
  options: {
    toolsets: [selected, "research"],
    disallowedTools: ["WebSearch"],
  },
});

console.log(TOOLSET_NAMES);
console.log(getToolsetTools([selected]));

| Toolset | Tools |
| --- | --- |
| repo-readonly | Read, Glob, Grep |
| repo-edit | Read, Write, Edit, Glob, Grep, NotebookEdit |
| research | WebFetch, WebSearch |
| planning | EnterPlanMode, ExitPlanMode, AskUserQuestion, TodoWrite |
| tasks | TaskCreate, TaskList, TaskUpdate, TaskGet, TaskStop, TaskOutput |
| automation | CronCreate, CronDelete, CronList, RemoteTrigger |
| agents | Agent, AgentJobList, AgentJobGet, AgentJobStop, SendMessage, TeamCreate, TeamDelete |
| mcp | ListMcpResources, ReadMcpResource |
| skills | Skill |

toolsets are merged with allowedTools; disallowedTools is applied last and can remove tools from either source. For example, toolsets: ["repo-readonly"] plus allowedTools: ["WebFetch"] enables Read, Glob, Grep, and WebFetch; adding disallowedTools: ["Grep"] removes Grep.

toolsets 会与 allowedTools 合并;disallowedTools 最后应用,可以从任一来源移除工具。例如,toolsets: ["repo-readonly"] 加上 allowedTools: ["WebFetch"] 会启用 Read、Glob、Grep 和 WebFetch;再加 disallowedTools: ["Grep"] 会移除 Grep。
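
The merge order can be expressed in a few lines. This standalone sketch copies three rows from the toolset table above and reproduces the worked example; resolveToolNames is a hypothetical helper, not the SDK's getToolsetTools(), and it covers only the case where at least one of toolsets or allowedTools is provided.

```typescript
// Subset of the toolset table above, copied for illustration.
const TOOLSETS: Record<string, string[]> = {
  "repo-readonly": ["Read", "Glob", "Grep"],
  "repo-edit": ["Read", "Write", "Edit", "Glob", "Grep", "NotebookEdit"],
  research: ["WebFetch", "WebSearch"],
};

function resolveToolNames(opts: {
  toolsets?: string[];
  allowedTools?: string[];
  disallowedTools?: string[];
}): string[] {
  const enabled = new Set<string>();
  // 1. Expand every named toolset into its tool names.
  for (const name of opts.toolsets ?? []) {
    for (const tool of TOOLSETS[name] ?? []) enabled.add(tool);
  }
  // 2. Merge in explicitly allowed tools.
  for (const tool of opts.allowedTools ?? []) enabled.add(tool);
  // 3. Apply the deny-list last so it wins over both sources.
  for (const tool of opts.disallowedTools ?? []) enabled.delete(tool);
  return Array.from(enabled);
}

console.log(
  resolveToolNames({
    toolsets: ["repo-readonly"],
    allowedTools: ["WebFetch"],
    disallowedTools: ["Grep"],
  }),
); // [ 'Read', 'Glob', 'WebFetch' ]
```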

Environment variables

| Variable | Description |
| --- | --- |
| CLAVUE_AGENT_API_KEY | API key (required) |
| CLAVUE_AGENT_API_TYPE | anthropic-messages (default) or openai-completions |
| CLAVUE_AGENT_MODEL | Default model override |
| CLAVUE_AGENT_BASE_URL | Custom API endpoint |
| CLAVUE_AGENT_AUTH_TOKEN | Alternative auth token |
| CLAVUE_AGENT_JOBS_DIR | Override durable AgentJob storage directory |
| AGENT_SDK_MAX_TOOL_CONCURRENCY | Max concurrent batch size for tools that are both read-only and concurrency-safe; invalid values fall back to 10 |

Built-in tools

Filesystem tools resolve paths relative to cwd but may access absolute paths when the host exposes them. For least privilege, combine cwd, toolsets, allowedTools/disallowedTools, canUseTool, and sandbox settings at the application boundary.

文件系统工具会相对 cwd 解析路径,但当宿主环境暴露绝对路径时也可能访问绝对路径。最小权限部署时,请在应用边界组合使用 cwd、toolsets、allowedTools/disallowedTools、canUseTool 和 sandbox 设置。

Session IDs are validated before disk access so persisted transcripts cannot escape the configured session store via absolute paths, .., or null-byte input. For multi-tenant hosts, also isolate session.dir, CLAVUE_AGENT_JOBS_DIR, and runtimeNamespace per tenant.

Session ID 在访问磁盘前会进行校验,持久化 transcript 不能通过绝对路径、.. 或空字节输入逃逸配置的 session store。多租户宿主还应为每个租户隔离 session.dir、CLAVUE_AGENT_JOBS_DIR 和 runtimeNamespace。
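
A host that accepts session IDs from untrusted callers may want its own check in front of the SDK. The sketch below is a standalone example of the kind of validation described above, not the SDK's implementation. isSafeSessionId and sessionPath are hypothetical names, and rejecting every path separator is a stricter rule than the text requires.

```typescript
import { isAbsolute, resolve, sep } from "node:path";

function isSafeSessionId(id: string): boolean {
  if (id.length === 0 || id.includes("\0")) return false; // null-byte input
  if (isAbsolute(id)) return false; // absolute paths
  if (id.includes("/") || id.includes("\\")) return false; // no nested paths
  if (id === "." || id === "..") return false; // directory references
  return true;
}

// Resolves the on-disk location and double-checks that it stays in the store.
function sessionPath(storeDir: string, id: string): string {
  if (!isSafeSessionId(id)) throw new Error("Invalid session id");
  const root = resolve(storeDir);
  const target = resolve(root, id);
  if (!target.startsWith(root + sep)) {
    throw new Error("Session path escapes the session store");
  }
  return target;
}
```

The second containment check is redundant with the first in this sketch; keeping both means a later change to the ID rules cannot silently reopen a traversal path.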

| Tool | Description |
| --- | --- |
| Bash | Execute shell commands |
| Read | Read files with line numbers |
| Write | Create / overwrite files |
| Edit | Precise string replacement in files |
| Glob | Find files by pattern |
| Grep | Search file contents with regex |
| WebFetch | Fetch and parse web content |
| WebSearch | Search the web |
| NotebookEdit | Edit Jupyter notebook cells |
| Agent | Spawn subagents for parallel work |
| AgentJobList/Get/Stop | Inspect and cancel durable background AgentJobs |
| Skill | Invoke registered skills |
| TaskCreate/List/Update/Get/Stop/Output | Task management system |
| TeamCreate/Delete | Multi-agent team coordination |
| SendMessage | Inter-agent messaging |
| EnterWorktree/ExitWorktree | Git worktree isolation |
| EnterPlanMode/ExitPlanMode | Structured planning workflow |
| AskUserQuestion | Ask the user for input |
| ToolSearch | Discover lazy-loaded tools |
| ListMcpResources/ReadMcpResource | MCP resource access |
| CronCreate/Delete/List | Scheduled task management |
| RemoteTrigger | Remote agent triggers |
| LSP | Language Server Protocol (code intelligence) |
| Config | Dynamic configuration |
| TodoWrite | Session todo list |

Bundled skills

| Skill | Description |
| --- | --- |
| simplify | Review changed code for reuse, quality, and efficiency |
| commit | Create a git commit with a well-crafted message |
| review | Review code changes for correctness, security, and performance |
| debug | Systematic debugging using structured investigation |
| test | Run tests and analyze failures |
| define | Define goals, constraints, assumptions, and acceptance criteria |
| plan | Produce an ordered implementation plan and verification strategy |
| build | Implement scoped changes while preserving local patterns |
| verify | Run targeted checks and report evidence |
| workflow-review | Review lifecycle work for defects, risks, and missing evidence |
| ship | Prepare a handoff or release summary with verification status |
| repair | Diagnose and fix failed workflow outcomes with recovery evidence |

Register custom skills with registerSkill().

Architecture

┌──────────────────────────────────────────────────────┐
│                   Your Application                    │
│                                                       │
│   import { createAgent } from 'clavue-agent-sdk' │
└────────────────────────┬─────────────────────────────┘
                         │
              ┌──────────▼──────────┐
              │       Agent         │  Session state, tool pool,
              │  query() / prompt() │  MCP connections, hooks
              └──────────┬──────────┘
                         │
              ┌──────────▼──────────┐
              │    QueryEngine      │  Agentic loop:
              │   submitMessage()   │  API call → tools → repeat
              └──────────┬──────────┘
                         │
         ┌───────────────┼───────────────┐
         │               │               │
   ┌─────▼─────┐  ┌─────▼─────┐  ┌─────▼─────┐
   │  Provider  │  │  35 Tools │  │    MCP     │
   │ Anthropic  │  │ Bash,Read │  │  Servers   │
   │  OpenAI    │  │ Edit,...  │  │ stdio/SSE/ │
   │ DeepSeek   │  │ + Skills  │  │ HTTP/SDK   │
   └───────────┘  └───────────┘  └───────────┘

Key internals:

| Component | Description |
| --- | --- |
| Provider layer | Abstracts Anthropic / OpenAI API differences |
| QueryEngine | Core agentic loop with auto-compact, retry, safe tool orchestration |
| Skill system | Reusable executable workflows with bundled coding, review, test, and lifecycle skills |
| Hook system | 20 lifecycle events integrated into the engine |
| Auto-compact | Summarizes conversation when context window fills up |
| Micro-compact | Truncates oversized tool results |
| Retry | Exponential backoff for rate limits, transient errors, and Retry-After responses |
| Fallback chain | fallbackModel: string \| string[]; multi-provider chain tried in order on retryable errors (e.g. primary GPT → Claude → GLM) |
| Token estimation | Rough token counting with pricing for Claude, GPT, DeepSeek models |
| File cache | LRU cache (100 entries, 25 MB) for file reads |
| Tool result cache | Turn-scoped memoization: duplicate tool_use blocks for the same read-only concurrency-safe tool share one tool.call() |
| Adaptive concurrency | Opt-in AIMD controller (adaptiveToolConcurrency): halves the concurrent chunk size after a batch with any tool error, +1 after a clean batch, bounded by [min, max] |
| OpenAI prefix cache | Stable prompt_cache_key over (model, system prompt, tool schema) for both Responses and Chat Completions |
| Session storage | Persist / resume / fork sessions on disk |
| Session list cache | In-memory cache for listSessions() keyed by absolute dir; invalidated on saveSession / deleteSession. Escape hatch: invalidateSessionCache(dir?) |
| AgentJob storage | Durable background subagent records with output, trace, evidence, quality gates, cancellation, and stale-heartbeat detection |
| AgentJob list cache | In-memory cache for listAgentJobs() / summarizeAgentJobs() keyed by namespace dir; invalidated on create / update / stop / clear. Escape hatch: invalidateAgentJobsCache(dir?) |
| IssueWorkflow list cache | In-memory cache for listIssueWorkflowRuns() keyed by namespace dir; invalidated on every writeIssueWorkflowRun (create / stop / update). Escape hatch: invalidateIssueWorkflowRunsCache(dir?) |
| Structured memory | Queryable user/project/reference/feedback/decision/improvement entries |
| Memory list cache | In-memory cache for listMemories() keyed by absolute dir; invalidated on saveMemory / deleteMemory. Escape hatch: invalidateMemoryCache(dir?) |
| Memory consolidation | consolidateMemories() / findDuplicateMemories(): merge near-duplicate entries by (type, scope, title, repoPath, sessionId) identity, optional embedder cosine guard, dry-run preview |
| Self-improvement | Opt-in run learning from failures plus optional retro verification |
| Context injection | Git status + AGENT.md automatically injected into system prompt |
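
The adaptive concurrency row describes an additive-increase, multiplicative-decrease (AIMD) rule. A standalone sketch of that rule is shown below. AimdConcurrency is a hypothetical class written from the one-line description in the table, and the choices to start at the maximum and to round down when halving are assumptions.

```typescript
class AimdConcurrency {
  private readonly min: number;
  private readonly max: number;
  private current: number;

  constructor(min: number, max: number, initial: number = max) {
    this.min = min;
    this.max = max;
    this.current = this.clamp(initial);
  }

  get limit(): number {
    return this.current;
  }

  // Call once per finished batch; returns the chunk size for the next batch.
  record(batchHadError: boolean): number {
    const next = batchHadError
      ? Math.floor(this.current / 2) // multiplicative decrease
      : this.current + 1; // additive increase
    this.current = this.clamp(next);
    return this.current;
  }

  private clamp(value: number): number {
    return Math.min(this.max, Math.max(this.min, value));
  }
}
```

Halving on errors backs off quickly when a dependency is struggling, while the +1 step after clean batches probes capacity slowly so one good batch does not restore full parallelism.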

Automation roadmap

  • Telemetry and evals: turn run artifacts into scoreable, replayable traces without exposing secrets.
  • Policy learning: promote repeated safe corrections into durable tool and permission policies.
  • Tool success modeling: track which tools, arguments, and gates succeed for similar repository states.
  • Autonomous retry loops: feed failures into bounded repair attempts guarded by verification gates.
  • Cross-project memory governance: keep global lessons useful while requiring repo-state verification before reuse.

Examples

| # | File | Description |
| --- | --- | --- |
| 01 | `examples/01-simple-query.ts` | Streaming query with event handling |
| 02 | `examples/02-multi-tool.ts` | Multi-tool orchestration (Glob + Bash) |
| 03 | `examples/03-multi-turn.ts` | Multi-turn session persistence |
| 04 | `examples/04-prompt-api.ts` | Blocking `prompt()` API |
| 05 | `examples/05-custom-system-prompt.ts` | Custom system prompt |
| 06 | `examples/06-mcp-server.ts` | MCP server integration |
| 07 | `examples/07-custom-tools.ts` | Custom tools with `defineTool()` |
| 08 | `examples/08-official-api-compat.ts` | `query()` API pattern |
| 09 | `examples/09-subagents.ts` | Subagent delegation |
| 10 | `examples/10-permissions.ts` | Read-only agent with named toolsets |
| 11 | `examples/11-custom-mcp-tools.ts` | `tool()` + `createSdkMcpServer()` |
| 12 | `examples/12-skills.ts` | Skill system usage |
| 13 | `examples/13-hooks.ts` | Lifecycle hooks |
| 14 | `examples/14-openai-compat.ts` | OpenAI / DeepSeek models |
| 15 | `examples/15-self-improvement.ts` | Opt-in run learning and improvement memories |
| 16 | `examples/16-background-agent-jobs.ts` | Durable background AgentJob APIs |
| 32 | `examples/32-tier-a-performance.ts` | Offline demo: tool cache, adaptive concurrency, fallback chain, memory consolidation |
| 33 | `examples/33-im-archiver.ts` | Desktop IM archiver: clipboard + screenshot + vision classification (WeChat / Feishu / QQ / DingTalk) |
| 34 | `examples/34-desktop-tools.ts` | `clavue-agent-sdk/desktop` subpath demo: clipboard, screencap, file search, open folders, email draft, notify |
| web | `examples/web/` | Web chat UI for testing |

Run the offline smoke-tested example:

npm run test:all

Run any live provider example:

npx tsx examples/01-simple-query.ts

Start the web UI:

npx tsx examples/web/server.ts

GitHub, npm, and deployment

Public package summary

Clavue Agent SDK is an in-process TypeScript agent SDK for teams that need controlled autonomy instead of a black-box CLI wrapper. It provides a reusable engine for coding, review, research, planning, issue repair, CI automation, and service-side agent workflows. The runtime pairs strong model initiative with explicit safety controls that keep it within auditable, controllable boundaries: named toolsets, permission modes, autonomy modes, hooks, workspace guards, schema-versioned events, durable AgentJobs, workflow contracts, proof-of-work artifacts, orchestration policy helpers, memory, retro/eval, and quality gates.

For GitHub readers, start with the `npx` examples or the `run()` API. For npm consumers, install `clavue-agent-sdk` when you want the same agent loop embedded inside your own Node.js process rather than launching a separate tool.

For complete application integration patterns, read the Programmatic Integration Guide.

Repository checklist

  • Keep `README.md`, `package.json`, and examples aligned when changing public APIs.
  • Run `npm run build` and `npm test` before publishing or cutting a release.
  • Use `npm pack --dry-run --json` to inspect the exact package payload.
  • The package publishes `dist/` and `docs/`; `prepack` runs `npm run build` so TypeScript output is generated before packing.
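
The payload inspection in the checklist can be automated. The following is a minimal sketch, assuming the `npm pack --dry-run --json` output is a JSON array of reports that each carry a `files` list of `{ path, size }` entries, as recent npm versions print. The function name `findUnexpectedFiles` and the allowlist of top-level files are hypothetical; only `dist/` and `docs/` come from the checklist.

```typescript
// Only the fields this check reads from `npm pack --dry-run --json`.
interface PackedFile {
  path: string;
  size: number;
}

interface PackReport {
  name: string;
  version: string;
  files: PackedFile[];
}

// dist/ and docs/ are the directories the checklist says are published.
const ALLOWED_PREFIXES = ["dist/", "docs/"];

// Assumption: top-level files expected in the tarball. Adjust for your package.
const ALLOWED_FILES = new Set(["package.json", "README.md", "LICENSE"]);

/** Return every packed path that falls outside the expected payload. */
export function findUnexpectedFiles(packJson: string): string[] {
  const reports = JSON.parse(packJson) as PackReport[];
  return reports.flatMap((report) =>
    report.files
      .map((file) => file.path)
      .filter(
        (path) =>
          !ALLOWED_FILES.has(path) &&
          !ALLOWED_PREFIXES.some((prefix) => path.startsWith(prefix)),
      ),
  );
}
```

One way to feed it is `execFileSync("npm", ["pack", "--dry-run", "--json"], { encoding: "utf8" })` from `node:child_process`, failing the release script when the returned list is non-empty.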

Publish to npm

npm run build
npm test
npm pack --dry-run --json
npm publish --access public

The published package exposes two binaries:

npx clavue-agent-sdk "Summarize this repo"
npx clavue-agent "Summarize this repo"

Deploy inside another service

For a server, worker, CI job, Docker image, or serverless function: install `clavue-agent-sdk`, provide `CLAVUE_AGENT_API_KEY`, restrict tools with `toolsets` or `allowedTools` when the agent only needs limited access, and call `run()` for single-shot jobs or `createAgent()` for long-lived sessions.

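import { run } from "clavue-agent-sdk";

// Single-shot job: summarize one repository and return a compact result.
export async function handleRepositorySummary(repoPath: string) {
  const result = await run({
    prompt: "Summarize this repository for onboarding.",
    options: {
      cwd: repoPath,                 // run the agent inside the target repository
      toolsets: ["repo-readonly"],   // named toolset: limit the agent to read-only repo access
      maxTurns: 5,                   // cap the agent loop at five turns
    },
  });

  // Return only the fields the calling service needs.
  return {
    ok: result.status === "completed",
    text: result.text,
    usage: result.usage,
  };
}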
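
A service that embeds the agent usually wants to fail at startup, not on the first request, when the credential is missing. The sketch below assumes `CLAVUE_AGENT_API_KEY` is supplied as an environment variable. The helper name `requireApiKey` is hypothetical and not part of the SDK.

```typescript
// Hypothetical startup guard; the SDK itself may validate the key differently.
export function requireApiKey(
  env: Record<string, string | undefined> = process.env,
): string {
  // Treat a missing or whitespace-only value as "not configured".
  const key = env.CLAVUE_AGENT_API_KEY?.trim();
  if (!key) {
    throw new Error("CLAVUE_AGENT_API_KEY is not set; refusing to start the agent worker");
  }
  return key;
}
```

Calling `requireApiKey()` once during boot, before registering any request handlers, surfaces a misconfigured deployment immediately.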

License

MIT

Package last updated on 29 May 2026
