spec-first

Potential malware was recently detected in this package.

Affected versions:

1.5.5 1.5.6 1.5.7 1.5.8

AI coding workflow CLI for spec-driven engineering, spec-driven development, and harness engineering on Claude Code and Codex

Source

npm

Version: 1.5.8

Version published: 2 months ago

Weekly downloads: 160

Maintainers: 1

Created: 3 months ago

Spec-First

A workflow CLI that feeds the LLM structured, provenance-backed context at every stage of the AI coding delivery loop, and governs the full path from ideation to compound learning.

Open-source for Claude Code and Codex. Install once, govern the full delivery loop.

Why Spec-First

Most AI coding failures come from degraded LLM decision inputs, not weak models:

| Problem | How spec-first addresses it | Enforcement |
| --- | --- | --- |
| LLM starts from a blank-slate codebase context | `graph-bootstrap` extracts AST facts and compiles `minimal-context` with `provenance` and `confidence` signals | Code-hard gate at bootstrap / `stage0-context` runtime |
| Requirements are never made explicit | Brainstorm stage produces a requirements artifact consumed by Plan | `SKILL.md` contract |
| Plans drift from implementation | Plan artifact is a first-class Work input, and Review Stage 2b cross-checks the Requirements Trace against the diff | `SKILL.md` contract |
| Reviews are unstructured | 17 reviewer personas (always-on + cross-cutting + stack-specific) plus 2 CE-specific agents, routed by `safe_auto / gated_auto / manual / advisory` | `SKILL.md` contract |
| Solved problems are not reused | Compound writes structured learnings to `docs/solutions/` with YAML frontmatter for future retrieval | `SKILL.md` contract |
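
To make the last row concrete, here is a minimal sketch of writing and reading back a learning note with YAML frontmatter. It is an illustration only: the field names (`title`, `tags`), the flat `<slug>.md` layout, and all function names are assumptions, not the schema spec-first actually uses.

```python
from pathlib import Path


def render_learning(title: str, tags: list[str], body: str) -> str:
    """Render a Markdown note with a YAML frontmatter header."""
    # Hypothetical fields; values containing ':' would need proper YAML quoting.
    tag_list = ", ".join(tags)
    frontmatter = f"---\ntitle: {title}\ntags: [{tag_list}]\n---\n"
    return frontmatter + "\n" + body.rstrip() + "\n"


def write_learning(root: Path, slug: str, text: str) -> Path:
    """Write the note under <root>/docs/solutions/<slug>.md (assumed layout)."""
    target = root / "docs" / "solutions" / f"{slug}.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat 'key: value' pairs between the two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0] != "---":
        return {}
    end = lines.index("---", 1)  # raises ValueError if the header is unterminated
    pairs = (line.split(":", 1) for line in lines[1:end])
    return {key.strip(): value.strip() for key, value in pairs}
```

A later workflow stage could then filter notes by `tags` before deciding which ones to load into context.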

Suited for:

Teams moving from prompt-driven coding to governed AI engineering workflows
Claude Code and Codex users who want one repeatable delivery system across both hosts
Projects that need explicit specs, structured reviews, and reusable post-task learnings

Not suited for:

Teams without Claude Code or Codex
Contexts expecting zero-configuration, fully automatic code generation
Delivery loops too short to justify multi-stage workflow overhead

Design Philosophy

Light contract · Explicit boundaries · Let the LLM decide.

Spec-First rests on a single conviction: AI coding quality is bounded by the quality of decision inputs the LLM receives, not by the weight of orchestration. Repository governance (see CLAUDE.md / AGENTS.md) explicitly forbids:

Hard-coded state machines that replace LLM judgment with multi-state transitions
Over-engineered gates that fuse unrelated signals into a single orchestration object
Expanding contracts by coupling instead of by clarity

And explicitly prefers:

Surfacing provenance, freshness, confidence, fallback_reason, and verification_gaps as independent, composable input facts
Raising input quality first; reaching for more automation only when evidence demands it
Keeping control-plane boundaries clean — repo profile, diff recommendation, verifier dispatch, gate state, workflow prose, telemetry each answer one question and do not encroach on others

Every other choice in this README is a consequence of that stance.

How It Works

Spec-First upgrades what the LLM receives as decision input — it does not replace LLM judgment with a state machine.

Two Complementary Parts

graph-bootstrap — the foundation

Codebase → AST graph → fact extraction → minimal-context (provenance + confidence)
        → injection-index (stage-aware routing) → workflow input

graph-bootstrap turns a codebase into structured context before AI starts coding. It typically runs once per project, or incrementally when code changes, and its output is consumed by every downstream workflow stage.
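
The pipeline above can be approximated in a few lines. This sketch swaps in Python's built-in `ast` module for the tree-sitter parsers spec-first ships, and the fact shape, function names, and the high/low confidence mapping are all assumptions made for illustration.

```python
import ast


def extract_facts(source: str, path: str) -> list[dict]:
    """Collect top-level function and class definitions as simple facts."""
    tree = ast.parse(source)
    facts = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # ignore imports, assignments, and other statements
        facts.append({"kind": kind, "name": node.name, "file": path, "line": node.lineno})
    return facts


def compile_minimal_context(facts: list[dict]) -> dict:
    """Attach provenance and confidence so a reader can judge the context."""
    if not facts:
        # Nothing was extracted: say so explicitly instead of pretending.
        return {
            "provenance": "empty-fallback",
            "confidence": "low",
            "fallback_reason": "empty_fact_inventory",
            "symbols": [],
        }
    return {"provenance": "fact-inventory", "confidence": "high", "symbols": facts}
```

The point of the second function is that an empty result is reported as such, which mirrors the README's emphasis on explicit degradation signals.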

Main workflow — the delivery loop

Ideate → Brainstorm → Plan → Work → Review → Compound

This answers the question of how a requirement gets AI-engineered end to end. Every stage has explicit input artifacts, output artifacts, and a stage-gate contract.

Which bootstrap should I run?

| Entrypoint | When to use | Produces | Stability |
| --- | --- | --- | --- |
| `/spec:graph-bootstrap` · `$spec-graph-bootstrap` | You want fact-extracted, graph-informed context (Phase 0–4) | Phase 0–4 facts + `injection-index.yaml` + `minimal-context/*.json` | Primary Stage-0 entry |
| `/spec:compound` · `$spec-compound` | You want broader knowledge capture and reusable context synthesis | Context synthesis docs and reusable knowledge artifacts | Complementary Stage-0 path |

Stage-0 entrypoints run a Host Readiness Gate at startup. If MCP setup was skipped or the host was not restarted, they stop with explicit guidance rather than degrading silently. If you still see an older bootstrap entrypoint in local runtime assets or stale documentation, migrate to /spec:graph-bootstrap or /spec:compound.

Stage-0 Context Quality Signals

Every context artifact carries machine-readable quality metadata. Downstream SKILL.md contracts read these signals and adapt:

| Field | Values | Meaning |
| --- | --- | --- |
| `data_quality` | `fact-backed` · `partial` · `empty` · (absent = legacy manifest, treated as backward-compatible) | How much of the context comes from real code analysis |
| `provenance` | `fact-inventory` · `empty-fallback` | Whether content was compiled from extracted facts or from a skeletal default |
| `confidence` | `high` · `medium` · `low` | LLM-consumable trust signal |
| `fallback_reason` | `empty_fact_inventory` (root cause) · `minimal_context_missing` (secondary) · `workspace_child_partial_degraded` · (other runtime-specific values) | Explicit degradation cause when the context is not fact-backed |

When data_quality is empty, the evaluator downgrades to L1 and sets fallback_reason. The LLM gets a clear signal that this context is skeletal, not a real analysis.
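
As a rough illustration of how a consumer might sanity-check these signals, the sketch below validates a metadata dictionary against the documented values. The function name and the flat dictionary shape are assumptions, as is the choice to flag a missing `provenance` or `confidence`; the real artifacts may be structured differently.

```python
# Documented values from the table above.
ALLOWED = {
    "data_quality": {"fact-backed", "partial", "empty"},
    "provenance": {"fact-inventory", "empty-fallback"},
    "confidence": {"high", "medium", "low"},
}


def check_signals(meta: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing looks off."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = meta.get(field)
        if value is None:
            # An absent data_quality is documented as a legacy manifest, so tolerate it.
            if field != "data_quality":
                problems.append(f"missing {field}")
        elif value not in allowed:
            problems.append(f"unexpected {field}: {value}")
    # Empty context should always explain why it is empty.
    if meta.get("data_quality") == "empty" and not meta.get("fallback_reason"):
        problems.append("data_quality is empty but fallback_reason is not set")
    return problems
```

`fallback_reason` is deliberately not checked against a fixed set, because the table notes that other runtime-specific values exist.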

Evaluator levels

L0 — fact-backed context with real AST-derived signals; full-strength Stage-0 input.
L1 — skeletal or degraded context; the evaluator has set fallback_reason, and downstream SKILLs should treat these signals as advisory.
L2 — fixed minimal fallback; used only when injection-index.yaml cannot be resolved, falling back to ambient defaults (e.g. 00-summary.md, pitfalls/index.md).

Downstream skills are allowed to proceed at any level. The evaluator exposes the level so the LLM can adjust its own confidence, not to block execution.
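
One way to read the three levels is as a small classification over two inputs. The sketch below is an interpretation, not the evaluator's actual code: the function names are hypothetical, mapping `partial` to L1 is an assumption, and the legacy case with no `data_quality` field is left out.

```python
def evaluator_level(index_resolved: bool, data_quality: str) -> str:
    """Classify Stage-0 context into L0, L1, or L2."""
    if not index_resolved:
        return "L2"  # injection-index.yaml could not be resolved
    if data_quality == "fact-backed":
        return "L0"  # real AST-derived signals
    return "L1"  # skeletal or degraded (assumed to include 'partial')


def advisory_note(level: str) -> str:
    """Describe the level for the LLM; nothing here blocks execution."""
    notes = {
        "L0": "fact-backed context",
        "L1": "degraded context, treat signals as advisory",
        "L2": "fixed minimal fallback",
    }
    return notes[level]
```

Note that neither function raises or exits on a degraded level, which matches the statement that the level informs the LLM rather than gating it.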

Enforcement Model

| Layer | Scope | Type |
| --- | --- | --- |
| CLI (`doctor` / `init` / `clean` / `stage0-context`) | Asset sync, state tracking, manifest validation, Stage-0 context emission | Code-hard, enforced through shell exit code |
| Host Readiness Gate + Stage-0 evaluator L0/L1/L2 | Enforced when `graph-bootstrap` / `stage0-context` runs, emitting `fallback_reason` and degraded level | Runtime signal, emitted by code, consumed by LLM |
| Workflow stages (`SKILL.md`) | Stage contracts, artifact naming, review classes, requirements trace | SKILL contract, followed by LLM |
| Context signals (`provenance` / `confidence` / `fallback_reason`) | In-artifact metadata | SKILL contract, consumed by LLM |

Supported Languages

| Language | Parser | Notes |
| --- | --- | --- |
| C | `tree-sitter-c` | |
| C++ | `tree-sitter-cpp` | |
| C# | `tree-sitter-c-sharp` | |
| Go | `tree-sitter-go` | |
| Java | `tree-sitter-java` | |
| JavaScript | `tree-sitter-javascript` | CommonJS `require()` is resolved into `imports_from` edges |
| Kotlin | `tree-sitter-kotlin` | |
| Objective-C | `tree-sitter-objc` (vendored fork) | `.m` / `.mm` / heuristic `.h` routing; extracts `@interface/@implementation/@protocol` |
| PHP | `tree-sitter-php` | |
| Python | `tree-sitter-python` | |
| Ruby | `tree-sitter-ruby` | |
| Rust | `tree-sitter-rust` | |
| Scala | `tree-sitter-scala` | |
| Swift | `tree-sitter-swift` (vendored fork) | Removes the upstream `tree-sitter-cli` install-time dependency |
| TypeScript | `tree-sitter-typescript` | Covers `.ts` / `.tsx` / `.d.ts` |

iOS repositories are auto-detected (Podfile.lock / .xcodeproj) and Pod exclude paths are applied automatically.
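
The detection rule can be pictured as a simple file check. This is a hedged sketch: the function name is hypothetical, and it only inspects the repository root, whereas the real detector may search more widely and also applies Pod exclude paths, which are not modeled here.

```python
from pathlib import Path


def looks_like_ios_repo(root: Path) -> bool:
    """Return True if the root has a Podfile.lock or an .xcodeproj bundle."""
    if (root / "Podfile.lock").is_file():
        return True
    # An .xcodeproj is a directory bundle, so check for directories.
    return any(entry.is_dir() for entry in root.glob("*.xcodeproj"))
```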

What You Get

| Capability | What it solves |
| --- | --- |
| CLI control plane (`doctor` / `init` / `clean` / `stage0-context`) | Repeatable install, health checks, cleanup, and Stage-0 context emission; managed assets always stay traceable |
| CRG graph engine (`spec-first crg *`) | Code Review Graph: an embedded Node.js runtime over SQLite + FTS5, covering AST → symbols → resolved edges → PageRank flows → community detection → surprising-connections → god-nodes → review-context |
| graph-bootstrap context engine | LLM gets fact-extracted, confidence-annotated project context instead of a raw codebase |
| Full workflow layer | Ideate → Brainstorm → Plan → Work → Review → Compound, every stage with an explicit artifact contract |
| 17-persona Review stage (+ 2 CE agents) | Produces structured findings routed by `safe_auto / gated_auto / manual / advisory`, not a single-pass scan |
| Compound / knowledge capture | Solved problems are written to `docs/solutions/` for future workflow retrieval |
| Dual platform support | One methodology across Claude Code (`/spec:`) and Codex (`$spec-`). Claude uses a `SessionStart` hook + bare-agent rewrite; Codex uses `.agents/skills/` discovery + explicit `.codex/agents/...` path rewrite |
| Capability layer | Bundled source assets ship with `47` skills, `57` agents, and `4` agent support files. Runtime delivery is host-filtered by governance: the current bundle installs `12` commands + `35` skills on Claude, and `34` skills on Codex, with `57` agents + `4` support files on both hosts |
| Runtime governance | Managed assets are tracked in `state.json`: sync, refresh, recover, and clean safely |

Core Workflow

Spec-First overview

Primary stages

| Stage | Claude Code | Codex | Output Artifact | Enforcement |
| --- | --- | --- | --- | --- |
| Host Setup | `/spec:mcp-setup` → restart | `$spec-mcp-setup` → restart | Host-specific marker: `~/.claude/spec-first/host-setup.json` or `~/.codex/spec-first/host-setup.json` | Code-hard (bootstrap gate checks this) |
| Stage-0 graph bootstrap | `/spec:graph-bootstrap` | `$spec-graph-bootstrap` | Phase 0–4 facts + `injection-index.yaml` + `minimal-context/*.json` | Code-hard gate + SKILL.md content |
| Ideate | `/spec:ideate` | `$spec-ideate` | `docs/ideation/*.md` | SKILL.md contract |
| Brainstorm | `/spec:brainstorm` | `$spec-brainstorm` | `docs/brainstorms/*.md` | SKILL.md contract |
| Plan | `/spec:plan` | `$spec-plan` | `docs/plans/*.md` | SKILL.md contract |
| Work | `/spec:work` | `$spec-work` | code + tests | SKILL.md contract |
| Review | `/spec:review` | `$spec-review` | structured review report | SKILL.md contract (17 reviewer personas + 2 CE agents) |
| Compound | `/spec:compound` | `$spec-compound` | `docs/solutions/*/.md` | SKILL.md contract |

Auxiliary stages

| Stage | Claude Code | Codex | Purpose |
| --- | --- | --- | --- |
| Debug | `/spec:debug` | `$spec-debug` | Reproduce and diagnose an existing bug or failure |
| Update | `/spec:update` | `$spec-update` | Refresh runtime assets after `spec-first` upgrades |
| Sessions | `/spec:sessions` | `$spec-sessions` | Search and summarize prior coding agent sessions |
| Setup | `/spec:setup` | `$spec-setup` | Unified host / environment setup entrypoint |

These /spec:* and $spec-* surfaces are generated runtime workflow entrypoints, not root spec-first subcommands. The root CLI surface is documented below under CLI Commands.

Quick Start

Prerequisites

Node.js >=20
Git repository — spec-first init reads git config user.name and graph-bootstrap depends on git ls-files, so non-Git directories are not supported
At least one of Claude Code or Codex
Disk: roughly 60–120 MB of node_modules (15 tree-sitter parsers plus the better-sqlite3 native build)

1. Install

npm install -g spec-first
spec-first -v

postinstall note: The installer runs bin/postinstall.js, which prints an install confirmation card and then trims native tree-sitter prebuilds for platforms other than yours. This step only deletes files inside the installed node_modules/ tree; it never touches your project files.

2. Check the environment

spec-first doctor
spec-first doctor --claude   # Claude-only scope
spec-first doctor --codex    # Codex-only scope

If doctor reports legacy managed state, run init again. This is the only supported upgrade path: it performs a managed hard reset before rebuilding the runtime.

doctor --json also exposes workflow verification evidence as structured facts: schema validity, freshness, fallback_reason, and evidence_age_summary (oldest_* / newest_* + max_age_ms), so downstream workflows do not need to infer evidence staleness heuristically.

3. Initialize a project

spec-first init --claude
# or
spec-first init --codex

To set developer identity explicitly:

spec-first init --claude -u <name> --lang <zh|en>
spec-first init --codex -u <name> --lang <zh|en>

Identity resolution order:

1. -u flag value (when provided)
2. ~/.spec-first/.developer (global identity)
3. git config user.name fallback

Language resolution order:

1. --lang flag value (when provided)
2. Existing project .developer profile
3. Default zh
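
Both resolution orders follow the same first-match-wins pattern, which can be sketched as below. The function names are hypothetical and the inputs are passed in as plain values; reading the flag, the `.developer` files, and `git config` is left out of this illustration.

```python
from typing import Optional


def first_set(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is not None, empty, or whitespace."""
    for value in candidates:
        if value and value.strip():
            return value.strip()
    return None


def resolve_identity(flag_value, global_identity, git_user_name) -> Optional[str]:
    """-u flag, then the global identity file, then git config user.name."""
    return first_set(flag_value, global_identity, git_user_name)


def resolve_language(flag_value, project_profile_lang) -> str:
    """--lang flag, then the existing project profile, then the zh default."""
    return first_set(flag_value, project_profile_lang) or "zh"
```

For example, `resolve_language(None, "en")` keeps a project's existing English setting, while a fresh project with no flag falls through to `zh`.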

What `init` writes

init is not a read-only operation. It mounts spec-first into your project by writing the following:

| Target | What gets written | Removable by `clean`? |
| --- | --- | --- |
| `CLAUDE.md` / `AGENTS.md` | `<!-- spec-first:lang:* -->` language policy block (idempotent marker block) | ❌ Manual removal; `clean` does not strip the language policy block |
| `CLAUDE.md` / `AGENTS.md` | `using-spec-first` instruction bootstrap block | ✅ Removed by `clean` |
| `CLAUDE.md` / `AGENTS.md` | `<!-- spec-first:coding-guidelines:* -->` coding execution guidelines block | ✅ Removed by `clean` |
| `.claude/settings.json` | Managed `SessionStart` matcher entry (Claude only) | ✅ Removed by `clean` |
| `.claude/hooks/session-start` | Managed `SessionStart` hook script (Claude only) | ✅ Removed by `clean` |
| `.claude/commands/spec/` · `.claude/skills/` · `.claude/agents/**` (or Codex equivalents) | Managed runtime assets | ✅ Removed by `clean` |
| `.claude/spec-first/.developer` / `.codex/spec-first/.developer` | Host-specific project developer profile | ✅ Removed by `clean` |
| `.claude/spec-first/state.json` / `.codex/spec-first/state.json` | Host-specific managed asset tracking state | ✅ Removed by `clean` |
| `CHANGELOG.md` | Bootstrapped only when missing, with the managed format header and an initial init entry | ❌ User-owned after creation |

How to roll back

spec-first clean --claude   # or --codex

init does not overwrite an existing CLAUDE.md / AGENTS.md. On first install, spec-first appends its managed instruction blocks as a footer after any existing user content; on re-init, it only replaces the marker-delimited managed blocks it owns.
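
The append-once, replace-in-place behaviour described here is a common pattern for marker-delimited blocks. The sketch below shows the general idea under stated assumptions: the `BEGIN` / `END` marker strings and the function name are invented for illustration and are not the markers spec-first writes.

```python
# Hypothetical markers, chosen only for this example.
BEGIN = "<!-- example:managed:begin -->"
END = "<!-- example:managed:end -->"


def upsert_managed_block(document: str, content: str) -> str:
    """Replace the managed block if present, otherwise append it as a footer."""
    block = f"{BEGIN}\n{content}\n{END}"
    start = document.find(BEGIN)
    end = document.find(END, start) if start != -1 else -1
    if start != -1 and end != -1:
        # Re-init: swap only the text between the markers, keep everything else.
        return document[:start] + block + document[end + len(END):]
    # First install: keep user content and add the block after it.
    prefix = document.rstrip("\n")
    if prefix:
        return prefix + "\n\n" + block + "\n"
    return block + "\n"
```

Running the function twice with the same content returns the same document, which is what makes the operation idempotent.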

clean removes everything marked removable in the table above, then prints which platform's managed assets were removed. Custom assets outside the managed set are left untouched. The language policy block must still be removed manually: search for <!-- spec-first:lang: in CLAUDE.md / AGENTS.md.

Both init --dry-run and clean --dry-run preview file-level operations derived from the same managed operation plans used by the real apply paths, which keeps preview/apply drift narrow and testable.

Current runtime delivery is host-specific by governance: Claude writes 12 command files, 35 skill directories, 57 agent files, and 4 agent support files; Codex writes 34 skill directories plus the same 57 agent files and 4 support files, with no command directory.
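
The idea that --dry-run and the real run share one operation plan can be sketched as follows. This is a generic illustration of the pattern, not spec-first's implementation: the plan shape, the single `write` action, and both function names are assumptions.

```python
from pathlib import Path


def build_plan(root: Path, files: dict[str, str]) -> list[tuple[str, Path, str]]:
    """Describe each file-level operation once, as (action, path, content)."""
    return [("write", root / rel, text) for rel, text in sorted(files.items())]


def run_plan(plan: list[tuple[str, Path, str]], dry_run: bool) -> list[str]:
    """Preview or apply the same plan; both modes produce the same log lines."""
    log = []
    for action, path, text in plan:
        if not dry_run:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text, encoding="utf-8")
        log.append(f"{action} {path}")
    return log
```

Because the preview and the apply path iterate over the same list, a test can assert that their logs match, which is one way to keep preview/apply drift measurable.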

Example output

$ spec-first init --claude

🪝 Installed Claude SessionStart matcher in .claude/settings.json
📦 Generated 12 command file(s) in .claude/commands/spec
🧩 Generated 35 skill directory(ies) in .claude/skills
🤖 Generated 57 agent file(s) in .claude/agents
🧰 Generated 4 agent support file(s) in .claude/agents
🪪 Wrote project developer profile:
  📍 path: .claude/spec-first/.developer
  👤 name: yourname
  🈯 lang: zh
  ⏱ initialized_at: <ISO-8601 timestamp>
  🔖 version: <installed spec-first version>

🔁 Restart Claude Code after generation so it can pick up the new /spec:* commands.

Counts and version reflect the version actually installed at run time. If CHANGELOG.md did not exist yet, init also prints 📝 Bootstrapped CHANGELOG.md. The install log is emitted in English regardless of --lang; the --lang setting governs future Claude / Codex response language, not the installer's own output. Codex output differs by design: it does not generate .claude/commands/spec, and it restarts into $spec-* skill entrypoints instead.

4. First run

| Step | Claude Code | Codex |
| --- | --- | --- |
| Install MCP tools | `/spec:mcp-setup` | `$spec-mcp-setup` |
| Restart host | restart Claude Code | restart Codex |
| Build context | `/spec:graph-bootstrap` or `/spec:compound` | `$spec-graph-bootstrap` or `$spec-compound` |
| Start the workflow | `/spec:ideate` → `/spec:brainstorm` → `/spec:plan` → `/spec:work` → `/spec:review` → `/spec:compound` | `$spec-ideate` → … → `$spec-compound` |

graph-bootstrap runs a Host Readiness Gate at startup. If MCP setup was skipped or the host was not restarted, it stops with explicit guidance rather than degrading silently.
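
A gate of this kind can be as simple as checking for the host-setup marker file before doing any work. The sketch below assumes the gate only tests that the marker exists at the path listed under Primary stages; the real gate may inspect the file's contents or other state, and the function name and message text are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Marker locations relative to the user's home directory, per the Primary stages table.
MARKERS = {
    "claude": Path(".claude") / "spec-first" / "host-setup.json",
    "codex": Path(".codex") / "spec-first" / "host-setup.json",
}


def readiness_message(home: Path, host: str) -> Optional[str]:
    """Return None when the host looks ready, otherwise explicit guidance."""
    marker = home / MARKERS[host]
    if marker.is_file():
        return None
    # Stop with guidance instead of continuing in a degraded state.
    return f"Host setup marker not found at {marker}. Run MCP setup, then restart the host."
```

A caller would print the message and exit non-zero when it is not `None`, rather than falling back silently.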

Architecture

┌──────────────────────────────────────────────────────────────┐
│  Entry Layer — spec-first CLI                                │
│  doctor / init / clean / stage0-context / crg <subcommand>   │
│  Enforcement: code-hard (asset sync, state, manifest,        │
│               Stage-0 emission, CRG SQLite pipeline)         │
├──────────────────────────────────────────────────────────────┤
│  Context Layer — graph-bootstrap / CRG module                │
│  AST fact extraction → artifact-manifest (data_quality)      │
│  → minimal-context (provenance + confidence + fallback)      │
│  → injection-index (stage-aware routing)                     │
│  Enforcement: code-hard gate (L0/L1/L2) + SKILL.md content   │
├──────────────────────────────────────────────────────────────┤
│  Workflow Layer — skills                                     │
│  Ideate / Brainstorm / Plan / Work / Review / Compound       │
│  + Debug / Update / Sessions / Setup auxiliaries             │
│  Stage contracts, artifact conventions, review classes       │
│  Enforcement: SKILL.md contracts (LLM-followed)              │
├──────────────────────────────────────────────────────────────┤
│  Capability Layer — agents (6 categories)                    │
│  review/ (17 reviewer personas + CE agents)                  │
│  document-review/ (requirements / plan persona review)       │
│  research/ (session / doc / Feishu / web context readers)    │
│  design/ (UI / design-lens agents)                           │
│  workflow/ (bug-reproduction / lint / pr-comment-resolver)   │
│  docs/ (documentation / onboarding support)                  │
│  Enforcement: convention (LLM-dispatched)                    │
└──────────────────────────────────────────────────────────────┘

Runtime assets under .claude/, .codex/, or .agents/ are generated outputs, not editable source. skills/, agents/, templates/, and docs/ are the source of truth.

CLI Commands

Managed-asset commands

| Command | Purpose | Notes |
| --- | --- | --- |
| `spec-first doctor` | Environment check | Verifies platform state, plugin manifest, and managed assets. `--claude` / `--codex` scopes to one platform. Reports `legacy managed state` when `init` is needed, and `--json` includes evidence schema/freshness plus `evidence_age_summary`. |
| `spec-first init` | Initialize the runtime | Syncs commands, skills, agents, runtime hooks, and developer metadata through managed operation plans. Also the only supported legacy upgrade entrypoint; it performs a managed hard reset. See What `init` writes above. |
| `spec-first clean` | Remove managed assets | Removes the given platform's spec-first managed assets through the same operation-plan boundary used by `--dry-run`; does not migrate legacy state and does not strip the language policy marker block. |
| `spec-first stage0-context` | Emit Stage-0 runtime context | Called by SKILLs such as `spec-plan` / `spec-work` / `spec-review` at stage start. Accepts `--stage <plan\|work\|review>`, `--workflow <skill-name>`, `--format json`. |

CRG graph commands (`spec-first crg <subcommand>`)

An embedded Code Review Graph runtime over SQLite + FTS5.

spec-first crg --help
spec-first crg build --repo .
spec-first crg review-context --repo . --changed <ref>

| Subcommand | Purpose |
| --- | --- |
| `build` | Build or incrementally refresh the graph DB from a repo |
| `stats` | Report node / edge / community counts and unresolved edges |
| `context` | Export a context bundle for a symbol or file |
| `query` | Eight structured lookups: `callers_of / callees_of / importers_of / importees_of / dependents_of / dependencies_of / tests_for / similar_to` |
| `impact` | Impact-of-change analysis for a file or symbol |
| `large-functions` | Find functions above a size threshold |
| `search` | FTS5 full-text search across symbols / files |
| `flows` | PageRank + BFS flow detection |
| `flow` / `affected-flows` | Inspect a single flow, or flows affected by a diff |
| `communities` / `community` | 3-pass community detection, plus single-community inspection |
| `architecture` | High-level architecture summary |
| `surprising-connections` | Cross-community / peripheral-to-hub surprise detector |
| `god-nodes` | High-fan-in hub detection |
| `detect-changes` | SHA-256 incremental change detection |
| `review-context` | Compose a review context bundle from a diff |
| `postprocess` | Recompute communities, flows, graph analysis, and FTS after a build or incremental refresh |
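
To give a feel for what lookups such as `callers_of`, `callees_of`, and high-fan-in detection mean over a SQLite-backed graph, here is a small sketch using Python's `sqlite3` module. The `edges` table, its columns, and the sample data are invented for this example; the actual CRG schema is not documented here and is likely richer.

```python
import sqlite3


def open_graph() -> sqlite3.Connection:
    """Create an in-memory graph with a hypothetical edges(src, dst, kind) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, kind TEXT)")
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?, ?)",
        [("main", "parse", "calls"), ("main", "render", "calls"), ("render", "parse", "calls")],
    )
    return conn


def callers_of(conn: sqlite3.Connection, symbol: str) -> list[str]:
    """Symbols that call the given symbol."""
    rows = conn.execute(
        "SELECT src FROM edges WHERE dst = ? AND kind = 'calls' ORDER BY src", (symbol,)
    )
    return [row[0] for row in rows]


def callees_of(conn: sqlite3.Connection, symbol: str) -> list[str]:
    """Symbols that the given symbol calls."""
    rows = conn.execute(
        "SELECT dst FROM edges WHERE src = ? AND kind = 'calls' ORDER BY dst", (symbol,)
    )
    return [row[0] for row in rows]


def fan_in(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Incoming-edge counts per symbol, highest first; hubs rise to the top."""
    rows = conn.execute(
        "SELECT dst, COUNT(*) FROM edges GROUP BY dst ORDER BY COUNT(*) DESC, dst"
    )
    return rows.fetchall()
```

In this toy graph `parse` has the highest fan-in, so it would be the first candidate a hub detector looks at.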

All subcommands accept --repo=<path>. The full list is whatever spec-first crg --help prints for the installed version.
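
Hash-based incremental change detection, as named for `detect-changes`, generally works by comparing stored content digests with fresh ones. The sketch below shows that general technique with `hashlib`; the function names and the in-memory dictionaries are assumptions, and how the CRG runtime stores or scopes its hashes is not covered.

```python
import hashlib


def digest(content: bytes) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(previous: dict[str, str], current_files: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare stored digests with the current contents, path by path."""
    current = {path: digest(data) for path, data in current_files.items()}
    return {
        "added": sorted(p for p in current if p not in previous),
        "modified": sorted(p for p in current if p in previous and previous[p] != current[p]),
        "removed": sorted(p for p in previous if p not in current),
    }
```

Only the paths in `added` and `modified` would need to be re-parsed on an incremental refresh, which is the main reason to track digests at all.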

Documentation

Detailed manuals and implementation docs are currently Chinese-first. Until English translations catch up, English readers can use DeepWiki or Ask ChatGPT as a supplementary entrypoint.

| Document | Language | Description |
| --- | --- | --- |
| Chinese README | zh | Full Chinese README |
| User Manual | zh | Complete user manual index |
| Quick Start | zh | First-time setup walkthrough |
| Core Concepts | zh | Architecture and terminology |
| Full Example | zh | End-to-end delivery walkthrough |
| FAQ | zh | Troubleshooting and common issues |
| Best Practices | zh | Team usage patterns |
| Chinese Architecture Overview | zh | System design for contributors |
| Chinese Development Guide | zh | Contributor standards |
| Chinese Testing Plan | zh | Verification and test strategy |
| CHANGELOG | en / zh mixed | Canonical version history (machine-readable) |
| Chinese Release Notes | zh | Narrative release notes |

Local Development

git clone https://github.com/sunrain520/spec-first.git
cd spec-first
npm install --legacy-peer-deps
npm test

--legacy-peer-deps is required because the vendored tree-sitter forks and jest's peer-dependency resolution conflict under stricter resolvers. Omitting it typically fails the first jest run.

Verification scripts

npm run test:unit           # shell unit tests + jest unit suite (tests/unit/*)
npm run test:smoke          # install-local + CLI smoke
npm run test:integration    # verification-gate jest + e2e shell
npm run test:e2e:crg        # CRG full-command + SQLite audit
npm run test:jest           # jest only
npm run test:crg:gate       # CRG regression gate (benchmarks/regression/*)
npm run test:ai-dev:gate    # AI Dev Quality Gate (light contract check)
npm pack                    # release tarball dry run

npm test itself runs test:unit → test:smoke → test:integration → test:e2e:crg in that order.

Contributing

Issues and pull requests are welcome.

To report a bug, open an Issue with reproduction steps, environment details, and expected behavior.

To contribute code:

Fork the repository and create a feature branch from master.
Treat master as the only branch that accepts direct updates; main is an automatically synced mirror branch and should not receive direct development or commits.
Read AGENTS.md for repository workflow conventions.
Run npm install --legacy-peer-deps, then npm test.
Open a PR with the change goal and verification details.
Every code / doc change must add a line to CHANGELOG.md following the format defined at the top of that file.

Recommended reading before contributing: AGENTS.md · User Manual · CHANGELOG

License

Keywords

spec-first

ai-coding-workflow

spec-driven-engineering

spec-driven-development

harness-engineering

knowledge-compound

Package last updated on 21 Apr 2026
