npm package · latest version 1.10.0 · 1 maintainer

buildcrew


17 AI agents for Claude Code, covering the full development lifecycle from product thinking to production monitoring.

npx buildcrew

Why buildcrew?

AI coding agents are powerful, but without structure they produce inconsistent results. buildcrew gives Claude Code:

  • A team: 17 specialized agents (9 opus + 8 sonnet), each with a clear role
  • A process: sequential pipeline with quality gates and iteration
  • A harness: your project context auto-detected from your codebase
  • An orchestrator: just talk to @buildcrew, it routes automatically
  • A second opinion: independent review after every mode (Codex or Claude subagent)

You:   @buildcrew Add user authentication
Crew:  Planner → Designer → Developer → QA → Browser QA → Reviewer → Ship

No external dependencies. No runtime. No binaries. Just Markdown.

Getting Started

One command does everything:

npx buildcrew

The interactive setup will:

  • Install 17 agents + orchestrator
  • Ask to install Playwright MCP (required for browser testing)
  • Ask to generate project harness (auto-detects your stack)
  • Let you pick additional harness templates

Then start working:

@buildcrew Add user dashboard

Agents

Build Team

| Agent | Model | Role |
|---|---|---|
| planner | opus | 6 Forcing Questions + 4-Lens Self-Review (CEO, Engineering, Design, QA). Plans scored 1-10 per lens. |
| designer | opus | UI/UX research + motion engineering. Playwright screenshots, Figma MCP, production components with animations. AI slop blacklist. |
| developer | opus | 6 Implementation Questions + 3-Lens Self-Review (Architecture, Code Quality, Safety). Error Handling Protocol. 3 modes: feature, bugfix, iteration. |

Adversarial Team

Runs between pipeline stages to catch errors before downstream agents commit. Produces structured critique with APPROVED / REVISE / REJECT verdict and a revise loop.

| Agent | Model | Role |
|---|---|---|
| plan-challenger | opus | Attacks 01-plan.md across 6 vectors (premise, scope, alternatives, risks, acceptance criteria, metrics). Runs AFTER planner, BEFORE designer. Writes 01.5-plan-critique.md. |
| spec-challenger | opus | Attacks 02-design.md document (not rendered UI) across 8 vectors (plan alignment, state coverage, edge cases, data flow, failure modes, accessibility, motion spec, developer contract). Runs AFTER designer, BEFORE developer. Writes 02.5-spec-critique.md. |

Quality Team

| Agent | Model | Role |
|---|---|---|
| qa-tester | sonnet | 5 Test Strategy Questions + Test Map methodology. Edge case generation, confidence-scored findings. |
| browser-qa | sonnet | 4-phase browser testing (orient, explore, stress, judge) via Playwright MCP. Health score 0-100, self-review. |
| reviewer | opus | 4-specialist analysis (security, perf, testing, maintainability) + confidence scoring + adversarial pass + auto-fix. Runs AFTER code. |
| health-checker | sonnet | 3-phase code quality (detect, measure, prescribe). Weighted 0-10 score + trends + top 5 actionable items. |

Security & Ops

| Agent | Model | Role |
|---|---|---|
| security-auditor | sonnet | OWASP Top 10 + STRIDE threat model. 10-phase audit with confidence gate. |
| canary-monitor | sonnet | 3-phase post-deploy health (orient, verify, judge). Baseline comparison, confidence-scored findings. |
| shipper | sonnet | 8-point pre-flight + semver + changelog + PR + post-ship verification. |

Thinking & Review Team

| Agent | Model | Role |
|---|---|---|
| thinker | opus | "Should we build this?": 6 forcing questions, premise challenge, 3 alternatives, cross-model outside perspective, design doc output. |
| architect | opus | Architecture review BEFORE code: scope challenge, component diagrams, data flow, failure modes, test coverage map. |
| design-reviewer | sonnet | UI/UX quality: 8 dimensions scored 0-10, screenshot evidence via Playwright, specific fixes with effort estimates, WCAG compliance. |

Specialist

| Agent | Model | Role |
|---|---|---|
| investigator | sonnet | 4-phase root cause debugging. 12 common bug patterns. Edit freeze on unrelated code. |
| qa-auditor | opus | 3 parallel subagents (security, bugs, spec compliance) audit git diffs against design docs. No API key needed. |

13 Operating Modes

Talk to @buildcrew naturally. It auto-detects the mode.

| Mode | Example | Pipeline |
|---|---|---|
| Feature | "Add user dashboard" | Plan → Plan-Challenger → Design → Spec-Challenger → Dev → QA → Browser QA → Review → Coherence |
| Project Audit | "full project audit" | Scan → Prioritize → Fix → Verify (loop) |
| Browser QA | "browser qa localhost:3000" | Playwright testing + health score |
| Security | "security audit" | OWASP + STRIDE + secrets + deps |
| Debug | "debug: login broken" | 4-phase root cause investigation |
| Health | "health check" | Quality dashboard (types, lint, deps, bundle) |
| Canary | "canary https://myapp.com" | Post-deploy production monitoring |
| Review | "code review" | Multi-specialist analysis + auto-fix |
| Ship | "ship" | Test → version → changelog → PR |
| QA Audit | "qa" | 3 parallel subagent audit on git diff |
| Think | "is this worth building?" | 6 forcing questions + alternatives + design doc |
| Arch Review | "architecture review" | Scope challenge + diagrams + failure modes |
| Design Review | "design review" | 8-dimension scoring + specific fixes |

Mode Priority

When a message matches multiple modes, a priority table resolves the conflict. Debug always wins. Think beats Feature. "architecture review" goes to Architect, not Reviewer. If the message is still ambiguous, buildcrew asks the user.
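The priority idea can be sketched in a few lines of Python. This is only an illustration: the trigger phrases and the full ordering below are hypothetical. The README states only that Debug wins, that Think beats Feature, and that "architecture review" routes to the architect rather than the reviewer.

```python
# Hypothetical ordering, highest priority first. Only Debug-first,
# Think-before-Feature and Arch-Review-before-Review come from the README.
PRIORITY = ["Debug", "Arch Review", "Think", "Review", "Feature"]

# Hypothetical trigger phrases, loosely based on the examples table.
TRIGGERS = {
    "Debug": ["debug"],
    "Arch Review": ["architecture review"],
    "Think": ["worth building"],
    "Review": ["review"],
    "Feature": ["add "],
}


def detect_mode(message):
    """Return the highest-priority matching mode, or None if nothing matches."""
    text = message.lower()
    matches = [
        mode
        for mode, phrases in TRIGGERS.items()
        if any(phrase in text for phrase in phrases)
    ]
    if not matches:
        return None  # caller should ask the user
    # Lower index in PRIORITY means higher priority.
    return min(matches, key=PRIORITY.index)
```

Here "architecture review" matches both the Arch Review and the Review triggers, and the priority list is what sends it to the architect.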

Second Opinion

After any mode completes, buildcrew offers an independent second opinion:

  • Codex CLI available: genuinely different AI model reviews the work
  • No Codex: fresh Claude subagent with no session memory

The user decides what to act on.
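A minimal sketch of the fallback choice, assuming the Codex CLI is discovered as a `codex` executable on the PATH (the README does not say how availability is detected). The function name and return values are hypothetical.

```python
import shutil


def pick_second_opinion(which=shutil.which):
    """Prefer Codex when its CLI is installed, else a fresh Claude subagent.

    `which` is injectable so the choice can be exercised without touching
    the real PATH. The executable name "codex" is an assumption.
    """
    if which("codex"):
        return "codex"
    return "claude-subagent"
```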

Iterations

Each iteration runs the full end-to-end pipeline:

@buildcrew Add user dashboard, 5 iterations

Adversarial Challengers

Two challenger agents run between the existing pipeline stages and attack the upstream artifact before downstream agents commit. A wrong plan poisons everything downstream, so plan-challenger catches plan errors while they are still cheap. A thin spec forces developers to invent critical details, so spec-challenger catches spec gaps before the developer writes a line.

The revise loop

planner → plan-challenger ─┬─ APPROVED → designer
                           ├─ REVISE  → planner re-runs (max 2 cycles)
                           └─ REJECT  → escalate to user

designer → spec-challenger ─┬─ APPROVED → developer
                            ├─ REVISE  → designer re-runs (max 2 cycles)
                            └─ REJECT  → escalate to user

  • APPROVED: 0 blocking findings. Proceed.
  • REVISE: ≥1 blocking finding but premise is intact. Upstream agent re-runs with critique file as an input, must address every blocking item (nits optional). Max 2 revise cycles; 3rd deadlock escalates to user.
  • REJECT: premise-level crack (≥3 blocking in Vector 1). Pipeline halts immediately and presents the critique; no auto-fix, human direction needed.
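The verdict rules above can be read as a small control loop. The sketch below is one possible interpretation, not buildcrew's actual orchestration: `produce` and `challenge` are hypothetical callables standing in for the upstream agent and its challenger, and "ESCALATE" is a label invented here for the third-deadlock case.

```python
def run_gate(produce, challenge, max_revisions=2):
    """Run an upstream agent against its challenger until a verdict settles.

    produce(critique) -> artifact (critique is None on the first pass)
    challenge(artifact) -> (blocking_count, premise_blocking_count)
    Returns (verdict, artifact, revisions_used).
    """
    critique = None
    revisions = 0
    while True:
        artifact = produce(critique)
        blocking, premise_blocking = challenge(artifact)
        if premise_blocking >= 3:
            # Premise-level crack: halt, no automatic fix.
            return "REJECT", artifact, revisions
        if blocking == 0:
            return "APPROVED", artifact, revisions
        if revisions == max_revisions:
            # Third REVISE verdict in a row: hand over to the user.
            return "ESCALATE", artifact, revisions
        critique = f"{blocking} blocking finding(s) to address"
        revisions += 1
```

With `max_revisions=2` the upstream agent runs at most three times: the first pass plus two revise cycles.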

Attack vectors

plan-challenger (6 vectors): Premise (demand evidence, specific user, opportunity cost) · Scope (cut-50% test, hidden creep) · Alternatives (≥2 compared + build-vs-buy + do-nothing) · Risks (load-bearing assumptions, failure modes, reversibility) · Acceptance Criteria (binary pass/fail, observable, negative cases) · Metrics (measurable, causal, baseline, timeframe).

spec-challenger (8 vectors): Plan Alignment (matrix of every plan criterion → spec coverage) · State Coverage (matrix of every component × required states) · Edge Cases (tiny/huge screens, slow network, concurrent edits, long text, RTL, reduced motion) · Data Flow (input source, optimistic vs pessimistic, cache) · Failure Modes (network/auth/permission/race) · Accessibility (keyboard, focus, screen reader, contrast, live regions, touch targets) · Motion Spec (per-component map, named durations/easings, reduced-motion fallback) · Developer Contract (props, handlers, side effects, file structure, testing hooks).

Why not merge with existing reviewers

reviewer (post-dev code review), design-reviewer (post-dev rendered UI review), qa-auditor (post-dev diff audit), and coherence-auditor (final handoff consistency) all run AFTER developer. Challengers are structurally different: pre-dev, on documents, with revise loops. Merging would destroy the asymmetry that makes each role sharp.

Verifiable Coordination

How do you know the 17 agents actually worked as a team, instead of running in sequence and pretending to collaborate?

buildcrew answers this with Coordination Score — a 0-100% measurement output at the end of every Feature run.

How it works

  • Every agent ends its output with a ## Handoff Record section declaring three things:

    • Inputs consumed — what files/sections it actually read
    • Outputs for next agents — what it produced and who should consume it
    • Decisions NOT covered by inputs — autonomous judgment calls with reasons
  • A meta-agent coherence-auditor runs LAST and:

    • Parses every Handoff Record
    • Cross-checks: did agent B actually cite agent A's outputs?
    • Reads cited source files to verify the implementation matches the cited requirement (CONFIRMED / PARTIAL / MISSING_IN_CODE)
    • Computes Coordination Score and writes coherence-report.md
  • The crew report shows the score:

📊 buildcrew Report
─────────────────────────────
✅ Agents: planner, plan-challenger, designer, spec-challenger,
          developer, qa-tester, reviewer, coherence-auditor
🔄 Outer iterations: 2/3
🎯 Challenger verdicts:
   plan-challenger : APPROVED (0 blocking, 2 nits) after 1 revise cycle
   spec-challenger : APPROVED (0 blocking, 3 nits) on first pass
🎯 Coordination Score: 82% — Normal (9/11 edges, 0 fabrications, 2 gaps)
📁 Output: .claude/pipeline/{feature-name}/
   ├── 01-plan.md             ├── 02-design.md
   ├── 01.5-plan-critique.md  ├── 02.5-spec-critique.md
   └── coherence-report.md
─────────────────────────────

Score thresholds

| Score | Status | What it means |
|---|---|---|
| 90-100 | Healthy | Real team collaboration |
| 70-89 | Normal | Minor gaps, ship-ready |
| 50-69 | Suspicious | Coordination has holes: review the design |
| 0-49 | Theater | ⚠️ This is not a team; it's 17 independent scripts |
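The thresholds map directly onto a lookup function. The second helper is an inference, not documented behavior: the sample report's "82% (9/11 edges)" is consistent with the score being confirmed edges divided by declared edges, but the real formula lives in the project's own spec. Both function names are hypothetical.

```python
def coordination_status(score):
    """Map a 0-100 Coordination Score to its status label."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "Healthy"
    if score >= 70:
        return "Normal"
    if score >= 50:
        return "Suspicious"
    return "Theater"


def edge_score(confirmed_edges, declared_edges):
    """Inferred formula: percentage of declared handoff edges that were confirmed."""
    return round(100 * confirmed_edges / declared_edges)
```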

What gets caught

  • Gaps: agent A declared output X for agent B, but B never cited it
  • Fabrications: agent B cited "plan section #4" that doesn't exist, or claimed to implement X but the code shows no evidence
  • Orphans: an agent whose work nothing downstream cited (the team ignored its output)
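To make the three categories concrete, here is a sketch of how they could be derived from parsed Handoff Records. The record structure (an `outputs` map of output name to intended consumers, and a `cites` list of producer/output pairs) is a hypothetical data model, not the format coherence-auditor actually parses, and the sketch skips the source-file verification step.

```python
def audit_handoffs(records):
    """Classify handoff problems.

    records: {agent: {"outputs": {name: [consumers]},
                      "cites": [(producer, name), ...]}}
    """
    gaps, fabrications = [], []
    cited_producers = set()

    # Fabrications: a citation that points at an output nobody declared.
    for agent, rec in records.items():
        for producer, name in rec["cites"]:
            if name in records.get(producer, {}).get("outputs", {}):
                cited_producers.add(producer)
            else:
                fabrications.append((agent, producer, name))

    # Gaps: a declared output that its intended consumer never cited.
    for producer, rec in records.items():
        for name, consumers in rec["outputs"].items():
            for consumer in consumers:
                cites = records.get(consumer, {}).get("cites", [])
                if (producer, name) not in cites:
                    gaps.append((producer, name, consumer))

    # Orphans: agents that declared outputs but were never validly cited.
    orphans = [
        agent
        for agent, rec in records.items()
        if rec["outputs"] and agent not in cited_producers
    ]
    return {"gaps": gaps, "fabrications": fabrications, "orphans": orphans}
```

In this model an agent that declares no outputs, such as the last stage of a pipeline, is never counted as an orphan.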

This makes "team collaboration" a measurable property, not a marketing claim. Full spec: docs/02-design/coordination-verifiability.md. Policy: docs/ADR-001-deps.md.

Harness Engineering

npx buildcrew auto-detects your stack and generates a project harness.

What it auto-detects

| Category | Detected from |
|---|---|
| Framework | package.json (Next.js, Nuxt, React, Vue, SvelteKit, Express) |
| Language | TypeScript, TailwindCSS, Framer Motion |
| Database | Supabase, Prisma, Drizzle, MongoDB |
| Auth | NextAuth, Supabase Auth, Firebase Auth |
| Payments | Stripe, Paddle, Toss Payments |
| AI | OpenAI, Anthropic, Google AI |
| Deploy | Vercel, Netlify, Fly.io, Docker |
| Components | Scans src/components/ |
| API Routes | Scans src/app/api/ |
| Locales | Scans i18n directories |
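Framework detection from package.json could look roughly like the sketch below. The README names the frameworks but not the detection rules, so the npm package names and the ordering (meta-frameworks before the libraries they build on) are assumptions.

```python
import json

# Assumed npm package names; checked in order so that a Next.js project,
# which also depends on react, is reported as Next.js.
FRAMEWORK_PACKAGES = [
    ("Next.js", "next"),
    ("Nuxt", "nuxt"),
    ("SvelteKit", "@sveltejs/kit"),
    ("Express", "express"),
    ("React", "react"),
    ("Vue", "vue"),
]


def detect_framework(package_json_text):
    """Return the first framework whose package appears in the dependencies."""
    pkg = json.loads(package_json_text)
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
    for framework, package in FRAMEWORK_PACKAGES:
        if package in deps:
            return framework
    return None
```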

Generated files

.claude/harness/
├── project.md        ← always (project context, stack, components, API routes)
├── rules.md          ← always (smart defaults for your framework)
├── erd.md            ← if database detected
├── api-spec.md       ← if API routes found
├── design-system.md  ← if TailwindCSS detected
├── architecture.md   ← always
└── user-flow.md      ← if i18n or 5+ components
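The conditions in the tree above amount to a simple rule set. A minimal sketch, with a hypothetical function name and parameters:

```python
def harness_files(has_database, has_api_routes, has_tailwind, has_i18n, component_count):
    """List the harness files that would be generated for a detected stack."""
    files = ["project.md", "rules.md", "architecture.md"]  # always generated
    if has_database:
        files.append("erd.md")
    if has_api_routes:
        files.append("api-spec.md")
    if has_tailwind:
        files.append("design-system.md")
    if has_i18n or component_count >= 5:
        files.append("user-flow.md")
    return sorted(files)
```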

The harness is open

Add any .md file to .claude/harness/ — agents read them all.

npx buildcrew harness     # Check which files need editing
npx buildcrew add         # List available templates

Dashboard

Real-time observability for buildcrew sessions. A pixel-art office visualization where your 17 agents come alive — walking between rooms, filing issues, and progressing through the pipeline — all powered by Claude Code hooks and zero external dependencies.

Quick Start

# 1. Install hooks into your project
npx buildcrew-dashboard --install

# 2. Start the dashboard server (opens browser automatically)
npx buildcrew-dashboard

Then open any Claude Code session with @buildcrew in the same directory. Events stream to the dashboard in real time.

What You See

| Panel | Description |
|---|---|
| Pixel Town | 5 rooms (Meeting, QA Lab, SecOps, Think Tank, Field) with 16 animated agent sprites |
| Stage Ladder | Pipeline progress: PLAN → DESIGN → DEV → QA → REVIEW → SHIP |
| Billboard | Current stage, notification badge, issue ticker |
| Log Panel | 3 tabs: Events (filterable log), Dialogue (agent conversation view), Terminal (command output) |

Command Bar

The Terminal tab includes a command bar that spawns claude -p on the server. Three permission modes:

| Mode | Flag | Use When |
|---|---|---|
| Strict | default | Production work: every tool call needs approval |
| Normal | acceptEdits | Day-to-day: file edits auto-approved |
| Trust | bypassPermissions | Demos and solo work: everything auto-approved |
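As a rough illustration, the three modes could translate into a `claude -p` invocation as follows. The mode names and flag values come from the table; passing them through Claude Code's `--permission-mode` option is an assumption about how the dashboard server builds the command, and `build_command` is a hypothetical helper.

```python
# Values from the table above. How the dashboard really passes them to the
# CLI is not documented here; --permission-mode is an assumption.
PERMISSION_FLAGS = {
    "Strict": "default",
    "Normal": "acceptEdits",
    "Trust": "bypassPermissions",
}


def build_command(prompt, mode="Strict"):
    """Build the argv list for a non-interactive Claude Code run."""
    return ["claude", "-p", prompt, "--permission-mode", PERMISSION_FLAGS[mode]]
```

Defaulting to Strict keeps the safest behavior when no mode is chosen.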

Hooks

--install adds four Claude Code hooks to .claude/settings.json:

  • PreToolUse (Agent) — captures agent dispatch
  • PostToolUse (Agent, Write/Edit) — captures agent completion and file writes
  • UserPromptSubmit — captures session start
  • Stop — captures session end

Hooks are tagged buildcrew-dashboard for safe removal via --uninstall. They time out after 500 ms and never block Claude Code.

Multi-Session

The dashboard tracks multiple concurrent Claude Code sessions in the same project. Each session gets a unique color chip. Filter by session to see isolated event streams.

CLI Options

| Flag | Description |
|---|---|
| --install | Install Claude Code hooks (project-local) |
| --install --global | Install hooks globally |
| --install --with-permissions | Also auto-allow buildcrew tool calls |
| --install --dry-run | Preview changes without writing |
| --uninstall | Remove hooks |
| --uninstall --global | Remove global hooks |
| --port N | Custom port (default: 3737) |
| --no-open | Start server without opening browser |

Demo Mode

# Terminal 1: start the dashboard
npx buildcrew-dashboard

# Terminal 2: run the demo script
node node_modules/buildcrew/bin/dashboard-demo.js

The demo simulates a full Feature pipeline with realistic Korean dialogue between agents.

Feature Pipeline

Each feature generates a full document chain:

.claude/pipeline/{feature}/
├── 01-plan.md              Requirements + 4-lens review scores
├── 01.5-plan-critique.md   plan-challenger verdict + 6-vector findings
├── 02-design.md            Design decisions + component specs
├── 02.5-spec-critique.md   spec-challenger verdict + 8-vector findings + matrices
├── 03-dev-notes.md         Implementation + 6-question analysis + self-review
├── 04-qa-report.md         Test map + acceptance criteria verification
├── 05-browser-qa.md        Health score + screenshots + flows
├── 06-review.md            4-specialist findings + auto-fixes
├── 07-ship.md              PR URL + release notes
└── coherence-report.md     Coordination Score + gaps + fabrications + orphans

CLI Reference

| Command | Description |
|---|---|
| npx buildcrew | Full interactive setup (agents + MCP + harness) |
| npx buildcrew init | Generate harness only |
| npx buildcrew init --force | Regenerate harness (backs up existing) |
| npx buildcrew add | List harness templates |
| npx buildcrew add <name> | Add a template |
| npx buildcrew harness | Show harness file status |
| npx buildcrew --force | Overwrite existing agents |
| npx buildcrew --list | List agents with models |
| npx buildcrew --uninstall | Remove agents |
| npx buildcrew --version | Show version |

Requirements

Architecture

@buildcrew (orchestrator, opus, 199 lines)
    │
    ├─ reads .claude/harness/*.md
    ├─ detects mode from user message (13 modes, priority table)
    ├─ dispatches agents with harness context
    ├─ enforces quality gates + iteration
    └─ offers second opinion after completion
         │
         ├── Think:        thinker → architect
         ├── Build:        planner → designer → developer
         ├── Adversarial:  plan-challenger, spec-challenger  (phase-boundary critics)
         ├── Quality:      qa-tester → browser-qa → reviewer
         ├── Sec/Ops:      security-auditor, canary-monitor, shipper
         ├── Review:       architect, design-reviewer, qa-auditor
         ├── Meta:         coherence-auditor  (final handoff audit)
         └── Debug:        investigator

Version Auto-Update

Agents include version headers. When you run npx buildcrew on an existing project, outdated agents are automatically updated — no --force needed.
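A version check of this kind might work as sketched below. The header format (`buildcrew-version: X.Y.Z`) is hypothetical, since the README does not show what the header looks like, and treating a missing header as outdated is also an assumption.

```python
import re

# Hypothetical header format; the real agent files may use a different one.
VERSION_RE = re.compile(r"buildcrew-version:\s*(\d+)\.(\d+)\.(\d+)")


def parse_version(agent_text):
    """Extract (major, minor, patch) from an agent file, or None if absent."""
    match = VERSION_RE.search(agent_text)
    return tuple(int(part) for part in match.groups()) if match else None


def needs_update(installed_text, packaged_version):
    """True when the installed agent is older than the packaged one."""
    installed = parse_version(installed_text)
    # Assumption: an agent without a version header counts as outdated.
    return installed is None or installed < packaged_version
```

Comparing integer tuples rather than strings matters here: as text, "1.9.2" sorts after "1.10.0", while (1, 9, 2) correctly sorts before (1, 10, 0).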

License

MIT


Package last updated on 15 Apr 2026
