
buildcrew: 17 AI agents for Claude Code, covering the full development lifecycle from product thinking to production monitoring, with adversarial challengers at the plan and spec boundaries.
```bash
npx buildcrew
```
AI coding agents are powerful, but without structure they produce inconsistent results. buildcrew gives Claude Code a crew of specialized agents behind a single orchestrator: you talk to @buildcrew, and it routes the work automatically.

```
You:  @buildcrew Add user authentication
Crew: Planner → Designer → Developer → QA → Browser QA → Reviewer → Ship
```
No external dependencies. No runtime. No binaries. Just Markdown.
One command does everything:
```bash
npx buildcrew
```
The interactive setup installs the agents, configures MCP servers, and generates the project harness.
Then start working:
```
@buildcrew Add user dashboard
```

| Agent | Model | Role |
|---|---|---|
| planner | opus | 6 Forcing Questions + 4-Lens Self-Review (CEO, Engineering, Design, QA). Plans scored 1-10 per lens. |
| designer | opus | UI/UX research + motion engineering. Playwright screenshots, Figma MCP, production components with animations. AI slop blacklist. |
| developer | opus | 6 Implementation Questions + 3-Lens Self-Review (Architecture, Code Quality, Safety). Error Handling Protocol. 3 modes: feature, bugfix, iteration. |

Adversarial challengers run between pipeline stages to catch errors before downstream agents commit. Each one produces a structured critique with an APPROVED / REVISE / REJECT verdict and drives a revise loop.

| Agent | Model | Role |
|---|---|---|
| plan-challenger | opus | Attacks 01-plan.md across 6 vectors (premise, scope, alternatives, risks, acceptance criteria, metrics). Runs AFTER planner, BEFORE designer. Writes 01.5-plan-critique.md. |
| spec-challenger | opus | Attacks the 02-design.md document (not the rendered UI) across 8 vectors (plan alignment, state coverage, edge cases, data flow, failure modes, accessibility, motion spec, developer contract). Runs AFTER designer, BEFORE developer. Writes 02.5-spec-critique.md. |

| Agent | Model | Role |
|---|---|---|
| qa-tester | sonnet | 5 Test Strategy Questions + Test Map methodology. Edge case generation, confidence-scored findings. |
| browser-qa | sonnet | 4-phase browser testing (orient, explore, stress, judge) via Playwright MCP. Health score 0-100, self-review. |
| reviewer | opus | 4-specialist analysis (security, perf, testing, maintainability) + confidence scoring + adversarial pass + auto-fix. Runs AFTER code. |
| health-checker | sonnet | 3-phase code quality (detect, measure, prescribe). Weighted 0-10 score + trends + top 5 actionable items. |

| Agent | Model | Role |
|---|---|---|
| security-auditor | sonnet | OWASP Top 10 + STRIDE threat model. 10-phase audit with confidence gate. |
| canary-monitor | sonnet | 3-phase post-deploy health (orient, verify, judge). Baseline comparison, confidence-scored findings. |
| shipper | sonnet | 8-point pre-flight + semver + changelog + PR + post-ship verification. |

| Agent | Model | Role |
|---|---|---|
| thinker | opus | "Should we build this?": 6 forcing questions, premise challenge, 3 alternatives, cross-model outside perspective, design doc output. |
| architect | opus | Architecture review BEFORE code: scope challenge, component diagrams, data flow, failure modes, test coverage map. |
| design-reviewer | sonnet | UI/UX quality: 8 dimensions scored 0-10, screenshot evidence via Playwright, specific fixes with effort estimates, WCAG compliance. |

| Agent | Model | Role |
|---|---|---|
| investigator | sonnet | 4-phase root cause debugging. 12 common bug patterns. Edit freeze on unrelated code. |
| qa-auditor | opus | 3 parallel subagents (security, bugs, spec compliance) audit git diffs against design docs. No API key needed. |

Talk to @buildcrew naturally. It auto-detects the mode.
| Mode | Example | Pipeline |
|---|---|---|
| Feature | "Add user dashboard" | Plan → Plan-Challenger → Design → Spec-Challenger → Dev → QA → Browser QA → Review → Coherence |
| Project Audit | "full project audit" | Scan → Prioritize → Fix → Verify (loop) |
| Browser QA | "browser qa localhost:3000" | Playwright testing + health score |
| Security | "security audit" | OWASP + STRIDE + secrets + deps |
| Debug | "debug: login broken" | 4-phase root cause investigation |
| Health | "health check" | Quality dashboard (types, lint, deps, bundle) |
| Canary | "canary https://myapp.com" | Post-deploy production monitoring |
| Review | "code review" | Multi-specialist analysis + auto-fix |
| Ship | "ship" | Test → version → changelog → PR |
| QA Audit | "qa" | 3 parallel subagent audit on git diff |
| Think | "is this worth building?" | 6 forcing questions + alternatives + design doc |
| Arch Review | "architecture review" | Scope challenge + diagrams + failure modes |
| Design Review | "design review" | 8-dimension scoring + specific fixes |

When a message matches multiple modes, a priority table resolves the conflict: Debug always wins, Think beats Feature, and "architecture review" goes to Architect, not Reviewer. If the request is still ambiguous, buildcrew asks the user.
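Because buildcrew ships only Markdown, this routing is done by the orchestrator model following its prompt, not by code. Still, the priority rules above behave like a first-match scan over an ordered table, which the TypeScript sketch below illustrates. The keyword patterns, the ordering of modes beyond the three stated rules, and the `detectMode` name are all assumptions for illustration.

```typescript
// Hypothetical sketch: the real routing is done by the orchestrator prompt, not code.
type Mode =
  | "Debug" | "Think" | "ArchReview" | "DesignReview" | "Security" | "BrowserQA"
  | "ProjectAudit" | "Health" | "Canary" | "Review" | "Ship" | "QAAudit" | "Feature";

// Ordered by priority: the first matching entry wins.
// Stated rules: Debug always wins; Think beats Feature;
// "architecture review" goes to the architect, not the reviewer.
const PRIORITY: Array<[Mode, RegExp]> = [
  ["Debug", /\bdebug\b|\bbroken\b/i],
  ["Think", /worth building|should we build/i],
  ["ArchReview", /architecture review/i],
  ["DesignReview", /design review/i],
  ["Security", /security audit/i],
  ["BrowserQA", /browser qa/i],
  ["ProjectAudit", /project audit/i],
  ["Health", /health check/i],
  ["Canary", /\bcanary\b/i],
  ["Review", /code review/i],
  ["Ship", /^\s*ship\b/i],
  ["QAAudit", /^\s*qa\s*$/i],
];

function detectMode(message: string): Mode | "Ambiguous" {
  const hit = PRIORITY.find(([, pattern]) => pattern.test(message));
  if (hit) return hit[0];
  // Build-style requests with no other match fall through to Feature (lowest priority).
  if (/\b(add|build|create|implement)\b/i.test(message)) return "Feature";
  return "Ambiguous"; // the orchestrator would ask the user at this point
}
```

With this ordering, "debug why the code review fails" resolves to Debug even though it also mentions a code review, and "should we build a user dashboard?" goes to Think rather than Feature.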
After any mode completes, buildcrew offers an independent second opinion on the result. The user decides what to act on.
You can also request multiple iterations. Each iteration runs the full end-to-end pipeline:

```
@buildcrew Add user dashboard, 5 iterations
```
Between the existing pipeline stages, two challenger agents attack the upstream artifact before downstream agents commit to it. A wrong plan poisons everything downstream, so plan-challenger catches plan errors while they are still cheap to fix. A thin spec forces the developer to invent critical details, so spec-challenger catches spec gaps before the developer writes a line.
```
planner → plan-challenger ─┬─ APPROVED → designer
                           ├─ REVISE   → planner re-runs (max 2 cycles)
                           └─ REJECT   → escalate to user

designer → spec-challenger ─┬─ APPROVED → developer
                            ├─ REVISE   → designer re-runs (max 2 cycles)
                            └─ REJECT   → escalate to user
```
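The gate logic in these diagrams amounts to a small state machine. The TypeScript sketch below models one gate; `runGate`, the callback signatures, and the choice to escalate once both revise cycles are used up are assumptions, since the diagram states the cycle limit but not what happens when it is reached.

```typescript
// Hypothetical sketch of one challenger gate, e.g. planner → plan-challenger → designer.
type Verdict = "APPROVED" | "REVISE" | "REJECT";

type GateResult =
  | { outcome: "proceed"; reviseCycles: number } // hand the artifact downstream
  | { outcome: "escalate"; reason: string; reviseCycles: number }; // ask the user

const MAX_REVISE_CYCLES = 2; // "max 2 cycles" in the diagram

function runGate(
  produce: (cycle: number) => string, // upstream agent (planner or designer)
  challenge: (artifact: string) => Verdict, // challenger agent
): GateResult {
  let cycle = 0;
  let artifact = produce(cycle);
  while (true) {
    const verdict = challenge(artifact);
    if (verdict === "APPROVED") return { outcome: "proceed", reviseCycles: cycle };
    if (verdict === "REJECT") return { outcome: "escalate", reason: "rejected", reviseCycles: cycle };
    // REVISE: re-run the upstream agent, unless the revise budget is already spent.
    if (cycle >= MAX_REVISE_CYCLES) {
      // Assumption: the diagram does not say what happens here; this sketch escalates.
      return { outcome: "escalate", reason: "revise limit reached", reviseCycles: cycle };
    }
    cycle++;
    artifact = produce(cycle);
  }
}
```

A challenger that approves on its second pass yields `reviseCycles: 1`, which matches the "after 1 revise cycle" wording used in the crew report.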
plan-challenger (6 vectors): Premise (demand evidence, specific user, opportunity cost) · Scope (cut-50% test, hidden creep) · Alternatives (≥2 compared + build-vs-buy + do-nothing) · Risks (load-bearing assumptions, failure modes, reversibility) · Acceptance Criteria (binary pass/fail, observable, negative cases) · Metrics (measurable, causal, baseline, timeframe).
spec-challenger (8 vectors): Plan Alignment (matrix of every plan criterion → spec coverage) · State Coverage (matrix of every component × required states) · Edge Cases (tiny/huge screens, slow network, concurrent edits, long text, RTL, reduced motion) · Data Flow (input source, optimistic vs pessimistic, cache) · Failure Modes (network/auth/permission/race) · Accessibility (keyboard, focus, screen reader, contrast, live regions, touch targets) · Motion Spec (per-component map, named durations/easings, reduced-motion fallback) · Developer Contract (props, handlers, side effects, file structure, testing hooks).
Why not fold the challengers into the existing reviewers? reviewer (post-dev code review), design-reviewer (post-dev rendered UI review), qa-auditor (post-dev diff audit), and coherence-auditor (final handoff consistency) all run AFTER the developer. Challengers are structurally different: they run before development, on documents, with revise loops. Merging the two roles would destroy the asymmetry that makes each one sharp.
How do you know the 17 agents actually worked as a team, instead of running in sequence and pretending to collaborate?
buildcrew answers this with the Coordination Score, a 0-100% measurement reported at the end of every Feature run.
Every agent ends its output with a ## Handoff Record section declaring three things:

- Inputs consumed: what files/sections it actually read
- Outputs for next agents: what it produced and who should consume it
- Decisions NOT covered by inputs: autonomous judgment calls, with reasons

A meta-agent, coherence-auditor, runs LAST, cross-checks these records for gaps, fabrications, and orphans, and writes the result to coherence-report.md. The crew report shows the score:
```
📊 buildcrew Report
─────────────────────────────
✅ Agents: planner, plan-challenger, designer, spec-challenger,
           developer, qa-tester, reviewer, coherence-auditor
🔄 Outer iterations: 2/3
🎯 Challenger verdicts:
   plan-challenger : APPROVED (0 blocking, 2 nits) after 1 revise cycle
   spec-challenger : APPROVED (0 blocking, 3 nits) on first pass
🎯 Coordination Score: 82% — Normal (9/11 edges, 0 fabrications, 2 gaps)
📁 Output: .claude/pipeline/{feature-name}/
   ├── 01-plan.md              ├── 02-design.md
   ├── 01.5-plan-critique.md   ├── 02.5-spec-critique.md
   └── coherence-report.md
─────────────────────────────
```

| Score | Status | What it means |
|---|---|---|
| 90-100 | Healthy | Real team collaboration |
| 70-89 | Normal | Minor gaps, ship-ready |
| 50-69 | Suspicious | Coordination has holes — review the design |
| 0-49 | Theater | ⚠️ This is not a team — it's 17 independent scripts |
This makes "team collaboration" a measurable property, not a marketing claim. Full spec: docs/02-design/coordination-verifiability.md. Policy: docs/ADR-001-deps.md.
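The sample report's numbers are consistent with the score being verified handoff edges divided by declared edges (9/11 ≈ 82%, with the 2 unverified edges reported as gaps). The TypeScript sketch below computes a score that way and maps it onto the status bands above. This formula is inferred from the example, not taken from the spec; the `HandoffRecord` shape, the edge definition (one edge per artifact and named consumer), and the fabrication rule are hypothetical, and fabrications are counted but not folded into the score because the report does not say how they are weighted.

```typescript
// Hypothetical handoff-record model; field names are illustrative, not buildcrew's format.
interface HandoffRecord {
  agent: string;
  inputsConsumed: string[]; // artifacts the agent says it actually read
  outputs: { artifact: string; consumers: string[] }[]; // who should read each output
}

type Status = "Healthy" | "Normal" | "Suspicious" | "Theater";

// Status bands from the table above.
function statusFor(score: number): Status {
  if (score >= 90) return "Healthy";
  if (score >= 70) return "Normal";
  if (score >= 50) return "Suspicious";
  return "Theater";
}

function coordinationScore(records: HandoffRecord[], external: ReadonlySet<string> = new Set()) {
  const byAgent = new Map(records.map((r) => [r.agent, r] as const));
  const produced = new Set(records.flatMap((r) => r.outputs.map((o) => o.artifact)));

  let declared = 0;
  let verified = 0;
  for (const r of records) {
    for (const o of r.outputs) {
      for (const consumer of o.consumers) {
        declared++; // one edge per (artifact, intended consumer)
        // The edge counts only if the consumer's own record says it read the artifact.
        if (byAgent.get(consumer)?.inputsConsumed.includes(o.artifact)) verified++;
      }
    }
  }

  // Fabrication (assumed rule): an agent claims to have read something that no agent
  // produced and that is not a known external input such as a harness file.
  let fabrications = 0;
  for (const r of records) {
    for (const input of r.inputsConsumed) {
      if (!produced.has(input) && !external.has(input)) fabrications++;
    }
  }

  const score = declared === 0 ? 0 : Math.round((100 * verified) / declared);
  return { declared, verified, gaps: declared - verified, fabrications, score, status: statusFor(score) };
}
```

Passing the harness files in `external` keeps agents that legitimately read .claude/harness/*.md from being flagged as fabricating inputs.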
npx buildcrew auto-detects your stack and generates a project harness.
| Category | Detected from |
|---|---|
| Framework | package.json (Next.js, Nuxt, React, Vue, SvelteKit, Express) |
| Language | TypeScript, TailwindCSS, Framer Motion |
| Database | Supabase, Prisma, Drizzle, MongoDB |
| Auth | NextAuth, Supabase Auth, Firebase Auth |
| Payments | Stripe, Paddle, Toss Payments |
| AI | OpenAI, Anthropic, Google AI |
| Deploy | Vercel, Netlify, Fly.io, Docker |
| Components | Scans src/components/ |
| API Routes | Scans src/app/api/ |
| Locales | Scans i18n directories |
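The Framework row says detection comes from package.json. A plausible approach is to look up well-known npm package names in `dependencies` and `devDependencies`, as in the TypeScript sketch below. The mapping covers only a subset of the table, uses the common npm package name for each tool, and is an assumption about how buildcrew works, not its actual rule set.

```typescript
// Hypothetical sketch of package.json-based stack detection (subset of the table).
type PackageJson = {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
};

// Within each category, rules are ordered most specific first.
const RULES: Record<string, Array<[label: string, pkg: string]>> = {
  Framework: [["Next.js", "next"], ["Nuxt", "nuxt"], ["SvelteKit", "@sveltejs/kit"],
              ["React", "react"], ["Vue", "vue"], ["Express", "express"]],
  Language: [["TypeScript", "typescript"], ["TailwindCSS", "tailwindcss"], ["Framer Motion", "framer-motion"]],
  Database: [["Supabase", "@supabase/supabase-js"], ["Prisma", "@prisma/client"],
             ["Drizzle", "drizzle-orm"], ["MongoDB", "mongodb"]],
  Auth: [["NextAuth", "next-auth"]],
  Payments: [["Stripe", "stripe"]],
  AI: [["OpenAI", "openai"], ["Anthropic", "@anthropic-ai/sdk"]],
};

function detectStack(pkg: PackageJson): Record<string, string[]> {
  const deps: Record<string, string> = { ...pkg.dependencies, ...pkg.devDependencies };
  const found: Record<string, string[]> = {};
  for (const [category, rules] of Object.entries(RULES)) {
    const labels = rules.filter(([, name]) => name in deps).map(([label]) => label);
    // A Next.js app also depends on react, so keep only the most specific framework.
    if (category === "Framework" && labels.length > 1) labels.splice(1);
    if (labels.length > 0) found[category] = labels;
  }
  return found;
}
```

Ordering the framework rules most specific first is the design choice that matters here: meta-frameworks pull in their base library as a dependency, so a naive "report everything" scan would list both.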

```
.claude/harness/
├── project.md          ← always (project context, stack, components, API routes)
├── rules.md            ← always (smart defaults for your framework)
├── erd.md              ← if database detected
├── api-spec.md         ← if API routes found
├── design-system.md    ← if TailwindCSS detected
├── architecture.md     ← always
└── user-flow.md        ← if i18n or 5+ components
```
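The conditions in that tree are simple enough to state as code. This TypeScript sketch encodes them in the same order; the `Detected` shape and the `harnessFiles` name are hypothetical, and "API routes found" is read as at least one route.

```typescript
// Hypothetical summary of what the stack scan found.
interface Detected {
  database: boolean;
  apiRoutes: number; // routes found under src/app/api/
  tailwind: boolean;
  i18n: boolean;
  components: number; // components found under src/components/
}

// Returns the harness files to generate, following the rules in the tree.
function harnessFiles(d: Detected): string[] {
  const files = ["project.md", "rules.md"]; // always
  if (d.database) files.push("erd.md");
  if (d.apiRoutes > 0) files.push("api-spec.md");
  if (d.tailwind) files.push("design-system.md");
  files.push("architecture.md"); // always
  if (d.i18n || d.components >= 5) files.push("user-flow.md");
  return files;
}
```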
Add any .md file to .claude/harness/ — agents read them all.
```bash
npx buildcrew harness   # Check which files need editing
npx buildcrew add       # List available templates
```
Real-time observability for buildcrew sessions. A pixel-art office visualization where your 17 agents come alive — walking between rooms, filing issues, and progressing through the pipeline — all powered by Claude Code hooks and zero external dependencies.
```bash
# 1. Install hooks into your project
npx buildcrew-dashboard --install

# 2. Start the dashboard server (opens browser automatically)
npx buildcrew-dashboard
```
Then open any Claude Code session with @buildcrew in the same directory. Events stream to the dashboard in real time.
| Panel | Description |
|---|---|
| Pixel Town | 5 rooms (Meeting, QA Lab, SecOps, Think Tank, Field) with 16 animated agent sprites |
| Stage Ladder | Pipeline progress: PLAN → DESIGN → DEV → QA → REVIEW → SHIP |
| Billboard | Current stage, notification badge, issue ticker |
| Log Panel | 3 tabs — Events (filterable log), Dialogue (agent conversation view), Terminal (command output) |
The Terminal tab includes a command bar that spawns claude -p on the server. Three permission modes:
| Mode | Flag | Use When |
|---|---|---|
| Strict | default | Production work — every tool call needs approval |
| Normal | acceptEdits | Day-to-day — file edits auto-approved |
| Trust | bypassPermissions | Demos and solo work — everything auto-approved |
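The three dashboard modes line up with values of Claude Code's `--permission-mode` option. The TypeScript sketch below shows one way a server could build the `claude -p` invocation; the function names are hypothetical, and whether buildcrew-dashboard passes `--permission-mode default` explicitly in Strict mode or simply omits the flag is an assumption.

```typescript
import { spawn, type ChildProcess } from "node:child_process";

type DashboardMode = "Strict" | "Normal" | "Trust";

// Dashboard mode → value for Claude Code's --permission-mode option, per the table above.
const PERMISSION_MODE: Record<DashboardMode, string> = {
  Strict: "default",
  Normal: "acceptEdits",
  Trust: "bypassPermissions",
};

function claudeArgs(prompt: string, mode: DashboardMode): string[] {
  return ["-p", prompt, "--permission-mode", PERMISSION_MODE[mode]];
}

// Hypothetical: start a one-shot, non-interactive Claude Code run for the command bar.
function runClaude(prompt: string, mode: DashboardMode, cwd: string): ChildProcess {
  // An argument array with no shell means the prompt is never shell-interpreted.
  return spawn("claude", claudeArgs(prompt, mode), { cwd, stdio: ["ignore", "pipe", "pipe"] });
}
```

Passing arguments as an array rather than a shell string matters here because the prompt comes straight from a text box: quotes, semicolons, or `$(...)` in it stay literal text.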

--install adds four Claude Code hooks to .claude/settings.json.
Hooks are tagged buildcrew-dashboard so that --uninstall can remove them safely. Each hook times out after 500 ms and never blocks Claude Code.
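Claude Code runs each command hook with event details as JSON on stdin, including `session_id` and `hook_event_name`. A hook that must never block can post a trimmed event with a hard timeout and swallow every failure, as in the TypeScript sketch below. The `/events` path, the event shape, and the function names are assumptions; only the default port 3737 comes from the --port flag's documented default.

```typescript
// Hypothetical hook forwarder; only the stdin fields used here are modeled.
interface HookInput {
  session_id: string;
  hook_event_name: string; // e.g. "PreToolUse", "PostToolUse", "Stop"
  tool_name?: string; // present for tool events
}

interface DashboardEvent {
  session: string;
  event: string;
  tool: string | null;
  ts: number;
}

function toDashboardEvent(input: HookInput, now: number = Date.now()): DashboardEvent {
  return { session: input.session_id, event: input.hook_event_name, tool: input.tool_name ?? null, ts: now };
}

// Assumed endpoint path; 3737 is the dashboard's documented default port.
async function forward(input: HookInput, url = "http://localhost:3737/events"): Promise<boolean> {
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(toDashboardEvent(input)),
      signal: AbortSignal.timeout(500), // abort after 500 ms so the hook never stalls
    });
    return res.ok;
  } catch {
    return false; // dashboard not running, slow, or unreachable: drop the event
  }
}
```

A hook script built on this would read stdin, call `forward(JSON.parse(raw))`, and exit with status 0 whatever the result, since exit status 0 tells Claude Code the hook succeeded.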
The dashboard tracks multiple concurrent Claude Code sessions in the same project. Each session gets a unique color chip. Filter by session to see isolated event streams.
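One way to give every concurrent session its own chip is to hand out palette colors in order of first appearance, then filter the event list by session id. The TypeScript sketch below is hypothetical: the palette, the `SessionColors` class, and the event shape are not buildcrew-dashboard's actual code. Colors stay distinct only until the palette runs out, after which they repeat.

```typescript
interface DashEvent {
  session: string; // Claude Code session_id
  event: string;
}

class SessionColors {
  private readonly palette: readonly string[];
  private readonly assigned = new Map<string, string>();

  constructor(palette: readonly string[]) {
    this.palette = palette;
  }

  // The first session seen gets palette[0], the second palette[1], and so on.
  colorFor(sessionId: string): string {
    let color = this.assigned.get(sessionId);
    if (color === undefined) {
      color = this.palette[this.assigned.size % this.palette.length];
      this.assigned.set(sessionId, color);
    }
    return color;
  }
}

// No session selected means the combined stream.
function filterBySession(events: DashEvent[], session?: string): DashEvent[] {
  return session === undefined ? events : events.filter((e) => e.session === session);
}
```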

| Flag | Description |
|---|---|
| --install | Install Claude Code hooks (project-local) |
| --install --global | Install hooks globally |
| --install --with-permissions | Also auto-allow buildcrew tool calls |
| --install --dry-run | Preview changes without writing |
| --uninstall | Remove hooks |
| --uninstall --global | Remove global hooks |
| --port N | Custom port (default: 3737) |
| --no-open | Start server without opening browser |

```bash
# Terminal 1: start the dashboard
npx buildcrew-dashboard

# Terminal 2: run the demo script
node node_modules/buildcrew/bin/dashboard-demo.js
```
The demo simulates a full Feature pipeline with realistic Korean dialogue between agents.
Each feature generates a full document chain:
```
.claude/pipeline/{feature}/
├── 01-plan.md              Requirements + 4-lens review scores
├── 01.5-plan-critique.md   plan-challenger verdict + 6-vector findings
├── 02-design.md            Design decisions + component specs
├── 02.5-spec-critique.md   spec-challenger verdict + 8-vector findings + matrices
├── 03-dev-notes.md         Implementation + 6-question analysis + self-review
├── 04-qa-report.md         Test map + acceptance criteria verification
├── 05-browser-qa.md        Health score + screenshots + flows
├── 06-review.md            4-specialist findings + auto-fixes
├── 07-ship.md              PR URL + release notes
└── coherence-report.md     Coordination Score + gaps + fabrications + orphans
```

| Command | Description |
|---|---|
| npx buildcrew | Full interactive setup (agents + MCP + harness) |
| npx buildcrew init | Generate harness only |
| npx buildcrew init --force | Regenerate harness (backs up existing) |
| npx buildcrew add | List harness templates |
| npx buildcrew add <name> | Add a template |
| npx buildcrew harness | Show harness file status |
| npx buildcrew --force | Overwrite existing agents |
| npx buildcrew --list | List agents with models |
| npx buildcrew --uninstall | Remove agents |
| npx buildcrew --version | Show version |

```
@buildcrew (orchestrator, opus, 199 lines)
│
├─ reads .claude/harness/*.md
├─ detects mode from user message (13 modes, priority table)
├─ dispatches agents with harness context
├─ enforces quality gates + iteration
└─ offers second opinion after completion
│
├── Think:        thinker → architect
├── Build:        planner → designer → developer
├── Adversarial:  plan-challenger, spec-challenger (phase-boundary critics)
├── Quality:      qa-tester → browser-qa → reviewer
├── Sec/Ops:      security-auditor, canary-monitor, shipper
├── Review:       architect, design-reviewer, qa-auditor
├── Meta:         coherence-auditor (final handoff audit)
└── Debug:        investigator
```
Agents include version headers. When you run npx buildcrew on an existing project, outdated agents are automatically updated — no --force needed.
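The automatic update amounts to comparing the version header in the installed agent file with the one in the bundled copy. In the TypeScript sketch below, the header format (an HTML comment carrying a buildcrew-version field) and the choice to leave header-less files untouched are assumptions; buildcrew's real header may look different.

```typescript
// Hypothetical header format, e.g. an HTML comment "buildcrew-version: 1.2.0" at the top.
const HEADER = /<!--\s*buildcrew-version:\s*(\d+)\.(\d+)\.(\d+)\s*-->/;

function parseVersion(markdown: string): [number, number, number] | null {
  const m = HEADER.exec(markdown);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// True when the installed agent is older than the bundled one.
function needsUpdate(installed: string, bundled: string): boolean {
  const have = parseVersion(installed);
  const want = parseVersion(bundled);
  if (!want) return false; // nothing to compare against
  if (!have) return false; // assumption: no header means a user-owned file, left alone
  for (let i = 0; i < 3; i++) {
    // Compare numerically, part by part, so 1.10.0 is newer than 1.2.0.
    if (have[i] !== want[i]) return have[i] < want[i];
  }
  return false; // same version
}
```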
License: MIT
We found that buildcrew demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.