@codexstar/bug-hunter

    interface:
      display_name: "Bug Hunter"
      short_description: "Find, verify, and auto-fix real code bugs"
      default_prompt: "Use $bug-hunter to scan this codebase for confirmed runtime, logic, and security bugs."

# Canonical Structured Outputs For Bug Hunter

This ExecPlan is a living document. The sections `Progress`, `Surprises & Discoveries`, `Decision Log`, and `Outcomes & Retrospective` must be kept up to date as work proceeds.

This repository does not contain a checked-in `PLANS.md`, but this document is written to the same standard as the machine-local ExecPlan reference at `/Users/codex/Downloads/Code Files/PLANS.md`. Keep this plan self-contained as implementation proceeds.

## Purpose / Big Picture

After this change, Bug Hunter will use one canonical structured contract from end to end. Each phase will emit validated JSON as the source of truth, while Markdown becomes a rendered report for humans. This matters because the current system mixes Markdown prompts, ad hoc parsing, and JSON side channels, which makes the pipeline slower, harder to validate, and more likely to drift into false positives, silent false negatives, or broken fix eligibility.

The user-visible result is simple to verify. A bug-hunter run should create phase artifacts such as `.bug-hunter/recon.json`, `.bug-hunter/findings.json`, `.bug-hunter/skeptic.json`, `.bug-hunter/referee.json`, `.bug-hunter/coverage.json`, and `.bug-hunter/fix-report.json`. The same run should still produce readable Markdown reports, but those Markdown files must be generated from the JSON artifacts rather than being the only source of truth. A failed or malformed phase output should be rejected immediately with a precise validation error and a retry path instead of slipping through as an empty or partially parsed report.

## Progress

- [x] (2026-03-11 18:40Z) Create versioned JSON schemas for `recon`, `findings`, `skeptic`, `referee`, `coverage`, `fix-report`, plus shared definitions under `schemas/`.
- [x] (2026-03-11 18:40Z) Add `scripts/schema-runtime.cjs` and `scripts/schema-validate.cjs`, ship `schemas/` in the npm package, and add example valid/invalid `findings.json` fixtures.
- [x] (2026-03-11 18:40Z) Wire strict findings validation into `payload-guard.cjs`, `bug-hunter-state.cjs`, and `run-bug-hunter.cjs`, including retry-on-invalid-findings inside the chunk worker loop.
- [x] (2026-03-11 20:05Z) Replace Markdown-only phase prompting with JSON-first prompting plus rendered Markdown output guidance, including `scripts/render-report.cjs`.
- [x] (2026-03-11 20:05Z) Normalize confidence to numeric values in canonical findings/referee contracts and fix-plan eligibility.
- [x] (2026-03-11 20:05Z) Replace `coverage.md` as canonical loop state with `coverage.json` and keep `coverage.md` as a derived summary.
- [x] (2026-03-10 21:06Z) Add strict inbound and outbound validation, retry logic, and eval coverage for malformed outputs and stale contracts.
- [x] (2026-03-11 20:05Z) Update core documentation, mode docs, wrapper templates, and eval text so they match the full-queue loop semantics and the new structured contracts.

## Surprises & Discoveries

- Observation: the orchestrator already has a JSON worker path, but the main prompts still tell agents to write Markdown reports.
  Evidence: `scripts/run-bug-hunter.cjs` writes and reads `chunk-<id>-findings.json`, while `prompts/hunter.md` still directs output to `.bug-hunter/findings.md`.
- Observation: fix planning expects numeric confidence, but the Referee prompt still emits `High/Medium/Low`.
  Evidence: `scripts/run-bug-hunter.cjs` filters fix eligibility with `confidence >= confidenceThreshold`, while `prompts/referee.md` asks for `Confidence: High/Medium/Low`.
- Observation: loop state is still a machine-parseable Markdown document, which is more brittle than the rest of the JSON-capable pipeline.
  Evidence: `modes/loop.md` defines `.bug-hunter/coverage.md` with line-based sections and a checksum format instead of a JSON state file.
- Observation: evaluation fixtures still encode the earlier `CRITICAL/HIGH` stopping rule.
  Evidence: `evals/evals.json` case `id: 6` still expects completion once all CRITICAL and HIGH files are done.
- Observation: once schema refs become real runtime assets, isolated skill copies must include `schemas/` as well as `scripts/`.
  Evidence: the preflight isolation test needed `schemas/findings.schema.json` and the new schema helper scripts copied into the sandbox to stay representative.
- Observation: deduplicated findings now inherit the strongest numeric confidence for the shared `file|lines|claim` key, which changes low-confidence metrics compared with the previous loose merge.
  Evidence: `scripts/bug-hunter-state.cjs` now validates findings before merge and keeps the maximum `confidenceScore` for duplicate keys, which required updating the state test expectation.
- Observation: the remaining validation gap closed cleanly once the runner exposed a generic schema-validated phase command instead of baking phase-specific logic into docs.
  Evidence: `scripts/run-bug-hunter.cjs` now exposes `phase`, validates any named artifact after each attempt, and retries malformed Skeptic/Referee/Fix outputs before the phase succeeds.

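
The dedup rule described in the last two observations can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `scripts/bug-hunter-state.cjs`. The function name `mergeFindings` is hypothetical, and the input is assumed to be already schema-validated. The merged entry keeps the first finding's other fields while inheriting the highest `confidenceScore` seen for its key.

```javascript
// Hypothetical sketch: merge findings that share a `file|lines|claim` key.
// The first finding for a key is kept; later duplicates only raise its score.
function mergeFindings(findings) {
  const byKey = new Map();
  for (const finding of findings) {
    const key = `${finding.file}|${finding.lines}|${finding.claim}`;
    const existing = byKey.get(key);
    if (!existing) {
      // Copy so the caller's input objects are never mutated.
      byKey.set(key, { ...finding });
    } else if (finding.confidenceScore > existing.confidenceScore) {
      // Duplicate key: the merged entry inherits the stronger numeric score.
      existing.confidenceScore = finding.confidenceScore;
    }
  }
  return [...byKey.values()];
}

const merged = mergeFindings([
  { bugId: "BUG-1", file: "a.js", lines: "10-12", claim: "null deref", confidenceScore: 55 },
  { bugId: "BUG-7", file: "a.js", lines: "10-12", claim: "null deref", confidenceScore: 82 },
  { bugId: "BUG-2", file: "b.js", lines: "3", claim: "off by one", confidenceScore: 40 },
]);
console.log(merged.map((f) => `${f.bugId}:${f.confidenceScore}`)); // → [ 'BUG-1:82', 'BUG-2:40' ]
```

This is why low-confidence metrics shift: a weak duplicate no longer counts as its own low-confidence entry once a stronger one shares its key.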
## Decision Log

- Decision: use provider-agnostic local JSON schemas as the source of truth, and treat provider-native structured outputs as an optimization layer.
  Rationale: Bug Hunter runs across multiple agent backends and CLIs. Native structured outputs from Claude, OpenAI, and Gemini can improve reliability where available, but the skill must remain correct on backends that only support plain prompting and local validation.
  Date/Author: 2026-03-11 / Codex
- Decision: keep Markdown reports, but generate them from validated JSON artifacts.
  Rationale: humans still need readable reports, but machine state should not depend on brittle line parsing or prompt formatting quirks.
  Date/Author: 2026-03-11 / Codex
- Decision: normalize confidence to both `confidenceScore` and `confidenceLabel`.
  Rationale: numeric confidence is required for fix eligibility and consistency checks, while a short label remains useful for readable reports.
  Date/Author: 2026-03-11 / Codex
- Decision: migrate loop state from `coverage.md` to `coverage.json` and keep a rendered `coverage.md` for visibility.
  Rationale: the loop is the long-lived state carrier. It benefits the most from strict schema validation, resumability, and safe retries.
  Date/Author: 2026-03-11 / Codex
- Decision: ship the schema files as package assets and treat missing schema files as a preflight failure.
  Rationale: payload guards and worker validation now depend on the checked-in schema files at runtime, so an install missing `schemas/` is broken even if the scripts themselves exist.
  Date/Author: 2026-03-11 / Codex

## Outcomes & Retrospective

This migration milestone is now complete. Bug Hunter rejects malformed `findings.json` artifacts before they reach state, retries the worker when those artifacts are invalid, ships explicit schemas plus a validator CLI, renders Markdown from canonical JSON, and writes canonical `coverage.json` loop state with a derived `coverage.md` companion. It also enforces Skeptic/Referee/Fixer artifact validation through the orchestrated `run-bug-hunter.cjs phase` path as well as the manual/local path.

## Context and Orientation

Bug Hunter is a skill package rooted at `/Users/codex/.agents/skills/bug-hunter`. The important files for this work are spread across prompts, mode documents, helper scripts, and tests.

`prompts/hunter.md`, `prompts/skeptic.md`, `prompts/referee.md`, and `prompts/fixer.md` define what each analysis phase writes today. They currently emphasize Markdown output with free-form sections and line-oriented formats. This is the main place where drift enters the system.

`scripts/run-bug-hunter.cjs` is the orchestration helper that manages chunk execution, retries, delta expansion, consistency reports, and fix-plan generation. It already understands JSON findings files written by workers. This file is the best anchor for the migration because it already behaves like a JSON pipeline in the tests.

`scripts/bug-hunter-state.cjs` stores durable scan state such as chunk progress, a bug ledger, fact cards, consistency information, and fix plans. It currently records findings from JSON files, but it does not validate rich schemas and it accepts incomplete objects as long as basic fields exist.

`scripts/payload-guard.cjs` validates worker payloads before launch. Right now it only checks that required top-level fields exist and that `outputSchema` is “an object”. It does not enforce real schemas for either inbound or outbound data.

`modes/loop.md` and `modes/fix-loop.md` define the iterative audit loop. They currently store machine state in `.bug-hunter/coverage.md`, which is a Markdown file with line-based sections. That format is readable but brittle and expensive to maintain compared with JSON.

`evals/evals.json` and `scripts/tests/*.test.cjs` are the safety net. They currently prove parts of the JSON worker path, but they do not yet enforce full end-to-end structured outputs or the newly required full-queue loop semantics.

In this plan, “structured output” means a phase result that conforms to a versioned JSON schema that can be validated locally with no guesswork. “Canonical artifact” means the file every later phase trusts as the source of truth. “Rendered report” means a human-readable Markdown file generated from a validated JSON artifact.

## Plan of Work

The work starts by defining stable versioned schemas in a new directory, `schemas/`, under the skill root. Create one schema module per artifact: `recon`, `findings`, `skeptic`, `referee`, `coverage`, `fix-report`, and any shared types such as file coverage entries, cross-reference items, STRIDE/CWE metadata, and confidence values. Use plain JSON Schema stored in `.json` files or JavaScript schema builders that output JSON Schema, but keep the final schemas serializable and versioned. Each schema must include a `schemaVersion` field. Confidence must be represented as `confidenceScore` on a numeric 0–100 scale, with an optional `confidenceLabel` derived from it for rendered reports.

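
Here is a minimal sketch of deriving `confidenceLabel` from the canonical score, written in the package's own JavaScript. The function names and the 75/50 cut-offs are hypothetical. The plan fixes only the 0–100 scale and says the label is derived rather than stored as the source of truth. The `High/Medium/Low` label values are the ones the Referee prompt already uses.

```javascript
// Hypothetical sketch: map the canonical 0-100 score to a display label.
// The 75/50 thresholds are illustrative assumptions, not values from the plan.
function confidenceLabel(confidenceScore) {
  // Number.isFinite also rejects strings such as "80" and NaN.
  if (!Number.isFinite(confidenceScore) || confidenceScore < 0 || confidenceScore > 100) {
    throw new RangeError(`confidenceScore must be a number in 0-100, got ${confidenceScore}`);
  }
  if (confidenceScore >= 75) return "High";
  if (confidenceScore >= 50) return "Medium";
  return "Low";
}

// Attach the derived label for rendering; the numeric score stays canonical.
function withConfidenceLabel(finding) {
  return { ...finding, confidenceLabel: confidenceLabel(finding.confidenceScore) };
}

console.log(withConfidenceLabel({ bugId: "BUG-1", confidenceScore: 62 }).confidenceLabel); // → Medium
```

Deriving the label at render time means fix eligibility only ever compares numbers, so a label can never disagree with the score that gated a fix.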
Next, add a schema runtime helper under `scripts/`, for example `scripts/schema-validate.cjs`, that can validate any named artifact file and print a short machine-readable result. This helper must be used in three places: when generating payloads, when reading worker outputs, and when reading persisted loop state. Expand `scripts/payload-guard.cjs` so the role templates point to real output schemas rather than placeholder `format/version` objects. The guard should reject missing or mismatched schema names before work starts.

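
The core of such a helper can be sketched without dependencies. This is a hypothetical illustration, not the shipped `scripts/schema-validate.cjs`. `checkNode` and `validateArtifact` are invented names, and only a small subset of JSON Schema (`type`, `required`, `properties`, `items`, `enum`) is supported. A production validator would more likely delegate to a full JSON Schema library. The result object mirrors the `{"ok":...,"artifact":...,"errors":[...]}` shape shown under Concrete Steps.

```javascript
// Hypothetical sketch of a dependency-free validator for a tiny JSON Schema
// subset (type, required, properties, items, enum). Not a full implementation.
function checkNode(schema, value, path, errors) {
  const where = path || "root";
  if (schema.type === "array") {
    if (!Array.isArray(value)) {
      errors.push(`${where}: expected array`);
      return;
    }
    value.forEach((item, i) => checkNode(schema.items || {}, item, `${path}[${i}]`, errors));
    return;
  }
  if (schema.type === "object") {
    if (value === null || typeof value !== "object" || Array.isArray(value)) {
      errors.push(`${where}: expected object`);
      return;
    }
    for (const key of schema.required || []) {
      if (!Object.prototype.hasOwnProperty.call(value, key)) {
        errors.push(`missing required property: ${key}`);
      }
    }
    for (const [key, sub] of Object.entries(schema.properties || {})) {
      if (Object.prototype.hasOwnProperty.call(value, key)) {
        checkNode(sub, value[key], path ? `${path}.${key}` : key, errors);
      }
    }
    return;
  }
  // Scalar types ("string", "number", "boolean") map directly onto typeof.
  if (schema.type && typeof value !== schema.type) errors.push(`${where}: expected ${schema.type}`);
  if (schema.enum && !schema.enum.includes(value)) {
    errors.push(`${where}: must be one of ${schema.enum.join(", ")}`);
  }
}

// Produce the machine-readable result object printed by the CLI.
function validateArtifact(artifact, data, schema) {
  const errors = [];
  checkNode(schema, data, "", errors);
  return errors.length === 0 ? { ok: true, artifact } : { ok: false, artifact, errors };
}

const findingsSchema = {
  type: "array",
  items: {
    type: "object",
    required: ["bugId", "claim", "confidenceScore"],
    properties: {
      bugId: { type: "string" },
      claim: { type: "string" },
      confidenceScore: { type: "number" },
    },
  },
};

const bad = [{ bugId: "BUG-1", confidenceScore: 80 }];
console.log(JSON.stringify(validateArtifact("findings", bad, findingsSchema)));
// → {"ok":false,"artifact":"findings","errors":["missing required property: claim"]}
```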
Then migrate the prompts. `prompts/hunter.md`, `prompts/skeptic.md`, `prompts/referee.md`, and `prompts/fixer.md` should stop treating Markdown as the primary output. Instead they should instruct the agent to write a JSON array or object to the assigned canonical path, and optionally write a rendered Markdown companion file if the assignment requests it. The JSON contract must be concrete. For example, Hunter findings must include `bugId`, `severity`, `category`, `file`, `lines`, `claim`, `evidence`, `runtimeTrigger`, `crossReferences`, and `confidenceScore`. Referee verdicts must include `verdict`, `trueSeverity`, `confidenceScore`, `confidenceLabel`, `verificationMode`, and enriched security fields where applicable. Keep the prose reasoning, but move it into explicitly typed fields such as `analysisSummary` instead of free-form blocks.

Once the prompts are changed, update the orchestrator and state layer to consume the new contracts only. In `scripts/run-bug-hunter.cjs`, treat missing worker JSON output as a hard phase failure; a phase that legitimately finds nothing must say so by writing a valid empty array. Validate every worker output before recording it in state. If validation fails, journal the schema error, mark the chunk or phase as failed, and let the retry logic rerun the worker. In `scripts/bug-hunter-state.cjs`, reject findings entries that omit required fields, and enrich ledger entries with normalized keys such as `confidenceScore`, `severity`, `category`, and `verificationMode`. Do not silently continue when a result is malformed.

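
The validate-then-record rule can be sketched as a small retry loop. Everything here is hypothetical. `runChunkWithValidation` and the injected `runWorker`, `validate`, `recordFindings`, and `journal` callbacks stand in for whatever `scripts/run-bug-hunter.cjs` actually uses, and `maxAttempts = 3` is an arbitrary default.

```javascript
// Hypothetical sketch: a chunk only reaches state after its output validates.
// All callbacks are injected stand-ins for the real orchestrator functions.
async function runChunkWithValidation({
  chunkId, runWorker, validate, recordFindings, journal, maxAttempts = 3,
}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const output = await runWorker(chunkId, attempt);
    if (output === undefined || output === null) {
      // Missing output is a failure, never an implicit empty result.
      journal({ chunkId, attempt, event: "missing-output" });
      continue;
    }
    const result = validate(output);
    if (!result.ok) {
      // State is untouched; the next iteration reruns the worker.
      journal({ chunkId, attempt, event: "schema-error", errors: result.errors });
      continue;
    }
    recordFindings(chunkId, output); // Only validated output (including []) is recorded.
    return { ok: true, chunkId, attempts: attempt };
  }
  return { ok: false, chunkId, attempts: maxAttempts };
}
```

Because `recordFindings` is only reached after a successful validation, a malformed attempt can cost a retry but cannot write partial results into state.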
After the phase artifacts are stable, migrate loop state. Add a new canonical file, `.bug-hunter/coverage.json`, and make it the state the loop reads and writes. It should contain top-level metadata, file coverage entries, cumulative bugs, fix ledger entries, and the current loop status. Keep `.bug-hunter/coverage.md`, but generate it from `coverage.json` after each iteration so humans can still inspect progress. Update `modes/loop.md` and `modes/fix-loop.md` to describe the JSON state as canonical and Markdown as derived.

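
Here is a minimal sketch of generating `coverage.md` from `coverage.json`. Only the top-level keys (`iteration`, `status`, `files`, `bugs`, `fixes`) come from this plan. The per-file `path` and `status` fields, the `renderCoverageMarkdown` name, and the Markdown layout are assumptions.

```javascript
// Hypothetical sketch: render a readable summary from canonical loop state.
// Per-file fields (path, status) are assumed; top-level keys follow the plan.
function renderCoverageMarkdown(coverage) {
  const done = coverage.files.filter((f) => f.status === "done").length;
  const lines = [
    "# Bug Hunter Coverage",
    "",
    `- Iteration: ${coverage.iteration}`,
    `- Status: ${coverage.status}`,
    `- Files done: ${done}/${coverage.files.length}`,
    `- Confirmed bugs: ${coverage.bugs.length}`,
    `- Fix ledger entries: ${coverage.fixes.length}`,
    "",
    "## Files",
    "",
    // A checked box marks a finished file; everything else is still pending.
    ...coverage.files.map((f) => `- [${f.status === "done" ? "x" : " "}] ${f.path}`),
  ];
  return lines.join("\n") + "\n";
}
```

After each iteration the loop could write the result with `fs.writeFileSync(".bug-hunter/coverage.md", renderCoverageMarkdown(state))`, so the Markdown is only ever an output and never parsed back as input.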
The provider-specific structured-output layer comes next. Add a small capability adapter under `scripts/` or `templates/` that can describe three modes: native structured output supported, native unsupported but JSON prompting available, and plain-text fallback with local validation. Do not make provider-native structured outputs mandatory for correctness. When the backend supports them, use the local schema definitions to generate provider-specific requests. For Claude this means schema-constrained output or strict tool result patterns where available. For OpenAI this means strict structured outputs using JSON Schema and handling refusals or first-schema latency explicitly. For Gemini this means `responseMimeType: "application/json"` with `responseSchema`. If a backend does not support native structured output, keep the prompt JSON-first and validate locally after the response.

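
The three-mode adapter can be sketched as a pure selection function. The capability flags (`nativeStructuredOutput`, `jsonPrompting`), mode strings, and function names are hypothetical. Real detection would depend on each backend. Provider-specific request builders, such as one that sets Gemini's `responseMimeType`/`responseSchema`, are deliberately left out. The point the sketch encodes is that every mode ends in local validation, so native support only changes how the request is made.

```javascript
// Hypothetical mode names and capability flags, used only for illustration.
const MODES = Object.freeze({
  NATIVE: "native-structured-output",
  JSON_PROMPT: "json-prompting",
  PLAIN_TEXT: "plain-text-fallback",
});

// Pick the strongest mode the backend claims to support.
function selectStructuredOutputMode(capabilities = {}) {
  if (capabilities.nativeStructuredOutput === true) return MODES.NATIVE;
  if (capabilities.jsonPrompting === true) return MODES.JSON_PROMPT;
  return MODES.PLAIN_TEXT;
}

// Describe how a phase request should be made for a given backend.
function planPhaseRequest(capabilities, localSchema) {
  const mode = selectStructuredOutputMode(capabilities);
  return {
    mode,
    // Only native mode forwards the local schema to the provider.
    providerSchema: mode === MODES.NATIVE ? localSchema : null,
    // Prompts stay JSON-first in every mode.
    jsonFirstPrompt: true,
    // Local validation always runs after the response, whatever the mode.
    validateLocally: true,
  };
}
```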
Finally, update every test and eval path. Add tests for schema validation failures, malformed worker outputs, missing `confidenceScore`, invalid coverage state, and rendered Markdown generation from JSON. Update `evals/evals.json` to require full queued coverage semantics and the presence of canonical JSON artifacts. Keep the existing worker fixture tests, but add one fully integrated smoke path that simulates a Hunter JSON output, a Skeptic JSON output, a Referee JSON output, and the resulting fix-plan eligibility.

## Milestones

### Milestone 1: Define the canonical data contracts

At the end of this milestone, the repository has explicit versioned schemas for every phase artifact, and a local validator can reject malformed files deterministically. Nothing user-visible changes yet, but the implementation gains a stable foundation. This milestone is complete when a novice can run schema validation against a sample `findings.json` and see success, then remove a required field and see a validation failure with a helpful error.

### Milestone 2: Convert prompts and orchestrator to JSON-first phase outputs

At the end of this milestone, Hunter, Skeptic, Referee, and Fixer all emit canonical JSON artifacts, and the orchestrator only accepts validated JSON for state updates. Markdown reports still exist, but they are generated from JSON. This milestone is complete when a simulated worker run produces `findings.json`, the orchestrator records it, and a malformed output fails fast with retry instead of silently succeeding.

### Milestone 3: Migrate loop state to JSON and align semantics

At the end of this milestone, `.bug-hunter/coverage.json` is the canonical loop state, the loop uses full queued coverage semantics, and `.bug-hunter/coverage.md` is a rendered summary. This milestone is complete when a loop simulation can resume from `coverage.json`, continue through queued files, and render a readable Markdown view from the same state.

### Milestone 4: Add provider-native structured output adapters and end-to-end safety tests

At the end of this milestone, the skill can optionally use native structured outputs on Claude-, OpenAI-, or Gemini-capable backends, but still behaves correctly without them. The tests and evals enforce the new contracts. This milestone is complete when the provider adapter selects the correct mode, malformed outputs are rejected across all supported execution paths, and evals no longer encode the obsolete `CRITICAL/HIGH` stopping rule.

## Concrete Steps

Work from `/Users/codex/.agents/skills/bug-hunter`.

1. Create the schema directory and files.

       mkdir -p docs/plans schemas

   Add files such as:

       schemas/findings.schema.json
       schemas/skeptic.schema.json
       schemas/referee.schema.json
       schemas/coverage.schema.json
       schemas/fix-report.schema.json
       schemas/recon.schema.json
       schemas/shared.schema.json

   Expected result: the `schemas/` directory exists and each schema file includes `schemaVersion`.

2. Add a validation helper.

   Create `scripts/schema-validate.cjs` and teach it:

   - how to load a schema by name
   - how to validate a file path
   - how to print JSON success or JSON error output

   Expected result:

       node scripts/schema-validate.cjs findings schemas/examples/findings.valid.json
       {"ok":true,"artifact":"findings"}
       node scripts/schema-validate.cjs findings schemas/examples/findings.invalid.json
       {"ok":false,"artifact":"findings","errors":["missing required property: claim"]}

3. Update `scripts/payload-guard.cjs` and `scripts/run-bug-hunter.cjs`.

   Replace placeholder `outputSchema` objects with real schema references. Validate worker outputs before calling `record-findings` or any equivalent state write.

   Expected result: a malformed findings file causes the chunk to fail with a schema error instead of being recorded as partial success.

4. Update the prompts and rendered-report flow.

   Change prompt files so JSON is the primary output. Add a renderer script such as `scripts/render-report.cjs` if needed.

   Expected result: a run produces both JSON and Markdown, with Markdown fully derivable from JSON.

5. Migrate loop state.

   Add `coverage.json`, update `modes/loop.md` and `modes/fix-loop.md`, and render `coverage.md` from JSON.

   Expected result: the loop resumes from JSON state and no longer depends on parsing Markdown line structure.

6. Update tests and evals.

   Run:

       node --test scripts/tests/*.test.cjs

   Add tests for malformed artifacts, missing confidence scores, bad coverage state, and rendered Markdown output. Update `evals/evals.json` so loop completion requires full queued coverage, not just CRITICAL and HIGH completion.

## Validation and Acceptance

Acceptance is behavior-based.

First, run the script tests from `/Users/codex/.agents/skills/bug-hunter`:

    node --test scripts/tests/*.test.cjs

Expect all tests to pass, including new tests that would have failed before the migration because the old code accepted malformed outputs or textual confidence.

Second, run a local orchestrator smoke path with a valid worker fixture. It must produce canonical JSON output files and a rendered Markdown report. Observe:

    .bug-hunter/findings.json
    .bug-hunter/referee.json
    .bug-hunter/fix-report.json
    .bug-hunter/coverage.json
    .bug-hunter/report.md

Third, deliberately break one phase artifact by removing a required field such as `claim` or `confidenceScore`. Re-run the same smoke path and expect:

- the phase fails
- the journal records a schema validation error
- state is not updated from the malformed artifact
- retry logic is allowed to rerun the worker

Fourth, run a loop simulation and verify that completion only occurs when every queued scannable file is marked done in `coverage.json`, not merely when CRITICAL and HIGH files are done.

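
The difference between the obsolete and the required stopping rule can be sketched like this. The per-file fields (`priority`, `scannable`, `status`) are assumptions about the coverage entry shape, and both function names are hypothetical.

```javascript
// Obsolete rule, shown only for contrast: stops once CRITICAL and HIGH files are done.
function oldCriticalHighRule(coverage) {
  return coverage.files
    .filter((f) => f.priority === "CRITICAL" || f.priority === "HIGH")
    .every((f) => f.status === "done");
}

// Full-queue rule: COMPLETE only when every scannable file in the queue is done.
function isLoopComplete(coverage) {
  return coverage.files
    .filter((f) => f.scannable !== false)
    .every((f) => f.status === "done");
}

const state = {
  files: [
    { path: "src/auth.js", priority: "CRITICAL", scannable: true, status: "done" },
    { path: "src/api.js", priority: "HIGH", scannable: true, status: "done" },
    { path: "src/util.js", priority: "MEDIUM", scannable: true, status: "queued" },
    { path: "assets/logo.png", priority: "LOW", scannable: false, status: "skipped" },
  ],
};
console.log(oldCriticalHighRule(state), isLoopComplete(state)); // → true false
```

The queued MEDIUM file is exactly the case the old rule would have abandoned.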
## Idempotence and Recovery

The migration should be safe to run incrementally. Schema files and validators are additive. During implementation, keep Markdown outputs in parallel with JSON outputs until all consumers are switched over. Do not remove Markdown files until JSON-based rendering and validation are proven.

If a phase fails because of schema validation, the safe recovery path is to fix the producer prompt or fixture and rerun the same command. Because the state update happens after validation, malformed outputs should not poison the state file.

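
One way to make sure a rejected artifact never touches persisted state is to validate first and then replace the state file through a temporary file and a rename. This is a hypothetical sketch: `writeStateIfValid` and the injected `validate` are illustrative names. On POSIX filesystems, a `rename` within one directory replaces the target atomically, so readers see either the old or the new state.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical sketch: persist state only after validation, and swap the file
// in via write-to-temp + rename so a crash mid-write cannot truncate it.
function writeStateIfValid(statePath, nextState, validate) {
  const result = validate(nextState);
  if (!result.ok) return result; // The existing state file is left untouched.
  const tmpPath = path.join(
    path.dirname(statePath),
    `.${path.basename(statePath)}.${process.pid}.tmp`,
  );
  fs.writeFileSync(tmpPath, JSON.stringify(nextState, null, 2) + "\n");
  fs.renameSync(tmpPath, statePath);
  return result;
}
```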
When migrating loop state, keep a one-time importer from `coverage.md` to `coverage.json` or, if that is too brittle, explicitly start fresh and document that old Markdown loop state is not resumable across the migration. Choose one path and document it in the implementation notes.

## Artifacts and Notes

The most important implementation artifacts should be:

    schemas/*.schema.json
    scripts/schema-validate.cjs
    scripts/render-report.cjs
    .bug-hunter/*.json
    .bug-hunter/report.md
    .bug-hunter/coverage.md

Expected evidence after completion:

    $ node scripts/schema-validate.cjs findings .bug-hunter/findings.json
    {"ok":true,"artifact":"findings"}
    $ node --test scripts/tests/*.test.cjs
    ℹ pass <updated-count>
    ℹ fail 0

## Interfaces and Dependencies

Define these stable interfaces by the end of the work:

In `schemas/findings.schema.json`, define a findings artifact that is an array of finding objects. Each finding object must include:

    bugId: string
    severity: "Critical" | "Medium" | "Low"
    category: string
    file: string
    lines: string
    claim: string
    evidence: string
    runtimeTrigger: string
    crossReferences: array
    confidenceScore: number

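
The findings contract above could be written as JSON Schema roughly as follows. This is a sketch, not the shipped `schemas/findings.schema.json`. The required fields and the severity enum come from this interface, and the 0–100 bound on `confidenceScore` comes from the Plan of Work. The `$schema` dialect, the `$id`, and placing `schemaVersion` as a top-level annotation on the schema document are assumptions. Validators in strict mode, such as Ajv's default configuration, may reject unknown keywords like `schemaVersion` unless they are registered.

```javascript
// Hypothetical sketch of schemas/findings.schema.json, built as a JS object.
const findingsSchema = {
  $schema: "https://json-schema.org/draft/2020-12/schema",
  $id: "findings.schema.json",
  schemaVersion: 1, // Assumed placement of the plan's required version marker.
  type: "array",
  items: {
    type: "object",
    required: [
      "bugId", "severity", "category", "file", "lines", "claim",
      "evidence", "runtimeTrigger", "crossReferences", "confidenceScore",
    ],
    properties: {
      bugId: { type: "string" },
      severity: { enum: ["Critical", "Medium", "Low"] },
      category: { type: "string" },
      file: { type: "string" },
      lines: { type: "string" },
      claim: { type: "string" },
      evidence: { type: "string" },
      runtimeTrigger: { type: "string" },
      crossReferences: { type: "array" },
      confidenceScore: { type: "number", minimum: 0, maximum: 100 },
    },
  },
};

// The serialized form is what would be written to disk.
const serialized = JSON.stringify(findingsSchema, null, 2);
```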
In `schemas/referee.schema.json`, define a verdict artifact with:

    bugId: string
    verdict: "REAL_BUG" | "NOT_A_BUG" | "MANUAL_REVIEW"
    trueSeverity: "Critical" | "Medium" | "Low"
    confidenceScore: number
    confidenceLabel: string
    verificationMode: "INDEPENDENTLY_VERIFIED" | "EVIDENCE_BASED"
    analysisSummary: string

In `schemas/coverage.schema.json`, define loop state with:

    schemaVersion: number
    iteration: number
    status: "IN_PROGRESS" | "COMPLETE"
    files: array of file coverage entries
    bugs: array of confirmed bug summaries
    fixes: array of fix ledger entries

In `scripts/schema-validate.cjs`, implement a CLI with:

    node scripts/schema-validate.cjs <artifact-name> <file-path>

In `scripts/render-report.cjs`, implement a CLI that renders Markdown from JSON artifacts:

    node scripts/render-report.cjs report .bug-hunter/findings.json .bug-hunter/referee.json > .bug-hunter/report.md

Provider-native structured output adapters, if added, must consume these local schemas rather than inventing provider-specific contracts.

## Change Log For This Plan

2026-03-11: Initial ExecPlan created after the structured-output audit. The plan chooses provider-agnostic local schemas as the foundation and treats Claude/OpenAI/Gemini native structured outputs as optional accelerators rather than the source of truth.

# Surgical Fix Plan for Confirmed Audit Bugs

## Objective

Fix the four confirmed runtime bugs without changing the surrounding product behavior, public UX, or broader pipeline design beyond what is necessary for correctness and safety.

Confirmed bugs:

- `BUG-1` — `scripts/run-bug-hunter.cjs`
- `BUG-2` — `scripts/pr-scope.cjs`
- `BUG-3` — `scripts/fix-lock.cjs`
- `BUG-4` — `scripts/code-index.cjs`

## Fix order

1. `BUG-3` `scripts/fix-lock.cjs`
2. `BUG-4` `scripts/code-index.cjs`
3. `BUG-2` `scripts/pr-scope.cjs`
4. `BUG-1` `scripts/run-bug-hunter.cjs`

Rationale:

- `BUG-3` and `BUG-4` are isolated utility-level correctness fixes with low blast radius.
- `BUG-2` changes PR scope resolution behavior and needs targeted tests around fallback semantics.
- `BUG-1` touches orchestration behavior and should land last, after the supporting utilities are stable.

---

## BUG-3 — fix-lock can steal a live lock

### Problem

`acquire()` treats TTL expiry as sufficient proof of staleness and does not check whether the recorded PID is still alive.

### Surgical fix

- Keep the existing lock file format.
- Change stale recovery logic so a lock is auto-recovered only when:
  - the TTL has expired **and**
  - the owner PID is absent or not alive.
- If the TTL has expired but the owner is still alive, return a failure payload such as:
  - `reason: "lock-held-by-live-owner"`
  - include `stale: true` and `ownerAlive: true` for observability.

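
The recovery rule can be sketched in Node as follows. The lock record shape (`pid`, `acquiredAt`, `ttlMs`), the function names, and the `"lock-held"` reason for a fresh lock are hypothetical. `"lock-held-by-live-owner"`, `stale`, and `ownerAlive` come from the plan. `process.kill(pid, 0)` sends no signal and only probes whether the process exists. An `EPERM` error still means the process is alive but owned by another user.

```javascript
// Returns true when a process with this PID currently exists.
function isPidAlive(pid) {
  if (!Number.isInteger(pid) || pid <= 0) return false; // Absent or invalid PID.
  try {
    process.kill(pid, 0); // Signal 0: existence/permission check only.
    return true;
  } catch (err) {
    // EPERM: the process exists but we may not signal it, so it is alive.
    return err.code === "EPERM";
  }
}

// Hypothetical sketch: TTL expiry alone never steals a lock from a live owner.
function decideLockRecovery(lock, now = Date.now(), pidAlive = isPidAlive) {
  const stale = now - lock.acquiredAt > lock.ttlMs;
  if (!stale) return { ok: false, reason: "lock-held", stale: false };
  const ownerAlive = pidAlive(lock.pid);
  if (ownerAlive) {
    return { ok: false, reason: "lock-held-by-live-owner", stale: true, ownerAlive: true };
  }
  return { ok: true, recovered: true, stale: true, ownerAlive: false };
}
```

Injecting `pidAlive` keeps the decision testable without spawning or killing real processes.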
### Files

- `scripts/fix-lock.cjs`
- tests in `scripts/tests/fix-lock.test.cjs`

### Test additions

- acquiring a fresh lock from another process still fails
- acquiring an expired lock whose PID is dead succeeds
- acquiring an expired lock whose PID is alive fails
- `status` remains consistent with acquire behavior

### Risk

Low. Pure locking behavior change.

---

## BUG-4 — code-index query-bugs temp file collision

### Problem

`queryBugs()` always writes `.seed-files.tmp.json` in the same directory and only deletes it on success.

### Surgical fix

- Replace the fixed temp filename with a unique invocation-scoped filename, e.g. based on:
  - `process.pid`
  - timestamp
  - random suffix
- Wrap the temp-file lifecycle in `try/finally` so cleanup runs even if `query()` throws.
- Preserve the current command contract and output shape.

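
Here is a sketch of the invocation-scoped temp file plus `try/finally` cleanup. `seedTempPath` and `withSeedFile` are hypothetical helpers, and the sketch assumes a synchronous query callback. If `query()` returns a promise, the callback would need to be awaited inside an `async` variant so that cleanup waits for it.

```javascript
const fs = require("node:fs");
const path = require("node:path");
const crypto = require("node:crypto");

// Unique per invocation: PID + timestamp + random suffix, as the plan suggests.
function seedTempPath(dir) {
  const suffix = crypto.randomBytes(6).toString("hex");
  return path.join(dir, `.seed-files.${process.pid}.${Date.now()}.${suffix}.tmp.json`);
}

// Hypothetical sketch: write the seed file, run the query, always clean up.
function withSeedFile(dir, seedFiles, fn) {
  const tmpPath = seedTempPath(dir);
  fs.writeFileSync(tmpPath, JSON.stringify(seedFiles));
  try {
    return fn(tmpPath);
  } finally {
    // Runs after success and after a throw, so no temp file is left behind.
    fs.rmSync(tmpPath, { force: true });
  }
}
```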
### Files

- `scripts/code-index.cjs`
- tests in `scripts/tests/code-index.test.cjs`

### Test additions

- `query-bugs` cleans up temp file after success
- `query-bugs` cleans up temp file after a thrown query path
- parallel invocations do not reuse the same temp file name

### Risk

Low. Local helper behavior only.

---

## BUG-2 — pr-scope silent wrong-base fallback

### Problem

For `selector === "current"`, any `gh` failure falls back to `git diff <base or main>...HEAD` and reports success. This can silently produce the wrong review scope.

### Surgical fix

Preferred minimal behavior:

- Keep git fallback only for `current`.
- Before fallback, determine the base branch more safely:
  1. explicit `--base` if supplied
  2. repo default branch if discoverable
  3. otherwise fail explicitly instead of assuming `main`
- If `gh` fails and no trustworthy base is available, return an error rather than a successful but potentially wrong scope.

### Implementation notes

- Add a small helper to resolve the default branch via git when possible, e.g. from:
  - `refs/remotes/origin/HEAD`
  - or another safe git source
- Do **not** broaden fallback for numbered/recent PRs.
- Preserve the existing JSON output contract, but add metadata when fallback is used.

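
The base-resolution order can be sketched with an injectable git runner so it stays testable. `resolveFallbackBase`, `runGit`, and the result fields are hypothetical names. `git symbolic-ref --quiet --short refs/remotes/origin/HEAD` prints something like `origin/main` when the remote HEAD is recorded locally, and exits non-zero otherwise. The sketch treats that case as "no trustworthy base" rather than guessing `main`.

```javascript
const { execFileSync } = require("node:child_process");

// Default runner: execFileSync throws on a non-zero git exit status.
function defaultRunGit(args) {
  return execFileSync("git", args, { encoding: "utf8", stdio: ["ignore", "pipe", "ignore"] }).trim();
}

// Hypothetical sketch: explicit --base, then origin/HEAD, else explicit failure.
function resolveFallbackBase({ explicitBase, runGit = defaultRunGit } = {}) {
  if (explicitBase) return { ok: true, base: explicitBase, source: "explicit" };
  try {
    const ref = runGit(["symbolic-ref", "--quiet", "--short", "refs/remotes/origin/HEAD"]);
    if (ref) return { ok: true, base: ref, source: "origin-head" };
  } catch {
    // origin/HEAD is not recorded locally; fall through to an explicit failure.
  }
  return { ok: false, error: "cannot determine a trustworthy base branch; pass --base" };
}
```

Returning `source` alongside `base` is one way to supply the fallback metadata the implementation notes ask for.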
### Files

- `scripts/pr-scope.cjs`
- tests in `scripts/tests/pr-scope.test.cjs`

### Test additions

- `current` with explicit `--base` still falls back correctly
- `current` with discoverable default branch falls back correctly
- `current` with no trustworthy base fails explicitly
- `recent` and numbered PRs still require GitHub metadata

### Risk

Medium. Scope-selection behavior changes and could affect user workflows, but the change is correctness-oriented and bounded.

---

## BUG-1 — fix strategy ignored by executable fix queue

### Problem

`fix-strategy.json` is generated, but `buildFixPlan()` still computes eligibility directly from confidence alone. Strategy classes such as `manual-review`, `larger-refactor`, and `architectural-remediation` do not actually gate execution.

### Surgical fix

- Keep `fix-strategy.json` as the source of truth for execution eligibility.
- Update the executable queue builder so only findings/clusters marked safe for autofix enter:
  - `safe-autofix`
  - and `autofixEligible === true`
- Ensure `manual-review`, `larger-refactor`, and `architectural-remediation` never flow into canary/rollout.
- Preserve the current `fix-plan.json` shape as much as possible to minimize downstream breakage.

### Recommended implementation shape

Option A, lowest risk:

- Refactor `buildFixPlan()` to accept preclassified entries from `buildFixStrategy()`.
- Derive eligible/canary/rollout only from strategy entries where `autofixEligible === true`.

Also fix cluster-stage ambiguity:

- Either include `executionStage` in the cluster grouping key, or
- compute cluster stage conservatively from all entries instead of taking `entries[0]`.

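
Option A, combined with the first cluster fix (stage in the grouping key), can be sketched like this. The entry fields (`strategy`, `autofixEligible`, `executionStage`, `file`) and the `"canary"`/`"rollout"` stage values are assumptions about `fix-strategy.json`. `buildExecutableQueue` is a hypothetical name, not the real `buildFixPlan()`. Confidence is not consulted at all here; gating comes only from the strategy classification.

```javascript
const path = require("node:path");

// Hypothetical sketch: only safe-autofix entries flagged eligible can execute,
// and clusters are keyed by directory AND stage so mixed stages never merge.
function buildExecutableQueue(strategyEntries) {
  const eligible = strategyEntries.filter(
    (e) => e.strategy === "safe-autofix" && e.autofixEligible === true,
  );
  const clusters = new Map();
  for (const entry of eligible) {
    const key = `${path.dirname(entry.file)}|${entry.executionStage}`;
    if (!clusters.has(key)) {
      clusters.set(key, { key, executionStage: entry.executionStage, entries: [] });
    }
    clusters.get(key).entries.push(entry);
  }
  const all = [...clusters.values()];
  return {
    canary: all.filter((c) => c.executionStage === "canary"),
    rollout: all.filter((c) => c.executionStage === "rollout"),
  };
}
```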
### Files

- `scripts/run-bug-hunter.cjs`
- tests in `scripts/tests/run-bug-hunter.test.cjs`
- possibly `schemas/fix-strategy.schema.json` only if contract refinement is needed

### Test additions

- high-confidence `architectural-remediation` finding does not enter `fixPlan.canary/rollout`
- high-confidence `larger-refactor` finding does not enter executable queue
- `safe-autofix` findings still enter canary/rollout
- mixed-stage safe-autofix entries in same directory do not collapse incorrectly

### Risk

Medium-high. This changes executable orchestration, but still within the intended design and existing artifact model.

---

## Verification plan

Run after each bug fix if practical, and again at the end:

```bash
node --test scripts/tests/*.test.cjs
```

Recommended focused sequence during implementation:

```bash
node --test scripts/tests/fix-lock.test.cjs
node --test scripts/tests/code-index.test.cjs
node --test scripts/tests/pr-scope.test.cjs
node --test scripts/tests/run-bug-hunter.test.cjs
node --test scripts/tests/*.test.cjs
```

## Definition of done

- [x] All 4 confirmed bugs have targeted code fixes.
- [x] Regression tests exist for each bug.
- [x] Full script test suite passes.
- [x] No public CLI contract is changed except where necessary to avoid silent wrong behavior.
- [x] `fix-strategy` becomes behaviorally authoritative for execution gating, not just informational.

## Outcome

Implemented and verified on 2026-03-12.

Fresh verification evidence:

```bash
node --test scripts/tests/*.test.cjs
```

Result: 44/44 tests passing.

| # Enterprise Security Pack End-to-End Integration Plan | ||
| ## Objective | ||
| Make Bug Hunter's bundled local security skills fully end-to-end connected, portable, and enterprise-grade. | ||
| The bundled local skills already exist under `skills/`, but the main Bug Hunter orchestration flow does not yet actively route into them. This plan closes that gap by wiring the main `SKILL.md`, documentation, tests, and evals so the companion skills stop being merely packaged assets and become part of the product's core workflow. | ||
| ## Target outcomes | ||
| 1. Main Bug Hunter flow explicitly routes into bundled local security skills when relevant. | ||
| 2. Security entrypoints are easy to invoke and enterprise-friendly. | ||
| 3. Docs, tests, and evals all reflect the integrated flow. | ||
| 4. The repository remains fully portable with no external marketplace dependency. | ||
| 5. After integration, run a focused Bug Hunter audit on the repository, fix any real bugs found, and summarize the net result. | ||
| ## Integration model | ||
| Bug Hunter remains the top-level orchestrator. | ||
| Bundled local skills become capability modules: | ||
| - `skills/commit-security-scan/` → diff-scoped PR/commit/staged security review | ||
| - `skills/security-review/` → full security workflow (threat model + code + deps + validation) | ||
| - `skills/threat-model-generation/` → authoritative threat model bootstrap/refresh | ||
| - `skills/vulnerability-validation/` → exploitability/reachability/CVSS/PoC validation for security findings | ||
| The main skill should load these on demand from local paths and keep all artifacts under `.bug-hunter/`. | ||
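Here is a minimal routing sketch for the integration model above. The intent names (`pr-security`, `security-audit`, `threat-model`, `validate-security`) are assumptions made for illustration. So is the rule that only Critical security findings trigger validation. The real flags are whatever `SKILL.md` and `README.md` define.

```javascript
// Hypothetical intent names mapped to the bundled local skill directories.
const SECURITY_ROUTES = {
  'pr-security': 'skills/commit-security-scan',
  'security-audit': 'skills/security-review',
  'threat-model': 'skills/threat-model-generation',
  'validate-security': 'skills/vulnerability-validation',
};

// Returns the local SKILL.md to load on demand, or null to stay in the core flow.
// All artifacts stay under .bug-hunter/ regardless of which skill runs.
function resolveSecuritySkill(intent) {
  if (!Object.prototype.hasOwnProperty.call(SECURITY_ROUTES, intent)) return null;
  return { skillFile: `${SECURITY_ROUTES[intent]}/SKILL.md`, artifactDir: '.bug-hunter' };
}

// Assumed trigger for delegating a finding to vulnerability-validation.
function needsVulnerabilityValidation(finding) {
  return finding.category === 'security' && finding.severity === 'Critical';
}
```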
| ## Work plan | ||
| ### Milestone 1 — Main skill routing | ||
| - Add security-oriented flags and aliases to `SKILL.md` / `README.md` | ||
| - Add explicit routing rules for when to read bundled local security skills | ||
| - Make threat model generation explicitly delegate to bundled `threat-model-generation` | ||
| - Make PR security review explicitly delegate to bundled `commit-security-scan` | ||
| - Make severe security validation explicitly delegate to bundled `vulnerability-validation` | ||
| - Make full security audit explicitly delegate to bundled `security-review` | ||
| ### Milestone 2 — Enterprise UX surface | ||
| - Add enterprise-grade usage examples and a security-pack section in docs | ||
| - Keep behavior portable and artifact-native (`.bug-hunter/*` only) | ||
| ### Milestone 3 — Guardrails | ||
| - Add regression tests proving the main skill references and exposes the bundled skills | ||
| - Add evals for the new end-to-end security flows | ||
| ### Milestone 4 — Cross verification and self-audit | ||
| - Run the full script test suite | ||
| - Run a focused Bug Hunter audit on the repository | ||
| - Fix any real bugs uncovered by that audit | ||
| - Summarize all shipped changes briefly | ||
| ## Definition of done | ||
| - Main `SKILL.md` actively routes to the bundled local security skills | ||
| - `README.md` documents the integrated security pack as a real workflow, not just a packaged extra | ||
| - tests and evals cover the integrated paths | ||
| - full test suite passes | ||
| - self-audit completes and any confirmed bugs are fixed |
| # Local Security Skills Integration Plan | ||
| ## Objective | ||
| Vendor the security-engineer marketplace capabilities into Bug Hunter as local, portable companion skills so the repository is self-contained and does not depend on external machine-specific skill paths. | ||
| Target local skills: | ||
| - `skills/commit-security-scan/` | ||
| - `skills/security-review/` | ||
| - `skills/threat-model-generation/` | ||
| - `skills/vulnerability-validation/` | ||
| ## Design | ||
| Use Bug Hunter as the orchestrator and package the imported capabilities as local skills with Bug Hunter-native artifact paths and schemas. | ||
| Principles: | ||
| - No references to `.factory/` or external marketplace paths | ||
| - Reuse Bug Hunter-native artifacts under `.bug-hunter/` | ||
| - Keep skill bodies focused on capability/workflow; keep runtime logic in existing prompts/scripts | ||
| - Make the new skills portable by including them in the package `files` list and documenting them in the repo | ||
| ## Work items | ||
| 1. Create local skill directories with adapted `SKILL.md` files | ||
| 2. Point all skill outputs/inputs to `.bug-hunter/*` artifacts and existing Bug Hunter concepts | ||
| 3. Add a packaging/regression test to verify the local skills are present and packaged | ||
| 4. Add `skills/` to `package.json` publish files | ||
| 5. Document the bundled companion skills in `README.md` | ||
| 6. Update `CHANGELOG.md` | ||
| 7. Run tests | ||
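A check along these lines could cover work item 3. It also enforces the definition-of-done rules: four skills present, no `.factory/` references, and `skills/` listed in the `package.json` `files` array. `checkBundledSkills` is a hypothetical helper. It assumes each skill's entry point is `skills/<name>/SKILL.md`.

```javascript
const fs = require('fs');
const path = require('path');

const EXPECTED_SKILLS = [
  'commit-security-scan',
  'security-review',
  'threat-model-generation',
  'vulnerability-validation',
];

// Returns human-readable problems; an empty array means the package is portable.
function checkBundledSkills(repoRoot) {
  const problems = [];
  const pkg = JSON.parse(fs.readFileSync(path.join(repoRoot, 'package.json'), 'utf8'));
  const files = Array.isArray(pkg.files) ? pkg.files : [];
  // Accept both "skills" and "skills/" in the publish list.
  if (!files.some((entry) => entry.replace(/\/+$/, '') === 'skills')) {
    problems.push('package.json "files" does not include skills/');
  }
  for (const name of EXPECTED_SKILLS) {
    const skillFile = path.join(repoRoot, 'skills', name, 'SKILL.md');
    if (!fs.existsSync(skillFile)) {
      problems.push(`missing skills/${name}/SKILL.md`);
      continue;
    }
    // Vendored skills must not point back at machine-specific marketplace paths.
    if (fs.readFileSync(skillFile, 'utf8').includes('.factory/')) {
      problems.push(`skills/${name}/SKILL.md still references .factory/`);
    }
  }
  return problems;
}
```

Returning a list, instead of throwing on the first problem, lets a single test run report every packaging gap at once.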
| ## Definition of done | ||
| - `skills/` exists with the four local security skills | ||
| - no vendored skill references point to `.factory/` paths | ||
| - package metadata includes `skills/` | ||
| - tests verify the packaged skills exist | ||
| - docs explain the bundled local security pack |
| # PR Review + Strategic Fix Flow | ||
| This ExecPlan is a living document. The sections `Progress`, `Surprises & Discoveries`, `Decision Log`, and `Outcomes & Retrospective` must stay current as work lands. | ||
| ## Purpose / Big Picture | ||
| Bug Hunter already has the ingredients for branch-diff review and safe fix execution, but two user-facing workflows are still underpowered: | ||
| 1. **Review a recent PR directly** without requiring the user to manually map a PR to a branch/base diff. | ||
| 2. **Plan fixes strategically before editing code** so the tool can distinguish safe autofixes from larger remediation work. | ||
| After this change, Bug Hunter should support a first-class PR review flow and a first-class fix-strategy flow. A user should be able to run a PR-focused review against the current, recent, or numbered PR, and the tool should produce PR-specific metadata plus a focused review artifact. When bugs are confirmed, the tool should create a machine-readable fix strategy before the fixer phase starts, making the plan visible and auditable. | ||
| ## Progress | ||
| - [x] (2026-03-12 06:58Z) Audit the current codebase to confirm existing branch-diff support, fix-plan behavior, and the lack of first-class PR and strategy flows. | ||
| - [x] (2026-03-12 07:23Z) Add `scripts/pr-scope.cjs` plus tests covering `current`, `recent`, explicit numbered PR failure behavior, and git fallback for current-branch review. | ||
| - [x] (2026-03-12 07:24Z) Extend `README.md` and `SKILL.md` with first-class PR-review flags and `--plan-only` strategy-first usage. | ||
| - [x] (2026-03-12 07:26Z) Add canonical `fix-strategy` schema/runtime support plus Markdown rendering. | ||
| - [x] (2026-03-12 07:28Z) Generate `fix-strategy.json` and `fix-strategy.md` from `scripts/run-bug-hunter.cjs` before fix execution. | ||
| - [x] (2026-03-12 07:30Z) Update fix pipeline docs and fixer prompt language so strategy is explicit before patching. | ||
| - [x] (2026-03-12 07:33Z) Run `node --test scripts/tests/*.test.cjs` successfully (39/39 passing). | ||
| ## Surprises & Discoveries | ||
| - Observation: branch-diff review is already documented and partially supported, but it is branch-centric rather than PR-centric. | ||
| Evidence: `README.md` and `SKILL.md` support `-b <branch>` and `--staged`, but there is no `--pr` or `--review-pr` workflow. | ||
| - Observation: the documented fix pipeline is more strategic than the current code-level planner. | ||
| Evidence: `modes/fix-pipeline.md` describes dependency ordering, canary rollout, and circuit breaking, while `scripts/run-bug-hunter.cjs` currently builds a fix plan mostly from confidence/severity sorting plus canary slicing. | ||
| - Observation: the packaged skill copy is not a git checkout. | ||
| Evidence: `git status` fails in the working directory, so Ralph-loop safety assumptions about git history do not fully apply here. | ||
| ## Decision Log | ||
| - Decision: implement PR review as a helper-script-driven scope resolver instead of encoding GitHub logic directly into `SKILL.md` prose. | ||
| Rationale: the resolver is testable, reusable from docs/prompt flows, and lets the prompt stay focused on behavior rather than shell branching. | ||
| Date/Author: 2026-03-12 / Codex | ||
| - Decision: represent strategy as a canonical JSON artifact (`fix-strategy.json`) alongside the existing fix plan. | ||
| Rationale: strategy needs to be inspectable and machine-validated, not embedded as prose in reports. | ||
| Date/Author: 2026-03-12 / Codex | ||
| - Decision: keep the existing fix plan artifact, but enrich the pipeline with a prior strategy artifact rather than replacing the whole fix planner. | ||
| Rationale: this minimizes risk and preserves the existing verification/test harness. | ||
| Date/Author: 2026-03-12 / Codex | ||
| ## Outcomes & Retrospective | ||
| This implementation landed the intended end-to-end flow. Bug Hunter now has a reusable PR scope resolver (`scripts/pr-scope.cjs`) that turns `current`, `recent`, or explicit PR references into normalized file scope, with a safe git fallback for current-branch review when GitHub metadata is unavailable. The core orchestrator now emits `fix-strategy.json` and `fix-strategy.md` before fix execution, giving users a visible strategy layer ahead of the existing fix plan. | ||
| The work stayed low-risk because it extended existing artifacts rather than replacing them. `fix-plan.json` still drives rollout/canary handling, while `fix-strategy.json` adds the missing classification layer for safe autofix vs manual review vs larger remediation. The full automated test suite passed after the changes. | ||
| ## Context and Orientation | ||
| Relevant files for this effort: | ||
| - `SKILL.md` — user-facing orchestration instructions and argument parsing behavior. | ||
| - `README.md` — public product surface and examples. | ||
| - `scripts/run-bug-hunter.cjs` — orchestrator and fix-plan generation. | ||
| - `scripts/render-report.cjs` — human-readable report rendering from canonical JSON. | ||
| - `scripts/payload-guard.cjs` and `scripts/schema-runtime.cjs` — schema/runtime plumbing. | ||
| - `modes/fix-pipeline.md` — documented fix flow. | ||
| - `scripts/tests/run-bug-hunter.test.cjs` — orchestration safety net. | ||
| ## Plan of Work | ||
| ### Milestone 1: PR review scope resolution | ||
| Create a helper script that resolves PR input into a normalized review scope. It should support `current`, `recent`, and numeric PR references. When GitHub CLI metadata is available, it should return PR number, title, head branch, base branch, and changed files. When GitHub CLI is unavailable but the request targets the current branch, it should fall back to git-based branch diff metadata where possible. | ||
| ### Milestone 2: Strategy artifact generation | ||
| Add a canonical `fix-strategy` artifact that groups confirmed bugs into execution-oriented clusters and classifies them as safe autofix, manual review, larger refactor, or architectural remediation. Generate this artifact inside the orchestrator after findings have been normalized and before fix execution. | ||
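To make the classification concrete, here is a sketch under loudly stated assumptions. The thresholds are invented for illustration and are not Bug Hunter's actual rules:

- three or more cross-references on a `cross-file` finding means architectural work;
- two or more cross-references means a larger refactor;
- confidence below 75 means manual review.

Only the strategy and stage names come from the `fix-strategy` schema. `classifyFinding` is hypothetical. Canary versus rollout assignment is left to the later fix-plan step.

```javascript
// Invented heuristics for illustration only; strategy/stage names come from the schema.
function classifyFinding(finding, confidenceThreshold = 75) {
  const refs = (finding.crossReferences || []).length;
  if (finding.category === 'cross-file' && refs >= 3) {
    return { strategy: 'architectural-remediation', executionStage: 'report-only',
      autofixEligible: false, reason: 'three or more cross-file references' };
  }
  if (refs >= 2) {
    return { strategy: 'larger-refactor', executionStage: 'report-only',
      autofixEligible: false, reason: 'touches multiple call sites' };
  }
  if (finding.confidenceScore < confidenceThreshold) {
    return { strategy: 'manual-review', executionStage: 'manual-review',
      autofixEligible: false, reason: 'confidence below threshold' };
  }
  // Canary vs rollout is decided later by the fix planner; default to rollout here.
  return { strategy: 'safe-autofix', executionStage: 'rollout',
    autofixEligible: true, reason: 'localized, high-confidence fix' };
}
```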
| ### Milestone 3: Prompt and documentation alignment | ||
| Update `SKILL.md`, `README.md`, and fix-pipeline docs so the new flows are explicit: PR review is first-class, and fix execution is preceded by an explicit strategy phase. | ||
| ### Milestone 4: Validation | ||
| Add tests for PR scope resolution and orchestrator strategy generation. Run the existing test suite to guard against regressions. |
| { | ||
| "$schema": "http://json-schema.org/draft-07/schema#", | ||
| "schemaVersion": 1, | ||
| "artifact": "coverage", | ||
| "title": "Bug Hunter Coverage Artifact", | ||
| "type": "object", | ||
| "required": ["schemaVersion", "iteration", "status", "files", "bugs", "fixes"], | ||
| "properties": { | ||
| "schemaVersion": { | ||
| "type": "integer", | ||
| "minimum": 1 | ||
| }, | ||
| "iteration": { | ||
| "type": "integer", | ||
| "minimum": 0 | ||
| }, | ||
| "status": { | ||
| "type": "string", | ||
| "enum": ["IN_PROGRESS", "COMPLETE"] | ||
| }, | ||
| "files": { | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "required": ["path", "status"], | ||
| "properties": { | ||
| "path": { "type": "string", "minLength": 1 }, | ||
| "status": { | ||
| "type": "string", | ||
| "enum": ["pending", "in_progress", "done", "failed"] | ||
| } | ||
| }, | ||
| "additionalProperties": false | ||
| } | ||
| }, | ||
| "bugs": { | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "required": ["bugId", "severity", "file", "claim"], | ||
| "properties": { | ||
| "bugId": { "type": "string", "minLength": 1 }, | ||
| "severity": { | ||
| "type": "string", | ||
| "enum": ["Critical", "Medium", "Low"] | ||
| }, | ||
| "file": { "type": "string", "minLength": 1 }, | ||
| "claim": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "additionalProperties": false | ||
| } | ||
| }, | ||
| "fixes": { | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "required": ["bugId", "status"], | ||
| "properties": { | ||
| "bugId": { "type": "string", "minLength": 1 }, | ||
| "status": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "additionalProperties": false | ||
| } | ||
| } | ||
| }, | ||
| "additionalProperties": false | ||
| } |
| [ | ||
| { | ||
| "bugId": "BUG-1", | ||
| "severity": "Critical", | ||
| "category": "security", | ||
| "file": "src/example.ts", | ||
| "lines": "12-16", | ||
| "evidence": "src/example.ts:12-16 unvalidated body flows into exec().", | ||
| "runtimeTrigger": "POST /api/example with body {\"command\":\"rm -rf /\"}", | ||
| "crossReferences": ["src/router.ts:8-14"], | ||
| "confidenceScore": 92 | ||
| } | ||
| ] |
| [ | ||
| { | ||
| "bugId": "BUG-1", | ||
| "severity": "Critical", | ||
| "category": "security", | ||
| "file": "src/example.ts", | ||
| "lines": "12-16", | ||
| "claim": "Request body reaches a dangerous sink without validation.", | ||
| "evidence": "src/example.ts:12-16 unvalidated body flows into exec().", | ||
| "runtimeTrigger": "POST /api/example with body {\"command\":\"rm -rf /\"}", | ||
| "crossReferences": ["src/router.ts:8-14"], | ||
| "confidenceScore": 92, | ||
| "confidenceLabel": "high", | ||
| "stride": "Tampering", | ||
| "cwe": "CWE-78" | ||
| } | ||
| ] |
| { | ||
| "$schema": "http://json-schema.org/draft-07/schema#", | ||
| "schemaVersion": 1, | ||
| "artifact": "findings", | ||
| "title": "Bug Hunter Findings Artifact", | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "required": [ | ||
| "bugId", | ||
| "severity", | ||
| "category", | ||
| "file", | ||
| "lines", | ||
| "claim", | ||
| "evidence", | ||
| "runtimeTrigger", | ||
| "crossReferences", | ||
| "confidenceScore" | ||
| ], | ||
| "properties": { | ||
| "bugId": { "type": "string", "minLength": 1 }, | ||
| "severity": { | ||
| "type": "string", | ||
| "enum": ["Critical", "Medium", "Low"] | ||
| }, | ||
| "category": { | ||
| "type": "string", | ||
| "enum": [ | ||
| "logic", | ||
| "security", | ||
| "error-handling", | ||
| "concurrency", | ||
| "edge-case", | ||
| "data-integrity", | ||
| "type-safety", | ||
| "resource-leak", | ||
| "api-contract", | ||
| "cross-file" | ||
| ] | ||
| }, | ||
| "file": { "type": "string", "minLength": 1 }, | ||
| "lines": { "type": "string", "minLength": 1 }, | ||
| "claim": { "type": "string", "minLength": 1 }, | ||
| "evidence": { "type": "string", "minLength": 1 }, | ||
| "runtimeTrigger": { "type": "string", "minLength": 1 }, | ||
| "crossReferences": { | ||
| "type": "array", | ||
| "items": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "confidenceScore": { | ||
| "type": "number", | ||
| "minimum": 0, | ||
| "maximum": 100 | ||
| }, | ||
| "confidenceLabel": { | ||
| "type": "string", | ||
| "enum": ["high", "medium", "low"] | ||
| }, | ||
| "stride": { | ||
| "type": "string", | ||
| "enum": [ | ||
| "Spoofing", | ||
| "Tampering", | ||
| "Repudiation", | ||
| "InfoDisclosure", | ||
| "DoS", | ||
| "ElevationOfPrivilege", | ||
| "N/A" | ||
| ] | ||
| }, | ||
| "cwe": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "additionalProperties": false | ||
| } | ||
| } |
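For illustration, here is a minimal validator covering only the draft-07 keywords the findings schema uses: `type`, `required`, `properties`, `additionalProperties: false`, `enum`, `minLength`, `minimum`, `maximum`, `items`, and `minItems`. It is a sketch, not `scripts/schema-runtime.cjs`. It ignores `$ref`, so it cannot follow the fix-plan schema's `definitions`. Passing it the findings schema above with the first example fixture should report `$[0]: missing required "claim"`.

```javascript
// Validates `value` against a subset of draft-07; returns error strings (empty = valid).
function validate(schema, value, at = '$') {
  const type = schema.type;
  if (type) {
    const ok =
      type === 'array' ? Array.isArray(value)
        : type === 'object' ? value !== null && typeof value === 'object' && !Array.isArray(value)
        : type === 'integer' ? Number.isInteger(value)
        : typeof value === type; // string, number, boolean
    if (!ok) return [`${at}: expected ${type}`];
  }
  const errors = [];
  if (schema.enum && !schema.enum.includes(value)) {
    errors.push(`${at}: not one of ${schema.enum.join(', ')}`);
  }
  if (typeof value === 'string' && schema.minLength !== undefined && value.length < schema.minLength) {
    errors.push(`${at}: shorter than ${schema.minLength}`);
  }
  if (typeof value === 'number') {
    if (schema.minimum !== undefined && value < schema.minimum) errors.push(`${at}: below ${schema.minimum}`);
    if (schema.maximum !== undefined && value > schema.maximum) errors.push(`${at}: above ${schema.maximum}`);
  }
  if (Array.isArray(value)) {
    if (schema.minItems !== undefined && value.length < schema.minItems) {
      errors.push(`${at}: fewer than ${schema.minItems} items`);
    }
    if (schema.items) {
      value.forEach((item, i) => errors.push(...validate(schema.items, item, `${at}[${i}]`)));
    }
  }
  if (type === 'object') {
    for (const key of schema.required || []) {
      if (!Object.prototype.hasOwnProperty.call(value, key)) errors.push(`${at}: missing required "${key}"`);
    }
    const props = schema.properties || {};
    for (const [key, child] of Object.entries(value)) {
      if (Object.prototype.hasOwnProperty.call(props, key)) errors.push(...validate(props[key], child, `${at}.${key}`));
      else if (schema.additionalProperties === false) errors.push(`${at}: unexpected property "${key}"`);
    }
  }
  return errors;
}
```

Unknown keywords such as `schemaVersion`, `artifact`, and `title` are simply ignored. This is why the repository's schema files can carry their own metadata without breaking validation.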
| { | ||
| "$schema": "http://json-schema.org/draft-07/schema#", | ||
| "schemaVersion": 1, | ||
| "artifact": "fix-plan", | ||
| "title": "Bug Hunter Fix Plan Artifact", | ||
| "type": "object", | ||
| "required": [ | ||
| "generatedAt", | ||
| "confidenceThreshold", | ||
| "canarySize", | ||
| "totals", | ||
| "canary", | ||
| "rollout", | ||
| "manualReview" | ||
| ], | ||
| "properties": { | ||
| "generatedAt": { "type": "string", "minLength": 1 }, | ||
| "confidenceThreshold": { "type": "integer", "minimum": 1 }, | ||
| "canarySize": { "type": "integer", "minimum": 1 }, | ||
| "totals": { | ||
| "type": "object", | ||
| "required": ["findings", "eligible", "canary", "rollout", "manualReview"], | ||
| "properties": { | ||
| "findings": { "type": "integer", "minimum": 0 }, | ||
| "eligible": { "type": "integer", "minimum": 0 }, | ||
| "canary": { "type": "integer", "minimum": 0 }, | ||
| "rollout": { "type": "integer", "minimum": 0 }, | ||
| "manualReview": { "type": "integer", "minimum": 0 } | ||
| }, | ||
| "additionalProperties": false | ||
| }, | ||
| "canary": { "$ref": "#/definitions/fixPlanEntries" }, | ||
| "rollout": { "$ref": "#/definitions/fixPlanEntries" }, | ||
| "manualReview": { "$ref": "#/definitions/fixPlanEntries" } | ||
| }, | ||
| "definitions": { | ||
| "fixPlanEntries": { | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "required": [ | ||
| "bugId", | ||
| "severity", | ||
| "category", | ||
| "file", | ||
| "lines", | ||
| "claim", | ||
| "evidence", | ||
| "runtimeTrigger", | ||
| "crossReferences", | ||
| "confidenceScore", | ||
| "strategy", | ||
| "executionStage", | ||
| "autofixEligible", | ||
| "reason" | ||
| ], | ||
| "properties": { | ||
| "bugId": { "type": "string", "minLength": 1 }, | ||
| "severity": { "type": "string", "enum": ["Critical", "Medium", "Low"] }, | ||
| "category": { "type": "string", "minLength": 1 }, | ||
| "file": { "type": "string", "minLength": 1 }, | ||
| "lines": { "type": "string", "minLength": 1 }, | ||
| "claim": { "type": "string", "minLength": 1 }, | ||
| "evidence": { "type": "string", "minLength": 1 }, | ||
| "runtimeTrigger": { "type": "string", "minLength": 1 }, | ||
| "crossReferences": { | ||
| "type": "array", | ||
| "items": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "confidenceScore": { "type": "number", "minimum": 0, "maximum": 100 }, | ||
| "confidenceLabel": { "type": "string", "enum": ["high", "medium", "low"] }, | ||
| "stride": { "type": "string", "minLength": 1 }, | ||
| "cwe": { "type": "string", "minLength": 1 }, | ||
| "key": { "type": "string", "minLength": 1 }, | ||
| "status": { "type": "string", "minLength": 1 }, | ||
| "source": { "type": "string", "minLength": 1 }, | ||
| "updatedAt": { "type": "string", "minLength": 1 }, | ||
| "strategy": { | ||
| "type": "string", | ||
| "enum": ["safe-autofix", "manual-review", "larger-refactor", "architectural-remediation"] | ||
| }, | ||
| "executionStage": { | ||
| "type": "string", | ||
| "enum": ["canary", "rollout", "manual-review", "report-only"] | ||
| }, | ||
| "autofixEligible": { "type": "boolean" }, | ||
| "reason": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "additionalProperties": false | ||
| } | ||
| } | ||
| }, | ||
| "additionalProperties": false | ||
| } |
| { | ||
| "$schema": "http://json-schema.org/draft-07/schema#", | ||
| "schemaVersion": 1, | ||
| "artifact": "fix-report", | ||
| "title": "Bug Hunter Fix Report Artifact", | ||
| "type": "object", | ||
| "required": [ | ||
| "version", | ||
| "fix_branch", | ||
| "base_commit", | ||
| "dry_run", | ||
| "circuit_breaker_tripped", | ||
| "phase2_timeout_hit", | ||
| "fixes", | ||
| "verification", | ||
| "summary" | ||
| ], | ||
| "properties": { | ||
| "version": { "type": "string", "minLength": 1 }, | ||
| "fix_branch": { "type": "string", "minLength": 1 }, | ||
| "base_commit": { "type": "string", "minLength": 1 }, | ||
| "dry_run": { "type": "boolean" }, | ||
| "circuit_breaker_tripped": { "type": "boolean" }, | ||
| "phase2_timeout_hit": { "type": "boolean" }, | ||
| "fixes": { | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "required": ["bugId", "severity", "status", "files", "lines"], | ||
| "properties": { | ||
| "bugId": { "type": "string", "minLength": 1 }, | ||
| "severity": { | ||
| "type": "string", | ||
| "enum": ["CRITICAL", "HIGH", "MEDIUM", "LOW", "Critical", "Medium", "Low"] | ||
| }, | ||
| "status": { "type": "string", "minLength": 1 }, | ||
| "files": { | ||
| "type": "array", | ||
| "items": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "lines": { "type": "string", "minLength": 1 }, | ||
| "commit": { "type": "string", "minLength": 1 }, | ||
| "description": { "type": "string", "minLength": 1 }, | ||
| "reason": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "additionalProperties": false | ||
| } | ||
| }, | ||
| "verification": { | ||
| "type": "object", | ||
| "required": [ | ||
| "baseline_pass", | ||
| "baseline_fail", | ||
| "flaky_tests", | ||
| "final_pass", | ||
| "final_fail", | ||
| "new_failures", | ||
| "resolved_failures", | ||
| "typecheck_pass", | ||
| "build_pass", | ||
| "fixer_bugs_found" | ||
| ], | ||
| "properties": { | ||
| "baseline_pass": { "type": "integer", "minimum": 0 }, | ||
| "baseline_fail": { "type": "integer", "minimum": 0 }, | ||
| "flaky_tests": { "type": "integer", "minimum": 0 }, | ||
| "final_pass": { "type": "integer", "minimum": 0 }, | ||
| "final_fail": { "type": "integer", "minimum": 0 }, | ||
| "new_failures": { "type": "integer", "minimum": 0 }, | ||
| "resolved_failures": { "type": "integer", "minimum": 0 }, | ||
| "typecheck_pass": { "type": "boolean" }, | ||
| "build_pass": { "type": "boolean" }, | ||
| "fixer_bugs_found": { "type": "integer", "minimum": 0 } | ||
| }, | ||
| "additionalProperties": false | ||
| }, | ||
| "summary": { | ||
| "type": "object", | ||
| "required": [ | ||
| "total_confirmed", | ||
| "eligible", | ||
| "manual_review", | ||
| "fixed", | ||
| "fix_reverted", | ||
| "fix_failed", | ||
| "skipped", | ||
| "fixer_bug", | ||
| "partial" | ||
| ], | ||
| "properties": { | ||
| "total_confirmed": { "type": "integer", "minimum": 0 }, | ||
| "eligible": { "type": "integer", "minimum": 0 }, | ||
| "manual_review": { "type": "integer", "minimum": 0 }, | ||
| "fixed": { "type": "integer", "minimum": 0 }, | ||
| "fix_reverted": { "type": "integer", "minimum": 0 }, | ||
| "fix_failed": { "type": "integer", "minimum": 0 }, | ||
| "skipped": { "type": "integer", "minimum": 0 }, | ||
| "fixer_bug": { "type": "integer", "minimum": 0 }, | ||
| "partial": { "type": "integer", "minimum": 0 } | ||
| }, | ||
| "additionalProperties": false | ||
| } | ||
| }, | ||
| "additionalProperties": false | ||
| } |
| { | ||
| "$schema": "http://json-schema.org/draft-07/schema#", | ||
| "schemaVersion": 1, | ||
| "artifact": "fix-strategy", | ||
| "title": "Bug Hunter Fix Strategy Artifact", | ||
| "type": "object", | ||
| "required": [ | ||
| "version", | ||
| "generatedAt", | ||
| "confidenceThreshold", | ||
| "summary", | ||
| "clusters" | ||
| ], | ||
| "properties": { | ||
| "version": { "type": "string", "minLength": 1 }, | ||
| "generatedAt": { "type": "string", "minLength": 1 }, | ||
| "confidenceThreshold": { "type": "integer", "minimum": 1 }, | ||
| "summary": { | ||
| "type": "object", | ||
| "required": [ | ||
| "confirmed", | ||
| "safeAutofix", | ||
| "manualReview", | ||
| "largerRefactor", | ||
| "architecturalRemediation", | ||
| "canaryCandidates", | ||
| "rolloutCandidates" | ||
| ], | ||
| "properties": { | ||
| "confirmed": { "type": "integer", "minimum": 0 }, | ||
| "safeAutofix": { "type": "integer", "minimum": 0 }, | ||
| "manualReview": { "type": "integer", "minimum": 0 }, | ||
| "largerRefactor": { "type": "integer", "minimum": 0 }, | ||
| "architecturalRemediation": { "type": "integer", "minimum": 0 }, | ||
| "canaryCandidates": { "type": "integer", "minimum": 0 }, | ||
| "rolloutCandidates": { "type": "integer", "minimum": 0 } | ||
| }, | ||
| "additionalProperties": false | ||
| }, | ||
| "clusters": { | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "required": [ | ||
| "clusterId", | ||
| "strategy", | ||
| "executionStage", | ||
| "autofixEligible", | ||
| "bugIds", | ||
| "files", | ||
| "maxSeverity", | ||
| "summary", | ||
| "recommendedAction", | ||
| "reasons" | ||
| ], | ||
| "properties": { | ||
| "clusterId": { "type": "string", "minLength": 1 }, | ||
| "strategy": { | ||
| "type": "string", | ||
| "enum": [ | ||
| "safe-autofix", | ||
| "manual-review", | ||
| "larger-refactor", | ||
| "architectural-remediation" | ||
| ] | ||
| }, | ||
| "executionStage": { | ||
| "type": "string", | ||
| "enum": ["canary", "rollout", "manual-review", "report-only"] | ||
| }, | ||
| "autofixEligible": { "type": "boolean" }, | ||
| "bugIds": { | ||
| "type": "array", | ||
| "minItems": 1, | ||
| "items": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "files": { | ||
| "type": "array", | ||
| "minItems": 1, | ||
| "items": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "maxSeverity": { | ||
| "type": "string", | ||
| "enum": ["CRITICAL", "HIGH", "MEDIUM", "LOW", "Critical", "High", "Medium", "Low"] | ||
| }, | ||
| "summary": { "type": "string", "minLength": 1 }, | ||
| "recommendedAction": { "type": "string", "minLength": 1 }, | ||
| "reasons": { | ||
| "type": "array", | ||
| "minItems": 1, | ||
| "items": { "type": "string", "minLength": 1 } | ||
| } | ||
| }, | ||
| "additionalProperties": false | ||
| } | ||
| } | ||
| }, | ||
| "additionalProperties": false | ||
| } |
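The `summary` block can be derived from `clusters`. This sketch assumes counts are per bug (summing `bugIds.length`) rather than per cluster, which the schema does not state. `summarizeClusters` is a hypothetical name.

```javascript
const STRATEGY_FIELDS = {
  'safe-autofix': 'safeAutofix',
  'manual-review': 'manualReview',
  'larger-refactor': 'largerRefactor',
  'architectural-remediation': 'architecturalRemediation',
};

// Assumption: every summary count is per bug, not per cluster.
function summarizeClusters(clusters) {
  const summary = {
    confirmed: 0, safeAutofix: 0, manualReview: 0, largerRefactor: 0,
    architecturalRemediation: 0, canaryCandidates: 0, rolloutCandidates: 0,
  };
  for (const cluster of clusters) {
    if (!Object.prototype.hasOwnProperty.call(STRATEGY_FIELDS, cluster.strategy)) {
      throw new Error(`Unknown strategy: ${cluster.strategy}`);
    }
    const count = cluster.bugIds.length;
    summary.confirmed += count;
    summary[STRATEGY_FIELDS[cluster.strategy]] += count;
    if (cluster.executionStage === 'canary') summary.canaryCandidates += count;
    if (cluster.executionStage === 'rollout') summary.rolloutCandidates += count;
  }
  return summary;
}
```

Deriving the summary, rather than maintaining it by hand, keeps the counts consistent with the clusters the schema validates.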
| { | ||
| "$schema": "http://json-schema.org/draft-07/schema#", | ||
| "schemaVersion": 1, | ||
| "artifact": "recon", | ||
| "title": "Bug Hunter Recon Artifact", | ||
| "type": "object", | ||
| "required": ["critical", "high", "medium", "contextOnly"], | ||
| "properties": { | ||
| "critical": { | ||
| "type": "array", | ||
| "items": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "high": { | ||
| "type": "array", | ||
| "items": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "medium": { | ||
| "type": "array", | ||
| "items": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "contextOnly": { | ||
| "type": "array", | ||
| "items": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "notes": { | ||
| "type": "array", | ||
| "items": { "type": "string", "minLength": 1 } | ||
| } | ||
| }, | ||
| "additionalProperties": false | ||
| } |
| { | ||
| "$schema": "http://json-schema.org/draft-07/schema#", | ||
| "schemaVersion": 1, | ||
| "artifact": "referee", | ||
| "title": "Bug Hunter Referee Artifact", | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "required": [ | ||
| "bugId", | ||
| "verdict", | ||
| "trueSeverity", | ||
| "confidenceScore", | ||
| "confidenceLabel", | ||
| "verificationMode", | ||
| "analysisSummary" | ||
| ], | ||
| "properties": { | ||
| "bugId": { "type": "string", "minLength": 1 }, | ||
| "verdict": { | ||
| "type": "string", | ||
| "enum": ["REAL_BUG", "NOT_A_BUG", "MANUAL_REVIEW"] | ||
| }, | ||
| "trueSeverity": { | ||
| "type": "string", | ||
| "enum": ["Critical", "Medium", "Low"] | ||
| }, | ||
| "confidenceScore": { | ||
| "type": "number", | ||
| "minimum": 0, | ||
| "maximum": 100 | ||
| }, | ||
| "confidenceLabel": { | ||
| "type": "string", | ||
| "enum": ["high", "medium", "low"] | ||
| }, | ||
| "verificationMode": { | ||
| "type": "string", | ||
| "enum": ["INDEPENDENTLY_VERIFIED", "EVIDENCE_BASED"] | ||
| }, | ||
| "analysisSummary": { "type": "string", "minLength": 1 }, | ||
| "suggestedFix": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "additionalProperties": false | ||
| } | ||
| } |
| { | ||
| "$schema": "http://json-schema.org/draft-07/schema#", | ||
| "schemaVersion": 1, | ||
| "artifact": "shared", | ||
| "title": "Bug Hunter Shared Definitions", | ||
| "$defs": { | ||
| "severity": { | ||
| "type": "string", | ||
| "enum": ["Critical", "Medium", "Low"] | ||
| }, | ||
| "category": { | ||
| "type": "string", | ||
| "enum": [ | ||
| "logic", | ||
| "security", | ||
| "error-handling", | ||
| "concurrency", | ||
| "edge-case", | ||
| "data-integrity", | ||
| "type-safety", | ||
| "resource-leak", | ||
| "api-contract", | ||
| "cross-file" | ||
| ] | ||
| }, | ||
| "stride": { | ||
| "type": "string", | ||
| "enum": [ | ||
| "Spoofing", | ||
| "Tampering", | ||
| "Repudiation", | ||
| "InfoDisclosure", | ||
| "DoS", | ||
| "ElevationOfPrivilege", | ||
| "N/A" | ||
| ] | ||
| }, | ||
| "verificationMode": { | ||
| "type": "string", | ||
| "enum": ["INDEPENDENTLY_VERIFIED", "EVIDENCE_BASED"] | ||
| }, | ||
| "confidenceLabel": { | ||
| "type": "string", | ||
| "enum": ["high", "medium", "low"] | ||
| }, | ||
| "coverageStatus": { | ||
| "type": "string", | ||
| "enum": ["pending", "in_progress", "done", "failed"] | ||
| } | ||
| } | ||
| } |
| { | ||
| "$schema": "http://json-schema.org/draft-07/schema#", | ||
| "schemaVersion": 1, | ||
| "artifact": "skeptic", | ||
| "title": "Bug Hunter Skeptic Artifact", | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "required": ["bugId", "response", "analysisSummary"], | ||
| "properties": { | ||
| "bugId": { "type": "string", "minLength": 1 }, | ||
| "response": { | ||
| "type": "string", | ||
| "enum": ["ACCEPT", "DISPROVE", "MANUAL_REVIEW"] | ||
| }, | ||
| "analysisSummary": { "type": "string", "minLength": 1 }, | ||
| "counterEvidence": { "type": "string", "minLength": 1 } | ||
| }, | ||
| "additionalProperties": false | ||
| } | ||
| } |
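The skeptic and referee artifacts join back to findings by `bugId`. The following hedged sketch produces the confirmed set under these assumptions:

- the referee verdict is final, so only `REAL_BUG` survives, and findings without a referee entry are dropped;
- the referee's `trueSeverity` and confidence overwrite the original finding's values.

`confirmedFindings` is hypothetical. Skeptic output is not consulted, because under this assumption the referee has already weighed it.

```javascript
// Assumption: referee verdicts are final; the returned objects keep only
// findings-schema fields, so they still validate against the findings schema.
function confirmedFindings(findings, referee) {
  const verdicts = new Map(referee.map((entry) => [entry.bugId, entry]));
  return findings
    .filter((finding) => {
      const verdict = verdicts.get(finding.bugId);
      return verdict !== undefined && verdict.verdict === 'REAL_BUG';
    })
    .map((finding) => {
      const verdict = verdicts.get(finding.bugId);
      return {
        ...finding,
        severity: verdict.trueSeverity,
        confidenceScore: verdict.confidenceScore,
        confidenceLabel: verdict.confidenceLabel,
      };
    });
}
```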
| #!/usr/bin/env node | ||
| const childProcess = require('child_process'); | ||
| const path = require('path'); | ||
| function usage() { | ||
| console.error('Usage:'); | ||
| console.error(' pr-scope.cjs resolve <current|recent|pr-number> [--repo-root <path>] [--base <branch>] [--gh-bin <path>] [--git-bin <path>]'); | ||
| } | ||
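| // Collects `--key value` pairs. A flag with no following value (or followed by another flag) is stored as 'true'; positional tokens are skipped. | ||
| function parseOptions(argv) { | ||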
| const options = {}; | ||
| let index = 0; | ||
| while (index < argv.length) { | ||
| const token = argv[index]; | ||
| if (!token.startsWith('--')) { | ||
| index += 1; | ||
| continue; | ||
| } | ||
| const key = token.slice(2); | ||
| const value = argv[index + 1]; | ||
| if (!value || value.startsWith('--')) { | ||
| options[key] = 'true'; | ||
| index += 1; | ||
| continue; | ||
| } | ||
| options[key] = value; | ||
| index += 2; | ||
| } | ||
| return options; | ||
| } | ||
| function runJson(bin, args, cwd) { | ||
| const result = childProcess.spawnSync(bin, args, { | ||
| encoding: 'utf8', | ||
| cwd | ||
| }); | ||
| if (result.status !== 0) { | ||
| const stderr = (result.stderr || '').trim(); | ||
| const stdout = (result.stdout || '').trim(); | ||
| throw new Error(stderr || stdout || `${bin} ${args.join(' ')} failed`); | ||
| } | ||
| const output = (result.stdout || '').trim(); | ||
| return output ? JSON.parse(output) : null; | ||
| } | ||
| function runLines(bin, args, cwd) { | ||
| const result = childProcess.spawnSync(bin, args, { | ||
| encoding: 'utf8', | ||
| cwd | ||
| }); | ||
| if (result.status !== 0) { | ||
| const stderr = (result.stderr || '').trim(); | ||
| const stdout = (result.stdout || '').trim(); | ||
| throw new Error(stderr || stdout || `${bin} ${args.join(' ')} failed`); | ||
| } | ||
| return (result.stdout || '') | ||
| .split(/\r?\n/) | ||
| .map((line) => line.trim()) | ||
| .filter(Boolean); | ||
| } | ||
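| // With no selector argument, `gh pr view` and `gh pr diff` target the pull request for the current branch. | ||
| function ghMetadataSelector(selector) { | ||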
| if (selector === 'current') { | ||
| return []; | ||
| } | ||
| return [String(selector)]; | ||
| } | ||
| function resolveWithGh({ selector, ghBin, cwd }) { | ||
| if (selector === 'recent') { | ||
| const list = runJson(ghBin, [ | ||
| 'pr', | ||
| 'list', | ||
| '--limit', | ||
| '1', | ||
| '--state', | ||
| 'open', | ||
| '--json', | ||
| 'number,title,headRefName,baseRefName,url' | ||
| ], cwd); | ||
| const pr = Array.isArray(list) ? list[0] : null; | ||
| if (!pr) { | ||
| throw new Error('No open pull requests found'); | ||
| } | ||
| const changedFiles = runLines(ghBin, ['pr', 'diff', String(pr.number), '--name-only'], cwd); | ||
| return { | ||
| ok: true, | ||
| source: 'gh', | ||
| selector, | ||
| pr, | ||
| changedFiles | ||
| }; | ||
| } | ||
| const pr = runJson(ghBin, [ | ||
| 'pr', | ||
| 'view', | ||
| ...ghMetadataSelector(selector), | ||
| '--json', | ||
| 'number,title,headRefName,baseRefName,url' | ||
| ], cwd); | ||
| const changedFiles = runLines(ghBin, ['pr', 'diff', ...ghMetadataSelector(selector), '--name-only'], cwd); | ||
| return { | ||
| ok: true, | ||
| source: 'gh', | ||
| selector, | ||
| pr, | ||
| changedFiles | ||
| }; | ||
| } | ||
| function resolveDefaultBaseBranch({ gitBin, cwd, explicitBase }) { | ||
| if (explicitBase) { | ||
| return { baseRefName: explicitBase, diffBaseRef: explicitBase }; | ||
| } | ||
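| let symbolicRef = null; | ||
| try { | ||
| symbolicRef = runLines(gitBin, ['symbolic-ref', 'refs/remotes/origin/HEAD'], cwd)[0]; | ||
| } catch (error) { | ||
| const reason = error instanceof Error ? error.message : String(error); | ||
| throw new Error(`Unable to determine default base branch for git fallback (pass --base <branch>): ${reason}`); | ||
| } | ||
| const match = symbolicRef && symbolicRef.match(/^refs\/remotes\/origin\/(.+)$/); | ||
| if (match && match[1]) { | ||
| return { baseRefName: match[1], diffBaseRef: `origin/${match[1]}` }; | ||
| } | ||
| throw new Error('Unable to determine default base branch for git fallback (pass --base <branch>)'); | ||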
| } | ||
| function resolveWithGitFallback({ gitBin, cwd, base }) { | ||
| const headRefName = runLines(gitBin, ['rev-parse', '--abbrev-ref', 'HEAD'], cwd)[0]; | ||
| const { baseRefName, diffBaseRef } = resolveDefaultBaseBranch({ gitBin, cwd, explicitBase: base }); | ||
| const changedFiles = runLines(gitBin, ['diff', '--name-only', `${diffBaseRef}...${headRefName}`], cwd); | ||
| return { | ||
| ok: true, | ||
| source: 'git', | ||
| selector: 'current', | ||
| pr: { | ||
| number: null, | ||
| title: `Current branch diff (${headRefName} vs ${baseRefName})`, | ||
| headRefName, | ||
| baseRefName, | ||
| url: null | ||
| }, | ||
| changedFiles | ||
| }; | ||
| } | ||
| function resolveScope({ selector, options }) { | ||
| const cwd = path.resolve(options['repo-root'] || process.cwd()); | ||
| const ghBin = options['gh-bin'] || process.env.BUG_HUNTER_GH_BIN || 'gh'; | ||
| const gitBin = options['git-bin'] || process.env.BUG_HUNTER_GIT_BIN || 'git'; | ||
| const base = options.base || null; | ||
| try { | ||
| return resolveWithGh({ selector, ghBin, cwd }); | ||
| } catch (error) { | ||
| if (selector !== 'current') { | ||
| throw error; | ||
| } | ||
| const fallback = resolveWithGitFallback({ gitBin, cwd, base }); | ||
| fallback.fallbackReason = error instanceof Error ? error.message : String(error); | ||
| return fallback; | ||
| } | ||
| } | ||
| function main() { | ||
| const [command, selector, ...rest] = process.argv.slice(2); | ||
| if (command !== 'resolve' || !selector) { | ||
| usage(); | ||
| process.exit(1); | ||
| } | ||
| const options = parseOptions(rest); | ||
| const result = resolveScope({ selector, options }); | ||
| console.log(JSON.stringify(result, null, 2)); | ||
| } | ||
| try { | ||
| main(); | ||
| } catch (error) { | ||
| const message = error instanceof Error ? error.message : String(error); | ||
| console.error(message); | ||
| process.exit(1); | ||
| } |
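The fallback policy in `resolveScope` can be exercised without spawning `gh` or `git`. The sketch below is a hypothetical refactor, not code from this package: `resolveScopeWith`, `resolveViaGh`, and `resolveViaGit` are illustrative names standing in for `resolveWithGh` and `resolveWithGitFallback`. It shows that only the `current` selector falls back to git, and that the gh error is kept as `fallbackReason`.

```javascript
// Hypothetical sketch of the scope-resolution policy in pr-scope.cjs with the
// gh and git steps injected as plain functions, so the branching can be tested
// without spawning processes. resolveViaGh/resolveViaGit are stand-ins.
function resolveScopeWith({ selector, resolveViaGh, resolveViaGit }) {
  try {
    return resolveViaGh(selector);
  } catch (error) {
    // Only the current branch has a local git equivalent; a numbered or
    // "recent" PR cannot be reconstructed from git alone, so rethrow.
    if (selector !== 'current') {
      throw error;
    }
    const fallback = resolveViaGit();
    fallback.fallbackReason = error instanceof Error ? error.message : String(error);
    return fallback;
  }
}

const failingGh = () => {
  throw new Error('gh unavailable');
};
const localGit = () => ({ ok: true, source: 'git', changedFiles: ['src/local.ts'] });

const current = resolveScopeWith({ selector: 'current', resolveViaGh: failingGh, resolveViaGit: localGit });
console.log(current.source, current.fallbackReason); // git gh unavailable

let numberedError = null;
try {
  resolveScopeWith({ selector: '123', resolveViaGh: failingGh, resolveViaGit: localGit });
} catch (error) {
  numberedError = error.message;
}
console.log(numberedError); // gh unavailable
```

Injecting the resolvers is one way to unit-test the branching; the package's own tests instead substitute mock binaries through `--gh-bin` and `--git-bin`.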
| #!/usr/bin/env node | ||
| const fs = require('fs'); | ||
| const path = require('path'); | ||
| function readJson(filePath) { | ||
| return JSON.parse(fs.readFileSync(filePath, 'utf8')); | ||
| } | ||
| function usage() { | ||
| console.error('Usage:'); | ||
| console.error(' render-report.cjs report <findings-json> <referee-json>'); | ||
| console.error(' render-report.cjs coverage <coverage-json>'); | ||
| console.error(' render-report.cjs skeptic <skeptic-json>'); | ||
| console.error(' render-report.cjs referee <referee-json>'); | ||
| console.error(' render-report.cjs fix-report <fix-report-json>'); | ||
| console.error(' render-report.cjs fix-strategy <fix-strategy-json>'); | ||
| } | ||
| function toArray(value) { | ||
| return Array.isArray(value) ? value : []; | ||
| } | ||
| function renderReport({ findingsPath, refereePath }) { | ||
| const findings = toArray(readJson(findingsPath)); | ||
| const verdicts = toArray(readJson(refereePath)); | ||
| const findingByBugId = new Map(findings.map((finding) => [finding.bugId, finding])); | ||
| const confirmed = []; | ||
| const dismissed = []; | ||
| const manualReview = []; | ||
| for (const verdict of verdicts) { | ||
| const finding = findingByBugId.get(verdict.bugId) || null; | ||
| const row = { verdict, finding }; | ||
| if (verdict.verdict === 'REAL_BUG') { | ||
| confirmed.push(row); | ||
| continue; | ||
| } | ||
| if (verdict.verdict === 'MANUAL_REVIEW') { | ||
| manualReview.push(row); | ||
| continue; | ||
| } | ||
| dismissed.push(row); | ||
| } | ||
| const lines = [ | ||
| '# Bug Hunter Report', | ||
| '', | ||
| `- Findings reviewed: ${findings.length}`, | ||
| `- Confirmed: ${confirmed.length}`, | ||
| `- Dismissed: ${dismissed.length}`, | ||
| `- Manual review: ${manualReview.length}`, | ||
| '' | ||
| ]; | ||
| lines.push('## Confirmed Bugs'); | ||
| if (confirmed.length === 0) { | ||
| lines.push('- None'); | ||
| } else { | ||
| for (const { verdict, finding } of confirmed) { | ||
| lines.push(`- ${verdict.bugId} | ${verdict.trueSeverity} | ${finding ? finding.file : 'unknown file'} | ${finding ? finding.claim : verdict.analysisSummary}`); | ||
| lines.push(` Confidence: ${verdict.confidenceScore} (${verdict.confidenceLabel}) | ${verdict.verificationMode}`); | ||
| lines.push(` Analysis: ${verdict.analysisSummary}`); | ||
| } | ||
| } | ||
| lines.push('', '## Manual Review'); | ||
| if (manualReview.length === 0) { | ||
| lines.push('- None'); | ||
| } else { | ||
| for (const { verdict, finding } of manualReview) { | ||
| lines.push(`- ${verdict.bugId} | ${finding ? finding.file : 'unknown file'} | ${finding ? finding.claim : verdict.analysisSummary}`); | ||
| lines.push(` Confidence: ${verdict.confidenceScore} (${verdict.confidenceLabel})`); | ||
| lines.push(` Analysis: ${verdict.analysisSummary}`); | ||
| } | ||
| } | ||
| lines.push('', '## Dismissed Findings'); | ||
| if (dismissed.length === 0) { | ||
| lines.push('- None'); | ||
| } else { | ||
| for (const { verdict, finding } of dismissed) { | ||
| lines.push(`- ${verdict.bugId} | ${finding ? finding.file : 'unknown file'} | ${finding ? finding.claim : 'No finding available'}`); | ||
| lines.push(` Analysis: ${verdict.analysisSummary}`); | ||
| } | ||
| } | ||
| return `${lines.join('\n')}\n`; | ||
| } | ||
| function renderCoverage({ coveragePath }) { | ||
| const coverage = readJson(coveragePath); | ||
| const lines = [ | ||
| '# Bug Hunter Coverage', | ||
| '', | ||
| `- Status: ${coverage.status}`, | ||
| `- Iteration: ${coverage.iteration}`, | ||
| `- Files: ${toArray(coverage.files).length}`, | ||
| `- Bugs: ${toArray(coverage.bugs).length}`, | ||
| `- Fix entries: ${toArray(coverage.fixes).length}`, | ||
| '', | ||
| '## Files' | ||
| ]; | ||
| const files = toArray(coverage.files); | ||
| if (files.length === 0) { | ||
| lines.push('- None'); | ||
| } else { | ||
| for (const entry of files) { | ||
| lines.push(`- ${entry.status} | ${entry.path}`); | ||
| } | ||
| } | ||
| lines.push('', '## Bugs'); | ||
| const bugs = toArray(coverage.bugs); | ||
| if (bugs.length === 0) { | ||
| lines.push('- None'); | ||
| } else { | ||
| for (const bug of bugs) { | ||
| lines.push(`- ${bug.bugId} | ${bug.severity} | ${bug.file} | ${bug.claim}`); | ||
| } | ||
| } | ||
| lines.push('', '## Fixes'); | ||
| const fixes = toArray(coverage.fixes); | ||
| if (fixes.length === 0) { | ||
| lines.push('- None'); | ||
| } else { | ||
| for (const fix of fixes) { | ||
| lines.push(`- ${fix.bugId} | ${fix.status}`); | ||
| } | ||
| } | ||
| return `${lines.join('\n')}\n`; | ||
| } | ||
| function renderSkeptic({ skepticPath }) { | ||
| const skeptic = toArray(readJson(skepticPath)); | ||
| const lines = ['# Skeptic Review', '']; | ||
| if (skeptic.length === 0) { | ||
| lines.push('- None'); | ||
| return `${lines.join('\n')}\n`; | ||
| } | ||
| for (const item of skeptic) { | ||
| lines.push(`- ${item.bugId} | ${item.response}`); | ||
| lines.push(` ${item.analysisSummary}`); | ||
| if (item.counterEvidence) { | ||
| lines.push(` Evidence: ${item.counterEvidence}`); | ||
| } | ||
| } | ||
| return `${lines.join('\n')}\n`; | ||
| } | ||
| function renderReferee({ refereePath }) { | ||
| const referee = toArray(readJson(refereePath)); | ||
| const lines = ['# Referee Verdicts', '']; | ||
| if (referee.length === 0) { | ||
| lines.push('- None'); | ||
| return `${lines.join('\n')}\n`; | ||
| } | ||
| for (const item of referee) { | ||
| lines.push(`- ${item.bugId} | ${item.verdict} | ${item.trueSeverity}`); | ||
| lines.push(` Confidence: ${item.confidenceScore} (${item.confidenceLabel}) | ${item.verificationMode}`); | ||
| lines.push(` Analysis: ${item.analysisSummary}`); | ||
| if (item.suggestedFix) { | ||
| lines.push(` Suggested fix: ${item.suggestedFix}`); | ||
| } | ||
| } | ||
| return `${lines.join('\n')}\n`; | ||
| } | ||
| function renderFixReport({ fixReportPath }) { | ||
| const report = readJson(fixReportPath); | ||
| const fixes = toArray(report.fixes); | ||
| const lines = [ | ||
| '# Fix Report', | ||
| '', | ||
| `- Branch: ${report.fix_branch}`, | ||
| `- Base commit: ${report.base_commit}`, | ||
| `- Dry run: ${report.dry_run ? 'yes' : 'no'}`, | ||
| `- Circuit breaker: ${report.circuit_breaker_tripped ? 'tripped' : 'not tripped'}`, | ||
| `- Phase 2 timeout: ${report.phase2_timeout_hit ? 'hit' : 'not hit'}`, | ||
| '', | ||
| '## Fixes' | ||
| ]; | ||
| if (fixes.length === 0) { | ||
| lines.push('- None'); | ||
| } else { | ||
| for (const item of fixes) { | ||
| lines.push(`- ${item.bugId} | ${item.status} | ${item.severity}`); | ||
| lines.push(` Files: ${toArray(item.files).join(', ')}`); | ||
| lines.push(` Lines: ${item.lines}`); | ||
| if (item.description) { | ||
| lines.push(` Description: ${item.description}`); | ||
| } | ||
| if (item.reason) { | ||
| lines.push(` Reason: ${item.reason}`); | ||
| } | ||
| if (item.commit) { | ||
| lines.push(` Commit: ${item.commit}`); | ||
| } | ||
| } | ||
| } | ||
| lines.push('', '## Verification'); | ||
| lines.push(`- Baseline: ${report.verification.baseline_pass} pass / ${report.verification.baseline_fail} fail`); | ||
| lines.push(`- Final: ${report.verification.final_pass} pass / ${report.verification.final_fail} fail`); | ||
| lines.push(`- New failures: ${report.verification.new_failures}`); | ||
| lines.push(`- Resolved failures: ${report.verification.resolved_failures}`); | ||
| lines.push(`- Typecheck: ${report.verification.typecheck_pass ? 'pass' : 'fail'}`); | ||
| lines.push(`- Build: ${report.verification.build_pass ? 'pass' : 'fail'}`); | ||
| lines.push(`- Fixer bugs found: ${report.verification.fixer_bugs_found}`); | ||
| lines.push('', '## Summary'); | ||
| for (const [key, value] of Object.entries(report.summary || {})) { | ||
| lines.push(`- ${key}: ${value}`); | ||
| } | ||
| return `${lines.join('\n')}\n`; | ||
| } | ||
| function renderFixStrategy({ fixStrategyPath }) { | ||
| const strategy = readJson(fixStrategyPath); | ||
| const clusters = toArray(strategy.clusters); | ||
| const lines = [ | ||
| '# Fix Strategy', | ||
| '', | ||
| `- Confidence threshold: ${strategy.confidenceThreshold}`, | ||
| `- Confirmed findings: ${strategy.summary.confirmed}`, | ||
| `- Safe autofix: ${strategy.summary.safeAutofix}`, | ||
| `- Manual review: ${strategy.summary.manualReview}`, | ||
| `- Larger refactor: ${strategy.summary.largerRefactor}`, | ||
| `- Architectural remediation: ${strategy.summary.architecturalRemediation}`, | ||
| `- Canary candidates: ${strategy.summary.canaryCandidates}`, | ||
| `- Rollout candidates: ${strategy.summary.rolloutCandidates}`, | ||
| '', | ||
| '## Clusters' | ||
| ]; | ||
| if (clusters.length === 0) { | ||
| lines.push('- None'); | ||
| } else { | ||
| for (const cluster of clusters) { | ||
| lines.push(`- ${cluster.clusterId} | ${cluster.strategy} | ${cluster.executionStage} | max severity ${cluster.maxSeverity}`); | ||
| lines.push(` Bugs: ${toArray(cluster.bugIds).join(', ')}`); | ||
| lines.push(` Files: ${toArray(cluster.files).join(', ')}`); | ||
| lines.push(` Summary: ${cluster.summary}`); | ||
| lines.push(` Action: ${cluster.recommendedAction}`); | ||
| lines.push(` Reasons: ${toArray(cluster.reasons).join(' | ')}`); | ||
| } | ||
| } | ||
| return `${lines.join('\n')}\n`; | ||
| } | ||
| function main() { | ||
| const [command, ...args] = process.argv.slice(2); | ||
| if (!command) { | ||
| usage(); | ||
| process.exit(1); | ||
| } | ||
| if (command === 'report') { | ||
| const [findingsPath, refereePath] = args; | ||
| if (!findingsPath || !refereePath) { | ||
| usage(); | ||
| process.exit(1); | ||
| } | ||
| process.stdout.write(renderReport({ | ||
| findingsPath: path.resolve(findingsPath), | ||
| refereePath: path.resolve(refereePath) | ||
| })); | ||
| return; | ||
| } | ||
| if (command === 'coverage') { | ||
| const [coveragePath] = args; | ||
| if (!coveragePath) { | ||
| usage(); | ||
| process.exit(1); | ||
| } | ||
| process.stdout.write(renderCoverage({ | ||
| coveragePath: path.resolve(coveragePath) | ||
| })); | ||
| return; | ||
| } | ||
| if (command === 'skeptic') { | ||
| const [skepticPath] = args; | ||
| if (!skepticPath) { | ||
| usage(); | ||
| process.exit(1); | ||
| } | ||
| process.stdout.write(renderSkeptic({ | ||
| skepticPath: path.resolve(skepticPath) | ||
| })); | ||
| return; | ||
| } | ||
| if (command === 'referee') { | ||
| const [refereePath] = args; | ||
| if (!refereePath) { | ||
| usage(); | ||
| process.exit(1); | ||
| } | ||
| process.stdout.write(renderReferee({ | ||
| refereePath: path.resolve(refereePath) | ||
| })); | ||
| return; | ||
| } | ||
| if (command === 'fix-report') { | ||
| const [fixReportPath] = args; | ||
| if (!fixReportPath) { | ||
| usage(); | ||
| process.exit(1); | ||
| } | ||
| process.stdout.write(renderFixReport({ | ||
| fixReportPath: path.resolve(fixReportPath) | ||
| })); | ||
| return; | ||
| } | ||
| if (command === 'fix-strategy') { | ||
| const [fixStrategyPath] = args; | ||
| if (!fixStrategyPath) { | ||
| usage(); | ||
| process.exit(1); | ||
| } | ||
| process.stdout.write(renderFixStrategy({ | ||
| fixStrategyPath: path.resolve(fixStrategyPath) | ||
| })); | ||
| return; | ||
| } | ||
| usage(); | ||
| process.exit(1); | ||
| } | ||
| try { | ||
| main(); | ||
| } catch (error) { | ||
| const message = error instanceof Error ? error.message : String(error); | ||
| console.error(message); | ||
| process.exit(1); | ||
| } |
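`renderReport` joins each referee verdict to its finding by `bugId` and then buckets it by verdict string. The sketch below reproduces only that grouping step, assuming the same field names; `bucketVerdicts` is a hypothetical name and `NOT_A_BUG` is an illustrative verdict value. It makes two edge cases visible: a verdict with no matching finding keeps `finding: null`, and a finding with no verdict (BUG-3 here) lands in none of the three sections, although it still counts toward "Findings reviewed".

```javascript
// Sketch of the grouping in render-report.cjs: verdicts are joined to findings
// by bugId, then bucketed. Any verdict other than REAL_BUG or MANUAL_REVIEW is
// treated as dismissed.
function bucketVerdicts(findings, verdicts) {
  const findingByBugId = new Map(findings.map((finding) => [finding.bugId, finding]));
  const buckets = { confirmed: [], manualReview: [], dismissed: [] };
  for (const verdict of verdicts) {
    const row = { verdict, finding: findingByBugId.get(verdict.bugId) || null };
    if (verdict.verdict === 'REAL_BUG') {
      buckets.confirmed.push(row);
    } else if (verdict.verdict === 'MANUAL_REVIEW') {
      buckets.manualReview.push(row);
    } else {
      buckets.dismissed.push(row);
    }
  }
  return buckets;
}

const findings = [
  { bugId: 'BUG-1', file: 'src/api.ts' },
  { bugId: 'BUG-2', file: 'src/db.ts' },
  { bugId: 'BUG-3', file: 'src/ui.ts' } // no verdict: rendered in no section
];
const verdicts = [
  { bugId: 'BUG-1', verdict: 'REAL_BUG' },
  { bugId: 'BUG-2', verdict: 'NOT_A_BUG' }, // illustrative verdict string
  { bugId: 'BUG-9', verdict: 'MANUAL_REVIEW' } // no matching finding
];
const buckets = bucketVerdicts(findings, verdicts);
console.log(buckets.confirmed.length, buckets.manualReview.length, buckets.dismissed.length); // 1 1 1
console.log(buckets.manualReview[0].finding); // null
```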
| const fs = require('fs'); | ||
| const path = require('path'); | ||
| const SCHEMA_FILES = { | ||
| recon: 'recon.schema.json', | ||
| findings: 'findings.schema.json', | ||
| skeptic: 'skeptic.schema.json', | ||
| referee: 'referee.schema.json', | ||
| coverage: 'coverage.schema.json', | ||
| 'fix-report': 'fix-report.schema.json', | ||
| 'fix-plan': 'fix-plan.schema.json', | ||
| 'fix-strategy': 'fix-strategy.schema.json', | ||
| shared: 'shared.schema.json' | ||
| }; | ||
| const SCHEMA_CACHE = new Map(); | ||
| function getSchemaDir() { | ||
| return path.resolve(__dirname, '..', 'schemas'); | ||
| } | ||
| function getKnownArtifacts() { | ||
| return Object.keys(SCHEMA_FILES).filter((name) => name !== 'shared'); | ||
| } | ||
| function getSchemaPath(artifactName) { | ||
| // Own-property lookup so names such as "toString" are not mistaken for schemas. | ||
| const fileName = Object.prototype.hasOwnProperty.call(SCHEMA_FILES, artifactName) ? SCHEMA_FILES[artifactName] : null; | ||
| if (!fileName) { | ||
| throw new Error(`Unknown artifact schema: ${artifactName}`); | ||
| } | ||
| return path.join(getSchemaDir(), fileName); | ||
| } | ||
| function loadArtifactSchema(artifactName) { | ||
| if (!Object.prototype.hasOwnProperty.call(SCHEMA_FILES, artifactName)) { | ||
| throw new Error(`Unknown artifact schema: ${artifactName}`); | ||
| } | ||
| if (!SCHEMA_CACHE.has(artifactName)) { | ||
| const schemaPath = getSchemaPath(artifactName); | ||
| const schema = JSON.parse(fs.readFileSync(schemaPath, 'utf8')); | ||
| if (!Number.isInteger(schema.schemaVersion) || schema.schemaVersion <= 0) { | ||
| throw new Error(`Schema ${artifactName} is missing a valid schemaVersion`); | ||
| } | ||
| SCHEMA_CACHE.set(artifactName, { schema, schemaPath }); | ||
| } | ||
| return SCHEMA_CACHE.get(artifactName); | ||
| } | ||
| function createSchemaRef(artifactName) { | ||
| const { schema, schemaPath } = loadArtifactSchema(artifactName); | ||
| return { | ||
| artifact: artifactName, | ||
| schemaVersion: schema.schemaVersion, | ||
| schemaFile: path.relative(path.resolve(__dirname, '..'), schemaPath) | ||
| }; | ||
| } | ||
| function validateSchemaRef(reference) { | ||
| const errors = []; | ||
| if (!reference || typeof reference !== 'object' || Array.isArray(reference)) { | ||
| return { ok: false, errors: ['outputSchema must be an object'] }; | ||
| } | ||
| const artifactName = String(reference.artifact || '').trim(); | ||
| if (!artifactName) { | ||
| errors.push('outputSchema.artifact must be a non-empty string'); | ||
| } else if (!getKnownArtifacts().includes(artifactName)) { | ||
| errors.push(`outputSchema.artifact must be one of: ${getKnownArtifacts().join(', ')}`); | ||
| } | ||
| if (!Number.isInteger(reference.schemaVersion) || reference.schemaVersion <= 0) { | ||
| errors.push('outputSchema.schemaVersion must be a positive integer'); | ||
| } | ||
| if (artifactName && getKnownArtifacts().includes(artifactName)) { | ||
| const { schema, schemaPath } = loadArtifactSchema(artifactName); | ||
| const expectedRelativePath = path.relative(path.resolve(__dirname, '..'), schemaPath); | ||
| if (reference.schemaVersion !== schema.schemaVersion) { | ||
| errors.push(`outputSchema.schemaVersion must match ${artifactName} schema version ${schema.schemaVersion}`); | ||
| } | ||
| if ('schemaFile' in reference && reference.schemaFile !== expectedRelativePath) { | ||
| errors.push(`outputSchema.schemaFile must match ${expectedRelativePath}`); | ||
| } | ||
| } | ||
| return { ok: errors.length === 0, errors }; | ||
| } | ||
| function describeType(value) { | ||
| if (Array.isArray(value)) { | ||
| return 'array'; | ||
| } | ||
| if (value === null) { | ||
| return 'null'; | ||
| } | ||
| return typeof value; | ||
| } | ||
| function resolveRef(rootSchema, ref) { | ||
| if (!ref.startsWith('#/')) { | ||
| throw new Error(`Unsupported schema ref: ${ref}`); | ||
| } | ||
| const parts = ref | ||
| .slice(2) | ||
| .split('/') | ||
| .map((part) => part.replace(/~1/g, '/').replace(/~0/g, '~')); | ||
| let current = rootSchema; | ||
| for (const part of parts) { | ||
| if (!current || typeof current !== 'object' || !(part in current)) { | ||
| throw new Error(`Unable to resolve schema ref: ${ref}`); | ||
| } | ||
| current = current[part]; | ||
| } | ||
| return current; | ||
| } | ||
| function validateAgainstSchema({ value, schema, rootSchema, jsonPath, errors }) { | ||
| if (schema.$ref) { | ||
| const resolved = resolveRef(rootSchema, schema.$ref); | ||
| validateAgainstSchema({ value, schema: resolved, rootSchema, jsonPath, errors }); | ||
| return; | ||
| } | ||
| if (schema.const !== undefined && value !== schema.const) { | ||
| errors.push(`${jsonPath} must equal ${JSON.stringify(schema.const)}`); | ||
| return; | ||
| } | ||
| if (schema.enum && !schema.enum.includes(value)) { | ||
| errors.push(`${jsonPath} must be one of: ${schema.enum.join(', ')}`); | ||
| return; | ||
| } | ||
| if (schema.type === 'object') { | ||
| if (describeType(value) !== 'object') { | ||
| errors.push(`${jsonPath} must be an object`); | ||
| return; | ||
| } | ||
| const properties = schema.properties || {}; | ||
| const required = schema.required || []; | ||
| for (const propertyName of required) { | ||
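| if (!Object.prototype.hasOwnProperty.call(value, propertyName)) { | ||
| errors.push(`${jsonPath}.${propertyName} is required`); | ||
| } | ||
| } | ||
| for (const [propertyName, propertyValue] of Object.entries(value)) { | ||
| // Own-property lookup: inherited names like "constructor" or "__proto__" must not bypass additionalProperties. | ||
| if (Object.prototype.hasOwnProperty.call(properties, propertyName) && properties[propertyName]) { | ||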
| validateAgainstSchema({ | ||
| value: propertyValue, | ||
| schema: properties[propertyName], | ||
| rootSchema, | ||
| jsonPath: `${jsonPath}.${propertyName}`, | ||
| errors | ||
| }); | ||
| continue; | ||
| } | ||
| if (schema.additionalProperties === false) { | ||
| errors.push(`${jsonPath}.${propertyName} is not allowed`); | ||
| } | ||
| } | ||
| return; | ||
| } | ||
| if (schema.type === 'array') { | ||
| if (!Array.isArray(value)) { | ||
| errors.push(`${jsonPath} must be an array`); | ||
| return; | ||
| } | ||
| if (Number.isInteger(schema.minItems) && value.length < schema.minItems) { | ||
| errors.push(`${jsonPath} must contain at least ${schema.minItems} item(s)`); | ||
| } | ||
| if (schema.items) { | ||
| value.forEach((item, index) => { | ||
| validateAgainstSchema({ | ||
| value: item, | ||
| schema: schema.items, | ||
| rootSchema, | ||
| jsonPath: `${jsonPath}[${index}]`, | ||
| errors | ||
| }); | ||
| }); | ||
| } | ||
| return; | ||
| } | ||
| if (schema.type === 'string') { | ||
| if (typeof value !== 'string') { | ||
| errors.push(`${jsonPath} must be a string`); | ||
| return; | ||
| } | ||
| if (Number.isInteger(schema.minLength) && value.length < schema.minLength) { | ||
| errors.push(schema.minLength === 1 ? `${jsonPath} must not be empty` : `${jsonPath} must be at least ${schema.minLength} character(s) long`); | ||
| } | ||
| if (schema.pattern) { | ||
| const matcher = new RegExp(schema.pattern); | ||
| if (!matcher.test(value)) { | ||
| errors.push(`${jsonPath} must match ${schema.pattern}`); | ||
| } | ||
| } | ||
| return; | ||
| } | ||
| if (schema.type === 'number') { | ||
| if (typeof value !== 'number' || !Number.isFinite(value)) { | ||
| errors.push(`${jsonPath} must be a number`); | ||
| return; | ||
| } | ||
| if (typeof schema.minimum === 'number' && value < schema.minimum) { | ||
| errors.push(`${jsonPath} must be >= ${schema.minimum}`); | ||
| } | ||
| if (typeof schema.maximum === 'number' && value > schema.maximum) { | ||
| errors.push(`${jsonPath} must be <= ${schema.maximum}`); | ||
| } | ||
| return; | ||
| } | ||
| if (schema.type === 'integer') { | ||
| if (!Number.isInteger(value)) { | ||
| errors.push(`${jsonPath} must be an integer`); | ||
| return; | ||
| } | ||
| if (typeof schema.minimum === 'number' && value < schema.minimum) { | ||
| errors.push(`${jsonPath} must be >= ${schema.minimum}`); | ||
| } | ||
| if (typeof schema.maximum === 'number' && value > schema.maximum) { | ||
| errors.push(`${jsonPath} must be <= ${schema.maximum}`); | ||
| } | ||
| return; | ||
| } | ||
| if (schema.type === 'boolean' && typeof value !== 'boolean') { | ||
| errors.push(`${jsonPath} must be a boolean`); | ||
| } | ||
| } | ||
| function validateArtifactValue({ artifactName, value }) { | ||
| const { schema, schemaPath } = loadArtifactSchema(artifactName); | ||
| const errors = []; | ||
| validateAgainstSchema({ | ||
| value, | ||
| schema, | ||
| rootSchema: schema, | ||
| jsonPath: '$', | ||
| errors | ||
| }); | ||
| return { | ||
| ok: errors.length === 0, | ||
| artifact: artifactName, | ||
| schemaVersion: schema.schemaVersion, | ||
| schemaFile: path.relative(path.resolve(__dirname, '..'), schemaPath), | ||
| errors | ||
| }; | ||
| } | ||
| function validateArtifactFile({ artifactName, filePath }) { | ||
| try { | ||
| const value = JSON.parse(fs.readFileSync(filePath, 'utf8')); | ||
| return validateArtifactValue({ artifactName, value }); | ||
| } catch (error) { | ||
| const message = error instanceof Error ? error.message : String(error); | ||
| return { ok: false, artifact: artifactName, errors: [message] }; | ||
| } | ||
| } | ||
| module.exports = { | ||
| createSchemaRef, | ||
| getKnownArtifacts, | ||
| getSchemaPath, | ||
| loadArtifactSchema, | ||
| validateArtifactFile, | ||
| validateArtifactValue, | ||
| validateSchemaRef | ||
| }; |
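`resolveRef` follows only local `#/...` references and unescapes each segment as a JSON Pointer (RFC 6901), where `~1` encodes `/` and `~0` encodes `~`. The order of the two replacements matters: decoding `~1` first is what lets `~01` come out as the literal name `~1` rather than `/`. The sketch below is a close standalone copy, named `resolveLocalRef` (a hypothetical name), that uses an own-property check in place of the `in` test; the schema fragment is invented for illustration.

```javascript
// Standalone copy of the pointer walk in resolveRef: only "#/..." refs are
// supported, and each segment is unescaped "~1" -> "/" before "~0" -> "~".
function resolveLocalRef(rootSchema, ref) {
  if (!ref.startsWith('#/')) {
    throw new Error(`Unsupported schema ref: ${ref}`);
  }
  const parts = ref
    .slice(2)
    .split('/')
    .map((part) => part.replace(/~1/g, '/').replace(/~0/g, '~'));
  let current = rootSchema;
  for (const part of parts) {
    // Own-property check so inherited names cannot satisfy a ref.
    if (!current || typeof current !== 'object' || !Object.prototype.hasOwnProperty.call(current, part)) {
      throw new Error(`Unable to resolve schema ref: ${ref}`);
    }
    current = current[part];
  }
  return current;
}

// Hypothetical schema fragment with awkward definition names.
const root = {
  definitions: {
    'a/b': { type: 'string' },
    '~1': { type: 'integer' }
  }
};
console.log(resolveLocalRef(root, '#/definitions/a~1b').type); // string
console.log(resolveLocalRef(root, '#/definitions/~01').type); // integer
```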
| #!/usr/bin/env node | ||
| const path = require('path'); | ||
| const { | ||
| getKnownArtifacts, | ||
| validateArtifactFile | ||
| } = require('./schema-runtime.cjs'); | ||
| function usage() { | ||
| console.error('Usage:'); | ||
| console.error(' schema-validate.cjs <artifact-name> <file-path>'); | ||
| console.error(''); | ||
| console.error(`Artifacts: ${getKnownArtifacts().join(', ')}`); | ||
| } | ||
| function main() { | ||
| const [artifactName, targetPath] = process.argv.slice(2); | ||
| if (!artifactName || !targetPath) { | ||
| usage(); | ||
| process.exit(1); | ||
| } | ||
| const result = validateArtifactFile({ | ||
| artifactName, | ||
| filePath: path.resolve(targetPath) | ||
| }); | ||
| console.log(JSON.stringify(result)); | ||
| if (!result.ok) { | ||
| process.exit(1); | ||
| } | ||
| } | ||
| try { | ||
| main(); | ||
| } catch (error) { | ||
| const message = error instanceof Error ? error.message : String(error); | ||
| console.error(message); | ||
| process.exit(1); | ||
| } |
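`schema-validate.cjs` prints a single JSON line and exits non-zero when an artifact fails validation, which gives a caller what it needs to reject malformed phase output and retry. A minimal sketch of such a retry loop follows, assuming a validator that returns the same `{ ok, errors }` shape; `produceValidArtifact`, `produce`, and `validate` are hypothetical names, and the stubs stand in for a real phase and for `validateArtifactValue`. The error string mirrors the validator's `<path>.<property> is required` format.

```javascript
// Hypothetical retry loop around a phase that must emit a valid artifact.
// produce(feedback) re-runs the phase with the previous validation errors;
// validate(value) returns { ok, errors } like schema-validate.cjs.
function produceValidArtifact({ produce, validate, maxAttempts = 2 }) {
  let feedback = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const value = produce(feedback);
    const result = validate(value);
    if (result.ok) {
      return { ok: true, attempts: attempt, value };
    }
    feedback = result.errors;
  }
  return { ok: false, attempts: maxAttempts, errors: feedback };
}

// Stub phase: the first attempt omits bugId, the retry includes it.
const produce = (feedback) => (feedback.length === 0
  ? [{ claim: 'x' }]
  : [{ bugId: 'BUG-1', claim: 'x' }]);
// Stub validator that only checks the required bugId field.
const validate = (value) => {
  const errors = value.flatMap((item, index) => ('bugId' in item ? [] : [`$[${index}].bugId is required`]));
  return { ok: errors.length === 0, errors };
};

const outcome = produceValidArtifact({ produce, validate });
console.log(outcome.ok, outcome.attempts); // true 2
```

Feeding the previous errors back keeps each retry targeted, and capping `maxAttempts` stops a persistently malformed phase from looping forever.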
| const assert = require('node:assert/strict'); | ||
| const fs = require('fs'); | ||
| const path = require('path'); | ||
| const test = require('node:test'); | ||
| const { | ||
| makeSandbox, | ||
| resolveSkillScript, | ||
| runJson, | ||
| runRaw | ||
| } = require('./test-utils.cjs'); | ||
| function writeExecutable(filePath, content) { | ||
| fs.mkdirSync(path.dirname(filePath), { recursive: true }); | ||
| fs.writeFileSync(filePath, content, 'utf8'); | ||
| fs.chmodSync(filePath, 0o755); | ||
| } | ||
| test('pr-scope resolves the current PR via gh metadata', () => { | ||
| const sandbox = makeSandbox('pr-scope-current-'); | ||
| const script = resolveSkillScript('pr-scope.cjs'); | ||
| const ghPath = path.join(sandbox, 'gh-mock.cjs'); | ||
| writeExecutable(ghPath, `#!/usr/bin/env node | ||
| const args = process.argv.slice(2); | ||
| if (args[0] === 'pr' && args[1] === 'view') { | ||
| process.stdout.write(JSON.stringify({ number: 42, title: 'Fix auth flow', headRefName: 'feature/auth', baseRefName: 'main', url: 'https://example.test/pr/42' })); | ||
| process.exit(0); | ||
| } | ||
| if (args[0] === 'pr' && args[1] === 'diff') { | ||
| process.stdout.write('src/auth.ts\\nsrc/session.ts\\n'); | ||
| process.exit(0); | ||
| } | ||
| process.exit(1); | ||
| `); | ||
| const result = runJson('node', [script, 'resolve', 'current', '--repo-root', sandbox, '--gh-bin', ghPath]); | ||
| assert.equal(result.ok, true); | ||
| assert.equal(result.source, 'gh'); | ||
| assert.equal(result.pr.number, 42); | ||
| assert.deepEqual(result.changedFiles, ['src/auth.ts', 'src/session.ts']); | ||
| }); | ||
| test('pr-scope falls back to git when current PR metadata is unavailable', () => { | ||
| const sandbox = makeSandbox('pr-scope-fallback-'); | ||
| const script = resolveSkillScript('pr-scope.cjs'); | ||
| const ghPath = path.join(sandbox, 'gh-fail.cjs'); | ||
| const gitPath = path.join(sandbox, 'git-mock.cjs'); | ||
| writeExecutable(ghPath, `#!/usr/bin/env node | ||
| process.stderr.write('gh unavailable'); | ||
| process.exit(1); | ||
| `); | ||
| writeExecutable(gitPath, `#!/usr/bin/env node | ||
| const args = process.argv.slice(2); | ||
| if (args[0] === 'rev-parse') { | ||
| process.stdout.write('feature/local\\n'); | ||
| process.exit(0); | ||
| } | ||
| if (args[0] === 'diff') { | ||
| process.stdout.write('src/local.ts\\n'); | ||
| process.exit(0); | ||
| } | ||
| process.exit(1); | ||
| `); | ||
| const result = runJson('node', [ | ||
| script, | ||
| 'resolve', | ||
| 'current', | ||
| '--repo-root', | ||
| sandbox, | ||
| '--gh-bin', | ||
| ghPath, | ||
| '--git-bin', | ||
| gitPath, | ||
| '--base', | ||
| 'develop' | ||
| ]); | ||
| assert.equal(result.ok, true); | ||
| assert.equal(result.source, 'git'); | ||
| assert.equal(result.pr.headRefName, 'feature/local'); | ||
| assert.equal(result.pr.baseRefName, 'develop'); | ||
| assert.deepEqual(result.changedFiles, ['src/local.ts']); | ||
| }); | ||
| test('pr-scope resolves the most recent PR via gh list + diff', () => { | ||
| const sandbox = makeSandbox('pr-scope-recent-'); | ||
| const script = resolveSkillScript('pr-scope.cjs'); | ||
| const ghPath = path.join(sandbox, 'gh-recent.cjs'); | ||
| writeExecutable(ghPath, `#!/usr/bin/env node | ||
| const args = process.argv.slice(2); | ||
| if (args[0] === 'pr' && args[1] === 'list') { | ||
| process.stdout.write(JSON.stringify([{ number: 7, title: 'Recent PR', headRefName: 'feature/recent', baseRefName: 'main', url: 'https://example.test/pr/7' }])); | ||
| process.exit(0); | ||
| } | ||
| if (args[0] === 'pr' && args[1] === 'diff') { | ||
| process.stdout.write('src/recent.ts\\n'); | ||
| process.exit(0); | ||
| } | ||
| process.exit(1); | ||
| `); | ||
| const result = runJson('node', [script, 'resolve', 'recent', '--repo-root', sandbox, '--gh-bin', ghPath]); | ||
| assert.equal(result.ok, true); | ||
| assert.equal(result.source, 'gh'); | ||
| assert.equal(result.pr.number, 7); | ||
| assert.deepEqual(result.changedFiles, ['src/recent.ts']); | ||
| }); | ||
| test('pr-scope uses the discovered default branch for current-branch fallback', () => { | ||
| const sandbox = makeSandbox('pr-scope-default-branch-'); | ||
| const script = resolveSkillScript('pr-scope.cjs'); | ||
| const ghPath = path.join(sandbox, 'gh-fail.cjs'); | ||
| const gitPath = path.join(sandbox, 'git-default.cjs'); | ||
| writeExecutable(ghPath, `#!/usr/bin/env node | ||
| process.stderr.write('gh unavailable'); | ||
| process.exit(1); | ||
| `); | ||
| writeExecutable(gitPath, `#!/usr/bin/env node | ||
| const args = process.argv.slice(2); | ||
| if (args[0] === 'rev-parse' && args[1] === '--abbrev-ref') { | ||
| process.stdout.write('feature/local\\n'); | ||
| process.exit(0); | ||
| } | ||
| if (args[0] === 'symbolic-ref') { | ||
| process.stdout.write('refs/remotes/origin/trunk\\n'); | ||
| process.exit(0); | ||
| } | ||
| if (args[0] === 'diff' && args[2] === 'origin/trunk...feature/local') { | ||
| process.stdout.write('src/from-trunk.ts\\n'); | ||
| process.exit(0); | ||
| } | ||
| process.stderr.write('unexpected command: ' + args.join(' ')); | ||
| process.exit(1); | ||
| `); | ||
| const result = runJson('node', [ | ||
| script, | ||
| 'resolve', | ||
| 'current', | ||
| '--repo-root', | ||
| sandbox, | ||
| '--gh-bin', | ||
| ghPath, | ||
| '--git-bin', | ||
| gitPath | ||
| ]); | ||
| assert.equal(result.ok, true); | ||
| assert.equal(result.source, 'git'); | ||
| assert.equal(result.pr.baseRefName, 'trunk'); | ||
| assert.deepEqual(result.changedFiles, ['src/from-trunk.ts']); | ||
| }); | ||
| test('pr-scope fails current-branch fallback when no trustworthy base branch is available', () => { | ||
| const sandbox = makeSandbox('pr-scope-no-base-'); | ||
| const script = resolveSkillScript('pr-scope.cjs'); | ||
| const ghPath = path.join(sandbox, 'gh-fail.cjs'); | ||
| const gitPath = path.join(sandbox, 'git-partial.cjs'); | ||
| writeExecutable(ghPath, `#!/usr/bin/env node | ||
| process.stderr.write('gh unavailable'); | ||
| process.exit(1); | ||
| `); | ||
| writeExecutable(gitPath, `#!/usr/bin/env node | ||
| const args = process.argv.slice(2); | ||
| if (args[0] === 'rev-parse' && args[1] === '--abbrev-ref') { | ||
| process.stdout.write('feature/local\\n'); | ||
| process.exit(0); | ||
| } | ||
| process.stderr.write('missing default branch'); | ||
| process.exit(1); | ||
| `); | ||
| const result = runRaw('node', [ | ||
| script, | ||
| 'resolve', | ||
| 'current', | ||
| '--repo-root', | ||
| sandbox, | ||
| '--gh-bin', | ||
| ghPath, | ||
| '--git-bin', | ||
| gitPath | ||
| ], { | ||
| encoding: 'utf8' | ||
| }); | ||
| assert.notEqual(result.status, 0); | ||
| assert.match(`${result.stdout || ''}${result.stderr || ''}`, /base branch|default branch|missing default branch/i); | ||
| }); | ||
| test('pr-scope fails for numbered PRs when gh metadata cannot be resolved', () => { | ||
| const sandbox = makeSandbox('pr-scope-numbered-'); | ||
| const script = resolveSkillScript('pr-scope.cjs'); | ||
| const ghPath = path.join(sandbox, 'gh-fail.cjs'); | ||
| writeExecutable(ghPath, `#!/usr/bin/env node | ||
| process.stderr.write('not found'); | ||
| process.exit(1); | ||
| `); | ||
| const result = runRaw('node', [script, 'resolve', '123', '--repo-root', sandbox, '--gh-bin', ghPath], { | ||
| encoding: 'utf8' | ||
| }); | ||
| assert.notEqual(result.status, 0); | ||
| assert.match(`${result.stdout || ''}${result.stderr || ''}`, /not found/); | ||
| }); |
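The pr-scope tests above replace `gh` and `git` with small Node scripts and pass their paths through `--gh-bin` and `--git-bin`. The real `writeExecutable` and `runRaw` helpers live in `test-utils.cjs`, which is not shown here. The sketch below is a hypothetical re-implementation of the same stub-binary technique using only Node's standard library. `writeStub` and `runStub` are illustrative names, not part of the package.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { spawnSync } = require('child_process');

// Hypothetical stand-in for test-utils' writeExecutable: write a Node script
// and mark it executable so it can impersonate a CLI such as `gh` or `git`.
function writeStub(filePath, source) {
  fs.writeFileSync(filePath, source, 'utf8');
  fs.chmodSync(filePath, 0o755);
}

// Run the stub through the current Node binary, so the example also works on
// platforms that ignore the shebang line.
function runStub(stubPath, args) {
  return spawnSync(process.execPath, [stubPath, ...args], { encoding: 'utf8' });
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'stub-demo-'));
const gitStub = path.join(dir, 'git-stub.cjs');

// The stub answers exactly one command and fails loudly on anything else,
// which is what lets a test detect unexpected calls.
writeStub(gitStub, `#!/usr/bin/env node
const args = process.argv.slice(2);
if (args[0] === 'rev-parse' && args[1] === '--abbrev-ref') {
  process.stdout.write('feature/local\\n');
  process.exit(0);
}
process.stderr.write('unexpected command: ' + args.join(' '));
process.exit(1);
`);

const ok = runStub(gitStub, ['rev-parse', '--abbrev-ref', 'HEAD']);
const bad = runStub(gitStub, ['log']);
```

Failing on unrecognized arguments is the important design choice here. A stub that silently succeeds would let a script under test call the wrong command without any test noticing.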
| const assert = require('node:assert/strict'); | ||
| const path = require('path'); | ||
| const test = require('node:test'); | ||
| const { | ||
| makeSandbox, | ||
| resolveSkillScript, | ||
| runRaw, | ||
| writeJson | ||
| } = require('./test-utils.cjs'); | ||
| test('render-report renders a markdown summary from findings and referee JSON', () => { | ||
| const sandbox = makeSandbox('render-report-'); | ||
| const script = resolveSkillScript('render-report.cjs'); | ||
| const findingsPath = path.join(sandbox, 'findings.json'); | ||
| const refereePath = path.join(sandbox, 'referee.json'); | ||
| writeJson(findingsPath, [ | ||
| { | ||
| bugId: 'BUG-1', | ||
| severity: 'Critical', | ||
| category: 'security', | ||
| file: 'src/api.ts', | ||
| lines: '10-12', | ||
| claim: 'User input reaches an unsafe sink', | ||
| evidence: 'src/api.ts:10-12 ...', | ||
| runtimeTrigger: 'POST /api with attacker input', | ||
| crossReferences: ['Single file'], | ||
| confidenceScore: 90 | ||
| } | ||
| ]); | ||
| writeJson(refereePath, [ | ||
| { | ||
| bugId: 'BUG-1', | ||
| verdict: 'REAL_BUG', | ||
| trueSeverity: 'Critical', | ||
| confidenceScore: 91, | ||
| confidenceLabel: 'high', | ||
| verificationMode: 'INDEPENDENTLY_VERIFIED', | ||
| analysisSummary: 'Confirmed by tracing the sink.' | ||
| } | ||
| ]); | ||
| const result = runRaw('node', [script, 'report', findingsPath, refereePath], { | ||
| encoding: 'utf8' | ||
| }); | ||
| assert.equal(result.status, 0); | ||
| assert.match(result.stdout, /# Bug Hunter Report/); | ||
| assert.match(result.stdout, /BUG-1 \| Critical \| src\/api.ts/); | ||
| assert.match(result.stdout, /Confirmed by tracing the sink/); | ||
| }); | ||
| test('render-report renders coverage markdown from coverage JSON', () => { | ||
| const sandbox = makeSandbox('render-coverage-'); | ||
| const script = resolveSkillScript('render-report.cjs'); | ||
| const coveragePath = path.join(sandbox, 'coverage.json'); | ||
| writeJson(coveragePath, { | ||
| schemaVersion: 1, | ||
| iteration: 2, | ||
| status: 'COMPLETE', | ||
| files: [{ path: 'src/a.ts', status: 'done' }], | ||
| bugs: [{ bugId: 'BUG-1', severity: 'Low', file: 'src/a.ts', claim: 'example' }], | ||
| fixes: [{ bugId: 'BUG-1', status: 'MANUAL_REVIEW' }] | ||
| }); | ||
| const result = runRaw('node', [script, 'coverage', coveragePath], { | ||
| encoding: 'utf8' | ||
| }); | ||
| assert.equal(result.status, 0); | ||
| assert.match(result.stdout, /# Bug Hunter Coverage/); | ||
| assert.match(result.stdout, /done \| src\/a.ts/); | ||
| assert.match(result.stdout, /BUG-1 \| Low \| src\/a.ts \| example/); | ||
| }); | ||
| test('render-report renders a markdown summary from fix-report JSON', () => { | ||
| const sandbox = makeSandbox('render-fix-report-'); | ||
| const script = resolveSkillScript('render-report.cjs'); | ||
| const fixReportPath = path.join(sandbox, 'fix-report.json'); | ||
| writeJson(fixReportPath, { | ||
| version: '3.0.0', | ||
| fix_branch: 'bug-hunter-fix-20260311-200000', | ||
| base_commit: 'abc123', | ||
| dry_run: false, | ||
| circuit_breaker_tripped: false, | ||
| phase2_timeout_hit: false, | ||
| fixes: [ | ||
| { | ||
| bugId: 'BUG-1', | ||
| severity: 'CRITICAL', | ||
| status: 'FIXED', | ||
| files: ['src/a.ts'], | ||
| lines: '10-12', | ||
| commit: 'def456', | ||
| description: 'Parameterized the query.' | ||
| } | ||
| ], | ||
| verification: { | ||
| baseline_pass: 10, | ||
| baseline_fail: 1, | ||
| flaky_tests: 0, | ||
| final_pass: 11, | ||
| final_fail: 0, | ||
| new_failures: 0, | ||
| resolved_failures: 1, | ||
| typecheck_pass: true, | ||
| build_pass: true, | ||
| fixer_bugs_found: 0 | ||
| }, | ||
| summary: { | ||
| total_confirmed: 1, | ||
| eligible: 1, | ||
| manual_review: 0, | ||
| fixed: 1, | ||
| fix_reverted: 0, | ||
| fix_failed: 0, | ||
| skipped: 0, | ||
| fixer_bug: 0, | ||
| partial: 0 | ||
| } | ||
| }); | ||
| const result = runRaw('node', [script, 'fix-report', fixReportPath], { | ||
| encoding: 'utf8' | ||
| }); | ||
| assert.equal(result.status, 0); | ||
| assert.match(result.stdout, /# Fix Report/); | ||
| assert.match(result.stdout, /BUG-1 \| FIXED \| CRITICAL/); | ||
| assert.match(result.stdout, /Parameterized the query/); | ||
| }); | ||
| test('render-report renders a markdown summary from fix-strategy JSON', () => { | ||
| const sandbox = makeSandbox('render-fix-strategy-'); | ||
| const script = resolveSkillScript('render-report.cjs'); | ||
| const fixStrategyPath = path.join(sandbox, 'fix-strategy.json'); | ||
| writeJson(fixStrategyPath, { | ||
| version: '3.1.0', | ||
| generatedAt: '2026-03-12T00:00:00.000Z', | ||
| confidenceThreshold: 75, | ||
| summary: { | ||
| confirmed: 2, | ||
| safeAutofix: 1, | ||
| manualReview: 1, | ||
| largerRefactor: 0, | ||
| architecturalRemediation: 0, | ||
| canaryCandidates: 1, | ||
| rolloutCandidates: 0 | ||
| }, | ||
| clusters: [ | ||
| { | ||
| clusterId: 'cluster-1', | ||
| strategy: 'safe-autofix', | ||
| executionStage: 'canary', | ||
| autofixEligible: true, | ||
| bugIds: ['BUG-1'], | ||
| files: ['src/a.ts'], | ||
| maxSeverity: 'CRITICAL', | ||
| summary: '1 bug(s) in src classified as safe-autofix.', | ||
| recommendedAction: 'Proceed through the guarded fix pipeline with canary verification and rollback safety.', | ||
| reasons: ['Finding is localized enough for a guarded surgical fix.'] | ||
| } | ||
| ] | ||
| }); | ||
| const result = runRaw('node', [script, 'fix-strategy', fixStrategyPath], { | ||
| encoding: 'utf8' | ||
| }); | ||
| assert.equal(result.status, 0); | ||
| assert.match(result.stdout, /# Fix Strategy/); | ||
| assert.match(result.stdout, /cluster-1 \| safe-autofix \| canary/); | ||
| assert.match(result.stdout, /guarded fix pipeline/); | ||
| }); |
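The first render-report test expects a row like `BUG-1 | Critical | src/api.ts` that is built by joining `findings.json` with `referee.json`. The real `render-report.cjs` output format is only known from those regexes. The sketch below is a hypothetical renderer that reuses the fixture field names (`bugId`, `verdict`, `trueSeverity`, `analysisSummary`). The table layout and the `NOT_A_BUG` verdict in the sample are assumptions.

```javascript
// Hypothetical sketch: join canonical findings with referee verdicts by bugId
// and emit one Markdown table row per bug the Referee confirmed.
function renderConfirmedRows(findings, refereeVerdicts) {
  const verdictById = new Map(refereeVerdicts.map((v) => [v.bugId, v]));
  const lines = [
    '# Bug Hunter Report',
    '',
    '| Bug | Severity | File | Lines | Summary |',
    '| --- | --- | --- | --- | --- |'
  ];
  for (const finding of findings) {
    const verdict = verdictById.get(finding.bugId);
    // Only REAL_BUG verdicts reach the report table.
    if (!verdict || verdict.verdict !== 'REAL_BUG') continue;
    // Prefer the Referee's re-assessed severity and summary over the Hunter's claim.
    const severity = verdict.trueSeverity || finding.severity;
    const summary = verdict.analysisSummary || finding.claim;
    lines.push(`| ${finding.bugId} | ${severity} | ${finding.file} | ${finding.lines} | ${summary} |`);
  }
  return lines.join('\n');
}

const markdown = renderConfirmedRows(
  [
    { bugId: 'BUG-1', severity: 'Critical', file: 'src/api.ts', lines: '10-12', claim: 'User input reaches an unsafe sink' },
    { bugId: 'BUG-2', severity: 'Low', file: 'src/b.ts', lines: '3', claim: 'Unused variable' }
  ],
  [
    { bugId: 'BUG-1', verdict: 'REAL_BUG', trueSeverity: 'Critical', analysisSummary: 'Confirmed by tracing the sink.' },
    { bugId: 'BUG-2', verdict: 'NOT_A_BUG' } // hypothetical non-confirming verdict
  ]
);
console.log(markdown);
```

Rendering from the joined JSON keeps Markdown a derived view. A finding the Referee rejected cannot appear in the report, because the report never reads the Hunter's output on its own.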
| const assert = require('node:assert/strict'); | ||
| const fs = require('fs'); | ||
| const test = require('node:test'); | ||
| const { resolveSkillScript } = require('./test-utils.cjs'); | ||
| test('main SKILL routes into bundled local security skills', () => { | ||
| const skillDoc = fs.readFileSync(resolveSkillScript('..', 'SKILL.md'), 'utf8'); | ||
| assert.match(skillDoc, /skills\/commit-security-scan\/SKILL\.md/); | ||
| assert.match(skillDoc, /skills\/security-review\/SKILL\.md/); | ||
| assert.match(skillDoc, /skills\/threat-model-generation\/SKILL\.md/); | ||
| assert.match(skillDoc, /skills\/vulnerability-validation\/SKILL\.md/); | ||
| assert.match(skillDoc, /--pr-security/); | ||
| assert.match(skillDoc, /--security-review/); | ||
| assert.match(skillDoc, /--validate-security/); | ||
| }); | ||
| test('README documents the integrated enterprise security pack flows', () => { | ||
| const readme = fs.readFileSync(resolveSkillScript('..', 'README.md'), 'utf8'); | ||
| assert.match(readme, /PR-focused security review routes into `commit-security-scan`/); | ||
| assert.match(readme, /`--threat-model` routes into `threat-model-generation`/); | ||
| assert.match(readme, /enterprise\/full security review routes into `security-review`/); | ||
| assert.match(readme, /`--pr-security`/); | ||
| assert.match(readme, /`--security-review`/); | ||
| assert.match(readme, /`--validate-security`/); | ||
| }); |
| const assert = require('node:assert/strict'); | ||
| const fs = require('fs'); | ||
| const test = require('node:test'); | ||
| const { resolveSkillScript } = require('./test-utils.cjs'); | ||
| test('package.json ships the bundled local security skills', () => { | ||
| const packageJson = require(resolveSkillScript('..', 'package.json')); | ||
| assert.equal(Array.isArray(packageJson.files), true); | ||
| assert.equal(packageJson.files.includes('skills/'), true); | ||
| }); | ||
| test('bundled local security skills exist with SKILL.md entrypoints', () => { | ||
| const skillNames = [ | ||
| 'commit-security-scan', | ||
| 'security-review', | ||
| 'threat-model-generation', | ||
| 'vulnerability-validation' | ||
| ]; | ||
| for (const skillName of skillNames) { | ||
| const skillPath = resolveSkillScript('..', 'skills', skillName, 'SKILL.md'); | ||
| assert.equal(fs.existsSync(skillPath), true, `${skillName} should exist`); | ||
| const contents = fs.readFileSync(skillPath, 'utf8'); | ||
| assert.match(contents, /^---/); | ||
| assert.match(contents, /name:/); | ||
| assert.match(contents, /description:/); | ||
| } | ||
| }); |
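The bundled-skills test only checks that `---`, `name:`, and `description:` appear somewhere in each `SKILL.md`. A stricter variant could require those keys inside the leading frontmatter block and require non-empty values. The sketch below is hypothetical (`parseFrontmatter` and `hasRequiredSkillFields` are illustrative names). It handles only single-line `key: value` pairs, not full YAML.

```javascript
// Extract the frontmatter between the leading `---` fences into a flat
// key/value object. Returns null when the file has no frontmatter block.
function parseFrontmatter(contents) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(contents);
  if (!match) return null;
  const fields = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(':');
    if (idx <= 0) continue; // skip blank lines and lines without a key
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

// A skill entrypoint is valid only if both required keys are present and non-empty.
function hasRequiredSkillFields(contents) {
  const fields = parseFrontmatter(contents);
  return Boolean(fields && fields.name && fields.description);
}

const sample = [
  '---',
  'name: commit-security-scan',
  'description: Scan code changes for security vulnerabilities.',
  '---',
  '# Commit Security Scan'
].join('\n');
```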
| --- | ||
| name: commit-security-scan | ||
| description: Scan code changes for security vulnerabilities using Bug Hunter-native artifacts and STRIDE context. Use whenever the user asks for PR security review, commit-diff scanning, staged-change security checks, branch-comparison security review, or pre-merge security analysis of changed code. | ||
| --- | ||
| # Commit Security Scan | ||
| This is a bundled local Bug Hunter companion skill. It is portable and self-contained: use `.bug-hunter/*` artifacts, never `.factory/*` paths. | ||
| ## Purpose | ||
| Review *changed code* for security issues only. This skill is optimized for: | ||
| - PR review | ||
| - staged diff review | ||
| - branch diff review | ||
| - commit / commit-range security scanning | ||
| ## Inputs | ||
| Resolve the scan scope from the user request: | ||
| - PR review → use `scripts/pr-scope.cjs` | ||
| - staged review → use `git diff --cached --name-only` | ||
| - branch diff → use `git diff --name-only <base>...<head>` | ||
| - commit range → use `git diff --name-only <base>..<head>` | ||
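The non-PR scopes above map directly onto `git diff` argument vectors. The sketch below is a hypothetical helper (the scope names `staged`, `branch`, and `range` are assumptions). It leaves PR scope to `scripts/pr-scope.cjs`, as the skill does, and passes arguments as an array instead of a shell string.

```javascript
const { execFileSync } = require('child_process');

// Hypothetical mapping from scan scope to `git diff` arguments.
function gitDiffArgsForScope(scope, base, head) {
  switch (scope) {
    case 'staged':
      // Files staged in the index, compared with HEAD.
      return ['diff', '--cached', '--name-only'];
    case 'branch':
      // Three dots: changes on head since it diverged from base (merge-base diff).
      return ['diff', '--name-only', `${base}...${head}`];
    case 'range':
      // Two dots: direct diff between the two commits.
      return ['diff', '--name-only', `${base}..${head}`];
    default:
      throw new Error(`unsupported scope for git diff: ${scope}`);
  }
}

// Run git and return the changed paths, one per non-empty output line.
function listChangedFiles(scope, base, head, gitBin = 'git') {
  const out = execFileSync(gitBin, gitDiffArgsForScope(scope, base, head), { encoding: 'utf8' });
  return out.split('\n').map((line) => line.trim()).filter(Boolean);
}
```

The two-dot versus three-dot choice matters for branch review. `base...head` excludes commits that landed on the base branch after the fork point, so the scan only covers the branch's own changes.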
| ## Workflow | ||
| 1. Ensure threat-model context exists. | ||
| - Preferred artifacts: | ||
| - `.bug-hunter/threat-model.md` | ||
| - `.bug-hunter/security-config.json` | ||
| - If missing, run the bundled `threat-model-generation` skill first. | ||
| 2. Resolve the changed-file scope. | ||
| 3. Read the full contents of the changed source files, not just the patch. | ||
| 4. Focus on STRIDE-oriented issues in changed code: | ||
| - Spoofing: auth/session/token mistakes | ||
| - Tampering: SQLi, XSS, path traversal, command injection, mass assignment | ||
| - Repudiation: security-sensitive actions with no auditability | ||
| - Information Disclosure: IDOR, secret exposure, verbose errors | ||
| - DoS: unbounded input, missing limits, expensive regex/queries | ||
| - Elevation of Privilege: missing authorization, role bypass, privilege escalation | ||
| 5. Reuse Bug Hunter-native security conventions: | ||
| - findings should be compatible with `.bug-hunter/findings.json` | ||
| - use STRIDE + CWE labels | ||
| - include confidence scores | ||
| 6. If the user wants only a focused security diff review, stop after the findings report. | ||
| If the user wants deeper validation, hand off to the bundled `vulnerability-validation` skill. | ||
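Step 5 asks for findings compatible with `.bug-hunter/findings.json`, labeled with STRIDE and CWE, and carrying confidence scores. The skill does not name the label fields. The sketch below reuses the canonical fields from the render-report fixture and adds hypothetical `stride` and `cwe` keys. The STRIDE spellings and the 0-100 score range are also assumptions.

```javascript
// Assumed JSON spellings for the six STRIDE categories listed in step 4.
const STRIDE_CATEGORIES = [
  'Spoofing',
  'Tampering',
  'Repudiation',
  'InformationDisclosure',
  'DenialOfService',
  'ElevationOfPrivilege'
];

// Hypothetical builder: canonical findings.json fields plus assumed labels.
function makeSecurityFinding(input) {
  const finding = {
    bugId: input.bugId,
    severity: input.severity,
    category: 'security',
    file: input.file,
    lines: input.lines,
    claim: input.claim,
    evidence: input.evidence,
    runtimeTrigger: input.runtimeTrigger,
    crossReferences: input.crossReferences || ['Single file'],
    confidenceScore: input.confidenceScore,
    stride: input.stride, // assumed field name
    cwe: input.cwe // assumed field name, e.g. 'CWE-89'
  };
  const problems = [];
  if (!STRIDE_CATEGORIES.includes(finding.stride)) {
    problems.push(`unknown STRIDE category: ${finding.stride}`);
  }
  if (!/^CWE-\d+$/.test(String(finding.cwe))) {
    problems.push(`malformed CWE id: ${finding.cwe}`);
  }
  // Assumed 0-100 scale, consistent with the scores used in the fixtures.
  const score = finding.confidenceScore;
  if (typeof score !== 'number' || score < 0 || score > 100) {
    problems.push('confidenceScore must be a number from 0 to 100');
  }
  if (problems.length > 0) throw new Error(problems.join('; '));
  return finding;
}
```

Rejecting a malformed record at construction time mirrors the pipeline's JSON-first stance. A bad label fails immediately instead of surfacing later as a rendering gap.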
| ## Output | ||
| Preferred outputs: | ||
| - `.bug-hunter/findings.json` when integrating with the main Bug Hunter pipeline | ||
| - `.bug-hunter/report.md` as a rendered companion if needed | ||
| ## Notes | ||
| - This skill is intentionally diff-scoped; it does not replace full-repository audits. | ||
| - Use it as the lightweight security fast-path before invoking the broader `security-review` flow. |
| # Bundled Local Security Skills | ||
| Bug Hunter ships with a local security pack under `skills/` so the repository stays portable and self-contained. | ||
| Included skills: | ||
| - `commit-security-scan` | ||
| - `security-review` | ||
| - `threat-model-generation` | ||
| - `vulnerability-validation` | ||
| ## How They Connect to Bug Hunter | ||
| These skills are part of the main Bug Hunter orchestration flow: | ||
| - PR-focused security review routes into `commit-security-scan` | ||
| - `--threat-model` routes into `threat-model-generation` | ||
| - `--security-review` routes into `security-review` | ||
| - `--validate-security` routes into `vulnerability-validation` | ||
| Bug Hunter remains the top-level orchestrator. These bundled skills provide focused security workflows and operate on Bug Hunter-native artifacts under `.bug-hunter/`. |
| --- | ||
| name: security-review | ||
| description: Run a focused STRIDE-based security review using Bug Hunter-native artifacts. Use whenever the user asks for a full security audit, repository security review, weekly security scan, PR security review with deeper validation, or wants dependency CVEs and threat-model context combined into one workflow. | ||
| --- | ||
| # Security Review | ||
| This is a bundled local Bug Hunter companion skill. It packages a security-focused review workflow without introducing any external marketplace dependency. | ||
| ## Purpose | ||
| Use this skill for deeper security audits than a simple bug hunt, especially when the user wants: | ||
| - a full security review | ||
| - PR security validation | ||
| - weekly security scanning | ||
| - dependency reachability + code review together | ||
| - threat-model-driven analysis | ||
| ## Workflow | ||
| 1. Ensure `.bug-hunter/threat-model.md` exists. | ||
| - If missing, invoke the bundled `threat-model-generation` skill. | ||
| 2. Determine the scan mode from the request: | ||
| - PR → diff-scoped review via `commit-security-scan` | ||
| - staged → staged-only security review | ||
| - weekly → recent commit range on the default branch | ||
| - full → full repository security audit | ||
| 3. If dependency scanning is relevant, run: | ||
| - `node scripts/dep-scan.cjs --target <path> --output .bug-hunter/dep-findings.json` | ||
| 4. Scan code for STRIDE threats using Bug Hunter-native conventions. | ||
| Reuse: | ||
| - `.bug-hunter/triage.json` | ||
| - `.bug-hunter/threat-model.md` | ||
| - `.bug-hunter/security-config.json` | ||
| - `.bug-hunter/dep-findings.json` | ||
| 5. Validate severe findings using the bundled `vulnerability-validation` skill. | ||
| 6. Produce structured outputs compatible with the Bug Hunter pipeline. | ||
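Steps 1 and 3 can be planned before any scanning starts: check which `.bug-hunter/` artifacts already exist, then build the `dep-scan.cjs` invocation. The sketch below is hypothetical (`planSecurityReview` and its options are illustrative). The artifact names and the `--target`/`--output` flags come from the workflow above.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical planner: decide whether threat-model-generation must run first,
// list reusable artifacts, and build the dep-scan argument vector.
function planSecurityReview(repoRoot, { scanDeps = false, target = '.' } = {}) {
  const dir = path.join(repoRoot, '.bug-hunter');
  const has = (name) => fs.existsSync(path.join(dir, name));
  const steps = [];
  if (!has('threat-model.md')) steps.push('threat-model-generation');
  const reuse = ['triage.json', 'threat-model.md', 'security-config.json', 'dep-findings.json'].filter(has);
  // Arguments for `node scripts/dep-scan.cjs ...`, kept as an array so no shell quoting is needed.
  const depScanArgs = scanDeps
    ? ['scripts/dep-scan.cjs', '--target', target, '--output', '.bug-hunter/dep-findings.json']
    : null;
  return { steps, reuse, depScanArgs };
}
```

A caller could pass `depScanArgs` to `execFileSync(process.execPath, depScanArgs, { cwd: repoRoot })`. Using an argument vector avoids the quoting problems that a templated shell string would have with unusual target paths.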
| ## Outputs | ||
| Primary artifacts should stay inside `.bug-hunter/`: | ||
| - `.bug-hunter/findings.json` | ||
| - `.bug-hunter/referee.json` | ||
| - `.bug-hunter/report.md` | ||
| - `.bug-hunter/dep-findings.json` when dependency review is enabled | ||
| - `.bug-hunter/fix-strategy.json` if the user wants remediation planning | ||
| ## Important constraints | ||
| - Keep all paths Bug Hunter-native; do not emit `.factory/*` artifacts. | ||
| - Prefer validated, exploitability-aware findings over raw volume. | ||
| - For patching requests, hand findings back to the normal Bug Hunter fix pipeline rather than inventing a second patch system. |
| --- | ||
| name: threat-model-generation | ||
| description: Generate or refresh a STRIDE-based threat model for the current repository using Bug Hunter-native artifacts. Use whenever the repository has no threat model yet, the architecture changed materially, a security review needs fresh trust-boundary context, or the user explicitly asks for a threat model. | ||
| --- | ||
| # Threat Model Generation | ||
| This is a bundled local Bug Hunter companion skill. It generates portable threat-model artifacts under `.bug-hunter/`. | ||
| ## Purpose | ||
| Create the security context that the other security skills depend on: | ||
| - trust boundaries | ||
| - major components | ||
| - STRIDE threats | ||
| - vulnerability pattern library | ||
| - severity/config defaults | ||
| ## Required outputs | ||
| Write: | ||
| - `.bug-hunter/threat-model.md` | ||
| - `.bug-hunter/security-config.json` | ||
| ## Workflow | ||
| 1. Read `.bug-hunter/triage.json` if available for file structure and domain hints. | ||
| 2. Inspect the repository to identify: | ||
| - languages and frameworks | ||
| - public/authenticated/internal entry points | ||
| - data stores and external integrations | ||
| - sensitive assets and trust boundaries | ||
| 3. Generate a concise STRIDE threat model. | ||
| 4. Generate a matching security config with thresholds and tech-stack metadata. | ||
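The two required outputs can be written together so they never drift apart. The sketch below is hypothetical. The skill says only that the config carries "thresholds and tech-stack metadata", so the keys `schemaVersion`, `techStack`, and `thresholds.minConfidence` are assumptions. The default of 75 simply echoes the `confidenceThreshold` in the fix-strategy fixture.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical writer for .bug-hunter/threat-model.md and
// .bug-hunter/security-config.json. All config keys are assumed.
function writeThreatModelArtifacts(repoRoot, model) {
  const dir = path.join(repoRoot, '.bug-hunter');
  fs.mkdirSync(dir, { recursive: true });

  // Keep the Markdown short: trust boundaries, then one line per STRIDE threat.
  const md = [
    '# Threat Model',
    '',
    '## Trust Boundaries',
    ...model.trustBoundaries.map((b) => `- ${b}`),
    '',
    '## STRIDE Threats',
    ...model.threats.map((t) => `- ${t.stride}: ${t.description}`)
  ].join('\n') + '\n';

  const config = {
    schemaVersion: 1,
    techStack: model.techStack,
    thresholds: { minConfidence: model.minConfidence ?? 75 }
  };

  const mdPath = path.join(dir, 'threat-model.md');
  const configPath = path.join(dir, 'security-config.json');
  fs.writeFileSync(mdPath, md);
  fs.writeFileSync(configPath, JSON.stringify(config, null, 2) + '\n');
  return { mdPath, configPath };
}
```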
| ## Existing implementation hooks | ||
| Bug Hunter already has a native prompt for this capability: | ||
| - `prompts/threat-model.md` | ||
| Prefer reusing that prompt structure and artifact conventions rather than inventing a second format. | ||
| ## Output rules | ||
| - Keep the threat model short enough for downstream agents to consume. | ||
| - Be specific about trust boundaries and vulnerable code patterns. | ||
| - Keep all artifacts under `.bug-hunter/`, never `.factory/`. |
| --- | ||
| name: vulnerability-validation | ||
| description: Validate security findings for exploitability, reachability, and real-world impact using Bug Hunter-native findings artifacts. Use after security scans, before patch generation, or whenever the user wants confirmation that a suspected vulnerability is actually exploitable. | ||
| --- | ||
| # Vulnerability Validation | ||
| This is a bundled local Bug Hunter companion skill. It strengthens the security-specific parts of the Skeptic/Referee process. | ||
| ## Purpose | ||
| Take suspected or confirmed security findings and answer: | ||
| - Is the vulnerable path reachable? | ||
| - Can an attacker control the input? | ||
| - Are there existing mitigations? | ||
| - How exploitable is it really? | ||
| - What are the CVSS score, proof of concept, and impact level? | ||
| ## Inputs | ||
| Prefer Bug Hunter-native artifacts: | ||
| - `.bug-hunter/findings.json` | ||
| - `.bug-hunter/threat-model.md` | ||
| - `.bug-hunter/security-config.json` | ||
| - `.bug-hunter/dep-findings.json` when dependency issues are involved | ||
| ## Workflow | ||
| 1. Read the findings and isolate the security ones. | ||
| 2. Trace reachability: | ||
| - EXTERNAL | ||
| - AUTHENTICATED | ||
| - INTERNAL | ||
| - UNREACHABLE | ||
| 3. Trace exploitability: | ||
| - EASY | ||
| - MEDIUM | ||
| - HARD | ||
| - NOT_EXPLOITABLE | ||
| 4. Check for mitigations already present in code, framework behavior, or deployment assumptions. | ||
| 5. For confirmed HIGH/CRITICAL security bugs, generate: | ||
| - exploitation path | ||
| - benign proof of concept | ||
| - CVSS vector + score | ||
| 6. Feed the result back into Bug Hunter-native verdicting. | ||
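The reachability and exploitability labels above lend themselves to a small, explicit triage step. The sketch below is hypothetical. The labels come from the workflow, but the decision rule is one reading of steps 2-5, not a documented algorithm. Under that rule, unreachable or non-exploitable findings are dismissed with a stated reason, and only surviving HIGH/CRITICAL findings are flagged for an exploitation path, PoC, and CVSS vector.

```javascript
// Labels taken from the workflow above.
const REACHABILITY = ['EXTERNAL', 'AUTHENTICATED', 'INTERNAL', 'UNREACHABLE'];
const EXPLOITABILITY = ['EASY', 'MEDIUM', 'HARD', 'NOT_EXPLOITABLE'];

// Hypothetical classifier over a finding that already carries both labels.
function classifyForValidation(finding) {
  if (!REACHABILITY.includes(finding.reachability)) {
    throw new Error(`unknown reachability: ${finding.reachability}`);
  }
  if (!EXPLOITABILITY.includes(finding.exploitability)) {
    throw new Error(`unknown exploitability: ${finding.exploitability}`);
  }
  const dismissed =
    finding.reachability === 'UNREACHABLE' || finding.exploitability === 'NOT_EXPLOITABLE';
  const severe = ['HIGH', 'CRITICAL'].includes(String(finding.severity).toUpperCase());
  return {
    bugId: finding.bugId,
    dismissed,
    // Dismissals carry an explicit reason so the user can audit them.
    dismissalReason: dismissed
      ? `reachability=${finding.reachability}, exploitability=${finding.exploitability}`
      : null,
    needsPocAndCvss: severe && !dismissed
  };
}
```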
| ## Outputs | ||
| When used as a companion to the main pipeline, keep outputs compatible with: | ||
| - `.bug-hunter/referee.json` | ||
| - `.bug-hunter/report.md` | ||
| If a separate validation artifact is helpful for the run, place it under `.bug-hunter/validated-findings.json`. | ||
| ## Important constraints | ||
| - This skill validates findings; it does not replace the normal fix pipeline. | ||
| - Keep outputs portable and self-contained under `.bug-hunter/`. | ||
| - Prefer explicit reasoning for false positives so the user can trust dismissals. |
| # Changelog | ||
| All notable changes to this project will be documented in this file. | ||
| The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), | ||
| and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). | ||
| ## [3.0.5] — 2026-03-11 | ||
| ### Added | ||
| - `agents/openai.yaml` UI metadata for skill lists and quick-invoke prompts | ||
| ### Changed | ||
| - `SKILL.md` frontmatter now validates cleanly against the `skill-creator` validator | ||
| - `evals/evals.json` now matches the current `.bug-hunter/*` JSON-first pipeline, default loop/fix behavior, and modern flags like `--deps`, `--threat-model`, `--dry-run`, and `--autonomous` | ||
| - npm package files now include the `agents/` directory so `openai.yaml` ships with the published skill | ||
| ## [Unreleased] | ||
| ### Highlights | ||
| - PR review is now a first-class workflow with `--pr`, `--pr current`, `--pr recent`, `--pr 123`, `--last-pr`, and `--pr-security`. | ||
| - Bug Hunter now emits both `fix-strategy.json` and `fix-plan.json` before fix execution so remediation stays reviewable and confidence-gated. | ||
| - The enterprise security pack now ships inside the repository under `skills/`, making PR security review and full security audits portable. | ||
| - Fix execution is now safer through schema-validated planning, atomic lock handling, safer worktree cleanup, stash preservation, and shell-safe templating. | ||
| ### Added | ||
| - GitHub Actions npm publish workflow on release publish or manual dispatch, with version/tag verification before `npm publish` | ||
| - bundled local security skills under `skills/`: `commit-security-scan`, `security-review`, `threat-model-generation`, and `vulnerability-validation` | ||
| - enterprise security entrypoints: `--pr-security`, `--security-review`, and `--validate-security` | ||
| - regression tests and eval coverage for integrated local security-skill routing | ||
| - `schemas/fix-plan.schema.json` plus validation coverage for canonical fix-plan artifacts | ||
| - focused regressions for lock-token ownership, atomic lock acquisition, stale artifact clearing, shell-safe worker paths, failed-chunk fix-plan suppression, managed worktree cleanup, and stash-ref preservation | ||
| ### Changed | ||
| - portable security capabilities now live inside the repository under `skills/` instead of depending on external machine-specific skill paths | ||
| - package metadata now ships the `skills/` directory for self-contained distribution | ||
| - main Bug Hunter orchestration now routes into the bundled local security skills for PR security review, threat-model generation, enterprise security review, and vulnerability validation | ||
| - fix-lock now uses owner tokens for renew/release, atomic acquisition under contention, and safe recovery from corrupted lock files | ||
| - run-bug-hunter now shell-quotes templated command arguments, clears stale artifacts before retries, validates fix-plan artifacts, and skips fix-plan emission when chunks fail | ||
| - worktree cleanup/status now preserve unrelated directories, preserve stash metadata from defensive harvests, and avoid reporting manifest-only worktrees as dirty | ||
| - current-PR git fallback now diffs against the discovered `origin/<default-branch>` ref when the base branch comes from `origin/HEAD` | ||
| - README now opens with a short “New in This Update” and PR-first quick-start section | ||
| - `llms.txt` and `llms-full.txt` now describe the PR review flow, bundled local security pack, current fix artifacts, and the current regression-test coverage | ||
| - `skills/README.md` now explains how the bundled security skills map into Bug Hunter workflows | ||
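The fix-lock entry mentions owner tokens for renew/release and atomic acquisition under contention. The sketch below shows that general technique. It is not the code in `fix-lock.cjs`, and the function names and record shape are hypothetical. It relies on Node's `'wx'` open flag, which fails with `EEXIST` when the file already exists, so only one contender can create the lock. Stale-lock and corrupted-file recovery are out of scope, and renew/release use a simple read-then-write rather than a fully atomic update.

```javascript
const fs = require('fs');
const crypto = require('crypto');

// Create the lock file exclusively; return an owner token on success or null
// if another process already holds the lock.
function acquireLock(lockPath, ttlSeconds) {
  const token = crypto.randomUUID();
  const record = { token, pid: process.pid, expiresAt: Date.now() + ttlSeconds * 1000 };
  try {
    const fd = fs.openSync(lockPath, 'wx');
    try {
      fs.writeSync(fd, JSON.stringify(record));
    } finally {
      fs.closeSync(fd);
    }
    return token;
  } catch (err) {
    if (err.code === 'EEXIST') return null;
    throw err;
  }
}

function readLock(lockPath) {
  try {
    return JSON.parse(fs.readFileSync(lockPath, 'utf8'));
  } catch {
    return null; // missing or unreadable lock file
  }
}

// Renew only if the caller presents the token that acquired the lock.
function renewLock(lockPath, token, ttlSeconds) {
  const current = readLock(lockPath);
  if (!current || current.token !== token) return false;
  current.expiresAt = Date.now() + ttlSeconds * 1000;
  fs.writeFileSync(lockPath, JSON.stringify(current));
  return true;
}

// Release only if the caller owns the lock, so one worker cannot free another's lock.
function releaseLock(lockPath, token) {
  const current = readLock(lockPath);
  if (!current || current.token !== token) return false;
  fs.unlinkSync(lockPath);
  return true;
}
```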
| ## [3.0.4] — 2026-03-11 | ||
| ### Added | ||
| - `schemas/*.schema.json` versioned contracts for recon, findings, skeptic, referee, coverage, fix-report, plus shared definitions and example findings fixtures | ||
| - `scripts/schema-runtime.cjs` lightweight schema runtime and `scripts/schema-validate.cjs` CLI for local artifact checks | ||
| - `scripts/render-report.cjs` Markdown renderer for report, coverage, skeptic, referee, and fix-report views from canonical JSON artifacts | ||
| - canonical `coverage.json` output with derived `coverage.md` | ||
| - `run-bug-hunter.cjs phase` command for schema-validated Skeptic, Referee, and Fixer phase execution with retry support | ||
| - runner tests for invalid Skeptic, Referee, and Fixer artifacts plus Markdown companion rendering | ||
| ### Changed | ||
| - Hunter, Skeptic, Referee, and Fixer prompts now describe JSON-first canonical artifacts | ||
| - `payload-guard.cjs` now emits real schema refs instead of placeholder format/version objects | ||
| - `bug-hunter-state.cjs` now rejects malformed findings and stores canonical `confidenceScore`, `category`, `evidence`, `runtimeTrigger`, and `crossReferences` | ||
| - `run-bug-hunter.cjs` now treats missing or invalid `findings.json` as a retriable chunk failure, validates phase artifacts, and checks all shipped schema assets during preflight | ||
| - loop, fix-loop, local-sequential, and major mode docs now point at `*.json` phase artifacts and `coverage.json` | ||
| - README, SKILL docs, evals, and the subagent wrapper now describe rendered Markdown as a companion to canonical JSON | ||
| - preflight now checks all shipped structured-output schemas, not just findings | ||
| - structured-output migration now enforces orchestrated outbound validation beyond the local/manual path | ||
| ## [3.0.1] — 2026-03-11 | ||
| ### Changed | ||
| - Loop and fix-loop completion now require full queued source-file coverage, not just CRITICAL/HIGH coverage | ||
| - Autonomous runs now continue through remaining MEDIUM and LOW files after prioritized chunks finish unless the user interrupts | ||
| - Loop iteration guidance now scales `maxIterations` from queue size so large audits do not stop early | ||
| - Large-codebase mode now treats LOW domains as part of the default autonomous queue instead of optional skipped work | ||
| ## [3.0.0] — 2026-03-10 | ||
| ### Added | ||
| - `package.json` with `@codexstar/bug-hunter` package name | ||
| - `bin/bug-hunter` CLI entry point with `install`, `doctor`, and `info` commands | ||
| - `bug-hunter install` auto-detects Claude Code, Codex, Cursor, Kiro, and generic agents directories | ||
| - `bug-hunter doctor` checks environment readiness (Node.js, Context Hub, Context7, git) | ||
| - Install via: `npm install -g @codexstar/bug-hunter && bug-hunter install` | ||
| **Cross-IDE installation via skills.sh:** | ||
| - Compatible with `npx skills add codexstar69/bug-hunter` for Cursor, Windsurf, Copilot, Kiro, and Claude Code | ||
| - No publish step required — auto-discovered from public GitHub repo with valid SKILL.md | ||
| - `scripts/worktree-harvest.cjs` — manages git worktrees for safe, isolated Fixer execution (6 subcommands: `prepare`, `harvest`, `checkout-fix`, `cleanup`, `cleanup-all`, `status`) | ||
| - 13 new tests in `scripts/tests/worktree-harvest.test.cjs` (full suite: 25/25 passing) | ||
| - 5 new error rows in SKILL.md for worktree failures: prepare, harvest dirty, harvest no-manifest, cleanup, and checkout-fix errors | ||
| - Fixer edits happen in an isolated worktree; commits land on the fix branch without touching the user's working tree | ||
| - Crash recovery via `cleanup-all` with automatic stash preservation | ||
| - Meta-file filtering prevents `.worktree-manifest.json` and `.harvest-result.json` from polluting dirty detection | ||
| ### Changed | ||
| - `modes/fix-pipeline.md` updated with dual-path dispatch: worktree path (prepare → dispatch → harvest → cleanup) and direct path | ||
| - `modes/_dispatch.md` updated with Fixer worktree lifecycle diagram and CRITICAL warning about Agent tool's built-in `isolation: "worktree"` | ||
| - `templates/subagent-wrapper.md` updated with `{WORKTREE_RULES}` variable for Fixer isolation rules | ||
| - SKILL.md Step 5b now shows a visible `⚠️` warning when `chub` is not installed (previously a silent suggestion) | ||
| ## [2.4.1] — 2026-03-10 | ||
| ### Fixed | ||
| - `scripts/triage.cjs`: LOW-only repositories promoted into `scanOrder` so script-heavy codebases do not collapse to zero scannable files | ||
@@ -44,58 +100,58 @@ - `scripts/run-bug-hunter.cjs`: `teams` backend name aligned with the documented dispatch mode | ||
| - `scripts/run-bug-hunter.cjs`: low-confidence delta expansion now reuses the caller's configured `--delta-hops` value | ||
| ### Added | ||
| - `scripts/tests/run-bug-hunter.test.cjs`: regressions for LOW-only triage, optional `code-index`, `teams` backend selection, and delta-hop expansion | ||
| ## [2.4.0] — 2026-03-10 | ||
| ### Added | ||
| - `scripts/doc-lookup.cjs`: hybrid documentation lookup that tries [Context Hub](https://github.com/andrewyng/context-hub) (chub) first for curated, versioned, annotatable docs, then falls back to Context7 API when chub doesn't have the library | ||
| ### Changed | ||
| - All agent prompts (hunter, skeptic, fixer, doc-lookup) updated to use `doc-lookup.cjs` as primary with `context7-api.cjs` as explicit fallback | ||
| - Preflight smoke test now checks `doc-lookup.cjs` first, falls back to `context7-api.cjs` | ||
| - `run-bug-hunter.cjs` validates both scripts exist at startup | ||
| - Requires `@aisuite/chub` installed globally (`npm install -g @aisuite/chub`) — optional but recommended; pipeline works without it via Context7 fallback | ||
| ## [2.3.0] — 2026-03-10 | ||
| ### Changed | ||
| - `LOOP_MODE=true` is the new default — every `/bug-hunter` invocation iterates until full CRITICAL/HIGH coverage | ||
| - `--loop` flag still accepted for backwards compatibility (no-op) | ||
| - Updated triage warnings, coverage enforcement, and all documentation to reflect the new default | ||
| - `/bug-hunter src/` now finds bugs, fixes them, AND loops until full coverage — zero flags needed | ||
| ### Added | ||
| - `--no-loop` flag to opt out and get single-pass behavior | ||
| ## [2.2.1] — 2026-03-10 | ||
| The `--loop` flag was broken: loop mode files described a "ralph-loop" system but never called `ralph_start`, so the pipeline ran once and stopped. | ||
| ### Fixed | ||
| - `modes/loop.md`: added explicit `ralph_start` call instructions with correct `taskContent` and `maxIterations` parameters | ||
| - `modes/fix-loop.md`: same fix for `--loop --fix` combined mode, plus removed manual state file creation (handled by `ralph_start`) | ||
| - `SKILL.md`: added CRITICAL integration note requiring `ralph_start` call when `LOOP_MODE=true` | ||
| - Changed completion signal from `<promise>DONE</promise>` to `<promise>COMPLETE</promise>` (correct ralph-loop API) | ||
| - Each iteration now calls `ralph_done` to proceed instead of relying on a non-existent hook | ||
| ## 2.2.0 — 2026-03-10 | ||
| ## [2.2.0] — 2026-03-10 | ||
| ### Fix pipeline hardening — 12 reliability and safety optimizations | ||
| ### Added | ||
| - Rollback timeout guard: `git revert` calls now timeout after 60 seconds; conflicts abort cleanly instead of hanging | ||
| - Dynamic lock TTL: single-writer lock TTL scales with queue size (`max(1800, bugs * 600)`) | ||
| - Lock heartbeat renewal: new `renew` command in `fix-lock.cjs` | ||
| - Fixer context budget: `MAX_BUGS_PER_FIXER = 5` — large fix queues split into sequential batches | ||
| - Cross-file dependency ordering: when `code-index.cjs` is available, fixes are ordered by import graph | ||
| - Flaky test detection: baseline tests run twice; non-deterministic failures excluded from revert decisions | ||
| - Dynamic canary sizing: `max(1, min(3, ceil(eligible * 0.2)))` — canary group scales with queue size | ||
| - Dry-run mode (`--dry-run`): preview planned fixes without editing files | ||
| - Machine-readable fix report: `.bug-hunter/fix-report.json` for CI/CD gating, dashboards, and ticket automation | ||
| - Circuit breaker: if >50% of fix attempts fail/revert (min 3 attempts), remaining fixes are halted | ||
| - Global Phase 2 timeout: 30-minute deadline for the entire fix execution phase | ||
| - **Rollback timeout guard**: `git revert` calls now timeout after 60 seconds; conflicts abort cleanly instead of hanging the pipeline indefinitely | ||
| - **Dynamic lock TTL**: single-writer lock TTL scales with queue size (`max(1800, bugs * 600)`), preventing expiry on large fix runs | ||
| - **Lock heartbeat renewal**: new `renew` command in `fix-lock.cjs` — fixer renews the lock after each bug fix to prevent mid-run TTL expiry | ||
| - **Fixer context budget**: `MAX_BUGS_PER_FIXER = 5` — large fix queues are split into sequential batches to prevent context window overflow and hallucinated patches | ||
| - **Cross-file dependency ordering**: when `code-index.cjs` is available, fixes are ordered by import graph (fix dependencies before dependents) | ||
| - **Flaky test detection**: baseline tests run twice; tests that fail non-deterministically are excluded from revert decisions | ||
| - **Per-bug revert granularity**: clarified one-commit-per-bug as mandatory; reverts target individual bugs, not clusters | ||
| - **Dynamic canary sizing**: `max(1, min(3, ceil(eligible * 0.2)))` — canary group scales with queue size instead of hardcoded 1–3 | ||
| - **Post-fix re-scan severity floor**: fixer-introduced bugs below MEDIUM severity are logged but don't trigger `FIXER_BUG` status | ||
| - **Dry-run mode** (`--dry-run`): preview planned fixes without editing files — Fixer reads code and outputs unified diff previews, no git commits | ||
| - **Machine-readable fix report**: `.bug-hunter/fix-report.json` written alongside markdown report for CI/CD gating, dashboards, and ticket automation | ||
| - **Circuit breaker**: if >50% of fix attempts fail/revert (min 3 attempts), remaining fixes are halted to prevent token waste on unstable codebases | ||
| - **Global Phase 2 timeout**: 30-minute deadline for the entire fix execution phase; unprocessed bugs are marked SKIPPED | ||
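The sizing and safety formulas listed above can be expressed directly. This is a minimal sketch: only the formulas and `MAX_BUGS_PER_FIXER = 5` come from the changelog; the function names are hypothetical and are not exports of `fix-lock.cjs` or any other bug-hunter script. It also assumes the lock TTL values are in seconds (1800 = 30 minutes), which the changelog does not state explicitly.

```javascript
// Hypothetical helpers mirroring the 2.2.0 fix-pipeline formulas.
const MAX_BUGS_PER_FIXER = 5;

// Lock TTL, assumed to be in seconds: 10 minutes per queued bug, never below 30 minutes.
function lockTtlSeconds(bugCount) {
  return Math.max(1800, bugCount * 600);
}

// Canary group: 20% of eligible fixes, rounded up, clamped to the range 1..3.
function canarySize(eligible) {
  return Math.max(1, Math.min(3, Math.ceil(eligible * 0.2)));
}

// Split the fix queue into sequential batches of at most MAX_BUGS_PER_FIXER bugs.
function batchFixQueue(bugs, size = MAX_BUGS_PER_FIXER) {
  const batches = [];
  for (let i = 0; i < bugs.length; i += size) {
    batches.push(bugs.slice(i, i + size));
  }
  return batches;
}

// Circuit breaker: trip only after at least 3 attempts, when more than half failed or reverted.
function shouldTripBreaker(attempts, failures) {
  return attempts >= 3 && failures / attempts > 0.5;
}

console.log(lockTtlSeconds(2), lockTtlSeconds(5)); // 1800 3000
console.log(canarySize(4), canarySize(7), canarySize(20)); // 1 2 3
console.log(batchFixQueue([...Array(12).keys()]).map((b) => b.length)); // [ 5, 5, 2 ]
console.log(shouldTripBreaker(3, 2), shouldTripBreaker(4, 2)); // true false
```

Note that exactly 50% failures (2 of 4) does not trip the breaker, because the rule is strictly "more than half".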
| ### Changed | ||
| - Per-bug revert granularity: clarified one-commit-per-bug as mandatory; reverts target individual bugs, not clusters | ||
| - Post-fix re-scan severity floor: fixer-introduced bugs below MEDIUM severity are logged but don't trigger `FIXER_BUG` status | ||
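The post-fix severity floor amounts to a threshold check. This minimal sketch assumes a LOW < MEDIUM < HIGH < CRITICAL ordering and uses a hypothetical `LOGGED` status for findings that fall below the floor; only `FIXER_BUG` and the MEDIUM floor come from the changelog.

```javascript
// Hypothetical classification for findings raised by the post-fix re-scan.
// The severity ordering and the "LOGGED" status string are assumptions.
const SEVERITY_RANK = { LOW: 1, MEDIUM: 2, HIGH: 3, CRITICAL: 4 };

function classifyFixerIntroducedBug(severity) {
  const rank = SEVERITY_RANK[String(severity).toUpperCase()];
  if (rank === undefined) throw new Error(`Unknown severity: ${severity}`);
  // At or above the MEDIUM floor the fix is flagged; below it the finding is only logged.
  return rank >= SEVERITY_RANK.MEDIUM ? "FIXER_BUG" : "LOGGED";
}

console.log(classifyFixerIntroducedBug("low")); // LOGGED
console.log(classifyFixerIntroducedBug("HIGH")); // FIXER_BUG
```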
| ## 2.1.0 — 2026-03-10 | ||
| ## [2.1.0] — 2026-03-10 | ||
| ### v3 security pipeline + dependency scanner reliability | ||
| ### Added | ||
| - STRIDE/CWE fields in Hunter findings format, with CWE quick-reference mapping for security categories | ||
@@ -108,26 +164,15 @@ - Skeptic hard-exclusion fast path (15 false-positive classes) before deep review | ||
| - Few-shot calibration examples for Hunter and Skeptic in `prompts/examples/` | ||
| ### Fixed | ||
| - `dep-scan.cjs` lockfile-aware audits (`npm`, `pnpm`, `yarn`, `bun`) and non-zero audit exit handling so vulnerability exits are not misreported as scanner failures | ||
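A lockfile-aware audit with tolerant exit handling could look roughly like this. It is a hypothetical sketch, not the real `dep-scan.cjs`: the function names, the status strings, and the priority order when several lockfiles coexist are all assumptions. The underlying point is that audit commands may exit non-zero simply because they found vulnerabilities, so the exit code alone should not mark the scanner as failed.

```javascript
// Hypothetical sketch: choose the package manager from the lockfile present.
// When several lockfiles exist, the first match in this list wins (an assumption).
const LOCKFILE_TO_MANAGER = [
  ["pnpm-lock.yaml", "pnpm"],
  ["yarn.lock", "yarn"],
  ["bun.lock", "bun"],
  ["bun.lockb", "bun"],
  ["package-lock.json", "npm"],
];

function detectPackageManager(fileNames) {
  const present = new Set(fileNames);
  for (const [lockfile, manager] of LOCKFILE_TO_MANAGER) {
    if (present.has(lockfile)) return manager;
  }
  return null; // no supported lockfile found
}

// True when the audit printed JSON: either one document, or newline-delimited
// records (Yarn Classic's `audit --json` emits one JSON object per line).
function isJsonOutput(stdout) {
  const lines = stdout.split("\n").filter((line) => line.trim() !== "");
  if (lines.length === 0) return false;
  try {
    JSON.parse(stdout);
    return true;
  } catch {
    return lines.every((line) => {
      try {
        JSON.parse(line);
        return true;
      } catch {
        return false;
      }
    });
  }
}

// A non-zero exit with valid JSON output means "vulnerabilities reported",
// not "scanner failed"; only unparseable output counts as a failure.
function interpretAuditResult(exitCode, stdout) {
  if (!isJsonOutput(stdout)) return "scanner-failure";
  return exitCode === 0 ? "audit-ok" : "vulnerabilities-reported";
}

console.log(detectPackageManager(["package.json", "pnpm-lock.yaml"])); // pnpm
console.log(interpretAuditResult(1, '{"metadata":{"vulnerabilities":{"high":2}}}')); // vulnerabilities-reported
```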
| ## 2.0.0 — 2026-03-10 | ||
| ## [2.0.0] — 2026-03-10 | ||
| ### Structural overhaul — triage pipeline + 36% token reduction | ||
| **Pipeline restructure:** | ||
| ### Changed | ||
| - Triage moved to Step 1 (after argument parsing); it previously ran before the target was resolved | ||
| - All mode files consume triage JSON — riskMap, scanOrder, fileBudget flow downstream | ||
| - Recon demoted to enrichment — no longer does file classification when triage exists | ||
| - Step 7.0 re-audit gate removed — duplicated Referee's work | ||
| **Deduplication:** | ||
| - `modes/_dispatch.md` — shared dispatch patterns (18 references across modes) | ||
| - Mode files compressed: small 7.3→2.9KB, parallel 7.9→4.2KB, extended 7.1→3.3KB, scaled 7.3→2.7KB | ||
| - Skip-file patterns consolidated — single authoritative list in SKILL.md | ||
| - Error handling table updated with correct step references | ||
| **Dead weight removed:** | ||
| - FIX-PLAN.md deleted (26KB dead planning doc) | ||
| - README.md compressed from 8.5KB to 3.7KB | ||
| - code-index.cjs marked optional | ||
| **Prompt compression:** | ||
| - hunter.md: scope rules and security checklist compressed | ||
@@ -137,13 +182,18 @@ - recon.md: output format template and "What to map" sections compressed | ||
| - skeptic.md: false-positive patterns compressed to inline format | ||
| **Logic gaps fixed:** | ||
| - Branch-diff/staged optimization note in Step 3 | ||
| - single-file.md: local-sequential backend support added | ||
| **Size:** 187,964 → 119,825 bytes (36% reduction, ~30K tokens) | ||
| ### Added | ||
| - `modes/_dispatch.md` — shared dispatch patterns (18 references across modes) | ||
| ## 1.0.0 — 2026-03-10 | ||
| ### Removed | ||
| - Step 7.0 re-audit gate removed — duplicated Referee's work | ||
| - FIX-PLAN.md deleted (26KB dead planning doc) | ||
| - README.md compressed from 8.5KB to 3.7KB | ||
| - code-index.cjs marked optional | ||
| ### Zero-token pre-recon triage (`triage.cjs`) | ||
| - `scripts/triage.cjs` runs before any LLM agent — 0 tokens, <2s for 2,000+ files | ||
| ## [1.0.0] — 2026-03-10 | ||
| ### Added | ||
| - `scripts/triage.cjs` — zero-token pre-recon triage, runs before any LLM agent (<2s for 2,000+ files) | ||
| - FILE_BUDGET, strategy, and domain map decided by triage, not Recon | ||
@@ -155,1 +205,17 @@ - Writes `.bug-hunter/triage.json` with strategy, fileBudget, domains, riskMap, scanOrder | ||
| - Large codebase strategy with domain-first tiered scanning | ||
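To show how a triage risk map could feed a tiered scan order, here is a minimal sketch. The real `triage.json` schema may differ: it assumes `riskMap` maps each file path to a CRITICAL/HIGH/MEDIUM/LOW tier and that `fileBudget` caps how many files enter the order. `buildScanOrder` is a hypothetical name.

```javascript
// Hypothetical: riskMap maps file path -> tier; scanOrder lists files
// highest-risk first, ties broken alphabetically, capped by fileBudget.
const TIER_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"];

function buildScanOrder(riskMap, fileBudget = Infinity) {
  // Unknown tiers sort after every known tier instead of being dropped.
  const rank = (file) => {
    const index = TIER_ORDER.indexOf(riskMap[file]);
    return index === -1 ? TIER_ORDER.length : index;
  };
  return Object.keys(riskMap)
    .sort((a, b) => rank(a) - rank(b) || a.localeCompare(b))
    .slice(0, fileBudget);
}

const riskMap = {
  "src/util.js": "LOW",
  "src/auth.js": "CRITICAL",
  "src/api.js": "HIGH",
  "src/db.js": "CRITICAL",
};
console.log(buildScanOrder(riskMap, 3)); // src/auth.js, src/db.js, src/api.js
```

Capping by budget after sorting means a tight budget always spends its slots on the highest tiers first, which matches the domain-first, tiered intent described above.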
| [Unreleased]: https://github.com/codexstar69/bug-hunter/compare/v3.0.5...HEAD | ||
| [3.0.5]: https://github.com/codexstar69/bug-hunter/compare/v3.0.4...v3.0.5 | ||
| [3.0.4]: https://github.com/codexstar69/bug-hunter/compare/v3.0.3...v3.0.4 | ||
| [3.0.3]: https://github.com/codexstar69/bug-hunter/compare/v3.0.2...v3.0.3 | ||
| [3.0.2]: https://github.com/codexstar69/bug-hunter/compare/v3.0.1...v3.0.2 | ||
| [3.0.1]: https://github.com/codexstar69/bug-hunter/compare/v3.0.0...v3.0.1 | ||
| [3.0.0]: https://github.com/codexstar69/bug-hunter/compare/v2.4.1...v3.0.0 | ||
| [2.4.1]: https://github.com/codexstar69/bug-hunter/compare/v2.4.0...v2.4.1 | ||
| [2.4.0]: https://github.com/codexstar69/bug-hunter/compare/v2.3.0...v2.4.0 | ||
| [2.3.0]: https://github.com/codexstar69/bug-hunter/compare/v2.2.1...v2.3.0 | ||
| [2.2.1]: https://github.com/codexstar69/bug-hunter/compare/v2.2.0...v2.2.1 | ||
| [2.2.0]: https://github.com/codexstar69/bug-hunter/compare/v2.1.0...v2.2.0 | ||
| [2.1.0]: https://github.com/codexstar69/bug-hunter/compare/v2.0.0...v2.1.0 | ||
| [2.0.0]: https://github.com/codexstar69/bug-hunter/compare/v1.0.0...v2.0.0 | ||
| [1.0.0]: https://github.com/codexstar69/bug-hunter/releases/tag/v1.0.0 |
@@ -6,4 +6,4 @@ { | ||
| "id": 1, | ||
| "prompt": "/bug-hunter test-fixture/", | ||
| "expected_output": "Full pipeline execution on the included test fixture (Express app with 6 planted bugs). Should run Recon -> Hunter -> Skeptic -> Referee and produce a final report confirming at least 5 of 6 planted bugs with severity ratings, file paths, and suggested fixes.", | ||
| "prompt": "/bug-hunter --scan-only test-fixture/", | ||
| "expected_output": "Scan-only self-test on the included Express fixture. Should run Recon -> Hunter -> Skeptic -> Referee, confirm most planted bugs, and write canonical JSON artifacts plus a rendered report.", | ||
| "files": [ | ||
@@ -17,24 +17,40 @@ "test-fixture/server.js", | ||
| { | ||
| "text": "Pipeline runs all phases: Recon, Hunter, Skeptic, Referee", | ||
| "text": "Pipeline runs Recon, Hunter, Skeptic, and Referee", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "At least 5 of 6 planted bugs are confirmed in the final report", | ||
| "text": "Writes .bug-hunter/findings.json, .bug-hunter/referee.json, and .bug-hunter/report.md", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Each confirmed bug includes file path, line numbers, severity, and suggested fix", | ||
| "text": "Confirms at least 5 of the 6 planted bugs in the fixture", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "False positives are challenged and filtered by the Skeptic/Referee pipeline", | ||
| "text": "Rendered report includes mode, files scanned, and coverage metadata", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 2, | ||
| "prompt": "/bug-hunter src/api/auth.ts", | ||
| "expected_output": "Single-file scan should skip Recon, run Hunter -> Skeptic -> Referee, and keep the output scoped to the target file while still writing canonical JSON artifacts.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Selects single-file mode when one source file is targeted", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Final report includes scan metadata (mode, files scanned, coverage)", | ||
| "text": "Skips Recon for single-file mode", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Fix pipeline is triggered by default when confirmed bugs exist; only --scan-only disables fixes", | ||
| "text": "Writes .bug-hunter/findings.json and .bug-hunter/referee.json for the single-file run", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Referee returns REAL_BUG, NOT_A_BUG, or MANUAL_REVIEW verdicts for the findings", | ||
| "type": "content_check" | ||
| } | ||
@@ -44,26 +60,42 @@ ] | ||
| { | ||
| "id": 2, | ||
| "prompt": "/bug-hunter src/api/auth.ts", | ||
| "expected_output": "Single-file mode scan of an auth file. Should skip Recon (not needed for single file), run one Hunter, one Skeptic, and one Referee. Output should focus on security and logic bugs in the auth file specifically.", | ||
| "id": 3, | ||
| "prompt": "/bug-hunter -b feature-auth --base develop", | ||
| "expected_output": "Branch diff mode should diff the branches, filter non-source files, report the resulting scan set, and choose the execution mode from the surviving source files.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Selects single-file mode (1 file detected)", | ||
| "text": "Runs git diff --name-only develop...feature-auth to resolve changed files", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Skips Recon agent (not needed for single-file mode)", | ||
| "text": "Filters docs, configs, assets, lockfiles, and other non-source files before scanning", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Hunter scans the target file and reports findings with BUG-ID format", | ||
| "text": "Reports the number of scannable source files after filtering", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Skeptic challenges the findings with code-based counter-arguments", | ||
| "text": "Chooses small, parallel, extended, scaled, or large-codebase mode from the filtered file count", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 4, | ||
| "prompt": "/bug-hunter --staged", | ||
| "expected_output": "Staged mode should scan full contents of staged source files after resolving them through git diff --cached --name-only and filtering non-source files.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Runs git diff --cached --name-only to collect staged files", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Referee produces a final verdict (REAL BUG or NOT A BUG) for each finding", | ||
| "text": "Filters non-source files from the staged list before scanning", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Scans full file contents of staged source files rather than scanning only the patch", | ||
| "type": "content_check" | ||
| } | ||
@@ -73,22 +105,26 @@ ] | ||
| { | ||
| "id": 3, | ||
| "prompt": "/bug-hunter -b feature-auth --base develop", | ||
| "expected_output": "Branch diff mode. Should run git diff to find changed files between feature-auth and develop branches, filter out non-source files, then run the full pipeline on the changed source files.", | ||
| "id": 5, | ||
| "prompt": "/bug-hunter --fix src/", | ||
| "expected_output": "Default fix mode should run Phase 1, then acquire the fix lock, capture verification baselines, apply eligible fixes, write a machine-readable fix report, and release the lock.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Runs git diff --name-only to extract changed files between branches", | ||
| "text": "Creates a git safety branch before applying fixes when git safety is available", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Filters out non-source files (configs, docs, assets, lockfiles)", | ||
| "text": "Acquires and releases .bug-hunter/fix.lock around the fix phase", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Reports the number of source files to scan after filtering", | ||
| "text": "Captures verification baseline before applying fixes", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Selects appropriate mode based on file count (small, parallel, extended, etc.)", | ||
| "text": "Writes .bug-hunter/fix-report.json as the canonical fix artifact", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Auto-fixes only bugs that pass the confidence eligibility threshold", | ||
| "type": "content_check" | ||
| } | ||
@@ -98,18 +134,22 @@ ] | ||
| { | ||
| "id": 4, | ||
| "prompt": "/bug-hunter --staged", | ||
| "expected_output": "Staged file mode for pre-commit checking. Should run git diff --cached --name-only to get staged files, filter non-source files, then scan the staged source files.", | ||
| "id": 6, | ||
| "prompt": "/bug-hunter src/", | ||
| "expected_output": "Loop mode is the default. A normal directory scan should create loop state, iterate until queued files are covered, and track canonical coverage in JSON with a rendered Markdown companion.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Runs git diff --cached --name-only to get staged files", | ||
| "text": "Treats loop mode as the default without requiring an explicit --loop flag", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Filters out non-source files from the staged list", | ||
| "text": "Creates or updates .bug-hunter/coverage.json as canonical loop state and renders .bug-hunter/coverage.md from it", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Scans full file contents of staged files (not just diffs)", | ||
| "text": "Tracks per-file coverage state in coverage.json across iterations", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Marks completion only when all queued scannable files are done", | ||
| "type": "content_check" | ||
| } | ||
@@ -119,41 +159,77 @@ ] | ||
| { | ||
| "id": 5, | ||
| "prompt": "/bug-hunter --fix src/", | ||
| "expected_output": "Full pipeline with auto-fix. After Phase 1 (find & verify), should proceed to Phase 2: create a git branch, acquire single-writer lock, detect test infrastructure, capture test baseline, run Fixer clusters sequentially with checkpoint commits, run post-fix tests, auto-revert regressions, and release lock.", | ||
| "id": 7, | ||
| "prompt": "Can you check my Express API for security vulnerabilities? The code is in src/", | ||
| "expected_output": "Natural-language trigger should invoke the bug-hunter skill and run a security-focused audit with trust-boundary mapping and security-oriented Hunter analysis.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Creates a git safety branch (bug-hunter-fix-*) before applying fixes", | ||
| "text": "Triggers bug-hunter from natural language security-audit intent without requiring /bug-hunter", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Detects test command from package.json or project config", | ||
| "text": "Runs Recon to identify architecture, trust boundaries, and high-risk areas", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Captures test baseline before applying fixes", | ||
| "text": "Hunter prioritizes injection, auth bypass, input validation, and secrets exposure checks", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Fixer agents implement minimal, surgical code changes", | ||
| "text": "Findings use severity labels and canonical JSON fields rather than free-form Markdown only", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 8, | ||
| "prompt": "/bug-hunter --fix --approve src/auth/", | ||
| "expected_output": "Approval mode should still run the fix pipeline, but Fixer agents should operate in reviewed mode and report that approval is required for edits.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Sets APPROVE_MODE=true from the --approve flag", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Each fix is a separate checkpoint commit with descriptive message", | ||
| "text": "Runs Fixers in reviewed/default mode instead of unattended auto-edit mode", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Post-fix test run compares against baseline (new failures vs pre-existing)", | ||
| "text": "Tells the user it is running in approval mode", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 9, | ||
| "prompt": "/bug-hunter huge-repo/", | ||
| "expected_output": "Large-repo mode should initialize persistent chunk state, process chunks sequentially, and resume from .bug-hunter/state.json when interrupted.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Initializes .bug-hunter/state.json with chunk metadata", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Fixes that cause new test failures are auto-reverted", | ||
| "text": "Processes large scans in sequential chunks and records chunk status", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Acquires and releases .claude/bug-hunter-fix.lock around fix phase", | ||
| "text": "Resumes from existing .bug-hunter/state.json without rescanning completed chunks", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 10, | ||
| "prompt": "/bug-hunter src/ (second run with unchanged files)", | ||
| "expected_output": "A repeat run should apply the hash cache through bug-hunter-state and skip unchanged files before deep scan work starts.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Runs hash-filter against .bug-hunter/state.json before deep scan work", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Auto-fixes only bugs that pass confidence eligibility threshold", | ||
| "text": "Reports skipped unchanged files from the hash cache", | ||
| "type": "content_check" | ||
@@ -164,26 +240,66 @@ } | ||
| { | ||
| "id": 6, | ||
| "prompt": "/bug-hunter --loop src/", | ||
| "expected_output": "Loop mode for thorough coverage. Should create ralph-loop state files, iterate the pipeline until all CRITICAL and HIGH files are scanned, track coverage in .claude/bug-hunter-coverage.md, and mark ALL_TASKS_COMPLETE when done.", | ||
| "id": 11, | ||
| "prompt": "/bug-hunter src/ with malformed subagent payload", | ||
| "expected_output": "Payload validation should fail before any subagent launch when the generated payload does not match the required contract.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Creates .claude/ralph-loop.local.md state file for loop mode", | ||
| "text": "Validates subagent payloads with payload-guard.cjs before launch", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Creates or updates .claude/bug-hunter-coverage.md with machine-parseable format", | ||
| "text": "Does not launch a subagent when payload validation fails", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 12, | ||
| "prompt": "/bug-hunter --fix src/ while another fix run is active", | ||
| "expected_output": "The fix phase should stop cleanly when the single-writer lock cannot be acquired.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Attempts to acquire .bug-hunter/fix.lock before any edits", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Tracks file coverage status (DONE, PARTIAL, SKIPPED) per iteration", | ||
| "text": "Stops Phase 2 with a clear lock-held message when the fix lock is already held", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 13, | ||
| "prompt": "/bug-hunter --fix src/ with mixed-confidence bugs", | ||
| "expected_output": "Auto-fix should edit only eligible high-confidence bugs and leave the rest in manual review.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Applies the >=75 confidence threshold for auto-fix eligibility", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Subsequent iterations only scan uncovered files (no re-scanning DONE files)", | ||
| "text": "Keeps low-confidence bugs in manual review instead of auto-editing them", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 14, | ||
| "prompt": "/bug-hunter src/ on a CLI without spawn_agent", | ||
| "expected_output": "The skill should select the best available orchestration backend at runtime and fall back to local-sequential execution when delegation backends are unavailable.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Chooses AGENT_BACKEND during preflight based on available runtime tools", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Marks ALL_TASKS_COMPLETE when all CRITICAL and HIGH files show DONE", | ||
| "text": "Falls back to the next backend when a preferred launch path fails", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Completes the run with local-sequential fallback when no delegation backend is available", | ||
| "type": "content_check" | ||
| } | ||
@@ -193,25 +309,37 @@ ] | ||
| { | ||
| "id": 7, | ||
| "prompt": "Can you check my Express API for security vulnerabilities? The code is in src/", | ||
| "expected_output": "Should trigger the bug-hunter skill (even though the user didn't say /bug-hunter) and run a security-focused scan on the src/ directory. The deep Hunter should prioritize security findings, with optional triage hints when enabled.", | ||
| "id": 15, | ||
| "prompt": "/bug-hunter huge-repo/ with flaky chunk worker", | ||
| "expected_output": "The chunk orchestrator should enforce retries with backoff and write attempt details to the canonical run journal.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Triggers bug-hunter skill from natural language (security audit request)", | ||
| "text": "Uses run-bug-hunter.cjs for autonomous chunk orchestration", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Runs Recon to map architecture and identify trust boundaries", | ||
| "text": "Retries timed out or failed chunks according to max-retries and backoff policy", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Deep Hunter focuses on injection, auth bypass, input validation, and secrets exposure in security audit requests", | ||
| "text": "Writes attempt events to .bug-hunter/run.log", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 16, | ||
| "prompt": "/bug-hunter --deps src/", | ||
| "expected_output": "Dependency scan mode should run the dependency audit helper, write dep-findings output, and feed reachable dependency issues into Hunter context.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Runs scripts/dep-scan.cjs when --deps is supplied", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Output includes severity ratings (Critical, Medium, Low) for each finding", | ||
| "text": "Writes .bug-hunter/dep-findings.json for dependency scan output", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Framework-specific protections are checked (Express middleware, helmet, etc.)", | ||
| "text": "Includes reachable dependency findings in Hunter analysis context", | ||
| "type": "content_check" | ||
@@ -222,22 +350,38 @@ } | ||
| { | ||
| "id": 8, | ||
| "prompt": "/bug-hunter --fix --approve src/auth/", | ||
| "expected_output": "Fix mode with approval. Should find bugs in auth directory, then fix them but prompt the user before each edit (approval mode). Fixer agents run in default mode rather than auto mode.", | ||
| "id": 17, | ||
| "prompt": "/bug-hunter --threat-model src/", | ||
| "expected_output": "Threat-model mode should load or generate a STRIDE threat model and feed it into Recon and Hunter.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "APPROVE_MODE is set to true from --approve flag", | ||
| "text": "Loads an existing .bug-hunter/threat-model.md or generates one when missing", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Fixer agents run in mode: default (user reviews each edit)", | ||
| "text": "Marks THREAT_MODEL_AVAILABLE and uses the threat model in Recon and Hunter context", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Reports 'Running in approval mode' to the user", | ||
| "text": "Keeps threat-model generation non-blocking relative to the rest of the bug-hunt flow", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 18, | ||
| "prompt": "/bug-hunter --fix --dry-run src/", | ||
| "expected_output": "Dry-run fix mode should build the fix plan and produce machine-readable fix output without editing files, committing, or taking the lock.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Sets DRY_RUN_MODE=true and forces FIX_MODE=true when --dry-run is provided", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Fixes are still committed as individual checkpoint commits", | ||
| "text": "Produces .bug-hunter/fix-report.json with dry_run set to true", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Skips file edits, git commits, and fix lock acquisition in dry-run mode", | ||
| "type": "content_check" | ||
| } | ||
@@ -247,17 +391,17 @@ ] | ||
| { | ||
| "id": 9, | ||
| "prompt": "/bug-hunter huge-repo/", | ||
| "expected_output": "Large-repo run should initialize .claude/bug-hunter-state.json, split files into sequential chunks, and resume from state if interrupted.", | ||
| "id": 19, | ||
| "prompt": "/bug-hunter --autonomous src/", | ||
| "expected_output": "Autonomous mode should force fix mode and run canary-first, confidence-gated fixes without waiting for per-edit approval.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Initializes bug-hunter-state.json with chunk metadata", | ||
| "text": "Sets AUTONOMOUS_MODE=true and forces FIX_MODE=true when --autonomous is supplied", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Processes chunks sequentially and marks each chunk state", | ||
| "text": "Runs canary-first, confidence-gated fix rollout in autonomous mode", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Can resume from existing state file without rescanning completed chunks", | ||
| "text": "Does not require approval-mode prompts for unattended autonomous fixes", | ||
| "type": "content_check" | ||
@@ -268,14 +412,18 @@ } | ||
| { | ||
| "id": 10, | ||
| "prompt": "/bug-hunter src/ (second run with unchanged files)", | ||
| "expected_output": "Hash cache should skip unchanged files and focus scan effort on changed files only.", | ||
| "id": 20, | ||
| "prompt": "/bug-hunter --pr current", | ||
| "expected_output": "PR review mode should resolve the current PR scope, save PR metadata, and scan the resolved changed files rather than the whole repository.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Runs hash-filter against bug-hunter-state.json before deep scan", | ||
| "text": "Uses scripts/pr-scope.cjs to resolve current PR metadata and changed files", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Reports skipped unchanged files from cache", | ||
| "text": "Writes .bug-hunter/pr-scope.json for later reporting", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Scans the resolved changed files as the PR review scope", | ||
| "type": "content_check" | ||
| } | ||
@@ -285,14 +433,18 @@ ] | ||
| { | ||
| "id": 11, | ||
| "prompt": "/bug-hunter src/ with malformed subagent payload", | ||
| "expected_output": "Pipeline should fail fast before spawning subagents when payload validation fails.", | ||
| "id": 21, | ||
| "prompt": "/bug-hunter --pr recent --scan-only", | ||
| "expected_output": "Recent-PR review mode should resolve the most recent PR through GitHub metadata, limit analysis to its changed files, and stop after reporting.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Validates payload via payload-guard.cjs before each subagent launch", | ||
| "text": "Resolves the most recent PR through pr-scope using GitHub metadata", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Does not launch subagent when payload validation fails", | ||
| "text": "Keeps FIX_MODE disabled because scan-only was requested", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Produces the normal findings/referee/report artifacts for the PR-scoped review", | ||
| "type": "content_check" | ||
| } | ||
@@ -302,14 +454,18 @@ ] | ||
| { | ||
| "id": 12, | ||
| "prompt": "/bug-hunter --fix src/ while another fix run is active", | ||
| "expected_output": "Fix phase should stop when single-writer lock cannot be acquired.", | ||
| "id": 22, | ||
| "prompt": "/bug-hunter --plan-only src/", | ||
| "expected_output": "Plan-only mode should build a remediation strategy and fix plan but stop before the Fixer edits code.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Attempts to acquire .claude/bug-hunter-fix.lock before any edits", | ||
| "text": "Builds .bug-hunter/fix-strategy.json and .bug-hunter/fix-strategy.md before fix execution", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Stops Phase 2 with clear lock-held message when lock is already held", | ||
| "text": "Builds .bug-hunter/fix-plan.json while PLAN_ONLY_MODE is active", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Stops before the Fixer edits files when --plan-only is supplied", | ||
| "type": "content_check" | ||
| } | ||
@@ -319,14 +475,22 @@ ] | ||
| { | ||
| "id": 13, | ||
| "prompt": "/bug-hunter --fix src/ with mixed-confidence bugs", | ||
| "expected_output": "Auto-fix should run only on high-confidence bugs and leave low-confidence bugs as manual review.", | ||
| "id": 23, | ||
| "prompt": "/bug-hunter --plan src/ then /bug-hunter --preview src/ then /bug-hunter --safe src/ then /bug-hunter --last-pr --review", | ||
| "expected_output": "Shortcut aliases should map cleanly onto their canonical behaviors without changing the underlying execution semantics.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Applies confidence threshold gating (>=75%) for auto-fix eligibility", | ||
| "text": "Treats --plan as an alias for --plan-only", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Reports low-confidence bugs as manual-review and does not auto-edit them", | ||
| "text": "Treats --preview as an alias for --fix --dry-run", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Treats --safe as an alias for --fix --approve", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Treats --last-pr and --review as aliases for --pr recent and --scan-only", | ||
| "type": "content_check" | ||
| } | ||
@@ -336,17 +500,17 @@ ] | ||
| { | ||
| "id": 14, | ||
| "prompt": "/bug-hunter src/ on a CLI without spawn_agent", | ||
| "expected_output": "Pipeline should auto-select the available orchestration backend and continue. If remote orchestration is unavailable, it should fall back to local sequential execution.", | ||
| "id": 24, | ||
| "prompt": "/bug-hunter --fix src/ with a high-confidence architectural-remediation finding", | ||
| "expected_output": "Execution gating should honor fix-strategy classifications so non-autofix findings never enter the executable canary or rollout queue.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Selects AGENT_BACKEND in preflight based on available runtime tools", | ||
| "text": "Builds fix-strategy classifications before building the executable fix plan", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Falls back to next backend when launch fails", | ||
| "text": "Excludes manual-review, larger-refactor, and architectural-remediation findings from fixPlan canary/rollout", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Completes run with local-sequential fallback when no delegation backend is available", | ||
| "text": "Allows only autofixEligible safe-autofix findings into the executable fix queue", | ||
| "type": "content_check" | ||
@@ -357,22 +521,122 @@ } | ||
| { | ||
| "id": 15, | ||
| "prompt": "/bug-hunter huge-repo/ with flaky chunk worker", | ||
| "expected_output": "Orchestrator should enforce per-chunk timeout, retry failed chunk once with backoff, and persist attempt details in run journal.", | ||
| "id": 25, | ||
| "prompt": "/bug-hunter --pr current with gh unavailable and no trustworthy default base branch", | ||
| "expected_output": "Current-PR fallback should fail explicitly when it cannot determine a trustworthy base branch instead of silently assuming main.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Uses run-bug-hunter.cjs for autonomous chunk orchestration", | ||
| "text": "Uses the discovered default branch or explicit --base for current-branch git fallback", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Retries timed out/failed chunk according to max-retries and backoff policy", | ||
| "text": "Fails explicitly when no trustworthy base branch can be determined for current PR fallback", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Writes attempt events to .claude/bug-hunter-run.log", | ||
| "text": "Does not silently assume main for current-PR fallback scope resolution", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 26, | ||
| "prompt": "/bug-hunter concurrent query-bugs and expired live fix-lock scenarios", | ||
| "expected_output": "Utility helpers should preserve correctness under failure and concurrency pressure.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "query-bugs uses invocation-scoped temp seed files and cleans them up even on failure", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "fix-lock does not recover an expired lock when the recorded owner PID is still alive", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Reports a live-owner lock conflict instead of allowing overlapping fixers", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 27, | ||
| "prompt": "/bug-hunter --pr-security", | ||
| "expected_output": "Enterprise PR security review should route through the bundled local commit-security-scan workflow, using PR scope, threat-model context, and dependency-awareness without editing code.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Treats --pr-security as PR-scoped security review with FIX_MODE disabled", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Loads the bundled local skills/commit-security-scan/SKILL.md guidance for PR-focused security review", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Combines PR scope resolution with threat-model and dependency-scan context", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 28, | ||
| "prompt": "/bug-hunter --security-review src/", | ||
| "expected_output": "Enterprise security-review mode should route through the bundled local security-review workflow and combine threat model, code review, dependency findings, and security validation semantics.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Treats --security-review as a bundled enterprise security workflow with FIX_MODE disabled", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Loads the bundled local skills/security-review/SKILL.md guidance during execution", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Runs with threat-model and dependency-scan context enabled", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 29, | ||
| "prompt": "/bug-hunter --threat-model src/ when no threat model exists yet", | ||
| "expected_output": "Threat-model mode should route through the bundled local threat-model-generation skill and produce Bug Hunter-native threat-model artifacts.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Loads the bundled local skills/threat-model-generation/SKILL.md before generating the threat model", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Writes .bug-hunter/threat-model.md and .bug-hunter/security-config.json", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Keeps all threat-model artifacts under .bug-hunter instead of external .factory paths", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| }, | ||
| { | ||
| "id": 30, | ||
| "prompt": "/bug-hunter --validate-security src/ with confirmed security findings", | ||
| "expected_output": "Security-validation mode should route through the bundled local vulnerability-validation skill and enrich confirmed security findings with exploitability-oriented reasoning.", | ||
| "files": [], | ||
| "assertions": [ | ||
| { | ||
| "text": "Loads the bundled local skills/vulnerability-validation/SKILL.md when security validation is requested", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Re-checks reachability, exploitability, PoC quality, and CVSS details for confirmed security findings", | ||
| "type": "content_check" | ||
| }, | ||
| { | ||
| "text": "Uses Bug Hunter-native artifacts rather than a separate external validation pipeline", | ||
| "type": "content_check" | ||
| } | ||
| ] | ||
| } | ||
| ] | ||
| } |
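Eval 25 expects the current-PR fallback to refuse to guess a base branch. The sketch below shows that resolution order. It assumes a hypothetical `resolveBaseBranch` helper with an injectable `runGit` runner, so it is not the real pr-scope code. An explicit `--base` wins. Otherwise it uses the remote default branch reported by `git symbolic-ref --short refs/remotes/origin/HEAD`. If neither is available, it throws instead of assuming `main`.

```javascript
const { execFileSync } = require("node:child_process");

// Default runner: executes git and returns trimmed stdout (throws on a non-zero exit).
function defaultRunGit(args) {
  return execFileSync("git", args, {
    encoding: "utf8",
    stdio: ["ignore", "pipe", "ignore"],
  }).trim();
}

// Hypothetical helper: resolve the base branch for the current-PR git fallback.
// Order: explicit --base, then origin's default branch. Never silently assumes "main".
function resolveBaseBranch(explicitBase, runGit = defaultRunGit) {
  if (explicitBase && explicitBase.trim()) {
    return explicitBase.trim();
  }
  let ref = "";
  try {
    // Prints e.g. "origin/main" when refs/remotes/origin/HEAD is configured.
    ref = runGit(["symbolic-ref", "--short", "refs/remotes/origin/HEAD"]);
  } catch {
    ref = ""; // origin/HEAD is not set, or this is not a git checkout
  }
  if (!ref) {
    throw new Error(
      "Cannot determine a trustworthy base branch for --pr current; pass --base <branch>."
    );
  }
  return ref.replace(/^origin\//, "");
}
```

Injecting `runGit` keeps the decision logic testable without a real repository. The default runner is only invoked when no explicit base is given.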
@@ -38,3 +38,3 @@ # Extended Mode (FILE_BUDGET+1 to FILE_BUDGET×2 files) — chunked sequential | ||
| - **Service-aware partitioning (preferred):** If triage detected multiple domains, partition by domain. | ||
| - **Risk-tier partitioning (fallback):** Process CRITICAL files first, then HIGH, then MEDIUM. | ||
| - **Risk-tier partitioning (fallback):** Process CRITICAL files first, then HIGH, then MEDIUM, then LOW. | ||
| - Chunk size: FILE_BUDGET ÷ 2 files per chunk (keep chunks small to avoid compaction). | ||
@@ -71,3 +71,3 @@ - Keep same-directory files together when possible. | ||
| After all chunks complete, merge findings from state into `.bug-hunter/findings.md`. | ||
| After all chunks complete, merge findings from state into `.bug-hunter/findings.json`. | ||
@@ -74,0 +74,0 @@ If TOTAL FINDINGS: 0, skip Skeptic and Referee. Go to Step 7 (Final Report) in SKILL.md. |
+30
-30
@@ -20,6 +20,11 @@ # Fix Loop Mode (`--loop --fix`) | ||
| ``` | ||
| MAX_FIX_LOOP_ITERATIONS = max( | ||
| 15, | ||
| min(250, ceil(SCANNABLE_FILES / max(FILE_BUDGET, 1)) + ELIGIBLE_BUG_COUNT + 8) | ||
| ) | ||
| ralph_start({ | ||
| name: "bug-hunter-fix-audit", | ||
| taskContent: <the TODO.md content below>, | ||
| maxIterations: 15 | ||
| maxIterations: MAX_FIX_LOOP_ITERATIONS | ||
| }) | ||
@@ -31,4 +36,4 @@ ``` | ||
| - You execute one iteration of find + fix. | ||
| - You update `.bug-hunter/coverage.md` with results. | ||
| - If all bugs are FIXED and all CRITICAL/HIGH files are DONE → output `<promise>COMPLETE</promise>`. | ||
| - You update `.bug-hunter/coverage.json` with results and render `.bug-hunter/coverage.md`. | ||
| - If all bugs are FIXED and all queued scannable source files are DONE → output `<promise>COMPLETE</promise>`. | ||
| - Otherwise → call `ralph_done` to proceed to the next iteration. | ||
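The `MAX_FIX_LOOP_ITERATIONS` expression above can be evaluated directly, which makes the clamp concrete. This sketch assumes plain integer inputs, and `maxFixLoopIterations` is a hypothetical name. Take 400 scannable files, a budget of 40, and 12 eligible bugs. The result is 10 chunks + 12 bugs + 8 slack = 30. Small runs hit the floor of 15, and very large runs hit the cap of 250.

```javascript
// Mirrors the MAX_FIX_LOOP_ITERATIONS expression: chunks + eligible bugs + 8 slack, clamped to [15, 250].
function maxFixLoopIterations(scannableFiles, fileBudget, eligibleBugCount) {
  const chunks = Math.ceil(scannableFiles / Math.max(fileBudget, 1)); // guard against a zero budget
  return Math.max(15, Math.min(250, chunks + eligibleBugCount + 8));
}

console.log(maxFixLoopIterations(400, 40, 12)); // 30
console.log(maxFixLoopIterations(10, 40, 0)); // 15 (floor)
console.log(maxFixLoopIterations(20000, 40, 5)); // 250 (cap)
```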
@@ -40,21 +45,14 @@ | ||
| The `.bug-hunter/coverage.md` file gains additional sections: | ||
| The `.bug-hunter/coverage.json` file carries the same loop state, plus fix | ||
| entries: | ||
| ```markdown | ||
| ## Fixes | ||
| <!-- One line per bug. LATEST entry per BUG-ID is current status. --> | ||
| <!-- Format: BUG-ID|STATUS|ITERATION_FIXED|FILES_MODIFIED --> | ||
| <!-- STATUS: FIXED | FIX_REVERTED | FIX_FAILED | PARTIAL | FIX_CONFLICT | SKIPPED | FIXER_BUG --> | ||
| BUG-3|FIXED|1|src/auth/login.ts | ||
| BUG-7|FIXED|1|src/auth/login.ts | ||
| BUG-12|FIXED|2|src/api/users.ts | ||
| ## Test Results | ||
| <!-- One line per iteration. Format: ITERATION|PASSED|FAILED|NEW_FAILURES|RESOLVED --> | ||
| 1|45|3|2|0 | ||
| 2|47|1|0|1 | ||
| ```json | ||
| { | ||
| "fixes": [ | ||
| { "bugId": "BUG-3", "status": "FIXED" }, | ||
| { "bugId": "BUG-12", "status": "FIX_FAILED" } | ||
| ] | ||
| } | ||
| ``` | ||
| **Parsing rule:** For each BUG-ID, use the LAST entry in the Fixes section. Earlier entries for the same BUG-ID are history — only the latest matters. | ||
| ## Loop iteration logic | ||
@@ -64,6 +62,6 @@ | ||
| For each iteration: | ||
| 1. Read coverage file | ||
| 2. Collect (using LAST entry per BUG-ID): | ||
| - Unfixed bugs: latest STATUS in {FIX_REVERTED, FIX_FAILED, FIX_CONFLICT, SKIPPED, FIXER_BUG} | ||
| - Unscanned files: STATUS != DONE in Files section (CRITICAL/HIGH only) | ||
| 1. Read coverage.json | ||
| 2. Collect: | ||
| - Unfixed bugs: latest fix status in {FIX_REVERTED, FIX_FAILED, FIX_CONFLICT, SKIPPED, FIXER_BUG, MANUAL_REVIEW} | ||
| - Unscanned files: file status != done | ||
| 3. If unfixed bugs exist OR unscanned files exist: | ||
@@ -73,5 +71,5 @@ a. If unscanned files -> run Phase 1 (find pipeline) on them -> get new confirmed bugs | ||
| c. Run Phase 2 (fix + verify) on combined list | ||
| d. Update coverage file (append new entries to Fixes section) | ||
| d. Update coverage.json and re-render coverage.md | ||
| e. Call ralph_done to proceed to next iteration | ||
| 4. If all bugs FIXED and all CRITICAL/HIGH files DONE: | ||
| 4. If all bugs FIXED and all queued scannable source files are DONE: | ||
| -> Run final test suite one more time | ||
@@ -95,2 +93,4 @@ -> If no new failures: | ||
| - [ ] All HIGH files scanned | ||
| - [ ] All MEDIUM files scanned | ||
| - [ ] All LOW files scanned | ||
| - [ ] Findings verified through Skeptic+Referee pipeline | ||
@@ -109,9 +109,9 @@ | ||
| ## Instructions | ||
| 1. Read .bug-hunter/coverage.md for previous iteration state | ||
| 2. Parse Files table — collect unscanned CRITICAL/HIGH files | ||
| 3. Parse Fixes table — collect unfixed bugs (latest entry not FIXED) | ||
| 1. Read .bug-hunter/coverage.json for previous iteration state | ||
| 2. Parse the `files` array — collect unscanned CRITICAL/HIGH/MEDIUM/LOW files | ||
| 3. Parse the `fixes` array — collect unfixed bugs (latest entry not FIXED) | ||
| 4. If unscanned files exist: run Phase 1 (find pipeline) on them | ||
| 5. If unfixed bugs exist: run Phase 2 (fix pipeline) on them | ||
| 6. Update coverage file with results | ||
| 7. Output <promise>COMPLETE</promise> when all bugs are FIXED and no new test failures | ||
| 6. Update coverage.json with results and render coverage.md | ||
| 7. Output <promise>COMPLETE</promise> only when all queued files are DONE, all discovered bugs are FIXED, and no new test failures remain | ||
| 8. Otherwise call ralph_done to continue to the next iteration | ||
@@ -118,0 +118,0 @@ ``` |
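The fix-loop iteration logic above reduces to two collections and one decision. The sketch below works over a parsed `coverage.json`. It assumes the `files` and `fixes` shapes from the coverage example (`path`/`status` and `bugId`/`status`). `planIteration` and its returned action names are hypothetical. Later `fixes` entries for a bug overwrite earlier ones, so only the latest status counts.

```javascript
// Latest fix statuses that the loop treats as "still needs work" (from the collect step above).
const UNFIXED = new Set([
  "FIX_REVERTED", "FIX_FAILED", "FIX_CONFLICT", "SKIPPED", "FIXER_BUG", "MANUAL_REVIEW",
]);

// Hypothetical helper: decide the next loop action from a parsed coverage.json object.
function planIteration(coverage) {
  // Map preserves one entry per bugId; each later set() replaces the earlier status.
  const latest = new Map();
  for (const fix of coverage.fixes ?? []) latest.set(fix.bugId, fix.status);

  const unfixedBugs = [...latest]
    .filter(([, status]) => UNFIXED.has(status))
    .map(([bugId]) => bugId);
  const unscannedFiles = (coverage.files ?? [])
    .filter((f) => f.status !== "done")
    .map((f) => f.path);

  // Nothing left to fix or scan: run the final test suite before declaring COMPLETE.
  const action = unfixedBugs.length === 0 && unscannedFiles.length === 0 ? "final-verify" : "iterate";
  return { action, unfixedBugs, unscannedFiles };
}
```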
@@ -53,7 +53,10 @@ # Phase 2: Fix Pipeline (default; also via `--fix`/`--autonomous`) | ||
| ``` | ||
| Record `LOCK_OWNER_TOKEN` from the returned JSON (`lock.ownerToken`). | ||
| If lock cannot be acquired, stop Phase 2 to avoid concurrent mutation. | ||
| **Owner token:** `acquire` returns `lock.ownerToken`; renew/release now require that token. Persist it for the entire Phase 2 run as `LOCK_OWNER_TOKEN`. | ||
| **Lock renewal:** During Step 9 execution, renew the lock after each bug fix to prevent TTL expiry on long runs: | ||
| ``` | ||
| node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" | ||
| node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN" | ||
| ``` | ||
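The sketch below models the owner-token rule together with the live-PID check from eval 26. It keeps the lock record in memory as `{ ownerToken, pid, expiresAt }`. Those field names and the `acquire`/`renew`/`release` functions are assumptions, not the on-disk format used by `fix-lock.cjs`. `process.kill(pid, 0)` sends no signal and only tests whether the process exists. An `EPERM` error means the process exists but belongs to another user.

```javascript
const crypto = require("node:crypto");

// True if a process with this PID currently exists.
function isPidAlive(pid) {
  try {
    process.kill(pid, 0); // signal 0: existence/permission check only
    return true;
  } catch (err) {
    return err.code === "EPERM"; // exists, but we may not signal it
  }
}

// Hypothetical in-memory lock: state.lock is null or { ownerToken, pid, expiresAt }.
function acquire(state, ttlMs, now = Date.now()) {
  const held = state.lock;
  if (held) {
    if (held.expiresAt > now) return { ok: false, reason: "lock held and not yet expired" };
    if (isPidAlive(held.pid)) return { ok: false, reason: "lock expired but owner PID is still alive" };
    // Expired and the owner process is gone: safe to recover.
  }
  state.lock = { ownerToken: crypto.randomUUID(), pid: process.pid, expiresAt: now + ttlMs };
  return { ok: true, lock: { ...state.lock } };
}

// Renew and release both refuse to act without the matching owner token.
function renew(state, ownerToken, ttlMs, now = Date.now()) {
  if (!state.lock || state.lock.ownerToken !== ownerToken) return { ok: false, reason: "token mismatch" };
  state.lock.expiresAt = now + ttlMs;
  return { ok: true };
}

function release(state, ownerToken) {
  if (!state.lock || state.lock.ownerToken !== ownerToken) return { ok: false, reason: "token mismatch" };
  state.lock = null;
  return { ok: true };
}
```

Requiring the token on renew and release means a second fixer that only knows the lock path cannot extend or drop a lock it does not own.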
@@ -85,4 +88,13 @@ | ||
| **8d. Build sequential fix plan** | ||
| **8d. Build fix strategy + sequential fix plan** | ||
| Before deciding what to patch, write `.bug-hunter/fix-strategy.json` and `.bug-hunter/fix-strategy.md`. | ||
| The strategy artifact must classify each confirmed bug into one of: | ||
| - `safe-autofix` | ||
| - `manual-review` | ||
| - `larger-refactor` | ||
| - `architectural-remediation` | ||
| If `PLAN_ONLY_MODE=true`, stop after the strategy artifact and fix-plan preview are written. | ||
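The executable queue can be derived mechanically from the strategy artifact. This sketch assumes each classified bug carries hypothetical `strategy` and `autofixEligible` fields. Only `safe-autofix` entries that are also eligible are queued. Every other classification is returned as deferred, so it can be reported rather than silently dropped.

```javascript
// The four classifications listed above.
const STRATEGIES = new Set(["safe-autofix", "manual-review", "larger-refactor", "architectural-remediation"]);

// Hypothetical input: [{ bugId, strategy, autofixEligible }].
function buildExecutableQueue(classified) {
  const queue = [];
  const deferred = [];
  for (const bug of classified) {
    if (!STRATEGIES.has(bug.strategy)) {
      throw new Error(`${bug.bugId}: unknown strategy "${bug.strategy}"`); // fail fast on a malformed artifact
    }
    if (bug.strategy === "safe-autofix" && bug.autofixEligible === true) {
      queue.push(bug.bugId);
    } else {
      deferred.push({ bugId: bug.bugId, strategy: bug.strategy });
    }
  }
  return { queue, deferred };
}
```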
| Prepare bug queue: | ||
@@ -190,3 +202,3 @@ 1. Apply confidence gate: | ||
| ``` | ||
| node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" | ||
| node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN" | ||
| ``` | ||
@@ -207,3 +219,3 @@ | ||
| ``` | ||
| node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" | ||
| node "$SKILL_DIR/scripts/fix-lock.cjs" renew ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN" | ||
| ``` | ||
@@ -267,3 +279,5 @@ | ||
| - Only report fixer-introduced bugs at MEDIUM severity or above. | ||
| - LOW-severity issues from the fixer are logged to `.bug-hunter/fix-report.md` as informational notes but do NOT trigger `FIXER_BUG` status. | ||
| - LOW-severity issues from the fixer are logged in `.bug-hunter/fix-report.json` | ||
| (and optional derived `.bug-hunter/fix-report.md`) as informational notes | ||
| but do NOT trigger `FIXER_BUG` status. | ||
@@ -309,3 +323,3 @@ This removes ambiguity from `<base-branch>` and works for path scans, staged scans, and branch scans. | ||
| ``` | ||
| node "$SKILL_DIR/scripts/fix-lock.cjs" release ".bug-hunter/fix.lock" | ||
| node "$SKILL_DIR/scripts/fix-lock.cjs" release ".bug-hunter/fix.lock" "$LOCK_OWNER_TOKEN" | ||
| ``` | ||
@@ -384,2 +398,14 @@ If an earlier step aborts Phase 2, run the same release command AND worktree cleanup-all in best-effort cleanup before returning. | ||
| Validate it immediately: | ||
| ```bash | ||
| node "$SKILL_DIR/scripts/schema-validate.cjs" fix-report ".bug-hunter/fix-report.json" | ||
| ``` | ||
| Render the Markdown companion when humans need it: | ||
| ```bash | ||
| node "$SKILL_DIR/scripts/render-report.cjs" fix-report ".bug-hunter/fix-report.json" > ".bug-hunter/fix-report.md" | ||
| ``` | ||
| Rules: | ||
@@ -386,0 +412,0 @@ - `dry_run: true` when `DRY_RUN_MODE=true` — the `fixes` array contains planned diffs instead of commit hashes. |
+14
-15
@@ -70,3 +70,3 @@ # Large Codebase Strategy (> FILE_BUDGET×3 files) | ||
| ``` | ||
| For each domain (CRITICAL first, then HIGH, then MEDIUM): | ||
| For each domain (CRITICAL first, then HIGH, then MEDIUM, then LOW): | ||
| 1. Get this domain's file list: | ||
@@ -82,5 +82,5 @@ - If triage exists: use triage.domainFileLists[domainPath] | ||
| .bug-hunter/domains/<domain-name>/recon.md | ||
| .bug-hunter/domains/<domain-name>/findings.md | ||
| .bug-hunter/domains/<domain-name>/skeptic.md | ||
| .bug-hunter/domains/<domain-name>/referee.md | ||
| .bug-hunter/domains/<domain-name>/findings.json | ||
| .bug-hunter/domains/<domain-name>/skeptic.json | ||
| .bug-hunter/domains/<domain-name>/referee.json | ||
@@ -128,3 +128,3 @@ Record in state: | ||
| 1. Read all domain `referee.md` files and boundary results. | ||
| 1. Read all domain `referee.json` files and boundary results. | ||
| 2. Merge findings, deduplicate by file + line + claim. | ||
@@ -169,14 +169,13 @@ 3. Renumber BUG-IDs globally. | ||
| The ralph-loop's coverage check reads the state file and only marks DONE when all CRITICAL and HIGH domains show status `done`. | ||
| The ralph-loop's coverage check reads the state file and only marks DONE when all queued domains show status `done`. | ||
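The merge step (deduplicate by file + line + claim, then renumber BUG-IDs globally) can be sketched as follows. The `{ file, line, claim }` finding shape is an assumption about the domain `referee.json` files. Trimming the claim before comparison is a choice made here, not something the strategy specifies.

```javascript
// Hypothetical input: one array of findings per domain, each finding { file, line, claim, ... }.
function mergeDomainFindings(domainResults) {
  const seen = new Set();
  const merged = [];
  for (const findings of domainResults) {
    for (const f of findings) {
      // JSON-encoding the tuple avoids accidental collisions from naive string joins.
      const key = JSON.stringify([f.file, f.line, f.claim.trim()]);
      if (seen.has(key)) continue; // duplicate reported by another domain or boundary pass
      seen.add(key);
      merged.push(f);
    }
  }
  // Renumber globally in merge order: BUG-1, BUG-2, ...
  return merged.map((f, i) => ({ ...f, bugId: `BUG-${i + 1}` }));
}
```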
| ## Optimization: Skip LOW domains | ||
| ## Default autonomous behavior | ||
| For truly huge codebases (1,000+ files), skip LOW-tier domains entirely unless `--exhaustive` is specified. UI components, test utilities, and formatting helpers rarely contain runtime bugs worth the context cost. | ||
| Autonomous mode is exhaustive by default: | ||
| - Finish all CRITICAL domains first. | ||
| - Then continue through HIGH domains. | ||
| - Then continue through MEDIUM domains. | ||
| - Then continue through LOW domains. | ||
| - Only stop when the domain queue is exhausted, the user interrupts, or a hard blocker prevents safe progress. | ||
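The exhaustive tier ordering amounts to a stable sort of the pending domains. This sketch assumes a hypothetical `{ name, tier, status }` domain record. Unknown tiers sort after LOW, and an empty result means the domain queue is exhausted. `Array.prototype.sort` is stable in modern Node, so domains in the same tier keep their original order.

```javascript
const TIER_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"];

// Unknown tiers rank after LOW instead of jumping ahead of CRITICAL.
function tierRank(tier) {
  const i = TIER_ORDER.indexOf(tier);
  return i === -1 ? TIER_ORDER.length : i;
}

// Returns pending domain names in processing order; [] means the queue is exhausted.
function pendingDomainQueue(domains) {
  return domains
    .filter((d) => d.status !== "done") // filter() copies, so sort() does not mutate the input
    .sort((a, b) => tierRank(a.tier) - tierRank(b.tier))
    .map((d) => d.name);
}
```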
| Report skipped domains in the final report: | ||
| ``` | ||
| ℹ️ Skipped [N] LOW-tier domains ([M] files) for efficiency. | ||
| Use `--exhaustive` to include all domains. | ||
| ``` | ||
| ## Optimization: Delta-first for repeat scans | ||
@@ -216,2 +215,2 @@ | ||
| - [ ] Tier 3: Build final report with per-domain breakdown | ||
| - [ ] Coverage: All CRITICAL/HIGH domains done? If not, continue. | ||
| - [ ] Coverage: All queued domains done? If not, continue. |
@@ -9,3 +9,6 @@ # Local-Sequential Mode (no subagents — default fallback) | ||
| You (the orchestrating agent) play each role yourself, sequentially. Between phases you **write outputs to files** so later phases can reference them without holding everything in working memory. | ||
| You (the orchestrating agent) play each role yourself, sequentially. Between | ||
| phases you write canonical JSON artifacts so later phases can reference them | ||
| without holding everything in working memory. Markdown reports are derived from | ||
| those JSON files when humans need them. | ||
@@ -30,3 +33,5 @@ All state files go in `.bug-hunter/` relative to the working directory. | ||
| - If git is available, check recently changed files with `git log`. | ||
| - Write your Recon output to `.bug-hunter/recon.md` — include the tech stack, patterns, and the triage-provided risk map. | ||
| - Write your Recon output to `.bug-hunter/recon.json` if structured output is | ||
| requested; otherwise keep `.bug-hunter/recon.md` as a temporary fallback | ||
| until the Recon prompt is migrated. | ||
@@ -49,3 +54,3 @@ 3. **If `.bug-hunter/triage.json` does NOT exist** (fallback — Recon called directly): | ||
| 4. Execute the Hunter instructions yourself: | ||
| - Read files in risk-map order: CRITICAL → HIGH → MEDIUM. | ||
| - Read files in risk-map order: CRITICAL → HIGH → MEDIUM → LOW. | ||
| - For each file, use the Read tool. Do NOT rely on memory from earlier phases. | ||
@@ -55,3 +60,7 @@ - Apply the mandatory security checklist sweep (Phase 3 in hunter.md) on every CRITICAL and HIGH file. | ||
| - For each bug found, record it in the exact BUG-N format specified in hunter.md. | ||
| 5. Write your complete findings to `.bug-hunter/findings.md`. | ||
| 5. Write your complete findings to `.bug-hunter/findings.json`. | ||
| 6. Validate the artifact immediately: | ||
| ```bash | ||
| node "$SKILL_DIR/scripts/schema-validate.cjs" findings ".bug-hunter/findings.json" | ||
| ``` | ||
@@ -91,5 +100,6 @@ **Context management:** If you notice earlier files becoming hazy in your memory: | ||
| ``` | ||
| 3. After all chunks: merge findings from `.bug-hunter/state.json` into `.bug-hunter/findings.md`. | ||
| 3. After all chunks: merge findings from `.bug-hunter/state.json` into | ||
| `.bug-hunter/findings.json`. | ||
| **Gap-fill:** After scanning, compare FILES SCANNED against the risk map. If any CRITICAL or HIGH files are in FILES SKIPPED, read them now and append any new findings. If you truly cannot read them (context exhaustion), leave them in FILES SKIPPED. | ||
| **Gap-fill:** After scanning, compare FILES SCANNED against the risk map. If any queued scannable files are in FILES SKIPPED, read them now in priority order (CRITICAL → HIGH → MEDIUM → LOW) and append any new findings. If you truly cannot read them (context exhaustion), leave them in FILES SKIPPED so loop mode can resume them next. | ||
@@ -103,3 +113,3 @@ If TOTAL FINDINGS: 0, skip Phases C and D. Go directly to Step 7 (Final Report) in SKILL.md. | ||
| 3. **Switch mindset completely**: you are now the Skeptic. Your job is to DISPROVE false positives. Forget the pride of finding them — you want to kill weak claims. | ||
| 4. Read `.bug-hunter/findings.md` to get the findings list. | ||
| 4. Read `.bug-hunter/findings.json` to get the findings list. | ||
| 5. For EACH finding: | ||
@@ -112,3 +122,8 @@ - Re-read the actual code at the reported file and line with the Read tool. This is MANDATORY — do not evaluate from memory. | ||
| - For Critical bugs: need >67% confidence AND all cross-references read. | ||
| 6. Write your complete Skeptic output to `.bug-hunter/skeptic.md` in the format from skeptic.md. | ||
| 6. Write your complete Skeptic output to `.bug-hunter/skeptic.json` in the | ||
| format from skeptic.md. | ||
| 7. Validate it immediately: | ||
| ```bash | ||
| node "$SKILL_DIR/scripts/schema-validate.cjs" skeptic ".bug-hunter/skeptic.json" | ||
| ``` | ||
@@ -121,3 +136,3 @@ **Important:** When switching from Hunter to Skeptic, genuinely try to disprove your own findings. The point of this phase is adversarial review. If you cannot genuinely argue against a finding, ACCEPT it and move on — do not waste time rubber-stamping. | ||
| 2. **Switch mindset**: you are the impartial Referee. You trust neither the Hunter nor the Skeptic. | ||
| 3. Read both `.bug-hunter/findings.md` and `.bug-hunter/skeptic.md`. | ||
| 3. Read both `.bug-hunter/findings.json` and `.bug-hunter/skeptic.json`. | ||
| 4. For each finding: | ||
@@ -127,4 +142,12 @@ - **Tier 1 (all Critical + top 15 by severity):** Re-read the actual code yourself a THIRD time using the Read tool. Construct the runtime trigger independently. Make your own judgment. | ||
| 5. Make final REAL BUG / NOT A BUG verdicts with severity calibration. | ||
| 6. Write the final Referee report to `.bug-hunter/referee.md`. | ||
| 7. Proceed to Step 7 (Final Report) in SKILL.md. | ||
| 6. Write the final Referee verdicts to `.bug-hunter/referee.json`. | ||
| 7. Validate them immediately: | ||
| ```bash | ||
| node "$SKILL_DIR/scripts/schema-validate.cjs" referee ".bug-hunter/referee.json" | ||
| ``` | ||
| 8. Render `.bug-hunter/report.md` from the JSON artifacts: | ||
| ```bash | ||
| node "$SKILL_DIR/scripts/render-report.cjs" report ".bug-hunter/findings.json" ".bug-hunter/referee.json" > ".bug-hunter/report.md" | ||
| ``` | ||
| 9. Proceed to Step 7 (Final Report) in SKILL.md. | ||
@@ -137,6 +160,7 @@ ## State Files Summary | ||
| |------|-------|---------| | ||
| | `.bug-hunter/recon.md` | A | Risk map, file metrics, tech stack | | ||
| | `.bug-hunter/findings.md` | B | All Hunter findings in BUG-N format | | ||
| | `.bug-hunter/skeptic.md` | C | Skeptic challenges and decisions | | ||
| | `.bug-hunter/referee.md` | D | Final verdicts and confirmed bugs | | ||
| | `.bug-hunter/recon.json` | A | Recon artifact when structured output is used | | ||
| | `.bug-hunter/findings.json` | B | All Hunter findings in canonical JSON | | ||
| | `.bug-hunter/skeptic.json` | C | Skeptic challenges in canonical JSON | | ||
| | `.bug-hunter/referee.json` | D | Final verdicts in canonical JSON | | ||
| | `.bug-hunter/report.md` | D | Human-readable report rendered from JSON | | ||
| | `.bug-hunter/state.json` | B (chunked) | Chunk progress, findings ledger | | ||
@@ -149,6 +173,6 @@ | `.bug-hunter/source-files.json` | A | Source file list (for state init) | | ||
| - If all CRITICAL and HIGH files were scanned: proceed to Final Report. | ||
| - If any CRITICAL/HIGH files were skipped: | ||
| - If `--loop` mode: the ralph-loop will iterate and cover them next. | ||
| - If not `--loop`: include a coverage WARNING in the Final Report and recommend `--loop`. | ||
| - Do NOT claim "full coverage" or "audit complete" unless every CRITICAL and HIGH file was actually read with the Read tool and has status DONE. | ||
| - If all queued scannable source files were scanned: proceed to Final Report. | ||
| - If any queued scannable files were skipped: | ||
| - If `--loop` mode: the ralph-loop must iterate and cover the remaining queue next. | ||
| - If not `--loop`: include a coverage WARNING in the Final Report and recommend loop mode. | ||
| - Do NOT claim "full coverage" or "audit complete" unless every queued scannable source file was actually read with the Read tool and has status DONE. |
+53
-53
| # Ralph-Loop Mode (`--loop`) | ||
| When `--loop` is present, the bug-hunter wraps itself in a ralph-loop that keeps iterating until the audit achieves full coverage. This is for thorough, autonomous audits where you want every file examined. | ||
| When `--loop` is present, the bug-hunter wraps itself in a ralph-loop that keeps iterating until the audit achieves full queued coverage. This is for thorough, autonomous audits where you want every queued scannable source file examined unless the user interrupts. | ||
@@ -15,6 +15,8 @@ ## CRITICAL: Starting the ralph-loop | ||
| ``` | ||
| MAX_LOOP_ITERATIONS = max(12, min(200, ceil(SCANNABLE_FILES / max(FILE_BUDGET, 1)) + 8)) | ||
| ralph_start({ | ||
| name: "bug-hunter-audit", | ||
| taskContent: <the TODO.md content below>, | ||
| maxIterations: 10 | ||
| maxIterations: MAX_LOOP_ITERATIONS | ||
| }) | ||
@@ -26,4 +28,4 @@ ``` | ||
| - You execute one iteration of the bug-hunt pipeline (steps below). | ||
| - You update `.bug-hunter/coverage.md` with results. | ||
| - If ALL CRITICAL/HIGH files are DONE → output `<promise>COMPLETE</promise>` to end the loop. | ||
| - You update `.bug-hunter/coverage.json` with results and render `.bug-hunter/coverage.md` from it. | ||
| - If ALL queued scannable source files are DONE → output `<promise>COMPLETE</promise>` to end the loop. | ||
| - Otherwise → call `ralph_done` to proceed to the next iteration. | ||
@@ -35,43 +37,39 @@ | ||
| 1. **First iteration**: Run the normal pipeline (Recon → Hunters → Skeptics → Referee). At the end, write a coverage report to `.bug-hunter/coverage.md` using the machine-parseable format below. | ||
| 1. **First iteration**: Run the normal pipeline (Recon → Hunters → Skeptics → | ||
| Referee). At the end, write canonical coverage state to | ||
| `.bug-hunter/coverage.json` and render `.bug-hunter/coverage.md` from it. | ||
| 2. **Coverage check**: After each iteration, evaluate: | ||
| - If ALL CRITICAL and HIGH files show status DONE → output `<promise>COMPLETE</promise>` → loop ends | ||
| - If any CRITICAL/HIGH files are SKIPPED or PARTIAL → call `ralph_done` → loop continues | ||
| - If only MEDIUM files remain uncovered → output `<promise>COMPLETE</promise>` (MEDIUM gaps are acceptable) | ||
| - If ALL queued scannable source files show status DONE → output `<promise>COMPLETE</promise>` → loop ends | ||
| - If any queued scannable source files are SKIPPED or PARTIAL → call `ralph_done` → loop continues | ||
| - Do NOT stop just because the current prioritized tier is clean; continue descending through MEDIUM and LOW files automatically | ||
| 3. **Subsequent iterations**: Each new iteration reads `.bug-hunter/coverage.md` to see what's already been done, then runs the pipeline ONLY on uncovered files. New findings are appended to the cumulative bug list. | ||
| 3. **Subsequent iterations**: Each new iteration reads | ||
| `.bug-hunter/coverage.json` to see what's already been done, then runs the | ||
| pipeline ONLY on uncovered files. New findings are appended to the | ||
| cumulative bug list. | ||
| ## Coverage file format (machine-parseable) | ||
| ## Coverage file format (canonical) | ||
| **`.bug-hunter/coverage.md`:** | ||
| ```markdown | ||
| # Bug Hunt Coverage | ||
| SCHEMA_VERSION: 2 | ||
| **`.bug-hunter/coverage.json`:** | ||
| ```json | ||
| { | ||
| "schemaVersion": 1, | ||
| "iteration": 1, | ||
| "status": "IN_PROGRESS", | ||
| "files": [ | ||
| { "path": "src/auth/login.ts", "status": "done" }, | ||
| { "path": "src/api/payments.ts", "status": "pending" } | ||
| ], | ||
| "bugs": [ | ||
| { "bugId": "BUG-3", "severity": "Critical", "file": "src/auth/login.ts", "claim": "JWT token not validated before use" } | ||
| ], | ||
| "fixes": [ | ||
| { "bugId": "BUG-3", "status": "MANUAL_REVIEW" } | ||
| ] | ||
| } | ||
| ``` | ||
| ## Meta | ||
| ITERATION: [N] | ||
| STATUS: [IN_PROGRESS | COMPLETE] | ||
| TOTAL_BUGS_FOUND: [N] | ||
| TIMESTAMP: [ISO 8601] | ||
| CHECKSUM: [line_count of Files section]|[line_count of Bugs section] | ||
| **`.bug-hunter/coverage.md`** is derived from the JSON artifact for humans. | ||
| ## Files | ||
| <!-- One line per file. Format: TIER|PATH|STATUS|ITERATION_SCANNED|BUGS_FOUND --> | ||
| <!-- STATUS: DONE | PARTIAL | SKIPPED --> | ||
| <!-- BUGS_FOUND: comma-separated BUG-IDs, or NONE --> | ||
| CRITICAL|src/auth/login.ts|DONE|1|BUG-3,BUG-7 | ||
| CRITICAL|src/auth/middleware.ts|DONE|1|NONE | ||
| HIGH|src/api/users.ts|DONE|1|BUG-12 | ||
| HIGH|src/api/payments.ts|SKIPPED|0| | ||
| MEDIUM|src/utils/format.ts|SKIPPED|0| | ||
| TEST|src/auth/login.test.ts|CONTEXT|1| | ||
| ## Bugs | ||
| <!-- One line per confirmed bug. Format: BUG-ID|SEVERITY|FILE|LINES|ONE_LINE_DESCRIPTION --> | ||
| BUG-3|Critical|src/auth/login.ts|45-52|JWT token not validated before use | ||
| BUG-7|Medium|src/auth/login.ts|89|Password comparison uses timing-unsafe equality | ||
| BUG-12|Low|src/api/users.ts|120-125|Missing null check on optional profile field | ||
| ``` | ||
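Because `coverage.md` is derived from the JSON artifact, a renderer only needs the canonical fields. The sketch below is hypothetical and uses the `iteration`, `status`, `files`, and `bugs` fields from the example above. The real `render-report.cjs` output may look different. Pipes in claims are escaped so they cannot break the Markdown table.

```javascript
// Hypothetical renderer for the human-readable coverage report.
function renderCoverageMarkdown(coverage) {
  const lines = [
    "# Bug Hunt Coverage",
    "",
    `- Iteration: ${coverage.iteration}`,
    `- Status: ${coverage.status}`,
    "",
    "## Files",
    "",
    "| Path | Status |",
    "|------|--------|",
  ];
  for (const f of coverage.files) lines.push(`| ${f.path} | ${f.status} |`);

  lines.push("", "## Bugs", "", "| Bug | Severity | File | Claim |", "|-----|----------|------|-------|");
  for (const b of coverage.bugs) {
    const claim = b.claim.replace(/\|/g, "\\|"); // keep literal pipes from splitting table cells
    lines.push(`| ${b.bugId} | ${b.severity} | ${b.file} | ${claim} |`);
  }
  return lines.join("\n") + "\n";
}
```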
| ## TODO.md task content for ralph_start | ||
@@ -88,2 +86,4 @@ | ||
| - [ ] All HIGH files scanned | ||
| - [ ] All MEDIUM files scanned | ||
| - [ ] All LOW files scanned | ||
| - [ ] Findings verified through Skeptic+Referee pipeline | ||
@@ -95,7 +95,7 @@ | ||
| ## Instructions | ||
| 1. Read .bug-hunter/coverage.md for previous iteration state | ||
| 2. Parse the Files table — collect all lines where STATUS is not DONE and TIER is CRITICAL or HIGH | ||
| 1. Read .bug-hunter/coverage.json for previous iteration state | ||
| 2. Parse the `files` array — collect all entries where `status` is not `done` | ||
| 3. Run bug-hunter pipeline on those files only | ||
| 4. Update coverage file: change STATUS to DONE, add BUG-IDs | ||
| 5. Output <promise>COMPLETE</promise> when all CRITICAL/HIGH files are DONE | ||
| 4. Update coverage JSON: change file status to `done`, append bug summaries, and render coverage.md | ||
| 5. Output <promise>COMPLETE</promise> only when every queued source file has status `done` | ||
| 6. Otherwise call ralph_done to continue to the next iteration | ||
@@ -107,17 +107,16 @@ ``` | ||
| At the start of each iteration, validate the coverage file: | ||
| 1. Check `SCHEMA_VERSION: 2` exists on line 2 — if missing, this is a v1 file; migrate by adding the header | ||
| 2. Parse the CHECKSUM field: `[file_lines]|[bug_lines]` — count actual lines in Files and Bugs sections | ||
| 3. If counts don't match the checksum, the file may be corrupted. Warn: "Coverage file checksum mismatch (expected X|Y, got A|B). Re-scanning affected files." Then set any files with mismatched data to STATUS=PARTIAL for re-scan. | ||
| 4. If the file fails to parse entirely (malformed lines, missing sections), rename it to `.bug-hunter/coverage.md.bak` and start fresh. Warn user. | ||
| 1. Validate `.bug-hunter/coverage.json` against the local coverage schema. | ||
| 2. If validation fails, rename the bad file to `.bug-hunter/coverage.json.bak` | ||
| and start fresh. Warn the user. | ||
| 3. Always regenerate `.bug-hunter/coverage.md` from the JSON artifact after a | ||
| successful write. | ||
| Update the CHECKSUM every time you write to the coverage file. | ||
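The JSON validation flow above can be sketched as follows. `isValidCoverage` is a hypothetical stand-in for validating against the local coverage schema. It only checks that `files` is an array of entries with string `path` and `status` fields.

```javascript
const fs = require("node:fs");

// Hypothetical stand-in for the real coverage schema check.
function isValidCoverage(data) {
  return (
    data !== null &&
    typeof data === "object" &&
    Array.isArray(data.files) &&
    data.files.every((f) => f && typeof f.path === "string" && typeof f.status === "string")
  );
}

// Returns parsed coverage, or fresh state after moving an invalid file aside.
function loadCoverageOrReset(coveragePath, warn = console.warn) {
  const fresh = () => ({ iteration: 0, files: [], bugs: [] });
  if (!fs.existsSync(coveragePath)) return fresh();
  let data = null;
  try {
    data = JSON.parse(fs.readFileSync(coveragePath, "utf8"));
  } catch {
    data = null; // malformed JSON is handled exactly like a schema failure
  }
  if (isValidCoverage(data)) return data;
  fs.renameSync(coveragePath, `${coveragePath}.bak`);
  warn(`Invalid ${coveragePath}; moved to ${coveragePath}.bak and starting fresh.`);
  return fresh();
}
```

Handling unparseable JSON and schema failures the same way gives a single recovery path. The bad file is kept as `.bak` for inspection, and the loop restarts from an empty state.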
| ## Iteration behavior | ||
| Each iteration after the first: | ||
| 1. Read `.bug-hunter/coverage.md` — parse the Files table | ||
| 2. Collect all lines where STATUS != DONE and TIER is CRITICAL or HIGH | ||
| 1. Read `.bug-hunter/coverage.json` | ||
| 2. Collect all file entries where `status != "done"` | ||
| 3. If none remain → output `<promise>COMPLETE</promise>` (this ends the ralph-loop) | ||
| 4. Otherwise, run the pipeline on remaining files only (use small/parallel mode based on count) | ||
| 5. Update the coverage file: set STATUS to DONE for scanned files, append new bugs to the Bugs section | ||
| 5. Update `coverage.json`, then render `coverage.md` | ||
| 6. Increment ITERATION counter | ||
@@ -128,5 +127,6 @@ 7. Call `ralph_done` to proceed to the next iteration | ||
| - Max 10 iterations by default (set via `ralph_start({ maxIterations: 10 })`) | ||
| - Max iterations should scale with the queue size so autonomous runs do not stop early | ||
| - Each iteration only scans NEW files — no re-scanning already-DONE files | ||
| - User can stop anytime with ESC or `/ralph-stop` | ||
| - All state is in `.bug-hunter/coverage.md` — fully resumable, machine-parseable | ||
| - Canonical state lives in `.bug-hunter/coverage.json`, so runs are fully | ||
| resumable from that JSON; `coverage.md` is derived from it |
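A single loop iteration over `coverage.json` could look like the sketch below. `scanFiles` is a hypothetical callback that runs the pipeline and returns bug IDs keyed by path. `batchSize` is a hypothetical setting that caps how many pending files one iteration takes.

```javascript
// One loop iteration over a parsed coverage.json object (shape assumed).
// scanFiles(paths) -> { [path]: bugIds[] } is a hypothetical pipeline callback.
function runIteration(coverage, scanFiles, batchSize = Infinity) {
  const pending = coverage.files.filter((f) => f.status !== "done");
  if (pending.length === 0) return { complete: true, coverage };

  // Only files that are not yet done are scanned; done files are never re-scanned.
  const batch = new Set(pending.slice(0, batchSize).map((f) => f.path));
  const results = scanFiles([...batch]);
  const iteration = coverage.iteration + 1;
  const files = coverage.files.map((f) =>
    batch.has(f.path)
      ? { ...f, status: "done", iterationScanned: iteration, bugIds: results[f.path] ?? [] }
      : f
  );
  return { complete: false, coverage: { ...coverage, iteration, files } };
}
```

After each call, the caller writes the returned coverage and renders `coverage.md`. It then outputs `<promise>COMPLETE</promise>` when `complete` is true, and calls `ralph_done` otherwise.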
@@ -73,3 +73,3 @@ # Parallel Mode (11–FILE_BUDGET files) — sequential-first hybrid | ||
| After completion, read `.bug-hunter/findings.md`. | ||
| After completion, read `.bug-hunter/findings.json`. | ||
@@ -84,3 +84,3 @@ **Merge scout + deep findings:** If scout pass ran, compare scout findings with deep Hunter findings. Promote any scout-only findings (bugs the deep Hunter missed) into the findings list for Skeptic review. | ||
| Same as small mode: compare FILES SCANNED vs risk map, re-scan any missed CRITICAL/HIGH files. | ||
| Same as small mode: compare FILES SCANNED vs risk map, then re-scan any missed queued scannable files in priority order. | ||
@@ -109,3 +109,3 @@ --- | ||
| After completion, read `.bug-hunter/referee.md`. | ||
| After completion, read `.bug-hunter/referee.json`, then render `.bug-hunter/report.md` from the JSON artifacts. | ||
@@ -112,0 +112,0 @@ --- |
+2
-2
@@ -48,3 +48,3 @@ # Scaled Mode (FILE_BUDGET×2+1 to FILE_BUDGET×3 files) — state-driven sequential | ||
| After all chunks complete: | ||
| 1. Merge findings from state into `.bug-hunter/findings.md`. | ||
| 1. Merge findings from state into `.bug-hunter/findings.json`. | ||
| 2. Run consistency check: look for duplicate BUG-IDs across chunks and conflicting claims on the same file/line. | ||
@@ -77,2 +77,2 @@ 3. Resolve conflicts: keep the finding with the stronger evidence. | ||
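The merge and consistency check above could be implemented roughly as follows. The policy here is an assumption. Findings with the same `file` and `lines` are treated as one claim, and a higher `confidenceScore` stands in for stronger evidence. IDs are reassigned afterwards so that no BUG-ID repeats across chunks.

```javascript
// Merges per-chunk findings arrays from findings.json-style items.
// Assumed policy: same file + lines = same claim; higher confidenceScore wins.
function mergeChunkFindings(chunks) {
  const byLocation = new Map();
  for (const finding of chunks.flat()) {
    const key = `${finding.file}:${finding.lines}`;
    const current = byLocation.get(key);
    // Ties keep the first finding seen.
    if (!current || finding.confidenceScore > current.confidenceScore) {
      byLocation.set(key, finding);
    }
  }
  // Chunks often number from BUG-1 independently, so renumber to avoid duplicates.
  return [...byLocation.values()].map((finding, i) => ({ ...finding, bugId: `BUG-${i + 1}` }));
}
```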
| If `--loop` was specified and coverage is incomplete, the ralph-loop will iterate to cover remaining files. | ||
| If `--loop` was specified and coverage is incomplete, the ralph-loop will iterate to cover the remaining queued files until the queue is exhausted or the user interrupts. |
@@ -16,3 +16,3 @@ # Single-File Mode (1 file) | ||
| After completion, read `.bug-hunter/findings.md`. | ||
| After completion, read `.bug-hunter/findings.json`. | ||
@@ -29,3 +29,3 @@ If TOTAL FINDINGS: 0, go to Step 7 (Final Report) in SKILL.md. | ||
| After completion, read `.bug-hunter/skeptic.md`. | ||
| After completion, read `.bug-hunter/skeptic.json`. | ||
@@ -40,2 +40,2 @@ --- | ||
| After completion, read `.bug-hunter/referee.md`. Go to Step 7 (Final Report) in SKILL.md. | ||
| After completion, read `.bug-hunter/referee.json`, render `.bug-hunter/report.md`, and go to Step 7 (Final Report) in SKILL.md. |
+11
-11
@@ -36,3 +36,3 @@ # Small Mode (2–10 files) | ||
| Pass to the Hunter: | ||
| - File list in risk-map order (CRITICAL → HIGH → MEDIUM). If triage exists, use `triage.scanOrder`. | ||
| - File list in risk-map order (CRITICAL → HIGH → MEDIUM → LOW). If triage exists, use `triage.scanOrder`. | ||
| - Risk map from Recon (or triage). | ||
@@ -42,3 +42,3 @@ - Tech stack from Recon. | ||
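The following sketch implements the ordering rule above. It assumes that `triage.scanOrder` is an array of paths and that the risk map is a plain object from path to tier. Entries with any other tier, such as `TEST`, sort last.

```javascript
const TIER_RANK = { CRITICAL: 0, HIGH: 1, MEDIUM: 2, LOW: 3 };

// Hunter file order: triage.scanOrder when present, else risk-map tier order.
// Both input shapes are assumptions for illustration.
function hunterFileOrder(riskMap, triage) {
  if (triage && Array.isArray(triage.scanOrder)) return [...triage.scanOrder];
  const unknownRank = Object.keys(TIER_RANK).length;
  const rank = (path) => TIER_RANK[riskMap[path]] ?? unknownRank;
  // Array.prototype.sort is stable, so files within a tier keep their order.
  return Object.keys(riskMap).sort((a, b) => rank(a) - rank(b));
}
```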
| After completion, read `.bug-hunter/findings.md`. | ||
| After completion, read `.bug-hunter/findings.json`. | ||
@@ -53,7 +53,7 @@ If TOTAL FINDINGS: 0, skip Skeptic and Referee. Go to Step 7 (Final Report) in SKILL.md. | ||
| If any CRITICAL or HIGH files appear in FILES SKIPPED: | ||
| If any queued scannable files appear in FILES SKIPPED: | ||
| **local-sequential:** Read the missed files yourself now and scan them for bugs. Append new findings to `.bug-hunter/findings.md`. | ||
| **local-sequential:** Read the missed files yourself now in priority order (CRITICAL → HIGH → MEDIUM → LOW) and scan them for bugs. Append new findings to `.bug-hunter/findings.json`. | ||
| **subagent/teams:** Launch a second Hunter on ONLY the missed files using the standard dispatch pattern. Merge gap findings into `.bug-hunter/findings.md`. | ||
| **subagent/teams:** Launch a second Hunter on ONLY the missed files using the standard dispatch pattern. Merge gap findings into `.bug-hunter/findings.json`. | ||
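The gap check above reduces to a set difference followed by a priority sort. In this minimal sketch, `queue` is assumed to hold `{ path, tier }` entries for the queued scannable files, and `scannedPaths` is assumed to be the Hunter's FILES SCANNED list.

```javascript
const PRIORITY = ["CRITICAL", "HIGH", "MEDIUM", "LOW"];

// Queued files missing from FILES SCANNED, highest tier first.
function filesToRescan(queue, scannedPaths) {
  const scanned = new Set(scannedPaths);
  const rank = (tier) => {
    const i = PRIORITY.indexOf(tier);
    return i === -1 ? PRIORITY.length : i; // unknown tiers go last
  };
  return queue
    .filter((entry) => !scanned.has(entry.path))
    .sort((a, b) => rank(a.tier) - rank(b.tier))
    .map((entry) => entry.path);
}
```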
@@ -67,7 +67,7 @@ --- | ||
| Pass to the Skeptic: | ||
| - Hunter findings from `.bug-hunter/findings.md` (compact format: bugId, severity, file, lines, claim, evidence, runtimeTrigger). | ||
| - Hunter findings from `.bug-hunter/findings.json`. | ||
| - Tech stack from Recon. | ||
| - `doc-lookup.md` contents as phase-specific context. | ||
| After completion, read `.bug-hunter/skeptic.md`. | ||
| After completion, read `.bug-hunter/skeptic.json`. | ||
@@ -81,6 +81,6 @@ --- | ||
| Pass to the Referee: | ||
| - Hunter findings from `.bug-hunter/findings.md`. | ||
| - Skeptic challenges from `.bug-hunter/skeptic.md`. | ||
| - Hunter findings from `.bug-hunter/findings.json`. | ||
| - Skeptic challenges from `.bug-hunter/skeptic.json`. | ||
| After completion, read `.bug-hunter/referee.md`. | ||
| After completion, read `.bug-hunter/referee.json`. | ||
@@ -91,2 +91,2 @@ --- | ||
| Proceed to **Step 7** (Final Report) in SKILL.md. The Referee output in `.bug-hunter/referee.md` provides the confirmed bugs table, dismissed findings, and coverage stats needed for the final report. | ||
| Proceed to **Step 7** (Final Report) in SKILL.md. The Referee output in `.bug-hunter/referee.json` plus the rendered `.bug-hunter/report.md` provide the confirmed bugs table, dismissed findings, and coverage stats needed for the final report. |
+10
-1
| { | ||
| "name": "@codexstar/bug-hunter", | ||
| "version": "3.0.0", | ||
| "version": "3.0.5", | ||
| "description": "Adversarial AI bug hunter — multi-agent pipeline finds security vulnerabilities, logic errors, and runtime bugs, then fixes them autonomously. Works with Claude Code, Cursor, Codex CLI, Copilot, Kiro, and more.", | ||
| "license": "MIT", | ||
| "main": "bin/bug-hunter", | ||
| "type": "commonjs", | ||
@@ -32,8 +33,12 @@ "bin": { | ||
| "files": [ | ||
| "agents/", | ||
| "bin/", | ||
| "scripts/", | ||
| "schemas/", | ||
| "prompts/", | ||
| "templates/", | ||
| "modes/", | ||
| "skills/", | ||
| "evals/", | ||
| "docs/", | ||
| "SKILL.md", | ||
@@ -52,3 +57,7 @@ "README.md", | ||
| "homepage": "https://github.com/codexstar69/bug-hunter#readme", | ||
| "publishConfig": { | ||
| "access": "public" | ||
| }, | ||
| "scripts": { | ||
| "prepack": "node --test scripts/tests/*.test.cjs", | ||
| "test": "node --test scripts/tests/*.test.cjs", | ||
@@ -55,0 +64,0 @@ "doctor": "node bin/bug-hunter doctor", |
+35
-21
@@ -5,3 +5,6 @@ You are a surgical code fixer. You will receive a list of verified bugs from a Referee agent, each with a specific file, line range, description, and suggested fix direction. Your job is to implement the fixes — precisely, minimally, and correctly. | ||
| Write your fix report to the file path provided in your assignment (typically `.bug-hunter/fix-report.md`). If no path was provided, output to stdout. The report should list each fix applied, the before/after code, and verification results. | ||
| Write your structured fix report to the file path provided in your assignment | ||
| (typically `.bug-hunter/fix-report.json`). If no path was provided, output the | ||
| JSON to stdout. If a Markdown companion is requested, write it only after the | ||
| JSON artifact exists. | ||
@@ -11,2 +14,3 @@ ## Scope Rules | ||
| - Only fix the bugs listed in your assignment. Do NOT fix other issues you notice. | ||
| - Respect the assigned strategy. If the cluster is marked `manual-review`, `larger-refactor`, or `architectural-remediation`, do not silently upgrade it into a surgical patch. | ||
| - Do NOT refactor, add tests, or improve code style — surgical fixes only. | ||
@@ -18,2 +22,3 @@ - Each fix should change the minimum lines necessary to resolve the bug. | ||
| - **Bug list**: Confirmed bugs with BUG-IDs, file paths, line numbers, severity, description, and suggested fix direction | ||
| - **Fix strategy context**: Whether the assigned cluster is `safe-autofix`, `manual-review`, `larger-refactor`, or `architectural-remediation` | ||
| - **Tech stack context**: Framework, auth mechanism, database, key dependencies | ||
@@ -85,23 +90,32 @@ - **Directory scope**: You are assigned bugs grouped by directory — all bugs in files from the same directory subtree are yours. All bugs in the same file are guaranteed to be in your assignment. | ||
| After completing all fixes: | ||
| Write a JSON object with this shape: | ||
| --- | ||
| **FIX REPORT** | ||
| ```json | ||
| { | ||
| "generatedAt": "2026-03-11T12:00:00.000Z", | ||
| "summary": { | ||
| "bugsAssigned": 2, | ||
| "bugsFixed": 1, | ||
| "bugsNeedingLargerRefactor": 1, | ||
| "bugsSkipped": 0, | ||
| "filesModified": ["src/api/users.ts"] | ||
| }, | ||
| "fixes": [ | ||
| { | ||
| "bugId": "BUG-1", | ||
| "severity": "Critical", | ||
| "filesChanged": ["src/api/users.ts:45-52"], | ||
| "whatChanged": "Replaced string interpolation with the parameterized query helper.", | ||
| "confidenceLabel": "high", | ||
| "sideEffects": ["None"], | ||
| "notes": "Minimal patch only." | ||
| } | ||
| ] | ||
| } | ||
| ``` | ||
| **Bugs fixed:** | ||
| For each bug: | ||
| **BUG-[N]** | [severity] | ||
| - **File(s) changed:** [list of files and line ranges modified] | ||
| - **What was changed:** [one-sentence description of the actual code change] | ||
| - **Confidence:** [High/Medium/Low — how confident you are this fully resolves the bug] | ||
| - **Side effects:** [None / list any potential side effects or breaking changes] | ||
| - **Notes:** [Any caveats or partial-fix details. "Requires larger refactor" if applicable.] | ||
| **Summary:** | ||
| - Bugs assigned: [N] | ||
| - Bugs fixed: [N] | ||
| - Bugs requiring larger refactor: [N] (minimal patches applied) | ||
| - Bugs skipped: [N] (with reason for each) | ||
| - Files modified: [list] | ||
| --- | ||
| Rules: | ||
| - Keep the output valid JSON. | ||
| - Use `confidenceLabel` values `high`, `medium`, or `low`. | ||
| - Keep `sideEffects` as an array, using `["None"]` when there are none. | ||
| - Do not add prose outside the JSON object. |
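A consumer of `fix-report.json` can enforce the rules above before trusting the report. The following is a hypothetical lightweight check, not the project's schema validator. It covers only JSON validity, `confidenceLabel`, and `sideEffects`.

```javascript
const CONFIDENCE_LABELS = new Set(["high", "medium", "low"]);

// Returns a list of rule violations; an empty list means the checks passed.
function checkFixReport(text) {
  let report;
  try {
    report = JSON.parse(text); // any prose around the object makes this throw
  } catch (err) {
    return [`not valid JSON: ${err.message}`];
  }
  if (report === null || typeof report !== "object" || Array.isArray(report)) {
    return ["top level must be a JSON object"];
  }
  if (!Array.isArray(report.fixes)) return ["fixes must be an array"];
  const errors = [];
  report.fixes.forEach((fix, i) => {
    const f = fix || {};
    if (!CONFIDENCE_LABELS.has(f.confidenceLabel)) {
      errors.push(`fixes[${i}].confidenceLabel must be high, medium, or low`);
    }
    if (!Array.isArray(f.sideEffects) || f.sideEffects.length === 0) {
      errors.push(`fixes[${i}].sideEffects must be a non-empty array; use ["None"]`);
    }
  });
  return errors;
}
```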
+37
-18
@@ -5,3 +5,7 @@ You are a code analysis agent. Your task is to thoroughly examine the provided codebase and report ALL behavioral bugs — things that will cause incorrect behavior at runtime. | ||
| Write your complete findings report to the file path provided in your assignment (typically `.bug-hunter/findings.md`). If no path was provided, output to stdout. The orchestrator reads this file to pass your findings to the Skeptic phase. | ||
| Write your canonical findings artifact as JSON to the file path provided in your | ||
| assignment (typically `.bug-hunter/findings.json`). If no path was provided, | ||
| output the JSON to stdout. If the assignment also asks for a Markdown companion, | ||
| write that separately as a derived human-readable summary; the JSON artifact is | ||
| the source of truth the Skeptic and Referee read. | ||
@@ -94,21 +98,36 @@ ## Scope Rules | ||
| For each finding, use this exact format: | ||
| Write a JSON array. Each item must match this contract: | ||
| --- | ||
| **BUG-[number]** | Severity: [Low/Medium/Critical] | Points: [1/5/10] | ||
| - **File:** [exact file path] | ||
| - **Line(s):** [line number or range] | ||
| - **Category:** [logic | security | error-handling | concurrency | edge-case | data-integrity | type-safety | resource-leak | api-contract | cross-file] | ||
| - **STRIDE:** [Spoofing | Tampering | Repudiation | InfoDisclosure | DoS | ElevationOfPrivilege | N/A] | ||
| - **CWE:** [CWE-NNN | N/A] | ||
| - **Claim:** [One-sentence statement of what is wrong — no justification, just the claim] | ||
| - **Evidence:** [Quote the EXACT code from the file, including the line number(s). Copy-paste — do not paraphrase or reconstruct from memory. The Referee will spot-check these quotes against the actual file. If the quote doesn't match, your finding is automatically dismissed.] | ||
| - **Runtime trigger:** [Describe a concrete scenario — what input, API call, or sequence of events causes this bug to manifest. Be specific: "POST /api/users with body {name: null}" not "if the input is invalid"] | ||
| - **Cross-references:** [If this bug involves multiple files, list the other files and line numbers involved. Otherwise write "Single file"] | ||
| --- | ||
| ```json | ||
| [ | ||
| { | ||
| "bugId": "BUG-1", | ||
| "severity": "Critical", | ||
| "category": "security", | ||
| "file": "src/api/users.ts", | ||
| "lines": "45-49", | ||
| "claim": "SQL is built from unsanitized user input.", | ||
| "evidence": "src/api/users.ts:45-49 const query = `...${term}...`", | ||
| "runtimeTrigger": "GET /api/users?term=' OR '1'='1", | ||
| "crossReferences": ["src/db/query.ts:10-18"], | ||
| "confidenceScore": 93, | ||
| "confidenceLabel": "high", | ||
| "stride": "Tampering", | ||
| "cwe": "CWE-89" | ||
| } | ||
| ] | ||
| ``` | ||
| **STRIDE + CWE rules:** | ||
| - `category: security` → STRIDE and CWE are REQUIRED. Choose the most specific match from the CWE Quick Reference below. | ||
| - All other categories (logic, concurrency, etc.) → STRIDE=N/A, CWE=N/A. | ||
| - If a logic bug has security implications (e.g., auth bypass via wrong comparison), reclassify as `category: security`. | ||
| Rules: | ||
| - Return a valid empty array `[]` when you found no bugs. | ||
| - `confidenceScore` must be numeric on a `0-100` scale. | ||
| - `confidenceLabel` is optional, but if present it must be `high`, `medium`, | ||
| or `low`. | ||
| - `crossReferences` must always be an array. Use `["Single file"]` when no | ||
| extra file is involved. | ||
| - `category: security` requires specific `stride` and `cwe` values. | ||
| - Non-security findings must use `stride: "N/A"` and `cwe: "N/A"`. | ||
| - Do not append coverage summaries, totals, or prose outside the JSON array. | ||
| - If the assignment also requested a Markdown companion, render it from this | ||
| JSON after writing the canonical artifact. | ||
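The findings rules above are mechanical enough to check before the Skeptic runs. What follows is a sketch of such a check, not the real schema. The STRIDE values are taken from the category list in the previous Markdown format, and the CWE check only requires the `CWE-NNN` shape.

```javascript
const LABELS = new Set(["high", "medium", "low"]);
const STRIDE = new Set(["Spoofing", "Tampering", "Repudiation", "InfoDisclosure", "DoS", "ElevationOfPrivilege"]);

// Checks a parsed findings.json value against the listed rules.
function checkFindings(findings) {
  if (!Array.isArray(findings)) return ["findings must be a JSON array"];
  const errors = [];
  findings.forEach((item, i) => {
    const f = item || {};
    const at = f.bugId || `item ${i}`;
    if (!Number.isFinite(f.confidenceScore) || f.confidenceScore < 0 || f.confidenceScore > 100) {
      errors.push(`${at}: confidenceScore must be a number from 0 to 100`);
    }
    if (f.confidenceLabel !== undefined && !LABELS.has(f.confidenceLabel)) {
      errors.push(`${at}: confidenceLabel must be high, medium, or low`);
    }
    if (!Array.isArray(f.crossReferences)) {
      errors.push(`${at}: crossReferences must be an array`);
    }
    if (f.category === "security") {
      if (!STRIDE.has(f.stride)) errors.push(`${at}: security findings need a specific STRIDE value`);
      if (!/^CWE-\d+$/.test(f.cwe ?? "")) errors.push(`${at}: security findings need a CWE-NNN id`);
    } else if (f.stride !== "N/A" || f.cwe !== "N/A") {
      errors.push(`${at}: non-security findings must use "N/A" for stride and cwe`);
    }
  });
  return errors;
}
```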
@@ -115,0 +134,0 @@ ## CWE Quick Reference (security findings only) |
+34
-20
@@ -9,3 +9,6 @@ You are the final arbiter. You receive: (1) a bug report from Hunters, (2) challenge decisions from a Skeptic. Determine the TRUTH for each bug — accuracy matters, not agreement. | ||
| Write your complete Referee verdict report to the file path provided in your assignment (typically `.bug-hunter/referee.md`). If no path was provided, output to stdout. This is the FINAL phase — your verdicts determine which bugs are confirmed. | ||
| Write your canonical Referee verdict artifact as JSON to the file path provided | ||
| in your assignment (typically `.bug-hunter/referee.json`). If no path was | ||
| provided, output the JSON to stdout. If a Markdown report is requested, render | ||
| it from this JSON artifact after writing the canonical file. | ||
@@ -60,16 +63,34 @@ ## Scope Rules | ||
| Per bug: | ||
| Write a JSON array. Each item must match this contract: | ||
| ```json | ||
| [ | ||
| { | ||
| "bugId": "BUG-1", | ||
| "verdict": "REAL_BUG", | ||
| "trueSeverity": "Critical", | ||
| "confidenceScore": 94, | ||
| "confidenceLabel": "high", | ||
| "verificationMode": "INDEPENDENTLY_VERIFIED", | ||
| "analysisSummary": "Confirmed by tracing user-controlled input into an unsafe sink without validation.", | ||
| "suggestedFix": "Validate the input before building the query and use the parameterized helper." | ||
| } | ||
| ] | ||
| ``` | ||
| **BUG-N** | Verification: INDEPENDENTLY VERIFIED / EVIDENCE-BASED | ||
| - **Hunter's claim:** [summary] | ||
| - **Skeptic's response:** DISPROVE/ACCEPT [summary] | ||
| - **My analysis:** [what you traced and found] | ||
| - **VERDICT: REAL BUG / NOT A BUG** | Confidence: High/Medium/Low | ||
| - **True severity:** [Critical/Medium/Low] (if changed, explain) | ||
| - **Suggested fix:** [concrete: function name, check to add, line to change] | ||
| ``` | ||
| Rules: | ||
| - `verdict` must be one of `REAL_BUG`, `NOT_A_BUG`, or `MANUAL_REVIEW`. | ||
| - `confidenceScore` must be numeric on a `0-100` scale. | ||
| - `confidenceLabel` must be `high`, `medium`, or `low`. | ||
| - `verificationMode` must be `INDEPENDENTLY_VERIFIED` or `EVIDENCE_BASED`. | ||
| - Keep the reasoning in `analysisSummary`; do not emit free-form prose outside | ||
| the JSON array. | ||
| - Return `[]` only when there were no findings to referee. | ||
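The Referee contract checks above are sketched below. The enum and range checks mirror the listed rules. The cross-check that each Hunter bug ID receives exactly one verdict is an added assumption.

```javascript
const VERDICTS = new Set(["REAL_BUG", "NOT_A_BUG", "MANUAL_REVIEW"]);
const MODES = new Set(["INDEPENDENTLY_VERIFIED", "EVIDENCE_BASED"]);
const LABELS = new Set(["high", "medium", "low"]);

// Validates referee.json items and checks verdict coverage of Hunter bug IDs.
function checkVerdicts(verdicts, hunterBugIds) {
  if (!Array.isArray(verdicts)) return ["referee output must be a JSON array"];
  const errors = [];
  const counts = new Map();
  for (const v of verdicts) {
    if (!VERDICTS.has(v.verdict)) errors.push(`${v.bugId}: unknown verdict ${v.verdict}`);
    if (!MODES.has(v.verificationMode)) errors.push(`${v.bugId}: unknown verificationMode`);
    if (!LABELS.has(v.confidenceLabel)) errors.push(`${v.bugId}: confidenceLabel must be high, medium, or low`);
    if (!Number.isFinite(v.confidenceScore) || v.confidenceScore < 0 || v.confidenceScore > 100) {
      errors.push(`${v.bugId}: confidenceScore must be a number from 0 to 100`);
    }
    counts.set(v.bugId, (counts.get(v.bugId) ?? 0) + 1);
  }
  for (const id of hunterBugIds) {
    if ((counts.get(id) ?? 0) !== 1) errors.push(`${id}: expected exactly one verdict`);
  }
  return errors;
}
```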
| ### Security enrichment (confirmed security bugs only) | ||
| For each finding with `category: security` that you confirm as REAL BUG, add these fields below the verdict: | ||
| For each finding with `category: security` that you confirm as `REAL_BUG`, | ||
| include the security enrichment details in `analysisSummary` and | ||
| `suggestedFix`. Until the schema adds typed security fields, do not emit | ||
| out-of-contract keys. | ||
@@ -116,10 +137,3 @@ **Reachability** (required for all security findings): | ||
| **VERIFIED BUG REPORT** | ||
| Stats: Total reported | Dismissed | Confirmed (Critical/Medium/Low) | Independently verified vs Evidence-based | Per-Hunter accuracy (if parallel) | Skeptic accuracy | ||
| Confirmed bugs table: # | Severity | STRIDE | CWE | Reachability | File | Lines | Description | Fix | Verification | ||
| Low-confidence items (flagged for manual review): file + one-line uncertainty reason. | ||
| <details><summary>Dismissed findings</summary>Table: # | Claim | Skeptic Position | Reason</details> | ||
| If a human-readable report is requested, generate it from the final JSON array. | ||
| The JSON artifact remains canonical. |
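Rendering the human-readable report from the JSON artifacts keeps the Markdown a pure view. This minimal sketch builds a confirmed-bugs table by joining `referee.json` verdicts with `findings.json` entries on `bugId`. The choice of columns is an assumption.

```javascript
// Builds a Markdown table of REAL_BUG verdicts, joined to findings by bugId.
function renderConfirmedTable(verdicts, findings) {
  const byId = new Map(findings.map((f) => [f.bugId, f]));
  // Escape pipes so claim or fix text cannot break the table layout.
  const cell = (value) => String(value ?? "").replace(/\|/g, "\\|");
  const rows = verdicts
    .filter((v) => v.verdict === "REAL_BUG")
    .map((v) => {
      const f = byId.get(v.bugId) ?? {};
      const cells = [v.bugId, v.trueSeverity, f.file, f.lines, f.claim, v.suggestedFix, v.verificationMode];
      return `| ${cells.map(cell).join(" | ")} |`;
    });
  return [
    "| # | Severity | File | Lines | Description | Fix | Verification |",
    "|---|---|---|---|---|---|---|",
    ...rows,
  ].join("\n");
}
```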
+24
-21
@@ -9,3 +9,6 @@ You are an adversarial code reviewer. Your job is to rigorously challenge each reported bug and determine if it's real or a false positive. You are the immune system — kill false positives before they waste a human's time. | ||
| Write your Skeptic challenge report to the file path in your assignment (typically `.bug-hunter/skeptic.md`). The Referee reads both Hunter findings and your challenges. | ||
| Write your canonical Skeptic artifact as JSON to the file path in your | ||
| assignment (typically `.bug-hunter/skeptic.json`). The Referee reads the JSON | ||
| artifact, not a free-form Markdown note. If the assignment also asks for a | ||
| Markdown companion, that Markdown must be derived from the JSON output. | ||
@@ -95,25 +98,25 @@ ## Scope Rules | ||
| For each bug: | ||
| Write a JSON array. Each item must match this contract: | ||
| --- | ||
| **BUG-[number]** | Original: [points] pts | ||
| - **Code reviewed:** [List the files and line ranges you actually read to evaluate this — must include all cross-referenced files] | ||
| - **Runtime trigger test:** [Did you trace the Hunter's exact scenario? What actually happens at each step?] | ||
| - **Counter-argument:** [Your specific technical argument, citing code] | ||
| - **Evidence:** [Quote the actual code or behavior that supports your position] | ||
| - **Confidence:** [0-100]% | ||
| - **Risk calc:** EV = ([confidence]% x [points]) - ([100-confidence]% x [2 x points]) = [value] | ||
| - **Decision:** DISPROVE / ACCEPT | ||
| --- | ||
| ```json | ||
| [ | ||
| { | ||
| "bugId": "BUG-1", | ||
| "response": "DISPROVE", | ||
| "analysisSummary": "The route is wrapped by auth middleware before this handler runs, so the claimed bypass is not reachable.", | ||
| "counterEvidence": "src/routes/api.ts:10-21 attaches requireAuth before the handler." | ||
| } | ||
| ] | ||
| ``` | ||
| After all bugs, output: | ||
| Rules: | ||
| - Use `response: "ACCEPT"` when the finding stands as a real bug. | ||
| - Use `response: "DISPROVE"` only when your challenge is strong enough to | ||
| survive Referee review. | ||
| - Use `response: "MANUAL_REVIEW"` when you cannot safely disprove or accept the | ||
| finding. | ||
| - Return `[]` when there were no findings to challenge. | ||
| - Keep all reasoning inside `analysisSummary` and optional `counterEvidence`. | ||
| - Do not append summary prose outside the JSON array. | ||
| **SUMMARY:** | ||
| - Bugs disproved: [count] (total points claimed: [sum]) | ||
| - Bugs accepted as real: [count] | ||
| - Files read during review: [list of files you actually read] | ||
| **ACCEPTED BUG LIST:** | ||
| [List only the BUG-IDs that you ACCEPTED, with their original severity, file path, and primary file cluster] | ||
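With the free-form summary gone, the disproved and accepted lists can be derived directly from `skeptic.json`. This sketch assumes that `analysisSummary` is required and that `counterEvidence` is an optional string.

```javascript
const RESPONSES = new Set(["ACCEPT", "DISPROVE", "MANUAL_REVIEW"]);

// Validates skeptic.json items and groups bug IDs by response.
function summarizeSkeptic(items) {
  const errors = [];
  const summary = { ACCEPT: [], DISPROVE: [], MANUAL_REVIEW: [] };
  for (const item of items) {
    if (!RESPONSES.has(item.response)) {
      errors.push(`${item.bugId}: response must be ACCEPT, DISPROVE, or MANUAL_REVIEW`);
      continue; // do not count items with an invalid response
    }
    if (typeof item.analysisSummary !== "string" || item.analysisSummary.trim() === "") {
      errors.push(`${item.bugId}: analysisSummary is required`);
    }
    if (item.counterEvidence !== undefined && typeof item.counterEvidence !== "string") {
      errors.push(`${item.bugId}: counterEvidence must be a string when present`);
    }
    summary[item.response].push(item.bugId);
  }
  return { errors, summary };
}
```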
| ## Doc Lookup Tool | ||
@@ -120,0 +123,0 @@ |
+150
-15
| <p align="center"> | ||
| <img src="docs/images/hero.png" alt="Bug Hunter — AI-powered adversarial code security scanner with multi-agent pipeline for automated vulnerability detection, false-positive elimination, and safe auto-fix" width="720"> | ||
| <img src="docs/images/2026-03-12-hero-bug-hunter-overview.png" alt="Bug Hunter product overview banner — code and pull requests flow through adversarial review, strategic fix planning, and verified patch delivery" width="720"> | ||
| </p> | ||
@@ -9,2 +9,4 @@ | ||
| <a href="#install">Install</a> · | ||
| <a href="#new-in-this-update">New in This Update</a> · | ||
| <a href="#start-here">Start Here</a> · | ||
| <a href="#usage">Usage</a> · | ||
@@ -51,2 +53,35 @@ <a href="#how-the-adversarial-pipeline-works">How It Works</a> · | ||
| ## New in This Update | ||
| This release makes Bug Hunter much better at PR-first auditing and safer at automated remediation. | ||
| - **PR review is now a first-class workflow.** Review the current PR, the most recent PR, or a specific PR number with `--pr`, `--pr current`, `--pr recent`, or `--pr 123`. | ||
| - **PR security review is now built in.** `--pr-security` runs a PR-scoped security audit with threat-model and dependency context, without editing code. | ||
| - **Strategic remediation is now explicit.** Bug Hunter writes `fix-strategy.json` and `fix-plan.json` before fixes run, so auto-fix decisions stay explainable and reviewable. | ||
| - **The security pack is now bundled locally.** `commit-security-scan`, `security-review`, `threat-model-generation`, and `vulnerability-validation` now ship inside the repo under `skills/`. | ||
| - **Fix execution is harder to break.** This update adds schema-validated fix plans, atomic lock handling, safer worktree cleanup, stash preservation, and shell-safe worker command templating. | ||
| <p align="center"> | ||
| <img src="docs/images/2026-03-12-pr-review-flow.png" alt="PR review workflow banner — pull request scope, security checks, threat-model context, and final verdict in a clean product-style UI" width="100%"> | ||
| </p> | ||
| ## Start Here | ||
| If you're evaluating the new PR flow, start with one of these: | ||
| ```bash | ||
| /bug-hunter --pr # review the current PR end to end | ||
| /bug-hunter --pr-security # PR-focused security review without editing code | ||
| /bug-hunter --last-pr --review # review the most recent PR without fixes | ||
| /bug-hunter --plan src/ # build fix-strategy.json + fix-plan.json only | ||
| ``` | ||
| If you just want the default repo audit: | ||
| ```bash | ||
| /bug-hunter | ||
| ``` | ||
| --- | ||
| ## Usage | ||
@@ -59,5 +94,18 @@ | ||
| /bug-hunter --scan-only src/ # report only — no code changes | ||
| /bug-hunter --review src/ # easy alias for --scan-only | ||
| /bug-hunter --fix --approve src/ # ask before each fix | ||
| /bug-hunter --safe src/ # easy alias for --fix --approve | ||
| /bug-hunter -b feature-xyz # scan only files changed in branch (vs main) | ||
| /bug-hunter --pr # easy alias for --pr current | ||
| /bug-hunter --pr current # review the current PR end to end | ||
| /bug-hunter --pr recent # review the most recently updated open PR | ||
| /bug-hunter --pr 123 # review a specific PR number | ||
| /bug-hunter --pr-security # PR security review with threat model + CVE context | ||
| /bug-hunter --review-pr # easy alias for --pr current | ||
| /bug-hunter --last-pr --review # review the most recent PR without editing | ||
| /bug-hunter --staged # scan staged files (pre-commit hook) | ||
| /bug-hunter --plan src/ # easy alias for --plan-only | ||
| /bug-hunter --preview src/ # easy alias for --fix --dry-run | ||
| /bug-hunter --security-review src/ # enterprise security workflow for a path or repo | ||
| /bug-hunter --validate-security src/ # force exploitability validation for security findings | ||
| /bug-hunter --deps --threat-model # full audit: CVEs + STRIDE threat model | ||
@@ -78,2 +126,4 @@ ``` | ||
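The easy aliases listed above amount to a token rewrite. The sketch below is illustrative only and is not Bug Hunter's actual argument parser. It encodes only the mappings documented in this README.

```javascript
// Documented alias mappings; a Map avoids accidental prototype-key lookups.
const ALIASES = new Map([
  ["--review", ["--scan-only"]],
  ["--safe", ["--fix", "--approve"]],
  ["--plan", ["--plan-only"]],
  ["--preview", ["--fix", "--dry-run"]],
  ["--review-pr", ["--pr", "current"]],
  ["--last-pr", ["--pr", "recent"]],
]);

function expandAliases(args) {
  const out = [];
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === "--pr") {
      const next = args[i + 1];
      // A bare `--pr` means `--pr current`; otherwise keep current, recent, or a PR number.
      if (next === "current" || next === "recent" || /^\d+$/.test(next ?? "")) {
        out.push("--pr", next);
        i++;
      } else {
        out.push("--pr", "current");
      }
    } else {
      out.push(...(ALIASES.get(arg) ?? [arg]));
    }
  }
  return out;
}
```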
| - [New in This Update](#new-in-this-update) | ||
| - [Start Here](#start-here) | ||
| - [How the Adversarial Pipeline Works](#how-the-adversarial-pipeline-works) | ||
@@ -169,2 +219,24 @@ - [Features](#features) | ||
| ### Bundled Local Security Skills | ||
| Bug Hunter now ships with a portable local security pack under `skills/`: | ||
| - `commit-security-scan` | ||
| - `security-review` | ||
| - `threat-model-generation` | ||
| - `vulnerability-validation` | ||
| These are bundled inside the repository so the system does not depend on external marketplace paths or machine-specific skill installs. They are adapted to Bug Hunter-native artifacts like `.bug-hunter/threat-model.md`, `.bug-hunter/security-config.json`, `.bug-hunter/findings.json`, and `.bug-hunter/referee.json`. | ||
| They are now wired into the main Bug Hunter flow: | ||
| - PR-focused security review routes into `commit-security-scan` | ||
| - `--threat-model` routes into `threat-model-generation` | ||
| - enterprise/full security review routes into `security-review` | ||
| - exploitability confirmation for security findings routes into `vulnerability-validation` | ||
| Bug Hunter remains the top-level orchestrator; the bundled skills are capability modules inside that orchestration. | ||
| <p align="center"> | ||
| <img src="docs/images/2026-03-12-security-pack.png" alt="Bundled local security pack banner — Bug Hunter orchestrates commit security scan, security review, threat-model generation, and vulnerability validation" width="100%"> | ||
| </p> | ||
| ### Zero-Token Triage — Instant File Classification | ||
@@ -288,3 +360,3 @@ | ||
| Loop mode is **on by default** — the pipeline runs iteratively until every critical and high-risk file has been audited, with persistent state enabling stop-and-resume workflows. Use `--no-loop` for a single-pass scan. | ||
| Loop mode is **on by default**: the pipeline runs iteratively until every queued scannable source file has been audited and, in fix mode, every discovered fixable bug has been processed. It works down through CRITICAL → HIGH → MEDIUM → LOW automatically unless the user interrupts. Use `--no-loop` for a single-pass scan. | ||
@@ -394,2 +466,6 @@ --- | ||
| <p align="center"> | ||
| <img src="docs/images/2026-03-12-fix-plan-rollout.png" alt="Strategic fix planning banner — strategy, confidence gating, canary rollout, verification, and rollback safety" width="100%"> | ||
| </p> | ||
| ### Phase 1 — Safety Setup and Git Branching | ||
@@ -410,4 +486,16 @@ | ||
| ### Phase 3 — Confidence-Gated Fix Queue | ||
| ### Phase 3 — Strategy Before Patching | ||
| Before the Fixer edits anything, Bug Hunter now writes a canonical `fix-strategy.json` artifact. | ||
| It clusters confirmed bugs and classifies them into one of four tracks: | ||
| - **safe-autofix** — localized enough for guarded patching | ||
| - **manual-review** — confidence too low for unattended edits | ||
| - **larger-refactor** — needs coordinated multi-file changes | ||
| - **architectural-remediation** — broad contract or design issue; report, don’t auto-edit | ||
| This makes the remediation plan visible before execution. Users who want review without mutation can run `--plan-only` to stop after strategy + plan generation. | ||
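The four tracks above determine which clusters can reach unattended patching. This sketch assumes a hypothetical cluster shape `{ clusterId, strategy, bugIds }`. It also assumes that only `safe-autofix` clusters go to the Fixer, while the other tracks are reported without edits.

```javascript
const TRACKS = ["safe-autofix", "manual-review", "larger-refactor", "architectural-remediation"];

// Partitions fix-strategy clusters; unknown strategies fail loudly instead of
// being silently treated as patchable.
function splitByStrategy(clusters) {
  const result = { autofix: [], reportOnly: [] };
  for (const cluster of clusters) {
    if (!TRACKS.includes(cluster.strategy)) {
      throw new Error(`${cluster.clusterId}: unknown strategy "${cluster.strategy}"`);
    }
    (cluster.strategy === "safe-autofix" ? result.autofix : result.reportOnly).push(cluster);
  }
  return result;
}
```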
| ### Phase 4 — Confidence-Gated Fix Queue | ||
| - **75% confidence gate**: only bugs the Referee confirmed with ≥75% confidence are auto-fixed | ||
@@ -418,3 +506,3 @@ - Bugs below the threshold are marked `MANUAL_REVIEW` — reported but never auto-edited | ||
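The 75% gate can be expressed as a simple partition of Referee verdicts. This sketch assumes that the gate reads `confidenceScore` from `referee.json`. It also assumes that Referee `MANUAL_REVIEW` verdicts join the manual-review list.

```javascript
const AUTOFIX_THRESHOLD = 75; // the documented 75% confidence gate

// Splits verdicts into the auto-fix queue and MANUAL_REVIEW items.
function gateFixQueue(verdicts, threshold = AUTOFIX_THRESHOLD) {
  const queue = [];
  const manualReview = [];
  for (const v of verdicts) {
    if (v.verdict === "NOT_A_BUG") continue; // dismissed findings are never fixed
    const eligible = v.verdict === "REAL_BUG" && v.confidenceScore >= threshold;
    (eligible ? queue : manualReview).push(v.bugId);
  }
  return { queue, manualReview };
}
```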
| ### Phase 4 — Canary Rollout Strategy | ||
| ### Phase 5 — Canary Rollout Strategy | ||
@@ -482,2 +570,6 @@ ``` | ||
| <p align="center"> | ||
| <img src="docs/images/2026-03-12-machine-readable-artifacts.png" alt="Machine-readable artifacts banner — findings, skeptic, referee, fix strategy, fix plan, and CI automation outputs" width="100%"> | ||
| </p> | ||
| Every run produces machine-readable output at `.bug-hunter/findings.json` for pipeline automation: | ||
@@ -536,8 +628,16 @@ | ||
| | `findings.json` | Always | Machine-readable JSON for CI/CD and dashboards | | ||
| | `skeptic.json` | When findings exist | Canonical Skeptic challenge artifact | | ||
| | `referee.json` | When findings exist | Canonical Referee verdict artifact | | ||
| | `coverage.json` | Loop/autonomous runs | Canonical coverage and loop state | | ||
| | `triage.json` | Always | File classification, risk map, strategy selection, token estimates | | ||
| | `recon.md` | Always | Tech stack analysis, attack surface mapping, scan order | | ||
| | `findings.md` | Always | Raw Hunter findings before Skeptic review | | ||
| | `skeptic.md` | Always | Skeptic challenge decisions with evidence | | ||
| | `referee.md` | Always | Referee final verdicts with enrichment | | ||
| | `fix-report.md` | Fix mode | Per-bug fix status, verification results, git diff summary | | ||
| | `findings.md` | Optional | Markdown companion rendered from `findings.json` | | ||
| | `skeptic.md` | Optional | Markdown companion rendered from `skeptic.json` | | ||
| | `referee.md` | Optional | Markdown companion rendered from `referee.json` | | ||
| | `coverage.md` | Loop/autonomous runs | Markdown companion rendered from `coverage.json` | | ||
| | `fix-strategy.json` | When findings exist | Canonical remediation strategy: safe autofix vs manual review vs refactor vs architectural work | | ||
| | `fix-strategy.md` | When findings exist | Markdown companion rendered from `fix-strategy.json` | | ||
| | `fix-plan.json` | Plan/fix mode | Canonical execution plan for canary rollout, gating, and safe fix order | | ||
| | `fix-plan.md` | Plan/fix mode | Markdown companion rendered from `fix-plan.json` | | ||
| | `fix-report.md` | Fix mode | Markdown companion for fix results | | ||
| | `fix-report.json` | Fix mode | Machine-readable fix results for CI/CD gating and dashboards | | ||
@@ -569,12 +669,26 @@ | `worktree-*/` | Worktree fix mode | Temporary isolated worktrees for Fixer subagents (auto-cleaned) | | ||
| | `-b branch --base dev` | Scan branch diff against specific base | | ||
| | `--pr` | Easy alias for `--pr current` | | ||
| | `--pr current` | Review the current PR using GitHub metadata when available, with git fallback on the current branch | | ||
| | `--pr recent` | Review the most recently updated open PR | | ||
| | `--pr 123` | Review a specific PR number | | ||
| | `--pr-security` | Enterprise PR security review: PR scope + threat model + dependency context | | ||
| | `--last-pr` | Easy alias for `--pr recent` | | ||
| | `--review-pr` | Alias for `--pr current` | | ||
| | `--staged` | Scan git-staged files (pre-commit hook integration) | | ||
| | `--scan-only` | Report only — no code changes | | ||
| | `--review` | Easy alias for `--scan-only` | | ||
| | `--fix` | Find and auto-fix bugs (default behavior) | | ||
| | `--plan-only` | Build `fix-strategy.json` + fix plan, then stop before the fixer edits code | | ||
| | `--plan` | Easy alias for `--plan-only` | | ||
| | `--approve` | Interactive mode — ask before each fix | | ||
| | `--safe` | Easy alias for `--fix --approve` | | ||
| | `--autonomous` | Full auto-fix with zero intervention | | ||
| | `--loop` | Iterative mode — runs until 100% critical file coverage **(on by default)** | | ||
| | `--dry-run` | Preview planned fixes without editing files — outputs diff previews and `fix-report.json` | | ||
| | `--preview` | Easy alias for `--fix --dry-run` | | ||
| | `--loop` | Iterative mode — runs until 100% queued source-file coverage **(on by default)** | | ||
| | `--no-loop` | Disable loop mode — single-pass scan only | | ||
| | `--deps` | Include dependency CVE scanning with reachability analysis | | ||
| | `--threat-model` | Generate or use STRIDE threat model for targeted security analysis | | ||
| | `--dry-run` | Preview planned fixes without editing files — outputs diff previews and `fix-report.json` | | ||
| | `--security-review` | Run the bundled enterprise security-review workflow with threat model + CVE + validation context | | ||
| | `--validate-security` | Force vulnerability-validation for confirmed security findings | | ||
@@ -589,2 +703,4 @@ All flags compose: `/bug-hunter --deps --threat-model --fix src/` | ||
| The repository also ships with **60 Node.js regression tests** covering orchestration, schemas, PR scope resolution, fix-plan validation, lock behavior, worktree lifecycle, and the bundled local security-skill routing. | ||
| ```bash | ||
@@ -610,2 +726,4 @@ /bug-hunter test-fixture/ | ||
| ├── CHANGELOG.md # Version history | ||
| ├── llms.txt # Short LLM-facing summary | ||
| ├── llms-full.txt # Full LLM-facing reference | ||
| ├── package.json # npm package config (@codexstar/bug-hunter) | ||
@@ -618,7 +736,11 @@ │ | ||
| │ └── images/ # Documentation visuals | ||
| │ ├── hero.png # Hero banner | ||
| │ ├── pipeline-overview.png # 8-stage pipeline diagram | ||
| │ ├── adversarial-debate.png # Hunter vs Skeptic vs Referee flow | ||
| │ ├── doc-verify-fix-plan.png # Documentation verification + fix planning | ||
| │ └── security-finding-card.png # Enriched finding card with CVSS | ||
| │ ├── 2026-03-12-hero-bug-hunter-overview.png # Product overview hero | ||
| │ ├── 2026-03-12-pr-review-flow.png # PR review + security workflow | ||
| │ ├── 2026-03-12-security-pack.png # Bundled local security pack | ||
| │ ├── 2026-03-12-fix-plan-rollout.png # Strategic fix planning + rollout | ||
| │ ├── 2026-03-12-machine-readable-artifacts.png # CI/CD artifact outputs | ||
| │ ├── pipeline-overview.png # 8-stage pipeline diagram | ||
| │ ├── adversarial-debate.png # Hunter vs Skeptic vs Referee flow | ||
| │ ├── doc-verify-fix-plan.png # Documentation verification + fix planning | ||
| │ └── security-finding-card.png # Enriched finding card with CVSS | ||
| │ | ||
@@ -650,2 +772,15 @@ ├── modes/ # Execution strategies by codebase size | ||
| │ | ||
| ├── schemas/ # Canonical JSON artifact contracts | ||
| │ ├── findings.schema.json # Hunter findings schema | ||
| │ ├── skeptic.schema.json # Skeptic artifact schema | ||
| │ ├── referee.schema.json # Referee artifact schema | ||
| │ ├── fix-strategy.schema.json # Strategic remediation schema | ||
| │ └── fix-plan.schema.json # Fix execution schema | ||
| │ | ||
| ├── skills/ # Bundled local security pack | ||
| │ ├── commit-security-scan/ | ||
| │ ├── security-review/ | ||
| │ ├── threat-model-generation/ | ||
| │ └── vulnerability-validation/ | ||
| │ | ||
| ├── scripts/ # Node.js helpers (zero AI tokens) | ||
@@ -652,0 +787,0 @@ │ ├── triage.cjs # File classification (<2s) |
@@ -6,2 +6,3 @@ #!/usr/bin/env node | ||
| const path = require('path'); | ||
| const { validateArtifactValue } = require('./schema-runtime.cjs'); | ||
@@ -167,3 +168,3 @@ const VALID_CHUNK_STATUS = new Set(['pending', 'in_progress', 'done', 'failed']); | ||
| function toConfidence(value) { | ||
| function toConfidenceScore(value) { | ||
| if (value === null || value === undefined || value === '') { | ||
@@ -275,3 +276,9 @@ return null; | ||
| const findings = readJson(findingsJsonPath); | ||
| assertArray(findings, 'findingsJson'); | ||
| const validation = validateArtifactValue({ | ||
| artifactName: 'findings', | ||
| value: findings | ||
| }); | ||
| if (!validation.ok) { | ||
| throw new Error(`Invalid findings artifact: ${validation.errors.join('; ')}`); | ||
| } | ||
@@ -282,9 +289,10 @@ let inserted = 0; | ||
| const file = String(finding.file || '').trim(); | ||
| if (!file) { | ||
| continue; | ||
| } | ||
| const lines = String(finding.lines || '').trim(); | ||
| const claim = String(finding.claim || '').trim(); | ||
| const severity = String(finding.severity || 'Low'); | ||
| const confidence = toConfidence(finding.confidence); | ||
| const category = String(finding.category || '').trim(); | ||
| const evidence = String(finding.evidence || '').trim(); | ||
| const runtimeTrigger = String(finding.runtimeTrigger || '').trim(); | ||
| const crossReferences = Array.isArray(finding.crossReferences) ? finding.crossReferences : []; | ||
| const confidenceScore = toConfidenceScore(finding.confidenceScore); | ||
| const bugId = String(finding.bugId || '').trim(); | ||
@@ -300,4 +308,8 @@ const key = `${file}|${lines}|${claim}`; | ||
| lines, | ||
| category, | ||
| claim, | ||
| confidence, | ||
| evidence, | ||
| runtimeTrigger, | ||
| crossReferences, | ||
| confidenceScore, | ||
| status: 'open', | ||
@@ -318,6 +330,10 @@ source, | ||
| } | ||
| if (existing.confidence === null && confidence !== null) { | ||
| existing.confidence = confidence; | ||
| } else if (existing.confidence !== null && confidence !== null) { | ||
| existing.confidence = Math.max(existing.confidence, confidence); | ||
| existing.category = category || existing.category; | ||
| existing.evidence = evidence || existing.evidence; | ||
| existing.runtimeTrigger = runtimeTrigger || existing.runtimeTrigger; | ||
| existing.crossReferences = crossReferences.length > 0 ? crossReferences : existing.crossReferences; | ||
| if (existing.confidenceScore === null && confidenceScore !== null) { | ||
| existing.confidenceScore = confidenceScore; | ||
| } else if (existing.confidenceScore !== null && confidenceScore !== null) { | ||
| existing.confidenceScore = Math.max(existing.confidenceScore, confidenceScore); | ||
| } | ||
@@ -332,3 +348,3 @@ existing.updatedAt = nowIso(); | ||
| state.metrics.lowConfidenceFindings = state.bugLedger.filter((entry) => { | ||
| return entry.confidence === null || entry.confidence < 75; | ||
| return entry.confidenceScore === null || entry.confidenceScore < 75; | ||
| }).length; | ||
@@ -505,2 +521,9 @@ saveState(statePath, state); | ||
| const fixPlan = readJson(fixPlanJsonPath); | ||
| const validation = validateArtifactValue({ | ||
| artifactName: 'fix-plan', | ||
| value: fixPlan | ||
| }); | ||
| if (!validation.ok) { | ||
| throw new Error(`Invalid fix-plan artifact: ${validation.errors.join('; ')}`); | ||
| } | ||
| state.fixPlan = fixPlan; | ||
@@ -507,0 +530,0 @@ saveState(statePath, state); |
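The ledger merge rules in the hunks above can be summarized in a small sketch. Findings are keyed by `file|lines|claim`. A repeat sighting overwrites descriptive fields only when the new value is non-empty, and it keeps the higher `confidenceScore`. `mergeFinding` and the `Map`-based ledger are hypothetical simplifications: only `evidence` and `crossReferences` are merged here, and only numeric scores are accepted (the real `toConfidenceScore` also normalizes empty values).

```javascript
// Hypothetical, simplified version of the bug-ledger merge shown in the diff.
// Returns true when a new ledger entry was inserted.
function mergeFinding(ledger, finding) {
  const file = String(finding.file || '').trim();
  if (!file) {
    return false; // findings without a file are skipped, as in the diff
  }
  const lines = String(finding.lines || '').trim();
  const claim = String(finding.claim || '').trim();
  const key = `${file}|${lines}|${claim}`;
  const score = Number.isFinite(finding.confidenceScore) ? finding.confidenceScore : null;
  const evidence = String(finding.evidence || '').trim();
  const crossReferences = Array.isArray(finding.crossReferences) ? finding.crossReferences : [];

  const existing = ledger.get(key);
  if (!existing) {
    ledger.set(key, { key, file, lines, claim, evidence, crossReferences, confidenceScore: score, status: 'open' });
    return true;
  }
  // Prefer new non-empty values, otherwise keep what was already recorded.
  existing.evidence = evidence || existing.evidence;
  existing.crossReferences = crossReferences.length > 0 ? crossReferences : existing.crossReferences;
  if (score !== null) {
    existing.confidenceScore = existing.confidenceScore === null
      ? score
      : Math.max(existing.confidenceScore, score);
  }
  return false;
}

// Usage: two sightings of the same bug collapse into one ledger entry.
const ledger = new Map();
mergeFinding(ledger, { file: 'src/a.ts', lines: '10-12', claim: 'null deref', confidenceScore: 70 });
mergeFinding(ledger, { file: 'src/a.ts', lines: '10-12', claim: 'null deref', confidenceScore: 88, evidence: 'user.name' });
const entry = ledger.get('src/a.ts|10-12|null deref');
console.log(ledger.size, entry.confidenceScore, entry.evidence); // 1 88 user.name
```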
@@ -456,7 +456,14 @@ #!/usr/bin/env node | ||
| .map((filePath) => path.resolve(filePath)); | ||
| const tempSeedPath = path.join(path.dirname(path.resolve(bugsJsonPath)), '.seed-files.tmp.json'); | ||
| const tempSeedPath = path.join( | ||
| path.dirname(path.resolve(bugsJsonPath)), | ||
| `.seed-files.${process.pid}.${Date.now()}.${crypto.randomUUID()}.tmp.json` | ||
| ); | ||
| writeJson(tempSeedPath, seedFiles); | ||
| const result = query(indexPath, tempSeedPath, hopsRaw); | ||
| fs.unlinkSync(tempSeedPath); | ||
| return result; | ||
| try { | ||
| return query(indexPath, tempSeedPath, hopsRaw); | ||
| } finally { | ||
| if (fs.existsSync(tempSeedPath)) { | ||
| fs.unlinkSync(tempSeedPath); | ||
| } | ||
| } | ||
| } | ||
@@ -463,0 +470,0 @@ |
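The temp-file change above follows a common pattern: give each invocation its own file name (pid, timestamp, and a random UUID) so concurrent runs cannot overwrite each other, and remove the file in `finally` so a throwing query no longer leaks it. A minimal sketch, where `withTempJson` and its callback are hypothetical stand-ins for the `query(indexPath, tempSeedPath, hopsRaw)` call:

```javascript
const crypto = require('crypto');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write `value` to a uniquely named temp JSON file,
// hand its path to `callback`, and always delete the file afterwards.
function withTempJson(dir, value, callback) {
  const tempPath = path.join(
    dir,
    `.seed-files.${process.pid}.${Date.now()}.${crypto.randomUUID()}.tmp.json`
  );
  fs.writeFileSync(tempPath, `${JSON.stringify(value, null, 2)}\n`, 'utf8');
  try {
    return callback(tempPath);
  } finally {
    // Runs on both return and throw, unlike a trailing unlinkSync.
    if (fs.existsSync(tempPath)) {
      fs.unlinkSync(tempPath);
    }
  }
}

// Usage: the callback reads the seed list back; the file is gone afterwards.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bh-seed-'));
let seenPath = null;
const count = withTempJson(dir, ['/repo/src/a.ts', '/repo/src/b.ts'], (tempPath) => {
  seenPath = tempPath;
  return JSON.parse(fs.readFileSync(tempPath, 'utf8')).length;
});
console.log(count, fs.existsSync(seenPath)); // 2 false
```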
| #!/usr/bin/env node | ||
| const crypto = require('crypto'); | ||
| const fs = require('fs'); | ||
@@ -10,5 +11,6 @@ const os = require('os'); | ||
| console.error(' fix-lock.cjs acquire <lockPath> [ttlSeconds]'); | ||
| console.error(' fix-lock.cjs renew <lockPath>'); | ||
| console.error(' fix-lock.cjs release <lockPath>'); | ||
| console.error(' fix-lock.cjs renew <lockPath> <ownerToken>'); | ||
| console.error(' fix-lock.cjs release <lockPath> <ownerToken>'); | ||
| console.error(' fix-lock.cjs status <lockPath> [ttlSeconds]'); | ||
| console.error(' Note: acquire returns lock.ownerToken; pass it to renew/release.'); | ||
| } | ||
@@ -28,3 +30,7 @@ | ||
| } | ||
| return JSON.parse(fs.readFileSync(lockPath, 'utf8')); | ||
| try { | ||
| return JSON.parse(fs.readFileSync(lockPath, 'utf8')); | ||
| } catch { | ||
| return null; | ||
| } | ||
| } | ||
@@ -52,3 +58,3 @@ | ||
| function writeLock(lockPath) { | ||
| function writeLock(lockPath, ownerTokenRaw, exclusive = true) { | ||
| ensureParent(lockPath); | ||
@@ -59,10 +65,23 @@ const lockData = { | ||
| cwd: process.cwd(), | ||
| ownerToken: ownerTokenRaw || crypto.randomUUID(), | ||
| createdAtMs: nowMs(), | ||
| createdAt: new Date().toISOString() | ||
| }; | ||
| fs.writeFileSync(lockPath, `${JSON.stringify(lockData, null, 2)}\n`, 'utf8'); | ||
| const fd = fs.openSync(lockPath, exclusive ? 'wx' : 'w'); | ||
| try { | ||
| fs.writeFileSync(fd, `${JSON.stringify(lockData, null, 2)}\n`, 'utf8'); | ||
| } finally { | ||
| fs.closeSync(fd); | ||
| } | ||
| return lockData; | ||
| } | ||
| function renew(lockPath) { | ||
| function assertOwner(existing, ownerToken) { | ||
| if (!existing || !existing.ownerToken) { | ||
| return true; | ||
| } | ||
| return ownerToken === existing.ownerToken; | ||
| } | ||
| function renew(lockPath, ownerToken) { | ||
| const existing = readLock(lockPath); | ||
@@ -74,5 +93,12 @@ if (!existing) { | ||
| } | ||
| if (!assertOwner(existing, ownerToken)) { | ||
| console.log(JSON.stringify({ ok: false, renewed: false, reason: 'lock-owner-mismatch' }, null, 2)); | ||
| process.exit(1); | ||
| return; | ||
| } | ||
| existing.createdAtMs = nowMs(); | ||
| existing.renewedAt = new Date().toISOString(); | ||
| fs.writeFileSync(lockPath, `${JSON.stringify(existing, null, 2)}\n`, 'utf8'); | ||
| const tempPath = `${lockPath}.${process.pid}.tmp`; | ||
| fs.writeFileSync(tempPath, `${JSON.stringify(existing, null, 2)}\n`, 'utf8'); | ||
| fs.renameSync(tempPath, lockPath); | ||
| console.log(JSON.stringify({ ok: true, renewed: true, lock: existing }, null, 2)); | ||
@@ -84,12 +110,35 @@ } | ||
| if (!existing) { | ||
| const lockData = writeLock(lockPath); | ||
| console.log(JSON.stringify({ ok: true, acquired: true, lock: lockData }, null, 2)); | ||
| return; | ||
| if (fs.existsSync(lockPath)) { | ||
| fs.unlinkSync(lockPath); | ||
| } | ||
| try { | ||
| const lockData = writeLock(lockPath); | ||
| console.log(JSON.stringify({ ok: true, acquired: true, lock: lockData }, null, 2)); | ||
| return; | ||
| } catch (error) { | ||
| if (error && error.code === 'EEXIST') { | ||
| const current = readLock(lockPath); | ||
| console.log(JSON.stringify({ | ||
| ok: false, | ||
| acquired: false, | ||
| reason: 'lock-held', | ||
| lock: current | ||
| }, null, 2)); | ||
| process.exit(1); | ||
| return; | ||
| } | ||
| throw error; | ||
| } | ||
| } | ||
| if (!lockIsStale(existing, ttlSeconds)) { | ||
| const stale = lockIsStale(existing, ttlSeconds); | ||
| const ownerAlive = typeof existing.pid === 'number' ? pidAlive(existing.pid) : false; | ||
| if (!stale || ownerAlive) { | ||
| console.log(JSON.stringify({ | ||
| ok: false, | ||
| acquired: false, | ||
| reason: 'lock-held', | ||
| reason: ownerAlive ? 'lock-held-by-live-owner' : 'lock-held', | ||
| stale, | ||
| ownerAlive, | ||
| lock: existing | ||
@@ -101,13 +150,29 @@ }, null, 2)); | ||
| fs.unlinkSync(lockPath); | ||
| const lockData = writeLock(lockPath); | ||
| console.log(JSON.stringify({ | ||
| ok: true, | ||
| acquired: true, | ||
| recoveredFromStaleLock: true, | ||
| previousLock: existing, | ||
| lock: lockData | ||
| }, null, 2)); | ||
| try { | ||
| const lockData = writeLock(lockPath); | ||
| console.log(JSON.stringify({ | ||
| ok: true, | ||
| acquired: true, | ||
| recoveredFromStaleLock: true, | ||
| previousLock: existing, | ||
| lock: lockData | ||
| }, null, 2)); | ||
| return; | ||
| } catch (error) { | ||
| if (error && error.code === 'EEXIST') { | ||
| const current = readLock(lockPath); | ||
| console.log(JSON.stringify({ | ||
| ok: false, | ||
| acquired: false, | ||
| reason: 'lock-held', | ||
| lock: current | ||
| }, null, 2)); | ||
| process.exit(1); | ||
| return; | ||
| } | ||
| throw error; | ||
| } | ||
| } | ||
| function release(lockPath) { | ||
| function release(lockPath, ownerToken) { | ||
| const existing = readLock(lockPath); | ||
@@ -118,2 +183,7 @@ if (!existing) { | ||
| } | ||
| if (!assertOwner(existing, ownerToken)) { | ||
| console.log(JSON.stringify({ ok: false, released: false, reason: 'lock-owner-mismatch' }, null, 2)); | ||
| process.exit(1); | ||
| return; | ||
| } | ||
| fs.unlinkSync(lockPath); | ||
@@ -141,3 +211,3 @@ console.log(JSON.stringify({ ok: true, released: true, previousLock: existing }, null, 2)); | ||
| function main() { | ||
| const [command, lockPath, ttlRaw] = process.argv.slice(2); | ||
| const [command, lockPath, rawArg] = process.argv.slice(2); | ||
| if (!command || !lockPath) { | ||
@@ -147,3 +217,3 @@ usage(); | ||
| } | ||
| const ttlParsed = Number.parseInt(ttlRaw || '', 10); | ||
| const ttlParsed = Number.parseInt(rawArg || '', 10); | ||
| const ttlSeconds = Number.isInteger(ttlParsed) && ttlParsed > 0 ? ttlParsed : 1800; | ||
@@ -156,7 +226,7 @@ | ||
| if (command === 'renew') { | ||
| renew(lockPath); | ||
| renew(lockPath, rawArg); | ||
| return; | ||
| } | ||
| if (command === 'release') { | ||
| release(lockPath); | ||
| release(lockPath, rawArg); | ||
| return; | ||
@@ -163,0 +233,0 @@ } |
@@ -5,2 +5,6 @@ #!/usr/bin/env node | ||
| const path = require('path'); | ||
| const { | ||
| createSchemaRef, | ||
| validateSchemaRef | ||
| } = require('./schema-runtime.cjs'); | ||
@@ -20,3 +24,3 @@ const REQUIRED_BY_ROLE = { | ||
| targetFiles: ['src/example.ts'], | ||
| outputSchema: { format: 'risk-map', version: 1 } | ||
| outputSchema: createSchemaRef('recon') | ||
| }, | ||
@@ -27,3 +31,3 @@ 'triage-hunter': { | ||
| techStack: { framework: '', auth: '', database: '', dependencies: [] }, | ||
| outputSchema: { format: 'triage-findings', version: 1 } | ||
| outputSchema: createSchemaRef('findings') | ||
| }, | ||
@@ -35,3 +39,3 @@ hunter: { | ||
| techStack: { framework: '', auth: '', database: '', dependencies: [] }, | ||
| outputSchema: { format: 'findings', version: 1 } | ||
| outputSchema: createSchemaRef('findings') | ||
| }, | ||
@@ -53,9 +57,20 @@ skeptic: { | ||
| techStack: { framework: '', auth: '', database: '', dependencies: [] }, | ||
| outputSchema: { format: 'challenges', version: 1 } | ||
| outputSchema: createSchemaRef('skeptic') | ||
| }, | ||
| referee: { | ||
| skillDir: '/absolute/path/to/bug-hunter', | ||
| findings: [{ bugId: 'BUG-1', severity: '', file: '', lines: '', claim: '', evidence: '', runtimeTrigger: '' }], | ||
| findings: [{ | ||
| bugId: 'BUG-1', | ||
| severity: 'Critical', | ||
| category: 'security', | ||
| file: 'src/example.ts', | ||
| lines: '10-15', | ||
| claim: 'One-sentence description of the bug', | ||
| evidence: 'Exact code quote from the file', | ||
| runtimeTrigger: 'Specific scenario that triggers this bug', | ||
| crossReferences: ['Single file'], | ||
| confidenceScore: 92 | ||
| }], | ||
| skepticResults: { accepted: ['BUG-1'], disproved: [], details: [] }, | ||
| outputSchema: { format: 'verdicts', version: 1 } | ||
| outputSchema: createSchemaRef('referee') | ||
| }, | ||
@@ -75,3 +90,3 @@ fixer: { | ||
| techStack: { framework: '', auth: '', database: '', dependencies: [] }, | ||
| outputSchema: { format: 'fix-report', version: 1 } | ||
| outputSchema: createSchemaRef('fix-report') | ||
| } | ||
@@ -139,5 +154,4 @@ }; | ||
| if ('outputSchema' in payload) { | ||
| if (!payload.outputSchema || typeof payload.outputSchema !== 'object') { | ||
| errors.push('outputSchema must be an object'); | ||
| } | ||
| const schemaValidation = validateSchemaRef(payload.outputSchema); | ||
| errors.push(...schemaValidation.errors); | ||
| } | ||
@@ -144,0 +158,0 @@ |
@@ -6,2 +6,3 @@ #!/usr/bin/env node | ||
| const path = require('path'); | ||
| const { validateArtifactFile, validateArtifactValue } = require('./schema-runtime.cjs'); | ||
@@ -21,3 +22,4 @@ const BACKEND_PRIORITY = ['spawn_agent', 'subagent', 'teams', 'local-sequential']; | ||
| console.error(' run-bug-hunter.cjs preflight [--skill-dir <path>] [--available-backends <csv>] [--backend <name>]'); | ||
| console.error(' run-bug-hunter.cjs run --files-json <path> [--mode <name>] [--skill-dir <path>] [--state <path>] [--chunk-size <n>] [--worker-cmd <template>] [--timeout-ms <n>] [--max-retries <n>] [--backoff-ms <n>] [--available-backends <csv>] [--backend <name>] [--fail-fast <true|false>] [--use-index <true|false>] [--index-path <path>] [--delta-mode <true|false>] [--changed-files-json <path>] [--delta-hops <n>] [--expand-on-low-confidence <true|false>] [--confidence-threshold <n>] [--canary-size <n>] [--expansion-cap <n>]'); | ||
| console.error(' run-bug-hunter.cjs run --files-json <path> [--mode <name>] [--skill-dir <path>] [--state <path>] [--chunk-size <n>] [--worker-cmd <template>] [--timeout-ms <n>] [--max-retries <n>] [--backoff-ms <n>] [--available-backends <csv>] [--backend <name>] [--fail-fast <true|false>] [--use-index <true|false>] [--index-path <path>] [--delta-mode <true|false>] [--changed-files-json <path>] [--delta-hops <n>] [--expand-on-low-confidence <true|false>] [--confidence-threshold <n>] [--canary-size <n>] [--expansion-cap <n>] [--strategy-path <path>] [--strategy-markdown-path <path>]'); | ||
| console.error(' run-bug-hunter.cjs phase --artifact <name> --output-path <path> --worker-cmd <template> [--phase-name <name>] [--skill-dir <path>] [--journal-path <path>] [--render-cmd <template>] [--render-output-path <path>] [--timeout-ms <n>] [--render-timeout-ms <n>] [--max-retries <n>] [--backoff-ms <n>]'); | ||
| console.error(' run-bug-hunter.cjs plan --files-json <path> [--mode <name>] [--skill-dir <path>] [--chunk-size <n>] [--plan-path <path>]'); | ||
@@ -119,6 +121,19 @@ } | ||
| path.join(skillDir, 'scripts', 'payload-guard.cjs'), | ||
| path.join(skillDir, 'scripts', 'schema-validate.cjs'), | ||
| path.join(skillDir, 'scripts', 'schema-runtime.cjs'), | ||
| path.join(skillDir, 'scripts', 'render-report.cjs'), | ||
| path.join(skillDir, 'scripts', 'fix-lock.cjs'), | ||
| path.join(skillDir, 'scripts', 'doc-lookup.cjs'), | ||
| path.join(skillDir, 'scripts', 'context7-api.cjs'), | ||
| path.join(skillDir, 'scripts', 'delta-mode.cjs') | ||
| path.join(skillDir, 'scripts', 'delta-mode.cjs'), | ||
| path.join(skillDir, 'scripts', 'pr-scope.cjs'), | ||
| path.join(skillDir, 'schemas', 'findings.schema.json'), | ||
| path.join(skillDir, 'schemas', 'skeptic.schema.json'), | ||
| path.join(skillDir, 'schemas', 'referee.schema.json'), | ||
| path.join(skillDir, 'schemas', 'coverage.schema.json'), | ||
| path.join(skillDir, 'schemas', 'fix-report.schema.json'), | ||
| path.join(skillDir, 'schemas', 'fix-plan.schema.json'), | ||
| path.join(skillDir, 'schemas', 'fix-strategy.schema.json'), | ||
| path.join(skillDir, 'schemas', 'recon.schema.json'), | ||
| path.join(skillDir, 'schemas', 'shared.schema.json') | ||
| ]; | ||
@@ -155,2 +170,14 @@ } | ||
| function runTextScript(scriptPath, args) { | ||
| const result = childProcess.spawnSync('node', [scriptPath, ...args], { | ||
| encoding: 'utf8' | ||
| }); | ||
| if (result.status !== 0) { | ||
| const stderr = (result.stderr || '').trim(); | ||
| const stdout = (result.stdout || '').trim(); | ||
| throw new Error(stderr || stdout || `Script failed: ${scriptPath}`); | ||
| } | ||
| return result.stdout || ''; | ||
| } | ||
| function appendJournal(logPath, event) { | ||
@@ -162,8 +189,16 @@ ensureDir(path.dirname(logPath)); | ||
| function shellQuote(value) { | ||
| const stringValue = String(value); | ||
| if (stringValue.length === 0) { | ||
| return "''"; | ||
| } | ||
| return `'${stringValue.replace(/'/g, `'\\''`)}'`; | ||
| } | ||
| function fillTemplate(template, variables) { | ||
| return template.replace(/\{([a-zA-Z0-9_]+)\}/g, (match, key) => { | ||
| if (!(key in variables)) { | ||
| return match; | ||
| throw new Error(`Unknown template placeholder: ${key}`); | ||
| } | ||
| return String(variables[key]); | ||
| return shellQuote(variables[key]); | ||
| }); | ||
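The hardened `fillTemplate` above changes two behaviors: every substituted value is wrapped in POSIX single quotes (an embedded `'` becomes `'\''`), and an unknown `{placeholder}` now throws instead of being passed through to the shell. The sketch below copies the two functions from the hunk and exercises them. The `worker.cjs` template and the `filesJson` placeholder are hypothetical.

```javascript
// POSIX single-quote a value so the shell treats it as one literal argument.
function shellQuote(value) {
  const stringValue = String(value);
  if (stringValue.length === 0) {
    return "''";
  }
  return `'${stringValue.replace(/'/g, `'\\''`)}'`;
}

// Substitute {name} placeholders with quoted values; unknown names are an error.
function fillTemplate(template, variables) {
  return template.replace(/\{([a-zA-Z0-9_]+)\}/g, (match, key) => {
    if (!(key in variables)) {
      throw new Error(`Unknown template placeholder: ${key}`);
    }
    return shellQuote(variables[key]);
  });
}

// Usage: a path containing a space and an apostrophe stays a single argument.
const cmd = fillTemplate('node worker.cjs --files-json {filesJson}', {
  filesJson: "/tmp/report's copy.json"
});
console.log(cmd); // node worker.cjs --files-json '/tmp/report'\''s copy.json'
```

Because values arrive already quoted, worker templates should use bare `{placeholder}` tokens; wrapping a placeholder in its own quotes would cancel the quoting and let spaces split the argument.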
@@ -221,3 +256,5 @@ } | ||
| phase, | ||
| chunkId | ||
| chunkId, | ||
| beforeAttempt, | ||
| postAttempt | ||
| }) { | ||
@@ -236,4 +273,27 @@ const attempts = maxRetries + 1; | ||
| }); | ||
| if (typeof beforeAttempt === 'function') { | ||
| await beforeAttempt({ attempt }); | ||
| } | ||
| const result = await runCommandOnce({ command, timeoutMs }); | ||
| lastResult = result; | ||
| let finalResult = result; | ||
| if (finalResult.ok && typeof postAttempt === 'function') { | ||
| const postAttemptResult = await postAttempt({ attempt }); | ||
| if (!postAttemptResult.ok) { | ||
| const validationMessage = String(postAttemptResult.errorMessage || 'post-attempt validation failed'); | ||
| appendJournal(journalPath, { | ||
| event: 'attempt-post-check-failed', | ||
| phase, | ||
| chunkId, | ||
| attempt, | ||
| errorMessage: validationMessage.slice(0, 500) | ||
| }); | ||
| finalResult = { | ||
| ...finalResult, | ||
| ok: false, | ||
| stderr: validationMessage | ||
| }; | ||
| } | ||
| } | ||
| appendJournal(journalPath, { | ||
@@ -244,9 +304,11 @@ event: 'attempt-end', | ||
| attempt, | ||
| ok: result.ok, | ||
| code: result.code, | ||
| timeoutHit: result.timeoutHit, | ||
| stderr: result.stderr.slice(0, 500) | ||
| ok: finalResult.ok, | ||
| code: finalResult.code, | ||
| timeoutHit: finalResult.timeoutHit, | ||
| stderr: finalResult.stderr.slice(0, 500) | ||
| }); | ||
| if (result.ok) { | ||
| return { ok: true, result, attemptsUsed: attempt }; | ||
| lastResult = finalResult; | ||
| if (finalResult.ok) { | ||
| return { ok: true, result: finalResult, attemptsUsed: attempt }; | ||
| } | ||
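The retry change above means a worker that exits 0 only counts as successful if the optional `postAttempt` check (for example, validating the artifact it wrote) also passes; otherwise the attempt is demoted to a failure and retried. A reduced sketch of that control flow, assuming `runAttempt` stands in for `runCommandOnce`; journaling and backoff between attempts are omitted, and the final failure shape is an assumption because that part of the function is not shown.

```javascript
// Reduced sketch of retry-with-post-check; `runAttempt` is a hypothetical stand-in.
async function runWithRetry({ runAttempt, postAttempt, maxRetries = 2 }) {
  const attempts = maxRetries + 1;
  let lastResult = null;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    let result = await runAttempt({ attempt });
    if (result.ok && typeof postAttempt === 'function') {
      const check = await postAttempt({ attempt });
      if (!check.ok) {
        // Demote the exit-0 run so malformed output is retried, not accepted.
        result = { ...result, ok: false, stderr: String(check.errorMessage || 'post-attempt validation failed') };
      }
    }
    lastResult = result;
    if (result.ok) {
      return { ok: true, result, attemptsUsed: attempt };
    }
  }
  return { ok: false, result: lastResult, attemptsUsed: attempts };
}

// Usage: the worker always exits 0, but its output only validates on attempt 2.
runWithRetry({
  runAttempt: async () => ({ ok: true, code: 0, stderr: '' }),
  postAttempt: async ({ attempt }) => (attempt >= 2 ? { ok: true } : { ok: false, errorMessage: 'schema mismatch' })
}).then((outcome) => console.log(outcome.ok, outcome.attemptsUsed)); // true 2
```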
@@ -389,4 +451,4 @@ if (attempt < attempts) { | ||
| const lowConfidence = bugLedger.filter((entry) => { | ||
| const confidence = entry.confidence; | ||
| return confidence === null || confidence === undefined || Number(confidence) < confidenceThreshold; | ||
| const confidenceScore = entry.confidenceScore; | ||
| return confidenceScore === null || confidenceScore === undefined || Number(confidenceScore) < confidenceThreshold; | ||
| }).length; | ||
@@ -403,13 +465,52 @@ | ||
| function buildFixPlan({ bugLedger, confidenceThreshold, canarySize }) { | ||
| const withConfidence = bugLedger.map((entry) => { | ||
| const confidenceRaw = entry.confidence; | ||
| const confidence = Number.isFinite(Number(confidenceRaw)) ? Number(confidenceRaw) : null; | ||
| function buildConflictSets(consistency) { | ||
| const conflicts = toArray(consistency && consistency.conflicts); | ||
| const bugIds = new Set(); | ||
| const locations = new Set(); | ||
| for (const conflict of conflicts) { | ||
| if (conflict && conflict.type === 'bug-id-reused' && conflict.bugId) { | ||
| bugIds.add(String(conflict.bugId)); | ||
| } | ||
| if (conflict && conflict.type === 'location-claim-conflict' && conflict.location) { | ||
| locations.add(String(conflict.location)); | ||
| } | ||
| } | ||
| return { bugIds, locations }; | ||
| } | ||
| function applyConflictClassification(entry, classification, conflictSets) { | ||
| const bugId = String(entry.bugId || '').trim(); | ||
| const location = `${entry.file || ''}|${entry.lines || ''}`; | ||
| const hasConflict = conflictSets.bugIds.has(bugId) || conflictSets.locations.has(location); | ||
| if (!hasConflict) { | ||
| return classification; | ||
| } | ||
| return { | ||
| strategy: 'manual-review', | ||
| executionStage: 'manual-review', | ||
| autofixEligible: false, | ||
| reason: 'Consistency conflict requires manual review before any fix is attempted.' | ||
| }; | ||
| } | ||
| function buildFixPlan({ bugLedger, confidenceThreshold, canarySize, consistency }) { | ||
| const conflictSets = buildConflictSets(consistency); | ||
| const classifiedEntries = bugLedger.map((entry) => { | ||
| const confidenceRaw = entry.confidenceScore; | ||
| const confidenceScore = Number.isFinite(Number(confidenceRaw)) ? Number(confidenceRaw) : null; | ||
| const classification = applyConflictClassification( | ||
| entry, | ||
| classifyStrategy({ ...entry, confidenceScore }, confidenceThreshold), | ||
| conflictSets | ||
| ); | ||
| return { | ||
| ...entry, | ||
| confidence | ||
| confidenceScore, | ||
| ...classification | ||
| }; | ||
| }); | ||
| const eligible = withConfidence | ||
| .filter((entry) => entry.confidence !== null && entry.confidence >= confidenceThreshold) | ||
| const eligible = classifiedEntries | ||
| .filter((entry) => entry.autofixEligible === true) | ||
| .sort((left, right) => { | ||
@@ -420,3 +521,3 @@ const severityDiff = severityRank(right.severity) - severityRank(left.severity); | ||
| } | ||
| const confidenceDiff = (right.confidence || 0) - (left.confidence || 0); | ||
| const confidenceDiff = (right.confidenceScore || 0) - (left.confidenceScore || 0); | ||
| if (confidenceDiff !== 0) { | ||
@@ -427,4 +528,4 @@ return confidenceDiff; | ||
| }); | ||
| const manualReview = withConfidence | ||
| .filter((entry) => entry.confidence === null || entry.confidence < confidenceThreshold); | ||
| const manualReview = classifiedEntries | ||
| .filter((entry) => entry.autofixEligible !== true); | ||
| const canary = eligible.slice(0, canarySize); | ||
@@ -438,3 +539,3 @@ const rollout = eligible.slice(canarySize); | ||
| totals: { | ||
| findings: withConfidence.length, | ||
| findings: classifiedEntries.length, | ||
| eligible: eligible.length, | ||
@@ -451,2 +552,427 @@ canary: canary.length, | ||
| function classifyStrategy(entry, confidenceThreshold) { | ||
| const confidenceScore = Number.isFinite(Number(entry.confidenceScore)) ? Number(entry.confidenceScore) : null; | ||
| const claim = String(entry.claim || '').toLowerCase(); | ||
| const crossReferences = toArray(entry.crossReferences); | ||
| const architecturalSignals = ['architecture', 'migration', 'schema', 'contract', 'signature', 'protocol']; | ||
| const refactorSignals = ['refactor', 'transaction', 'concurrency', 'race', 'lock ordering']; | ||
| if (confidenceScore === null || confidenceScore < confidenceThreshold) { | ||
| return { | ||
| strategy: 'manual-review', | ||
| executionStage: 'manual-review', | ||
| autofixEligible: false, | ||
| reason: 'Confidence is below the autofix threshold.' | ||
| }; | ||
| } | ||
| if (architecturalSignals.some((signal) => claim.includes(signal)) || crossReferences.length >= 3) { | ||
| return { | ||
| strategy: 'architectural-remediation', | ||
| executionStage: 'report-only', | ||
| autofixEligible: false, | ||
| reason: 'Claim spans broader contracts or architecture boundaries.' | ||
| }; | ||
| } | ||
| if (refactorSignals.some((signal) => claim.includes(signal)) || severityRank(entry.severity) >= 2 && crossReferences.length >= 2) { | ||
| return { | ||
| strategy: 'larger-refactor', | ||
| executionStage: 'manual-review', | ||
| autofixEligible: false, | ||
| reason: 'Fix likely needs coordinated multi-file changes beyond a surgical patch.' | ||
| }; | ||
| } | ||
| return { | ||
| strategy: 'safe-autofix', | ||
| executionStage: severityRank(entry.severity) >= 2 ? 'canary' : 'rollout', | ||
| autofixEligible: true, | ||
| reason: 'Finding is localized enough for a guarded surgical fix.' | ||
| }; | ||
| } | ||
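The decision order in `classifyStrategy` can be traced with a simplified sketch that returns only the strategy name. The `severityRank` mapping below (CRITICAL 3 down to LOW 0) is an assumption for illustration; the real helper is defined elsewhere in `run-bug-hunter.cjs` and is not part of this diff. The signal lists and ordering are copied from the hunk.

```javascript
// Assumed severity ranks, for illustration only.
const RANKS = { CRITICAL: 3, HIGH: 2, MEDIUM: 1, LOW: 0 };
const severityRank = (severity) => RANKS[String(severity || '').toUpperCase()] ?? 0;

const ARCHITECTURAL = ['architecture', 'migration', 'schema', 'contract', 'signature', 'protocol'];
const REFACTOR = ['refactor', 'transaction', 'concurrency', 'race', 'lock ordering'];

function classify({ confidenceScore, claim = '', severity, crossReferences = [] }, threshold = 75) {
  const text = claim.toLowerCase();
  // 1. Confidence gate first: missing or low scores never reach autofix.
  if (!Number.isFinite(confidenceScore) || confidenceScore < threshold) {
    return 'manual-review';
  }
  // 2. Broad claims, or 3+ cross-references, become report-only architectural work.
  if (ARCHITECTURAL.some((s) => text.includes(s)) || crossReferences.length >= 3) {
    return 'architectural-remediation';
  }
  // 3. `&&` binds tighter than `||`: a refactor keyword, OR (rank >= 2 AND 2+ cross-references).
  if (REFACTOR.some((s) => text.includes(s)) || (severityRank(severity) >= 2 && crossReferences.length >= 2)) {
    return 'larger-refactor';
  }
  return 'safe-autofix';
}

console.log(classify({ confidenceScore: 92, claim: 'Missing null check', severity: 'Critical', crossReferences: ['Single file'] }));
console.log(classify({ confidenceScore: 92, claim: 'Race on cache write', severity: 'Low' }));
console.log(classify({ confidenceScore: 60, claim: 'Missing null check', severity: 'Critical' }));
```

Because signals are matched as substrings, a claim mentioning "stack trace" also matches `race` and is routed to `larger-refactor`; this errs toward caution rather than auto-editing.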
| function recommendedActionForStrategy(strategy) { | ||
| if (strategy === 'architectural-remediation') { | ||
| return 'Do not auto-edit. Capture a remediation design and schedule a broader change.'; | ||
| } | ||
| if (strategy === 'larger-refactor') { | ||
| return 'Pause before patching. Review interfaces, callers, and rollback scope with a human.'; | ||
| } | ||
| if (strategy === 'manual-review') { | ||
| return 'Keep this in the report and require human approval before any edits.'; | ||
| } | ||
| return 'Proceed through the guarded fix pipeline with canary verification and rollback safety.'; | ||
| } | ||
| function buildFixStrategy({ bugLedger, confidenceThreshold, consistency }) { | ||
| const conflictSets = buildConflictSets(consistency); | ||
| const normalized = bugLedger.map((entry) => { | ||
| const confidenceScore = Number.isFinite(Number(entry.confidenceScore)) ? Number(entry.confidenceScore) : null; | ||
| const classification = applyConflictClassification( | ||
| entry, | ||
| classifyStrategy({ ...entry, confidenceScore }, confidenceThreshold), | ||
| conflictSets | ||
| ); | ||
| const filePath = String(entry.file || '').trim() || 'unknown-file'; | ||
| const clusterDir = path.dirname(filePath); | ||
| const clusterSeed = `${classification.strategy}|${classification.executionStage}|${clusterDir}`; | ||
| return { | ||
| ...entry, | ||
| confidenceScore, | ||
| file: filePath, | ||
| clusterDir, | ||
| clusterSeed, | ||
| ...classification | ||
| }; | ||
| }); | ||
| const byCluster = new Map(); | ||
| for (const entry of normalized) { | ||
| if (!byCluster.has(entry.clusterSeed)) { | ||
| byCluster.set(entry.clusterSeed, []); | ||
| } | ||
| byCluster.get(entry.clusterSeed).push(entry); | ||
| } | ||
| const clusters = [...byCluster.entries()].map(([clusterSeed, entries], index) => { | ||
| const strategy = entries[0].strategy; | ||
| const executionStage = entries[0].executionStage; | ||
| const files = [...new Set(entries.map((entry) => entry.file))].sort(); | ||
| const bugIds = [...new Set(entries.map((entry) => String(entry.bugId || entry.key || '').trim()).filter(Boolean))]; | ||
| const maxSeverity = entries | ||
| .map((entry) => entry.severity) | ||
| .sort((left, right) => severityRank(right) - severityRank(left))[0] || 'LOW'; | ||
| const reasons = [...new Set(entries.map((entry) => entry.reason).filter(Boolean))]; | ||
| const firstDir = entries[0].clusterDir || path.dirname(files[0] || 'unknown-file'); | ||
| return { | ||
| clusterId: `cluster-${index + 1}`, | ||
| strategy, | ||
| executionStage, | ||
| autofixEligible: entries.every((entry) => entry.autofixEligible), | ||
| bugIds, | ||
| files, | ||
| maxSeverity, | ||
| summary: `${bugIds.length} bug(s) in ${firstDir || '.'} classified as ${strategy}.`, | ||
| recommendedAction: recommendedActionForStrategy(strategy), | ||
| reasons | ||
| }; | ||
| }).sort((left, right) => { | ||
| const stageRank = { | ||
| canary: 0, | ||
| rollout: 1, | ||
| 'manual-review': 2, | ||
| 'report-only': 3 | ||
| }; | ||
| const stageDiff = stageRank[left.executionStage] - stageRank[right.executionStage]; | ||
| if (stageDiff !== 0) { | ||
| return stageDiff; | ||
| } | ||
| return severityRank(right.maxSeverity) - severityRank(left.maxSeverity); | ||
| }); | ||
| const summary = { | ||
| confirmed: normalized.length, | ||
| safeAutofix: normalized.filter((entry) => entry.strategy === 'safe-autofix').length, | ||
| manualReview: normalized.filter((entry) => entry.strategy === 'manual-review').length, | ||
| largerRefactor: normalized.filter((entry) => entry.strategy === 'larger-refactor').length, | ||
| architecturalRemediation: normalized.filter((entry) => entry.strategy === 'architectural-remediation').length, | ||
| canaryCandidates: normalized.filter((entry) => entry.executionStage === 'canary').length, | ||
| rolloutCandidates: normalized.filter((entry) => entry.executionStage === 'rollout').length | ||
| }; | ||
| return { | ||
| version: '3.1.0', | ||
| generatedAt: nowIso(), | ||
| confidenceThreshold, | ||
| summary, | ||
| clusters | ||
| }; | ||
| } | ||
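buildFixStrategy puts classified entries into groups keyed by strategy, execution stage, and parent directory. It then orders the clusters by stage (canary, rollout, manual-review, report-only), and by severity within each stage. Below is a standalone sketch of the same grouping and ordering. It assumes the entries are already classified and uses a hypothetical severityRank, because the real helper is defined outside this diff:

```javascript
const path = require('node:path');

// Hypothetical severity ranking; the real severityRank() lives outside this diff.
const SEVERITY_ORDER = { LOW: 0, MEDIUM: 1, HIGH: 2, CRITICAL: 3 };
const severityRank = (s) => SEVERITY_ORDER[String(s || '').toUpperCase()] ?? -1;

const STAGE_RANK = { canary: 0, rollout: 1, 'manual-review': 2, 'report-only': 3 };

// Group pre-classified entries by strategy|stage|directory, mirroring clusterSeed.
function clusterEntries(entries) {
  const byCluster = new Map();
  for (const entry of entries) {
    const seed = `${entry.strategy}|${entry.executionStage}|${path.dirname(entry.file)}`;
    if (!byCluster.has(seed)) byCluster.set(seed, []);
    byCluster.get(seed).push(entry);
  }
  return [...byCluster.values()]
    .map((group) => ({
      strategy: group[0].strategy,
      executionStage: group[0].executionStage,
      bugIds: group.map((e) => e.bugId),
      // Highest severity in the group; reduce leaves the input untouched.
      maxSeverity: group.reduce(
        (best, e) => (severityRank(e.severity) > severityRank(best) ? e.severity : best),
        'LOW'
      )
    }))
    // Earlier stages first; within a stage, most severe first.
    .sort((a, b) => (STAGE_RANK[a.executionStage] - STAGE_RANK[b.executionStage])
      || (severityRank(b.maxSeverity) - severityRank(a.maxSeverity)));
}

const clusters = clusterEntries([
  { bugId: 'BUG-1', strategy: 'manual-review', executionStage: 'manual-review', file: 'src/a.ts', severity: 'High' },
  { bugId: 'BUG-2', strategy: 'safe-autofix', executionStage: 'canary', file: 'src/b.ts', severity: 'Low' },
  { bugId: 'BUG-3', strategy: 'safe-autofix', executionStage: 'canary', file: 'src/c.ts', severity: 'Critical' }
]);
console.log(clusters.map((c) => c.bugIds.join(',')));
```

BUG-2 and BUG-3 share a seed, so they form one canary cluster with maxSeverity `Critical`, and that cluster sorts ahead of the manual-review cluster.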
| function toCoverageStatus(chunkStatus) { | ||
| if (chunkStatus === 'done') { | ||
| return 'done'; | ||
| } | ||
| if (chunkStatus === 'in_progress') { | ||
| return 'in_progress'; | ||
| } | ||
| if (chunkStatus === 'failed') { | ||
| return 'failed'; | ||
| } | ||
| return 'pending'; | ||
| } | ||
| function buildCoverageArtifact({ state, fixPlan }) { | ||
| const fileEntries = toArray(state.chunks).flatMap((chunk) => { | ||
| return toArray(chunk.files).map((filePath) => { | ||
| return { | ||
| path: String(filePath), | ||
| status: toCoverageStatus(chunk.status) | ||
| }; | ||
| }); | ||
| }); | ||
| const bugs = toArray(state.bugLedger).map((entry) => { | ||
| return { | ||
| bugId: String(entry.bugId || '').trim() || String(entry.key || '').trim(), | ||
| severity: String(entry.severity || 'Low'), | ||
| file: String(entry.file || '').trim(), | ||
| claim: String(entry.claim || '').trim() | ||
| }; | ||
| }); | ||
| const fixStatusByBugId = new Map(); | ||
| for (const entry of toArray(fixPlan && fixPlan.canary)) { | ||
| fixStatusByBugId.set(String(entry.bugId || '').trim(), 'CANARY'); | ||
| } | ||
| for (const entry of toArray(fixPlan && fixPlan.rollout)) { | ||
| fixStatusByBugId.set(String(entry.bugId || '').trim(), 'ROLLOUT'); | ||
| } | ||
| for (const entry of toArray(fixPlan && fixPlan.manualReview)) { | ||
| fixStatusByBugId.set(String(entry.bugId || '').trim(), 'MANUAL_REVIEW'); | ||
| } | ||
| const fixes = [...fixStatusByBugId.entries()] | ||
| .filter(([bugId]) => Boolean(bugId)) | ||
| .map(([bugId, status]) => { | ||
| return { | ||
| bugId, | ||
| status | ||
| }; | ||
| }); | ||
| const hasOpenChunks = toArray(state.chunks).some((chunk) => chunk.status !== 'done'); | ||
| return { | ||
| schemaVersion: 1, | ||
| iteration: 1, | ||
| status: hasOpenChunks ? 'IN_PROGRESS' : 'COMPLETE', | ||
| files: fileEntries, | ||
| bugs, | ||
| fixes | ||
| }; | ||
| } | ||
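In buildCoverageArtifact, all three loops write into the same Map. A bug that appears in more than one fixPlan bucket therefore keeps the status from the last loop that sees it: MANUAL_REVIEW overrides ROLLOUT, and ROLLOUT overrides CANARY. The small sketch below isolates that precedence. The bucket names match the diff; the function name is hypothetical:

```javascript
// Later buckets win because Map#set replaces the value of an existing key
// (the key keeps its original insertion position).
function fixStatuses(fixPlan) {
  const statusByBugId = new Map();
  const buckets = [
    ['canary', 'CANARY'],
    ['rollout', 'ROLLOUT'],
    ['manualReview', 'MANUAL_REVIEW']
  ];
  for (const [key, status] of buckets) {
    for (const entry of (fixPlan && fixPlan[key]) || []) {
      const bugId = String(entry.bugId || '').trim();
      if (bugId) statusByBugId.set(bugId, status); // blank ids are skipped up front
    }
  }
  return [...statusByBugId].map(([bugId, status]) => ({ bugId, status }));
}

// BUG-1 ends as MANUAL_REVIEW, BUG-2 as ROLLOUT.
console.log(fixStatuses({
  canary: [{ bugId: 'BUG-1' }],
  rollout: [{ bugId: 'BUG-2' }],
  manualReview: [{ bugId: 'BUG-1' }]
}));
```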
| function renderCoverageMarkdown(coverage) { | ||
| const lines = [ | ||
| '# Bug Hunter Coverage', | ||
| '', | ||
| `- Status: ${coverage.status}`, | ||
| `- Iteration: ${coverage.iteration}`, | ||
| `- Files: ${coverage.files.length}`, | ||
| `- Bugs: ${coverage.bugs.length}`, | ||
| `- Fix entries: ${coverage.fixes.length}`, | ||
| '', | ||
| '## Files' | ||
| ]; | ||
| if (coverage.files.length === 0) { | ||
| lines.push('- None'); | ||
| } else { | ||
| for (const entry of coverage.files) { | ||
| lines.push(`- ${entry.status} | ${entry.path}`); | ||
| } | ||
| } | ||
| lines.push('', '## Bugs'); | ||
| if (coverage.bugs.length === 0) { | ||
| lines.push('- None'); | ||
| } else { | ||
| for (const bug of coverage.bugs) { | ||
| lines.push(`- ${bug.bugId} | ${bug.severity} | ${bug.file} | ${bug.claim}`); | ||
| } | ||
| } | ||
| lines.push('', '## Fixes'); | ||
| if (coverage.fixes.length === 0) { | ||
| lines.push('- None'); | ||
| } else { | ||
| for (const fix of coverage.fixes) { | ||
| lines.push(`- ${fix.bugId} | ${fix.status}`); | ||
| } | ||
| } | ||
| return `${lines.join('\n')}\n`; | ||
| } | ||
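The Files, Bugs, and Fixes sections of renderCoverageMarkdown share one shape: a heading, then one bullet per item, or `- None` when the list is empty. The refactor below pulls that shape into a single helper. The helper names are hypothetical. Judging from this function alone, the output should be identical to the original:

```javascript
// Render one "## Title" section as a bullet list, falling back to "- None".
function renderSection(title, items, formatItem) {
  const lines = ['', `## ${title}`];
  if (items.length === 0) {
    lines.push('- None');
  } else {
    for (const item of items) lines.push(`- ${formatItem(item)}`);
  }
  return lines;
}

function renderCoverageSketch(coverage) {
  const lines = [
    '# Bug Hunter Coverage',
    '',
    `- Status: ${coverage.status}`,
    `- Iteration: ${coverage.iteration}`,
    `- Files: ${coverage.files.length}`,
    `- Bugs: ${coverage.bugs.length}`,
    `- Fix entries: ${coverage.fixes.length}`,
    ...renderSection('Files', coverage.files, (f) => `${f.status} | ${f.path}`),
    ...renderSection('Bugs', coverage.bugs, (b) => `${b.bugId} | ${b.severity} | ${b.file} | ${b.claim}`),
    ...renderSection('Fixes', coverage.fixes, (x) => `${x.bugId} | ${x.status}`)
  ];
  return `${lines.join('\n')}\n`;
}

process.stdout.write(renderCoverageSketch({ status: 'COMPLETE', iteration: 1, files: [], bugs: [], fixes: [] }));
```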
| function validateFindingsArtifact(findingsJsonPath) { | ||
| if (!fs.existsSync(findingsJsonPath)) { | ||
| return { | ||
| ok: false, | ||
| errors: [`Missing findings artifact: ${findingsJsonPath}`] | ||
| }; | ||
| } | ||
| return validateArtifactFile({ | ||
| artifactName: 'findings', | ||
| filePath: findingsJsonPath | ||
| }); | ||
| } | ||
| function validateNamedArtifact({ artifactName, filePath }) { | ||
| if (!fs.existsSync(filePath)) { | ||
| return { | ||
| ok: false, | ||
| errors: [`Missing ${artifactName} artifact: ${filePath}`] | ||
| }; | ||
| } | ||
| return validateArtifactFile({ | ||
| artifactName, | ||
| filePath | ||
| }); | ||
| } | ||
| function removeFileIfExists(filePath) { | ||
| if (!filePath) { | ||
| return; | ||
| } | ||
| if (fs.existsSync(filePath)) { | ||
| fs.unlinkSync(filePath); | ||
| } | ||
| } | ||
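removeFileIfExists calls existsSync and then unlinkSync. If the file disappears between the two calls (for example, because a concurrently running worker removed it), unlinkSync throws ENOENT. One possible variant closes that gap. It assumes Node.js 14.14 or newer for fs.rmSync, and the function name is hypothetical:

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Same contract as removeFileIfExists, without the existsSync/unlinkSync gap:
// with { force: true }, rmSync ignores a path that does not exist.
function removeFileQuietly(filePath) {
  if (!filePath) {
    return;
  }
  fs.rmSync(filePath, { force: true });
}

const tmp = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'bh-')), 'findings.json');
fs.writeFileSync(tmp, '[]\n', 'utf8');
removeFileQuietly(tmp);
removeFileQuietly(tmp); // second call is a no-op instead of throwing
removeFileQuietly(null);
console.log(fs.existsSync(tmp)); // false
```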
| async function runPhase(options) { | ||
| const artifact = String(options.artifact || '').trim(); | ||
| if (!artifact) { | ||
| throw new Error('--artifact is required for phase command'); | ||
| } | ||
| if (!options['output-path']) { | ||
| throw new Error('--output-path is required for phase command'); | ||
| } | ||
| if (!options['worker-cmd']) { | ||
| throw new Error('--worker-cmd is required for phase command'); | ||
| } | ||
| const skillDir = resolveSkillDir(options); | ||
| const preflightResult = preflight(options); | ||
| if (!preflightResult.ok) { | ||
| throw new Error(`Missing helper scripts: ${preflightResult.missing.join(', ')}`); | ||
| } | ||
| const phaseName = options['phase-name'] || artifact; | ||
| const outputPath = path.resolve(options['output-path']); | ||
| const renderOutputPath = options['render-output-path'] | ||
| ? path.resolve(options['render-output-path']) | ||
| : null; | ||
| const workerCmdTemplate = options['worker-cmd']; | ||
| const renderCmdTemplate = options['render-cmd'] || null; | ||
| const timeoutMs = toPositiveInt(options['timeout-ms'], DEFAULT_TIMEOUT_MS); | ||
| const renderTimeoutMs = toPositiveInt(options['render-timeout-ms'], timeoutMs); | ||
| const maxRetries = toPositiveInt(options['max-retries'], DEFAULT_MAX_RETRIES); | ||
| const backoffMs = toPositiveInt(options['backoff-ms'], DEFAULT_BACKOFF_MS); | ||
| const journalPath = path.resolve( | ||
| options['journal-path'] || path.join(path.dirname(outputPath), `${phaseName}.log`) | ||
| ); | ||
| const templateVariables = { | ||
| artifact, | ||
| outputPath, | ||
| outputFilePath: outputPath, | ||
| renderOutputPath: renderOutputPath || '', | ||
| journalPath, | ||
| phaseName, | ||
| skillDir | ||
| }; | ||
| ensureDir(path.dirname(outputPath)); | ||
| if (renderOutputPath) { | ||
| ensureDir(path.dirname(renderOutputPath)); | ||
| } | ||
| removeFileIfExists(outputPath); | ||
| removeFileIfExists(renderOutputPath); | ||
| appendJournal(journalPath, { | ||
| event: 'phase-start', | ||
| artifact, | ||
| phase: phaseName, | ||
| outputPath, | ||
| renderOutputPath | ||
| }); | ||
| const workerCommand = fillTemplate(workerCmdTemplate, templateVariables); | ||
| const runResult = await runWithRetry({ | ||
| command: workerCommand, | ||
| timeoutMs, | ||
| maxRetries, | ||
| backoffMs, | ||
| journalPath, | ||
| phase: phaseName, | ||
| chunkId: artifact, | ||
| beforeAttempt: async () => { | ||
| removeFileIfExists(outputPath); | ||
| removeFileIfExists(renderOutputPath); | ||
| }, | ||
| postAttempt: async () => { | ||
| const validation = validateNamedArtifact({ | ||
| artifactName: artifact, | ||
| filePath: outputPath | ||
| }); | ||
| if (validation.ok) { | ||
| return { ok: true }; | ||
| } | ||
| return { | ||
| ok: false, | ||
| errorMessage: validation.errors.join('; ') | ||
| }; | ||
| } | ||
| }); | ||
| if (!runResult.ok) { | ||
| const errorMessage = (runResult.result && runResult.result.stderr) || `${phaseName} failed`; | ||
| appendJournal(journalPath, { | ||
| event: 'phase-failed', | ||
| artifact, | ||
| phase: phaseName, | ||
| errorMessage: errorMessage.slice(0, 500) | ||
| }); | ||
| throw new Error(errorMessage); | ||
| } | ||
| if (renderCmdTemplate) { | ||
| const renderCommand = fillTemplate(renderCmdTemplate, templateVariables); | ||
| appendJournal(journalPath, { | ||
| event: 'phase-render-start', | ||
| artifact, | ||
| phase: phaseName, | ||
| renderOutputPath | ||
| }); | ||
| const renderResult = await runCommandOnce({ | ||
| command: renderCommand, | ||
| timeoutMs: renderTimeoutMs | ||
| }); | ||
| if (!renderResult.ok) { | ||
| const renderError = renderResult.stderr || renderResult.stdout || `${phaseName} render failed`; | ||
| appendJournal(journalPath, { | ||
| event: 'phase-render-failed', | ||
| artifact, | ||
| phase: phaseName, | ||
| errorMessage: renderError.slice(0, 500) | ||
| }); | ||
| throw new Error(renderError); | ||
| } | ||
| appendJournal(journalPath, { | ||
| event: 'phase-render-end', | ||
| artifact, | ||
| phase: phaseName, | ||
| renderOutputPath | ||
| }); | ||
| } | ||
| appendJournal(journalPath, { | ||
| event: 'phase-end', | ||
| artifact, | ||
| phase: phaseName, | ||
| attemptsUsed: runResult.attemptsUsed | ||
| }); | ||
| return { | ||
| ok: true, | ||
| artifact, | ||
| phase: phaseName, | ||
| outputPath, | ||
| renderOutputPath, | ||
| journalPath, | ||
| attemptsUsed: runResult.attemptsUsed | ||
| }; | ||
| } | ||
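runPhase clears stale outputs before every attempt. It also counts a worker that exits cleanly but leaves no valid artifact as a failed attempt. runWithRetry itself is not shown in this diff, so the loop below is a hypothetical stand-in that mirrors only the beforeAttempt/postAttempt hooks. Two details are assumptions: maxRetries counts retries after the first attempt, and the backoff is linear:

```javascript
// Hypothetical stand-in for runWithRetry: only the hook contract mirrors runPhase.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithHooks({ attempt, beforeAttempt, postAttempt, maxRetries, backoffMs }) {
  let lastError = 'no attempts made';
  // Assumption: maxRetries counts retries after the first attempt.
  for (let attemptNumber = 1; attemptNumber <= maxRetries + 1; attemptNumber += 1) {
    await beforeAttempt(); // e.g. delete stale output so an old file cannot pass validation
    const result = await attempt(attemptNumber);
    if (result.ok) {
      const check = await postAttempt(); // e.g. schema-validate the artifact the worker wrote
      if (check.ok) {
        return { ok: true, attemptsUsed: attemptNumber };
      }
      lastError = check.errorMessage;
    } else {
      lastError = result.stderr || `attempt ${attemptNumber} failed`;
    }
    if (attemptNumber <= maxRetries) {
      await sleep(backoffMs * attemptNumber); // linear backoff (assumption)
    }
  }
  return { ok: false, errorMessage: lastError };
}

// Usage: the first attempt exits cleanly but writes nothing valid, so it is retried.
(async () => {
  let written = false;
  const outcome = await retryWithHooks({
    attempt: async (n) => {
      written = n >= 2;
      return { ok: true };
    },
    beforeAttempt: async () => {
      written = false;
    },
    postAttempt: async () => (written ? { ok: true } : { ok: false, errorMessage: 'Missing findings artifact' }),
    maxRetries: 2,
    backoffMs: 1
  });
  console.log(outcome); // { ok: true, attemptsUsed: 2 }
})();
```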
| function loadIndex(indexPath) { | ||
@@ -530,3 +1056,17 @@ if (!indexPath || !fs.existsSync(indexPath)) { | ||
| phase: 'chunk-worker', | ||
| chunkId: chunk.id, | ||
| beforeAttempt: async () => { | ||
| removeFileIfExists(findingsJsonPath); | ||
| removeFileIfExists(factsJsonPath); | ||
| }, | ||
| postAttempt: async () => { | ||
| const findingsValidation = validateFindingsArtifact(findingsJsonPath); | ||
| if (findingsValidation.ok) { | ||
| return { ok: true }; | ||
| } | ||
| return { | ||
| ok: false, | ||
| errorMessage: findingsValidation.errors.join('; ') | ||
| }; | ||
| } | ||
| }); | ||
@@ -549,6 +1089,4 @@ | ||
| let findings = []; | ||
| runJsonScript(stateScript, ['record-findings', statePath, findingsJsonPath, 'orchestrator']); | ||
| findings = readJson(findingsJsonPath); | ||
@@ -681,2 +1219,6 @@ if (fs.existsSync(factsJsonPath)) { | ||
| const fixPlanPath = path.resolve(options['fix-plan-path'] || path.join(path.dirname(statePath), 'fix-plan.json')); | ||
| const strategyPath = path.resolve(options['strategy-path'] || path.join(path.dirname(statePath), 'fix-strategy.json')); | ||
| const strategyMarkdownPath = path.resolve(options['strategy-markdown-path'] || path.join(path.dirname(statePath), 'fix-strategy.md')); | ||
| const coveragePath = path.resolve(options['coverage-path'] || path.join(path.dirname(statePath), 'coverage.json')); | ||
| const coverageMarkdownPath = path.resolve(options['coverage-markdown-path'] || path.join(path.dirname(statePath), 'coverage.md')); | ||
| const factsPath = path.resolve(options['facts-path'] || path.join(path.dirname(statePath), 'bug-hunter-facts.json')); | ||
@@ -729,3 +1271,3 @@ ensureDir(chunksDir); | ||
| .filter((entry) => { | ||
| return entry.confidenceScore === null || entry.confidenceScore === undefined || Number(entry.confidenceScore) < confidenceThreshold; | ||
| }) | ||
@@ -794,10 +1336,92 @@ .map((entry) => entry.file)); | ||
| const hasOpenOrFailedChunks = (status.summary.chunkStatus.pending || 0) > 0 | ||
| || (status.summary.chunkStatus.inProgress || 0) > 0 | ||
| || (status.summary.chunkStatus.failed || 0) > 0; | ||
| if (hasOpenOrFailedChunks) { | ||
| appendJournal(journalPath, { | ||
| event: 'fix-planning-skipped', | ||
| reason: 'incomplete-or-failed-chunks', | ||
| chunkStatus: status.summary.chunkStatus | ||
| }); | ||
| return { | ||
| ok: true, | ||
| backend, | ||
| journalPath, | ||
| statePath, | ||
| indexPath: scope.indexPath, | ||
| deltaMode: scope.deltaMode, | ||
| deltaSummary: scope.deltaResult ? { | ||
| selectedCount: (scope.deltaResult.selected || []).length, | ||
| expansionCandidatesCount: (scope.deltaResult.expansionCandidates || []).length | ||
| } : null, | ||
| consistencyReportPath, | ||
| strategyPath: null, | ||
| strategyMarkdownPath: null, | ||
| fixPlanPath: null, | ||
| coveragePath: null, | ||
| coverageMarkdownPath: null, | ||
| factsPath, | ||
| status: status.summary, | ||
| consistency: { | ||
| conflicts: consistency.conflicts.length, | ||
| lowConfidenceFindings: consistency.lowConfidenceFindings | ||
| }, | ||
| fixStrategy: null, | ||
| fixPlan: null | ||
| }; | ||
| } | ||
| const fixStrategy = buildFixStrategy({ | ||
| bugLedger: toArray(finalState.bugLedger), | ||
| confidenceThreshold, | ||
| consistency | ||
| }); | ||
| const fixStrategyValidation = validateArtifactValue({ | ||
| artifactName: 'fix-strategy', | ||
| value: fixStrategy | ||
| }); | ||
| if (!fixStrategyValidation.ok) { | ||
| throw new Error(`Generated invalid fix strategy artifact: ${fixStrategyValidation.errors.join('; ')}`); | ||
| } | ||
| writeJson(strategyPath, fixStrategy); | ||
| ensureDir(path.dirname(strategyMarkdownPath)); | ||
| fs.writeFileSync( | ||
| strategyMarkdownPath, | ||
| runTextScript(path.join(skillDir, 'scripts', 'render-report.cjs'), ['fix-strategy', strategyPath]), | ||
| 'utf8' | ||
| ); | ||
| const fixPlan = buildFixPlan({ | ||
| bugLedger: toArray(finalState.bugLedger), | ||
| confidenceThreshold, | ||
| canarySize, | ||
| consistency | ||
| }); | ||
| const fixPlanValidation = validateArtifactValue({ | ||
| artifactName: 'fix-plan', | ||
| value: fixPlan | ||
| }); | ||
| if (!fixPlanValidation.ok) { | ||
| throw new Error(`Generated invalid fix plan artifact: ${fixPlanValidation.errors.join('; ')}`); | ||
| } | ||
| writeJson(fixPlanPath, fixPlan); | ||
| runJsonScript(stateScript, ['set-fix-plan', statePath, fixPlanPath]); | ||
| const coverage = buildCoverageArtifact({ | ||
| state: finalState, | ||
| fixPlan | ||
| }); | ||
| const coverageValidation = validateArtifactValue({ | ||
| artifactName: 'coverage', | ||
| value: coverage | ||
| }); | ||
| if (!coverageValidation.ok) { | ||
| throw new Error(`Generated invalid coverage artifact: ${coverageValidation.errors.join('; ')}`); | ||
| } | ||
| writeJson(coveragePath, coverage); | ||
| ensureDir(path.dirname(coverageMarkdownPath)); | ||
| fs.writeFileSync(coverageMarkdownPath, renderCoverageMarkdown(coverage), 'utf8'); | ||
| writeJson(factsPath, finalState.factCards || {}); | ||
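The fix-strategy, fix-plan, and coverage artifacts are all handled in the same order: build the value, validate it with validateArtifactValue, and only then write it. The hypothetical helper below captures that pattern. The temp-file-plus-rename step is an optional addition that does not appear in the diff, and the validator is a toy stand-in:

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: validate first, write second, so an invalid artifact never lands on disk.
// `validate` stands in for validateArtifactValue and returns { ok, errors }.
function writeValidatedJson({ artifactName, filePath, value, validate }) {
  const validation = validate(value);
  if (!validation.ok) {
    throw new Error(`Generated invalid ${artifactName} artifact: ${validation.errors.join('; ')}`);
  }
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  // Optional: write a sibling temp file, then rename it, so readers never see a
  // half-written file (rename is atomic on POSIX within one filesystem).
  const tempPath = `${filePath}.${process.pid}.tmp`;
  fs.writeFileSync(tempPath, `${JSON.stringify(value, null, 2)}\n`, 'utf8');
  fs.renameSync(tempPath, filePath);
}

// Toy validator for the example: coverage.status must be one of two values.
const validateCoverage = (value) => (['IN_PROGRESS', 'COMPLETE'].includes(value.status)
  ? { ok: true, errors: [] }
  : { ok: false, errors: ['$.status must be IN_PROGRESS or COMPLETE'] });

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bh-artifacts-'));
const target = path.join(dir, 'coverage.json');
writeValidatedJson({ artifactName: 'coverage', filePath: target, value: { status: 'COMPLETE' }, validate: validateCoverage });
console.log(JSON.parse(fs.readFileSync(target, 'utf8')).status); // COMPLETE
```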
@@ -824,3 +1448,7 @@ | ||
| consistencyReportPath, | ||
| strategyPath, | ||
| strategyMarkdownPath, | ||
| fixPlanPath, | ||
| coveragePath, | ||
| coverageMarkdownPath, | ||
| factsPath, | ||
@@ -832,2 +1460,3 @@ status: status.summary, | ||
| }, | ||
| fixStrategy: fixStrategy.summary, | ||
| fixPlan: fixPlan.totals | ||
@@ -859,2 +1488,8 @@ }; | ||
| if (command === 'phase') { | ||
| const result = await runPhase(options); | ||
| console.log(JSON.stringify(result, null, 2)); | ||
| return; | ||
| } | ||
| if (command === 'plan') { | ||
@@ -861,0 +1496,0 @@ if (!options['files-json']) { |
@@ -52,4 +52,28 @@ const assert = require('node:assert/strict'); | ||
| writeJson(findingsJson, [ | ||
| { | ||
| bugId: 'BUG-1', | ||
| severity: 'Low', | ||
| category: 'logic', | ||
| file: 'src/x.ts', | ||
| lines: '1', | ||
| claim: 'x', | ||
| evidence: 'src/x.ts:1 first evidence', | ||
| runtimeTrigger: 'Call x()', | ||
| crossReferences: ['Single file'], | ||
| confidenceScore: 40 | ||
| }, | ||
| { | ||
| bugId: 'BUG-2', | ||
| severity: 'Critical', | ||
| category: 'security', | ||
| file: 'src/x.ts', | ||
| lines: '1', | ||
| claim: 'x', | ||
| evidence: 'src/x.ts:1 upgraded evidence', | ||
| runtimeTrigger: 'Call x() with attacker-controlled input', | ||
| crossReferences: ['Single file'], | ||
| confidenceScore: 95, | ||
| stride: 'Tampering', | ||
| cwe: 'CWE-20' | ||
| } | ||
| ]); | ||
@@ -68,3 +92,4 @@ const recorded = runJson('node', [stateScript, 'record-findings', statePath, findingsJson, 'test']); | ||
| assert.equal(state.bugLedger[0].severity, 'Critical'); | ||
| assert.equal(state.bugLedger[0].confidenceScore, 95); | ||
| assert.equal(state.metrics.lowConfidenceFindings, 0); | ||
@@ -90,1 +115,41 @@ const extraFile = path.join(sandbox, 'c.ts'); | ||
| }); | ||
| test('bug-hunter-state rejects malformed findings artifacts', () => { | ||
| const sandbox = makeSandbox('bug-hunter-state-invalid-'); | ||
| const stateScript = resolveSkillScript('bug-hunter-state.cjs'); | ||
| const filePath = path.join(sandbox, 'a.ts'); | ||
| fs.writeFileSync(filePath, 'const a = 1;\n', 'utf8'); | ||
| const filesJson = path.join(sandbox, 'files.json'); | ||
| writeJson(filesJson, [filePath]); | ||
| const statePath = path.join(sandbox, 'state.json'); | ||
| runJson('node', [stateScript, 'init', statePath, 'extended', filesJson, '1']); | ||
| const findingsJson = path.join(sandbox, 'findings.json'); | ||
| writeJson(findingsJson, [ | ||
| { | ||
| bugId: 'BUG-1', | ||
| severity: 'Low', | ||
| category: 'logic', | ||
| file: 'src/x.ts', | ||
| lines: '1', | ||
| evidence: 'src/x.ts:1 evidence', | ||
| runtimeTrigger: 'Call x()', | ||
| crossReferences: ['Single file'], | ||
| confidenceScore: 40 | ||
| } | ||
| ]); | ||
| const result = require('node:child_process').spawnSync('node', [ | ||
| stateScript, | ||
| 'record-findings', | ||
| statePath, | ||
| findingsJson, | ||
| 'test' | ||
| ], { | ||
| encoding: 'utf8' | ||
| }); | ||
| assert.notEqual(result.status, 0); | ||
| assert.match(`${result.stderr}${result.stdout}`, /Invalid findings artifact/); | ||
| }); |
@@ -58,1 +58,16 @@ const assert = require('node:assert/strict'); | ||
| }); | ||
| test('code-index query-bugs cleans up temp seed files after failures', () => { | ||
| const sandbox = makeSandbox('code-index-query-bugs-'); | ||
| const codeIndex = resolveSkillScript('code-index.cjs'); | ||
| const bugsJson = path.join(sandbox, 'bugs.json'); | ||
| const missingIndexPath = path.join(sandbox, 'missing-index.json'); | ||
| const sourceFile = path.join(sandbox, 'src', 'feature.ts'); | ||
| fs.mkdirSync(path.dirname(sourceFile), { recursive: true }); | ||
| fs.writeFileSync(sourceFile, 'export const feature = true;\n', 'utf8'); | ||
| writeJson(bugsJson, [{ bugId: 'BUG-1', file: sourceFile }]); | ||
| const result = require('./test-utils.cjs').runRaw('node', [codeIndex, 'query-bugs', missingIndexPath, bugsJson, '1']); | ||
| assert.notEqual(result.status, 0); | ||
| assert.equal(fs.existsSync(path.join(sandbox, '.seed-files.tmp.json')), false); | ||
| }); |
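The code-index test checks only that `.seed-files.tmp.json` is gone after query-bugs fails against a missing index; the implementation is not in this diff. One way to guarantee that cleanup is a try/finally block around the temp file. The helper below is hypothetical. It assumes the temp file sits next to the bugs JSON, which is what the test's assertion suggests:

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical: write a temp seed file, run work that may throw,
// and always delete the temp file in `finally`.
function withTempJson(dir, fileName, value, work) {
  const tempPath = path.join(dir, fileName);
  fs.writeFileSync(tempPath, `${JSON.stringify(value)}\n`, 'utf8');
  try {
    return work(tempPath);
  } finally {
    fs.rmSync(tempPath, { force: true });
  }
}

const sandbox = fs.mkdtempSync(path.join(os.tmpdir(), 'bh-seed-'));
try {
  withTempJson(sandbox, '.seed-files.tmp.json', ['src/feature.ts'], () => {
    throw new Error('index not found'); // simulates querying a missing index
  });
} catch (error) {
  console.log(error.message); // index not found
}
console.log(fs.existsSync(path.join(sandbox, '.seed-files.tmp.json'))); // false
```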
@@ -12,3 +12,3 @@ const assert = require('node:assert/strict'); | ||
| test('fix-lock enforces single writer and supports token-protected release', () => { | ||
| const sandbox = makeSandbox('fix-lock-'); | ||
@@ -21,2 +21,4 @@ const lockScript = resolveSkillScript('fix-lock.cjs'); | ||
| assert.equal(acquire1.acquired, true); | ||
| assert.equal(typeof acquire1.lock.ownerToken, 'string'); | ||
| assert.equal(acquire1.lock.ownerToken.length > 8, true); | ||
@@ -28,2 +30,10 @@ const acquire2 = runRaw('node', [lockScript, 'acquire', lockPath, '120']); | ||
| const renew = runJson('node', [lockScript, 'renew', lockPath, acquire1.lock.ownerToken]); | ||
| assert.equal(renew.ok, true); | ||
| assert.equal(renew.renewed, true); | ||
| const badRelease = runRaw('node', [lockScript, 'release', lockPath, 'wrong-token']); | ||
| assert.notEqual(badRelease.status, 0); | ||
| assert.match(`${badRelease.stdout || ''}${badRelease.stderr || ''}`, /lock-owner-mismatch/); | ||
| const status = runJson('node', [lockScript, 'status', lockPath, '120']); | ||
@@ -33,3 +43,3 @@ assert.equal(status.exists, true); | ||
| const release = runJson('node', [lockScript, 'release', lockPath, acquire1.lock.ownerToken]); | ||
| assert.equal(release.ok, true); | ||
@@ -41,1 +51,49 @@ assert.equal(release.released, true); | ||
| }); | ||
| test('fix-lock does not steal an expired lock from a still-running owner', () => { | ||
| const sandbox = makeSandbox('fix-lock-live-owner-'); | ||
| const lockScript = resolveSkillScript('fix-lock.cjs'); | ||
| const lockPath = path.join(sandbox, 'bug-hunter-fix.lock'); | ||
| require('fs').writeFileSync(lockPath, `${JSON.stringify({ | ||
| pid: process.pid, | ||
| host: 'test-host', | ||
| cwd: sandbox, | ||
| createdAtMs: Date.now() - 10_000, | ||
| createdAt: new Date(Date.now() - 10_000).toISOString(), | ||
| ownerToken: 'existing-owner-token' | ||
| }, null, 2)}\n`, 'utf8'); | ||
| const acquire = runRaw('node', [lockScript, 'acquire', lockPath, '1']); | ||
| assert.notEqual(acquire.status, 0); | ||
| assert.match(`${acquire.stdout || ''}${acquire.stderr || ''}`, /lock-held-by-live-owner|lock-held/); | ||
| }); | ||
| test('fix-lock acquires atomically under contention', async () => { | ||
| const sandbox = makeSandbox('fix-lock-race-'); | ||
| const lockScript = resolveSkillScript('fix-lock.cjs'); | ||
| const lockPath = path.join(sandbox, 'bug-hunter-fix.lock'); | ||
| const results = await Promise.all(Array.from({ length: 20 }, () => { | ||
| return new Promise((resolve) => { | ||
| const child = require('node:child_process').spawn('node', [lockScript, 'acquire', lockPath, '120'], { | ||
| stdio: ['ignore', 'pipe', 'pipe'] | ||
| }); | ||
| child.on('close', (code) => resolve(code)); | ||
| }); | ||
| })); | ||
| const successCount = results.filter((code) => code === 0).length; | ||
| assert.equal(successCount, 1); | ||
| }); | ||
| test('fix-lock recovers from a corrupted lock file', () => { | ||
| const sandbox = makeSandbox('fix-lock-corrupt-'); | ||
| const lockScript = resolveSkillScript('fix-lock.cjs'); | ||
| const lockPath = path.join(sandbox, 'bug-hunter-fix.lock'); | ||
| require('fs').writeFileSync(lockPath, '{broken json', 'utf8'); | ||
| const result = runJson('node', [lockScript, 'acquire', lockPath, '120']); | ||
| assert.equal(result.ok, true); | ||
| assert.equal(result.acquired, true); | ||
| }); |
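The fix-lock tests above specify the lock's behaviour. Exactly one of many concurrent acquirers wins. The owner gets a random token, and release requires that token. An expired lock is not taken from a live owner. A corrupted lock file is recovered. fix-lock.cjs is not part of this diff, so what follows is a hypothetical sketch of one way to meet those expectations. Every name in it is illustrative, and the reason strings are taken from the test regexes:

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');

// Signal 0 checks that a process exists without actually signalling it.
function isProcessAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (error) {
    return error.code === 'EPERM'; // exists, but owned by another user
  }
}

function readLock(lockPath) {
  try {
    return JSON.parse(fs.readFileSync(lockPath, 'utf8'));
  } catch {
    return null; // missing or corrupted lock file
  }
}

function acquireLock(lockPath, ttlSeconds) {
  const lock = {
    pid: process.pid,
    host: os.hostname(),
    createdAtMs: Date.now(),
    ownerToken: crypto.randomBytes(16).toString('hex')
  };
  // Write the full record to a private temp file, then hard-link it into place:
  // linkSync fails with EEXIST if the lock exists, so only one caller wins,
  // and the winner's lock is never visible half-written.
  const tempPath = `${lockPath}.${lock.ownerToken}.tmp`;
  fs.writeFileSync(tempPath, `${JSON.stringify(lock)}\n`, 'utf8');
  try {
    for (let attempt = 0; attempt < 2; attempt += 1) {
      try {
        fs.linkSync(tempPath, lockPath);
        return { ok: true, acquired: true, lock };
      } catch (error) {
        if (error.code !== 'EEXIST') throw error;
      }
      const existing = readLock(lockPath);
      const expired = !existing || Date.now() - existing.createdAtMs > ttlSeconds * 1000;
      if (existing && (!expired || isProcessAlive(existing.pid))) {
        return { ok: false, acquired: false, reason: expired ? 'lock-held-by-live-owner' : 'lock-held' };
      }
      // Stale or corrupted: clear it and retry once. Not race-free if two callers
      // judge the same lock stale at the same moment.
      fs.rmSync(lockPath, { force: true });
    }
    return { ok: false, acquired: false, reason: 'lock-held' };
  } finally {
    fs.rmSync(tempPath, { force: true }); // the hard link keeps the lock file alive
  }
}

function releaseLock(lockPath, ownerToken) {
  const existing = readLock(lockPath);
  if (!existing || existing.ownerToken !== ownerToken) {
    return { ok: false, released: false, reason: 'lock-owner-mismatch' };
  }
  fs.rmSync(lockPath, { force: true });
  return { ok: true, released: true };
}
```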
@@ -57,5 +57,10 @@ #!/usr/bin/env node | ||
| severity: 'Medium', | ||
| category: 'logic', | ||
| file: `src/retry-${chunkId}.ts`, | ||
| lines: '10-11', | ||
| claim: `retry-success-${chunkId}`, | ||
| evidence: `src/retry-${chunkId}.ts:10-11 retry success evidence`, | ||
| runtimeTrigger: `Retry attempt for ${chunkId}`, | ||
| crossReferences: ['Single file'], | ||
| confidenceScore: 88 | ||
| } | ||
@@ -62,0 +67,0 @@ ]; |
@@ -57,6 +57,12 @@ #!/usr/bin/env node | ||
| severity: 'Critical', | ||
| category: 'security', | ||
| file: scanFiles[0], | ||
| lines: '1', | ||
| claim: `Low-confidence risk in ${path.basename(scanFiles[0])}`, | ||
| evidence: `${scanFiles[0]}:1 fixture evidence`, | ||
| runtimeTrigger: `Load ${path.basename(scanFiles[0])} through the low-confidence worker`, | ||
| crossReferences: ['Single file'], | ||
| confidenceScore: Number.isInteger(confidence) ? confidence : 60, | ||
| stride: 'Tampering', | ||
| cwe: 'CWE-20' | ||
| } | ||
@@ -63,0 +69,0 @@ ]); |
@@ -37,7 +37,12 @@ #!/usr/bin/env node | ||
| severity: 'Low', | ||
| category: 'logic', | ||
| file: 'src/example.ts', | ||
| lines: '1', | ||
| claim: 'example', | ||
| evidence: 'src/example.ts:1 example evidence', | ||
| runtimeTrigger: 'Run the success worker fixture', | ||
| crossReferences: ['Single file'], | ||
| confidenceScore: 80 | ||
| } | ||
| ]; | ||
| fs.writeFileSync(findingsJson, `${JSON.stringify(payload, null, 2)}\n`, 'utf8'); |
@@ -16,2 +16,3 @@ const assert = require('node:assert/strict'); | ||
| const guardScript = resolveSkillScript('payload-guard.cjs'); | ||
| const schemaRuntime = require(resolveSkillScript('schema-runtime.cjs')); | ||
| const validPayloadPath = path.join(sandbox, 'valid.json'); | ||
@@ -25,3 +26,3 @@ const invalidPayloadPath = path.join(sandbox, 'invalid.json'); | ||
| techStack: { framework: 'express' }, | ||
| outputSchema: schemaRuntime.createSchemaRef('findings') | ||
| }); | ||
@@ -36,3 +37,3 @@ | ||
| targetFiles: [], | ||
| outputSchema: { artifact: 'findings', schemaVersion: 999, schemaFile: 'schemas/findings.schema.json' } | ||
| }); | ||
@@ -44,2 +45,153 @@ | ||
| assert.match(output, /Missing required field: riskMap/); | ||
| assert.match(output, /schema version 1/); | ||
| }); | ||
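The payload-guard test builds outputSchema with `schemaRuntime.createSchemaRef('findings')`. It expects a reference with a mismatched schemaVersion to be reported against schema version 1. Below is a hypothetical check of that shape. It assumes a reference carries `artifact`, `schemaVersion`, and `schemaFile`, as the invalid fixture does; the version registry is illustrative, not schema-runtime.cjs itself:

```javascript
// Hypothetical registry; the real values live in schema-runtime.cjs.
const SCHEMA_VERSIONS = { findings: 1 };

function createSchemaRefSketch(artifact) {
  return {
    artifact,
    schemaVersion: SCHEMA_VERSIONS[artifact],
    schemaFile: `schemas/${artifact}.schema.json`
  };
}

// Returns a list of problems; an empty list means the reference is usable.
function checkSchemaRef(ref) {
  if (!ref || typeof ref !== 'object') {
    return ['outputSchema must be a schema reference object'];
  }
  const expected = SCHEMA_VERSIONS[ref.artifact];
  if (expected === undefined) {
    return [`Unknown artifact: ${ref.artifact}`];
  }
  if (ref.schemaVersion !== expected) {
    return [`outputSchema for ${ref.artifact} must use schema version ${expected}, got ${ref.schemaVersion}`];
  }
  return [];
}

console.log(checkSchemaRef({ artifact: 'findings', schemaVersion: 999, schemaFile: 'schemas/findings.schema.json' }));
```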
| test('schema-validate validates example findings fixtures', () => { | ||
| const validatorScript = resolveSkillScript('schema-validate.cjs'); | ||
| const validPath = resolveSkillScript('..', 'schemas', 'examples', 'findings.valid.json'); | ||
| const invalidPath = resolveSkillScript('..', 'schemas', 'examples', 'findings.invalid.json'); | ||
| const valid = runJson('node', [validatorScript, 'findings', validPath]); | ||
| assert.equal(valid.ok, true); | ||
| const invalid = runRaw('node', [validatorScript, 'findings', invalidPath]); | ||
| assert.notEqual(invalid.status, 0); | ||
| assert.match(`${invalid.stdout}${invalid.stderr}`, /\$\[0\]\.claim is required/); | ||
| }); | ||
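The schema-validate tests expect each error to name a path-like location, as in `$[0].claim is required`. The minimal sketch below shows one way to produce such messages for required fields. The field list is an assumption, taken from the fields every findings fixture in this diff carries; schemas/findings.schema.json remains the authority:

```javascript
// Assumed required fields (taken from the fixtures, not from the schema file).
const REQUIRED_FINDING_FIELDS = [
  'bugId', 'severity', 'category', 'file', 'lines',
  'claim', 'evidence', 'runtimeTrigger', 'crossReferences', 'confidenceScore'
];

function validateFindingsSketch(value) {
  if (!Array.isArray(value)) {
    return { ok: false, errors: ['$ must be an array'] };
  }
  const errors = [];
  value.forEach((finding, index) => {
    for (const field of REQUIRED_FINDING_FIELDS) {
      // Non-object entries fail every required field.
      if (finding === null || typeof finding !== 'object' || !(field in finding)) {
        errors.push(`$[${index}].${field} is required`);
      }
    }
  });
  return { ok: errors.length === 0, errors };
}

// The malformed fixture from the bug-hunter-state test omits `claim`.
console.log(validateFindingsSketch([{
  bugId: 'BUG-1', severity: 'Low', category: 'logic', file: 'src/x.ts', lines: '1',
  evidence: 'src/x.ts:1 evidence', runtimeTrigger: 'Call x()',
  crossReferences: ['Single file'], confidenceScore: 40
}]).errors);
```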
| test('schema-validate accepts valid skeptic, referee, fix-report, fix-strategy, and fix-plan artifacts', () => { | ||
| const sandbox = makeSandbox('schema-validate-more-'); | ||
| const validatorScript = resolveSkillScript('schema-validate.cjs'); | ||
| const skepticPath = path.join(sandbox, 'skeptic.json'); | ||
| const refereePath = path.join(sandbox, 'referee.json'); | ||
| const fixReportPath = path.join(sandbox, 'fix-report.json'); | ||
| const fixStrategyPath = path.join(sandbox, 'fix-strategy.json'); | ||
| const fixPlanPath = path.join(sandbox, 'fix-plan.json'); | ||
| writeJson(skepticPath, [ | ||
| { | ||
| bugId: 'BUG-1', | ||
| response: 'ACCEPT', | ||
| analysisSummary: 'The finding holds after re-reading the code.' | ||
| } | ||
| ]); | ||
| writeJson(refereePath, [ | ||
| { | ||
| bugId: 'BUG-1', | ||
| verdict: 'REAL_BUG', | ||
| trueSeverity: 'Critical', | ||
| confidenceScore: 95, | ||
| confidenceLabel: 'high', | ||
| verificationMode: 'INDEPENDENTLY_VERIFIED', | ||
| analysisSummary: 'Confirmed by direct code trace.' | ||
| } | ||
| ]); | ||
| writeJson(fixReportPath, { | ||
| version: '3.0.0', | ||
| fix_branch: 'bug-hunter-fix-20260311-200000', | ||
| base_commit: 'abc123', | ||
| dry_run: false, | ||
| circuit_breaker_tripped: false, | ||
| phase2_timeout_hit: false, | ||
| fixes: [ | ||
| { | ||
| bugId: 'BUG-1', | ||
| severity: 'CRITICAL', | ||
| status: 'FIXED', | ||
| files: ['src/a.ts'], | ||
| lines: '10-12', | ||
| commit: 'def456', | ||
| description: 'Parameterized the query.' | ||
| } | ||
| ], | ||
| verification: { | ||
| baseline_pass: 10, | ||
| baseline_fail: 1, | ||
| flaky_tests: 0, | ||
| final_pass: 11, | ||
| final_fail: 0, | ||
| new_failures: 0, | ||
| resolved_failures: 1, | ||
| typecheck_pass: true, | ||
| build_pass: true, | ||
| fixer_bugs_found: 0 | ||
| }, | ||
| summary: { | ||
| total_confirmed: 1, | ||
| eligible: 1, | ||
| manual_review: 0, | ||
| fixed: 1, | ||
| fix_reverted: 0, | ||
| fix_failed: 0, | ||
| skipped: 0, | ||
| fixer_bug: 0, | ||
| partial: 0 | ||
| } | ||
| }); | ||
| writeJson(fixStrategyPath, { | ||
| version: '3.1.0', | ||
| generatedAt: '2026-03-12T00:00:00.000Z', | ||
| confidenceThreshold: 75, | ||
| summary: { | ||
| confirmed: 1, | ||
| safeAutofix: 1, | ||
| manualReview: 0, | ||
| largerRefactor: 0, | ||
| architecturalRemediation: 0, | ||
| canaryCandidates: 1, | ||
| rolloutCandidates: 0 | ||
| }, | ||
| clusters: [ | ||
| { | ||
| clusterId: 'cluster-1', | ||
| strategy: 'safe-autofix', | ||
| executionStage: 'canary', | ||
| autofixEligible: true, | ||
| bugIds: ['BUG-1'], | ||
| files: ['src/a.ts'], | ||
| maxSeverity: 'CRITICAL', | ||
| summary: '1 bug(s) in src classified as safe-autofix.', | ||
| recommendedAction: 'Proceed through the guarded fix pipeline with canary verification and rollback safety.', | ||
| reasons: ['Finding is localized enough for a guarded surgical fix.'] | ||
| } | ||
| ] | ||
| }); | ||
| writeJson(fixPlanPath, { | ||
| generatedAt: '2026-03-12T00:00:00.000Z', | ||
| confidenceThreshold: 75, | ||
| canarySize: 1, | ||
| totals: { | ||
| findings: 1, | ||
| eligible: 1, | ||
| canary: 1, | ||
| rollout: 0, | ||
| manualReview: 0 | ||
| }, | ||
| canary: [ | ||
| { | ||
| bugId: 'BUG-1', | ||
| severity: 'Critical', | ||
| category: 'logic', | ||
| file: 'src/a.ts', | ||
| lines: '10-12', | ||
| claim: 'x', | ||
| evidence: 'src/a.ts:10-12 evidence', | ||
| runtimeTrigger: 'Call x()', | ||
| crossReferences: ['Single file'], | ||
| confidenceScore: 95, | ||
| strategy: 'safe-autofix', | ||
| executionStage: 'canary', | ||
| autofixEligible: true, | ||
| reason: 'Finding is localized enough for a guarded surgical fix.' | ||
| } | ||
| ], | ||
| rollout: [], | ||
| manualReview: [] | ||
| }); | ||
| assert.equal(runJson('node', [validatorScript, 'skeptic', skepticPath]).ok, true); | ||
| assert.equal(runJson('node', [validatorScript, 'referee', refereePath]).ok, true); | ||
| assert.equal(runJson('node', [validatorScript, 'fix-report', fixReportPath]).ok, true); | ||
| assert.equal(runJson('node', [validatorScript, 'fix-strategy', fixStrategyPath]).ok, true); | ||
| assert.equal(runJson('node', [validatorScript, 'fix-plan', fixPlanPath]).ok, true); | ||
| }); |
@@ -11,2 +11,3 @@ const assert = require('node:assert/strict'); | ||
| runJson, | ||
| runRaw, | ||
| writeJson | ||
@@ -35,3 +36,5 @@ } = require('./test-utils.cjs'); | ||
| const scriptsDir = path.join(optionalSkillDir, 'scripts'); | ||
| const schemasDir = path.join(optionalSkillDir, 'schemas'); | ||
| fs.mkdirSync(scriptsDir, { recursive: true }); | ||
| fs.mkdirSync(schemasDir, { recursive: true }); | ||
@@ -42,9 +45,29 @@ for (const fileName of [ | ||
| 'payload-guard.cjs', | ||
| 'schema-validate.cjs', | ||
| 'schema-runtime.cjs', | ||
| 'render-report.cjs', | ||
| 'fix-lock.cjs', | ||
| 'doc-lookup.cjs', | ||
| 'context7-api.cjs', | ||
| 'delta-mode.cjs', | ||
| 'pr-scope.cjs' | ||
| ]) { | ||
| fs.copyFileSync(resolveSkillScript(fileName), path.join(scriptsDir, fileName)); | ||
| } | ||
| for (const fileName of [ | ||
| 'findings.schema.json', | ||
| 'skeptic.schema.json', | ||
| 'referee.schema.json', | ||
| 'coverage.schema.json', | ||
| 'fix-report.schema.json', | ||
| 'fix-plan.schema.json', | ||
| 'fix-strategy.schema.json', | ||
| 'recon.schema.json', | ||
| 'shared.schema.json' | ||
| ]) { | ||
| fs.copyFileSync( | ||
| resolveSkillScript('..', 'schemas', fileName), | ||
| path.join(schemasDir, fileName) | ||
| ); | ||
| } | ||
@@ -157,3 +180,7 @@ const result = runJson('node', [ | ||
| const fixPlanPath = path.join(sandbox, '.claude', 'bug-hunter-fix-plan.json'); | ||
| const strategyPath = path.join(sandbox, '.claude', 'bug-hunter-fix-strategy.json'); | ||
| const strategyMarkdownPath = path.join(sandbox, '.claude', 'bug-hunter-fix-strategy.md'); | ||
| const factsPath = path.join(sandbox, '.claude', 'bug-hunter-facts.json'); | ||
| const coveragePath = path.join(sandbox, '.claude', 'coverage.json'); | ||
| const coverageMarkdownPath = path.join(sandbox, '.claude', 'coverage.md'); | ||
@@ -219,2 +246,6 @@ const changedFile = path.join(sandbox, 'src', 'feature', 'changed.ts'); | ||
| fixPlanPath, | ||
| '--strategy-path', | ||
| strategyPath, | ||
| '--strategy-markdown-path', | ||
| strategyMarkdownPath, | ||
| '--facts-path', | ||
@@ -243,3 +274,7 @@ factsPath, | ||
| assert.equal(fs.existsSync(fixPlanPath), true); | ||
| assert.equal(fs.existsSync(strategyPath), true); | ||
| assert.equal(fs.existsSync(strategyMarkdownPath), true); | ||
| assert.equal(fs.existsSync(factsPath), true); | ||
| assert.equal(fs.existsSync(coveragePath), true); | ||
| assert.equal(fs.existsSync(coverageMarkdownPath), true); | ||
@@ -252,2 +287,3 @@ const seenFiles = readJson(seenFilesPath); | ||
| assert.equal(state.metrics.lowConfidenceFindings >= 1, true); | ||
| assert.equal(state.bugLedger.every((entry) => typeof entry.confidenceScore === 'number'), true); | ||
@@ -259,2 +295,14 @@ const consistency = readJson(consistencyReportPath); | ||
| assert.equal(fixPlan.totals.manualReview >= 1, true); | ||
| const fixStrategy = readJson(strategyPath); | ||
| assert.equal(fixStrategy.summary.manualReview >= 1, true); | ||
| assert.equal(Array.isArray(fixStrategy.clusters), true); | ||
| const strategyMarkdown = fs.readFileSync(strategyMarkdownPath, 'utf8'); | ||
| assert.match(strategyMarkdown, /# Fix Strategy/); | ||
| const coverage = readJson(coveragePath); | ||
| assert.equal(coverage.status, 'COMPLETE'); | ||
| assert.equal(Array.isArray(coverage.files), true); | ||
| assert.equal(Array.isArray(coverage.bugs), true); | ||
| }); | ||
@@ -325,2 +373,122 @@ | ||
| test('run-bug-hunter excludes non-autofix strategy findings from the executable fix plan', () => { | ||
| const sandbox = makeSandbox('run-bug-hunter-strategy-gate-'); | ||
| const runner = resolveSkillScript('run-bug-hunter.cjs'); | ||
| const skillDir = path.resolve(__dirname, '..', '..'); | ||
| const filesJsonPath = path.join(sandbox, 'files.json'); | ||
| const statePath = path.join(sandbox, '.claude', 'bug-hunter-state.json'); | ||
| const fixPlanPath = path.join(sandbox, '.claude', 'bug-hunter-fix-plan.json'); | ||
| const workerPath = path.join(sandbox, 'worker.cjs'); | ||
| const fileA = path.join(sandbox, 'src', 'architecture.ts'); | ||
| fs.mkdirSync(path.dirname(fileA), { recursive: true }); | ||
| fs.writeFileSync(fileA, 'export const architecture = true;\n', 'utf8'); | ||
| writeJson(filesJsonPath, [fileA]); | ||
| fs.writeFileSync(workerPath, [ | ||
| '#!/usr/bin/env node', | ||
| "const fs = require('fs');", | ||
| "const findingsPath = process.argv[process.argv.indexOf('--findings-json') + 1];", | ||
| "const scanPath = process.argv[process.argv.indexOf('--scan-files-json') + 1];", | ||
| "const scanFiles = JSON.parse(fs.readFileSync(scanPath, 'utf8'));", | ||
| "fs.writeFileSync(findingsPath, JSON.stringify([{ bugId: 'BUG-ARCH', severity: 'Critical', category: 'logic', file: scanFiles[0], lines: '1', claim: 'architecture contract violation in orchestration flow', evidence: scanFiles[0] + ':1 architecture evidence', runtimeTrigger: 'Run the orchestrator on this file', crossReferences: ['Single file'], confidenceScore: 98, confidenceLabel: 'high', stride: 'N/A', cwe: 'N/A' }], null, 2));" | ||
| ].join('\n'), 'utf8'); | ||
| runJson('node', [ | ||
| runner, | ||
| 'run', | ||
| '--skill-dir', | ||
| skillDir, | ||
| '--files-json', | ||
| filesJsonPath, | ||
| '--state', | ||
| statePath, | ||
| '--mode', | ||
| 'extended', | ||
| '--chunk-size', | ||
| '1', | ||
| '--worker-cmd', | ||
| `node ${workerPath} --chunk-id {chunkId} --scan-files-json {scanFilesJson} --findings-json {findingsJson}`, | ||
| '--timeout-ms', | ||
| '5000', | ||
| '--confidence-threshold', | ||
| '75', | ||
| '--fix-plan-path', | ||
| fixPlanPath, | ||
| '--canary-size', | ||
| '1' | ||
| ], { | ||
| cwd: sandbox | ||
| }); | ||
| const fixPlan = readJson(fixPlanPath); | ||
| assert.equal(fixPlan.totals.eligible, 0); | ||
| assert.equal(fixPlan.totals.canary, 0); | ||
| assert.equal(fixPlan.totals.rollout, 0); | ||
| }); | ||
| test('run-bug-hunter downgrades conflicting findings to manual review before fix-plan execution', () => { | ||
| const sandbox = makeSandbox('run-bug-hunter-conflicts-'); | ||
| const runner = resolveSkillScript('run-bug-hunter.cjs'); | ||
| const skillDir = path.resolve(__dirname, '..', '..'); | ||
| const filesJsonPath = path.join(sandbox, 'files.json'); | ||
| const statePath = path.join(sandbox, '.claude', 'bug-hunter-state.json'); | ||
| const fixPlanPath = path.join(sandbox, '.claude', 'bug-hunter-fix-plan.json'); | ||
| const consistencyPath = path.join(sandbox, '.claude', 'consistency.json'); | ||
| const workerPath = path.join(sandbox, 'worker.cjs'); | ||
| const fileA = path.join(sandbox, 'src', 'conflict.ts'); | ||
| fs.mkdirSync(path.dirname(fileA), { recursive: true }); | ||
| fs.writeFileSync(fileA, 'export const conflict = true;\n', 'utf8'); | ||
| writeJson(filesJsonPath, [fileA]); | ||
| fs.writeFileSync(workerPath, [ | ||
| '#!/usr/bin/env node', | ||
| "const fs = require('fs');", | ||
| "const findingsPath = process.argv[process.argv.indexOf('--findings-json') + 1];", | ||
| "const scanPath = process.argv[process.argv.indexOf('--scan-files-json') + 1];", | ||
| "const scanFiles = JSON.parse(fs.readFileSync(scanPath, 'utf8'));", | ||
| "fs.writeFileSync(findingsPath, JSON.stringify([", | ||
| " { bugId: 'BUG-1', severity: 'Critical', category: 'logic', file: scanFiles[0], lines: '1', claim: 'first conflicting claim', evidence: scanFiles[0] + ':1 first', runtimeTrigger: 'Trigger first', crossReferences: ['Single file'], confidenceScore: 97, confidenceLabel: 'high', stride: 'N/A', cwe: 'N/A' },", | ||
| " { bugId: 'BUG-2', severity: 'Critical', category: 'logic', file: scanFiles[0], lines: '1', claim: 'second conflicting claim', evidence: scanFiles[0] + ':1 second', runtimeTrigger: 'Trigger second', crossReferences: ['Single file'], confidenceScore: 96, confidenceLabel: 'high', stride: 'N/A', cwe: 'N/A' }", | ||
| "], null, 2));" | ||
| ].join('\n'), 'utf8'); | ||
| runJson('node', [ | ||
| runner, | ||
| 'run', | ||
| '--skill-dir', | ||
| skillDir, | ||
| '--files-json', | ||
| filesJsonPath, | ||
| '--state', | ||
| statePath, | ||
| '--mode', | ||
| 'extended', | ||
| '--chunk-size', | ||
| '1', | ||
| '--worker-cmd', | ||
| `node ${workerPath} --chunk-id {chunkId} --scan-files-json {scanFilesJson} --findings-json {findingsJson}`, | ||
| '--timeout-ms', | ||
| '5000', | ||
| '--confidence-threshold', | ||
| '75', | ||
| '--fix-plan-path', | ||
| fixPlanPath, | ||
| '--consistency-report', | ||
| consistencyPath, | ||
| '--canary-size', | ||
| '1' | ||
| ], { | ||
| cwd: sandbox | ||
| }); | ||
| const consistency = readJson(consistencyPath); | ||
| assert.equal(consistency.conflicts.length >= 1, true); | ||
| const fixPlan = readJson(fixPlanPath); | ||
| assert.equal(fixPlan.totals.eligible, 0); | ||
| assert.equal(fixPlan.totals.manualReview, 2); | ||
| }); | ||
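The two tests above expect the runner to keep a finding out of the executable fix plan when its strategy is not auto-fixable, or when another finding claims the same location. Below is a minimal sketch of that gating. It rests on three assumptions: two findings "conflict" when they share `file` and `lines`, only `strategy === 'safe-autofix'` counts as auto-fixable, and the confidence threshold matches the `--confidence-threshold 75` flag used in the tests. `buildFixPlanTotals` and the per-finding `strategy` field are hypothetical names, not the runner's actual API.

```javascript
// Hypothetical sketch: decide which findings may enter the executable fix plan.
// Assumes each finding carries `file`, `lines`, `confidenceScore`, and a
// `strategy` label; only 'safe-autofix' findings are treated as auto-fixable.
function buildFixPlanTotals(findings, { confidenceThreshold = 75, canarySize = 1 } = {}) {
  // Count how many findings claim each file:lines location.
  const locationCounts = new Map();
  for (const f of findings) {
    const key = `${f.file}:${f.lines}`;
    locationCounts.set(key, (locationCounts.get(key) || 0) + 1);
  }

  const eligible = [];
  const manualReview = [];
  for (const f of findings) {
    const conflicting = locationCounts.get(`${f.file}:${f.lines}`) > 1;
    const confident = f.confidenceScore >= confidenceThreshold;
    if (!conflicting && confident && f.strategy === 'safe-autofix') {
      eligible.push(f);
    } else {
      // Conflicts, low confidence, and non-autofix strategies all go to a human.
      manualReview.push(f);
    }
  }

  // The first `canarySize` eligible findings are fixed first; the rest roll out after.
  return {
    eligible: eligible.length,
    canary: eligible.slice(0, canarySize).length,
    rollout: eligible.slice(canarySize).length,
    manualReview: manualReview.length
  };
}
```

With the two same-line findings from the conflict test, both land in manual review (`eligible: 0`, `manualReview: 2`). Routing every doubtful case to manual review keeps the plan conservative: the executable set can only shrink when the inputs disagree.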
| test('run-bug-hunter respects configured delta hops during low-confidence expansion', () => { | ||
@@ -359,3 +527,3 @@ const sandbox = makeSandbox('run-bug-hunter-delta-hops-'); | ||
| "fs.writeFileSync(seenPath, JSON.stringify(seen));", | ||
| "const findings = scan[0] === changedPath ? [{ file: scan[0], lines: '1', claim: 'low confidence', severity: 'Low', confidence: 60 }] : [];", | ||
| "const findings = scan[0] === changedPath ? [{ bugId: 'BUG-inline', severity: 'Low', category: 'logic', file: scan[0], lines: '1', claim: 'low confidence', evidence: scan[0] + ':1 inline evidence', runtimeTrigger: 'Load the changed file', crossReferences: ['Single file'], confidenceScore: 60 }] : [];", | ||
| "fs.writeFileSync(findingsPath, JSON.stringify(findings));" | ||
@@ -414,1 +582,517 @@ ].join('\n'), 'utf8'); | ||
| }); | ||
| test('run-bug-hunter retries malformed findings and records schema errors in the journal', () => { | ||
| const sandbox = makeSandbox('run-bug-hunter-invalid-findings-'); | ||
| const runner = resolveSkillScript('run-bug-hunter.cjs'); | ||
| const skillDir = path.resolve(__dirname, '..', '..'); | ||
| const filesJsonPath = path.join(sandbox, 'files.json'); | ||
| const statePath = path.join(sandbox, '.claude', 'bug-hunter-state.json'); | ||
| const journalPath = path.join(sandbox, '.claude', 'bug-hunter-run.log'); | ||
| const attemptsFile = path.join(sandbox, 'attempts.json'); | ||
| const sourceFile = path.join(sandbox, 'src', 'a.ts'); | ||
| fs.mkdirSync(path.dirname(sourceFile), { recursive: true }); | ||
| fs.writeFileSync(sourceFile, 'export const a = 1;\n', 'utf8'); | ||
| writeJson(filesJsonPath, [sourceFile]); | ||
| const workerPath = path.join(sandbox, 'invalid-then-valid-worker.cjs'); | ||
| fs.writeFileSync(workerPath, [ | ||
| '#!/usr/bin/env node', | ||
| "const fs = require('fs');", | ||
| "const path = require('path');", | ||
| "const args = process.argv;", | ||
| "const chunkId = args[args.indexOf('--chunk-id') + 1];", | ||
| "const findingsPath = args[args.indexOf('--findings-json') + 1];", | ||
| "const attemptsPath = args[args.indexOf('--attempts-file') + 1];", | ||
| 'let attempts = {};', | ||
| "if (fs.existsSync(attemptsPath)) attempts = JSON.parse(fs.readFileSync(attemptsPath, 'utf8'));", | ||
| "attempts[chunkId] = (attempts[chunkId] || 0) + 1;", | ||
| "fs.mkdirSync(path.dirname(attemptsPath), { recursive: true });", | ||
| "fs.writeFileSync(attemptsPath, JSON.stringify(attempts, null, 2));", | ||
| "const payload = attempts[chunkId] === 1", | ||
| " ? [{ bugId: 'BUG-1', severity: 'Low', category: 'logic', file: 'src/a.ts', lines: '1', evidence: 'src/a.ts:1 evidence', runtimeTrigger: 'Call a()', crossReferences: ['Single file'], confidenceScore: 60 }]", | ||
| " : [{ bugId: 'BUG-1', severity: 'Low', category: 'logic', file: 'src/a.ts', lines: '1', claim: 'valid after retry', evidence: 'src/a.ts:1 evidence', runtimeTrigger: 'Call a()', crossReferences: ['Single file'], confidenceScore: 60 }];", | ||
| "fs.writeFileSync(findingsPath, JSON.stringify(payload, null, 2));" | ||
| ].join('\n'), 'utf8'); | ||
| const workerTemplate = [ | ||
| 'node', | ||
| workerPath, | ||
| '--chunk-id', | ||
| '{chunkId}', | ||
| '--findings-json', | ||
| '{findingsJson}', | ||
| '--attempts-file', | ||
| attemptsFile | ||
| ].join(' '); | ||
| const result = runJson('node', [ | ||
| runner, | ||
| 'run', | ||
| '--skill-dir', | ||
| skillDir, | ||
| '--files-json', | ||
| filesJsonPath, | ||
| '--state', | ||
| statePath, | ||
| '--chunk-size', | ||
| '1', | ||
| '--worker-cmd', | ||
| workerTemplate, | ||
| '--timeout-ms', | ||
| '5000', | ||
| '--max-retries', | ||
| '1', | ||
| '--backoff-ms', | ||
| '10', | ||
| '--journal-path', | ||
| journalPath | ||
| ], { | ||
| cwd: sandbox | ||
| }); | ||
| assert.equal(result.ok, true); | ||
| const attempts = readJson(attemptsFile); | ||
| assert.equal(attempts['chunk-1'], 2); | ||
| const journal = fs.readFileSync(journalPath, 'utf8'); | ||
| assert.match(journal, /attempt-post-check-failed/); | ||
| assert.match(journal, /\$\[0\]\.claim is required/); | ||
| }); | ||
| test('run-bug-hunter clears stale findings artifacts before retrying a chunk', () => { | ||
| const sandbox = makeSandbox('run-bug-hunter-stale-artifact-'); | ||
| const runner = resolveSkillScript('run-bug-hunter.cjs'); | ||
| const skillDir = path.resolve(__dirname, '..', '..'); | ||
| const filesJsonPath = path.join(sandbox, 'files.json'); | ||
| const statePath = path.join(sandbox, '.claude', 'bug-hunter-state.json'); | ||
| const journalPath = path.join(sandbox, '.claude', 'bug-hunter-run.log'); | ||
| const attemptsFile = path.join(sandbox, 'attempts.json'); | ||
| const sourceFile = path.join(sandbox, 'src', 'a.ts'); | ||
| fs.mkdirSync(path.dirname(sourceFile), { recursive: true }); | ||
| fs.writeFileSync(sourceFile, 'export const a = 1;\n', 'utf8'); | ||
| writeJson(filesJsonPath, [sourceFile]); | ||
| const workerPath = path.join(sandbox, 'stale-artifact-worker.cjs'); | ||
| fs.writeFileSync(workerPath, [ | ||
| '#!/usr/bin/env node', | ||
| "const fs = require('fs');", | ||
| "const path = require('path');", | ||
| "const args = process.argv;", | ||
| "const chunkId = args[args.indexOf('--chunk-id') + 1];", | ||
| "const findingsPath = args[args.indexOf('--findings-json') + 1];", | ||
| "const attemptsPath = args[args.indexOf('--attempts-file') + 1];", | ||
| 'let attempts = {};', | ||
| "if (fs.existsSync(attemptsPath)) attempts = JSON.parse(fs.readFileSync(attemptsPath, 'utf8'));", | ||
| "attempts[chunkId] = (attempts[chunkId] || 0) + 1;", | ||
| "fs.mkdirSync(path.dirname(attemptsPath), { recursive: true });", | ||
| "fs.writeFileSync(attemptsPath, JSON.stringify(attempts, null, 2));", | ||
| 'if (attempts[chunkId] === 1) {', | ||
| " fs.writeFileSync(findingsPath, JSON.stringify([{ bugId: 'BUG-stale', severity: 'Low', category: 'logic', file: 'src/a.ts', lines: '1', claim: 'stale artifact', evidence: 'src/a.ts:1 evidence', runtimeTrigger: 'Call a()', crossReferences: ['Single file'], confidenceScore: 60 }], null, 2));", | ||
| ' process.exit(1);', | ||
| '}', | ||
| 'process.exit(0);' | ||
| ].join('\n'), 'utf8'); | ||
| const workerTemplate = [ | ||
| 'node', | ||
| workerPath, | ||
| '--chunk-id', | ||
| '{chunkId}', | ||
| '--findings-json', | ||
| '{findingsJson}', | ||
| '--attempts-file', | ||
| attemptsFile | ||
| ].join(' '); | ||
| const result = runJson('node', [ | ||
| runner, | ||
| 'run', | ||
| '--skill-dir', | ||
| skillDir, | ||
| '--files-json', | ||
| filesJsonPath, | ||
| '--state', | ||
| statePath, | ||
| '--chunk-size', | ||
| '1', | ||
| '--worker-cmd', | ||
| workerTemplate, | ||
| '--timeout-ms', | ||
| '5000', | ||
| '--max-retries', | ||
| '1', | ||
| '--backoff-ms', | ||
| '10', | ||
| '--journal-path', | ||
| journalPath | ||
| ], { | ||
| cwd: sandbox | ||
| }); | ||
| assert.equal(result.ok, true); | ||
| const attempts = readJson(attemptsFile); | ||
| assert.equal(attempts['chunk-1'], 2); | ||
| const state = readJson(statePath); | ||
| assert.equal(state.chunks[0].status, 'failed'); | ||
| }); | ||
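The two retry tests above pin down three behaviors. A chunk is retried when its output fails validation. The failure is journaled as `attempt-post-check-failed`. Any artifact left by an earlier attempt is cleared first, so a crashed worker cannot pass on stale findings. Below is a minimal synchronous sketch of that loop. `runWithPostCheck`, its callback names, and the `attempt-failed` event are assumptions for illustration; only `attempt-post-check-failed` comes from the tests.

```javascript
// Hypothetical sketch of a retry loop with a post-check. `runAttempt` returns
// an exit code, `validate` returns a list of error strings, and `clearOutput`
// deletes whatever artifact a previous attempt left behind.
function runWithPostCheck({ runAttempt, validate, clearOutput, maxRetries = 1, journal = [] }) {
  for (let attempt = 1; attempt <= maxRetries + 1; attempt += 1) {
    // Clear first so a failed attempt's artifact is never validated later.
    clearOutput();
    const exitCode = runAttempt(attempt);
    if (exitCode !== 0) {
      journal.push({ event: 'attempt-failed', attempt, exitCode });
      continue;
    }
    const errors = validate();
    if (errors.length > 0) {
      journal.push({ event: 'attempt-post-check-failed', attempt, errors });
      continue;
    }
    return { status: 'done', attempts: attempt, journal };
  }
  // Every attempt either crashed or produced output that failed validation.
  return { status: 'failed', attempts: maxRetries + 1, journal };
}
```

In the stale-artifact scenario, the first attempt writes findings and exits 1, and the second exits 0 without writing anything. Because the output was cleared before the second attempt, validation sees no artifact and the chunk ends `failed` after two attempts, matching `state.chunks[0].status === 'failed'`.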
| test('run-bug-hunter handles worker paths containing spaces', () => { | ||
| const sandbox = makeSandbox('run-bug-hunter-space-path-'); | ||
| const runner = resolveSkillScript('run-bug-hunter.cjs'); | ||
| const skillDir = path.resolve(__dirname, '..', '..'); | ||
| const filesJsonPath = path.join(sandbox, 'files.json'); | ||
| const statePath = path.join(sandbox, '.claude', 'bug-hunter-state.json'); | ||
| const workerPath = path.join(sandbox, 'worker script.cjs'); | ||
| const sourceFile = path.join(sandbox, 'src', 'dir with space', 'a.ts'); | ||
| fs.mkdirSync(path.dirname(sourceFile), { recursive: true }); | ||
| fs.writeFileSync(sourceFile, 'export const a = 1;\n', 'utf8'); | ||
| writeJson(filesJsonPath, [sourceFile]); | ||
| fs.writeFileSync(workerPath, [ | ||
| '#!/usr/bin/env node', | ||
| "const fs = require('fs');", | ||
| "const args = process.argv;", | ||
| "const findingsPath = args[args.indexOf('--findings-json') + 1];", | ||
| "const scanFilesJson = args[args.indexOf('--scan-files-json') + 1];", | ||
| "const scanFiles = JSON.parse(fs.readFileSync(scanFilesJson, 'utf8'));", | ||
| "fs.writeFileSync(findingsPath, JSON.stringify([{ bugId: 'BUG-space', severity: 'Low', category: 'logic', file: scanFiles[0], lines: '1', claim: 'space path works', evidence: scanFiles[0] + ':1 evidence', runtimeTrigger: 'Call a()', crossReferences: ['Single file'], confidenceScore: 60 }], null, 2));" | ||
| ].join('\n'), 'utf8'); | ||
| const workerTemplate = [ | ||
| 'node', | ||
| workerPath, | ||
| '--chunk-id', | ||
| '{chunkId}', | ||
| '--scan-files-json', | ||
| '{scanFilesJson}', | ||
| '--findings-json', | ||
| '{findingsJson}' | ||
| ].join(' '); | ||
| const result = runJson('node', [ | ||
| runner, | ||
| 'run', | ||
| '--skill-dir', | ||
| skillDir, | ||
| '--files-json', | ||
| filesJsonPath, | ||
| '--state', | ||
| statePath, | ||
| '--chunk-size', | ||
| '1', | ||
| '--worker-cmd', | ||
| workerTemplate, | ||
| '--timeout-ms', | ||
| '5000' | ||
| ], { | ||
| cwd: sandbox | ||
| }); | ||
| assert.equal(result.ok, true); | ||
| }); | ||
| test('run-bug-hunter skips fix strategy and fix plan emission when chunks fail', () => { | ||
| const sandbox = makeSandbox('run-bug-hunter-failed-chunks-'); | ||
| const runner = resolveSkillScript('run-bug-hunter.cjs'); | ||
| const skillDir = path.resolve(__dirname, '..', '..'); | ||
| const filesJsonPath = path.join(sandbox, 'files.json'); | ||
| const statePath = path.join(sandbox, '.claude', 'bug-hunter-state.json'); | ||
| const fixPlanPath = path.join(sandbox, '.claude', 'bug-hunter-fix-plan.json'); | ||
| const strategyPath = path.join(sandbox, '.claude', 'bug-hunter-fix-strategy.json'); | ||
| const workerPath = path.join(sandbox, 'always-fail-worker.cjs'); | ||
| const sourceFile = path.join(sandbox, 'src', 'a.ts'); | ||
| fs.mkdirSync(path.dirname(sourceFile), { recursive: true }); | ||
| fs.writeFileSync(sourceFile, 'export const a = 1;\n', 'utf8'); | ||
| writeJson(filesJsonPath, [sourceFile]); | ||
| fs.writeFileSync(workerPath, '#!/usr/bin/env node\nprocess.exit(1);\n', 'utf8'); | ||
| const result = runJson('node', [ | ||
| runner, | ||
| 'run', | ||
| '--skill-dir', | ||
| skillDir, | ||
| '--files-json', | ||
| filesJsonPath, | ||
| '--state', | ||
| statePath, | ||
| '--chunk-size', | ||
| '1', | ||
| '--worker-cmd', | ||
| `node ${workerPath}`, | ||
| '--timeout-ms', | ||
| '5000', | ||
| '--max-retries', | ||
| '1', | ||
| '--fix-plan-path', | ||
| fixPlanPath, | ||
| '--strategy-path', | ||
| strategyPath | ||
| ], { | ||
| cwd: sandbox | ||
| }); | ||
| assert.equal(result.ok, true); | ||
| const state = readJson(statePath); | ||
| assert.equal(state.chunks[0].status, 'failed'); | ||
| assert.equal(fs.existsSync(fixPlanPath), false); | ||
| assert.equal(fs.existsSync(strategyPath), false); | ||
| }); | ||
| test('run-bug-hunter fails fast on unknown placeholders in worker templates', () => { | ||
| const sandbox = makeSandbox('run-bug-hunter-bad-template-'); | ||
| const runner = resolveSkillScript('run-bug-hunter.cjs'); | ||
| const skillDir = path.resolve(__dirname, '..', '..'); | ||
| const outputPath = path.join(sandbox, '.bug-hunter', 'skeptic.json'); | ||
| const result = runRaw('node', [ | ||
| runner, | ||
| 'phase', | ||
| '--skill-dir', | ||
| skillDir, | ||
| '--phase-name', | ||
| 'skeptic-phase', | ||
| '--artifact', | ||
| 'skeptic', | ||
| '--output-path', | ||
| outputPath, | ||
| '--worker-cmd', | ||
| 'node fake-worker --output-path {outputPath} --missing {unknownPlaceholder}', | ||
| '--timeout-ms', | ||
| '5000' | ||
| ], { | ||
| cwd: sandbox, | ||
| encoding: 'utf8' | ||
| }); | ||
| assert.notEqual(result.status, 0); | ||
| assert.match(`${result.stdout || ''}${result.stderr || ''}`, /Unknown template placeholder|unknownPlaceholder/); | ||
| }); | ||
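The test above expects a worker template containing an unrecognized placeholder to be rejected before anything runs. A minimal sketch of fail-fast substitution follows, assuming placeholders are `{identifier}` tokens and the caller supplies the known values (the tests use `{chunkId}`, `{scanFilesJson}`, `{findingsJson}`, `{outputPath}`, and `{renderOutputPath}`). `renderWorkerTemplate` is a hypothetical name, and the error text only mirrors the regex the test matches. The sketch inserts values verbatim and does not address the shell-quoting concern exercised by the paths-with-spaces test.

```javascript
// Hypothetical sketch: substitute {name} placeholders and throw on any name
// the caller did not supply, instead of passing a literal "{name}" to a worker.
function renderWorkerTemplate(template, values) {
  return template.replace(/\{([A-Za-z][A-Za-z0-9]*)\}/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      throw new Error(`Unknown template placeholder: {${name}}`);
    }
    return String(values[name]);
  });
}
```

Failing at render time means a typo in `--worker-cmd` surfaces as one clear error. Otherwise it would appear as a worker that silently received a literal `{unknownPlaceholder}` argument and then burned through its retries.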
| test('run-bug-hunter phase retries invalid skeptic output and renders a markdown companion', () => { | ||
| const sandbox = makeSandbox('run-bug-hunter-phase-skeptic-'); | ||
| const runner = resolveSkillScript('run-bug-hunter.cjs'); | ||
| const skillDir = path.resolve(__dirname, '..', '..'); | ||
| const outputPath = path.join(sandbox, '.bug-hunter', 'skeptic.json'); | ||
| const renderOutputPath = path.join(sandbox, '.bug-hunter', 'skeptic.md'); | ||
| const journalPath = path.join(sandbox, '.bug-hunter', 'phase.log'); | ||
| const attemptsFile = path.join(sandbox, 'attempts.json'); | ||
| const workerPath = path.join(sandbox, 'skeptic-worker.cjs'); | ||
| fs.writeFileSync(workerPath, [ | ||
| '#!/usr/bin/env node', | ||
| "const fs = require('fs');", | ||
| "const path = require('path');", | ||
| "const args = process.argv;", | ||
| "const outputPath = args[args.indexOf('--output-path') + 1];", | ||
| "const attemptsPath = args[args.indexOf('--attempts-file') + 1];", | ||
| 'let attempts = {};', | ||
| "if (fs.existsSync(attemptsPath)) attempts = JSON.parse(fs.readFileSync(attemptsPath, 'utf8'));", | ||
| "attempts.skeptic = (attempts.skeptic || 0) + 1;", | ||
| "fs.mkdirSync(path.dirname(attemptsPath), { recursive: true });", | ||
| "fs.writeFileSync(attemptsPath, JSON.stringify(attempts, null, 2));", | ||
| "const payload = attempts.skeptic === 1", | ||
| " ? [{ bugId: 'BUG-1', response: 'ACCEPT' }]", | ||
| " : [{ bugId: 'BUG-1', response: 'ACCEPT', analysisSummary: 'Validated on retry.' }];", | ||
| "fs.mkdirSync(path.dirname(outputPath), { recursive: true });", | ||
| "fs.writeFileSync(outputPath, JSON.stringify(payload, null, 2));" | ||
| ].join('\n'), 'utf8'); | ||
| const workerTemplate = [ | ||
| 'node', | ||
| workerPath, | ||
| '--output-path', | ||
| '{outputPath}', | ||
| '--attempts-file', | ||
| attemptsFile | ||
| ].join(' '); | ||
| const renderTemplate = [ | ||
| 'node', | ||
| path.join(skillDir, 'scripts', 'render-report.cjs'), | ||
| 'skeptic', | ||
| '{outputPath}', | ||
| '>', | ||
| '{renderOutputPath}' | ||
| ].join(' '); | ||
| const result = runJson('node', [ | ||
| runner, | ||
| 'phase', | ||
| '--skill-dir', | ||
| skillDir, | ||
| '--phase-name', | ||
| 'skeptic-phase', | ||
| '--artifact', | ||
| 'skeptic', | ||
| '--output-path', | ||
| outputPath, | ||
| '--render-output-path', | ||
| renderOutputPath, | ||
| '--worker-cmd', | ||
| workerTemplate, | ||
| '--render-cmd', | ||
| renderTemplate, | ||
| '--timeout-ms', | ||
| '5000', | ||
| '--max-retries', | ||
| '1', | ||
| '--backoff-ms', | ||
| '10', | ||
| '--journal-path', | ||
| journalPath | ||
| ], { | ||
| cwd: sandbox | ||
| }); | ||
| assert.equal(result.ok, true); | ||
| assert.equal(result.artifact, 'skeptic'); | ||
| assert.equal(fs.existsSync(outputPath), true); | ||
| assert.equal(fs.existsSync(renderOutputPath), true); | ||
| const attempts = readJson(attemptsFile); | ||
| assert.equal(attempts.skeptic, 2); | ||
| const journal = fs.readFileSync(journalPath, 'utf8'); | ||
| assert.match(journal, /attempt-post-check-failed/); | ||
| assert.match(journal, /\$\[0\]\.analysisSummary is required/); | ||
| const rendered = fs.readFileSync(renderOutputPath, 'utf8'); | ||
| assert.match(rendered, /# Skeptic Review/); | ||
| assert.match(rendered, /Validated on retry/); | ||
| }); | ||
| test('run-bug-hunter phase validates referee and fix-report artifacts', () => { | ||
| const sandbox = makeSandbox('run-bug-hunter-phase-multi-'); | ||
| const runner = resolveSkillScript('run-bug-hunter.cjs'); | ||
| const skillDir = path.resolve(__dirname, '..', '..'); | ||
| const phases = [ | ||
| { | ||
| artifact: 'referee', | ||
| invalidBody: "[{\"bugId\":\"BUG-1\",\"verdict\":\"REAL_BUG\"}]", | ||
| validBody: JSON.stringify([ | ||
| { | ||
| bugId: 'BUG-1', | ||
| verdict: 'REAL_BUG', | ||
| trueSeverity: 'Critical', | ||
| confidenceScore: 99, | ||
| confidenceLabel: 'high', | ||
| verificationMode: 'INDEPENDENTLY_VERIFIED', | ||
| analysisSummary: 'Confirmed on retry.' | ||
| } | ||
| ], null, 2), | ||
| expectedError: '\\$\\[0\\]\\.trueSeverity is required' | ||
| }, | ||
| { | ||
| artifact: 'fix-report', | ||
| invalidBody: JSON.stringify({ | ||
| version: '3.0.4', | ||
| fix_branch: 'bug-hunter-fix-branch' | ||
| }, null, 2), | ||
| validBody: JSON.stringify({ | ||
| version: '3.0.4', | ||
| fix_branch: 'bug-hunter-fix-branch', | ||
| base_commit: 'abc123', | ||
| dry_run: false, | ||
| circuit_breaker_tripped: false, | ||
| phase2_timeout_hit: false, | ||
| fixes: [], | ||
| verification: { | ||
| baseline_pass: 1, | ||
| baseline_fail: 0, | ||
| flaky_tests: 0, | ||
| final_pass: 1, | ||
| final_fail: 0, | ||
| new_failures: 0, | ||
| resolved_failures: 0, | ||
| typecheck_pass: true, | ||
| build_pass: true, | ||
| fixer_bugs_found: 0 | ||
| }, | ||
| summary: { | ||
| total_confirmed: 0, | ||
| eligible: 0, | ||
| manual_review: 0, | ||
| fixed: 0, | ||
| fix_reverted: 0, | ||
| fix_failed: 0, | ||
| skipped: 0, | ||
| fixer_bug: 0, | ||
| partial: 0 | ||
| } | ||
| }, null, 2), | ||
| expectedError: '\\$\\.base_commit is required' | ||
| } | ||
| ]; | ||
| phases.forEach((phase) => { | ||
| const outputPath = path.join(sandbox, '.bug-hunter', `${phase.artifact}.json`); | ||
| const journalPath = path.join(sandbox, '.bug-hunter', `${phase.artifact}.log`); | ||
| const attemptsFile = path.join(sandbox, `${phase.artifact}-attempts.json`); | ||
| const workerPath = path.join(sandbox, `${phase.artifact}-worker.cjs`); | ||
| fs.writeFileSync(workerPath, [ | ||
| '#!/usr/bin/env node', | ||
| "const fs = require('fs');", | ||
| "const path = require('path');", | ||
| "const args = process.argv;", | ||
| "const outputPath = args[args.indexOf('--output-path') + 1];", | ||
| "const attemptsPath = args[args.indexOf('--attempts-file') + 1];", | ||
| 'let attempts = 0;', | ||
| "if (fs.existsSync(attemptsPath)) attempts = Number(fs.readFileSync(attemptsPath, 'utf8'));", | ||
| 'attempts += 1;', | ||
| "fs.mkdirSync(path.dirname(attemptsPath), { recursive: true });", | ||
| "fs.writeFileSync(attemptsPath, String(attempts));", | ||
| `const invalidBody = ${JSON.stringify(phase.invalidBody)};`, | ||
| `const validBody = ${JSON.stringify(phase.validBody)};`, | ||
| "fs.mkdirSync(path.dirname(outputPath), { recursive: true });", | ||
| "fs.writeFileSync(outputPath, attempts === 1 ? invalidBody : validBody);" | ||
| ].join('\n'), 'utf8'); | ||
| const workerTemplate = [ | ||
| 'node', | ||
| workerPath, | ||
| '--output-path', | ||
| '{outputPath}', | ||
| '--attempts-file', | ||
| attemptsFile | ||
| ].join(' '); | ||
| const result = runJson('node', [ | ||
| runner, | ||
| 'phase', | ||
| '--skill-dir', | ||
| skillDir, | ||
| '--phase-name', | ||
| `${phase.artifact}-phase`, | ||
| '--artifact', | ||
| phase.artifact, | ||
| '--output-path', | ||
| outputPath, | ||
| '--worker-cmd', | ||
| workerTemplate, | ||
| '--timeout-ms', | ||
| '5000', | ||
| '--max-retries', | ||
| '1', | ||
| '--backoff-ms', | ||
| '10', | ||
| '--journal-path', | ||
| journalPath | ||
| ], { | ||
| cwd: sandbox | ||
| }); | ||
| assert.equal(result.ok, true); | ||
| assert.equal(result.artifact, phase.artifact); | ||
| assert.equal(fs.existsSync(outputPath), true); | ||
| const journal = fs.readFileSync(journalPath, 'utf8'); | ||
| assert.match(journal, /attempt-post-check-failed/); | ||
| assert.match(journal, new RegExp(phase.expectedError)); | ||
| }); | ||
| }); |
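The phase tests in this file match journal messages such as `$[0].claim is required`, `$[0].analysisSummary is required`, `$[0].trueSeverity is required`, and `$.base_commit is required`. Below is a minimal sketch of a required-field check that produces messages in that shape. List artifacts are checked per entry as `$[i]`, and object artifacts are checked at the root as `$`. The `REQUIRED_FIELDS` lists are assumptions inferred from the valid payloads in the tests, not the shipped `*.schema.json` files, and the real `schema-validate.cjs` presumably checks types and enums as well.

```javascript
// Hypothetical required-field lists; the real schemas live in schemas/*.schema.json.
const REQUIRED_FIELDS = {
  skeptic: ['bugId', 'response', 'analysisSummary'],
  referee: ['bugId', 'verdict', 'trueSeverity', 'confidenceScore', 'verificationMode', 'analysisSummary'],
  'fix-report': ['version', 'fix_branch', 'base_commit', 'dry_run', 'fixes', 'verification', 'summary']
};

// Push one "<path>.<field> is required" message per missing field.
function checkRequired(value, fields, pathPrefix, errors) {
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    errors.push(`${pathPrefix} must be an object`);
    return;
  }
  for (const field of fields) {
    if (!Object.prototype.hasOwnProperty.call(value, field)) {
      errors.push(`${pathPrefix}.${field} is required`);
    }
  }
}

function missingFieldErrors(artifact, payload) {
  const fields = REQUIRED_FIELDS[artifact];
  if (!fields) return [`unknown artifact: ${artifact}`];
  const errors = [];
  if (Array.isArray(payload)) {
    // List artifacts (skeptic, referee): check each entry as $[i].
    payload.forEach((entry, index) => checkRequired(entry, fields, `$[${index}]`, errors));
  } else {
    // Object artifacts (fix-report): check the root as $.
    checkRequired(payload, fields, '$', errors);
  }
  return errors;
}
```

Putting the JSON path in each message lets the runner journal an error that points at the exact entry and field. That precision is what the retry path needs in order to be useful.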
@@ -109,2 +109,13 @@ const assert = require('node:assert/strict'); | ||
| test('prepare refuses to delete an unrelated pre-existing directory', () => { | ||
| const { repo, fixBranch } = makeGitFixture(); | ||
| const wtDir = path.join(repo, '.bug-hunter', 'worktrees', 'notes'); | ||
| fs.mkdirSync(wtDir, { recursive: true }); | ||
| fs.writeFileSync(path.join(wtDir, 'keep.txt'), 'keep\n'); | ||
| const result = runRaw('node', [SCRIPT, 'prepare', fixBranch, wtDir], { cwd: repo }); | ||
| assert.notEqual(result.status, 0); | ||
| assert.equal(fs.existsSync(path.join(wtDir, 'keep.txt')), true); | ||
| }); | ||
| test('harvest finds new commits', () => { | ||
@@ -184,2 +195,41 @@ const { repo, fixBranch } = makeGitFixture(); | ||
| test('cleanup does not report success for unmanaged directories', () => { | ||
| const { repo } = makeGitFixture(); | ||
| const wtDir = path.join(repo, '.bug-hunter', 'worktrees', 'notes'); | ||
| fs.mkdirSync(wtDir, { recursive: true }); | ||
| fs.writeFileSync(path.join(wtDir, 'keep.txt'), 'keep\n'); | ||
| const result = runJson('node', [SCRIPT, 'cleanup', wtDir], { cwd: repo }); | ||
| assert.equal(result.ok, true); | ||
| assert.equal(result.removed, false); | ||
| assert.equal(fs.existsSync(path.join(wtDir, 'keep.txt')), true); | ||
| }); | ||
| test('cleanup preserves worktree contents when harvest fails', () => { | ||
| const { repo, fixBranch } = makeGitFixture(); | ||
| const wtDir = path.join(repo, '.bug-hunter', 'worktrees', 'batch-1'); | ||
| runJson('node', [SCRIPT, 'prepare', fixBranch, wtDir], { cwd: repo }); | ||
| fs.rmSync(path.join(wtDir, '.worktree-manifest.json')); | ||
| const result = runJson('node', [SCRIPT, 'cleanup', wtDir], { cwd: repo }); | ||
| assert.equal(result.ok, true); | ||
| assert.equal(result.removed, false); | ||
| assert.equal(fs.existsSync(wtDir), true); | ||
| }); | ||
| test('cleanup returns stash metadata when defensive harvest stashes uncommitted work', () => { | ||
| const { repo, fixBranch } = makeGitFixture(); | ||
| const wtDir = path.join(repo, '.bug-hunter', 'worktrees', 'batch-1'); | ||
| runJson('node', [SCRIPT, 'prepare', fixBranch, wtDir], { cwd: repo }); | ||
| fs.writeFileSync(path.join(wtDir, 'dirty.txt'), 'uncommitted\n'); | ||
| const result = runJson('node', [SCRIPT, 'cleanup', wtDir], { cwd: repo }); | ||
| assert.equal(result.ok, true); | ||
| assert.equal(result.removed, true); | ||
| assert.equal(typeof result.stashRef, 'string'); | ||
| assert.equal(result.stashRef.length > 0, true); | ||
| }); | ||
| test('cleanup-all removes multiple worktrees', () => { | ||
@@ -207,2 +257,17 @@ const { repo, fixBranch } = makeGitFixture(); | ||
| test('cleanup-all preserves unrelated directories under the parent', () => { | ||
| const { repo, fixBranch } = makeGitFixture(); | ||
| const parentDir = path.join(repo, '.bug-hunter', 'worktrees'); | ||
| const wt1 = path.join(parentDir, 'batch-1'); | ||
| const unrelated = path.join(parentDir, 'notes'); | ||
| runJson('node', [SCRIPT, 'prepare', fixBranch, wt1], { cwd: repo }); | ||
| fs.mkdirSync(unrelated, { recursive: true }); | ||
| fs.writeFileSync(path.join(unrelated, 'readme.txt'), 'keep me\n', 'utf8'); | ||
| runJson('node', [SCRIPT, 'cleanup-all', parentDir], { cwd: repo }); | ||
| assert.equal(fs.existsSync(unrelated), true); | ||
| assert.equal(fs.existsSync(path.join(unrelated, 'readme.txt')), true); | ||
| }); | ||
| test('checkout-fix returns main tree to fix branch', () => { | ||
@@ -296,2 +361,3 @@ const { repo, fixBranch } = makeGitFixture(); | ||
| assert.equal(s2.harvested, false); | ||
| assert.equal(s2.hasUncommitted, false); | ||
@@ -298,0 +364,0 @@ // Clean up |
@@ -85,2 +85,17 @@ #!/usr/bin/env node | ||
| function isManagedWorktreeDir(worktreeDir) { | ||
| const absDir = path.resolve(worktreeDir); | ||
| if (readJsonFile(manifestPath(absDir))) { | ||
| return true; | ||
| } | ||
| const listed = gitSafe(['worktree', 'list', '--porcelain']); | ||
| if (!listed.ok || !listed.output) { | ||
| return false; | ||
| } | ||
| return listed.output | ||
| .split('\n') | ||
| .filter((line) => line.startsWith('worktree ')) | ||
| .some((line) => path.resolve(line.slice('worktree '.length).trim()) === absDir); | ||
| } | ||
| // --------------------------------------------------------------------------- | ||
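`isManagedWorktreeDir` above falls back to `git worktree list --porcelain` when no manifest is present. In that format, each worktree record begins with a `worktree <path>` line, followed by lines such as `HEAD <sha>` and `branch <ref>` or `detached`, and records are separated by blank lines. The sketch below pulls the path parsing into pure functions so it can be exercised without a repository. `listedWorktreePaths` and `isListedWorktree` are hypothetical helper names, not functions in the script.

```javascript
const path = require('node:path');

// Extract the absolute path of every worktree record in porcelain output.
function listedWorktreePaths(porcelainOutput) {
  return porcelainOutput
    .split('\n')
    .filter((line) => line.startsWith('worktree '))
    .map((line) => path.resolve(line.slice('worktree '.length).trim()));
}

// True when `worktreeDir` resolves to one of the listed worktree paths.
function isListedWorktree(porcelainOutput, worktreeDir) {
  const absDir = path.resolve(worktreeDir);
  return listedWorktreePaths(porcelainOutput).includes(absDir);
}
```

`path.resolve` normalizes trailing slashes and relative segments but does not follow symlinks. If git and the caller spell the same directory differently (for example through a symlinked temp directory), the comparison may return false. Resolving both sides with `fs.realpathSync` is one option.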
@@ -100,4 +115,9 @@ // prepare — create worktree on the fix branch | ||
| // 2. If worktreeDir already exists, clean up stale worktree | ||
| // 2. If worktreeDir already exists, clean up stale managed worktree only | ||
| if (fs.existsSync(absDir)) { | ||
| const managed = isManagedWorktreeDir(absDir); | ||
| if (!managed) { | ||
| out({ ok: false, error: 'path-not-managed-worktree', detail: `${absDir} already exists and is not a managed worktree` }); | ||
| process.exit(1); | ||
| } | ||
| gitSafe(['worktree', 'remove', absDir, '--force']); | ||
@@ -324,5 +344,17 @@ if (fs.existsSync(absDir)) { | ||
| const managed = isManagedWorktreeDir(absDir); | ||
| if (!managed) { | ||
| out({ ok: true, removed: false, reason: 'not-managed-worktree' }); | ||
| return; | ||
| } | ||
| let defensiveHarvest = readJsonFile(harvestPath(absDir)); | ||
| // If harvest hasn't run yet, run it defensively | ||
| if (!readJsonFile(harvestPath(absDir))) { | ||
| try { harvestCore(absDir); } catch (_) { /* best-effort */ } | ||
| if (!defensiveHarvest) { | ||
| try { | ||
| defensiveHarvest = harvestCore(absDir); | ||
| } catch (_) { | ||
| out({ ok: true, removed: false, reason: 'harvest-failed' }); | ||
| return; | ||
| } | ||
| } | ||
@@ -339,7 +371,10 @@ | ||
| gitSafe(['worktree', 'prune']); | ||
| const removed = !fs.existsSync(absDir); | ||
| out({ | ||
| ok: true, | ||
| removed: true, | ||
| detachedMainTree: manifest ? manifest.detachedMainTree : false | ||
| removed, | ||
| detachedMainTree: manifest ? manifest.detachedMainTree : false, | ||
| reason: removed ? undefined : 'remove-failed', | ||
| stashRef: defensiveHarvest && defensiveHarvest.stashRef ? defensiveHarvest.stashRef : null | ||
| }); | ||
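The revised `cleanup` follows a fixed order:

1. Refuse unmanaged directories.
2. Harvest defensively when no harvest file exists.
3. Skip removal if that harvest throws.
4. Report `removed` from whether the directory is actually gone.

The sketch below shows that order with injected dependencies so it can run without git. The `deps` names (`isManaged`, `readHarvest`, `harvest`, `remove`, `exists`) are hypothetical stand-ins for `isManagedWorktreeDir`, `readJsonFile(harvestPath(...))`, `harvestCore`, the `git worktree remove`/`prune` calls, and `fs.existsSync`. The manifest's `detachedMainTree` field is omitted.

```javascript
// Hypothetical, dependency-injected sketch of the cleanup decision order.
function cleanupWorktree(dir, deps) {
  if (!deps.isManaged(dir)) {
    return { ok: true, removed: false, reason: 'not-managed-worktree' };
  }
  let harvest = deps.readHarvest(dir);
  if (!harvest) {
    try {
      harvest = deps.harvest(dir);
    } catch (_) {
      // Do not delete a worktree whose work could not be preserved first.
      return { ok: true, removed: false, reason: 'harvest-failed' };
    }
  }
  deps.remove(dir);
  const removed = !deps.exists(dir); // trust the filesystem, not the remove call
  return {
    ok: true,
    removed,
    reason: removed ? undefined : 'remove-failed',
    stashRef: harvest && harvest.stashRef ? harvest.stashRef : null,
  };
}
```

Injecting the side effects keeps the "harvest before delete" rule testable with plain objects.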
@@ -374,5 +409,16 @@ } | ||
| try { | ||
| const managed = isManagedWorktreeDir(wtDir); | ||
| if (!managed) { | ||
| results.push({ name, removed: false, reason: 'not-managed-worktree' }); | ||
| continue; | ||
| } | ||
| let defensiveHarvest = readJsonFile(harvestPath(wtDir)); | ||
| // Defensive harvest before cleanup | ||
| if (!readJsonFile(harvestPath(wtDir))) { | ||
| try { harvestCore(wtDir); } catch (_) { /* best-effort */ } | ||
| if (!defensiveHarvest) { | ||
| try { | ||
| defensiveHarvest = harvestCore(wtDir); | ||
| } catch (_) { | ||
| results.push({ name, removed: false, reason: 'harvest-failed' }); | ||
| continue; | ||
| } | ||
| } | ||
@@ -383,3 +429,3 @@ gitSafe(['worktree', 'remove', wtDir, '--force']); | ||
| } | ||
| results.push({ name, removed: true }); | ||
| results.push({ name, removed: true, stashRef: defensiveHarvest && defensiveHarvest.stashRef ? defensiveHarvest.stashRef : null }); | ||
| } catch (err) { | ||
@@ -419,3 +465,10 @@ results.push({ name, removed: false, error: err.message }); | ||
| const statusOutput = gitSafe(['status', '--porcelain'], absDir); | ||
| const hasUncommitted = statusOutput.ok && statusOutput.output.length > 0; | ||
| const statusLines = statusOutput.ok | ||
| ? statusOutput.output.split('\n').filter(Boolean) | ||
| : []; | ||
| const relevantLines = statusLines.filter(line => { | ||
| const fileName = line.slice(3); | ||
| return !META_FILES.some(mf => fileName === mf || fileName.endsWith(`/${mf}`)); | ||
| }); | ||
| const hasUncommitted = relevantLines.length > 0; | ||
@@ -422,0 +475,0 @@ let commitCount = 0; |
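The new `hasUncommitted` logic ignores status lines that refer only to the harvester's own metadata files. The actual contents of `META_FILES` are not shown in this diff, so the two names below are hypothetical placeholders. The sketch isolates the filtering step.

```javascript
// Placeholder names: the real META_FILES constant is not shown in this diff.
const META_FILES = ['.bug-hunter-manifest.json', '.bug-hunter-harvest.json'];

// Porcelain v1 lines look like "XY <path>", so the path starts at index 3.
function relevantStatusLines(porcelain) {
  return porcelain
    .split('\n')
    .filter(Boolean)
    .filter((line) => {
      const fileName = line.slice(3);
      return !META_FILES.some((mf) => fileName === mf || fileName.endsWith(`/${mf}`));
    });
}

const status = [
  '?? .bug-hunter-manifest.json',
  ' M src/app.js',
  '?? nested/.bug-hunter-harvest.json',
].join('\n');

const relevant = relevantStatusLines(status);
console.log(JSON.stringify(relevant)); // [" M src/app.js"]
console.log(relevant.length > 0); // true, so hasUncommitted would be true
```

Porcelain v1 prints renames as `orig -> new` and quotes unusual paths unless `-z` is used. In those cases `line.slice(3)` compares against the raw field, not a bare filename.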
+94
-27
| --- | ||
| name: bug-hunter | ||
| description: "Adversarial bug hunting with a sequential-first pipeline (Recon, Hunter, Skeptic, Referee) that can optionally use safe read-only parallel triage. Finds, verifies, and auto-fixes real bugs by default (with --scan-only opt-out) using checkpointed verification and resume state for large codebases. Use this skill whenever the user wants bug finding, security audits, regression checks, or code review focused on runtime behavior." | ||
| argument-hint: "[path | -b <branch> [--base <base-branch>] | --staged | --scan-only | --fix | --autonomous | --no-loop | --approve | --deps | --threat-model | --dry-run]" | ||
| disable-model-invocation: true | ||
| --- | ||
@@ -47,9 +45,21 @@ | ||
| /bug-hunter -b feature-xyz --base dev # Scan files changed in feature-xyz vs dev | ||
| /bug-hunter --pr # Easy alias for --pr current | ||
| /bug-hunter --pr current # Review the current PR end to end | ||
| /bug-hunter --pr recent --scan-only # Review the most recent PR without editing code | ||
| /bug-hunter --pr 123 # Review a specific PR number | ||
| /bug-hunter --pr-security # PR security review: PR scope + threat model + dependency scan | ||
| /bug-hunter --last-pr --review # Easy mnemonic for “review the last PR” | ||
| /bug-hunter --review-pr # Alias for --pr current | ||
| /bug-hunter --staged # Scan staged files (pre-commit check) | ||
| /bug-hunter --scan-only src/ # Scan only, no code changes | ||
| /bug-hunter --review src/ # Easy alias for --scan-only | ||
| /bug-hunter --fix src/ # Find bugs AND auto-fix them | ||
| /bug-hunter --plan-only src/ # Build fix strategy + plan, but do not edit files | ||
| /bug-hunter --plan src/ # Easy alias for --plan-only | ||
| /bug-hunter --safe src/ # Easy alias for --fix --approve | ||
| /bug-hunter --preview src/ # Easy alias for --fix --dry-run | ||
| /bug-hunter --autonomous src/ # Alias for no-intervention auto-fix run | ||
| /bug-hunter --fix -b feature-xyz # Find + fix on branch diff | ||
| /bug-hunter --fix --approve src/ # Find + fix, but ask before each fix | ||
| /bug-hunter src/ # Loops by default: audit until 100% coverage | ||
| /bug-hunter src/ # Loops by default: audit + fix until all queued source files are covered | ||
| /bug-hunter --no-loop src/ # Single-pass only, no iterating | ||
@@ -59,2 +69,4 @@ /bug-hunter --no-loop --scan-only src/ # Single-pass scan, no fixes, no loop | ||
| /bug-hunter --threat-model src/ # Generate/use STRIDE threat model | ||
| /bug-hunter --security-review src/ # Enterprise security workflow: threat model + CVEs + validation | ||
| /bug-hunter --validate-security src/ # Force vulnerability-validation for security findings | ||
| /bug-hunter --deps --threat-model src/ # Full security audit | ||
@@ -80,4 +92,26 @@ /bug-hunter --fix --dry-run src/ # Preview fixes without editing files | ||
| 0i. If arguments contain `--dry-run`: strip it and set `DRY_RUN_MODE=true`. Forces `FIX_MODE=true`. In dry-run mode, Phase 2 builds the fix plan and the Fixer reads code and outputs planned changes as unified diff previews, but no file edits, git commits, or lock acquisition occur. Produces `fix-report.json` with `"dry_run": true`. | ||
| 0j. If arguments contain `--preview`: strip it, set `DRY_RUN_MODE=true`, and force `FIX_MODE=true`. Treat it as a memorable alias for `--fix --dry-run`. | ||
| 0k. If arguments contain `--plan-only`: strip it and set `PLAN_ONLY_MODE=true`. The pipeline still scans, verifies, and builds `fix-strategy.json` + `fix-plan.json`, but it stops before the Fixer edits code. | ||
| 0l. If arguments contain `--plan`: strip it and set `PLAN_ONLY_MODE=true`. Treat it as a memorable alias for `--plan-only`. | ||
| 0m. If arguments contain `--review-pr`: strip it and treat it as `--pr current`. | ||
| 0n. If arguments contain `--pr` with no selector after it, treat it as `--pr current`. | ||
| 0o. If arguments contain `--last-pr`: strip it and treat it as `--pr recent`. | ||
| 0p. If arguments contain `--review`: strip it and set `FIX_MODE=false`. Treat it as a memorable alias for `--scan-only`. | ||
| 0q. If arguments contain `--safe`: strip it, set `FIX_MODE=true`, and set `APPROVE_MODE=true`. Treat it as a memorable alias for `--fix --approve`. | ||
| 0r. If arguments contain `--pr-security`: strip it, set `PR_SECURITY_MODE=true`, force `DEP_SCAN=true`, force `THREAT_MODEL_MODE=true`, force `FIX_MODE=false`, and if no explicit `--pr` selector was provided treat it as `--pr current`. | ||
| 0s. If arguments contain `--security-review`: strip it, set `SECURITY_REVIEW_MODE=true`, force `DEP_SCAN=true`, force `THREAT_MODEL_MODE=true`, and force `FIX_MODE=false`. | ||
| 0t. If arguments contain `--validate-security`: strip it and set `VALIDATE_SECURITY_MODE=true`. | ||
| 1. If arguments contain `--staged`: this is **staged file mode**. | ||
| 1. If arguments contain `--pr <selector>`: this is **PR review mode**. | ||
| - Valid selectors: `current`, `recent`, or a PR number like `123`. | ||
| - If `--base <base-branch>` is present, pass it through for current-branch git fallback. | ||
| - Run: | ||
| ```bash | ||
| node "$SKILL_DIR/scripts/pr-scope.cjs" resolve "<selector>" --repo-root "$PWD" [--base <base-branch>] | ||
| ``` | ||
| - If it fails, report the error to the user and stop. | ||
| - Save the JSON result to `.bug-hunter/pr-scope.json` for later reporting. | ||
| - Use `changedFiles` from the JSON output as the scan target (scan full file contents, not just the diff). | ||
| 2. If arguments contain `--staged`: this is **staged file mode**. | ||
| - Run `git diff --cached --name-only` using the Bash tool to get the list of staged files. | ||
@@ -88,3 +122,3 @@ - If the command fails, report the error to the user and stop. | ||
| 2. If arguments contain `-b <branch>`: this is **branch diff mode**. | ||
| 3. If arguments contain `-b <branch>`: this is **branch diff mode**. | ||
| - Extract the branch name after `-b`. | ||
@@ -97,5 +131,5 @@ - If `--base <base-branch>` is also present, use that as the base branch. Otherwise default to `main`. | ||
| 3. If arguments do NOT contain `-b` or `--staged`: treat the entire argument string as a **path target** (file or directory). If empty, scan the current working directory. | ||
| 4. If arguments do NOT contain `--pr`, `-b`, or `--staged`: treat the entire argument string as a **path target** (file or directory). If empty, scan the current working directory. | ||
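Steps 0i to 0t describe alias handling that happens before the target is resolved. The sketch below folds those aliases into canonical flags and a PR selector. It is an illustration under stated assumptions, not the skill's implementation:

- `FIX_MODE` defaults to `true` because the description says fixing is on by default.
- Any token it does not recognize (paths, `-b`, `--base`, `--staged`, `--deps`, `--approve`, and so on) is passed through in `rest`.
- The function name `normalizeArgs` is hypothetical.

```javascript
// Hypothetical sketch of steps 0i to 0t: map memorable aliases to canonical flags.
const PR_SELECTOR = /^(current|recent|\d+)$/;

function normalizeArgs(tokens) {
  const flags = {
    FIX_MODE: true, // fix-by-default; --scan-only / --review opt out
    DRY_RUN_MODE: false,
    PLAN_ONLY_MODE: false,
    APPROVE_MODE: false,
    PR_SECURITY_MODE: false,
    SECURITY_REVIEW_MODE: false,
    VALIDATE_SECURITY_MODE: false,
    DEP_SCAN: false,
    THREAT_MODEL_MODE: false,
  };
  let prSelector = null;
  const rest = [];
  for (let i = 0; i < tokens.length; i++) {
    switch (tokens[i]) {
      case '--fix': flags.FIX_MODE = true; break;
      case '--scan-only':
      case '--review': flags.FIX_MODE = false; break;
      case '--dry-run':
      case '--preview': flags.DRY_RUN_MODE = true; flags.FIX_MODE = true; break;
      case '--plan-only':
      case '--plan': flags.PLAN_ONLY_MODE = true; break;
      case '--safe': flags.FIX_MODE = true; flags.APPROVE_MODE = true; break;
      case '--review-pr': prSelector = 'current'; break;
      case '--last-pr': prSelector = 'recent'; break;
      case '--pr':
        // A bare --pr (no valid selector after it) means --pr current.
        prSelector = PR_SELECTOR.test(tokens[i + 1] || '') ? tokens[++i] : 'current';
        break;
      case '--pr-security': flags.PR_SECURITY_MODE = true; break;
      case '--security-review': flags.SECURITY_REVIEW_MODE = true; break;
      case '--validate-security': flags.VALIDATE_SECURITY_MODE = true; break;
      default: rest.push(tokens[i]);
    }
  }
  // Security workflows force the dependency scan and threat model, and disable fixing.
  if (flags.PR_SECURITY_MODE || flags.SECURITY_REVIEW_MODE) {
    flags.DEP_SCAN = true;
    flags.THREAT_MODEL_MODE = true;
    flags.FIX_MODE = false;
  }
  if (flags.PR_SECURITY_MODE && prSelector === null) prSelector = 'current';
  return { flags, prSelector, rest };
}
```

The security "force" rules are applied after the loop, so they win regardless of where `--fix` appears on the command line.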
| **After resolving the file list (for modes 1 and 2), filter out non-source files:** | ||
| **After resolving the file list (for modes 1, 2, and 3), filter out non-source files:** | ||
@@ -138,3 +172,3 @@ Remove any files matching these patterns — they are not scannable source code: | ||
| - **Service-aware partitioning (preferred)**: If Recon detected multiple service boundaries (monorepo), partition by service. | ||
| - **Risk-tier partitioning (fallback)**: process CRITICAL then HIGH then MEDIUM. | ||
| - **Risk-tier partitioning (fallback)**: process CRITICAL then HIGH then MEDIUM then LOW. | ||
| - Keep chunk size small (recommended 20-40 files) to avoid context compaction issues. | ||
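The risk-tier fallback and the chunk-size guidance combine into a simple ordering step. In the sketch below, the input shape `{ path, tier }` and the default chunk size of 30 (picked from the recommended 20-40 range) are assumptions.

```javascript
// Hypothetical sketch: order files CRITICAL, HIGH, MEDIUM, LOW, then cut fixed-size chunks.
const TIER_ORDER = ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW'];

function chunkByRisk(files, chunkSize = 30) {
  const rank = (tier) => {
    const i = TIER_ORDER.indexOf(tier);
    return i === -1 ? TIER_ORDER.length : i; // unknown tiers sort last
  };
  // Array.prototype.sort is stable, so files keep their input order within a tier.
  const ordered = [...files].sort((a, b) => rank(a.tier) - rank(b.tier));
  const chunks = [];
  for (let i = 0; i < ordered.length; i += chunkSize) {
    chunks.push(ordered.slice(i, i + chunkSize).map((f) => f.path));
  }
  return chunks;
}

const files = [
  { path: 'a.js', tier: 'LOW' },
  { path: 'b.js', tier: 'CRITICAL' },
  { path: 'c.js', tier: 'MEDIUM' },
  { path: 'd.js', tier: 'HIGH' },
  { path: 'e.js', tier: 'CRITICAL' },
];
console.log(JSON.stringify(chunkByRisk(files, 2))); // [["b.js","e.js"],["d.js","c.js"],["a.js"]]
```

Persisting the index of the last completed chunk in `.bug-hunter/state.json` is what would let a restart skip finished chunks.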
@@ -178,3 +212,3 @@ - Persist chunk progress in `.bug-hunter/state.json` so restarts do not re-scan done chunks. | ||
| ``` | ||
| ls "$SKILL_DIR/scripts/run-bug-hunter.cjs" "$SKILL_DIR/scripts/bug-hunter-state.cjs" "$SKILL_DIR/scripts/delta-mode.cjs" "$SKILL_DIR/scripts/payload-guard.cjs" "$SKILL_DIR/scripts/fix-lock.cjs" "$SKILL_DIR/scripts/triage.cjs" "$SKILL_DIR/scripts/doc-lookup.cjs" | ||
| ls "$SKILL_DIR/scripts/run-bug-hunter.cjs" "$SKILL_DIR/scripts/bug-hunter-state.cjs" "$SKILL_DIR/scripts/delta-mode.cjs" "$SKILL_DIR/scripts/payload-guard.cjs" "$SKILL_DIR/scripts/fix-lock.cjs" "$SKILL_DIR/scripts/triage.cjs" "$SKILL_DIR/scripts/doc-lookup.cjs" "$SKILL_DIR/scripts/pr-scope.cjs" | ||
| ``` | ||
@@ -259,6 +293,6 @@ If any are missing, stop and tell the user to update/reinstall the skill. | ||
| Follow the rules in the **Target** section above. If in branch diff or staged mode, run the appropriate git command now, collect the file list, and apply the filter. | ||
| Follow the rules in the **Target** section above. If in PR review, branch diff, or staged mode, run the appropriate resolver command now, collect the file list, and apply the filter. | ||
| Report to the user: | ||
| - Mode (full project / directory / file / branch diff / staged) | ||
| - Mode (full project / directory / file / PR review / branch diff / staged) | ||
| - Number of source files to scan (after filtering) | ||
@@ -307,3 +341,3 @@ - Number of files filtered out | ||
| Single-pass mode will only cover a subset. Remove `--no-loop` to enable iterative coverage. | ||
| Proceeding with partial scan — CRITICAL and HIGH domains only. | ||
| Proceeding with partial scan — highest-priority queued files only. | ||
| ``` | ||
@@ -316,3 +350,6 @@ | ||
| If `THREAT_MODEL_MODE=true`: | ||
| 1. Check if `.bug-hunter/threat-model.md` already exists. | ||
| 1. Read the bundled local skill `SKILL_DIR/skills/threat-model-generation/SKILL.md` before generating the threat model. This keeps the enterprise security pack end-to-end connected to the main Bug Hunter flow. | ||
| 2. Use the bundled skill's Bug Hunter-native artifact conventions (`.bug-hunter/threat-model.md`, `.bug-hunter/security-config.json`). | ||
| 3. Check if `.bug-hunter/threat-model.md` already exists. | ||
| - If it exists and was modified within the last 90 days: use it as-is. Set `THREAT_MODEL_AVAILABLE=true`. | ||
@@ -334,3 +371,6 @@ - If it exists but is >90 days old: warn user ("Threat model is N days old — regenerating"), regenerate. | ||
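The 90-day reuse rule for `.bug-hunter/threat-model.md` can be expressed as a small freshness check on the file's modification time. The helper name and the returned `action` strings below are hypothetical. The 90-day threshold comes from the text above.

```javascript
const fs = require('node:fs');

const MAX_AGE_DAYS = 90;
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: decide whether to reuse, regenerate, or generate the threat model.
function threatModelStatus(filePath, now = Date.now()) {
  let stat;
  try {
    stat = fs.statSync(filePath);
  } catch (err) {
    if (err.code === 'ENOENT') return { action: 'generate', ageDays: null };
    throw err; // permission errors etc. should surface, not trigger regeneration
  }
  const ageDays = Math.floor((now - stat.mtimeMs) / DAY_MS);
  return ageDays > MAX_AGE_DAYS
    ? { action: 'regenerate', ageDays } // warn: "Threat model is N days old"
    : { action: 'reuse', ageDays };
}
```

Passing `now` explicitly keeps the check deterministic in tests.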
| If `DEP_SCAN=true`: | ||
| If `DEP_SCAN=true` or `SECURITY_REVIEW_MODE=true` or `PR_SECURITY_MODE=true`: | ||
| - Read the bundled local skill `SKILL_DIR/skills/security-review/SKILL.md` when running the broader enterprise security workflow. | ||
| If `DEP_SCAN=true`: | ||
| ```bash | ||
@@ -349,2 +389,7 @@ node "$SKILL_DIR/scripts/dep-scan.cjs" --target "<TARGET_PATH>" --output .bug-hunter/dep-findings.json | ||
| **Security-pack routing:** | ||
| - If `PR_SECURITY_MODE=true`, read `SKILL_DIR/skills/commit-security-scan/SKILL.md` before the normal PR-review scan. | ||
| - If `SECURITY_REVIEW_MODE=true`, read `SKILL_DIR/skills/security-review/SKILL.md` before the broader security audit flow. | ||
| - If `VALIDATE_SECURITY_MODE=true`, read `SKILL_DIR/skills/vulnerability-validation/SKILL.md` before finalizing confirmed security findings. | ||
| **MANDATORY**: You MUST read prompt files using the Read tool before passing them to subagents or executing them yourself. Do NOT skip this or act from memory. Use the absolute SKILL_DIR path resolved in Step 0. | ||
@@ -356,5 +401,8 @@ | ||
| |-------|-----------------| | ||
| | Threat Model (Step 1b) | `prompts/threat-model.md` (only if THREAT_MODEL_MODE=true) | | ||
| | PR security review | `skills/commit-security-scan/SKILL.md` (if `PR_SECURITY_MODE=true` or the user asks for PR-focused security review) | | ||
| | Security review | `skills/security-review/SKILL.md` (if `SECURITY_REVIEW_MODE=true` or the user asks for an enterprise/full security audit) | | ||
| | Threat Model (Step 1b) | `skills/threat-model-generation/SKILL.md` + `prompts/threat-model.md` (only if THREAT_MODEL_MODE=true) | | ||
| | Recon (Step 4) | `prompts/recon.md` (skip for single-file mode) | | ||
| | Hunters (Step 5) | `prompts/hunter.md` + `prompts/doc-lookup.md` + `prompts/examples/hunter-examples.md` | | ||
| | Security validation | `skills/vulnerability-validation/SKILL.md` (if `VALIDATE_SECURITY_MODE=true` or confirmed security findings need exploitability validation) | | ||
| | Skeptics (Step 6) | `prompts/skeptic.md` + `prompts/doc-lookup.md` + `prompts/examples/skeptic-examples.md` | | ||
@@ -378,4 +426,4 @@ | Referee (Step 7) | `prompts/referee.md` | | ||
| # 3. Write your findings to disk: | ||
| write({ path: ".bug-hunter/findings.md", content: "<your findings>" }) | ||
| # 3. Write your canonical findings artifact to disk: | ||
| write({ path: ".bug-hunter/findings.json", content: "<your findings json>" }) | ||
| ``` | ||
@@ -400,3 +448,3 @@ | ||
| # - {PHASE_SPECIFIC_CONTEXT} = <doc-lookup instructions from doc-lookup.md> | ||
| # - {OUTPUT_FILE_PATH} = ".bug-hunter/findings.md" | ||
| # - {OUTPUT_FILE_PATH} = ".bug-hunter/findings.json" | ||
| # - {SKILL_DIR} = <absolute path> | ||
@@ -407,6 +455,6 @@ # 4. Dispatch: | ||
| task: "<the filled template>", | ||
| output: ".bug-hunter/findings.md" | ||
| output: ".bug-hunter/findings.json" | ||
| }) | ||
| # 5. Read the output: | ||
| read({ path: ".bug-hunter/findings.md" }) | ||
| read({ path: ".bug-hunter/findings.json" }) | ||
| ``` | ||
@@ -510,3 +558,3 @@ | ||
| ### 7. Coverage assessment | ||
| - If ALL CRITICAL/HIGH files scanned: "Full coverage achieved." | ||
| - If ALL queued scannable source files scanned: "Full queued coverage achieved." | ||
| - If any missed: list them with note about `--loop` mode. | ||
@@ -516,5 +564,5 @@ | ||
| If the coverage assessment shows ANY CRITICAL or HIGH files were not scanned, the pipeline is NOT complete: | ||
| If the coverage assessment shows ANY queued scannable source files were not scanned, the pipeline is NOT complete: | ||
| 1. If `LOOP_MODE=true` (default): the ralph-loop will automatically continue to the next iteration covering missed files. Call `ralph_done` to proceed to the next iteration. Do NOT output `<promise>COMPLETE</promise>` until all CRITICAL/HIGH files show DONE. | ||
| 1. If `LOOP_MODE=true` (default): the ralph-loop will automatically continue to the next iteration covering missed files. Call `ralph_done` to proceed to the next iteration. Do NOT output `<promise>COMPLETE</promise>` until all queued scannable source files show DONE. | ||
@@ -524,3 +572,3 @@ 2. If `LOOP_MODE=false` (`--no-loop` was specified) AND missed files exist: | ||
| ``` | ||
| ⚠️ PARTIAL COVERAGE: [N] CRITICAL/HIGH files were not scanned. | ||
| ⚠️ PARTIAL COVERAGE: [N] queued source files were not scanned. | ||
| Run `/bug-hunter [path]` for complete coverage (loop is on by default). | ||
@@ -532,12 +580,26 @@ Unscanned files: [list them] | ||
| 🚨 LARGE CODEBASE: [N] source files (FILE_BUDGET: [B]). | ||
| Single-pass audit covered [X]% of CRITICAL/HIGH files. | ||
| Single-pass audit covered [X]% of queued source files. | ||
| Use `/bug-hunter [path]` for full coverage (loop is on by default). | ||
| ``` | ||
| 3. Do NOT claim "audit complete" or "full coverage achieved" unless ALL CRITICAL and HIGH files have status DONE. A partial audit is still valuable — report what you found honestly. | ||
| 3. Do NOT claim "audit complete" or "full coverage achieved" unless ALL queued scannable source files have status DONE. A partial audit is still valuable — report what you found honestly. | ||
| 4. Autonomous runs must keep descending through the remaining priority queue after the current prioritized chunk is done: | ||
| - Finish current CRITICAL/HIGH work first. | ||
| - Immediately continue with remaining MEDIUM files. | ||
| - Then continue with remaining LOW files. | ||
| - Only stop when the queue is exhausted, the user interrupts, or a hard blocker prevents safe progress. | ||
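The revised coverage rules say "complete" only when every queued scannable source file is DONE. Otherwise the remaining work continues in priority order. In the sketch below, the queue entry shape `{ path, tier, status }` is an assumption, not the actual `state.json` layout.

```javascript
// Hypothetical sketch of the coverage gate and priority descent.
const PRIORITY = ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW'];
const rank = (tier) => {
  const i = PRIORITY.indexOf(tier);
  return i === -1 ? PRIORITY.length : i; // unknown tiers go last
};

function assessCoverage(queue) {
  // status 'DONE' means the file has been scanned.
  const done = queue.filter((f) => f.status === 'DONE').length;
  const pending = queue
    .filter((f) => f.status !== 'DONE')
    .sort((a, b) => rank(a.tier) - rank(b.tier));
  return {
    complete: pending.length === 0,
    percent: queue.length === 0 ? 100 : Math.round((done / queue.length) * 100),
    next: pending.map((f) => f.path), // CRITICAL/HIGH first, then MEDIUM, then LOW
  };
}
```

A run would report "Full queued coverage achieved." only when `complete` is true. Otherwise `percent` feeds the "covered [X]% of queued source files" message, and `next` is the list to continue with.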
| If zero bugs were confirmed, say so clearly — a clean report is a good result. | ||
| **Routing after report:** | ||
| - If there are confirmed security findings AND (`VALIDATE_SECURITY_MODE=true` OR `PR_SECURITY_MODE=true` OR `SECURITY_REVIEW_MODE=true`): | ||
| - Read `SKILL_DIR/skills/vulnerability-validation/SKILL.md`. | ||
| - Re-check reachability, exploitability, PoC quality, and CVSS details for the confirmed security findings before finalizing the security summary. | ||
| - If confirmed bugs > 0 AND `PLAN_ONLY_MODE=true`: | ||
| - Build `fix-strategy.json` and `fix-plan.json`. | ||
| - Present the strategy clusters (safe autofix vs manual review vs larger refactor vs architectural remediation). | ||
| - Stop before the Fixer edits code. | ||
| - If confirmed bugs > 0 AND `FIX_MODE=true`: | ||
| - Build and present `fix-strategy.json` first. | ||
| - Auto-fix only `ELIGIBLE` bugs. | ||
@@ -600,4 +662,9 @@ - Apply canary-first rollout: fix top critical eligible subset first, verify, then continue remaining eligible fixes. | ||
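The "Routing after report" rules amount to a small decision table. The sketch below returns the ordered follow-up steps as strings. Two things are assumptions:

- The step labels are illustrative.
- `PLAN_ONLY_MODE` takes precedence over `FIX_MODE` when both are true, since plan-only "stops before the Fixer edits code".

```javascript
// Hypothetical sketch of post-report routing; step labels are illustrative.
function routeAfterReport({ confirmedBugs, confirmedSecurity, flags }) {
  const steps = [];
  const securityMode =
    flags.VALIDATE_SECURITY_MODE || flags.PR_SECURITY_MODE || flags.SECURITY_REVIEW_MODE;
  if (confirmedSecurity > 0 && securityMode) {
    steps.push('read skills/vulnerability-validation/SKILL.md', 'validate security findings');
  }
  if (confirmedBugs > 0 && flags.PLAN_ONLY_MODE) {
    // Plan-only wins: build the strategy and plan, never edit code.
    steps.push('build fix-strategy.json', 'build fix-plan.json', 'stop before Fixer');
  } else if (confirmedBugs > 0 && flags.FIX_MODE) {
    steps.push('build fix-strategy.json', 'fix ELIGIBLE bugs only');
  }
  return steps;
}
```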
| Also write the final markdown report to `.bug-hunter/report.md` as the canonical human-readable output (in addition to displaying it to the user). | ||
| Also write the final markdown report to `.bug-hunter/report.md` as the | ||
| canonical human-readable output. Generate it from the JSON artifacts with: | ||
| ```bash | ||
| node "$SKILL_DIR/scripts/render-report.cjs" report ".bug-hunter/findings.json" ".bug-hunter/referee.json" > ".bug-hunter/report.md" | ||
| ``` | ||
| --- | ||
@@ -604,0 +671,0 @@ |
@@ -79,2 +79,4 @@ # Subagent Task Wrapper Template | ||
| **Artifact name for validation:** `{OUTPUT_ARTIFACT}` | ||
| Follow the output format specified in your system prompt EXACTLY. | ||
@@ -88,2 +90,7 @@ The orchestrator will read this file to pass your results to the next pipeline phase. | ||
| After writing the canonical artifact, validate it before you stop: | ||
| ```bash | ||
| node "{SKILL_DIR}/scripts/schema-validate.cjs" "{OUTPUT_ARTIFACT}" "{OUTPUT_FILE_PATH}" | ||
| ``` | ||
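The real schemas used by `schema-validate.cjs` are not shown in this diff. The sketch below is therefore only a hypothetical stand-in that illustrates the fail-fast idea for one artifact. It assumes `findings.json` is a JSON array of objects and borrows the required field names from the compact Skeptic field list in the dispatch table.

```javascript
const fs = require('node:fs');

// Hypothetical stand-in for schema-validate.cjs; the real schemas are not shown here.
const REQUIRED = {
  findings: ['bugId', 'severity', 'file', 'lines', 'claim', 'evidence', 'runtimeTrigger'],
};

function validateArtifact(artifact, filePath) {
  const required = REQUIRED[artifact];
  if (!required) return { ok: false, errors: [`unknown artifact: ${artifact}`] };
  let data;
  try {
    data = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  } catch (err) {
    // A malformed artifact is rejected immediately instead of being half-parsed.
    return { ok: false, errors: [`unreadable or invalid JSON: ${err.message}`] };
  }
  if (!Array.isArray(data)) return { ok: false, errors: ['expected a JSON array'] };
  const errors = [];
  data.forEach((item, i) => {
    for (const key of required) {
      if (item === null || typeof item !== 'object' || !(key in item)) {
        errors.push(`[${i}] missing ${key}`);
      }
    }
  });
  return { ok: errors.length === 0, errors };
}
```

Returning every missing field at once, rather than the first one, gives a retry a precise list of what to fix.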
| ## Completion | ||
@@ -93,4 +100,5 @@ | ||
| 1. Write your report to `{OUTPUT_FILE_PATH}` | ||
| 2. Output a brief summary to stdout (one paragraph) | ||
| 3. Stop. Do not continue to other phases. | ||
| 2. Validate the artifact with `schema-validate.cjs` | ||
| 3. Output a brief summary to stdout (one paragraph) | ||
| 4. Stop. Do not continue to other phases. | ||
@@ -112,2 +120,3 @@ --- | ||
| | `{PHASE_SPECIFIC_CONTEXT}` | Extra context for this phase | For Skeptic: the Hunter findings. For Referee: findings + Skeptic challenges. | | ||
| | `{OUTPUT_FILE_PATH}` | Where to write the output | `.bug-hunter/findings.md` | | ||
| | `{OUTPUT_FILE_PATH}` | Where to write the canonical artifact | `.bug-hunter/findings.json` | | ||
| | `{OUTPUT_ARTIFACT}` | Artifact name passed to `schema-validate.cjs` | `findings`, `skeptic`, `referee`, `fix-report` | |
| # Shared Dispatch Patterns | ||
| This file defines how to dispatch each pipeline role (Recon, Hunter, Skeptic, Referee, Fixer) using any `AGENT_BACKEND`. Mode files reference this instead of duplicating dispatch boilerplate. | ||
| --- | ||
| ## Dispatch by Backend | ||
| ### local-sequential | ||
| You execute the role yourself: | ||
| 1. Read the prompt file: `read({ path: "$SKILL_DIR/prompts/<role>.md" })` | ||
| 2. If the role needs doc-lookup: also read `$SKILL_DIR/prompts/doc-lookup.md` | ||
| 3. **Switch mindset** to the role (important for Skeptic/Referee — genuinely adversarial) | ||
| 4. Execute the role's instructions using the Read tool to examine source files | ||
| 5. Write output to the role's output file (see Output Files table below) | ||
| ### subagent | ||
| 1. Read the prompt file: `read({ path: "$SKILL_DIR/prompts/<role>.md" })` | ||
| 2. Read the wrapper template: `read({ path: "$SKILL_DIR/templates/subagent-wrapper.md" })` | ||
| 3. Generate payload: | ||
| ```bash | ||
| node "$SKILL_DIR/scripts/payload-guard.cjs" generate <role> ".bug-hunter/payloads/<role>-<context>.json" | ||
| ``` | ||
| 4. Edit the payload JSON — fill in `skillDir`, `targetFiles`, and role-specific fields | ||
| 5. Validate: | ||
| ```bash | ||
| node "$SKILL_DIR/scripts/payload-guard.cjs" validate <role> ".bug-hunter/payloads/<role>-<context>.json" | ||
| ``` | ||
| 6. Fill the subagent-wrapper template variables: | ||
| - `{ROLE_NAME}` = role name (see table below) | ||
| - `{ROLE_DESCRIPTION}` = role description (see table below) | ||
| - `{PROMPT_CONTENT}` = full contents of the prompt .md file | ||
| - `{TARGET_DESCRIPTION}` = what is being scanned | ||
| - `{SKILL_DIR}` = absolute path to skill directory | ||
| - `{FILE_LIST}` = files in scan order (CRITICAL first) | ||
| - `{RISK_MAP}` = risk classification from triage or Recon | ||
| - `{TECH_STACK}` = framework, auth, DB from Recon | ||
| - `{PHASE_SPECIFIC_CONTEXT}` = role-specific context (see below) | ||
| - `{OUTPUT_FILE_PATH}` = output file path | ||
| 7. Dispatch: | ||
| ``` | ||
| subagent({ agent: "<role>-agent", task: "<filled template>", output: "<output-path>" }) | ||
| ``` | ||
| 8. Read the output file after completion | ||
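Step 6 of the subagent dispatch fills `{VARIABLE}` placeholders in `subagent-wrapper.md`. The helper below is a hypothetical sketch of that substitution. It uses a replacer function so that values containing `$&` or `$1` are inserted literally, and it reports any placeholder left unfilled instead of dispatching a broken prompt.

```javascript
// Hypothetical helper: fill {UPPER_CASE} placeholders and fail on any left unfilled.
function fillTemplate(template, vars) {
  const missing = new Set();
  const filled = template.replace(/\{([A-Z_]+)\}/g, (match, name) => {
    if (Object.prototype.hasOwnProperty.call(vars, name)) return String(vars[name]);
    missing.add(name);
    return match;
  });
  if (missing.size > 0) {
    throw new Error(`unfilled template variables: ${[...missing].join(', ')}`);
  }
  return filled;
}

console.log(fillTemplate('Role: {ROLE_NAME}', { ROLE_NAME: 'hunter' })); // Role: hunter
```

Missing names are collected during substitution rather than by rescanning the output. This way, braces inside inserted values (for example, prompt text in `{PROMPT_CONTENT}`) are not mistaken for unfilled variables.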
| ### teams | ||
| Same as subagent, but dispatch with: | ||
| ``` | ||
| teams({ tasks: [{ text: "<filled template>" }], maxTeammates: 1 }) | ||
| ``` | ||
| ### interactive_shell | ||
| ``` | ||
| interactive_shell({ command: 'pi "<filled task prompt>"', mode: "dispatch" }) | ||
| ``` | ||
| --- | ||
| ## Role Reference | ||
| | Role | Prompt File | Role Description | Output File | Phase-Specific Context | | ||
| |------|-------------|-----------------|-------------|----------------------| | ||
| | `recon` | `prompts/recon.md` | Reconnaissance agent — map the codebase and classify files by risk | `.bug-hunter/recon.md` | Triage JSON path (if exists) | | ||
| | `hunter` | `prompts/hunter.md` | Bug Hunter — find behavioral bugs in source code | `.bug-hunter/findings.md` | `doc-lookup.md` + risk map + tech stack | | ||
| | `skeptic` | `prompts/skeptic.md` | Skeptic — adversarial review to disprove false positives | `.bug-hunter/skeptic.md` | Hunter findings (compact: bugId, severity, file, lines, claim, evidence, runtimeTrigger) + `doc-lookup.md` | | ||
| | `referee` | `prompts/referee.md` | Referee — impartial final judge of all findings | `.bug-hunter/referee.md` | Hunter findings + Skeptic challenges | | ||
| | `fixer` | `prompts/fixer.md` | Surgical code fixer — implement minimal fixes for confirmed bugs | `.bug-hunter/fix-report.md` | Confirmed bugs from Referee + tech stack + `doc-lookup.md` | | ||
| --- | ||
| ## Fixer Dispatch: Worktree Isolation (subagent/teams only) | ||
| When `WORKTREE_MODE=true`, the Fixer runs in a managed git worktree for isolation. The orchestrator handles the full lifecycle — the Fixer just edits and commits. | ||
| **Key differences from other role dispatches:** | ||
| 1. The worktree is created by the orchestrator via `worktree-harvest.cjs prepare` BEFORE dispatch. | ||
| 2. The Fixer's working directory is set to the worktree's absolute path, not the project root. | ||
| 3. The Fixer MUST `git add` + `git commit` each fix (uncommitted work = `FIX_FAILED`). | ||
| 4. The orchestrator harvests commits via `worktree-harvest.cjs harvest` AFTER dispatch. | ||
| 5. The orchestrator cleans up via `worktree-harvest.cjs cleanup` AFTER harvest. | ||
| **CRITICAL — do NOT use `isolation: "worktree"` on the Agent tool:** | ||
| The Agent tool's built-in worktree isolation creates an ephemeral branch and auto-cleans on exit, which loses Fixer commits. We manage worktrees ourselves so the Fixer commits land directly on the fix branch. | ||
| **Fixer-specific template variables for `{PHASE_SPECIFIC_CONTEXT}`:** | ||
| - `WORKTREE_DIR: <absolute path to worktree>` | ||
| - `FIX_BRANCH: <branch name>` | ||
| - `COMMIT_FORMAT: fix(bug-hunter): BUG-N — [description]` | ||
| - Worktree isolation rules (see `{WORKTREE_RULES}` in subagent-wrapper.md) | ||
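`COMMIT_FORMAT` ties each Fixer commit to one bug ID. The parser below is a hypothetical sketch that could map harvested commit subjects back to `BUG-N` identifiers. It assumes that `[description]` is a placeholder and not literal brackets, and it writes the format's dash as `\u2014` (an em dash) in the regex.

```javascript
// Hypothetical parser for "fix(bug-hunter): BUG-N <em dash> description".
const COMMIT_SUBJECT = /^fix\(bug-hunter\): (BUG-\d+) \u2014 (\S.*)$/;

function parseFixCommit(subject) {
  const m = COMMIT_SUBJECT.exec(subject);
  return m ? { bugId: m[1], description: m[2] } : null;
}

console.log(parseFixCommit('fix(bug-hunter): BUG-3 \u2014 guard null user').bugId); // BUG-3
console.log(parseFixCommit('fix: BUG-3 - guard null user')); // null
```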
| **Lifecycle diagram:** | ||
| ``` | ||
| Orchestrator Fixer (in worktree) | ||
| | | | ||
| |-- prepare (worktree-harvest.cjs) -->| | ||
| | |-- read code | ||
| | |-- edit files | ||
| | |-- git add + commit per bug | ||
| | |-- report done | ||
| |<-- harvest (worktree-harvest.cjs) --| | ||
| |-- cleanup (worktree-harvest.cjs) | | ||
| |-- verify on fix branch | | ||
| ``` | ||
| --- | ||
| ## Context Pruning Rules | ||
| When passing data between phases, include only what the receiving role needs: | ||
| **To Skeptic:** For each bug: BUG-ID, severity, file, lines, claim, evidence, runtimeTrigger, cross-references. Omit: Hunter's internal reasoning, scan coverage stats, FILES SCANNED/SKIPPED metadata. | ||
| **To Referee:** Full Hunter findings + full Skeptic challenges. The Referee needs both sides to judge. | ||
| **To Fixer:** For each confirmed bug: BUG-ID, severity, file, line range, description, suggested fix direction, tech stack context. Omit: Skeptic challenges, Referee reasoning. |
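The "To Skeptic" pruning rule is a field projection over each finding. In the sketch below, the field names follow the compact list in the Role Reference table. The camelCase spelling `crossReferences` is an assumption, since the rule only says "cross-references".

```javascript
// Hypothetical projection for the "To Skeptic" rule: keep only the listed fields.
const SKEPTIC_FIELDS = [
  'bugId', 'severity', 'file', 'lines', 'claim', 'evidence', 'runtimeTrigger', 'crossReferences',
];

function pruneForSkeptic(findings) {
  return findings.map((finding) => {
    const out = {};
    for (const key of SKEPTIC_FIELDS) {
      if (key in finding) out[key] = finding[key]; // drop reasoning, coverage stats, etc.
    }
    return out;
  });
}
```

An allow-list is used instead of a deny-list so that new internal fields added to Hunter output do not leak to the Skeptic by default.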
Network access (supply chain risk): This module accesses the network. Found 2 instances in 1 package.
New author (supply chain risk): A new npm collaborator published a version of the package for the first time. New collaborators are usually benign additions to a project, but do indicate a change to the security surface area of a package. Found 1 instance in 1 package.
Shell access (supply chain risk): This module accesses the system shell. Accessing the system shell increases the risk of executing arbitrary code. Found 1 instance in 1 package.
Dynamic require (supply chain risk): Dynamic require can indicate the package is performing dangerous or unsafe dynamic code execution. Found 1 instance in 1 package.
Environment variable access (supply chain risk): Package accesses environment variables, which may be a sign of credential stuffing or data theft. Found 5 instances in 1 package.
Filesystem access (supply chain risk): Accesses the file system, and could potentially read sensitive data. Found 1 instance in 1 package.
Long strings (supply chain risk): Contains long string literals, which may be a sign of obfuscated or packed code. Found 1 instance in 1 package.
URL strings (supply chain risk): Package contains fragments of external URLs or IP addresses, which the package may be accessing at runtime. Found 1 instance in 1 package.

Network access (supply chain risk): This module accesses the network. Found 2 instances in 1 package.
Shell access (supply chain risk): This module accesses the system shell. Accessing the system shell increases the risk of executing arbitrary code. Found 1 instance in 1 package.
Environment variable access (supply chain risk): Package accesses environment variables, which may be a sign of credential stuffing or data theft. Found 3 instances in 1 package.
Filesystem access (supply chain risk): Accesses the file system, and could potentially read sensitive data. Found 1 instance in 1 package.
Long strings (supply chain risk): Contains long string literals, which may be a sign of obfuscated or packed code. Found 1 instance in 1 package.
URL strings (supply chain risk): Package contains fragments of external URLs or IP addresses, which the package may be accessing at runtime. Found 1 instance in 1 package.