@avcodes/mi
+1
-1
@@ -8,3 +8,3 @@ #!/usr/bin/env node | ||
| */ | ||
| import { createInterface } from 'readline'; import { spawn } from 'child_process'; import { readFileSync, existsSync, readdirSync } from 'fs'; import { homedir } from 'os'; const DIR = new URL('.', import.meta.url).pathname; if (!process.env.OPENAI_API_KEY && !process.argv.includes('-h')) { console.error('OPENAI_API_KEY required'); process.exit(1); } | ||
| import { createInterface } from 'readline'; import { spawn } from 'child_process'; import { readFileSync, existsSync, readdirSync } from 'fs'; import { homedir } from 'os'; const DIR = new URL('.', import.meta.url).pathname; process.env.MI_PATH = new URL(import.meta.url).pathname; if (!process.env.OPENAI_API_KEY && !process.argv.includes('-h')) { console.error('OPENAI_API_KEY required'); process.exit(1); } | ||
@@ -11,0 +11,0 @@ /* Tools the agent can invoke. */ |
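A note on the new `process.env.MI_PATH = new URL(import.meta.url).pathname` assignment: `URL.pathname` is returned percent-encoded, so an install path containing a space would reach `node "$MI_PATH"` with `%20` in it. The sketch below illustrates this with a made-up file URL (the path is hypothetical, not taken from the package). Node's `fileURLToPath` from `node:url` is the usual way to get a decoded filesystem path; `decodeURIComponent` is used here only to keep the snippet import-free.

```javascript
// Hypothetical install location containing a space; not taken from the package.
const scriptUrl = new URL('file:///opt/my%20tools/mi/index.mjs');

// What the diff stores in MI_PATH: the raw, percent-encoded pathname.
const rawPath = scriptUrl.pathname;

// A decoded variant that matches the on-disk name on POSIX systems.
const decodedPath = decodeURIComponent(scriptUrl.pathname);

console.log(rawPath);     // /opt/my%20tools/mi/index.mjs
console.log(decodedPath); // /opt/my tools/mi/index.mjs
```

For paths without spaces or non-ASCII characters the two values are identical, which is likely the common case.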
+1
-1
| { | ||
| "name": "@avcodes/mi", | ||
| "version": "1.3.0", | ||
| "version": "1.4.0", | ||
| "description": "agentic coding in 27 loc. a loop, two tools, and an llm.", | ||
@@ -5,0 +5,0 @@ "type": "module", |
| --- | ||
| name: debug | ||
| description: Structured root-cause investigation for bugs. Load when something is broken and the fix isn't obvious, to avoid patching symptoms. | ||
| description: Fix bugs, crashes, errors, or failing tests. Use when "it's broken", "getting an error", "test is failing", or the cause isn't obvious. | ||
| --- | ||
@@ -11,8 +11,16 @@ | ||
| - a failing test in the project's test suite. | ||
| The repro must fail deterministically. If it doesn't, shrink inputs and retry until it does. | ||
| The repro must fail deterministically. If it doesn't (flaky), shrink inputs, control concurrency, and fix the seed/clock. Do not proceed with a flaky repro — log it and ask the user for more constraints. | ||
| 2. **Observe.** Capture actual vs expected side-by-side. Collect stack traces, logs, and intermediate state (prints, `set -x`, debugger, logging). Write observations to `/tmp/mi-debug-notes.md` if the trail is long. Do not theorize yet. | ||
| 2. **Observe.** Capture actual vs expected side-by-side. Collect stack traces, logs, and intermediate state (`print`/`console.log`, `set -x`, a debugger, or structured logging). Write observations to `/tmp/mi-debug-notes.md` if the trail is long. Use this format: | ||
| 3. **Hypothesize.** State one explanation explicitly: "I believe X because Y." One at a time. If you have several, pick the cheapest to test first. | ||
| ``` | ||
| repro: python cli.py export --col foo | ||
| expected: CSV with column "foo" written to stdout | ||
| actual: ValueError: 'foo' not in index (traceback line 42 cli.py) | ||
| ``` | ||
| Do not theorize yet. Record only what you observe. | ||
| 3. **Hypothesize.** State one explanation explicitly: "I believe X because Y." One at a time. If you have several, pick the cheapest to test first. Write it down before you test it. | ||
| 4. **Test the hypothesis.** Change exactly one variable, re-run the repro, record the result. If the hypothesis is wrong, revert the change before trying the next one — do not stack speculative edits. | ||
@@ -24,2 +32,16 @@ | ||
| 6. **Add tests if requested.** If the original prompt included "add tests" or "write tests", the fix is confirmed — now call `skill("tdd")` and follow its body to structure the test-writing phase. Do not write tests inline without loading it. | ||
| **Writing or rewriting multi-line files.** Do not use `echo "...\n..."` (no real newlines without `-e`) and do not use `sed` to insert multi-line blocks (sed idempotency is hard to control and duplicates lines on retry). Use one of these instead: | ||
| - Heredoc (preferred): `cat > file.py <<'EOF'\n...\nEOF` | ||
| - Python write: `python3 -c "open('file.py','w').write('''...\n...\n''')"` | ||
| If the file already exists and you are patching one line, `sed -i 's/old/new/'` is fine. For anything involving indented blocks, write the whole file fresh. | ||
| **Red flags — stop and ask the user:** | ||
| - The repro requires production credentials, a live third-party service, or a database you cannot reset. | ||
| - More than three hypotheses have been tested and all were wrong (you are missing context). | ||
| - The bug disappears under observation (timing, logging, or sanitizer changes behavior). | ||
| - The failing code path is in a dependency you do not own; consider pinning/upgrading instead. | ||
| If you cannot reproduce after a reasonable effort, stop and say so. Request more information (exact command, environment, inputs, version). Do not guess-patch an unreproduced bug. |
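The three-line observation format from the Observe step can be produced mechanically. This is a minimal sketch; `formatObservation` is a hypothetical helper name, and the sample values are the ones shown in the skill text.

```javascript
// Hypothetical helper: renders one observation in the repro/expected/actual format.
function formatObservation({ repro, expected, actual }) {
  return [`repro: ${repro}`, `expected: ${expected}`, `actual: ${actual}`].join('\n');
}

const note = formatObservation({
  repro: 'python cli.py export --col foo',
  expected: 'CSV with column "foo" written to stdout',
  actual: "ValueError: 'foo' not in index (traceback line 42 cli.py)",
});
console.log(note);
```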
| --- | ||
| name: delegate | ||
| description: Spawn an isolated `mi -p '<prompt>'` subprocess via bash for self-contained work that would otherwise bloat the main context. Load for exploration/research, independent parallel subtasks, or one-shot transformations that don't need iteration. | ||
| description: Run parallel or isolated subagents for independent subtasks (research, parallel analysis, one-shot transforms) to avoid bloating the main context. | ||
| --- | ||
@@ -17,6 +17,11 @@ | ||
| **Spawning subagents:** use `node "$MI_PATH"` — the harness sets `MI_PATH` automatically: | ||
| ``` | ||
| node "$MI_PATH" -p '<prompt>' | ||
| ``` | ||
| Sequential (one task, wait for result): | ||
| ``` | ||
| mi -p 'Read /abs/path/foo.py and list every function that touches the database. Print one per line as file:line name.' | ||
| node "$MI_PATH" -p 'Read /abs/path/foo.py and list every function that touches the database. Print one per line as file:line name.' | ||
| ``` | ||
@@ -29,8 +34,16 @@ | ||
| ``` | ||
| mi -p '<prompt A>' # with bg=truthy -> pid:A log:/tmp/mi-A.log | ||
| mi -p '<prompt B>' # with bg=truthy -> pid:B log:/tmp/mi-B.log | ||
| node "$MI_PATH" -p '<prompt A>' # with bg=truthy -> pid:A log:/tmp/mi-A.log | ||
| node "$MI_PATH" -p '<prompt B>' # with bg=truthy -> pid:B log:/tmp/mi-B.log | ||
| ``` | ||
| Collect each `pid` and `log`. Background children are detached (the harness calls `unref`) so `wait` will not find them — poll with `kill -0 <pid> 2>/dev/null` instead (exit 0 = still running, exit 1 = finished). Once done, `cat` each log to read the full transcript. Prefer telling each subprocess to also write a compact result file under `/tmp/mi-*` so you don't have to parse transcript noise. | ||
| Collect each `pid` and `log`. Background children are detached (the harness calls `unref`) so `wait` will not find them — poll with `kill -0 <pid> 2>/dev/null` instead (exit 0 = still running, exit 1 = finished). | ||
| Reading long logs: subagent logs grow large. Do NOT `cat` the full log blindly — instead: | ||
| - `tail -n 50 /tmp/mi-A.log` to see the final output and whether the agent concluded | ||
| - `grep -n "RESULT\|ERROR\|DONE\|Traceback" /tmp/mi-A.log` to surface key lines quickly | ||
| - `wc -l /tmp/mi-A.log` to know total size before committing to a full read | ||
| - If the agent wrote a compact result file (e.g. `/tmp/mi-A.out`), read that instead — it's why you ask for one | ||
| Always prefer telling each subprocess to write a compact result file under `/tmp/mi-*` so you don't have to parse transcript noise. | ||
| Keep prompts short and specific. A vague delegation wastes a whole subprocess. |
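The `kill -0` check described above has a direct Node equivalent, sketched here for illustration. `isRunning` is a hypothetical helper name. `process.kill(pid, 0)` delivers no signal and only tests whether the process exists, throwing an error with code `ESRCH` when it does not. Treating `EPERM` as "still running" is an assumption that fits a process owned by another user.

```javascript
// Hypothetical helper: mirrors `kill -0 <pid> 2>/dev/null` from the shell.
function isRunning(pid) {
  try {
    process.kill(pid, 0); // signal 0: existence check only, nothing is delivered
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to someone else.
    // ESRCH (or anything else): treat the process as finished.
    return err.code === 'EPERM';
  }
}

console.log(isRunning(process.pid)); // the current process is alive while this runs
```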
| --- | ||
| name: explore | ||
| description: Answer targeted questions about an unfamiliar codebase by fanning plausible code clusters out to subagents with citation-grade summaries. Load when the user asks a specific question about how/where code does something. | ||
| description: Answer "how does X work", "where is X defined", or "trace through Y" questions about a codebase using parallel subagent searches with cited summaries. | ||
| --- | ||
@@ -12,4 +12,16 @@ | ||
| 3. Spawn one subagent per cluster via `mi -p '<prompt>'` with `bg=truthy` — the harness returns `pid:X log:/tmp/mi-X.log` and detaches the child; do NOT append `&`. Each prompt must include: the question verbatim, the absolute paths in scope, instruction NOT to recurse outside those paths, and the summary-file path. | ||
| 3. Spawn one subagent per cluster via `node "$MI_PATH" -p '<prompt>'` with `bg=truthy` — the harness returns `pid:X log:/tmp/mi-X.log` and detaches the child; do NOT append `&`. Use this prompt template for each subagent: | ||
| ``` | ||
| Question: <the sharpened question, verbatim> | ||
| Scope: <absolute paths in this cluster only> | ||
| Do NOT read files outside the scope above. | ||
| Write your findings to /tmp/mi-explore-<cluster>.md using this exact format: | ||
| STATUS: complete | partial | blocked | ||
| SCOPE: <paths you actually read> | ||
| ANSWER: <1-3 sentences, no file:line refs here> | ||
| CITATIONS: <path/to/file.ext:<line> — <quoted excerpt>, one per line> | ||
| FOLLOW_UPS: <paths outside scope worth checking next, if any> | ||
| ``` | ||
| 4. Summary contract — every subagent writes `/tmp/mi-explore-<cluster>.md` with exactly these fields: | ||
@@ -22,4 +34,4 @@ - `STATUS:` `complete` | `partial` | `blocked` | ||
| 5. Poll with `kill -0 <pid> 2>/dev/null` (exit 0 = still running, 1 = done); do not `sleep`-poll and do not `wait`. Read each summary. Compose the final answer by stitching cluster-local `ANSWER`s together. The final output MUST include every `file:line` from every subagent's `CITATIONS` block — one per line, verbatim, in the output format the user requested. If you find yourself summarizing, deduping, or compressing `CITATIONS`, stop and list them explicitly. If every subagent returned `not found here`, either the question was mis-scoped or the cluster partition missed the right directory — say so plainly rather than invent an answer. | ||
| 5. While subagents run, draft FOLLOW_UPS you'll pursue if clusters return "not found here". Check completion with `kill -0 <pid> 2>/dev/null` (exit 0 = still running, exit 1 = done) — do NOT use `wait` (detached children are unreachable by `wait`). Once a pid exits, `cat /tmp/mi-explore-<cluster>.md` to read the result. Compose the final answer by stitching cluster-local `ANSWER`s together. The final output MUST include every `file:line` from every subagent's `CITATIONS` block — one per line, verbatim. If you find yourself summarizing or compressing `CITATIONS`, stop and list them explicitly instead. If every subagent returned `not found here`, say so plainly — either the question was mis-scoped or the cluster partition missed the right directory. | ||
| On a single-GPU local endpoint subagents serialize at the model server — this pattern saves context, not wall-clock. On a hosted endpoint the fan-out is genuinely parallel. |
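Step 5's rule (every citation, verbatim, nothing compressed) can be sketched as a small parser over the summary files. The field names come from the template above; the function names, the sample paths, and the choice to treat unprefixed lines as continuations of the previous field are assumptions made for illustration.

```javascript
const FIELDS = ['STATUS', 'SCOPE', 'ANSWER', 'CITATIONS', 'FOLLOW_UPS'];

// Split one summary file into its fields; lines without a known
// field prefix are treated as continuations of the previous field.
function parseSummary(text) {
  const fields = {};
  let current = null;
  for (const line of text.split('\n')) {
    const match = /^([A-Z_]+):\s*(.*)$/.exec(line);
    if (match && FIELDS.includes(match[1])) {
      current = match[1];
      fields[current] = match[2] ? [match[2]] : [];
    } else if (current && line.trim()) {
      fields[current].push(line.trim());
    }
  }
  return fields;
}

// Every citation from every subagent, in order, with no dedupe or compression.
function collectCitations(summaries) {
  return summaries.flatMap((text) => parseSummary(text).CITATIONS || []);
}

const DASH = '\u2014'; // the separator used in the CITATIONS template
// Hypothetical summaries from two clusters.
const clusterA = [
  'STATUS: complete',
  'SCOPE: /repo/src/db',
  'ANSWER: Connections come from a shared pool.',
  'CITATIONS:',
  `/repo/src/db/pool.py:12 ${DASH} "pool = Pool(size=5)"`,
  `/repo/src/db/pool.py:40 ${DASH} "return pool.acquire()"`,
  'FOLLOW_UPS:',
].join('\n');
const clusterB = [
  'STATUS: partial',
  'SCOPE: /repo/src/api',
  'ANSWER: Handlers borrow a connection per request.',
  `CITATIONS: /repo/src/api/handlers.py:7 ${DASH} "conn = get_conn()"`,
].join('\n');

const citations = collectCitations([clusterA, clusterB]);
console.log(citations.length); // 3
```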
+17
-3
| --- | ||
| name: plan | ||
| description: Record a short strategy doc at /tmp/mi-<slug>/plan.md before non-trivial work. Load when a task needs more than one step, spans multiple files, or has unclear direction. | ||
| description: Write a strategy doc before starting multi-step, multi-file, or unclear work. Use when asked to "plan out", "figure out the approach", or before any task needing more than one step. | ||
| --- | ||
@@ -8,4 +8,18 @@ | ||
| Pick a short kebab-case `<slug>` for the task (e.g. `auth-refactor`, `fix-retry-bug`) and reuse the same slug for any execution-state list under `/tmp/mi-<slug>/tasks.md` so plan and tasks move together. Create the dir once with `mkdir -p /tmp/mi-<slug>`. If a plan already exists for the task, reuse the same slug rather than starting a new one (`ls -d /tmp/mi-*/ 2>/dev/null` to check). | ||
| Pick a short kebab-case `<slug>` for the task (e.g. `auth-refactor`, `fix-retry-bug`) and reuse the same slug for any execution-state list under `/tmp/mi-<slug>/tasks.md` so plan and tasks move together. If a plan already exists for the task, reuse the same slug rather than starting a new one (`ls -d /tmp/mi-*/ 2>/dev/null` to check). | ||
| Create the directory and write the plan file with one command: | ||
| ``` | ||
| mkdir -p /tmp/mi-<slug> && cat > /tmp/mi-<slug>/plan.md <<'EOF' | ||
| # Goal | ||
| ... | ||
| # Approach | ||
| - ... | ||
| # Risks / Open Questions | ||
| - ... | ||
| EOF | ||
| ``` | ||
| **Slug collisions:** When several mi sessions may run at the same time (e.g. parallel subagents each starting their own plan), append a timestamp or random suffix to avoid clobbering each other's files: `auth-refactor-$(date +%s)`. Single-session sequential work does not need this. | ||
| Write `/tmp/mi-<slug>/plan.md` with three sections, nothing else: | ||
@@ -26,3 +40,3 @@ | ||
| This is strategy, not a task list. No checkboxes, no status fields — execution state belongs in `/tmp/mi-<slug>/tasks.md`, not here. | ||
| This is strategy, not a task list. No checkboxes, no status fields — execution state belongs in `/tmp/mi-<slug>/tasks.md`, not here. To track step-by-step progress alongside this plan, call `skill("tasks")` and follow its body. | ||
@@ -29,0 +43,0 @@ Re-read `/tmp/mi-<slug>/plan.md` before each major step. If reality diverges from the plan, revise the file before continuing — do not let it rot. |
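One caveat on the `$(date +%s)` suffix: it has one-second resolution, so two subagents started within the same second would still pick the same slug. A possible variation, sketched here as an assumption and not as the package's behavior, adds the process id as a second component. `uniqueSlug` is a hypothetical name.

```javascript
// Hypothetical helper: base slug + epoch seconds + process id.
function uniqueSlug(base, nowMs = Date.now(), pid = process.pid) {
  const seconds = Math.floor(nowMs / 1000); // same granularity as `date +%s`
  return `${base}-${seconds}-${pid}`;
}

// Fixed inputs make the result reproducible for the example.
console.log(uniqueSlug('auth-refactor', 1700000000000, 4242)); // auth-refactor-1700000000-4242
```

Concurrent processes on one machine have distinct pids, so the pair stays distinct even when the timestamps match.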
| --- | ||
| name: refactor | ||
| description: Restructure code without changing behavior, with a subagent-run callsite sweep and test gate between every step. Load when renaming, extracting, moving, or splitting code. | ||
| description: Restructure code without changing behavior. Triggers: "rename X to Y", "move X to Y", "extract X into Y", "clean up / split / deduplicate code" — one transformation at a time, tests kept green. | ||
| --- | ||
@@ -8,9 +8,21 @@ | ||
| 1. Green gate. Before step 2, call `skill("verify")` and follow its body to run the project's tests. Do not proceed without loading it. If red, stop — refactoring on a broken suite is untrackable, you cannot tell whether your change introduced failures. Ask the user to fix or accept current red before continuing. | ||
| ## Scope rules — read before touching anything | ||
| 2. Name the transformation in one sentence: "rename `foo` → `bar`", "extract `X` from `file_a.py` into `file_b.py`", "split `big_module.py` into `a.py` and `b.py` by concern". ONE transformation per pass. Never combine rename + extract + move in a single step. | ||
| - **One transformation per pass.** Never combine rename + extract + move in a single step. | ||
| - **No opportunistic fixes.** If you see a bug, a missing test, or a style inconsistency while refactoring, write it to a TODO file and continue. Fix it in a separate commit after the refactor is committed. | ||
| - **No incidental reformatting.** Do not change indentation, trailing commas, import order, or quote style unless the transformation requires it. Noise in the diff obscures real changes and breaks reviewers. | ||
| - **Feature creep is a failure mode.** If the transformation turns into "while I'm here I'll also…" — stop. Complete the stated transformation and nothing else. | ||
| 3. For any rename / move / signature change, delegate a callsite sweep before touching code. Spawn a subagent via `mi -p '<prompt>'` with `bg=truthy` — the harness returns `pid:X log:/tmp/mi-X.log` and detaches the child; do NOT append `&`. The prompt must include: the symbol, the repo root, and instruction to list every hit with context. | ||
| ## Steps | ||
| 4. Summary contract — the callsite subagent writes `/tmp/mi-refactor-callsites-<symbol>.md` with: | ||
| 1. **Green gate.** Call `skill("verify")` and follow its body to run the project's tests. Do not proceed without loading it. | ||
| - If red: stop — refactoring on a broken suite is untrackable; you cannot tell whether your change introduced failures. Ask the user to fix or accept the current red baseline before continuing. | ||
| - **If no tests exist:** note this explicitly. Proceed with extra caution — you have no automated safety net. Limit each transformation to the smallest possible diff, and manually verify the observable behavior (run the program, call the function, check the output) before committing. | ||
| 2. **Name the transformation.** One sentence: "rename `foo` → `bar`", "extract `parseDate` from `utils.py` into `date_utils.py`", "split `big_module.py` into `reader.py` and `writer.py` by concern". Write it down; this sentence becomes your commit message subject. | ||
| 3. **Callsite sweep** (required for any rename / move / signature change). Spawn a subagent via `node "$MI_PATH" -p '<prompt>'` with `bg=truthy` — the harness returns `pid:X log:/tmp/mi-X.log` and detaches the child; do NOT append `&`. The prompt must include: the symbol, the repo root, and instruction to list every hit with context. | ||
| 4. **Summary contract** — the callsite subagent writes `/tmp/mi-refactor-callsites-<symbol>.md` with: | ||
| - `STATUS:` `complete` | `partial` | `blocked` | ||
@@ -20,6 +32,24 @@ - `HITS:` `path/to/file.ext:<line>: <surrounding code excerpt>` per occurrence | ||
| 5. Apply the transformation across every non-ambiguous hit in one pass. Re-run the test suite (reuse the commands from the `verify` load in step 1). If red: `git checkout -- .` (or equivalent revert) — do NOT patch forward. A refactor that needs a fix is a failed refactor; start over with a smaller transformation. | ||
| 5. **Apply.** Transform every non-ambiguous hit in one pass. Re-run the test suite (reuse the commands from the `verify` load in step 1). If red: `git checkout -- .` (or equivalent revert) — do NOT patch forward. A refactor that needs a fix is a failed refactor; start over with a smaller transformation. | ||
| 6. Commit (or mark the pass complete) before starting the next transformation. If you notice a bug during a refactor, write it to a TODO list and finish the refactor first. Never ship behavior changes and structural changes in the same commit. | ||
| 6. **Commit.** Use the sentence from step 2 as the commit message subject. Mark the pass complete before starting the next transformation. Never ship behavior changes and structural changes in the same commit. | ||
| Poll the callsite subagent with `kill -0 <pid> 2>/dev/null` (exit 0 = still running, 1 = done); do not `sleep`-loop and do not `wait`. Confirm `STATUS: complete` before consuming `HITS` — a `partial` callsite list leads to half-applied renames. | ||
| ## Definition of done | ||
| A refactor pass is done when all of the following hold: | ||
| - The transformation named in step 2 is fully applied (no half-renamed callsites, no orphaned imports). | ||
| - The test suite (or manual verification if no tests) is green. | ||
| - The diff contains no behavioral changes — no logic added, no defaults changed, no error handling altered. | ||
| - The commit is made and contains only the structural change. | ||
| If any condition is not met, the pass is not done. | ||
| ## Red Flags — stop and ask the user | ||
| - The green gate (step 1) is red and the user has not explicitly accepted the baseline — do not refactor on broken code. | ||
| - The callsite subagent returns `STATUS: partial` — you have an incomplete hit list; do not apply the rename until you have a complete sweep. | ||
| - After applying the transformation the diff touches more than ~50 lines of logic (not counting moved/renamed identifiers) — the scope has likely crept; revert and re-scope. | ||
| - A test that was passing before the refactor now fails in a way that looks like a logic change, not a rename miss — the transformation altered behavior; revert and investigate before proceeding. |
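The rule that a `partial` sweep must never feed a rename can be expressed as a gate in front of the hit list. This is a sketch under assumptions: `planRename` is a hypothetical name, the sample paths are invented, and the `HITS` lines are assumed to follow the `path/to/file.ext:<line>: <excerpt>` shape from the summary contract.

```javascript
// Hypothetical gate: only hand back hits when the sweep reports STATUS: complete.
function planRename(summary) {
  const status = (/^STATUS:\s*(\w+)/m.exec(summary) || [])[1];
  if (status !== 'complete') {
    return { apply: false, reason: `callsite sweep is ${status || 'missing a STATUS line'}` };
  }
  const hitsByFile = {};
  for (const line of summary.split('\n')) {
    // Matches `path:<line>: <excerpt>`, with an optional "- " bullet in front.
    const hit = /^(?:-\s+)?(.+?):(\d+):\s?(.*)$/.exec(line.trim());
    if (!hit) continue;
    hitsByFile[hit[1]] = hitsByFile[hit[1]] || [];
    hitsByFile[hit[1]].push(Number(hit[2]));
  }
  return { apply: true, hitsByFile };
}

const completeSweep = [
  'STATUS: complete',
  'HITS:',
  'src/a.py:10: foo(1)',
  'src/a.py:22: x = foo(2)',
  'tests/test_a.py:5: from a import foo',
].join('\n');
const partialSweep = 'STATUS: partial\nHITS:\nsrc/a.py:10: foo(1)';

console.log(planRename(completeSweep).hitsByFile); // hits grouped per file
console.log(planRename(partialSweep).apply);       // false
```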
| --- | ||
| name: review | ||
| description: Produce a pragmatic code review by partitioning a diff across subagents, one per concern, aggregating file-anchored findings. Load when asked to review a PR, branch, or staged diff. | ||
| description: Review code, a PR, branch, or staged diff. Use when asked to "review", "give feedback", "check this PR", or "what's wrong with this diff". | ||
| --- | ||
@@ -14,3 +14,3 @@ | ||
| 4. Spawn one subagent per partition via `mi -p '<prompt>'` with `bg=truthy` — the harness returns `pid:X log:/tmp/mi-X.log` and detaches the child; do NOT append `&`. Each prompt must include: the intent (one line), the partition's diff path, and exactly these four axes to check: correctness, scope creep, test coverage, convention fit with surrounding code. | ||
| 4. Spawn one subagent per partition via `node "$MI_PATH" -p '<prompt>'` with `bg=truthy` — the harness returns `pid:X log:/tmp/mi-X.log` and detaches the child; do NOT append `&`. Each prompt must include: the intent (one line), the partition's diff path, and exactly these four axes to check: correctness, scope creep, test coverage, convention fit with surrounding code. | ||
@@ -22,4 +22,18 @@ 5. Summary contract — each subagent writes `/tmp/mi-review-<slug>.md` with: | ||
| Example of a well-formed FINDINGS section (three entries, three different axes): | ||
| ``` | ||
| STATUS: complete | ||
| FINDINGS: | ||
| - src/auth/login.py:84 — correctness — `user.id` can be None when OAuth flow skips email verification; downstream callers assume it's always set | ||
| - src/auth/login.py:91 — test coverage — happy-path OAuth login has no test; the only test in test_login.py covers password auth only | ||
| - src/auth/middleware.py:12 — convention fit — project uses `logger = logging.getLogger(__name__)` everywhere except here, which calls `print()` directly | ||
| UNRESOLVED: | ||
| - Is the `None` user.id case reachable in production? Depends on provider config not visible in this slice. | ||
| ``` | ||
| Mimic this format exactly: one bullet per finding, written as a dash-separated triple (`path:line`, axis, description), with no hedging prose. | ||
| 6. Aggregate. Dedupe overlapping findings across partitions. Group the output by severity or by file (ask the user if unclear). If every subagent left an axis empty across the whole diff — e.g. nobody flagged missing tests — call that out as a coverage gap, not a clean bill. | ||
| **Handling missing or blocked subagents:** Before consuming a result file, check that it exists and that `STATUS:` is `complete` or `partial`. If the file is missing or `STATUS: blocked`, do NOT silently skip it — insert a line in the aggregate output: `BLOCKED: <subagent-slug> (no output / status: blocked)`. Continue aggregating remaining partitions normally. Surface blocked partitions to the user at the end so they know that slice was not reviewed. | ||
| Poll subagents with `kill -0 <pid> 2>/dev/null` (exit 0 = still running, 1 = done); do not `sleep`-poll and do not `wait`. Read each `/tmp/mi-review-<slug>.md` by grepping `^STATUS:` first to confirm completion before consuming findings. |
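The aggregation and blocked-partition rules above can be sketched as one function. Names and sample data are hypothetical, and two simplifications are assumed: dedupe only removes findings that are identical as text, and the `BLOCKED:` line reports either `no output` or the status found.

```javascript
// results maps partition slug -> file contents, or null when the file is missing.
function aggregate(results) {
  const findings = new Set(); // a Set drops verbatim duplicates across partitions
  const blocked = [];
  for (const [slug, text] of Object.entries(results)) {
    const status = text ? (/^STATUS:\s*(\w+)/m.exec(text) || [])[1] : undefined;
    if (status !== 'complete' && status !== 'partial') {
      const why = text ? `status: ${status || 'unknown'}` : 'no output';
      blocked.push(`BLOCKED: ${slug} (${why})`);
      continue;
    }
    let section = null;
    for (const line of text.split('\n')) {
      const header = /^([A-Z_]+):/.exec(line);
      if (header) section = header[1];
      else if (section === 'FINDINGS' && line.startsWith('- ')) findings.add(line.slice(2));
    }
  }
  return { findings: [...findings], blocked };
}

const SEP = ' \u2014 '; // separator used in the FINDINGS example
const finding = ['src/auth/login.py:84', 'correctness', 'user.id can be None'].join(SEP);
const report = aggregate({
  auth: `STATUS: complete\nFINDINGS:\n- ${finding}\nUNRESOLVED:\n- Is the None case reachable?`,
  api: `STATUS: complete\nFINDINGS:\n- ${finding}`,
  db: 'STATUS: blocked',
  ui: null,
});
console.log(report.blocked); // one line per partition that was not reviewed
```

Bullets under `UNRESOLVED:` are deliberately left out of `findings`, so open questions are not reported as defects.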
| --- | ||
| name: tasks | ||
| description: Track execution state for multi-step work as a checkbox list at /tmp/mi-<slug>/tasks.md. Load when work has more than one step; skip for single-step jobs. State tracking only — use `plan` for strategy. | ||
| description: Track a numbered list of steps to completion. Use when the user gives explicit numbered steps ("1) do X 2) do Y"), says "I need to do N things", or when a multi-step job needs a checklist. Distinct from `plan` (strategy): tasks tracks what's done/pending during execution. | ||
| --- | ||
@@ -8,4 +8,6 @@ | ||
| Reuse the `<slug>` from any existing `/tmp/mi-<slug>/plan.md` so plan and tasks share `/tmp/mi-<slug>/`. If none exists, pick a kebab-case slug and `mkdir -p /tmp/mi-<slug>` first. | ||
| Reuse the `<slug>` from any existing `/tmp/mi-<slug>/plan.md` so plan and tasks share `/tmp/mi-<slug>/`. If none exists, pick a kebab-case slug and `mkdir -p /tmp/mi-<slug>` first. If there is no plan yet and the work has unclear direction, call `skill("plan")` first to record strategy before tracking steps. | ||
| **Slug collisions:** If several mi sessions may run at the same time (e.g. parallel subagents each starting fresh work), a plain descriptive slug like `fix-auth` risks collision. Append a short timestamp or random suffix: `fix-auth-$(date +%s)` or `fix-auth-$(head -c4 /dev/urandom | xxd -p)`. Single-agent sequential work does not need this. | ||
| Maintain `/tmp/mi-<slug>/tasks.md` as a flat checkbox list: | ||
@@ -19,5 +21,14 @@ | ||
| **Work loop — repeat until all tasks are `[x]`:** | ||
| 1. Pick the next `- [ ]` task. | ||
| 2. Rewrite the file with that task as `- [~]` (in-progress). | ||
| 3. **Actually do the task** — run commands, edit files, verify the result. | ||
| 4. Only after the work is confirmed done, rewrite the file with that task as `- [x]`. | ||
| 5. Move to the next `- [ ]` task. | ||
| Do not mark a task `- [x]` before doing the work. Do not batch-mark multiple tasks at once. | ||
| Rules: | ||
| - At most ONE `- [~]` at any time. | ||
| - Flip a task to `- [~]` BEFORE starting it. Flip to `- [x]` IMMEDIATELY after finishing it. Never batch updates across multiple tasks. | ||
| - Each task is one concrete, verifiable step. If a task needs sub-steps, either rewrite it or split it in place. | ||
@@ -24,0 +35,0 @@ - When scope grows, append new `- [ ]` entries. Do not silently drop work. |
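The work loop and the "at most one `- [~]`" rule amount to a small state machine over the checkbox file. A minimal sketch, with hypothetical function names and sample tasks, operating on the file contents as a string:

```javascript
// Flip the first pending task to in-progress; refuse if one is already running.
function startNext(tasks) {
  if (tasks.includes('- [~]')) throw new Error('a task is already in progress');
  return tasks.replace('- [ ]', '- [~]'); // a string pattern replaces the first match only
}

// Flip the in-progress task to done, once the work has been confirmed.
function finishCurrent(tasks) {
  return tasks.replace('- [~]', '- [x]');
}

const before = '- [x] read config\n- [ ] add retry\n- [ ] update docs';
const during = startNext(before);
const after = finishCurrent(during);
console.log(during);
console.log(after);
```

Because `startNext` throws while a task is in progress, batch-marking several tasks at once is not possible through these two calls.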
+26
-9
| --- | ||
| name: tdd | ||
| description: Red/green/refactor loop for adding new behavior. Load when extending code that already has a test harness; one behavior at a time. | ||
| description: Add new behavior test-first (write failing test → make it pass → refactor). Use when asked to "add a test for", "implement X with tests", or extending a codebase that already has a test suite. | ||
| --- | ||
@@ -8,14 +8,18 @@ | ||
| **Do not use TDD for:** exploratory spikes, one-off scripts, or behavior that is purely UI layout. Also do not use it when the test harness would require more effort to bootstrap than the feature itself — ask the user instead. | ||
| For each new behavior, run the loop below exactly once. Do not queue up multiple failing tests. | ||
| 1. Red — write one failing test | ||
| - Pick the smallest next behavior. Name the test after the behavior, not the implementation. | ||
| 1. **Red — write one failing test** | ||
| - Pick the smallest next behavior. "Smallest" means: one function, one branch, one edge case. Not "the whole feature." | ||
| - Example: if adding CSV export, first test is `test_export_returns_bytes`, not `test_export_all_columns_with_header_and_quoting`. | ||
| - Name the test after the behavior, not the implementation: `test_negative_price_is_rejected`, not `test_validate_price_calls_check_sign`. | ||
| - Place it beside sibling tests; match their style and imports. | ||
| - Assert on observable output, not internals. | ||
| - Assert on observable output (return value, file written, exception raised, stdout), not on internal state (private fields, method call counts). | ||
| 2. Red — confirm it fails for the right reason | ||
| 2. **Red — confirm it fails for the right reason** | ||
| - Run only the new test (e.g. `pytest path::test_name`, `vitest run -t 'name'`, `cargo test name`). | ||
| - The failure must be an assertion mismatch. If it is `ImportError`, `SyntaxError`, `ModuleNotFoundError`, or a typo — fix the test itself, then re-run. Do not proceed until the failure is a real assertion. | ||
| - The failure must be an assertion mismatch or `NotImplementedError`. If it is `ImportError`, `SyntaxError`, `ModuleNotFoundError`, or a typo, fix the test itself, then re-run. Do not proceed until the failure is a real assertion mismatch or `NotImplementedError`. | ||
| 3. Green — minimum code to pass | ||
| 3. **Green — minimum code to pass** | ||
| - Hardcode if that is genuinely the simplest thing; generality comes from the next failing test, not speculation. | ||
@@ -25,10 +29,23 @@ - Touch only files required to satisfy this test. | ||
| 4. Green — run the relevant subset | ||
| 4. **Green — run the relevant subset** | ||
| - Run the whole file or module under test, not just the one case. Confirm green. | ||
| - If anything else went red, you broke something — fix or revert before moving on. | ||
| 5. Refactor | ||
| 5. **Refactor** | ||
| - Rename, dedupe, extract — structural changes only, no new behavior. | ||
| - Re-run the same subset. Still green, or revert the refactor. | ||
| - For significant structural changes (rename across many callsites, extract to a new module, split a large class), call `skill("refactor")` and follow its body — it handles callsite sweeps and the green gate properly. | ||
| **Writing or rewriting multi-line files.** Do not use `echo "...\n..."` (no real newlines without `-e`) and do not use `sed` to insert multi-line blocks (sed idempotency is hard to control and duplicates lines on retry). Use one of these instead: | ||
| - Heredoc (preferred): `cat > file.py <<'EOF'\n...\nEOF` | ||
| - Python write: `python3 -c "open('file.py','w').write('''...\n...\n''')"` | ||
| If the file already exists and you are patching one line, `sed -i 's/old/new/'` is fine. For anything involving indented blocks, write the whole file fresh. | ||
| **Common anti-patterns to avoid:** | ||
| - Writing multiple failing tests before any green (defeats the feedback loop). | ||
| - Asserting on internal state or mocks when the public contract is testable. | ||
| - Skipping step 2 and discovering the test never actually ran. | ||
| - Refactoring production code and tests simultaneously in step 5. | ||
| Then stop and return to the caller, or start the loop again for the next behavior. Before declaring the larger task done, call `skill("verify")` and follow its body. Do not declare done without loading it. |
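To make steps 1 and 3 concrete, here is one loop iteration in miniature, written in plain JavaScript without a test framework so it runs on its own. `parsePrice` and the test name are hypothetical. The test is named after the behavior and asserts on an observable outcome (the exception raised); the implementation is only as general as this one test requires.

```javascript
// Green: the minimum code that satisfies the single test below.
function parsePrice(input) {
  const value = Number(input);
  if (value < 0) throw new RangeError('price must not be negative');
  return value;
}

// Red first: named after the behavior, asserting on the raised exception.
function testNegativePriceIsRejected() {
  let rejected = false;
  try {
    parsePrice('-1');
  } catch (err) {
    rejected = err instanceof RangeError;
  }
  if (!rejected) throw new Error('expected RangeError for a negative price');
}

testNegativePriceIsRejected();
console.log('testNegativePriceIsRejected passed');
```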
| --- | ||
| name: verify | ||
| description: Run the project's lint/typecheck/test/build after code changes; never declare work done on red output. | ||
| description: Run lint, typecheck, tests, or build — and fix any failures found. Use when asked "does it build", "run the tests", "check for lint errors", "find and fix lint errors", or after making code changes. | ||
| --- | ||
@@ -18,2 +18,22 @@ | ||
| ## Reading tool output | ||
| Common output patterns — match on these, not on exit code alone (some tools exit 0 even with warnings): | ||
| - **ESLint / Biome / Oxc:** lines like `path/to/file.js:12:5: error: <rule-id> — <message>`. Exit 1 = at least one `error`; `warning` lines are non-blocking unless the project sets `--max-warnings 0`. | ||
| - **TypeScript (`tsc --noEmit`):** `src/foo.ts(34,7): error TS2345: Argument of type 'string' is not assignable…`. Any `error TS` line is blocking; `TS2304` (cannot find name) often means a missing import, not a type error per se. | ||
| - **mypy / pyright / pyrefly:** `file.py:10: error: Incompatible return value type`. `error:` = blocking; `note:` = informational. A clean run ends with `Success: no issues found` or `0 errors`. | ||
| - **pytest:** `FAILED tests/test_foo.py::test_bar - AssertionError`. Summary line `X failed, Y passed` tells you scope. An `ERROR` (capital) means a collection/fixture error, not a test failure, and needs a different fix. | ||
| - **cargo check / clippy:** `error[E0308]: mismatched types` = blocking. `warning: unused variable` = non-blocking unless the project forbids warnings (`#![deny(warnings)]` or `RUSTFLAGS=-D warnings`). | ||
| - **go vet / golangci-lint:** `path/file.go:12:5: <linter>: <message>`. Exit 1 on any finding. | ||
| ## Partial failures | ||
| When a check emits both errors and warnings: | ||
| - **Errors only:** fix all before continuing — they are always blocking. | ||
| - **Warnings only:** check whether the project treats warnings as errors (e.g., `eslint --max-warnings 0`, `pytest -W error`, `RUSTFLAGS=-D warnings`). If yes, fix them. If no, note them but do not block. | ||
| - **Errors + warnings:** fix the errors first. Re-run to confirm errors are gone; then apply the warnings rule above. | ||
| - **Pre-existing red on untouched code:** do not silently skip. Say explicitly: "These N failures existed before my changes: [list]. They are not caused by this edit." If you cannot confirm pre-existing status via `git stash && { <check>; git stash pop; }` (the `;` restores your changes even when the check fails), say so. | ||
| On red: | ||
@@ -24,1 +44,8 @@ - Do NOT report the task as done. Read the failing output, fix the underlying cause, re-run the same command, then re-run earlier stages to confirm no regression. | ||
| If after checking the three sources above you find no verification commands, tell the user plainly that the project has none configured. Do not scaffold one and do not fall back to guessed defaults. | ||
| ## Red Flags — stop and ask the user | ||
| - The check command itself errors out before running (missing tool, wrong Node/Python version, misconfigured env) — do not guess a workaround; report the setup gap. | ||
| - Test suite takes more than 5 minutes with no clear slow-test cause — wrap with `timeout`, report, and ask whether to proceed. | ||
| - A lint rule is silenced by a suppression comment (e.g., `// eslint-disable-next-line`, `# noqa`, `#[allow(…)]`) inside code you just wrote. That is suppression, not a fix; remove the comment and fix the root cause instead. | ||
| - Exit code and output disagree (exit 0 but output contains `error:` lines, or exit 1 but no error text found) — the tool may be misconfigured; report verbatim and do not assume green. |
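The last red flag, exit code and output disagreeing, can be checked mechanically. The sketch below is an assumption-laden illustration: `classifyRun` and `ERROR_LINE` are hypothetical names, and the pattern only covers the error shapes quoted in the "Reading tool output" list (`error:`, `error TS…`, `error[E…]`, and pytest's `FAILED`).

```javascript
// Error shapes taken from the tool list above; not an exhaustive pattern.
const ERROR_LINE = /(^|[\s:])error(:|\[| TS\d+)|^FAILED /m;

function classifyRun(exitCode, output) {
  const hasErrorText = ERROR_LINE.test(output);
  if (exitCode === 0 && !hasErrorText) return 'green';
  if (exitCode !== 0 && hasErrorText) return 'red';
  return 'inconsistent'; // report verbatim; do not assume green
}

console.log(classifyRun(0, 'Success: no issues found in 3 source files'));        // green
console.log(classifyRun(1, 'src/foo.ts(34,7): error TS2345: Argument of type')); // red
console.log(classifyRun(0, 'file.py:10: error: Incompatible return value type')); // inconsistent
```

A summary such as `0 errors` does not match the pattern, so a clean run that mentions the word is still classified as green.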
Network access
Supply chain risk: This module accesses the network.
Found 1 instance in 1 package
Shell access
Supply chain risk: This module accesses the system shell. Accessing the system shell increases the risk of executing arbitrary code.
Found 1 instance in 1 package
Environment variable access
Supply chain risk: Package accesses environment variables, which may be a sign of credential stuffing or data theft.
Found 6 instances in 1 package
Filesystem access
Supply chain risk: Accesses the file system, and could potentially read sensitive data.
Found 1 instance in 1 package
Long strings
Supply chain risk: Contains long string literals, which may be a sign of obfuscated or packed code.
Found 1 instance in 1 package
Network access
Supply chain risk: This module accesses the network.
Found 1 instance in 1 package
Shell access
Supply chain risk: This module accesses the system shell. Accessing the system shell increases the risk of executing arbitrary code.
Found 1 instance in 1 package
Environment variable access
Supply chain risk: Package accesses environment variables, which may be a sign of credential stuffing or data theft.
Found 5 instances in 1 package
Filesystem access
Supply chain risk: Accesses the file system, and could potentially read sensitive data.
Found 1 instance in 1 package
Long strings
Supply chain risk: Contains long string literals, which may be a sign of obfuscated or packed code.
Found 1 instance in 1 package