agent-powerups
Advanced tools
| --- | ||
| name: agent-readable-docs | ||
| description: Use when writing technical documentation that needs to be readable by both humans and AI models, converting existing docs to HADS format, validating a HADS document, or optimizing documentation for token-efficient AI consumption. | ||
| --- | ||
| # Human-AI Document Standard (HADS) | ||
| --- | ||
| ## AI READING INSTRUCTION | ||
| This skill teaches the agent how to read, generate, and validate HADS documents. | ||
| Read all `[SPEC]` blocks before responding to any HADS-related request. | ||
| Read `[NOTE]` blocks if you need context on intent or edge cases. | ||
| --- | ||
| ## 1. WHAT IS HADS | ||
| **[SPEC]** | ||
| - HADS = Human-AI Document Standard | ||
| - Convention for Markdown technical documentation | ||
| - Four block types: `**[SPEC]**`, `**[NOTE]**`, `**[BUG]**`, `**[?]**` | ||
| - Every HADS document requires: H1 title, version declaration, AI manifest | ||
| - AI manifest appears before first content section, tells AI what to read/skip | ||
| - File extension: `.md` — standard Markdown, no tooling required | ||
| --- | ||
| ## 2. BLOCK TYPES | ||
| **[SPEC]** | ||
| ``` | ||
| **[SPEC]** Authoritative fact. Terse. Bullet lists, tables, code. AI reads always. | ||
| **[NOTE]** Human context, history, examples. AI may skip. | ||
| **[BUG]** Verified failure + fix. Required fields: symptom, cause, fix. Always read. | ||
| **[?]** Unverified / inferred. Lower confidence. Always flagged. | ||
| ``` | ||
| Block tag rules: | ||
| - Bold, on its own line: `**[SPEC]**` | ||
| - Content follows immediately (no blank line between tag and content) | ||
| - Multiple blocks of different types allowed per section | ||
| - Titled BUG blocks allowed: `**[BUG] Short description**` | ||
| - No nesting of blocks inside blocks | ||
| --- | ||
| ## 3. REQUIRED DOCUMENT STRUCTURE | ||
| **[SPEC]** | ||
| ```markdown | ||
| # Document Title | ||
| **Version X.Y.Z** · Author · Date · [metadata] | ||
| --- | ||
| ## AI READING INSTRUCTION | ||
| Read `[SPEC]` and `[BUG]` blocks for authoritative facts. | ||
| Read `[NOTE]` only if additional context is needed. | ||
| `[?]` blocks are unverified — treat with lower confidence. | ||
| --- | ||
| ## 1. First Section | ||
| **[SPEC]** | ||
| ... | ||
| ``` | ||
| Required elements in order: | ||
| 1. H1 title | ||
| 2. Version block in header | ||
| 3. AI manifest section before first content section | ||
| 4. Content sections (H2), subsections (H3) | ||
| --- | ||
| ## 4. HOW AI READS HADS | ||
| **[SPEC]** | ||
| When encountering a HADS document: | ||
| 1. Find and read the AI manifest first | ||
| 2. Read all `[SPEC]` blocks — these are ground truth | ||
| 3. Read all `[BUG]` blocks — always, before generating any code or config | ||
| 4. Read `[NOTE]` blocks only if `[SPEC]` is insufficient to answer the query | ||
| 5. Treat `[?]` content as hypothesis — note uncertainty in response | ||
| Token optimization: for large documents, scan section headings first, then read only `[SPEC]` and `[BUG]` blocks in relevant sections. | ||
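The reading order above can be sketched as a small filter over parsed blocks. This is a minimal TypeScript illustration, not part of the HADS convention: `parseHadsBlocks` and `blocksToRead` are hypothetical names, and the parser assumes each tag opens its own line and that a block ends at the next tag, H2 heading, or `---` rule.

```typescript
type HadsTag = "SPEC" | "NOTE" | "BUG" | "?";

interface HadsBlock {
  tag: HadsTag;
  section: string; // text of the enclosing H2 heading
  body: string;
}

// Split a document into tagged blocks, remembering which section each belongs to.
function parseHadsBlocks(doc: string): HadsBlock[] {
  const blocks: HadsBlock[] = [];
  let section = "";
  let current: HadsBlock | null = null;
  for (const raw of doc.split("\n")) {
    const line = raw.trim();
    const tag = /^\*\*\[(SPEC|NOTE|BUG|\?)\]/.exec(line);
    if (line.startsWith("## ")) {
      section = line.slice(3);
      current = null;
    } else if (line === "---") {
      current = null;
    } else if (tag) {
      current = { tag: tag[1] as HadsTag, section, body: "" };
      blocks.push(current);
    } else if (current) {
      current.body += (current.body ? "\n" : "") + line;
    }
  }
  return blocks;
}

// SPEC and BUG are always read, [?] is read but treated as a hypothesis,
// and NOTE is read only when SPEC alone cannot answer the query.
function blocksToRead(doc: string, needContext: boolean): HadsBlock[] {
  return parseHadsBlocks(doc).filter((b) => b.tag !== "NOTE" || needContext);
}

// Hypothetical sample document.
const hadsSample = [
  "## 1. Limits",
  "**[SPEC]**",
  "- max payload: 1 MB",
  "**[NOTE]**",
  "The limit was chosen for legacy clients.",
  "**[BUG] Upload hangs**",
  "- Symptom: hang",
  "- Fix: set timeout",
  "**[?]**",
  "- limit may rise later",
].join("\n");

console.log(blocksToRead(hadsSample, false).map((b) => b.tag).join(","));
```

For large documents, the `section` field lets a caller keep only the blocks under relevant headings before reading any bodies.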
| --- | ||
| ## 5. HOW TO GENERATE HADS | ||
| **[SPEC]** | ||
| When asked to write documentation in HADS format: | ||
| 1. Start with header block (title, version, metadata) | ||
| 2. Add AI manifest — always include, never skip | ||
| 3. Organize content into numbered H2 sections | ||
| 4. For each fact: write as `[SPEC]` — terse, bullet or table or code | ||
| 5. For each "why" or context: write as `[NOTE]` | ||
| 6. For each known failure mode with confirmed fix: write as `[BUG]` | ||
| 7. For each unverified claim: write as `[?]` | ||
| 8. End with changelog section | ||
| Content rules for `[SPEC]`: | ||
| - Prefer bullet lists over prose | ||
| - Prefer tables for multi-field facts | ||
| - Prefer code blocks for syntax, formats, examples | ||
| - Maximum 2 sentences of prose — if more needed, move to `[NOTE]` | ||
| Content rules for `[BUG]`: | ||
| - Always include: symptom, cause, fix | ||
| - Optional: affected versions, workaround | ||
| - Title on same line: `**[BUG] Short description**` | ||
| **[NOTE]** | ||
| When converting existing documentation to HADS: extract facts into `[SPEC]`, move narrative and history to `[NOTE]`, surface all known issues as `[BUG]`. Do not duplicate content between block types. | ||
| --- | ||
| ## 6. VALIDATION RULES | ||
| **[SPEC]** | ||
| A valid HADS document must have: | ||
| - H1 title | ||
| - Version in header | ||
| - AI manifest before first content section | ||
| - All block tags bold | ||
| - `[BUG]` blocks contain at minimum symptom + fix | ||
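The rules above can be checked mechanically. The following is a minimal sketch, assuming the version line is written as `**Version X.Y.Z**` and the manifest heading is `## AI READING INSTRUCTION`, as in the Section 3 template. `validateHads` is a hypothetical helper name, and it deliberately ignores tags that appear inside fenced code.

```typescript
// Returns a list of rule violations; an empty list means the document passes.
function validateHads(doc: string): string[] {
  const lines = doc.split("\n").map((l) => l.trim());
  const errors: string[] = [];

  if (!lines.some((l) => l.startsWith("# "))) errors.push("missing H1 title");

  // Assumption: the version declaration is written as "**Version X.Y.Z**".
  if (!lines.some((l) => /^\*\*Version \d+\.\d+\.\d+\*\*/.test(l))) {
    errors.push("missing version in header");
  }

  // The manifest must precede the first content section, so it must be the first H2.
  const firstH2 = lines.find((l) => l.startsWith("## "));
  if (firstH2 !== "## AI READING INSTRUCTION") {
    errors.push("AI manifest is not the first H2 section");
  }

  lines.forEach((l, i) => {
    // A tag that opens a line without the ** wrapper is not bold.
    if (/^\[(SPEC|NOTE|BUG|\?)\]/.test(l)) {
      errors.push(`line ${i + 1}: block tag is not bold`);
    }
    if (l.startsWith("**[BUG]")) {
      // Collect the block body up to the next tag, heading, or rule.
      const body: string[] = [];
      for (const next of lines.slice(i + 1)) {
        if (/^(\*\*\[|#|---)/.test(next)) break;
        body.push(next.toLowerCase());
      }
      const text = body.join("\n");
      if (text.indexOf("symptom") === -1 || text.indexOf("fix") === -1) {
        errors.push(`line ${i + 1}: [BUG] block needs symptom and fix`);
      }
    }
  });
  return errors;
}
```

The `[BUG]` check only looks for the words "symptom" and "fix" in the block body, which matches the minimum stated in the list above; a stricter validator could also require "cause".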
| --- | ||
| ## 7. DESIGN INTENT | ||
| **[NOTE]** | ||
| HADS exists because AI models increasingly read documentation before humans do. The format optimizes for this reality without sacrificing human readability. | ||
| Key insight: the AI manifest is the core innovation. It lets the model know what to read and what to skip — without requiring it to reason about document structure. Explicit is better than implicit for model consumption. |
| --- | ||
| name: ai-regression-testing | ||
| description: Deterministic checks first, agent review second, regression test for every real bug fixed or document why not. Targets the blind spot where an agent writes and reviews its own code. | ||
| --- | ||
| # AI Regression Testing | ||
| When an agent writes code and then reviews it, it carries the same assumptions into both steps. Automated tests break this cycle. | ||
| ## When to Use | ||
| - An agent has modified logic, API routes, or data transformation code | ||
| - A bug was found and must not be reintroduced | ||
| - Running `/bug-check` after a change session | ||
| - Multiple execution paths exist (feature flags, sandbox vs production, env variants) | ||
| ## The Core Problem | ||
| ``` | ||
| Agent writes fix → Agent reviews fix → Agent says "looks correct" → Bug still present | ||
| ``` | ||
| The most common blind spot: an agent fixes the production path but leaves the sandbox/mock path unchanged, or vice versa. | ||
| ## Workflow | ||
| Run in order. Do not skip to agent review if automated steps fail. | ||
| ### Step 1 — Run Tests (mandatory) | ||
| ```bash | ||
| npm test # or: pytest, cargo test, go test ./... | ||
| npm run build # TypeScript build / type check | ||
| ``` | ||
| - **Test fail** → highest priority; fix before anything else | ||
| - **Build fail** → report type errors as highest priority | ||
| - **Both pass** → continue to Step 2 | ||
| ### Step 2 — Agent Code Review | ||
| With tests passing, do a focused review for patterns agents commonly miss: | ||
| 1. **Execution path parity**: Do all code paths (sandbox, production, feature-flag on/off) return the same response shape? | ||
| 2. **Query completeness**: Are all fields used in the response present in the query or selection? | ||
| 3. **Error state cleanup**: On error, is stale state cleared before the error is surfaced? | ||
| 4. **Optimistic update rollback**: If an API call fails, is the optimistic UI change reverted? | ||
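The first check, execution path parity, lends itself to automation. Below is a minimal sketch that compares the key structure of two responses; `shapeOf`, `shapeDiff`, and the sample payloads are hypothetical and only illustrate the idea.

```typescript
// Collect dotted key paths so nested fields are compared too.
// Arrays and primitives are treated as leaves.
function shapeOf(value: unknown, prefix = ""): string[] {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return [prefix];
  }
  const obj = value as Record<string, unknown>;
  const paths: string[] = [];
  for (const key of Object.keys(obj).sort()) {
    paths.push(...shapeOf(obj[key], prefix ? `${prefix}.${key}` : key));
  }
  return paths;
}

// Returns the key paths present in one response but not the other.
function shapeDiff(first: unknown, second: unknown): string[] {
  const left = shapeOf(first);
  const right = shapeOf(second);
  return [
    ...left.filter((p) => right.indexOf(p) === -1).map((p) => `only in first: ${p}`),
    ...right.filter((p) => left.indexOf(p) === -1).map((p) => `only in second: ${p}`),
  ];
}

// Hypothetical payloads standing in for the production and sandbox paths.
const productionResponse = {
  data: { id: 1, email: "a@example.com", settings: { theme: "dark" } },
};
const sandboxResponse = { data: { id: 1, email: "a@example.com" } };

console.log(shapeDiff(productionResponse, sandboxResponse));
// → [ 'only in first: data.settings.theme' ]
```

A regression test can then assert that `shapeDiff` returns an empty list for every pair of paths that should agree.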
| ### Step 3 — Write a Regression Test for Each Bug Fixed | ||
| For every bug found and fixed, add a test immediately: | ||
| ``` | ||
| Bug: <description> | ||
| File: <path> | ||
| Regression test: <test name and what it asserts> | ||
| ``` | ||
| If you cannot write a test, document why: | ||
| ``` | ||
| Bug: <description> | ||
| Regression test: DEFERRED — <reason> (e.g., requires E2E harness not yet in place) | ||
| ``` | ||
| Do not silently skip. Every real bug should either have a test or an explicit deferral note. | ||
| ## Writing Effective Regression Tests | ||
| Test the contract, not the implementation: | ||
| ```typescript | ||
| // Test what the consumer receives, not how it's computed. | ||
| // GET and createRequest are assumed to be the route handler under test and a | ||
| // project test helper; import both from wherever your project defines them. | ||
| const REQUIRED_RESPONSE_FIELDS = ["id", "email", "settings", "created_at"]; | ||
| it("profile endpoint returns all required fields", async () => { | ||
|   const res = await GET(createRequest("/api/user/profile")); | ||
|   const json = await res.json(); | ||
|   for (const field of REQUIRED_RESPONSE_FIELDS) { | ||
|     expect(json.data).toHaveProperty(field); | ||
|   } | ||
| }); | ||
| ``` | ||
| Name tests after the bug category, not the fix: | ||
| ```typescript | ||
| it("sandbox path returns same field set as production path (BUG-CLASS: path-parity)") | ||
| it("notification_settings is not undefined after SELECT * removal (regression)") | ||
| ``` | ||
| ## Common AI Regression Patterns | ||
| | Pattern | Check | Priority | | ||
| |---------|-------|----------| | ||
| | Execution path parity | Same response shape across all paths | High | | ||
| | Query field omission | All response fields present in DB query | High | | ||
| | Error state leakage | State cleared before error is returned | Medium | | ||
| | Missing rollback | Previous state restored on API failure | Medium | | ||
| ## Strategy | ||
| Do not aim for a coverage percentage. Write tests only for bugs that were found. Bugs cluster naturally: if three bugs appeared in `/api/user/profile`, that endpoint needs tests. An endpoint that has never had a bug does not need tests yet. | ||
| Tests added this way grow organically with the bug history and cannot be gamed by coverage metrics. |
| --- | ||
| name: api-doc-review | ||
| description: "Verify that API endpoints match their OpenAPI/Swagger specifications." | ||
| --- | ||
| # API Doc Review | ||
| Outdated API documentation causes integration failures. The code is the source of truth, and the docs must match. | ||
| ## Review Protocol | ||
| 1. Compare the route definition (e.g., `POST /users`) with the documented endpoint. | ||
| 2. Verify that all required request parameters (body, query, params) are documented with correct types. | ||
| 3. Verify that all possible response status codes (200, 400, 404, 500) and their payloads match the actual error handlers and return statements. | ||
| 4. If there is a mismatch, update the OpenAPI spec or inline documentation immediately. Do not defer it. |
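The comparison in steps 1 and 3 can be sketched as a diff between two endpoint lists. This is an illustration under assumptions: the `Endpoint` shape and `compareEndpoints` are hypothetical, and extracting the lists from real route files or an OpenAPI document is left out.

```typescript
interface Endpoint {
  method: string;
  path: string;
  statuses: number[];
}

const endpointKey = (e: Endpoint): string => `${e.method.toUpperCase()} ${e.path}`;

// The code is the source of truth, so every problem is phrased as a docs defect.
function compareEndpoints(fromCode: Endpoint[], fromDocs: Endpoint[]): string[] {
  const problems: string[] = [];
  for (const route of fromCode) {
    const documented = fromDocs.find((d) => endpointKey(d) === endpointKey(route));
    if (!documented) {
      problems.push(`${endpointKey(route)}: not documented`);
      continue;
    }
    for (const status of route.statuses) {
      if (documented.statuses.indexOf(status) === -1) {
        problems.push(`${endpointKey(route)}: status ${status} not documented`);
      }
    }
  }
  for (const d of fromDocs) {
    if (!fromCode.some((r) => endpointKey(r) === endpointKey(d))) {
      problems.push(`${endpointKey(d)}: documented but not implemented`);
    }
  }
  return problems;
}

// Hypothetical inputs.
const implementedRoutes: Endpoint[] = [
  { method: "POST", path: "/users", statuses: [201, 400] },
  { method: "GET", path: "/users/:id", statuses: [200, 404] },
];
const documentedRoutes: Endpoint[] = [
  { method: "post", path: "/users", statuses: [201] },
  { method: "DELETE", path: "/users/:id", statuses: [204] },
];

console.log(compareEndpoints(implementedRoutes, documentedRoutes));
```

Request parameter types (step 2) would need a similar field-by-field comparison, which this sketch does not attempt.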
| --- | ||
| name: architecture-decision-records | ||
| description: "Record why an architectural choice was made to prevent agents or humans from unintentionally reverting it." | ||
| --- | ||
| # Architecture Decision Records (ADR) | ||
| Code tells you *how* a system works. ADRs tell you *why* it works that way, preventing future maintainers (and AI agents) from suggesting "improvements" that were already tried and discarded. | ||
| ## ADR Protocol | ||
| When finalizing a major design decision (e.g., "Choosing Postgres over MongoDB", "Using custom event bus over Redis"): | ||
| 1. Create `docs/adr/YYYY-MM-DD-<short-title>.md`. | ||
| 2. Include the **Context** (what is the problem?). | ||
| 3. Include the **Decision** (what are we doing?). | ||
| 4. Include the **Consequences** (what trade-offs are we accepting?). | ||
| 5. Keep it under 300 words. Focus on constraints, not theory. |
| --- | ||
| name: architecture-simplification | ||
| description: "Use to collapse over-engineered abstractions, remove unnecessary layers, or consolidate redundant logic." | ||
| --- | ||
| # Architecture Simplification | ||
| Over time, codebases accumulate "just in case" abstractions. This skill guides the safe removal of unnecessary complexity. | ||
| ## Simplification Rules | ||
| 1. **Identify the Abstraction Cost**: Does this interface have only one implementation? Does this wrapper class just pass arguments straight through? | ||
| 2. **Inline the Logic**: Move the logic from the unnecessary abstraction directly into the caller. | ||
| 3. **Delete the Dead Code**: Remove the interface, wrapper, or factory that is no longer needed. | ||
| 4. **Test Verification**: Ensure the observable behavior of the system has not changed. | ||
| ## Anti-Pattern | ||
| Do not rewrite the entire subsystem. Simplification means removing the noise around the core logic, not changing the core logic itself. | ||
| **Example**: | ||
| If a `UserRepository` implements `IUserRepository` but there is only ever one database, inline `UserRepository` and delete `IUserRepository`. |
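One way to read this example is sketched below: callers stop depending on the interface and name the concrete class directly, after which the interface can be deleted. The `User` shape, the in-memory rows, and the `emailBefore`/`emailAfter` callers are hypothetical, and the "before" class is renamed only so both versions fit in one runnable file.

```typescript
interface User {
  id: string;
  email: string;
}

// Before: IUserRepository has exactly one implementation.
interface IUserRepository {
  findById(id: string): User | undefined;
}

class InterfacedUserRepository implements IUserRepository {
  private readonly rows: User[];
  constructor(rows: User[]) {
    this.rows = rows;
  }
  findById(id: string): User | undefined {
    return this.rows.find((u) => u.id === id);
  }
}

function emailBefore(repo: IUserRepository, id: string): string | undefined {
  const user = repo.findById(id);
  return user ? user.email : undefined;
}

// After: the interface is gone and callers use the concrete class.
class UserRepository {
  private readonly rows: User[];
  constructor(rows: User[]) {
    this.rows = rows;
  }
  findById(id: string): User | undefined {
    return this.rows.find((u) => u.id === id);
  }
}

function emailAfter(repo: UserRepository, id: string): string | undefined {
  const user = repo.findById(id);
  return user ? user.email : undefined;
}

// Test verification: observable behavior is unchanged.
const sampleUsers: User[] = [{ id: "u1", email: "a@example.com" }];
console.log(
  emailBefore(new InterfacedUserRepository(sampleUsers), "u1") ===
    emailAfter(new UserRepository(sampleUsers), "u1"),
);
```

The core lookup logic is identical in both versions; only the indirection around it was removed.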
| --- | ||
| name: baseline-comparison-review | ||
| description: "Ensure that new complex models actually outperform simple, naive baselines." | ||
| --- | ||
| # Baseline Comparison Review | ||
| Machine learning models add massive technical debt. You must constantly justify their existence by comparing them to a "dumb" baseline. | ||
| ## Review Protocol | ||
| 1. **Define the Naive Baseline**: | ||
| - For classification: Predict the majority class. | ||
| - For regression: Predict the mean or median of the training target. | ||
| - For time series: Predict the last known value (naive persistence). | ||
| 2. **Define the Heuristic Baseline**: What simple `if/else` rule would a domain expert write? | ||
| 3. **Evaluate the Delta**: If the complex Deep Learning model only beats the heuristic baseline by 1%, recommend keeping the heuristic. The complexity is not worth the maintenance cost. | ||
| 4. **Action**: Always demand a baseline evaluation script before approving a new model architecture. |
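A baseline evaluation script can be very small. The sketch below implements the three naive baselines from step 1 in plain TypeScript; all function names and sample data are hypothetical, and the `minGain` threshold is a team decision rather than a fixed rule.

```typescript
// Classification baseline: always predict the most frequent training label.
function majorityClass(labels: string[]): string {
  const counts: Record<string, number> = {};
  let best = "";
  let bestCount = 0;
  for (const label of labels) {
    const n = (counts[label] ?? 0) + 1;
    counts[label] = n;
    if (n > bestCount) {
      best = label;
      bestCount = n;
    }
  }
  return best;
}

// Regression baseline: always predict the mean of the training target.
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Time-series baseline: the forecast for step t is the value at step t - 1.
function persistenceForecast(series: number[]): number[] {
  return series.slice(0, -1);
}

function accuracy(actual: string[], predicted: string[]): number {
  return actual.filter((a, i) => a === predicted[i]).length / actual.length;
}

function meanAbsoluteError(actual: number[], predicted: number[]): number {
  return mean(actual.map((a, i) => Math.abs(a - (predicted[i] ?? 0))));
}

// Step 3: keep the complex model only if it clears the baseline by a chosen margin.
function beatsBaseline(modelScore: number, baselineScore: number, minGain: number): boolean {
  return modelScore - baselineScore >= minGain;
}

// Hypothetical data.
const trainLabels = ["spam", "ham", "ham", "ham"];
const testLabels = ["ham", "spam", "ham", "ham"];
const guess = majorityClass(trainLabels);
const baselineAccuracy = accuracy(testLabels, testLabels.map(() => guess));

const series = [10, 12, 11, 13];
const baselineMae = meanAbsoluteError(series.slice(1), persistenceForecast(series));

console.log(guess, baselineAccuracy, baselineMae);
```

With these numbers the majority-class baseline scores 0.75 accuracy, so a model at 0.76 would not clear a 0.02 margin.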
| --- | ||
| name: canonical-advisor-routing | ||
| description: Process-first advisor routing with artifact capture | ||
| --- | ||
| # Canonical Advisor Routing | ||
| Route a prompt through a local provider CLI and persist the result as an artifact. | ||
| ## Usage | ||
| Use the provided command wrappers: | ||
| ```bash | ||
| apx ask-codex "review this patch from a security perspective" | ||
| apx ask-gemini "suggest UX improvements for this flow" | ||
| apx ask-claude "draft an implementation plan for issue #123" | ||
| ``` | ||
| ## Routing | ||
| **Required execution path:** | ||
| Invoke the provider CLI via the canonical `apx ask-*` wrappers. Do not manually assemble raw provider CLI commands unless debugging the wrapper. | ||
| ## Requirements | ||
| - The selected local CLI must be installed and authenticated. | ||
| ## Artifacts | ||
| Write the response to the standard artifact location: | ||
| ```text | ||
| .agent-powerups/artifacts/ask/<provider>-<slug>-<timestamp>.md | ||
| ``` |
| --- | ||
| name: change-impact-check | ||
| description: "Use before submitting a PR or considering a task done to evaluate the 'blast radius' of your changes." | ||
| --- | ||
| # Change Impact Check | ||
| Code changes rarely exist in isolation. Before declaring success, you must evaluate the downstream consequences of your work. | ||
| ## Impact Assessment Protocol | ||
| 1. **API Surface**: Did you change a public method signature, REST endpoint, or database schema? If so, immediately `grep` the entire repository for usages of the old signature. | ||
| 2. **Dependency Graph**: If you updated a core utility function (e.g., `formatDate`), find all modules that import it. Do their tests still pass? | ||
| 3. **Configuration**: Did you add a new environment variable? Ensure it is documented in `.env.example` or the README. | ||
| 4. **Action**: If you detect a high blast radius, run the full test suite (not just the local unit tests) and explicitly document the affected areas in your handoff or PR description. |
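The configuration check in step 3 is easy to automate. A minimal sketch, assuming variables are read as `process.env.NAME` and documented as `NAME=value` lines; the helper names and sample strings are hypothetical.

```typescript
// Names referenced as process.env.NAME in the given source text.
function envVarsUsed(source: string): string[] {
  const found: string[] = [];
  const pattern = /process\.env\.([A-Z_][A-Z0-9_]*)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const varName = match[1];
    if (varName !== undefined && found.indexOf(varName) === -1) found.push(varName);
  }
  return found;
}

// Names declared as NAME=... in an .env.example style file; comments are skipped.
function envVarsDocumented(envExample: string): string[] {
  return envExample
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#") && line.indexOf("=") > 0)
    .map((line) => line.slice(0, line.indexOf("=")).trim());
}

function undocumentedEnvVars(source: string, envExample: string): string[] {
  const documented = envVarsDocumented(envExample);
  return envVarsUsed(source).filter((v) => documented.indexOf(v) === -1);
}

// Hypothetical inputs.
const changedSource =
  "const url = process.env.DATABASE_URL; const key = process.env.STRIPE_KEY;";
const envExampleFile = "# local defaults\nDATABASE_URL=postgres://localhost/dev\n";

console.log(undocumentedEnvVars(changedSource, envExampleFile)); // → [ 'STRIPE_KEY' ]
```

Dynamic lookups such as `process.env[name]` are not detected by this pattern and still need a manual check.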
| --- | ||
| name: ci-failure-readout | ||
| description: "Use when a CI pipeline fails to extract the actual error from thousands of lines of logs." | ||
| --- | ||
| # CI Failure Readout | ||
| CI logs are notoriously noisy. Do not dump the entire log into the context window. | ||
| ## Readout Protocol | ||
| 1. **Locate the True Error**: Search the CI log (using the UI or by downloading and `grep`ing it) for the exact step that failed. Ignore setup/teardown noise. | ||
| 2. **Extract the Trace**: Copy only the stack trace or the specific compiler/linter error message. | ||
| 3. **Reproduce Locally**: The first rule of fixing a CI failure is proving it fails locally. Run the exact command the CI runner used (e.g., `npm run test:e2e`). | ||
| 4. **Draft the Readout**: Before fixing it, write a 2-sentence summary: "CI failed during the `build` step because `src/types.ts` is missing an export." This forces you to understand the problem instead of blindly guessing. |
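Steps 1 and 2 amount to finding the first failing line and keeping a small window around it. The sketch below assumes that failures contain the word "error", "failed", or "fail"; real CI systems differ, so the pattern, the helper name, and the sample log are all hypothetical.

```typescript
// Return the first line that looks like a failure plus `context` lines on each side.
function extractFailure(log: string, context = 2): string[] {
  const lines = log.split("\n");
  const first = lines.findIndex((l) => /\b(error|failed|fail)\b/i.test(l));
  if (first === -1) return [];
  return lines.slice(Math.max(0, first - context), first + context + 1);
}

// Hypothetical CI log.
const sampleCiLog = [
  "Setting up job",
  "npm ci",
  "added 1200 packages in 30s",
  "> build",
  "ERROR in src/types.ts: missing export 'User'",
  "Process completed with exit code 1",
  "Post job cleanup",
].join("\n");

console.log(extractFailure(sampleCiLog, 1).join("\n"));
```

Only the three lines around the failure reach the context window; the setup and cleanup lines are dropped.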
| --- | ||
| name: context-docs | ||
| description: "Maintain short, focused Markdown files per subsystem to provide agents with isolated context." | ||
| --- | ||
| # Context Docs | ||
| Large centralized documentation files consume too much of the context window. Decentralized, module-specific context docs provide targeted information exactly when an agent needs it. | ||
| ## Context Protocol | ||
| 1. Place README or CONTEXT docs *inside* specific subsystem directories (e.g., `src/auth/CONTEXT.md`). | ||
| 2. Document only the boundaries: How does this module communicate with the rest of the app? What are its critical invariants? | ||
| 3. Keep it terse. Use bullet points and exact file paths. | ||
| 4. Update these files inline when refactoring the module. |
| --- | ||
| name: context-minimization | ||
| description: "Use continuously during long tasks. Teaches how to read less, output less, and keep the LLM context window lean and fast." | ||
| --- | ||
| # Context Minimization | ||
| Your context window is the most precious resource. Large contexts make you slow, expensive, and prone to hallucinations. | ||
| ## The Rules of Lean Context | ||
| 1. **Surgical Reads**: Never use `cat` or `read_file` on a 2,000-line file without `start_line` and `end_line`. Always use `grep` first to find the relevant line numbers. | ||
| 2. **Silent Commands**: Always append `--silent`, `-q`, or redirect stderr/stdout to `/dev/null` for commands that produce massive logs (like `npm install` or verbose builds) unless you specifically need to debug them. | ||
| 3. **Pagination**: Disable pagers for all terminal tools. E.g., `git --no-pager log`. | ||
| 4. **Terse Responses**: Do not explain what a tool does before calling it, unless safety requires it. Do not repeat the user's instructions back to them verbatim. | ||
| 5. **Close Files**: Once you are done looking at a file, stop referring to it. | ||
| 6. **Parallel Ops**: If you need to search 3 files, run 3 parallel grep/read calls in a single turn instead of sequentially. This saves turns, which saves context repetition. |
| --- | ||
| name: dataset-split-review | ||
| description: "Audit the methodology used to split data into train, validation, and test sets." | ||
| --- | ||
| # Dataset Split Review | ||
| A random split is often the wrong split. Incorrect splitting causes massive overestimation of model performance. | ||
| ## Review Protocol | ||
| 1. **Time-Series Data**: If the data has a time component, a shuffled `train_test_split` is strictly forbidden. You must use a chronological split so the model is never trained on data from after the test period. | ||
| 2. **Group Leakage**: If the dataset has multiple rows for a single user/patient/session, a standard split will put rows from the same user in both train and test. You must use `GroupKFold` or another group-based split. | ||
| 3. **Stratification**: For imbalanced datasets, verify that stratification is used to maintain the target distribution across all splits. | ||
| 4. **Action**: Review the splitting code and explicitly verify Time, Group, and Stratification safety. |
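The time and group checks can be expressed as two small predicates that a reviewer runs against any proposed split. This is a language-neutral sketch in TypeScript rather than a scikit-learn recipe; the `SplitRow` shape, the function names, and the sample rows are hypothetical.

```typescript
interface SplitRow {
  userId: string;
  timestamp: number;
  label: number;
}

interface Split {
  train: SplitRow[];
  test: SplitRow[];
}

// Chronological split: the earliest rows train, the latest rows test.
function chronologicalSplit(rows: SplitRow[], trainFraction: number): Split {
  const sorted = rows.slice().sort((a, b) => a.timestamp - b.timestamp);
  const cut = Math.floor(sorted.length * trainFraction);
  return { train: sorted.slice(0, cut), test: sorted.slice(cut) };
}

// Group split: every row of a held-out user goes to the test side.
function groupSplit(rows: SplitRow[], testUsers: string[]): Split {
  return {
    train: rows.filter((r) => testUsers.indexOf(r.userId) === -1),
    test: rows.filter((r) => testUsers.indexOf(r.userId) !== -1),
  };
}

// True if any test row is older than the newest training row.
function hasTimeLeak(split: Split): boolean {
  const latestTrain = Math.max(...split.train.map((r) => r.timestamp));
  return split.test.some((r) => r.timestamp < latestTrain);
}

// True if any user appears on both sides.
function hasGroupLeak(split: Split): boolean {
  return split.test.some((r) => split.train.some((t) => t.userId === r.userId));
}

// Hypothetical data: user u1 appears at two different times.
const sampleRows: SplitRow[] = [
  { userId: "u1", timestamp: 1, label: 0 },
  { userId: "u2", timestamp: 2, label: 1 },
  { userId: "u1", timestamp: 3, label: 0 },
  { userId: "u3", timestamp: 4, label: 1 },
];

const byTime = chronologicalSplit(sampleRows, 0.5);
const byUser = groupSplit(sampleRows, ["u1"]);
console.log(hasTimeLeak(byTime), hasGroupLeak(byTime)); // → false true
console.log(hasTimeLeak(byUser), hasGroupLeak(byUser)); // → true false
```

On this data each split fixes one kind of leak and introduces the other, which is why the protocol asks for Time, Group, and Stratification safety to be verified separately.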
| --- | ||
| name: dead-code-removal | ||
| description: "Use to identify and safely delete unused functions, classes, exports, and files." | ||
| --- | ||
| # Dead Code Removal | ||
| Dead code increases maintenance overhead and confuses developers. | ||
| ## The Removal Protocol | ||
| 1. **Verify Unused**: Before deleting anything, you must search the *entire repository* to ensure the symbol or file is truly unused. Do not assume it is dead just because the current file doesn't use it. | ||
| 2. **Check for Dynamic Invocation**: Be wary of dynamically invoked code (e.g., reflection, dependency injection by string name, ORM mappers). If there is any doubt, leave it alone or ask the user. | ||
| 3. **Delete Aggressively**: Once confirmed unused, delete the code. Do not comment it out. | ||
| 4. **Prune Dependencies**: If you delete the only code that was using an imported module, remove the import statement as well. | ||
| 5. **Run Tests**: Always run tests and/or type checkers (e.g., `tsc --noEmit`) after removal to ensure you didn't accidentally break a hidden dependency. |
| --- | ||
| name: dependency-cleanup | ||
| description: "Use to audit and remove unused or redundant third-party dependencies from package manifests." | ||
| --- | ||
| # Dependency Cleanup | ||
| Bloated dependencies slow down builds, increase security surface area, and complicate updates. | ||
| ## Cleanup Protocol | ||
| 1. **Audit**: Review package manifests such as `package.json`, `requirements.txt`, or `Cargo.toml`. | ||
| 2. **Verify Usage**: For any suspect dependency, perform a global search across the codebase (e.g., `import .* from 'lodash'`). | ||
| 3. **Remove**: If there are zero usages, use the native package manager command to remove it (e.g., `npm uninstall lodash` or `pip uninstall ...`) so that lockfiles are updated correctly. Do not hand-edit the manifest unless absolutely necessary. Note that `pip uninstall` does not edit `requirements.txt`, so remove the entry there as well. | ||
| 4. **Consolidate**: If multiple libraries serve the exact same purpose (e.g., `moment` and `date-fns`), flag it to the user for future consolidation. Do not attempt a massive library migration autonomously. | ||
| 5. **Validate**: Run the build and test suite to ensure the removed dependency wasn't implicitly required by a build script or runtime environment. |
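The usage search in step 2 can be sketched as a function over dependency names and source text. It only recognizes static `import`/`require` forms in JavaScript or TypeScript sources, which is an assumption of this example; `unusedDependencies` and the sample inputs are hypothetical.

```typescript
// Returns the dependencies that no source file imports or requires.
function unusedDependencies(dependencies: string[], sources: string[]): string[] {
  return dependencies.filter((dep) => {
    // Escape regex metacharacters in the package name.
    const escaped = dep.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // Matches from "dep", from "dep/sub", require("dep"), import("dep"), import "dep".
    const pattern = new RegExp(
      `(from\\s+|require\\(|import\\(|import\\s+)\\s*["']${escaped}(/[^"']*)?["']`,
    );
    return !sources.some((src) => pattern.test(src));
  });
}

// Hypothetical manifest entries and source files.
const manifestDeps = ["lodash", "date-fns", "moment"];
const sourceFiles = [
  'import { debounce } from "lodash";',
  "const { format } = require('date-fns/format');",
];

console.log(unusedDependencies(manifestDeps, sourceFiles)); // → [ 'moment' ]
```

A dependency flagged here is only a candidate for removal: build scripts, config files, and runtime plugins can require a package without importing it, which is what the validation step is for.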
| --- | ||
| name: doc-consistency-check | ||
| description: "Audit documentation for broken file paths, outdated commands, and renamed variables." | ||
| --- | ||
| # Doc Consistency Check | ||
| Documentation rots when code changes. This skill identifies stale references in Markdown files. | ||
| ## Consistency Protocol | ||
| 1. Grep markdown files (`.md`) for file paths (e.g., `src/components/Button.tsx`). | ||
| 2. Verify that those files still exist in the repository. If not, the documentation is stale. | ||
| 3. Check code blocks in documentation. Do the function names and variable names still match the actual source code? | ||
| 4. Flag broken links and outdated references for immediate correction. |
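Steps 1 and 2 can be sketched as a path extractor plus a lookup. The example assumes paths are written in backticks, contain at least one `/`, and end in a file extension; the helper names and inputs are hypothetical, and the list of existing files would come from the repository in practice.

```typescript
// Backticked tokens that look like relative file paths, e.g. `src/a/b.ts`.
function referencedPaths(markdown: string): string[] {
  const pattern = /`([\w.-]+(?:\/[\w.-]+)+\.\w+)`/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const path = match[1];
    if (path !== undefined && found.indexOf(path) === -1) found.push(path);
  }
  return found;
}

// Paths mentioned in the docs that no longer exist in the repository.
function stalePaths(markdown: string, existingFiles: string[]): string[] {
  return referencedPaths(markdown).filter((p) => existingFiles.indexOf(p) === -1);
}

// Hypothetical inputs.
const readmeText =
  "See `src/components/Button.tsx` and `src/utils/date.ts`. Run `npm test` first.";
const repoFiles = ["src/components/Button.tsx"];

console.log(stalePaths(readmeText, repoFiles)); // → [ 'src/utils/date.ts' ]
```

Commands such as `npm test` contain no slash and are ignored; checking them against the project's scripts is a separate pass.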
| --- | ||
| name: experiment-tracking-review | ||
| description: "Verify that all hyperparameters, metrics, and data references are properly logged." | ||
| --- | ||
| # Experiment Tracking Review | ||
| An ML experiment is useless if you cannot reconstruct exactly how it was run and what data it used. | ||
| ## Review Protocol | ||
| 1. **Hyperparameter Logging**: Ensure the script logs *every* hyperparameter (learning rate, batch size, architecture details). Hardcoded magic numbers in the script must be extracted to a config and logged. | ||
| 2. **Metric Logging**: Verify that training and validation metrics are logged at each epoch or step, not just at the end. | ||
| 3. **Artifact Saving**: Ensure the final model weights, preprocessing scalers/encoders, and the exact configuration file are saved together in a versioned directory or tracking system. | ||
| 4. **Action**: Do not allow training scripts to print metrics to stdout only. Enforce structured logging (JSON, MLflow, wandb). |
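Structured logging in the sense of step 4 can be as simple as one JSON object per line. The sketch below is illustrative only: the `RunConfig` fields, the helper names, and the placeholder loss values are hypothetical, and a real project would more likely send the same records to a tracking system.

```typescript
interface RunConfig {
  learningRate: number;
  batchSize: number;
  epochs: number;
  dataRef: string; // identifier of the exact dataset version used
}

// One record for the full configuration, logged once at the start of the run.
function configRecord(config: RunConfig): string {
  return JSON.stringify({ event: "config", ...config });
}

// One record per epoch, so the training curve can be reconstructed later.
function metricsRecord(epoch: number, trainLoss: number, valLoss: number): string {
  return JSON.stringify({ event: "metrics", epoch, trainLoss, valLoss });
}

function runExperiment(config: RunConfig): string[] {
  const records = [configRecord(config)];
  for (let epoch = 1; epoch <= config.epochs; epoch++) {
    // Placeholder numbers; a real loop would compute these from the model.
    const trainLoss = 1 / epoch;
    const valLoss = 1 / epoch + 0.1;
    records.push(metricsRecord(epoch, trainLoss, valLoss));
  }
  return records;
}

// Hypothetical run.
const experimentLog = runExperiment({
  learningRate: 0.001,
  batchSize: 32,
  epochs: 3,
  dataRef: "train-v1",
});
experimentLog.forEach((record) => console.log(record));
```

Because every hyperparameter lives in `RunConfig`, nothing that affects the run can stay hardcoded without also being absent from the log.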
| --- | ||
| name: failure-triage | ||
| description: "Use when confronted with an unknown failure in CI or production to rapidly categorize the issue before deep debugging." | ||
| --- | ||
| # Failure Triage | ||
| Before diving deep into a stack trace or spending hours reproducing a bug, you must triage it to determine the blast radius, subsystem, and debugging approach. | ||
| ## The Triage Process | ||
| 1. **Categorize the Failure**: | ||
| - Is it a **Syntax/Build Error**? (Fails before running) | ||
| - Is it a **Logic Error**? (Runs, but produces wrong output) | ||
| - Is it an **Infrastructure/Environment Error**? (Network timeout, missing DB table) | ||
| - Is it a **Flaky/Non-deterministic Error**? (Fails sometimes) | ||
| 2. **Locate the Origin**: | ||
| - Scan the stack trace. Ignore framework/library internals. Find the frame closest to the failure that belongs to the *first-party application code*. | ||
| 3. **Check Recent Changes**: | ||
| - Run `git log -n 5 --oneline` and `git diff` to see what changed recently. Most bugs are in the newest code. | ||
| 4. **Formulate a Hypothesis**: | ||
| - State clearly: "I suspect this is an environment error caused by missing configuration, originating in `src/config.ts`." | ||
| Do not start writing fixes until you have explicitly stated your triage hypothesis and confirmed the category. |
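Locating the origin (step 2) can be sketched for a Node-style stack trace, where the most recent call is listed first. The `appRoot` default, the helper name, and the sample trace are assumptions of this example.

```typescript
// Return the first frame, reading from the top, that lives in application code.
function firstPartyFrame(stack: string, appRoot = "src/"): string | undefined {
  const frames = stack
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.startsWith("at "));
  return frames.find(
    (f) => f.indexOf(appRoot) !== -1 && f.indexOf("node_modules") === -1,
  );
}

// Hypothetical trace: the throw happens inside a library called from src/config.ts.
const sampleStack = [
  "TypeError: Cannot read properties of undefined (reading 'port')",
  "    at parse (/app/node_modules/some-lib/index.js:10:5)",
  "    at loadConfig (/app/src/config.ts:42:17)",
  "    at main (/app/src/index.ts:7:3)",
].join("\n");

console.log(firstPartyFrame(sampleStack));
// → at loadConfig (/app/src/config.ts:42:17)
```

The returned frame gives the file to name in the triage hypothesis. Runtimes that print the most recent call last would need the frame list reversed first.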
| --- | ||
| name: flaky-test-investigation | ||
| description: "Use to diagnose tests that pass and fail intermittently without code changes." | ||
| --- | ||
| # Flaky Test Investigation | ||
| Flaky tests erode trust in CI. Do not just re-run them and hope for the best. | ||
| ## Investigation Protocol | ||
| 1. **Isolate the Test**: Run the specific failing test by itself. If it passes, the flake is likely an **order dependency** or **state leakage** from a previous test. | ||
| 2. **Stress Test**: Run the test in a tight loop (e.g., `for i in {1..100}; do npm test -- -t "My Test"; done`). | ||
| 3. **Check for Common Vectors**: | ||
| - **Time**: Does the test rely on `Date.now()` or `setTimeout`? Mock the clock. | ||
| - **Async/Promises**: Are we asserting before a background task finishes? Ensure proper `await` or `waitFor` usage. | ||
| - **Shared State**: Are we reusing database records, global singletons, or mutated variables between runs? Ensure clean teardowns in `afterEach`. | ||
| - **Randomness**: Does the test rely on random IDs or sorts? Force deterministic seeds or sort orders. | ||
| 4. **Prove the Fix**: Do not just guess. The fix must be verified by running the stress test loop again and achieving a 100% pass rate. |
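Two of the vectors above, time and randomness, are usually fixed by making the source injectable. The following is a minimal sketch with hypothetical names: `isExpired` takes its clock as a parameter, and `seededRandom` is a small deterministic generator (the mulberry32 construction) standing in for `Math.random`.

```typescript
type Clock = () => number;

// Code under test receives the clock instead of calling Date.now() itself.
function isExpired(issuedAtMs: number, ttlMs: number, now: Clock = Date.now): boolean {
  return now() - issuedAtMs >= ttlMs;
}

// Deterministic pseudo-random numbers in [0, 1): same seed, same sequence.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// In a test, both sources are pinned, so the outcome cannot vary between runs.
const fixedClock: Clock = () => 1500;
console.log(isExpired(1000, 500, fixedClock)); // → true

const randomA = seededRandom(42);
const randomB = seededRandom(42);
console.log(randomA() === randomB()); // → true
```

Production code keeps the default `Date.now`, so only the tests change. Test frameworks with fake timers achieve the same effect for the clock without a parameter.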
| --- | ||
| name: handoff-discipline | ||
| description: "Use when completing a task or running out of context limit. Ensures the next session or human engineer has exactly what they need to resume work instantly." | ||
| --- | ||
| # Handoff Discipline | ||
| When ending a session, handing a task back to the user, or preparing to swap to a new context window, you must leave a clean paper trail. | ||
| ## The Handoff Rules | ||
| 1. **State the End Condition**: Explain exactly why you are stopping (e.g., "Task complete", "Blocked on PR", "Context window too large"). | ||
| 2. **Leave a Breadcrumb**: If the task is incomplete, summarize the last successful step, the current failing step, and the *exact next command* to run. | ||
| 3. **Commit or Stash**: Ensure the working directory is clean. Either commit the work, tell the user to commit, or stash it. Do not leave behind a messy, unverified state. | ||
| 4. **Link the Work**: Provide file paths to the modified files or generated artifacts so the next agent/user doesn't have to search for them. | ||
| ## The Handoff Summary Format | ||
| When creating a handoff summary file, use this exact structure: | ||
| ```markdown | ||
| ### 1. Goal | ||
| [1-2 sentences on what we were trying to do] | ||
| ### 2. State | ||
| - ✅ Completed: [What works] | ||
| - 🚧 In Progress: [What is broken or partial] | ||
| - 🛑 Blockers: [What stopped us] | ||
| ### 3. Next Steps | ||
| 1. Run `npm test ...` | ||
| 2. Fix the error in `src/foo.ts` around line X. | ||
| ``` |
| --- | ||
| name: handoff-documentation | ||
| description: "Write state-restoration documents for passing tasks between agents or engineers." | ||
| --- | ||
| # Handoff Documentation | ||
| When a session ends, the context window is destroyed. Handoff docs serialize the necessary state to allow immediate resumption without re-reading the entire codebase. | ||
| ## Handoff Protocol | ||
| 1. Write a handoff document before concluding the task. | ||
| 2. **Current State**: What exactly is broken or unfinished? (e.g., "Test X in foo.spec.ts is failing with Error Y"). | ||
| 3. **Next Action**: Provide the exact terminal command the next agent/human should run to see the failure. | ||
| 4. **Discovered Constraints**: Note any dead ends encountered so the next session doesn't repeat the mistake (e.g., "Tried using Library Z, but it doesn't support async"). |
| --- | ||
| name: incident-readout | ||
| description: "Use after fixing a bug to generate a blameless post-mortem summary for human review." | ||
| --- | ||
| # Incident Readout | ||
| When a complex debugging session ends, you must produce an incident readout. This prevents knowledge loss and helps humans review the fix quickly. | ||
| ## Format | ||
| Output an incident readout document (or print to terminal) using this structure: | ||
| ### 1. The Symptom | ||
| What was reported? (1-2 sentences) | ||
| ### 2. The Root Cause | ||
| What was the actual underlying technical reason for the failure? Be highly specific about the exact line of code, assumption, or state that failed. | ||
| ### 3. The Fix | ||
| What did we change to fix it? Provide a high-level summary of the structural change, not just a diff. | ||
| ### 4. Prevention | ||
| How do we ensure this never happens again? (e.g., "Added test case X", "Refactored module Y to be strongly typed"). |
| --- | ||
| name: incremental-migration | ||
| description: "Use when migrating APIs, libraries, or patterns across a large codebase. Ensures safe, step-by-step progress rather than risky mega-commits." | ||
| --- | ||
| # Incremental Migration | ||
| Never attempt to migrate an entire codebase in a single step. Mega-commits are impossible to review and dangerous to merge. | ||
| ## The Incremental Strategy | ||
| 1. **Define the Target Pattern**: Clearly establish the "Old Way" and the "New Way". | ||
| 2. **Implement Side-by-Side**: Create the "New Way" implementation alongside the old one. Do not delete the old one yet. | ||
| 3. **Migrate One Vertical Slice**: Pick exactly one feature, route, or component. Update it to use the new pattern. | ||
| 4. **Test and Commit**: Verify the slice works. Commit this step. | ||
| 5. **Repeat**: Move to the next slice. | ||
| 6. **Deprecate and Remove**: Only once all usages of the "Old Way" are gone can you safely delete the old implementation. | ||
| If a migration is too large for a single session, leave a clear handoff document summarizing progress and the next files to migrate. |
| --- | ||
| name: log-driven-diagnosis | ||
| description: "Use when debugging complex runtime failures, distributed systems, or issues where a local debugger cannot be attached." | ||
| --- | ||
| # Log-Driven Diagnosis | ||
| When you cannot step through code, logs are your only visibility. You must be methodical in how you extract signals from noise. | ||
| ## Protocol | ||
| 1. **Time-Bound Search**: Never dump the whole log file. Always `grep` for timestamps around the reported incident, or use `tail` to read only the most recent lines. | ||
| 2. **Identify the Request ID**: If the system uses distributed tracing or request IDs, find the ID associated with the error, then `grep` the entire log corpus for *only* that ID to trace the complete lifecycle of the failed request. | ||
| 3. **Look for Preceding Warnings**: The `ERROR` log is usually just the final crash. The actual root cause is often a `WARNING` or unexpected `INFO` log that occurred milliseconds earlier (e.g., a connection retry failing, or an empty array being returned). | ||
| 4. **Add Missing Logs**: If the logs do not provide enough visibility, your first action must be to *add temporary logging* to the application, reproduce the bug, and gather the new signals. Do not guess blindly if the logs are insufficient. |
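Steps 2 and 3 can be sketched as a filter that finds the request IDs that ended in `ERROR` and then returns every line for those IDs, including earlier warnings. The line format `<timestamp> <LEVEL> [<request id>] <message>` is an assumption of this example, as are the helper names and the sample log.

```typescript
interface LogEntry {
  timestamp: string;
  level: string;
  requestId: string;
  message: string;
}

// Assumed format: "<timestamp> <LEVEL> [<request id>] <message>".
function parseLogLine(line: string): LogEntry | undefined {
  const m = /^(\S+)\s+(\w+)\s+\[([^\]]+)\]\s+(.*)$/.exec(line);
  if (!m) return undefined;
  return {
    timestamp: m[1] ?? "",
    level: m[2] ?? "",
    requestId: m[3] ?? "",
    message: m[4] ?? "",
  };
}

// Full lifecycle of every request that logged an ERROR, in original order.
function traceFailedRequests(logText: string): LogEntry[] {
  const entries: LogEntry[] = [];
  for (const line of logText.split("\n")) {
    const entry = parseLogLine(line);
    if (entry) entries.push(entry);
  }
  const failedIds = entries.filter((e) => e.level === "ERROR").map((e) => e.requestId);
  return entries.filter((e) => failedIds.indexOf(e.requestId) !== -1);
}

// Hypothetical log excerpt.
const sampleAppLog = [
  "2024-05-01T10:00:00.100Z INFO [req-1] GET /orders",
  "2024-05-01T10:00:00.120Z INFO [req-2] GET /health",
  "2024-05-01T10:00:00.150Z WARNING [req-1] db connection retry 3 failed",
  "2024-05-01T10:00:00.151Z ERROR [req-1] unhandled: cannot read rows",
].join("\n");

for (const e of traceFailedRequests(sampleAppLog)) {
  console.log(`${e.level} ${e.message}`);
}
```

In this sample the `WARNING` one millisecond before the `ERROR` points at the connection retry as the likely root cause, while the unrelated `req-2` line is filtered out.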
| --- | ||
| name: memory-build-workflow | ||
| description: Use when a user needs to build or refresh persistent graph memory from a mixed corpus and the right path may include graphify, incremental update, or helper conversion before ingestion. | ||
| --- | ||
| # Memory Build Workflow | ||
| ## Overview | ||
| Build persistent graph memory with `graphify`. | ||
| Use helper tools only when source format would otherwise reduce graph quality or waste context. | ||
| ## When to Use | ||
| - first graph build for a repo, notes folder, research corpus, or mixed raw folder | ||
| - corpus changed enough that persistent graph memory is worth refreshing | ||
| - input includes PDFs, Office docs, or noisy web pages that should be normalized before graph build | ||
| - user wants durable graph outputs instead of one-shot file reading | ||
| Do not use for: | ||
| - one small plain-text file or a narrow one-off question | ||
| - cases where an existing graph already answers the question better via query | ||
| ## Required Checks | ||
| ```powershell | ||
| apx check graphify | ||
| apx check markitdown-file-intake | ||
| apx check defuddle | ||
| ``` | ||
| Stop and report missing tools. Do not auto-install without approval. | ||
| ## Routing | ||
| | Situation | Action | | ||
| |---|---| | ||
| | ready local corpus of readable files | run `graphify` | | ||
| | existing graph plus changed sources | run `graphify --update` | | ||
| | PDF, Office doc, slide deck, or similar hard-to-read format | convert with `markitdown-file-intake`, then build with `graphify` | | ||
| | article or noisy web page | clean with `defuddle`, then build with `graphify` | | ||
| | user wants vault browsing after build | offer optional Obsidian export | | ||
| ## Core Rules | ||
| - `graphify` is the primary engine | ||
| - prefer `graphify --update` over full rebuild when a graph already exists | ||
| - use helpers only to improve source readability before graph ingestion | ||
| - keep Obsidian optional and post-build | ||
| - keep source provenance intact when converting inputs | ||
| ## Minimal Workflow | ||
| 1. Check whether a usable graph already exists. | ||
| 2. If it exists and sources changed, prefer `graphify --update`. | ||
| 3. If sources are noisy or binary, normalize them with the narrowest helper. | ||
| 4. Build or refresh with `graphify`. | ||
| 5. Offer query workflow next instead of rereading the corpus. | ||
| ## Common Failure Modes | ||
| - missing `graphify`: stop and report; no fallback build path | ||
| - rebuilding from scratch when update would work: unnecessary cost and churn | ||
| - using helpers on already-readable Markdown or code: wasted step | ||
| - treating Obsidian as required: wrong; it is optional output only | ||
| ## References | ||
| - [`../graphify/UPSTREAM.md`](../graphify/UPSTREAM.md) | ||
| - [`../../references/HELPER_TOOLS.md`](../../references/HELPER_TOOLS.md) | ||
| - [`../../references/OBSIDIAN_EXPORT.md`](../../references/OBSIDIAN_EXPORT.md) |
| --- | ||
| name: memory-optimization-workflow | ||
| description: Use when deciding the lowest-cost context path for a mixed corpus, especially when choosing among direct reading, helper conversion, graph build, graph update, or graph query. | ||
| --- | ||
| # Memory Optimization Workflow | ||
| ## Overview | ||
| Minimize token spend, reread cost, and unnecessary rebuilds. | ||
| `graphify` is the main optimization path for repeated work. Helper tools exist to make hard sources cheaper before graph or direct reading. | ||
| ## When to Use | ||
| - mixed corpus and the cheapest inspection path is unclear | ||
| - repeated questions over the same files | ||
| - need to choose between direct read, conversion, graph build, update, or query | ||
| - want to reduce repeated large-context rereads | ||
| Do not use for: | ||
| - tiny single-file questions where direct reading is already cheapest | ||
| - cases where the user explicitly wants raw-file inspection only | ||
| ## Required Checks | ||
| ```powershell | ||
| apx check graphify | ||
| apx check markitdown-file-intake | ||
| apx check defuddle | ||
| ``` | ||
| Stop and report missing tools. Do not auto-install without approval. | ||
| ## Fast Routing | ||
| | Situation | Cheapest path | | ||
| |---|---| | ||
| | small readable text corpus, one question | read directly | | ||
| | PDF, Office doc, or other binary-like source | `markitdown-file-intake` | | ||
| | noisy web page or article | `defuddle` | | ||
| | repeated questions across same corpus | build with `graphify` | | ||
| | existing graph plus changed sources | `graphify --update` | | ||
| | existing graph plus new question | query graph first | | ||
| ## Decision Rules | ||
| - prefer direct reading for small plain-text scope | ||
| - prefer Markdown over binary or chrome-heavy formats | ||
| - prefer graph query over full reread when a graph already exists | ||
| - prefer incremental update over rebuild | ||
| - keep helper tools secondary to the main graph path | ||
| - keep Obsidian optional; it is not part of the optimization decision unless the user wants vault browsing | ||
| ## Escalation Ladder | ||
| 1. Direct read if scope is already small and readable. | ||
| 2. Convert only if format is the main source of waste. | ||
| 3. Build graph memory when questions will repeat or corpus is broad. | ||
| 4. Update existing graph when sources changed. | ||
| 5. Query existing graph before any broad reread. | ||
| ## Common Failure Modes | ||
| - building a graph for a tiny one-shot question | ||
| - rereading large corpora after a graph already exists | ||
| - converting already-readable Markdown or code | ||
| - rebuilding instead of updating | ||
| - making helper tools feel primary instead of supportive | ||
| ## References | ||
| - [`../../references/HELPER_TOOLS.md`](../../references/HELPER_TOOLS.md) | ||
| - [`../../references/GRAPHIFY_PROVENANCE.md`](../../references/GRAPHIFY_PROVENANCE.md) |
| --- | ||
| name: memory-query-workflow | ||
| description: Use when a graph already exists and the user needs retrieval, tracing, explanation, or gap detection from graph memory before reopening the full corpus. | ||
| --- | ||
| # Memory Query Workflow | ||
| ## Overview | ||
| Use existing graph memory first. | ||
| Query the graph before rereading source files unless the graph is missing, stale, or too weak for the question. | ||
| ## Required Check | ||
| ```powershell | ||
| apx check graphify | ||
| ``` | ||
| ## Required State | ||
| - existing `graphify-out/graph.json` | ||
| ## Routing | ||
| | Question shape | Action | | ||
| |---|---| | ||
| | broad question about connected concepts | `graphify query` | | ||
| | trace between two concepts, files, or systems | `graphify path` | | ||
| | explain one concept or node in context | `graphify explain` | | ||
| | no graph exists | switch to `memory-build-workflow` | | ||
| | graph exists but corpus changed | recommend `graphify --update` before trusting results | | ||
| ## Core Rules | ||
| - prefer graph retrieval over full-corpus reread | ||
| - say explicitly when the graph is missing, stale, sparse, or weakly matched | ||
| - do not overclaim beyond what graph nodes and edges support | ||
| - when graph coverage is weak, use the graph result to target the next direct read instead of restarting broad exploration | ||
| ## Minimal Workflow | ||
| 1. Confirm `graphify-out/graph.json` exists. | ||
| 2. Choose `query`, `path`, or `explain` based on question shape. | ||
| 3. Answer from graph evidence first. | ||
| 4. If result quality is weak, say why: missing graph, stale graph, low coverage, or weak node match. | ||
| 5. Escalate to build/update or targeted reread only when needed. | ||
| ## Common Failure Modes | ||
| - skipping graph lookup and rereading everything | ||
| - hiding that the graph is stale or incomplete | ||
| - using `query` for a question that clearly needs a path trace | ||
| - treating no-result output as proof the corpus lacks the concept | ||
| ## Reference | ||
| - [`../graphify/UPSTREAM.md`](../graphify/UPSTREAM.md) |
| --- | ||
| name: minimal-reproduction | ||
| description: "Use to isolate a bug from a large application into a standalone, runnable script or single test case." | ||
| --- | ||
| # Minimal Reproduction | ||
| You cannot reliably fix what you cannot reliably reproduce in isolation. | ||
| ## The Subtraction Method | ||
| 1. **Start with the Failure**: Take the code path that fails. | ||
| 2. **Remove the UI/Network**: If the bug is reported via a web request, write a script that calls the internal controller directly. | ||
| 3. **Mock Dependencies**: If the bug doesn't require the database, mock it. If it doesn't require the third-party API, mock it. | ||
| 4. **Prune Data**: If the bug triggers on a 10 MB JSON payload, binary-search the payload down to the exact two keys that cause the failure. | ||
| 5. **Final Output**: The result must be a single file that relies on ZERO external state, can be run with a single command, and deterministically outputs the exact error reported. |
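A minimal sketch of the data-pruning step, under stated assumptions: `fails` is a hypothetical stand-in for the real reproduction check, and the payload is invented. For brevity it removes one key at a time rather than halving the payload, so it is a simpler linear variant of the binary search described in the list.

```python
def fails(payload):
    """Hypothetical stand-in for the real bug: triggers only when both keys are present."""
    return "currency" in payload and "discount" in payload


def prune_keys(payload, still_fails):
    """Drop keys one at a time, keeping a removal only if the failure persists."""
    minimal = dict(payload)
    for key in list(payload):
        candidate = {k: v for k, v in minimal.items() if k != key}
        if still_fails(candidate):
            minimal = candidate
    return minimal


# Invented payload: 100 irrelevant fields plus the two that matter.
big_payload = {f"field_{i}": i for i in range(100)}
big_payload.update({"currency": "EUR", "discount": 0.1})

print(prune_keys(big_payload, fails))
```

In practice `still_fails` would run the standalone reproduction script against the candidate payload and report whether the original error still appears.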
| --- | ||
| name: ml-leakage-check | ||
| description: "Identify and prevent target leakage in ML preprocessing pipelines." | ||
| --- | ||
| # ML Leakage Check | ||
| Data leakage, including target leakage, is one of the most common and dangerous errors in applied ML. It creates models that look perfect in validation but fail in production. | ||
| ## Leakage Vectors to Check | ||
| 1. **Global Scaling/Imputation**: Did the author calculate the mean of the *entire* dataset to impute missing values before splitting? This leaks the test set distribution into the training set. | ||
| 2. **Future Features**: Is there a feature available in the training data that would absolutely not be available at the moment of prediction in real life? (e.g., using "surgery_outcome" to predict "hospital_admission_length"). | ||
| 3. **ID Proxies**: Are database IDs or row numbers accidentally included as features? They often correlate with time or order of entry. | ||
| 4. **Action**: Enforce the rule: Split FIRST, then fit transformers on Train ONLY, then transform Train/Val/Test. |
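The "split first, fit on train only" rule can be shown with mean imputation in plain Python. This is a sketch with invented data and hypothetical helper names (`split`, `fit_imputer`, `transform`); a real pipeline would more likely use a library transformer, but the ordering is the point.

```python
import random
from statistics import mean


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_imputer(train):
    """Learn the imputation value from the training rows only."""
    return mean(x for x in train if x is not None)


def transform(rows, fill_value):
    """Apply the already-fitted imputation value to any split."""
    return [fill_value if x is None else x for x in rows]


values = [1.0, 2.0, None, 4.0, 5.0, None, 7.0, 100.0]

train, test = split(values)            # 1. split FIRST
fill = fit_imputer(train)              # 2. fit on train ONLY
train_clean = transform(train, fill)   # 3. transform every split with the train statistic
test_clean = transform(test, fill)
```

The leaky version would compute `fill` from `values` before calling `split`, which lets test rows influence a statistic the model is trained on.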
| --- | ||
| name: model-evaluation-reporting | ||
| description: "Standardize the reporting of model metrics to ensure statistical rigor and business relevance." | ||
| --- | ||
| # Model Evaluation Reporting | ||
| Raw accuracy metrics are not enough. Evaluation must reflect the actual business impact and failure modes of the model. | ||
| ## Reporting Standards | ||
| 1. **Beyond Accuracy**: Demand the Confusion Matrix. Demand Precision, Recall, and F1. Explain the cost of a False Positive vs. a False Negative in the business context. | ||
| 2. **Slice Analysis**: Report performance on key segments. A model might be 95% accurate overall, but only 40% accurate on new users. | ||
| 3. **Calibration**: If the model outputs probabilities, verify that they are calibrated. Among predictions scored 0.8, roughly 80% should turn out to be positive. | ||
| 4. **Action**: Format the output as a Markdown report that a non-technical stakeholder can read, highlighting trade-offs and worst-case scenarios. |
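As an illustration of the first two standards, the following sketch computes a confusion matrix, precision, recall, F1, and per-segment accuracy in plain Python. The labels, predictions, and segment names are invented for the example.

```python
def confusion(y_true, y_pred):
    """Count (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1), using 0.0 when a denominator is zero."""
    tp, fp, fn, _ = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy_by_slice(y_true, y_pred, segments):
    """Accuracy computed separately for each segment label."""
    totals, hits = {}, {}
    for t, p, s in zip(y_true, y_pred, segments):
        totals[s] = totals.get(s, 0) + 1
        hits[s] = hits.get(s, 0) + (1 if t == p else 0)
    return {s: hits[s] / totals[s] for s in totals}


# Invented evaluation data: four returning users, then four new users.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
segments = ["returning"] * 4 + ["new"] * 4

print(confusion(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
print(accuracy_by_slice(y_true, y_pred, segments))
```

Here overall accuracy is 50%, but the slice view shows 75% on returning users and 25% on new users, which is the kind of gap a single headline number hides.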
| --- | ||
| name: naming-and-structure-cleanup | ||
| description: "Use to enforce consistent naming conventions and file structures across a project without changing business logic." | ||
| --- | ||
| # Naming and Structure Cleanup | ||
| Inconsistent naming (camelCase vs snake_case) and messy file structures make codebases hard to navigate. | ||
| ## Cleanup Rules | ||
| 1. **Observe Local Conventions**: Before renaming, scan the project to determine the dominant convention. If 80% of files use `camelCase`, enforce `camelCase`. | ||
| 2. **Targeted Renames**: Use the `safe-rename` command pattern to update variables, classes, or files. Ensure all imports are updated. | ||
| 3. **File Co-location**: Move files so that closely related logic is co-located (e.g., keeping `Button.tsx`, `Button.css`, and `Button.test.tsx` in the same directory). | ||
| 4. **No Logic Changes**: Do not refactor the internal logic of functions while performing naming cleanups. Keep the diff focused purely on structure and names. | ||
| 5. **Verify**: Run the project's type checker and test suite after every structural change. |
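A rough sketch of the "observe local conventions" step: classify identifiers and report the dominant style only if it reaches the 80% threshold mentioned above. The regular expressions and the sample names are assumptions for illustration; single-word names are treated as ambiguous and ignored.

```python
import re
from collections import Counter

# Simplified patterns, assumed for this sketch only.
CAMEL = re.compile(r"^[a-z]+(?:[A-Z][a-z0-9]*)+$")
SNAKE = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)+$")


def classify(name):
    """Label a name as camelCase, snake_case, or other (ambiguous)."""
    if CAMEL.match(name):
        return "camelCase"
    if SNAKE.match(name):
        return "snake_case"
    return "other"


def dominant_convention(names, threshold=0.8):
    """Return the dominant style if its share meets the threshold, else None."""
    counts = Counter(classify(n) for n in names)
    counts.pop("other", None)
    total = sum(counts.values())
    if total == 0:
        return None
    style, count = counts.most_common(1)[0]
    return style if count / total >= threshold else None


names = ["userId", "orderTotal", "fetchUser", "parseDate", "max_retries", "count"]
print(dominant_convention(names))
```

If no style reaches the threshold the function returns `None`, which is a reasonable signal to ask the user before renaming anything.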
| --- | ||
| name: pre-release-verification | ||
| description: "Use before tagging a release or deploying to production to ensure all quality gates have passed." | ||
| --- | ||
| # Pre-Release Verification | ||
| Releases must be deterministic and verified. No "hope-driven" deployments. | ||
| ## Verification Checklist | ||
| Before authorizing or participating in a release process, verify the following: | ||
| 1. **Clean Working Tree**: `git status` must be completely clean. No untracked files or uncommitted changes. | ||
| 2. **Green CI**: The latest commit on the main branch MUST have a passing CI pipeline. | ||
| 3. **Lint & Types**: Run the project's linter (`npm run lint`, `cargo clippy`, etc.) and type checker (`tsc --noEmit`). They must exit with 0. | ||
| 4. **Test Gate**: Run the full test suite locally if CI is not available or if requested. | ||
| 5. **No Secrets**: Ensure no API keys or credentials have been accidentally hardcoded or staged. | ||
| If any check fails, the release is blocked. State the exact failure and stop. |
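The "block on first failure" behaviour can be sketched as a small gate runner. The gate list below uses stand-in commands so the sketch runs anywhere; the function name `run_gates` and the `must_be_silent` flag are assumptions made for this example.

```python
import subprocess
import sys


def run_gates(gates):
    """Run each gate in order; return (ok, message) and stop at the first failure.

    Each gate is (name, argv, must_be_silent). A gate fails on a non-zero exit
    code, or on any stdout output when must_be_silent is True.
    """
    for name, argv, must_be_silent in gates:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            return False, f"{name}: exit code {result.returncode}"
        if must_be_silent and result.stdout.strip():
            return False, f"{name}: unexpected output"
    return True, "all gates passed"


# Stand-in commands for illustration. A real list might start with
# ["git", "status", "--porcelain"] and must_be_silent=True, followed by
# the project's own lint, type-check, and test commands.
demo_gates = [
    ("clean tree", [sys.executable, "-c", "pass"], True),
    ("lint", [sys.executable, "-c", "import sys; sys.exit(2)"], False),
    ("tests", [sys.executable, "-c", "pass"], False),
]

print(run_gates(demo_gates))
```

The silent-output check exists because `git status --porcelain` exits with 0 even when the tree is dirty; it is the presence of output, not the exit code, that signals uncommitted changes.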
| --- | ||
| name: readme-hardening | ||
| description: "Ensure the project README provides immediate, exact commands for setup, testing, and deployment to help agents and humans bootstrap quickly." | ||
| --- | ||
| # README Hardening | ||
| A good README is an executable contract, not a marketing page. It must allow an agent or a new engineer to clone the repository and run tests within 3 minutes. | ||
| ## Hardening Protocol | ||
| 1. **Verify Commands**: Extract every shell command (`npm install`, `docker-compose up`, `cargo test`) from the README and run them in a clean environment. If they fail, fix the README. | ||
| 2. **Remove Ambiguity**: Replace "Install dependencies" with `npm ci`. Replace "Run the app" with `npm run start:dev`. | ||
| 3. **Environment Checklist**: Clearly list required environment variables in a `.env.example` block. Do not just say "set up your environment." | ||
| 4. **Architecture Pointers**: Provide exact file paths for entry points (e.g., "Main API routing is in `src/routes.ts`") to save agents from searching the entire tree. |
| --- | ||
| name: regression-bisecting | ||
| description: "Use when a bug was recently introduced but you don't know which commit caused it." | ||
| --- | ||
| # Regression Bisecting | ||
| When a feature used to work but is now broken, do not guess what broke it. Use binary search through git history to find the exact commit. | ||
| ## Protocol | ||
| 1. **Define the Test**: You must have a single command that returns exit code `0` if good, and non-zero if bad. (e.g., `npm run test:repro` or `node repro.js`). | ||
| 2. **Find a Known Good State**: Ask the user or search git history for a commit where you are certain the feature worked. | ||
| 3. **Find the Known Bad State**: Typically `HEAD`. | ||
| 4. **Bisect**: | ||
| - (For human workflows, guide them to use `git bisect start <bad> <good>`). | ||
| - For agent workflows, manually check out the midpoint commit, run the test, and narrow the window. | ||
| 5. **Analyze the Offending Commit**: Once the exact commit is found, use `git show <commit>` to analyze the diff. The change that triggers the regression is in that diff, although the underlying defect may live in older code that the commit merely exposes. |
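The manual midpoint search for agent workflows amounts to a binary search over an ordered commit list. The sketch below simulates it with invented commit names and a hypothetical `is_bad` predicate; in real use that predicate would check out the commit and run the test command from step 1.

```python
def first_bad(commits, is_bad):
    """Binary-search an oldest-to-newest commit list for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    history flips from good to bad exactly once.
    Returns (first_bad_commit, number_of_test_runs).
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(commits[mid]):
            bad = mid      # regression is at mid or earlier
        else:
            good = mid     # regression is after mid
    return commits[bad], steps


# Invented history: 16 commits, with the regression introduced in c11.
commits = [f"c{i}" for i in range(16)]
print(first_bad(commits, lambda c: int(c[1:]) >= 11))
```

With 16 commits the search needs only 4 test runs, which is why bisecting beats reading every diff in the range.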
| --- | ||
| name: reproducible-training-runs | ||
| description: Analyzes ML training scripts to enforce seed setting, deterministic operations, and environment tracking for exact reproducibility. | ||
| --- | ||
| # Reproducible Training Runs | ||
| Use this skill when reviewing or modifying ML training scripts to ensure they produce deterministic, reproducible results across runs. | ||
| ## Prerequisites | ||
| - A target Python training script. | ||
| ## Instructions | ||
| When applying this skill, check for and enforce the following reproducibility standards: | ||
| 1. **Global Seed Initialization:** Ensure a single function sets seeds for all relevant libraries (`random`, `numpy`, `torch`, `tensorflow`). | ||
| 2. **Deterministic Algorithms:** For PyTorch or TensorFlow, check if deterministic algorithms are enabled (e.g., `torch.use_deterministic_algorithms(True)`). | ||
| 3. **Data Loading:** Verify that data loaders use deterministic shuffling and that worker processes are seeded correctly to avoid identical augmentations. | ||
| 4. **Environment & Config Tracking:** Ensure that the script logs the exact configuration, dependency versions, and data hashes. | ||
| ## Safety & Style | ||
| - **Review First:** Point out missing reproducibility guards before rewriting the script. | ||
| - **Keep it Explicit:** Provide the exact snippet for seed initialization. Do not hide side effects. | ||
| - **Performance Trade-offs:** Warn the user if enabling deterministic algorithms will significantly impact training speed. |
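A minimal sketch of the single seeding function from instruction 1, assuming `numpy` and `torch` may or may not be installed. The name `seed_everything` is a common convention, not something this skill prescribes; TensorFlow is left out for brevity and could be added the same way with `tf.random.set_seed`.

```python
import os
import random


def seed_everything(seed):
    """Seed every RNG that is importable in this environment."""
    random.seed(seed)
    # Only affects subprocesses started after this point; for the current
    # process, PYTHONHASHSEED must be set before the interpreter starts.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # torch not installed; nothing to seed


seed_everything(123)
first = [random.random() for _ in range(3)]
seed_everything(123)
second = [random.random() for _ in range(3)]
print(first == second)
```

Seeding alone does not make GPU kernels deterministic. Enabling `torch.use_deterministic_algorithms(True)` is a separate, explicit step with its own performance cost, which is why it is not hidden inside this function.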
| --- | ||
| name: risk-based-review | ||
| description: "Use when reviewing code (or your own plan) to allocate attention based on the danger of the change." | ||
| --- | ||
| # Risk-Based Review | ||
| Not all code changes deserve the same level of scrutiny. A typo fix in a README is low risk; a change to the authentication middleware is critical. | ||
| ## Risk Categories | ||
| 1. **Critical Risk** (Auth, Payments, Cryptography, Database Migrations): | ||
| - Require 100% test coverage for the change. | ||
| - Require explicit human sign-off. | ||
| - Look for edge cases, null pointers, and race conditions. | ||
| 2. **High Risk** (Core Business Logic, Shared Utilities, Public API changes): | ||
| - Require unit and integration tests. | ||
| - Check for backwards compatibility and blast radius (see `change-impact-check`). | ||
| 3. **Low Risk** (UI tweaks, isolated components, internal tools): | ||
| - Focus on readability, naming conventions, and simple unit tests. | ||
| When acting as a reviewer, explicitly state the Risk Category of the PR before providing feedback. |
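One way to make the category explicit before reviewing is to derive it from the changed paths. The keyword lists below are hypothetical and deliberately crude (plain substring matching); they would need tuning to a real repository layout.

```python
# Hypothetical path keywords per category, checked from most to least severe.
RISK_RULES = [
    ("critical", ("auth", "payment", "crypto", "migrations")),
    ("high", ("core", "shared", "api")),
]


def risk_category(changed_paths):
    """Return the highest risk category triggered by any changed path."""
    lowered = [p.lower() for p in changed_paths]
    for category, keywords in RISK_RULES:
        if any(k in p for p in lowered for k in keywords):
            return category
    return "low"


print(risk_category(["src/auth/middleware.ts", "README.md"]))
print(risk_category(["src/api/users.ts"]))
print(risk_category(["src/components/Button.tsx"]))
```

A PR takes the category of its riskiest file, so a README typo bundled with an auth change is still reviewed as critical.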
| --- | ||
| name: semantic-layer-change-review | ||
| description: "Use when modifying dbt metrics or semantic models to ensure mathematical correctness and backwards compatibility." | ||
| --- | ||
| # Semantic Layer Change Review | ||
| Changes to the semantic layer directly impact dashboards and business reporting. A silent drift in a metric definition destroys trust. | ||
| ## Review Protocol | ||
| 1. **Identify the Change Type**: | ||
| - **Addition**: Safe. (Adding a new metric or dimension). | ||
| - **Deprecation**: Requires communication. (Removing a metric). | ||
| - **Modification**: High Risk. (Changing the SQL expression, aggregation, or filters of an existing metric). | ||
| 2. **Evaluate Mathematical Soundness**: | ||
| - Are we averaging an average? | ||
| - Are we summing a distinct count? | ||
| - Does adding this dimension cause a fan-out that inflates the metric? | ||
| 3. **Check Backwards Compatibility**: | ||
| - If an existing metric's logic is changed, you MUST flag it. The recommended path is to create a new metric (e.g., `revenue_v2`), or to version the underlying dbt model, rather than silently altering historical numbers. | ||
| 4. **Verify Entity Mapping**: | ||
| - Ensure `entities` (primary/foreign keys) match the granularity of the underlying semantic model. | ||
| ## Anti-Pattern | ||
| Do not approve a pull request that changes the `expr` of a core metric without explicitly confirming the business requested the restatement of historical data. |
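Two of the soundness checks are easy to see with small numbers. The worked example below uses invented data to show how averaging an average and a join fan-out each distort a metric.

```python
# Averaging an average: invented order values grouped by region.
orders_by_region = {"north": [100.0, 100.0, 100.0, 100.0], "south": [20.0]}

all_values = [v for values in orders_by_region.values() for v in values]
true_avg = sum(all_values) / len(all_values)          # average over rows

region_avgs = [sum(v) / len(v) for v in orders_by_region.values()]
avg_of_avgs = sum(region_avgs) / len(region_avgs)     # ignores region sizes

print(true_avg, avg_of_avgs)

# Fan-out: joining orders to line items repeats each order's amount.
orders = {"o1": 50.0, "o2": 30.0}
line_items = [("o1", "item_a"), ("o1", "item_b"), ("o2", "item_c")]

true_revenue = sum(orders.values())
joined_revenue = sum(orders[order_id] for order_id, _ in line_items)

print(true_revenue, joined_revenue)
```

The row-level average is 84.0 while the average of regional averages is 60.0, and the joined revenue is 130.0 against a true 80.0 because order `o1` is counted once per line item.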
| --- | ||
| name: strategic-context-compaction | ||
| description: Compact context at logical phase boundaries — after research, after planning, after debugging — rather than mid-task. Preserves useful state while clearing noise. | ||
| --- | ||
| # Strategic Context Compaction | ||
| Compact at logical boundaries to preserve high-value context while clearing noise. Arbitrary or mid-task compaction loses critical state. | ||
| ## When to Compact | ||
| | Transition | Compact? | Reason | | ||
| |-----------|----------|--------| | ||
| | Research → Planning | **Yes** | Research context is bulky; the plan is the distilled output | | ||
| | Planning → Implementation | **Yes** | Plan is saved in tasks/files; context is free to reset | | ||
| | Implementation → Testing | **Maybe** | Keep if tests reference recent code; compact if switching focus area | | ||
| | Debugging → Next feature | **Yes** | Debug traces pollute unrelated work | | ||
| | Mid-implementation | **No** | Losing file paths, variable names, partial state is costly | | ||
| | After a failed approach | **Yes** | Clear dead-end reasoning before trying a new approach | | ||
| ## Before Compacting | ||
| Save anything you cannot reconstruct cheaply: | ||
| - Write the plan to a task list or file before compacting after research | ||
| - Commit or stash work-in-progress code before compacting after debugging | ||
| - Note key file paths in the next prompt if they will be needed again | ||
| ## What Survives Compaction | ||
| | Survives | Lost | | ||
| |----------|------| | ||
| | CLAUDE.md / AGENTS.md instructions | Intermediate reasoning | | ||
| | Task list (TodoWrite) | File contents read in session | | ||
| | Files on disk | Tool call history | | ||
| | Git state | Verbally stated preferences | | ||
| | Memory files | Multi-step conversation context | | ||
| ## Compaction Discipline | ||
| - Do not compact to "clean up" during active multi-file implementation | ||
| - Do compact when starting a conceptually distinct task in the same session | ||
| - Use a summary prompt with `/compact`: `/compact — now implementing auth middleware per plan` | ||
| - After compaction, re-read the task list or plan file to restore intent | ||
| ## Token Awareness | ||
| - Each loaded skill adds 1–5K tokens to context | ||
| - Load skills on demand, not at session start | ||
| - CLAUDE.md / AGENTS.md are always loaded; keep them lean | ||
| - Duplicate instructions (root config + plugin skill) are the most common waste |
| --- | ||
| name: task-intake | ||
| description: "Use at the beginning of a new task. Ensures you fully understand the requirements, boundaries, and acceptance criteria before writing code." | ||
| --- | ||
| # Task Intake Protocol | ||
| Never start implementing blindly. When you receive a new task, you must force clarification of boundaries and expected outcomes. | ||
| ## Intake Checklist | ||
| 1. **What is the goal?** Summarize the user's request in your own words. | ||
| 2. **What is out of scope?** Identify what you are *not* going to do. If the user asked to fix a button, do not refactor the routing layer. | ||
| 3. **How will we test it?** Define the validation criteria. Will it be a unit test, a manual UI check, or a curl command? | ||
| 4. **What context is missing?** Ask the user for specific files, logs, or environment details if the request is too vague. | ||
| ## Anti-Pattern: The Blind Start | ||
| Do not say "I will now fix the bug." and immediately edit files. Instead, use a repo-map or grep to confirm the files exist, then state your understanding of the problem. If the user's instruction is ambiguous, explicitly pause and ask them a clarifying question. |
| --- | ||
| name: test-preserving-refactor | ||
| description: "Use to restructure code while guaranteeing that all existing tests continue to pass." | ||
| --- | ||
| # Test-Preserving Refactor | ||
| Refactoring is only safe if it is backed by tests. | ||
| ## The Protocol | ||
| 1. **Run Tests First**: Before touching any code, run the tests covering the target area. They MUST be green. If they are red, stop and fix the tests (or the code) first. | ||
| 2. **Small Steps**: Make one structural change at a time (e.g., extract a method). | ||
| 3. **Run Tests Immediately**: Run the tests immediately after the single structural change. | ||
| 4. **Revert on Red**: If the tests fail, you made a mistake. Revert the change (`git checkout` or `ctrl+z`) and try a different approach. Do not attempt to "fix" the refactor while tests are failing. | ||
| 5. **Commit**: Once the small change is green, consider it a safe checkpoint. | ||
| This strict green-to-green cycle keeps you from getting trapped in a broken, uncompilable state. |
| --- | ||
| name: training-pipeline-debugging | ||
| description: "Diagnose NaN losses, out-of-memory errors, and shape mismatches in deep learning or ML pipelines." | ||
| --- | ||
| # Training Pipeline Debugging | ||
| ML training bugs are often silent mathematical errors rather than explicit code crashes. | ||
| ## Debugging Protocol | ||
| 1. **NaN Losses**: If loss goes to NaN, check: | ||
| - Learning rate too high? | ||
| - Missing data (NaNs in input)? | ||
| - Log/Exp/Divide by zero in custom loss functions? | ||
| - Exploding gradients (clip gradients)? | ||
| 2. **OOM (Out of Memory)**: | ||
| - Reduce batch size. | ||
| - Check for memory leaks in the training loop (e.g., accumulating loss tensors across iterations without `.detach()` or `.item()`, which keeps each computation graph alive). | ||
| 3. **Shape Mismatches**: | ||
| - Add temporary print statements or assertions that check `tensor.shape` before matrix multiplications or loss calculations. | ||
| 4. **The Overfit Test**: The ultimate test of a pipeline is fitting a single batch. If the model cannot achieve near 0 loss on a single batch of 10 examples, the pipeline is fundamentally broken. Do not debug full runs until the single-batch test passes. |
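The overfit test and the NaN check can be illustrated without any ML framework. This sketch fits a one-feature linear model to a single invented batch with plain gradient descent; the function name, learning rate, and step count are assumptions chosen for the example.

```python
import math


def overfit_single_batch(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b to one small batch; return (w, b, final_loss).

    Raises ValueError as soon as the loss stops being finite.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    loss = float("inf")
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n           # mean squared error
        if not math.isfinite(loss):
            raise ValueError("loss is NaN/inf: check learning rate and inputs")
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss


# Invented batch generated from y = 2x + 1, so near-zero loss is reachable.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b, loss = overfit_single_batch(xs, ys)
print(round(w, 3), round(b, 3), loss < 1e-6)
```

Calling the same function with `lr=1.0` makes the updates diverge and triggers the `ValueError`, which mirrors the "learning rate too high" cause listed under NaN losses.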
| --- | ||
| name: tri-model-review | ||
| description: Multi-model orchestration — route to two external advisors, then synthesize | ||
| level: 5 | ||
| --- | ||
| # Tri-Model Review | ||
| Tri-model review routes through two external advisor CLIs, then synthesizes both outputs into one answer. | ||
| Use this when you want parallel external perspectives. | ||
| ## When to Use | ||
| - Backend/analysis + frontend/UI work in one request | ||
| - Code review from multiple perspectives (architecture + design/UX) | ||
| - Cross-validation where different models may disagree | ||
| - Fast advisor-style parallel input without full team runtime orchestration | ||
| ## Requirements | ||
| - Ensure you have configured the appropriate `apx ask-*` wrappers. | ||
| - If either wrapper is unavailable, continue with whichever provider is available and note the limitation. | ||
| ## How It Works | ||
| ```text | ||
| 1. Decompose the request into two advisor prompts: | ||
| - Analysis/architecture/backend prompt | ||
| - UX/design/docs/alternatives prompt | ||
| 2. Run both advisors via the canonical wrappers: | ||
| - apx ask-codex "<prompt>" | ||
| - apx ask-gemini "<prompt>" | ||
| 3. Synthesize both outputs into one final response | ||
| ``` | ||
| ## Execution Protocol | ||
| When invoked, follow this workflow: | ||
| ### 1. Decompose Request | ||
| Split the user request into: | ||
| - **Architecture prompt:** correctness, backend, risks, test strategy | ||
| - **UX prompt:** content clarity, alternatives, edge-case usability, docs polish | ||
| - **Synthesis plan:** how to reconcile conflicts | ||
| ### 2. Invoke advisors via Bash | ||
| Run both advisors via the Bash tool: | ||
| ```bash | ||
| apx ask-codex "<architecture prompt>" | ||
| apx ask-gemini "<UX prompt>" | ||
| ``` | ||
| ### 3. Synthesize | ||
| Return one unified answer with: | ||
| - Agreed recommendations | ||
| - Conflicting recommendations (explicitly called out) | ||
| - Chosen final direction + rationale | ||
| - Action checklist | ||
| ## Fallbacks | ||
| If one provider is unavailable: | ||
| - Continue with available provider + synthesis | ||
| - Clearly note missing perspective and risk | ||
| If both unavailable: | ||
| - Fall back to a single-model answer and state external advisors were unavailable. |
+1
-1
| { | ||
| "name": "agent-powerups", | ||
| "version": "0.5.0", | ||
| "version": "0.5.1", | ||
| "description": "Local-first CLI for browsing, validating, running, and explicitly writing agent powerups.", | ||
@@ -5,0 +5,0 @@ "license": "Apache-2.0", |
+42
-2
@@ -252,5 +252,10 @@ <p align="center"> | ||
| - `agent-harness-design` | ||
| - `agent-readable-docs` | ||
| - `agent-runtime-patterns` | ||
| - `agent-session-forensics` | ||
| - `ai-regression-testing` | ||
| - `ai-slop-cleaner` | ||
| - `api-doc-review` | ||
| - `architecture-decision-records` | ||
| - `architecture-simplification` | ||
| - `ask-claude` | ||
@@ -260,2 +265,3 @@ - `ask-codex` | ||
| - `autonomous-delivery-pipeline` | ||
| - `baseline-comparison-review` | ||
| - `bigquery-cost-audit` | ||
@@ -266,27 +272,51 @@ - `brainstorming` | ||
| - `build-fix-minimal-diff` | ||
| - `canonical-advisor-routing` | ||
| - `change-impact-check` | ||
| - `changelog-generator` | ||
| - `ci-failure-readout` | ||
| - `codebase-migration-batches` | ||
| - `context-compression` | ||
| - `context-docs` | ||
| - `context-minimization` | ||
| - `context-retrieval-loop` | ||
| - `data-quality` | ||
| - `dataset-split-review` | ||
| - `dbt-incremental-strategy-audit` | ||
| - `dbt-preflight` | ||
| - `dbt-strategy` | ||
| - `dead-code-removal` | ||
| - `defuddle` | ||
| - `dependency-cleanup` | ||
| - `deploy-pipeline-runbook` | ||
| - `dispatching-parallel-agents` | ||
| - `doc-consistency-check` | ||
| - `environment-doctor` | ||
| - `experiment-tracking-review` | ||
| - `failure-triage` | ||
| - `filesystem-mcp-guardrails` | ||
| - `finishing-a-development-branch` | ||
| - `flaky-test-investigation` | ||
| - `gh-address-comments` | ||
| - `github-ci-failure-triage` | ||
| - `graphify` | ||
| - `handoff-discipline` | ||
| - `handoff-documentation` | ||
| - `hard-won-skill-extractor` | ||
| - `incident-readout` | ||
| - `incremental-migration` | ||
| - `json-canvas` | ||
| - `local-rag-mcp` | ||
| - `log-driven-diagnosis` | ||
| - `managed-codebase-context` | ||
| - `markitdown-file-intake` | ||
| - `mcp-server-builder` | ||
| - `memory-build-workflow` | ||
| - `memory-optimization-workflow` | ||
| - `memory-query-workflow` | ||
| - `metric-impact-analyzer` | ||
| - `minimal-reproduction` | ||
| - `ml-leakage-check` | ||
| - `model-evaluation-reporting` | ||
| - `model-routing` | ||
| - `naming-and-structure-cleanup` | ||
| - `no-fluff` | ||
@@ -297,5 +327,8 @@ - `parallel-execution-engine` | ||
| - `pr-triage` | ||
| - `pre-release-verification` | ||
| - `prompt-evaluation-runner` | ||
| - `readme-hardening` | ||
| - `receiving-code-review` | ||
| - `red-team-eval-authoring` | ||
| - `regression-bisecting` | ||
| - `relay-claude` | ||
@@ -306,14 +339,22 @@ - `relay-codex` | ||
| - `repo-map` | ||
| - `reproducible-training-runs` | ||
| - `requesting-code-review` | ||
| - `requirements-clarifier` | ||
| - `review-comment-style-mining` | ||
| - `risk-based-review` | ||
| - `safe-refactor` | ||
| - `search-before-building` | ||
| - `semantic-layer-change-review` | ||
| - `skill-authoring-guide` | ||
| - `skill-evaluation-workbench` | ||
| - `sql-business-logic-review` | ||
| - `strategic-context-compaction` | ||
| - `structured-code-search-mcp` | ||
| - `subagent-team-orchestration` | ||
| - `systematic-debugging` | ||
| - `task-intake` | ||
| - `test-driven-development` | ||
| - `test-preserving-refactor` | ||
| - `training-pipeline-debugging` | ||
| - `tri-model-review` | ||
| - `using-git-worktrees` | ||
@@ -323,3 +364,2 @@ - `using-powerups` | ||
| - `webapp-visual-testing` | ||
| - `worktree-session-manager` | ||
| - `writing-plans` | ||
@@ -613,2 +653,2 @@ - `writing-skills` | ||
| Roadmap: [`docs/roadmap.md`](./docs/roadmap.md) | ||
| Roadmap: [`roadmap.md`](./docs/roadmap.md) |