agent-powerups

agent-powerups - npm Package Compare versions

Comparing version

0.5.0

0.5.1

+141

skills/agent-readable-docs/SKILL.md

		---
		name: agent-readable-docs
		description: Use when writing technical documentation that needs to be readable by both humans and AI models, converting existing docs to HADS format, validating a HADS document, or optimizing documentation for token-efficient AI consumption.
		---

		# Human-AI Document Standard (HADS)

		---

		## AI READING INSTRUCTION

		This skill teaches the agent how to read, generate, and validate HADS documents.
		Read all `[SPEC]` blocks before responding to any HADS-related request.
		Read `[NOTE]` blocks if you need context on intent or edge cases.

		---

		## 1. WHAT IS HADS

		**[SPEC]**
		- HADS = Human-AI Document Standard
		- Convention for Markdown technical documentation
		- Four block types: `[SPEC]`, `[NOTE]`, `[BUG]`, `[?]`
		- Every HADS document requires: H1 title, version declaration, AI manifest
		- AI manifest appears before first content section, tells AI what to read/skip
		- File extension: `.md` — standard Markdown, no tooling required

		---

		## 2. BLOCK TYPES

		**[SPEC]**
		```
		[SPEC] Authoritative fact. Terse. Bullet lists, tables, code. AI reads always.
		[NOTE] Human context, history, examples. AI may skip.
		[BUG] Verified failure + fix. Required fields: symptom, cause, fix. Always read.
		[?] Unverified / inferred. Lower confidence. Always flagged.
		```

		Block tag rules:
		- Bold, on its own line: `**[SPEC]**`
		- Content follows immediately (no blank line between tag and content)
		- Multiple blocks of different types allowed per section
		- Titled BUG blocks allowed: `**[BUG] Short description**`
		- No nesting of blocks inside blocks

		---

		## 3. REQUIRED DOCUMENT STRUCTURE

		**[SPEC]**
		```markdown
		# Document Title
		Version X.Y.Z · Author · Date · [metadata]

		---

		## AI READING INSTRUCTION

		Read `[SPEC]` and `[BUG]` blocks for authoritative facts.
		Read `[NOTE]` only if additional context is needed.
		`[?]` blocks are unverified — treat with lower confidence.

		---

		## 1. First Section

		**[SPEC]**
		...
		```

		Required elements in order:
		1. H1 title
		2. Version block in header
		3. AI manifest section before first content section
		4. Content sections (H2), subsections (H3)

		---

		## 4. HOW AI READS HADS

		**[SPEC]**
		When encountering a HADS document:
		1. Find and read the AI manifest first
		2. Read all `[SPEC]` blocks — these are ground truth
		3. Read all `[BUG]` blocks — always, before generating any code or config
		4. Read `[NOTE]` blocks only if `[SPEC]` is insufficient to answer the query
		5. Treat `[?]` content as hypothesis — note uncertainty in response

		Token optimization: for large documents, scan section headings first, then read only `[SPEC]` and `[BUG]` blocks in relevant sections.
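The reading order above can be sketched as a small block selector. This is an illustration only: the names `parseBlocks` and `firstPassBlocks` are hypothetical, and the boundary rules (a block ends at the next bold tag, heading, or `---` outside a code fence) are assumptions inferred from this skill, not an official HADS parser.

```typescript
// Sketch only: block boundaries and the bold-tag pattern are assumptions
// inferred from the rules in this skill, not an official HADS parser.
type BlockType = "SPEC" | "NOTE" | "BUG" | "?";

interface HadsBlock {
  type: BlockType;
  title: string; // text after the tag, e.g. a BUG title; empty if none
  lines: string[];
}

// Matches a bold tag on its own line: **[SPEC]** or **[BUG] Short description**
const TAG_LINE = /^\*\*\[(SPEC|NOTE|BUG|\?)\]([^*]*)\*\*$/;

function parseBlocks(doc: string): HadsBlock[] {
  const blocks: HadsBlock[] = [];
  let current: HadsBlock | null = null;
  let inFence = false;
  const rows = doc.split("\n");
  for (let i = 0; i < rows.length; i++) {
    const raw = rows[i];
    const line = raw.trim();
    // Fenced code may contain headings or rules; never treat those as boundaries.
    if (line.indexOf("```") === 0) {
      inFence = !inFence;
      if (current) current.lines.push(raw);
      continue;
    }
    if (!inFence) {
      const tag = TAG_LINE.exec(line);
      if (tag) {
        current = { type: tag[1] as BlockType, title: tag[2].trim(), lines: [] };
        blocks.push(current);
        continue;
      }
      if (line === "---" || line.charAt(0) === "#") {
        current = null; // a rule or heading closes the open block
        continue;
      }
    }
    if (current) current.lines.push(raw);
  }
  return blocks;
}

// First pass: only the blocks that must always be read.
function firstPassBlocks(doc: string): HadsBlock[] {
  return parseBlocks(doc).filter(function (b) {
    return b.type === "SPEC" || b.type === "BUG";
  });
}
```

`[NOTE]` and `[?]` blocks stay available through `parseBlocks` for the cases where the first pass is not enough to answer the query.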

		---

		## 5. HOW TO GENERATE HADS

		**[SPEC]**
		When asked to write documentation in HADS format:

		1. Start with header block (title, version, metadata)
		2. Add AI manifest — always include, never skip
		3. Organize content into numbered H2 sections
		4. For each fact: write as `[SPEC]` — terse, bullet or table or code
		5. For each "why" or context: write as `[NOTE]`
		6. For each known failure mode with confirmed fix: write as `[BUG]`
		7. For each unverified claim: write as `[?]`
		8. End with changelog section

		Content rules for `[SPEC]`:
		- Prefer bullet lists over prose
		- Prefer tables for multi-field facts
		- Prefer code blocks for syntax, formats, examples
		- Maximum 2 sentences of prose — if more needed, move to `[NOTE]`

		Content rules for `[BUG]`:
		- Always include: symptom, cause, fix
		- Optional: affected versions, workaround
		- Title on same line: `**[BUG] Short description**`

		**[NOTE]**
		When converting existing documentation to HADS: extract facts into `[SPEC]`, move narrative and history to `[NOTE]`, surface all known issues as `[BUG]`. Do not duplicate content between block types.

		---

		## 6. VALIDATION RULES

		**[SPEC]**
		A valid HADS document must have:
		- H1 title
		- Version in header
		- AI manifest before first content section
		- All block tags bold
		- `[BUG]` blocks contain at minimum symptom + fix
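A minimal sketch of how these five checks might be automated. The function name `validateHads` is hypothetical, and two details are assumptions: the manifest is recognized by the heading `## AI READING INSTRUCTION` used in the template in section 3, and the symptom and fix fields are detected by a case-insensitive keyword match.

```typescript
// Sketch only: heading text and keyword matching are assumptions, not part of HADS.
const BOLD_TAG = /^\*\*\[(SPEC|NOTE|BUG|\?)\][^*]*\*\*$/;
const BARE_TAG = /^\[(SPEC|NOTE|BUG|\?)\]/;

function validateHads(doc: string): string[] {
  const problems: string[] = [];
  const lines = doc.split("\n").map(function (l) { return l.trim(); });
  let h1 = -1;
  let firstH2 = -1;
  let inFence = false;
  let bugStart = -1; // index of the [BUG] tag whose body is being collected
  let bugText = "";

  function closeBug(): void {
    if (bugStart === -1) return;
    if (!/symptom/i.test(bugText) || !/fix/i.test(bugText)) {
      problems.push("line " + (bugStart + 1) + ": [BUG] block needs symptom and fix");
    }
    bugStart = -1;
    bugText = "";
  }

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (line.indexOf("```") === 0) { inFence = !inFence; continue; }
    if (inFence) {
      // Fenced content belongs to the open block and is never a tag or heading.
      if (bugStart !== -1) bugText += line + "\n";
      continue;
    }
    if (h1 === -1 && /^# \S/.test(line)) h1 = i;
    if (firstH2 === -1 && /^## \S/.test(line)) firstH2 = i;

    const isBoundary = BOLD_TAG.test(line) || line === "---" || line.charAt(0) === "#";
    if (isBoundary) closeBug();
    if (BOLD_TAG.test(line) && line.indexOf("**[BUG]") === 0) {
      bugStart = i;
      continue;
    }
    if (BARE_TAG.test(line)) {
      problems.push("line " + (i + 1) + ": block tag must be bold");
    }
    if (bugStart !== -1 && !isBoundary) bugText += line + "\n";
  }
  closeBug();

  if (h1 === -1) problems.push("missing H1 title");
  const headerEnd = firstH2 === -1 ? lines.length : firstH2;
  const header = lines.slice(h1 + 1, headerEnd).join("\n");
  if (!/version\s+\d+\.\d+/i.test(header)) problems.push("missing version in header");
  if (firstH2 === -1 || !/^## AI READING INSTRUCTION\b/i.test(lines[firstH2])) {
    problems.push("AI manifest must come before the first content section");
  }
  return problems;
}
```

An empty result means the document passed every check in this sketch; it does not prove the content itself is well written.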

		---

		## 7. DESIGN INTENT

		**[NOTE]**
		HADS exists because AI models increasingly read documentation before humans do. The format optimizes for this reality without sacrificing human readability.

		Key insight: the AI manifest is the core innovation. It lets the model know what to read and what to skip — without requiring it to reason about document structure. Explicit is better than implicit for model consumption.

+104

skills/ai-regression-testing/SKILL.md

		---
		name: ai-regression-testing
		description: Deterministic checks first, agent review second, regression test for every real bug fixed or document why not. Targets the blind spot where an agent writes and reviews its own code.
		---

		# AI Regression Testing

		When an agent writes code and then reviews it, it carries the same assumptions into both steps. Automated tests break this cycle.

		## When to Use

		- An agent has modified logic, API routes, or data transformation code
		- A bug was found — need to prevent re-introduction
		- Running `/bug-check` after a change session
		- Multiple execution paths exist (feature flags, sandbox vs production, env variants)

		## The Core Problem

		```
		Agent writes fix → Agent reviews fix → Agent says "looks correct" → Bug still present
		```

		The most common blind spot: an agent fixes the production path but leaves the sandbox/mock path unchanged, or vice versa.

		## Workflow

		Run in order. Do not skip to agent review if automated steps fail.

		### Step 1 — Run Tests (mandatory)

		```bash
		npm test # or: pytest, cargo test, go test ./...
		npm run build # TypeScript build / type check
		```

		- Test fail → highest priority; fix before anything else
		- Build fail → report type errors as highest priority
		- Both pass → continue to Step 2

		### Step 2 — Agent Code Review

		With tests passing, do a focused review for patterns agents commonly miss:

		1. Execution path parity: Do all code paths (sandbox, production, feature-flag on/off) return the same response shape?
		2. Query completeness: Are all fields used in the response present in the query or selection?
		3. Error state cleanup: On error, is stale state cleared before the error is surfaced?
		4. Optimistic update rollback: If an API call fails, is the optimistic UI change reverted?
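Check 1 (execution path parity) lends itself to a mechanical comparison. The sketch below is one possible approach, with hypothetical helper names `shapeOf` and `pathsWithDifferentShape`: it reduces each response to its key structure and primitive types, then reports the paths whose shape differs from the first one. Arrays are summarized by their first element, which is a simplifying assumption.

```typescript
// Sketch only: reduces a JSON-like value to its structure, ignoring concrete values.
function shapeOf(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    // Assumption: arrays are homogeneous, so the first element represents them.
    return value.length === 0 ? "[]" : "[" + shapeOf(value[0]) + "]";
  }
  if (typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return "{" + keys.map(function (k) { return k + ":" + shapeOf(obj[k]); }).join(",") + "}";
  }
  return typeof value;
}

// Compares every named path against the first one and returns the names that differ.
function pathsWithDifferentShape(responses: Record<string, unknown>): string[] {
  const names = Object.keys(responses);
  if (names.length === 0) return [];
  const reference = shapeOf(responses[names[0]]);
  return names.filter(function (name) {
    return shapeOf(responses[name]) !== reference;
  });
}
```

Passing the production and sandbox responses for the same request would surface a sandbox path that silently drops a field, which is the blind spot described under The Core Problem.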

		### Step 3 — Write a Regression Test for Each Bug Fixed

		For every bug found and fixed, add a test immediately:

		```
		Bug: <description>
		File: <path>
		Regression test: <test name and what it asserts>
		```

		If you cannot write a test, document why:
		```
		Bug: <description>
		Regression test: DEFERRED — <reason> (e.g., requires E2E harness not yet in place)
		```

		Do not silently skip. Every real bug should either have a test or an explicit deferral note.

		## Writing Effective Regression Tests

		Test the contract, not the implementation:

		```typescript
		// Test what the consumer receives, not how it's computed.
		// GET and createRequest are assumed to come from the project under test
		// (the route handler and a request-building test helper).
		const REQUIRED_RESPONSE_FIELDS = ["id", "email", "settings", "created_at"];

		it("profile endpoint returns all required fields", async () => {
		  const res = await GET(createRequest("/api/user/profile"));
		  const json = await res.json();
		  for (const field of REQUIRED_RESPONSE_FIELDS) {
		    expect(json.data).toHaveProperty(field);
		  }
		});
		```

		Name tests after the bug category, not the fix:

		```typescript
		it("sandbox path returns same field set as production path (BUG-CLASS: path-parity)")
		it("notification_settings is not undefined after SELECT * removal (regression)")
		```

		## Common AI Regression Patterns

		| Pattern | Check | Priority |
		|---------|-------|----------|
		| Execution path parity | Same response shape across all paths | High |
		| Query field omission | All response fields present in DB query | High |
		| Error state leakage | State cleared before error is returned | Medium |
		| Missing rollback | Previous state restored on API failure | Medium |

		## Strategy

		Do not aim for a coverage percentage. Write tests only for bugs that were found. Bugs cluster naturally: if three bugs appeared in `/api/user/profile`, that endpoint needs tests. An endpoint that has never had a bug does not need tests yet.

		Tests added this way grow organically with the bug history and cannot be gamed by coverage metrics.

+15

skills/api-doc-review/SKILL.md

		---
		name: api-doc-review
		description: "Verify that API endpoints match their OpenAPI/Swagger specifications."
		---

		# API Doc Review

		Outdated API documentation causes integration failures. The code is the source of truth, and the docs must match.

		## Review Protocol

		1. Compare the route definition (e.g., `POST /users`) with the documented endpoint.
		2. Verify that all required request parameters (body, query, params) are documented with correct types.
		3. Verify that all possible response status codes (200, 400, 404, 500) and their payloads match the actual error handlers and return statements.
		4. If there is a mismatch, update the OpenAPI spec or inline documentation immediately. Do not defer it.

+17

skills/architecture-decision-records/SKILL.md

		---
		name: architecture-decision-records
		description: "Record why an architectural choice was made to prevent agents or humans from unintentionally reverting it."
		---

		# Architecture Decision Records (ADR)

		Code tells you how a system works. ADRs tell you why it works that way, preventing future maintainers (and AI agents) from suggesting "improvements" that were already tried and discarded.

		## ADR Protocol

		When finalizing a major design decision (e.g., "Choosing Postgres over MongoDB", "Using custom event bus over Redis"):
		1. Create `docs/adr/YYYY-MM-DD-<short-title>.md`.
		2. Include the Context (what is the problem?).
		3. Include the Decision (what are we doing?).
		4. Include the Consequences (what trade-offs are we accepting?).
		5. Keep it under 300 words. Focus on constraints, not theory.
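The protocol above can be sketched as a few helpers. All function names are hypothetical, the slug rule (lowercase, hyphen-separated) is an assumption about what `<short-title>` should look like, and the date is formatted in UTC.

```typescript
// Sketch only: the slug format and section headings are assumptions.
function adrPath(date: Date, title: string): string {
  const yyyy = date.getUTCFullYear();
  const mm = ("0" + (date.getUTCMonth() + 1)).slice(-2);
  const dd = ("0" + date.getUTCDate()).slice(-2);
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything that is not a letter or digit
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
  return "docs/adr/" + yyyy + "-" + mm + "-" + dd + "-" + slug + ".md";
}

function renderAdr(title: string, context: string, decision: string, consequences: string): string {
  return [
    "# " + title, "",
    "## Context", context, "",
    "## Decision", decision, "",
    "## Consequences", consequences, ""
  ].join("\n");
}

function wordCount(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// True when the rendered ADR breaks the 300-word guideline from step 5.
function exceedsWordLimit(text: string, limit: number): boolean {
  return wordCount(text) > limit;
}
```

Calling `exceedsWordLimit(renderAdr(...), 300)` before writing the file gives a cheap guard against an ADR that drifts into theory.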

+21

skills/architecture-simplification/SKILL.md

		---
		name: architecture-simplification
		description: "Use to collapse over-engineered abstractions, remove unnecessary layers, or consolidate redundant logic."
		---

		# Architecture Simplification

		Over time, codebases accumulate "just in case" abstractions. This skill guides the safe removal of unnecessary complexity.

		## Simplification Rules

		1. Identify the Abstraction Cost: Does this interface have only one implementation? Does this wrapper class just pass arguments straight through?
		2. Inline the Logic: Move the logic from the unnecessary abstraction directly into the caller.
		3. Delete the Dead Code: Remove the interface, wrapper, or factory that is no longer needed.
		4. Test Verification: Ensure the observable behavior of the system has not changed.

		## Anti-Pattern
		Do not rewrite the entire subsystem. Simplification means removing the noise around the core logic, not changing the core logic itself.

		Example:
		If a `UserRepository` implements `IUserRepository` but there is only ever one database, inline `UserRepository` and delete `IUserRepository`.

+18

skills/baseline-comparison-review/SKILL.md

		---
		name: baseline-comparison-review
		description: "Ensure that new complex models actually outperform simple, naive baselines."
		---

		# Baseline Comparison Review

		Machine learning models add massive technical debt. You must constantly justify their existence by comparing them to a "dumb" baseline.

		## Review Protocol

		1. Define the Naive Baseline:
		- For classification: Predict the majority class.
		- For regression: Predict the mean or median of the training target.
		- For time series: Predict the last known value (naive persistence).
		2. Define the Heuristic Baseline: What simple `if/else` rule would a domain expert write?
		3. Evaluate the Delta: If the complex Deep Learning model only beats the heuristic baseline by 1%, recommend keeping the heuristic. The complexity is not worth the maintenance cost.
		4. Action: Always demand a baseline evaluation script before approving a new model architecture.
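A baseline evaluation script can be very small. The sketch below implements the three naive baselines from step 1 and the delta check from step 3; the function names are hypothetical, and the minimum gain passed to `recommend` is a threshold you would have to choose for your own project.

```typescript
// Sketch only: minimal naive baselines; names and thresholds are illustrative.
function majorityClass(labels: string[]): string {
  if (labels.length === 0) throw new Error("no labels");
  const counts: Record<string, number> = Object.create(null);
  let best = labels[0];
  for (let i = 0; i < labels.length; i++) {
    const label = labels[i];
    counts[label] = (counts[label] || 0) + 1;
    if (counts[label] > counts[best]) best = label;
  }
  return best;
}

function accuracy(predicted: string[], actual: string[]): number {
  let hits = 0;
  for (let i = 0; i < actual.length; i++) {
    if (predicted[i] === actual[i]) hits++;
  }
  return hits / actual.length;
}

function mean(values: number[]): number {
  let sum = 0;
  for (let i = 0; i < values.length; i++) sum += values[i];
  return sum / values.length;
}

function meanAbsoluteError(predicted: number[], actual: number[]): number {
  let sum = 0;
  for (let i = 0; i < actual.length; i++) sum += Math.abs(predicted[i] - actual[i]);
  return sum / actual.length;
}

// Naive persistence: the prediction for step t is the observed value at t - 1,
// so the result lines up with series.slice(1).
function naivePersistence(series: number[]): number[] {
  return series.slice(0, series.length - 1);
}

// Step 3: keep the heuristic unless the model wins by at least minGain.
function recommend(modelScore: number, heuristicScore: number, minGain: number): string {
  return modelScore - heuristicScore >= minGain ? "adopt model" : "keep heuristic";
}
```

For a classifier, the baseline score is `accuracy` of a constant prediction of `majorityClass(trainLabels)` on the test labels; for a forecaster, it is `meanAbsoluteError(naivePersistence(series), series.slice(1))`.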

+36

skills/canonical-advisor-routing/SKILL.md

		---
		name: canonical-advisor-routing
		description: Process-first advisor routing with artifact capture
		---

		# Canonical Advisor Routing

		Route a prompt through a local provider CLI and persist the result as an artifact.

		## Usage

		Use the provided command wrappers:

		```bash
		apx ask-codex "review this patch from a security perspective"
		apx ask-gemini "suggest UX improvements for this flow"
		apx ask-claude "draft an implementation plan for issue #123"
		```

		## Routing

		Required execution path:

		Invoke the provider CLI via the canonical `apx ask-*` wrappers. Do not manually assemble raw provider CLI commands unless debugging the wrapper.

		## Requirements

		- The selected local CLI must be installed and authenticated.

		## Artifacts

		Write the response to the standard artifact location:

		```text
		.agent-powerups/artifacts/ask/<provider>-<slug>-<timestamp>.md
		```

+15

skills/change-impact-check/SKILL.md

		---
		name: change-impact-check
		description: "Use before submitting a PR or considering a task done to evaluate the 'blast radius' of your changes."
		---

		# Change Impact Check

		Code changes rarely exist in isolation. Before declaring success, you must evaluate the downstream consequences of your work.

		## Impact Assessment Protocol

		1. API Surface: Did you change a public method signature, REST endpoint, or database schema? If so, immediately `grep` the entire repository for usages of the old signature.
		2. Dependency Graph: If you updated a core utility function (e.g., `formatDate`), find all modules that import it. Do their tests still pass?
		3. Configuration: Did you add a new environment variable? Ensure it is documented in `.env.example` or the README.
		4. Action: If you detect a high blast radius, run the full test suite (not just the local unit tests) and explicitly document the affected areas in your handoff or PR description.

+15

skills/ci-failure-readout/SKILL.md

		---
		name: ci-failure-readout
		description: "Use when a CI pipeline fails to extract the actual error from thousands of lines of logs."
		---

		# CI Failure Readout

		CI logs are notoriously noisy. Do not dump the entire log into the context window.

		## Readout Protocol

		1. Locate the True Error: Search the CI log (using the UI or by downloading and `grep`ing it) for the exact step that failed. Ignore setup/teardown noise.
		2. Extract the Trace: Copy only the stack trace or the specific compiler/linter error message.
		3. Reproduce Locally: The first rule of fixing a CI failure is proving it fails locally. Run the exact command the CI runner used (e.g., `npm run test:e2e`).
		4. Draft the Readout: Before fixing it, write a 2-sentence summary: "CI failed during the `build` step because `src/types.ts` is missing an export." This forces you to understand the problem instead of blindly guessing.
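Steps 1, 2, and 4 can be sketched as a small log filter. The error pattern here is a heuristic assumption and will need tuning for a given CI system; `extractFirstError` and `draftReadout` are hypothetical names.

```typescript
// Sketch only: the pattern is a heuristic and will miss or over-match on some logs.
const ERROR_PATTERN = /\b(error|failed|fail|exception)\b/i;

// Returns the first matching line plus a few lines of trailing context,
// instead of the whole log.
function extractFirstError(log: string, contextLines: number): string[] {
  const lines = log.split("\n");
  for (let i = 0; i < lines.length; i++) {
    if (ERROR_PATTERN.test(lines[i])) {
      return lines.slice(i, i + 1 + contextLines);
    }
  }
  return [];
}

// Formats the summary sentence from step 4.
function draftReadout(step: string, cause: string): string {
  return "CI failed during the `" + step + "` step because " + cause + ".";
}
```

Only the returned lines need to enter the context window; the rest of the log can stay on disk.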

+15

skills/context-docs/SKILL.md

		---
		name: context-docs
		description: "Maintain short, focused Markdown files per subsystem to provide agents with isolated context."
		---

		# Context Docs

		Large centralized documentation files consume too much context window. Decentralized, module-specific context docs provide targeted information exactly when an agent needs it.

		## Context Protocol

		1. Place README or CONTEXT docs *inside* specific subsystem directories (e.g., `src/auth/CONTEXT.md`).
		2. Document only the boundaries: How does this module communicate with the rest of the app? What are its critical invariants?
		3. Keep it terse. Use bullet points and exact file paths.
		4. Update these files inline when refactoring the module.

+17

skills/context-minimization/SKILL.md

		---
		name: context-minimization
		description: "Use continuously during long tasks. Teaches how to read less, output less, and keep the LLM context window lean and fast."
		---

		# Context Minimization

		Your context window is the most precious resource. Large contexts make you slow, expensive, and prone to hallucinations.

		## The Rules of Lean Context

		1. **Surgical Reads**: Never use `cat` or `read_file` on a 2,000-line file without `start_line` and `end_line`. Always use `grep` first to find the relevant line numbers.
		2. **Silent Commands**: Always append `--silent`, `-q`, or redirect stderr/stdout to `/dev/null` for commands that produce massive logs (like `npm install` or verbose builds) unless you specifically need to debug them.
		3. **Pagination**: Disable pagers for all terminal tools. E.g., `git --no-pager log`.
		4. **Terse Responses**: Do not explain what a tool does before calling it, unless safety requires it. Do not repeat the user's instructions back to them verbatim.
		5. **Close Files**: Once you are done looking at a file, stop referring to it.
		6. **Parallel Ops**: If you need to search 3 files, run 3 parallel grep/read calls in a single turn instead of sequentially. This saves turns, which saves context repetition.

+15

skills/dataset-split-review/SKILL.md

		---
		name: dataset-split-review
		description: "Audit the methodology used to split data into train, validation, and test sets."
		---

		# Dataset Split Review

		A random split is often the wrong split. Incorrect splitting causes massive overestimation of model performance.

		## Review Protocol

		1. Time-Series Data: If the data has a time component, a shuffled split (the default behavior of `train_test_split`) is strictly forbidden. You must use a chronological split to prevent the model from learning from the future.
		2. Group Leakage: If the dataset has multiple rows for a single user/patient/session, a standard split will put rows from the same user in both train and test. You must use GroupKFold or group-based splitting.
		3. Stratification: For imbalanced datasets, verify that stratification is used to maintain the target distribution across all splits.
		4. Action: Review the splitting code and explicitly verify Time, Group, and Stratification safety.
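The time and group checks can be illustrated without any ML library. The sketch below uses hypothetical helper names and covers only chronological and group-based splitting; stratification is not shown. In a Python project the equivalent tools would typically come from scikit-learn rather than hand-written code.

```typescript
// Sketch only: hand-rolled splits to illustrate the Time and Group checks.
interface Split<T> {
  train: T[];
  test: T[];
}

// Oldest rows go to train, newest rows go to test; no shuffling.
function chronologicalSplit<T>(rows: T[], timeOf: (row: T) => number, testFraction: number): Split<T> {
  const sorted = rows.slice().sort(function (a, b) { return timeOf(a) - timeOf(b); });
  const testCount = Math.round(sorted.length * testFraction);
  const cut = sorted.length - testCount;
  return { train: sorted.slice(0, cut), test: sorted.slice(cut) };
}

// Every row of a held-out group goes to test, so no group spans both sets.
function groupHoldoutSplit<T>(rows: T[], groupOf: (row: T) => string, testGroups: string[]): Split<T> {
  const held: Record<string, boolean> = Object.create(null);
  testGroups.forEach(function (g) { held[g] = true; });
  const split: Split<T> = { train: [], test: [] };
  rows.forEach(function (row) {
    (held[groupOf(row)] === true ? split.test : split.train).push(row);
  });
  return split;
}

// Audit helper: groups that appear on both sides of a split (should be empty).
function leakedGroups<T>(split: Split<T>, groupOf: (row: T) => string): string[] {
  const inTrain: Record<string, boolean> = Object.create(null);
  split.train.forEach(function (row) { inTrain[groupOf(row)] = true; });
  const leaked: Record<string, boolean> = Object.create(null);
  split.test.forEach(function (row) {
    const g = groupOf(row);
    if (inTrain[g] === true) leaked[g] = true;
  });
  return Object.keys(leaked);
}
```

`leakedGroups` is also useful on its own when reviewing an existing split: a non-empty result is direct evidence of group leakage.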

+16

skills/dead-code-removal/SKILL.md

		---
		name: dead-code-removal
		description: "Use to identify and safely delete unused functions, classes, exports, and files."
		---

		# Dead Code Removal

		Dead code increases maintenance overhead and confuses developers.

		## The Removal Protocol

		1. Verify Unused: Before deleting anything, you must search the entire repository to ensure the symbol or file is truly unused. Do not assume it is dead just because the current file doesn't use it.
		2. Check for Dynamic Invocation: Be wary of dynamically invoked code (e.g., reflection, dependency injection by string name, ORM mappers). If there is any doubt, leave it alone or ask the user.
		3. Delete Aggressively: Once confirmed unused, delete the code. Do not comment it out.
		4. Prune Dependencies: If you delete the only code that was using an imported module, remove the import statement as well.
		5. Run Tests: Always run tests and/or type checkers (e.g., `tsc --noEmit`) after removal to ensure you didn't accidentally break a hidden dependency.

+16

skills/dependency-cleanup/SKILL.md

		---
		name: dependency-cleanup
		description: "Use to audit and remove unused or redundant third-party dependencies from package manifests."
		---

		# Dependency Cleanup

		Bloated dependencies slow down builds, increase security surface area, and complicate updates.

		## Cleanup Protocol

		1. Audit: Review package manifests such as `package.json`, `requirements.txt`, or `Cargo.toml`.
		2. Verify Usage: For any suspect dependency, perform a global search across the codebase (e.g., `import .* from 'lodash'`).
		3. Remove: If there are zero usages, use the native package manager command to remove it (e.g., `npm uninstall lodash` or `pip uninstall ...`). Do not just manually edit the manifest unless absolutely necessary, to ensure lockfiles are updated correctly. Note that `pip uninstall` removes the installed package but does not edit `requirements.txt`, so that entry still has to be removed by hand.
		4. Consolidate: If multiple libraries serve the exact same purpose (e.g., `moment` and `date-fns`), flag it to the user for future consolidation. Do not attempt a massive library migration autonomously.
		5. Validate: Run the build and test suite to ensure the removed dependency wasn't implicitly required by a build script or runtime environment.

+15

skills/doc-consistency-check/SKILL.md

		---
		name: doc-consistency-check
		description: "Audit documentation for broken file paths, outdated commands, and renamed variables."
		---

		# Doc Consistency Check

		Documentation rots when code changes. This skill identifies stale references in Markdown files.

		## Consistency Protocol

		1. Grep markdown files (`.md`) for file paths (e.g., `src/components/Button.tsx`).
		2. Verify that those files still exist in the repository. If not, the documentation is stale.
		3. Check code blocks in documentation. Do the function names and variable names still match the actual source code?
		4. Flag broken links and outdated references for immediate correction.
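Steps 1 and 2 can be sketched as a pure function over the Markdown text and a list of files known to exist. The path pattern is an assumption (backticked, at least one `/`, with a file extension), and obtaining the list of existing files is left to the caller; `extractPaths` and `findStalePaths` are hypothetical names.

```typescript
// Sketch only: the path pattern is a heuristic and will not catch every reference.
function extractPaths(markdown: string): string[] {
  // Backticked token with at least one "/" and a file extension.
  const pattern = /`([\w.\-]+(?:\/[\w.\-]+)+\.\w+)`/g;
  const seen: Record<string, boolean> = Object.create(null);
  const paths: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const path = match[1];
    if (seen[path] !== true) {
      seen[path] = true;
      paths.push(path);
    }
  }
  return paths;
}

// Returns the referenced paths that are not in the list of existing files.
function findStalePaths(markdown: string, existingFiles: string[]): string[] {
  const exists: Record<string, boolean> = Object.create(null);
  existingFiles.forEach(function (f) { exists[f] = true; });
  return extractPaths(markdown).filter(function (p) { return exists[p] !== true; });
}
```

The list of existing files could come from something like `git ls-files`, so the check reflects what is tracked in the repository rather than what happens to be on disk.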

+15

skills/experiment-tracking-review/SKILL.md

		---
		name: experiment-tracking-review
		description: "Verify that all hyperparameters, metrics, and data references are properly logged."
		---

		# Experiment Tracking Review

		An ML experiment is useless if you cannot reconstruct exactly how it was run and what data it used.

		## Review Protocol

		1. Hyperparameter Logging: Ensure the script logs every hyperparameter (learning rate, batch size, architecture details). Hardcoded magic numbers in the script must be extracted to a config and logged.
		2. Metric Logging: Verify that training and validation metrics are logged at each epoch or step, not just at the end.
		3. Artifact Saving: Ensure the final model weights, preprocessing scalers/encoders, and the exact configuration file are saved together in a versioned directory or tracking system.
		4. Action: Do not allow training scripts to print metrics to stdout only. Enforce structured logging (JSON, MLflow, wandb).

+24

skills/failure-triage/SKILL.md

		---
		name: failure-triage
		description: "Use when confronted with an unknown failure in CI or production to rapidly categorize the issue before deep debugging."
		---

		# Failure Triage

		Before diving deep into a stack trace or spending hours reproducing a bug, you must triage it to determine the blast radius, subsystem, and debugging approach.

		## The Triage Process

		1. Categorize the Failure:
		- Is it a Syntax/Build Error? (Fails before running)
		- Is it a Logic Error? (Runs, but produces wrong output)
		- Is it an Infrastructure/Environment Error? (Network timeout, missing DB table)
		- Is it a Flaky/Non-deterministic Error? (Fails sometimes)
		2. Locate the Origin:
		- Scan the stack trace. Ignore framework/library internals. Find the first-party application frame closest to where the error was thrown (the topmost one when the most recent call is printed first).
		3. Check Recent Changes:
		- Run `git log -n 5 --oneline` and `git diff` to see what changed recently. Most bugs are in the newest code.
		4. Formulate a Hypothesis:
		- State clearly: "I suspect this is an environment error caused by missing configuration, originating in `src/config.ts`."

		Do not start writing fixes until you have explicitly stated your triage hypothesis and confirmed the category.
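Step 2 (locating the origin) can be sketched as a filter over stack frames. The sketch assumes a Node-style trace with the most recent call first and frames that start with `at`; the default first-party rule (path contains `/src/` and not `node_modules`) is an assumption about project layout, and `firstPartyFrame` is a hypothetical name.

```typescript
// Sketch only: assumes Node-style traces, most recent call first.
function defaultIsFirstParty(frame: string): boolean {
  return frame.indexOf("/src/") !== -1 && frame.indexOf("node_modules") === -1;
}

// Returns the first-party frame closest to where the error was thrown, or null.
function firstPartyFrame(
  stack: string,
  isFirstParty: (frame: string) => boolean = defaultIsFirstParty
): string | null {
  const lines = stack.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const frame = lines[i].trim();
    if (frame.indexOf("at ") === 0 && isFirstParty(frame)) {
      return frame;
    }
  }
  return null;
}
```

The returned frame is a starting point for the hypothesis in step 4, not proof of the root cause.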

+19

skills/flaky-test-investigation/SKILL.md

		---
		name: flaky-test-investigation
		description: "Use to diagnose tests that pass and fail intermittently without code changes."
		---

		# Flaky Test Investigation

		Flaky tests erode trust in CI. Do not just re-run them and hope for the best.

		## Investigation Protocol

		1. Isolate the Test: Run the specific failing test by itself. If it passes, the flake is likely an order dependency or state leakage from a previous test.
		2. Stress Test: Run the test in a tight loop (e.g., `for i in {1..100}; do npm test -- -t "My Test"; done`).
		3. Check for Common Vectors:
		- Time: Does the test rely on `Date.now()` or `setTimeout`? Mock the clock.
		- Async/Promises: Are we asserting before a background task finishes? Ensure proper `await` or `waitFor` usage.
		- Shared State: Are we reusing database records, global singletons, or mutated variables between runs? Ensure clean teardowns in `afterEach`.
		- Randomness: Does the test rely on random IDs or sorts? Force deterministic seeds or sort orders.
		4. Prove the Fix: Do not just guess. The fix must be verified by running the stress test loop again and achieving a 100% pass rate.
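The stress loop and the randomness vector can also be exercised in-process. The sketch below is a simplified illustration for synchronous test bodies only (an async test would need to be awaited inside the loop); `stressRun` and `seededRandom` are hypothetical names, and the generator is a basic linear congruential generator meant for reproducible tests, not for anything security related.

```typescript
// Sketch only: in-process stress loop for synchronous test bodies.
interface StressResult {
  runs: number;
  failures: number;
  firstError: string | null;
}

function stressRun(test: () => void, runs: number): StressResult {
  let failures = 0;
  let firstError: string | null = null;
  for (let i = 0; i < runs; i++) {
    try {
      test();
    } catch (e) {
      failures++;
      if (firstError === null) {
        firstError = e instanceof Error ? e.message : String(e);
      }
    }
  }
  return { runs: runs, failures: failures, firstError: firstError };
}

// Deterministic pseudo-random numbers in [0, 1): same seed, same sequence.
function seededRandom(seed: number): () => number {
  let state = seed % 4294967296;
  return function (): number {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}
```

A fix counts as proven under step 4 only when `failures` is 0 across the whole loop, and replacing `Math.random` with a seeded generator in the test makes a failing run reproducible.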

+32

skills/handoff-discipline/SKILL.md

		---
		name: handoff-discipline
		description: "Use when completing a task or running out of context limit. Ensures the next session or human engineer has exactly what they need to resume work instantly."
		---

		# Handoff Discipline

		When ending a session, handing a task back to the user, or preparing to swap to a new context window, you must leave a clean paper trail.

		## The Handoff Rules

		1. State the End Condition: Explain exactly why you are stopping (e.g., "Task complete", "Blocked on PR", "Context window too large").
		2. Leave a Breadcrumb: If the task is incomplete, summarize the last successful step, the current failing step, and the exact next command to run.
		3. Commit or Stash: Ensure the working directory is clean. Either commit the work, tell the user to commit, or stash it. Do not leave unverified messy state.
		4. Link the Work: Provide file paths to the modified files or generated artifacts so the next agent/user doesn't have to search for them.

		## The Handoff Summary Format
		When creating a handoff summary file, use this exact structure:

		```markdown
		### 1. Goal
		[1-2 sentences on what we were trying to do]

		### 2. State
		- ✅ Completed: [What works]
		- 🚧 In Progress: [What is broken or partial]
		- 🛑 Blockers: [What stopped us]

		### 3. Next Steps
		1. Run `npm test ...`
		2. Fix the error in `src/foo.ts` around line X.
		```

+15

skills/handoff-documentation/SKILL.md

		---
		name: handoff-documentation
		description: "Write state-restoration documents for passing tasks between agents or engineers."
		---

		# Handoff Documentation

		When a session ends, the context window is destroyed. Handoff docs serialize the necessary state to allow immediate resumption without re-reading the entire codebase.

		## Handoff Protocol

		1. Write a handoff document before concluding the task.
		2. Current State: What exactly is broken or unfinished? (e.g., "Test X in foo.spec.ts is failing with Error Y").
		3. Next Action: Provide the exact terminal command the next agent/human should run to see the failure.
		4. Discovered Constraints: Note any dead ends encountered so the next session doesn't repeat the mistake (e.g., "Tried using Library Z, but it doesn't support async").

+24

skills/incident-readout/SKILL.md

		---
		name: incident-readout
		description: "Use after fixing a bug to generate a blameless post-mortem summary for human review."
		---

		# Incident Readout

		When a complex debugging session ends, you must produce an incident readout. This prevents knowledge loss and helps humans review the fix quickly.

		## Format

		Output an incident readout document (or print to terminal) using this structure:

		### 1. The Symptom
		What was reported? (1-2 sentences)

		### 2. The Root Cause
		What was the actual underlying technical reason for the failure? Be highly specific about the exact line of code, assumption, or state that failed.

		### 3. The Fix
		What did we change to fix it? Provide a high-level summary of the structural change, not just a diff.

		### 4. Prevention
		How do we ensure this never happens again? (e.g., "Added test case X", "Refactored module Y to be strongly typed").

+19

skills/incremental-migration/SKILL.md

		---
		name: incremental-migration
		description: "Use when migrating APIs, libraries, or patterns across a large codebase. Ensures safe, step-by-step progress rather than risky mega-commits."
		---

		# Incremental Migration

		Never attempt to migrate an entire codebase in a single step. Mega-commits are impossible to review and dangerous to merge.

		## The Incremental Strategy

		1. Define the Target Pattern: Clearly establish the "Old Way" and the "New Way".
		2. Implement Side-by-Side: Create the "New Way" implementation alongside the old one. Do not delete the old one yet.
		3. Migrate One Vertical Slice: Pick exactly one feature, route, or component. Update it to use the new pattern.
		4. Test and Commit: Verify the slice works. Commit this step.
		5. Repeat: Move to the next slice.
		6. Deprecate and Remove: Only once all usages of the "Old Way" are gone can you safely delete the old implementation.

		If a migration is too large for a single session, leave a clear handoff document summarizing progress and the next files to migrate.

+15

skills/log-driven-diagnosis/SKILL.md

		---
		name: log-driven-diagnosis
		description: "Use when debugging complex runtime failures, distributed systems, or issues where a local debugger cannot be attached."
		---

		# Log-Driven Diagnosis

		When you cannot step through code, logs are your only visibility. You must be methodical in how you extract signals from noise.

		## Protocol

		1. Time-Bound Search: Never dump the whole log file. Always `grep` for timestamps around the reported incident, or use `tail`.
		2. Identify the Request ID: If the system uses distributed tracing or request IDs, find the ID associated with the error, then `grep` the entire log corpus for only that ID to trace the complete lifecycle of the failed request.
		3. Look for Preceding Warnings: The `ERROR` log is usually just the final crash. The actual root cause is often a `WARNING` or unexpected `INFO` log that occurred milliseconds earlier (e.g., a connection retry failing, or an empty array being returned).
		4. Add Missing Logs: If the logs do not provide enough visibility, your first action must be to add temporary logging to the application, reproduce the bug, and gather the new signals. Do not guess blindly if the logs are insufficient.
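Steps 2 and 3 can be sketched as two filters over log text. The line format is an assumption: an ISO 8601 timestamp, then a level (`DEBUG`, `INFO`, `WARNING`, `ERROR`), then the message. All function names are hypothetical.

```typescript
// Sketch only: assumes lines like "2024-05-01T12:00:00.250Z ERROR req=a1 message".
interface LogEntry {
  time: number; // milliseconds since epoch
  level: string;
  line: string;
}

function parseLine(line: string): LogEntry | null {
  const m = /^(\S+)\s+(DEBUG|INFO|WARNING|ERROR)\b/.exec(line);
  if (m === null) return null;
  const time = Date.parse(m[1]);
  if (isNaN(time)) return null;
  return { time: time, level: m[2], line: line };
}

// Step 2: every line that mentions the request ID, in original order.
function traceRequest(log: string, requestId: string): string[] {
  return log.split("\n").filter(function (line) {
    return line.indexOf(requestId) !== -1;
  });
}

// Step 3: WARNING lines logged within windowMs before the first ERROR.
function warningsBeforeFirstError(log: string, windowMs: number): string[] {
  const entries: LogEntry[] = [];
  log.split("\n").forEach(function (line) {
    const entry = parseLine(line);
    if (entry !== null) entries.push(entry);
  });
  let errorTime = -1;
  for (let i = 0; i < entries.length; i++) {
    if (entries[i].level === "ERROR") {
      errorTime = entries[i].time;
      break;
    }
  }
  if (errorTime === -1) return [];
  return entries
    .filter(function (e) {
      return e.level === "WARNING" && e.time <= errorTime && e.time >= errorTime - windowMs;
    })
    .map(function (e) { return e.line; });
}
```

Widening `windowMs` trades noise for recall; starting narrow keeps the extracted signal small.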

+72

skills/memory-build-workflow/SKILL.md

		---
		name: memory-build-workflow
		description: Use when a user needs to build or refresh persistent graph memory from a mixed corpus and the right path may include graphify, incremental update, or helper conversion before ingestion.
		---

		# Memory Build Workflow

		## Overview

		Build persistent graph memory with `graphify`.

		Use helper tools only when source format would otherwise reduce graph quality or waste context.

		## When to Use

		- first graph build for a repo, notes folder, research corpus, or mixed raw folder
		- corpus changed enough that persistent graph memory is worth refreshing
		- input includes PDFs, Office docs, or noisy web pages that should be normalized before graph build
		- user wants durable graph outputs instead of one-shot file reading

		Do not use for:
		- one small plain-text file or a narrow one-off question
		- cases where an existing graph already answers the question better via query

		## Required Checks

		```powershell
		apx check graphify
		apx check markitdown-file-intake
		apx check defuddle
		```

		Stop and report missing tools. Do not auto-install without approval.

		## Routing

		| Situation | Action |
		|---|---|
		| ready local corpus of readable files | run `graphify` |
		| existing graph plus changed sources | run `graphify --update` |
		| PDF, Office doc, slide deck, or similar hard-to-read format | convert with `markitdown-file-intake`, then build with `graphify` |
		| article or noisy web page | clean with `defuddle`, then build with `graphify` |
		| user wants vault browsing after build | offer optional Obsidian export |

		## Core Rules

		- `graphify` is the primary engine
		- prefer `graphify --update` over full rebuild when a graph already exists
		- use helpers only to improve source readability before graph ingestion
		- keep Obsidian optional and post-build
		- keep source provenance intact when converting inputs

		## Minimal Workflow

		1. Check whether a usable graph already exists.
		2. If it exists and sources changed, prefer `graphify --update`.
		3. If sources are noisy or binary, normalize them with the narrowest helper.
		4. Build or refresh with `graphify`.
		5. Offer query workflow next instead of rereading the corpus.

		## Common Failure Modes

		- missing `graphify`: stop and report; no fallback build path
		- rebuilding from scratch when update would work: unnecessary cost and churn
		- using helpers on already-readable Markdown or code: wasted step
		- treating Obsidian as required: wrong; it is optional output only

		## References

		- [`../graphify/UPSTREAM.md`](../graphify/UPSTREAM.md)
		- [`../../references/HELPER_TOOLS.md`](../../references/HELPER_TOOLS.md)
		- [`../../references/OBSIDIAN_EXPORT.md`](../../references/OBSIDIAN_EXPORT.md)

+74

skills/memory-optimization-workflow/SKILL.md

		---
		name: memory-optimization-workflow
		description: Use when deciding the lowest-cost context path for a mixed corpus, especially when choosing among direct reading, helper conversion, graph build, graph update, or graph query.
		---

		# Memory Optimization Workflow

		## Overview

		Minimize token spend, reread cost, and unnecessary rebuilds.

		`graphify` is the main optimization path for repeated work. Helper tools exist to make hard sources cheaper before graph or direct reading.

		## When to Use

		- mixed corpus and the cheapest inspection path is unclear
		- repeated questions over the same files
		- need to choose between direct read, conversion, graph build, update, or query
		- want to reduce repeated large-context rereads

		Do not use for:
		- tiny single-file questions where direct reading is already cheapest
		- cases where the user explicitly wants raw-file inspection only

		## Required Checks

		```powershell
		apx check graphify
		apx check markitdown-file-intake
		apx check defuddle
		```

		Stop and report missing tools. Do not auto-install without approval.

		## Fast Routing

		| Situation | Cheapest path |
		|---|---|
		| small readable text corpus, one question | read directly |
		| PDF, Office doc, or other binary-like source | `markitdown-file-intake` |
		| noisy web page or article | `defuddle` |
		| repeated questions across same corpus | build with `graphify` |
		| existing graph plus changed sources | `graphify --update` |
		| existing graph plus new question | query graph first |

		## Decision Rules

		- prefer direct reading for small plain-text scope
		- prefer Markdown over binary or chrome-heavy formats
		- prefer graph query over full reread when a graph already exists
		- prefer incremental update over rebuild
		- keep helper tools secondary to the main graph path
		- keep Obsidian optional; it is not part of the optimization decision unless the user wants vault browsing

		## Escalation Ladder

		1. Direct read if scope is already small and readable.
		2. Convert only if format is the main source of waste.
		3. Build graph memory when questions will repeat or corpus is broad.
		4. Update existing graph when sources changed.
		5. Query existing graph before any broad reread.
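The routing table and ladder above can be read as a small decision function. The sketch below is illustrative only: `Corpus`, `choose_path`, and the returned labels are hypothetical names invented for this example, not part of `graphify` or `apx`.

```python
from dataclasses import dataclass


@dataclass
class Corpus:
    """Hypothetical description of the material a question touches."""
    small_and_readable: bool      # a few plain-text or Markdown files
    hard_format: bool             # PDFs, Office docs, noisy web pages
    graph_exists: bool            # graph memory is already present
    sources_changed: bool         # sources changed since the graph was built
    questions_will_repeat: bool   # more questions are expected on this corpus


def choose_path(c: Corpus) -> str:
    """Return the cheapest reasonable path under the decision rules above."""
    if c.graph_exists:
        # An existing graph is refreshed or queried before any broad reread.
        return "update graph" if c.sources_changed else "query graph"
    if c.small_and_readable and not c.questions_will_repeat:
        return "read directly"
    if c.hard_format and not c.questions_will_repeat:
        return "convert with helper, then read"
    if c.hard_format:
        return "convert with helper, then build graph"
    return "build graph"
```

Checking for an existing graph first reflects the rule that query and incremental update are preferred over rereading or rebuilding.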

		## Common Failure Modes

		- building a graph for a tiny one-shot question
		- rereading large corpora after a graph already exists
		- converting already-readable Markdown or code
		- rebuilding instead of updating
		- making helper tools feel primary instead of supportive

		## References

		- [`../../references/HELPER_TOOLS.md`](../../references/HELPER_TOOLS.md)
		- [`../../references/GRAPHIFY_PROVENANCE.md`](../../references/GRAPHIFY_PROVENANCE.md)

+58

skills/memory-query-workflow/SKILL.md

		---
		name: memory-query-workflow
		description: Use when a graph already exists and the user needs retrieval, tracing, explanation, or gap detection from graph memory before reopening the full corpus.
		---

		# Memory Query Workflow

		## Overview

		Use existing graph memory first.

		Query the graph before rereading source files unless the graph is missing, stale, or too weak for the question.

		## Required Check

		```powershell
		apx check graphify
		```

		## Required State

		- existing `graphify-out/graph.json`

		## Routing

		| Question shape | Action |
		|---|---|
		| broad question about connected concepts | `graphify query` |
		| trace between two concepts, files, or systems | `graphify path` |
		| explain one concept or node in context | `graphify explain` |
		| no graph exists | switch to `memory-build-workflow` |
		| graph exists but corpus changed | recommend `graphify --update` before trusting results |

		## Core Rules

		- prefer graph retrieval over full-corpus reread
		- say explicitly when the graph is missing, stale, sparse, or weakly matched
		- do not overclaim beyond what graph nodes and edges support
		- when graph coverage is weak, use the graph result to target the next direct read instead of restarting broad exploration

		## Minimal Workflow

		1. Confirm `graphify-out/graph.json` exists.
		2. Choose `query`, `path`, or `explain` based on question shape.
		3. Answer from graph evidence first.
		4. If result quality is weak, say why: missing graph, stale graph, low coverage, or weak node match.
		5. Escalate to build/update or targeted reread only when needed.
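Step 1 and the staleness check can be sketched as a small helper. This is a minimal sketch under two assumptions that the skill itself does not state: that `graphify-out/graph.json` lives under the corpus directory, and that a graph counts as stale when any source file is newer than it. The function name `graph_status` is hypothetical.

```python
from pathlib import Path


def graph_status(corpus_dir: str, graph_file: str = "graphify-out/graph.json") -> str:
    """Classify graph memory as 'missing', 'stale', or 'fresh'.

    Assumption: the graph is stale when any source file under corpus_dir
    was modified after graph.json was written.
    """
    root = Path(corpus_dir)
    graph = root / graph_file
    if not graph.is_file():
        return "missing"
    built_at = graph.stat().st_mtime
    out_dir = graph.parent
    for path in root.rglob("*"):
        # Ignore everything inside the graph output directory itself.
        if path.is_file() and out_dir not in path.parents:
            if path.stat().st_mtime > built_at:
                return "stale"
    return "fresh"
```

A result of `"missing"` maps to switching to the build workflow, and `"stale"` maps to recommending an update before trusting query results.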

		## Common Failure Modes

		- skipping graph lookup and rereading everything
		- hiding that the graph is stale or incomplete
		- using `query` for a question that clearly needs a path trace
		- treating no-result output as proof the corpus lacks the concept

		## Reference

		- [`../graphify/UPSTREAM.md`](../graphify/UPSTREAM.md)

+16

skills/minimal-reproduction/SKILL.md

		---
		name: minimal-reproduction
		description: "Use to isolate a bug from a large application into a standalone, runnable script or single test case."
		---

		# Minimal Reproduction

		You cannot reliably fix what you cannot reliably reproduce in isolation.

		## The Subtraction Method

		1. Start with the Failure: Take the code path that fails.
		2. Remove the UI/Network: If the bug is reported via a web request, write a script that calls the internal controller directly.
		3. Mock Dependencies: If the bug doesn't require the database, mock it. If it doesn't require the third-party API, mock it.
		4. Prune Data: If the bug is triggered by a 10MB JSON payload, binary search the payload down to the smallest set of keys (often just one or two) that still triggers the failure.
		5. Final Output: The result must be a single file that relies on ZERO external state, can be run with a single command, and deterministically outputs the exact error reported.
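The pruning step can be automated once the failure is wrapped in a predicate. The sketch below uses a simpler one-key-at-a-time removal rather than a true binary search, and it only prunes top-level keys; `prune_payload` and `still_fails` are hypothetical names for this example.

```python
from typing import Any, Callable, Dict


def prune_payload(payload: Dict[str, Any],
                  still_fails: Callable[[Dict[str, Any]], bool]) -> Dict[str, Any]:
    """Drop top-level keys one at a time while the failure still reproduces."""
    if not still_fails(payload):
        raise ValueError("payload does not reproduce the failure")
    current = dict(payload)
    for key in list(payload):
        candidate = {k: v for k, v in current.items() if k != key}
        if still_fails(candidate):
            current = candidate  # the key was irrelevant, so keep it removed
    return current
```

For example, if the failure needs both `"a"` and `"z"`, calling `prune_payload` on a payload with six keys returns only those two. The predicate should call the isolated code path directly so each check is fast and deterministic.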

+15

skills/ml-leakage-check/SKILL.md

		---
		name: ml-leakage-check
		description: "Identify and prevent target leakage in ML preprocessing pipelines."
		---

		# ML Leakage Check

		Target leakage is one of the most common and dangerous errors in applied ML. It creates models that look perfect in validation but fail in production.

		## Leakage Vectors to Check

		1. Global Scaling/Imputation: Did the author calculate the mean of the entire dataset to impute missing values before splitting? This leaks the test set distribution into the training set.
		2. Future Features: Is there a feature available in the training data that would absolutely not be available at the moment of prediction in real life? (e.g., using "surgery_outcome" to predict "hospital_admission_length").
		3. ID Proxies: Are database IDs or row numbers accidentally included as features? They often correlate with time or order of entry.
		4. Action: Enforce the rule: Split FIRST, then fit transformers on Train ONLY, then transform Train/Val/Test.
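The split-first rule can be illustrated with mean imputation written in plain Python. This is a minimal sketch with made-up data; the helper names `split`, `fit_mean_imputer`, and `transform` are hypothetical and stand in for whatever transformer API the pipeline actually uses.

```python
import random
from statistics import mean
from typing import List, Optional, Tuple

Row = Optional[float]


def split(rows: List[Row], test_fraction: float, seed: int) -> Tuple[List[Row], List[Row]]:
    """Shuffle with a fixed seed, then split into train and test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_mean_imputer(train: List[Row]) -> float:
    """Learn the fill value from the training rows only."""
    return mean(v for v in train if v is not None)


def transform(rows: List[Row], fill: float) -> List[float]:
    """Apply the already-fitted fill value to any split."""
    return [fill if v is None else v for v in rows]


# Illustrative data: None marks a missing value.
values = [1.0, None, 3.0, 5.0, None, 7.0, 9.0, 11.0, None, 100.0]
train, test = split(values, test_fraction=0.3, seed=0)
fill = fit_mean_imputer(train)        # fitted on train only
train_ready = transform(train, fill)
test_ready = transform(test, fill)    # test is transformed, never fitted on
```

The leaky version would compute the mean over `values` before calling `split`, which lets test rows influence the fill value used in training.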

+15

skills/model-evaluation-reporting/SKILL.md

		---
		name: model-evaluation-reporting
		description: "Standardize the reporting of model metrics to ensure statistical rigor and business relevance."
		---

		# Model Evaluation Reporting

		Raw accuracy metrics are not enough. Evaluation must reflect the actual business impact and failure modes of the model.

		## Reporting Standards

		1. Beyond Accuracy: Demand the Confusion Matrix. Demand Precision, Recall, and F1. Explain the cost of a False Positive vs. a False Negative in the business context.
		2. Slice Analysis: Report performance on key segments. A model might be 95% accurate overall, but only 40% accurate on new users.
		3. Calibration: If the model outputs probabilities, verify if they are calibrated. A prediction of 0.8 should mean it happens 80% of the time.
		4. Action: Format the output as a Markdown report that a non-technical stakeholder can read, highlighting trade-offs and worst-case scenarios.
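The first two standards can be computed without any ML library. The sketch below assumes binary labels encoded as 0 and 1; `binary_report` and `accuracy_by_slice` are hypothetical helper names, and the sample data is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def binary_report(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Confusion-matrix counts plus precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}


def accuracy_by_slice(y_true: List[int], y_pred: List[int],
                      segments: List[str]) -> Dict[str, float]:
    """Accuracy per segment, so weak slices are not hidden by the overall number."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for t, p, s in zip(y_true, y_pred, segments):
        totals[s] += 1
        hits[s] += int(t == p)
    return {s: hits[s] / totals[s] for s in totals}


y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
segments = ["old", "new", "old", "new", "old", "old"]
report = binary_report(y_true, y_pred)
slices = accuracy_by_slice(y_true, y_pred, segments)
```

In this toy data the overall accuracy is 4 of 6, yet every prediction for the `"new"` segment is wrong, which is exactly the kind of gap slice analysis is meant to expose.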

+16

skills/naming-and-structure-cleanup/SKILL.md

		---
		name: naming-and-structure-cleanup
		description: "Use to enforce consistent naming conventions and file structures across a project without changing business logic."
		---

		# Naming and Structure Cleanup

		Inconsistent naming (camelCase vs snake_case) and messy file structures make codebases hard to navigate.

		## Cleanup Rules

		1. Observe Local Conventions: Before renaming, scan the project to determine the dominant convention. If 80% of files use `camelCase`, enforce `camelCase`.
		2. Targeted Renames: Use the `safe-rename` command pattern to update variables, classes, or files. Ensure all imports are updated.
		3. File Co-location: Move files so that closely related logic is co-located (e.g., keeping `Button.tsx`, `Button.css`, and `Button.test.tsx` in the same directory).
		4. No Logic Changes: Do not refactor the internal logic of functions while performing naming cleanups. Keep the diff focused purely on structure and names.
		5. Verify: Run the project's type checker and test suite after every structural change.
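Rule 1 can be made measurable by counting identifier styles before any rename. The sketch below is one possible heuristic: the regular expressions, the `classify` and `dominant_convention` names, and the choice to ignore single-word names are all assumptions made for this example.

```python
import re
from collections import Counter
from typing import Iterable, Optional

CAMEL = re.compile(r"^[a-z]+(?:[A-Z][a-z0-9]*)+$")
SNAKE = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)+$")


def classify(name: str) -> str:
    """Label a name as camelCase, snake_case, or other (single words, etc.)."""
    if CAMEL.match(name):
        return "camelCase"
    if SNAKE.match(name):
        return "snake_case"
    return "other"


def dominant_convention(names: Iterable[str], threshold: float = 0.8) -> Optional[str]:
    """Return the convention used by at least `threshold` of classified names."""
    counts = Counter(classify(n) for n in names)
    classified = counts["camelCase"] + counts["snake_case"]
    if classified == 0:
        return None
    for style in ("camelCase", "snake_case"):
        if counts[style] / classified >= threshold:
            return style
    return None  # no clear winner, so ask before enforcing anything
```

Returning `None` when neither style reaches the threshold keeps the agent from enforcing a convention the project has not actually settled on.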

+20

skills/pre-release-verification/SKILL.md

		---
		name: pre-release-verification
		description: "Use before tagging a release or deploying to production to ensure all quality gates have passed."
		---

		# Pre-Release Verification

		Releases must be deterministic and verified. No "hope-driven" deployments.

		## Verification Checklist

		Before authorizing or participating in a release process, verify the following:

		1. Clean Working Tree: `git status` must be completely clean. No untracked files or uncommitted changes.
		2. Green CI: The latest commit on the main branch MUST have a passing CI pipeline.
		3. Lint & Types: Run the project's linter (`npm run lint`, `cargo clippy`, etc.) and type checker (`tsc --noEmit`). They must exit with 0.
		4. Test Gate: Run the full test suite locally if CI is not available or if requested.
		5. No Secrets: Ensure no API keys or credentials have been accidentally hardcoded or staged.

		If any check fails, the release is blocked. State the exact failure and stop.
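The exit-code gates in the checklist can be chained so that the first failure blocks everything after it. This is a minimal sketch; `run_gates` is a hypothetical helper, and the gate names and commands passed to it are supplied by the caller.

```python
import subprocess
import sys
from typing import List, Optional, Sequence, Tuple


def run_gates(gates: Sequence[Tuple[str, List[str]]]) -> Optional[str]:
    """Run each gate in order; return the first failing gate's name, or None."""
    for name, command in gates:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            # State the exact failure and stop; later gates are not run.
            print(f"BLOCKED by {name}: exit code {result.returncode}", file=sys.stderr)
            print(result.stderr, file=sys.stderr)
            return name
    return None
```

A caller might pass gates such as `("lint", ["npm", "run", "lint"])` and `("types", ["tsc", "--noEmit"])`. The clean-working-tree check does not fit this shape, because `git status` exits with 0 even when the tree is dirty; it needs its output inspected instead.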

+15

skills/readme-hardening/SKILL.md

		---
		name: readme-hardening
		description: "Ensure the project README provides immediate, exact commands for setup, testing, and deployment to help agents and humans bootstrap quickly."
		---

		# README Hardening

		A good README is an executable contract, not a marketing page. It must allow an agent or a new engineer to clone the repository and run tests within 3 minutes.

		## Hardening Protocol

		1. Verify Commands: Extract every shell command (`npm install`, `docker-compose up`, `cargo test`) from the README and run them in a clean environment. If they fail, fix the README.
		2. Remove Ambiguity: Replace "Install dependencies" with `npm ci`. Replace "Run the app" with `npm run start:dev`.
		3. Environment Checklist: Clearly list required environment variables in a `.env.example` block. Do not just say "set up your environment."
		4. Architecture Pointers: Provide exact file paths for entry points (e.g., "Main API routing is in `src/routes.ts`") to save agents from searching the entire tree.
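Step 1 starts with collecting the commands. The sketch below pulls them from fenced shell blocks only; the set of fence labels treated as shell, the handling of a leading `$ ` prompt, and the name `extract_commands` are assumptions for this example, and inline commands in prose are not covered.

```python
from typing import List

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown
SHELL_LANGS = {"", "bash", "sh", "shell", "console"}


def extract_commands(readme_text: str) -> List[str]:
    """Collect command lines from fenced shell blocks in a README."""
    commands: List[str] = []
    in_block = False
    keep = False
    for raw in readme_text.splitlines():
        line = raw.strip()
        if line.startswith(FENCE):
            if not in_block:
                in_block = True
                keep = line[len(FENCE):].strip().lower() in SHELL_LANGS
            else:
                in_block = False
            continue
        if in_block and keep:
            if line.startswith("$ "):
                line = line[2:]  # drop a copied shell prompt
            if line and not line.startswith("#"):
                commands.append(line)
    return commands
```

The returned list is only the input to verification; each command still has to be run in a clean environment, and the README corrected where it fails.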

+18

skills/regression-bisecting/SKILL.md

		---
		name: regression-bisecting
		description: "Use when a bug was recently introduced but you don't know which commit caused it."
		---

		# Regression Bisecting

		When a feature used to work but is now broken, do not guess what broke it. Use binary search through git history to find the exact commit.

		## Protocol

		1. Define the Test: You must have a single command that returns exit code `0` if good, and non-zero if bad. (e.g., `npm run test:repro` or `node repro.js`).
		2. Find a Known Good State: Ask the user or search git history for a commit where you are certain the feature worked.
		3. Find the Known Bad State: Typically `HEAD`.
		4. Bisect:
		   - For human workflows, guide the user to run `git bisect start <bad> <good>`.
		   - For agent workflows, manually check out the midpoint commit, run the test, and narrow the window.
		5. Analyze the Offending Commit: Once the exact commit is found, use `git show <commit>` to analyze the diff. The change that introduced the regression is in that diff, although the underlying defect may involve older code it interacts with.
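The manual narrowing in the agent workflow is a plain binary search over the commit range. The sketch below assumes a linear history with a single transition from good to bad; `first_bad_commit` and `is_bad` are hypothetical names, and `is_bad` stands for "check out this commit and run the test command".

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Find the first bad commit.

    `commits` is ordered oldest to newest, commits[0] is known good and
    commits[-1] is known bad, so neither endpoint is tested again.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # lo is always good, hi is always bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

The number of test runs grows with the logarithm of the range, so a window of about a thousand commits needs roughly ten runs. A flaky test breaks the invariant that `lo` is good and `hi` is bad, which is why step 1 insists on a reliable single command.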

+26

skills/reproducible-training-runs/SKILL.md

		---
		name: reproducible-training-runs
		description: Analyzes ML training scripts to enforce seed setting, deterministic operations, and environment tracking for exact reproducibility.
		---
		# Reproducible Training Runs

		Use this skill when reviewing or modifying ML training scripts to ensure they produce deterministic, reproducible results across runs.

		## Prerequisites

		- A target Python training script.

		## Instructions

		When applying this skill, check for and enforce the following reproducibility standards:

		1. Global Seed Initialization: Ensure a single function sets seeds for all relevant libraries (`random`, `numpy`, `torch`, `tensorflow`).
		2. Deterministic Algorithms: For PyTorch or TensorFlow, check if deterministic algorithms are enabled (e.g., `torch.use_deterministic_algorithms(True)`).
		3. Data Loading: Verify that data loaders use deterministic shuffling and that worker processes are seeded correctly to avoid identical augmentations.
		4. Environment & Config Tracking: Ensure that the script logs the exact configuration, dependency versions, and data hashes.
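A single seed function might look like the sketch below. It is a starting point under assumptions rather than a complete recipe: `set_global_seed` is a hypothetical name, NumPy and PyTorch are seeded only if they are installed, and TensorFlow is left out (it would need its own `tf.random.set_seed(seed)` call).

```python
import random


def set_global_seed(seed: int) -> None:
    """Seed every random number generator the training script relies on."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        # Operations with no deterministic implementation will raise at
        # run time once this is enabled, and training may be slower.
        torch.use_deterministic_algorithms(True)
    except ImportError:
        pass  # PyTorch is optional in this sketch
```

Calling the function once at the top of the entry point, and logging the seed with the rest of the configuration, keeps the side effect visible instead of scattering seed calls across modules.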

		## Safety & Style

		- Review First: Point out missing reproducibility guards before rewriting the script.
		- Keep it Explicit: Provide the exact snippet for seed initialization. Do not hide side effects.
		- Performance Trade-offs: Warn the user if enabling deterministic algorithms will significantly impact training speed.

+22

skills/risk-based-review/SKILL.md

		---
		name: risk-based-review
		description: "Use when reviewing code (or your own plan) to allocate attention based on the danger of the change."
		---

		# Risk-Based Review

		Not all code changes deserve the same level of scrutiny. A typo fix in a README is low risk; a change to the authentication middleware is critical.

		## Risk Categories

		1. Critical Risk (Auth, Payments, Cryptography, Database Migrations):
		   - Require 100% test coverage for the change.
		   - Require explicit human sign-off.
		   - Look for edge cases, null pointers, and race conditions.
		2. High Risk (Core Business Logic, Shared Utilities, Public API changes):
		   - Require unit and integration tests.
		   - Check for backwards compatibility and blast radius (see `change-impact-check`).
		3. Low Risk (UI tweaks, isolated components, internal tools):
		   - Focus on readability, naming conventions, and simple unit tests.

		When acting as a reviewer, explicitly state the Risk Category of the PR before providing feedback.
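A first-pass category can be guessed from the paths a change touches. The keyword lists below are hypothetical and would need tuning per repository; substring matching is crude (for example, `capital` contains `api`), so the result is a prompt for the reviewer, not a verdict.

```python
from typing import Iterable

# Hypothetical keyword lists; adjust them to the repository being reviewed.
CRITICAL_KEYWORDS = ("auth", "payment", "crypto", "migrations")
HIGH_KEYWORDS = ("core", "shared", "utils", "api")


def risk_category(changed_paths: Iterable[str]) -> str:
    """Return 'Critical', 'High', or 'Low' based on the riskiest path touched."""
    paths = [p.lower() for p in changed_paths]
    if any(k in p for p in paths for k in CRITICAL_KEYWORDS):
        return "Critical"
    if any(k in p for p in paths for k in HIGH_KEYWORDS):
        return "High"
    return "Low"
```

The highest category wins: a pull request that edits both a README and the authentication middleware is reviewed as Critical.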

+29

skills/semantic-layer-change-review/SKILL.md

		---
		name: semantic-layer-change-review
		description: "Use when modifying dbt metrics or semantic models to ensure mathematical correctness and backwards compatibility."
		---

		# Semantic Layer Change Review

		Changes to the semantic layer directly impact dashboards and business reporting. A silent drift in a metric definition destroys trust.

		## Review Protocol

		1. Identify the Change Type:
		   - Addition: Safe. (Adding a new metric or dimension).
		   - Deprecation: Requires communication. (Removing a metric).
		   - Modification: High Risk. (Changing the SQL expression, aggregation, or filters of an existing metric).

		2. Evaluate Mathematical Soundness:
		   - Are we averaging an average?
		   - Are we summing a distinct count?
		   - Does adding this dimension cause a fan-out that inflates the metric?

		3. Check Backwards Compatibility:
		   - If an existing metric's logic is changed, you MUST flag it. The recommended path is to use dbt's metric versioning or create a new metric (e.g., `revenue_v2`) rather than silently altering historical numbers.

		4. Verify Entity Mapping:
		   - Ensure `entities` (primary/foreign keys) match the granularity of the underlying semantic model.
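The "averaging an average" check is easiest to see with numbers. The data below is invented for illustration: one region has four orders and the other has one.

```python
from statistics import mean

# Illustrative order values per region; group sizes differ on purpose.
orders_by_region = {
    "north": [100.0, 100.0, 100.0, 100.0],
    "south": [20.0],
}

regional_averages = {region: mean(values) for region, values in orders_by_region.items()}

# Wrong: every region gets equal weight regardless of how many orders it has.
average_of_averages = mean(regional_averages.values())

# Right: average over the underlying rows.
true_average = mean(v for values in orders_by_region.values() for v in values)

print(average_of_averages)  # 60.0
print(true_average)         # 84.0
```

The two results differ by 24 even though the inputs are identical, so a metric that switches from one form to the other silently restates history.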

		## Anti-Pattern
		Do not approve a pull request that changes the `expr` of a core metric without explicitly confirming the business requested the restatement of historical data.

+50

skills/strategic-context-compaction/SKILL.md

 ---
 name: strategic-context-compaction
 description: Compact context at logical phase boundaries — after research, after planning, after debugging — rather than mid-task. Preserves useful state while clearing noise.
 ---

 # Strategic Context Compaction

 Compact at logical boundaries to preserve high-value context while clearing noise. Arbitrary or mid-task compaction loses critical state.

 ## When to Compact

 | Transition | Compact? | Reason |
 |-----------|----------|--------|
 | Research → Planning | **Yes** | Research context is bulky; the plan is the distilled output |
 | Planning → Implementation | **Yes** | Plan is saved in tasks/files; context is free to reset |
 | Implementation → Testing | **Maybe** | Keep if tests reference recent code; compact if switching focus area |
 | Debugging → Next feature | **Yes** | Debug traces pollute unrelated work |
 | Mid-implementation | **No** | Losing file paths, variable names, partial state is costly |
 | After a failed approach | **Yes** | Clear dead-end reasoning before trying a new approach |

 ## Before Compacting

 Save anything you cannot reconstruct cheaply:
 - Write the plan to a task list or file before compacting after research
 - Commit or stash work-in-progress code before compacting after debugging
 - Note key file paths in the next prompt if they will be needed again

 ## What Survives Compaction

 | Survives | Lost |
 |----------|------|
 | CLAUDE.md / AGENTS.md instructions | Intermediate reasoning |
 | Task list (TodoWrite) | File contents read in session |
 | Files on disk | Tool call history |
 | Git state | Verbally stated preferences |
 | Memory files | Multi-step conversation context |

 ## Compaction Discipline

 - Do not compact to "clean up" during active multi-file implementation
 - Do compact when starting a conceptually distinct task in the same session
 - Use a summary prompt with `/compact`: `/compact — now implementing auth middleware per plan`
 - After compaction, re-read the task list or plan file to restore intent

 ## Token Awareness

 - Each loaded skill adds 1–5K tokens to context
 - Load skills on demand, not at session start
 - CLAUDE.md / AGENTS.md are always loaded; keep them lean
 - Duplicate instructions (root config + plugin skill) are the most common waste

+18

skills/task-intake/SKILL.md

		---
		name: task-intake
		description: "Use at the beginning of a new task. Ensures you fully understand the requirements, boundaries, and acceptance criteria before writing code."
		---

		# Task Intake Protocol

		Never start implementing blindly. When you receive a new task, you must force clarification of boundaries and expected outcomes.

		## Intake Checklist

		1. What is the goal? Summarize the user's request in your own words.
		2. What is out of scope? Identify what you are not going to do. If the user asked to fix a button, do not refactor the routing layer.
		3. How will we test it? Define the validation criteria. Will it be a unit test, a manual UI check, or a curl command?
		4. What context is missing? Ask the user for specific files, logs, or environment details if the request is too vague.

		## Anti-Pattern: The Blind Start
		Do not say "I will now fix the bug." and immediately edit files. Instead, use a repo-map or grep to confirm the files exist, then state your understanding of the problem. If the user's instruction is ambiguous, explicitly pause and ask them a clarifying question.

+18

skills/test-preserving-refactor/SKILL.md

		---
		name: test-preserving-refactor
		description: "Use to restructure code while guaranteeing that all existing tests continue to pass."
		---

		# Test-Preserving Refactor

		Refactoring is only safe if it is backed by tests.

		## The Protocol

		1. Run Tests First: Before touching any code, run the tests covering the target area. They MUST be green. If they are red, stop and fix the tests (or the code) first.
		2. Small Steps: Make one structural change at a time (e.g., extract a method).
		3. Run Tests Immediately: Run the tests immediately after the single structural change.
		4. Revert on Red: If the tests fail, you made a mistake. Revert the change (`git checkout` or `ctrl+z`) and try a different approach. Do not attempt to "fix" the refactor while tests are failing.
		5. Commit: Once the small change is green, consider it a safe checkpoint.

		This strict green-to-green cycle (tests pass before the change and again after it) prevents you from getting trapped in a broken, hard-to-recover state.

+22

skills/training-pipeline-debugging/SKILL.md

		---
		name: training-pipeline-debugging
		description: "Diagnose NaN losses, out-of-memory errors, and shape mismatches in deep learning or ML pipelines."
		---

		# Training Pipeline Debugging

		ML training bugs are often silent mathematical errors rather than explicit code crashes.

		## Debugging Protocol

		1. NaN Losses: If loss goes to NaN, check:
		   - Learning rate too high?
		   - Missing data (NaNs in input)?
		   - Log of zero, overflowing exp, or division by zero in custom loss functions?
		   - Exploding gradients (clip gradients)?
		2. OOM (Out of Memory):
		   - Reduce batch size.
		   - Check for memory leaks in the training loop (e.g., accumulating history across epochs without `.detach()`).
		3. Shape Mismatches:
		   - Add temporary print statements or assertions that check `tensor.shape` before matrix multiplications or loss calculations.
		4. The Overfit Test: The ultimate test of a pipeline is fitting a single batch. If the model cannot achieve near 0 loss on a single batch of 10 examples, the pipeline is fundamentally broken. Do not debug full runs until the single-batch test passes.
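The overfit test can be demonstrated with a toy model in plain Python. This sketch stands in for a real framework: the linear model, learning rate, step count, and the name `overfit_single_batch` are assumptions chosen so the example runs without dependencies.

```python
from typing import List, Tuple


def overfit_single_batch(batch: List[Tuple[float, float]],
                         lr: float = 0.5, steps: int = 2000) -> float:
    """Fit y = w * x + b to one batch with gradient descent; return the final MSE."""
    w, b = 0.0, 0.0
    n = len(batch)
    for _ in range(steps):
        # Gradients of mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return sum((w * x + b - y) ** 2 for x, y in batch) / n


# One batch of 10 examples drawn from y = 2x + 1, so zero loss is reachable.
batch = [(i / 10, 2.0 * (i / 10) + 1.0) for i in range(10)]
final_loss = overfit_single_batch(batch)
assert final_loss < 1e-6, "cannot memorize one batch; fix the pipeline before full runs"
```

In a real pipeline the same idea applies with the actual model and data loader: reuse one fixed batch for every step, disable augmentation and regularization, and expect the loss to approach zero.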

+77

skills/tri-model-review/SKILL.md

		---
		name: tri-model-review
		description: Multi-model orchestration — route to two external advisors, then synthesize
		level: 5
		---

		# Tri-Model Review

		Tri-model review routes through two external advisor CLIs, then synthesizes both outputs into one answer.

		Use this when you want parallel external perspectives.

		## When to Use

		- Backend/analysis + frontend/UI work in one request
		- Code review from multiple perspectives (architecture + design/UX)
		- Cross-validation where different models may disagree
		- Fast advisor-style parallel input without full team runtime orchestration

		## Requirements

		- Ensure you have configured the appropriate `apx ask-*` wrappers.
		- If either wrapper is unavailable, continue with whichever provider is available and note the limitation.

		## How It Works

		```text
		1. Decompose the request into two advisor prompts:
		- Analysis/architecture/backend prompt
		- UX/design/docs/alternatives prompt

		2. Run both advisors via the canonical wrappers:
		- apx ask-codex "<prompt>"
		- apx ask-gemini "<prompt>"

		3. Synthesize both outputs into one final response
		```

		## Execution Protocol

		When invoked, follow this workflow:

		### 1. Decompose Request
		Split the user request into:

		- Architecture prompt: correctness, backend, risks, test strategy
		- UX prompt: content clarity, alternatives, edge-case usability, docs polish
		- Synthesis plan: how to reconcile conflicts

		### 2. Invoke advisors via Bash

		Run both advisors via the Bash tool:

		```bash
		apx ask-codex "<architecture prompt>"
		apx ask-gemini "<UX prompt>"
		```

		### 3. Synthesize

		Return one unified answer with:

		- Agreed recommendations
		- Conflicting recommendations (explicitly called out)
		- Chosen final direction + rationale
		- Action checklist

		## Fallbacks

		If one provider is unavailable:

		- Continue with available provider + synthesis
		- Clearly note missing perspective and risk

		If both unavailable:

		- Fall back to a single-model answer and state external advisors were unavailable.

+1

-1

package.json

		{
		"name": "agent-powerups",
		"version": "0.5.0",
		"version": "0.5.1",
		"description": "Local-first CLI for browsing, validating, running, and explicitly writing agent powerups.",
		@@ -5,0 +5,0 @@ "license": "Apache-2.0",

+42

-2

README.md

		@@ -252,5 +252,10 @@ <p align="center">
		- `agent-harness-design`
		- `agent-readable-docs`
		- `agent-runtime-patterns`
		- `agent-session-forensics`
		- `ai-regression-testing`
		- `ai-slop-cleaner`
		- `api-doc-review`
		- `architecture-decision-records`
		- `architecture-simplification`
		- `ask-claude`
		@@ -260,2 +265,3 @@ - `ask-codex`
		- `autonomous-delivery-pipeline`
		- `baseline-comparison-review`
		- `bigquery-cost-audit`
		@@ -266,27 +272,51 @@ - `brainstorming`
		- `build-fix-minimal-diff`
		- `canonical-advisor-routing`
		- `change-impact-check`
		- `changelog-generator`
		- `ci-failure-readout`
		- `codebase-migration-batches`
		- `context-compression`
		- `context-docs`
		- `context-minimization`
		- `context-retrieval-loop`
		- `data-quality`
		- `dataset-split-review`
		- `dbt-incremental-strategy-audit`
		- `dbt-preflight`
		- `dbt-strategy`
		- `dead-code-removal`
		- `defuddle`
		- `dependency-cleanup`
		- `deploy-pipeline-runbook`
		- `dispatching-parallel-agents`
		- `doc-consistency-check`
		- `environment-doctor`
		- `experiment-tracking-review`
		- `failure-triage`
		- `filesystem-mcp-guardrails`
		- `finishing-a-development-branch`
		- `flaky-test-investigation`
		- `gh-address-comments`
		- `github-ci-failure-triage`
		- `graphify`
		- `handoff-discipline`
		- `handoff-documentation`
		- `hard-won-skill-extractor`
		- `incident-readout`
		- `incremental-migration`
		- `json-canvas`
		- `local-rag-mcp`
		- `log-driven-diagnosis`
		- `managed-codebase-context`
		- `markitdown-file-intake`
		- `mcp-server-builder`
		- `memory-build-workflow`
		- `memory-optimization-workflow`
		- `memory-query-workflow`
		- `metric-impact-analyzer`
		- `minimal-reproduction`
		- `ml-leakage-check`
		- `model-evaluation-reporting`
		- `model-routing`
		- `naming-and-structure-cleanup`
		- `no-fluff`
		@@ -297,5 +327,8 @@ - `parallel-execution-engine`
		- `pr-triage`
		- `pre-release-verification`
		- `prompt-evaluation-runner`
		- `readme-hardening`
		- `receiving-code-review`
		- `red-team-eval-authoring`
		- `regression-bisecting`
		- `relay-claude`
		@@ -306,14 +339,22 @@ - `relay-codex`
		- `repo-map`
		- `reproducible-training-runs`
		- `requesting-code-review`
		- `requirements-clarifier`
		- `review-comment-style-mining`
		- `risk-based-review`
		- `safe-refactor`
		- `search-before-building`
		- `semantic-layer-change-review`
		- `skill-authoring-guide`
		- `skill-evaluation-workbench`
		- `sql-business-logic-review`
		- `strategic-context-compaction`
		- `structured-code-search-mcp`
		- `subagent-team-orchestration`
		- `systematic-debugging`
		- `task-intake`
		- `test-driven-development`
		- `test-preserving-refactor`
		- `training-pipeline-debugging`
		- `tri-model-review`
		- `using-git-worktrees`
		@@ -323,3 +364,2 @@ - `using-powerups`
		- `webapp-visual-testing`
		- `worktree-session-manager`
		- `writing-plans`
		@@ -613,2 +653,2 @@ - `writing-skills`

		Roadmap: [`docs/roadmap.md`](./docs/roadmap.md)
		Roadmap: [`roadmap.md`](./docs/roadmap.md)

catalog.json

Sorry, the diff of this file is too big to display

		---
		name: flaky-test-investigation
		description: "Use to diagnose tests that pass and fail intermittently without code changes."
		---

		# Flaky Test Investigation

		Flaky tests erode trust in CI. Do not just re-run them and hope for the best.

		## Investigation Protocol

		1. Isolate the Test: Run the specific failing test by itself. If it passes, the flake is likely an order dependency or state leakage from a previous test.
		2. Stress Test: Run the test in a tight loop (e.g., `for i in {1..100}; do npm test -- -t "My Test"; done`).
		3. Check for Common Vectors:
		   - Time: Does the test rely on `Date.now()` or `setTimeout`? Mock the clock.
		   - Async/Promises: Are we asserting before a background task finishes? Ensure proper `await` or `waitFor` usage.
		   - Shared State: Are we reusing database records, global singletons, or mutated variables between runs? Ensure clean teardowns in `afterEach`.
		   - Randomness: Does the test rely on random IDs or sorts? Force deterministic seeds or sort orders.
		4. Prove the Fix: Do not just guess. The fix must be verified by running the stress test loop again and achieving a 100% pass rate.
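The stress loop and the "prove the fix" step can be expressed as a pass-rate measurement. The sketch below is illustrative: `pass_rate`, `flaky_test`, and `fixed_test` are hypothetical names, and the flake is simulated with an unseeded shuffle to mimic an order-dependent assertion.

```python
import random
from typing import Callable


def pass_rate(test: Callable[[], None], runs: int = 100) -> float:
    """Run a test repeatedly and return the fraction of runs that passed."""
    passed = 0
    for _ in range(runs):
        try:
            test()
            passed += 1
        except AssertionError:
            pass
    return passed / runs


def flaky_test() -> None:
    # Depends on arbitrary ordering: the first id changes between runs.
    ids = [3, 1, 2]
    random.shuffle(ids)
    assert ids[0] == 1


def fixed_test() -> None:
    # Sorting before asserting removes the dependence on ordering.
    ids = [3, 1, 2]
    random.shuffle(ids)
    assert sorted(ids)[0] == 1
```

A fix counts as proven only when the stress loop reports a pass rate of exactly 1.0; anything lower means the flake is still present.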
