devlyn-cli
+3
-1
@@ -59,6 +59,8 @@ # Project Instructions | ||
| This runs the full pipeline automatically: **Build → Build Gate → Browser Validate → Evaluate → Fix Loop → Simplify → Review → Security Review → Clean → Docs**. Each phase runs as a separate subagent with its own context. Communication between phases happens via files (`.devlyn/done-criteria.md`, `.devlyn/BUILD-GATE.md`, `.devlyn/EVAL-FINDINGS.md`, `.devlyn/BROWSER-RESULTS.md`). | ||
| This runs the full pipeline automatically: **Build → Build Gate → Browser Validate → Evaluate → Fix Loop → Simplify → Review → Challenge → Security Review → Clean → Docs**. Each phase runs as a separate subagent with its own context. Communication between phases happens via files (`.devlyn/done-criteria.md`, `.devlyn/BUILD-GATE.md`, `.devlyn/EVAL-FINDINGS.md`, `.devlyn/BROWSER-RESULTS.md`, `.devlyn/CHALLENGE-FINDINGS.md`). | ||
| The **Build Gate** (Phase 1.4) runs real compilers, typecheckers, and linters — the same commands CI/Docker/production will run. It auto-detects project types (Next.js, Rust, Go, Solidity, Expo, Swift, etc.) and Dockerfiles. This is the primary defense against "tests pass locally, breaks in CI/Docker" class of bugs (type errors in un-tested files, cross-package drift, Dockerfile copy mismatches). | ||
| The **Challenge** phase (Phase 4.5) is a fresh skeptical review with no checklist — a subagent reads the entire diff cold with zero context from prior phases and asks "would I ship this to production with my name on it?" This catches the subtle issues that structured checklist-driven reviews miss: wrong-but-working approaches, unstated assumptions, non-idiomatic patterns, and integration gaps. | ||
| For web projects, the Browser Validate phase starts the dev server and tests the implemented feature in a real browser — clicking buttons, filling forms, verifying results. If the feature doesn't work, findings feed back into the fix loop. | ||
@@ -65,0 +67,0 @@ |
@@ -48,3 +48,3 @@ --- | ||
| Task: [extracted task description] | ||
| Phases: Build → Build Gate → [Browser] → Evaluate → [Fix loop if needed] → Simplify → [Review] → [Security] → [Clean] → [Docs] | ||
| Phases: Build → Build Gate → [Browser] → Evaluate → [Fix loop if needed] → Simplify → [Review] → Challenge → [Security] → [Clean] → [Docs] | ||
| Max evaluation rounds: [N] | ||
@@ -282,2 +282,51 @@ Cross-model evaluation (Codex): [evaluate / review / both / disabled] | ||
| ## PHASE 4.5: CHALLENGE | ||
| Every prior phase used checklists, done-criteria, or structured categories. This phase is deliberately different: it's a fresh pair of eyes with no checklist, no prior context, and a skeptical mandate. The subagent hasn't seen the done-criteria, the eval findings, or the review results. It reads the raw diff cold and asks: "would I ship this?" | ||
| This is what catches the things structured reviews miss — subtle logic that technically works but isn't the right approach, assumptions nobody questioned, patterns that are fine but not best-practice, and integration seams that look correct in isolation but feel wrong when you read the whole changeset. | ||
| Spawn a subagent using the Agent tool with `mode: "bypassPermissions"`. | ||
| Agent prompt — pass this to the Agent tool: | ||
| You are a senior engineer doing a final skeptical review before this code ships to production. You have NOT seen any prior reviews, test results, or design docs — read the code cold. | ||
| Run `git diff main` to see all changes. Read every changed file in full (not just the diff hunks — you need surrounding context). | ||
| Your job is NOT to check boxes. Your job is to find the things that would make a staff engineer say "hold on, let's talk about this before we ship." Think about: | ||
| - Would this approach survive a 10x traffic spike? A midnight oncall page? A junior dev maintaining it 6 months from now? | ||
| - Are there assumptions baked in that nobody stated out loud? Hardcoded limits, implicit ordering, missing edge cases in business logic? | ||
| - Is the error handling actually helpful, or does it just prevent crashes while leaving the user confused? | ||
| - Are there simpler, more idiomatic ways to do what this code does? Not "clever" alternatives — genuinely better approaches? | ||
| - Would you confidently approve this PR, or would you leave comments? | ||
| Be brutally honest. Do NOT start with praise. Do NOT soften findings. Every finding must include `file:line` and a concrete fix — not "consider improving" but "change X to Y because Z." | ||
| Write `.devlyn/CHALLENGE-FINDINGS.md`: | ||
| ``` | ||
| # Challenge Findings | ||
| ## Verdict: [PASS / NEEDS WORK] | ||
| ## Findings | ||
| ### [severity: CRITICAL / HIGH / MEDIUM] | ||
| - `file:line` — what's wrong — Fix: concrete change | ||
| ``` | ||
| Verdict: PASS only if you would confidently ship this code with your name on it. If you found anything CRITICAL or HIGH, verdict is NEEDS WORK. | ||
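The mechanical half of this verdict rule can be sketched in Python. This is an illustration only, not part of devlyn-cli: `derive_verdict` and the heading pattern are hypothetical, and the sketch assumes severity headings are written as `### HIGH` or `### [severity: HIGH]`, which the template leaves open. The reviewer's own judgment still decides whether a diff with no blocking findings really passes.

```python
import re

# Assumption: severity headings look like "### HIGH" or "### [severity: HIGH]".
# An unfilled template heading ("### [severity: CRITICAL / HIGH / MEDIUM]")
# deliberately does not match.
SEVERITY_HEADING = re.compile(
    r"^###\s*\[?\s*(?:severity:\s*)?(CRITICAL|HIGH|MEDIUM)\s*\]?\s*$",
    re.MULTILINE,
)

# Severities that force a NEEDS WORK verdict, per the rule stated above.
BLOCKING = {"CRITICAL", "HIGH"}


def derive_verdict(findings_markdown: str) -> str:
    """Return "NEEDS WORK" if any CRITICAL or HIGH heading is present, else "PASS"."""
    severities = set(SEVERITY_HEADING.findall(findings_markdown))
    return "NEEDS WORK" if severities & BLOCKING else "PASS"
```

A findings file containing only MEDIUM headings yields `PASS` under this sketch, which matches the stated rule but is a lower bound: the prompt also asks the reviewer to pass only code they would put their name on.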
| **After the agent completes**: | ||
| 1. Read `.devlyn/CHALLENGE-FINDINGS.md` | ||
| 2. Extract the verdict | ||
| 3. Branch: | ||
| - `PASS` → continue to PHASE 5 | ||
| - `NEEDS WORK` → spawn a fix subagent with `mode: "bypassPermissions"`: | ||
| Read `.devlyn/CHALLENGE-FINDINGS.md` — it contains findings from a fresh skeptical review. Fix every CRITICAL and HIGH finding at the root cause. For MEDIUM findings, fix if straightforward. After fixing, run the test suite to verify nothing broke. | ||
| After the fix agent completes: | ||
| 1. **Checkpoint**: Run `git add -A && git commit -m "chore(pipeline): challenge fixes complete"` | ||
| 2. Continue to PHASE 5 (do NOT re-run the challenge — one pass is sufficient to avoid infinite loops) | ||
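The read, extract, and branch steps above can be sketched as a small Python helper. This is a hedged illustration: in the skill itself the orchestrating agent performs these steps from the prompt, and the names `extract_verdict`, `next_steps`, and `plan_from_file`, as well as the step labels, are hypothetical. It assumes the verdict line is filled in as `## Verdict: PASS` or `## Verdict: NEEDS WORK` (optionally in brackets).

```python
import re
from pathlib import Path

# Assumption: the agent replaces the template placeholder with a single verdict,
# e.g. "## Verdict: PASS" or "## Verdict: [NEEDS WORK]".
VERDICT_LINE = re.compile(
    r"^##\s*Verdict:\s*\[?\s*(PASS|NEEDS WORK)\s*\]?\s*$",
    re.MULTILINE,
)


def extract_verdict(findings_markdown: str) -> str:
    """Pull the verdict out of the findings file text."""
    match = VERDICT_LINE.search(findings_markdown)
    if match is None:
        # Covers a missing line and an unfilled "[PASS / NEEDS WORK]" placeholder.
        raise ValueError("no filled-in verdict line found")
    return match.group(1)


def next_steps(verdict: str) -> list[str]:
    """Map a verdict to the ordered steps described above (labels are illustrative)."""
    if verdict == "PASS":
        return ["phase-5"]
    # NEEDS WORK: one fix pass, a checkpoint commit, then move on.
    # The challenge is not re-run, so the loop cannot repeat.
    return ["fix-agent", "checkpoint-commit", "phase-5"]


def plan_from_file(path: str = ".devlyn/CHALLENGE-FINDINGS.md") -> list[str]:
    """Read the findings file and return the steps to run next."""
    return next_steps(extract_verdict(Path(path).read_text(encoding="utf-8")))
```

Raising on an unparseable verdict, rather than defaulting to `PASS`, is a design choice of this sketch so that a malformed findings file cannot silently skip the fix pass.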
| ## PHASE 5: SECURITY REVIEW (conditional) | ||
@@ -348,3 +397,3 @@ | ||
| 1. Clean up temporary files: | ||
| - Delete the `.devlyn/` directory entirely (contains done-criteria.md, BUILD-GATE.md, EVAL-FINDINGS.md, BROWSER-RESULTS.md, screenshots/, playwright temp files) | ||
| - Delete the `.devlyn/` directory entirely (contains done-criteria.md, BUILD-GATE.md, EVAL-FINDINGS.md, BROWSER-RESULTS.md, CHALLENGE-FINDINGS.md, screenshots/, playwright temp files) | ||
| - Kill any dev server process still running from browser validation | ||
@@ -373,2 +422,3 @@ | ||
| | Review (Codex) | [completed / skipped] | [Codex-only findings, agreed findings] | | ||
| | Challenge | [PASS / NEEDS WORK] | [findings count, fixes applied] | | ||
| | Security review | [completed / skipped / auto-skipped] | [findings or "no security-sensitive changes"] | | ||
@@ -375,0 +425,0 @@ | Clean | [completed / skipped] | [items cleaned] | |
+1
-1
| { | ||
| "name": "devlyn-cli", | ||
| "version": "1.9.1", | ||
| "version": "1.10.0", | ||
| "description": "AI development toolkit for Claude Code — ideate, auto-resolve, and ship with context engineering and agent orchestration", | ||
@@ -5,0 +5,0 @@ "homepage": "https://github.com/fysoul17/devlyn-cli#readme", |
Shell access
Supply chain risk: This module accesses the system shell. Accessing the system shell increases the risk of executing arbitrary code.
Found 1 instance in 1 package
Filesystem access
Supply chain risk: Accesses the file system, and could potentially read sensitive data.
Found 1 instance in 1 package