@@ -59,6 +59,8 @@ # Project Instructions

		This runs the full pipeline automatically: Build → Build Gate → Browser Validate → Evaluate → Fix Loop → Simplify → Review → Security Review → Clean → Docs. Each phase runs as a separate subagent with its own context. Communication between phases happens via files (`.devlyn/done-criteria.md`, `.devlyn/BUILD-GATE.md`, `.devlyn/EVAL-FINDINGS.md`, `.devlyn/BROWSER-RESULTS.md`).
		This runs the full pipeline automatically: Build → Build Gate → Browser Validate → Evaluate → Fix Loop → Simplify → Review → Challenge → Security Review → Clean → Docs. Each phase runs as a separate subagent with its own context. Communication between phases happens via files (`.devlyn/done-criteria.md`, `.devlyn/BUILD-GATE.md`, `.devlyn/EVAL-FINDINGS.md`, `.devlyn/BROWSER-RESULTS.md`, `.devlyn/CHALLENGE-FINDINGS.md`).

		The Build Gate (Phase 1.4) runs real compilers, typecheckers, and linters: the same commands CI/Docker/production will run. It auto-detects project types (Next.js, Rust, Go, Solidity, Expo, Swift, etc.) and Dockerfiles. This is the primary defense against the "tests pass locally, breaks in CI/Docker" class of bugs (type errors in untested files, cross-package drift, Dockerfile copy mismatches).

		The Challenge phase (Phase 4.5) is a fresh skeptical review with no checklist — a subagent reads the entire diff cold with zero context from prior phases and asks "would I ship this to production with my name on it?" This catches the subtle issues that structured checklist-driven reviews miss: wrong-but-working approaches, unstated assumptions, non-idiomatic patterns, and integration gaps.

		For web projects, the Browser Validate phase starts the dev server and tests the implemented feature in a real browser — clicking buttons, filling forms, verifying results. If the feature doesn't work, findings feed back into the fix loop.
		@@ -65,0 +67,0 @@

config/skills/devlyn:auto-resolve/SKILL.md (+52, -2)

		@@ -48,3 +48,3 @@ ---
		Task: [extracted task description]
		Phases: Build → Build Gate → [Browser] → Evaluate → [Fix loop if needed] → Simplify → [Review] → [Security] → [Clean] → [Docs]
		Phases: Build → Build Gate → [Browser] → Evaluate → [Fix loop if needed] → Simplify → [Review] → Challenge → [Security] → [Clean] → [Docs]
		Max evaluation rounds: [N]
		@@ -282,2 +282,51 @@ Cross-model evaluation (Codex): [evaluate / review / both / disabled]

		## PHASE 4.5: CHALLENGE

		Every prior phase used checklists, done-criteria, or structured categories. This phase is deliberately different: it's a fresh pair of eyes with no checklist, no prior context, and a skeptical mandate. The subagent hasn't seen the done-criteria, the eval findings, or the review results. It reads the raw diff cold and asks: "would I ship this?"

		This is what catches the things structured reviews miss — subtle logic that technically works but isn't the right approach, assumptions nobody questioned, patterns that are fine but not best-practice, and integration seams that look correct in isolation but feel wrong when you read the whole changeset.

		Spawn a subagent using the Agent tool with `mode: "bypassPermissions"`.

		Agent prompt — pass this to the Agent tool:

		You are a senior engineer doing a final skeptical review before this code ships to production. You have NOT seen any prior reviews, test results, or design docs — read the code cold.

		Run `git diff main` to see all changes. Read every changed file in full (not just the diff hunks — you need surrounding context).

		Your job is NOT to check boxes. Your job is to find the things that would make a staff engineer say "hold on, let's talk about this before we ship." Think about:

		- Would this approach survive a 10x traffic spike? A midnight oncall page? A junior dev maintaining it 6 months from now?
		- Are there assumptions baked in that nobody stated out loud? Hardcoded limits, implicit ordering, missing edge cases in business logic?
		- Is the error handling actually helpful, or does it just prevent crashes while leaving the user confused?
		- Are there simpler, more idiomatic ways to do what this code does? Not "clever" alternatives — genuinely better approaches?
		- Would you confidently approve this PR, or would you leave comments?

		Be brutally honest. Do NOT start with praise. Do NOT soften findings. Every finding must include `file:line` and a concrete fix — not "consider improving" but "change X to Y because Z."

		Write `.devlyn/CHALLENGE-FINDINGS.md`:

		```
		# Challenge Findings
		## Verdict: [PASS / NEEDS WORK]
		## Findings
		### [severity: CRITICAL / HIGH / MEDIUM]
		- `file:line` — what's wrong — Fix: concrete change
		```

		Verdict: PASS only if you would confidently ship this code with your name on it. If you found anything CRITICAL or HIGH, the verdict is NEEDS WORK.
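The verdict rule above can be checked mechanically against the findings file. The following is a minimal sketch, not part of the package: `Finding`, `parse_findings`, and `expected_verdict` are hypothetical names, and it assumes the subagent writes one `### ...` heading per severity followed by bullets that start with a backticked `file:line`.

```python
import re
from dataclasses import dataclass

# Severities that force a NEEDS WORK verdict, per the rule in the prompt.
BLOCKING = {"CRITICAL", "HIGH"}


@dataclass
class Finding:
    severity: str  # CRITICAL, HIGH, or MEDIUM
    location: str  # the backticked `file:line` part of the bullet
    text: str      # the rest of the bullet (problem and fix)


def parse_findings(markdown: str) -> list[Finding]:
    """Collect bullets under each severity heading (assumed layout)."""
    findings = []
    severity = None
    for raw in markdown.splitlines():
        line = raw.strip()
        # Accepts both "### HIGH" and "### [severity: HIGH]".
        heading = re.match(
            r"^###\s*\[?(?:severity:\s*)?(CRITICAL|HIGH|MEDIUM)\]?", line
        )
        if heading:
            severity = heading.group(1)
            continue
        bullet = re.match(r"^-\s*`([^`]+)`\s*(.*)$", line)
        if bullet and severity:
            findings.append(Finding(severity, bullet.group(1), bullet.group(2)))
    return findings


def expected_verdict(findings: list[Finding]) -> str:
    """NEEDS WORK if any finding is CRITICAL or HIGH, otherwise PASS."""
    if any(f.severity in BLOCKING for f in findings):
        return "NEEDS WORK"
    return "PASS"
```

Comparing `expected_verdict(parse_findings(text))` with the verdict the subagent actually wrote is one way to detect a findings file whose verdict contradicts its own severities.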

		After the agent completes:
		1. Read `.devlyn/CHALLENGE-FINDINGS.md`
		2. Extract the verdict
		3. Branch:
		   - `PASS` → continue to PHASE 5
		   - `NEEDS WORK` → spawn a fix subagent with `mode: "bypassPermissions"`:

		Read `.devlyn/CHALLENGE-FINDINGS.md` — it contains findings from a fresh skeptical review. Fix every CRITICAL and HIGH finding at the root cause. For MEDIUM findings, fix if straightforward. After fixing, run the test suite to verify nothing broke.

		After the fix agent completes:
		1. Checkpoint: Run `git add -A && git commit -m "chore(pipeline): challenge fixes complete"`
		2. Continue to PHASE 5 (do NOT re-run the challenge; it is limited to a single pass so the pipeline cannot loop indefinitely)
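The branching described in this phase can be sketched as a small pure function. This is an illustration under stated assumptions, not the skill's actual implementation: `extract_verdict`, `plan_after_challenge`, and the step labels are hypothetical, and the verdict line is assumed to read `## Verdict: PASS` or `## Verdict: NEEDS WORK`.

```python
import re


def extract_verdict(markdown: str) -> str:
    """Return the verdict from a findings file, or raise if none is present."""
    match = re.search(
        r"^\s*##\s*Verdict:\s*(PASS|NEEDS WORK)\b", markdown, re.MULTILINE
    )
    if not match:
        # An unfilled template or a missing line is treated as an error
        # rather than silently passing.
        raise ValueError("no verdict line found in challenge findings")
    return match.group(1)


def plan_after_challenge(markdown: str) -> list[str]:
    """List the steps that follow the challenge, in order.

    The challenge itself never appears in the result, which is how the
    single-pass rule is expressed here.
    """
    if extract_verdict(markdown) == "PASS":
        return ["phase-5"]
    # NEEDS WORK: run the fix subagent once, checkpoint, then move on.
    return ["fix", "checkpoint", "phase-5"]
```

Keeping the decision in a function that only looks at the file contents mirrors the pipeline's convention that phases communicate through files instead of shared context.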

		## PHASE 5: SECURITY REVIEW (conditional)
		@@ -348,3 +397,3 @@
		1. Clean up temporary files:
		- Delete the `.devlyn/` directory entirely (contains done-criteria.md, BUILD-GATE.md, EVAL-FINDINGS.md, BROWSER-RESULTS.md, screenshots/, playwright temp files)
		- Delete the `.devlyn/` directory entirely (contains done-criteria.md, BUILD-GATE.md, EVAL-FINDINGS.md, BROWSER-RESULTS.md, CHALLENGE-FINDINGS.md, screenshots/, playwright temp files)
		- Kill any dev server process still running from browser validation
		@@ -373,2 +422,3 @@
		| Review (Codex) | [completed / skipped] | [Codex-only findings, agreed findings] |
		| Challenge | [PASS / NEEDS WORK] | [findings count, fixes applied] |
		| Security review | [completed / skipped / auto-skipped] | [findings or "no security-sensitive changes"] |
		@@ -375,0 +425,0 @@ | Clean | [completed / skipped] | [items cleaned] |

package.json (+1, -1)

		{
		"name": "devlyn-cli",
		"version": "1.9.1",
		"version": "1.10.0",
		"description": "AI development toolkit for Claude Code — ideate, auto-resolve, and ship with context engineering and agent orchestration",
		@@ -5,0 +5,0 @@ "homepage": "https://github.com/fysoul17/devlyn-cli#readme",

