A modular runtime and orchestration system for AI agents - works with Claude Code, OpenCode, and Codex CLI
19 plugins · 47 agents · 40 skills (across all repos) · 30k lines of lib code · 3,583 tests · 5 platforms
Plugins distributed as standalone repos under agent-sh org — agentsys is the marketplace & installer
Commands · Installation · Website · Discussions
Built for Claude Code · Codex CLI · OpenCode · Cursor · Kiro
New skills, agents, and integrations ship constantly. Follow for real-time updates:
AI models can write code. That's not the hard part anymore. The hard part is everything around it — task selection, branch management, code review, artifact cleanup, CI, PR comments, deployment. AgentSys is the runtime that orchestrates agents to handle all of it — structured pipelines, gated phases, specialized agents, and persistent state that survives session boundaries.
Building custom skills, agents, hooks, or MCP tools? agnix is the CLI + LSP linter that catches config errors before they fail silently - real-time IDE validation, auto suggestions, auto-fix, and 342 rules for Claude Code, Codex, OpenCode, Cursor, Kiro, Copilot, Gemini CLI, Cline, Windsurf, Roo Code, Amp, and more.
An agent orchestration system — 19 plugins, 47 agents, and 40 skills that compose into structured pipelines for software development. Each plugin lives in its own standalone repo under the agent-sh org. agentsys is the marketplace and installer that ties them together.
Each agent has a single responsibility, a specific model assignment, and defined inputs/outputs. Pipelines enforce phase gates so agents can't skip steps. State persists across sessions so work survives interruptions.
The system runs on Claude Code, OpenCode, Codex CLI, Cursor, and Kiro. Install via the marketplace or the npm installer, and the plugins are fetched automatically from their repos.
Code does code work. AI does AI work.
Certainty levels exist because not all findings are equal:
| Level | Meaning | Action |
|---|---|---|
| HIGH | Definitely a problem | Safe to auto-fix |
| MEDIUM | Probably a problem | Needs context |
| LOW | Might be a problem | Needs human judgment |
This came from testing on 1,000+ repositories.
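The routing this table implies can be sketched in a few lines. The finding shape below (`rule`, `certainty`) is illustrative, not the tool's actual schema:

```javascript
// Route findings by certainty: HIGH is safe to auto-fix, the rest
// goes to a human or a context-aware pass. Finding shape is illustrative.
function routeFindings(findings) {
  const autoFix = [];
  const needsContext = [];
  const needsHuman = [];
  for (const f of findings) {
    if (f.certainty === "HIGH") autoFix.push(f);
    else if (f.certainty === "MEDIUM") needsContext.push(f);
    else needsHuman.push(f);
  }
  return { autoFix, needsContext, needsHuman };
}

const sample = [
  { rule: "console-log", certainty: "HIGH" },
  { rule: "long-function", certainty: "MEDIUM" },
  { rule: "maybe-dead-code", certainty: "LOW" },
];
console.log(routeFindings(sample).autoFix.length); // 1 — only the HIGH finding
```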
Structured prompts and enriched context do more for output quality than model tier. Benchmarked March 2026 on real tasks (/can-i-help and /onboard against glide-mq), measured with claude -p --output-format json. Models: Claude Opus 4 and Claude Sonnet 4.
Same task, same repo, same prompt ("I want to improve docs"):
| Configuration | Cost | Output tokens | Result quality |
|---|---|---|---|
| Opus, no agentsys | $1.10 | 2,841 | Generic recommendations, no project-specific context |
| Opus + agentsys | $1.95 | 5,879 | Specific recommendations with effort estimates, convention awareness, breaking change detection |
| Sonnet + agentsys | $0.66 | 6,084 | Comparable to Opus + agentsys: specific, actionable, project-aware |
Sonnet + agentsys produced more output with higher specificity than raw Opus - at 40% lower cost.
Once the pipeline provides structured prompts, enriched repo-intel data, and phase-gated workflows, the model does less heavy lifting. The gap between Sonnet and Opus narrows:
| Plugin | Opus | Sonnet | Savings |
|---|---|---|---|
| /onboard | $1.10 | $0.30 | 73% |
| /can-i-help | $1.34 | $0.23 | 83% |
Both models reached the same outcome quality - Sonnet just costs less to get there. The structured pipeline captures most of the gains that would otherwise require a more expensive model.
| Scenario | Model cost | Quality |
|---|---|---|
| Without agentsys | Need Opus for good results | Depends on model capability |
| With agentsys | Sonnet is sufficient | Pipeline handles the structure, model handles judgment |
The investment shifts from model spend to pipeline design. Better prompts, richer context, enforced phases - these compound in ways that model upgrades alone don't.
| Command | What it does |
|---|---|
/next-task | Task workflow: discovery, implementation, PR, merge |
/prepare-delivery | Pre-ship quality gates: deslop, review, validation, docs sync |
/gate-and-ship | Quality gates then ship (/prepare-delivery + /ship) |
/agnix | Lint agent configurations (342 rules) |
/ship | PR creation, CI monitoring, merge |
/deslop | Clean AI slop patterns |
/perf | Performance investigation with baselines and profiling |
/drift-detect | Compare plan vs implementation |
/audit-project | Multi-agent iterative code review |
/enhance | Plugin, agent, and prompt analyzers |
/repo-intel | Unified static analysis - git history, AST symbols, project metadata |
/sync-docs | Sync documentation with code changes |
/learn | Research topics, create learning guides |
/consult | Cross-tool AI consultation |
/debate | Structured debate between AI tools |
/web-ctl | Browser automation for AI agents |
/release | Versioned release with ecosystem detection |
/skillers | Workflow pattern learning and automation |
/onboard | Codebase orientation for newcomers |
/can-i-help | Match contributor skills to project needs |
Each command works standalone. Together, they compose into end-to-end pipelines.
40 skills included across the plugins:
| Category | Skills |
|---|---|
| Workflow | discover-tasks, prepare-delivery, check-test-coverage, orchestrate-review, validate-delivery |
| Message Queues | glide-mq-migrate-bee, glide-mq-migrate-bullmq, glide-mq |
| Enhancement | enhance-agent-prompts, enhance-claude-memory, enhance-cross-file, enhance-docs, enhance-hooks, enhance-orchestrator, enhance-plugins, enhance-prompts, enhance-skills |
| Performance | baseline, benchmark, code-paths, investigation-logger, perf-analyzer, profile, theory-gatherer, theory-tester |
| Cleanup | deslop, sync-docs |
| Code Review | audit-project |
| AI Collaboration | consult, debate, learn, recommend, skillers-compact |
| Onboarding | can-i-help, onboard |
| Web | web-auth, web-browse |
| Release | release |
| Analysis | drift-analysis, repo-intel |
External skill plugins (standalone repos, installed separately):
| Category | Skills | Plugin |
|---|---|---|
| Message Queues | glide-mq, glide-mq-migrate-bullmq, glide-mq-migrate-bee | agent-sh/glidemq |
Skills are the reusable implementation units. Agents invoke skills; commands orchestrate agents. When you install a plugin, its skills become available to all agents in that session.
| Section | What's there |
|---|---|
| The Approach | Why it's built this way |
| Benchmarks | Sonnet + agentsys vs raw Opus |
| Commands | All 20 commands overview |
| Skills | 40 skills across plugins |
| Skill-Only Plugins | glide-mq and other non-command plugins |
| Command Details | Deep dive into each command |
| How Commands Work Together | Standalone vs integrated |
| Design Philosophy | The thinking behind the architecture |
| Installation | Get started |
| Research & Testing | What went into building this |
| Documentation | Links to detailed docs |
Plugins that provide skills without a / command. Installed alongside agentsys; skills become available to all agents.
Build message queues, background jobs, and workflow orchestration with glide-mq - high-performance Node.js queue on Valkey/Redis.
| Skill | What it does |
|---|---|
glide-mq | Greenfield queue development - queues, workers, ordering, rate limiting, flows, broadcast, step jobs |
glide-mq-migrate-bullmq | Migrate from BullMQ to glide-mq - API mapping, breaking changes, feature comparison |
glide-mq-migrate-bee | Migrate from Bee-Queue to glide-mq - API mapping, pattern conversion |
Key features: per-key ordering, group concurrency, runtime group rate limiting (job.rateLimitGroup()), token bucket, DAG workflows, broadcast pub/sub, step jobs, deduplication, serverless producers.
Skill plugin · glide-mq docs · npm
Purpose: Complete task-to-production automation.
What happens when you run it:
Phase 9 uses the orchestrate-review skill to spawn parallel reviewers (code quality, security, performance, test coverage) plus conditional specialists.
Agents involved:
| Agent | Model | Role |
|---|---|---|
| task-discoverer | sonnet | Finds and ranks tasks from your source |
| worktree-manager | haiku | Creates git worktrees and branches |
| exploration-agent | sonnet | Deep codebase analysis before planning |
| planning-agent | opus | Designs step-by-step implementation plan |
| implementation-agent | opus | Writes the actual code |
| prepare-delivery:test-coverage-checker | sonnet | Validates tests exist and are meaningful |
| prepare-delivery:delivery-validator | sonnet | Final checks before shipping |
| ci-monitor | haiku | Watches CI status |
| ci-fixer | sonnet | Fixes CI failures and review comments |
| simple-fixer | haiku | Executes mechanical edits |
Cross-plugin agent:
| Agent | Plugin | Role |
|---|---|---|
| deslop-agent | deslop | Removes AI artifacts before review |
| sync-docs-agent | sync-docs | Updates documentation |
Usage:
/next-task # Start new workflow
/next-task --resume # Resume interrupted workflow
/next-task --status # Check current state
/next-task --abort # Cancel and cleanup
Purpose: Run all pre-ship quality gates without shipping. Use after completing implementation manually or outside /next-task.
What it runs (in order):
/prepare-delivery # Run all quality gates
/prepare-delivery --skip-review # Skip review loop
/prepare-delivery --skip-docs # Skip docs sync
/prepare-delivery --base=develop # Against a specific base branch
Does NOT create PRs or push - use /ship or /gate-and-ship after.
Purpose: Quality gates then ship in one command. Chains /prepare-delivery then /ship.
/gate-and-ship # Full: quality gates + ship
/gate-and-ship --skip-review # Skip review, still ship
/gate-and-ship --base=develop # Against a specific base branch
Composability:
/gate-and-ship = /prepare-delivery + /ship
Each piece runs independently - use /prepare-delivery alone to review before deciding to ship, or /ship alone if already validated.
Purpose: Lint agent configurations before they break your workflow. The first dedicated linter for AI agent configs.
agnix is a standalone open-source project that provides the validation engine. This plugin integrates it into your workflow.
The problem it solves:
Agent configurations are code. They affect behavior, security, and reliability. But unlike application code, they have no linting. You find out your SKILL.md is malformed when the agent fails. You discover your hooks have security issues when they're exploited. You realize your CLAUDE.md has conflicting rules when the AI behaves unexpectedly.
agnix catches these issues before they cause problems.
What it validates:
| Category | What It Checks |
|---|---|
| Structure | Required fields, valid YAML/JSON, proper frontmatter |
| Security | Prompt injection vectors, overpermissive tools, exposed secrets |
| Consistency | Conflicting rules, duplicate definitions, broken references |
| Best Practices | Tool restrictions, model selection, trigger phrase quality |
| Cross-Platform | Compatibility across Claude Code, Codex, OpenCode, Cursor, Kiro, Copilot, Gemini CLI, Cline, Windsurf, Roo Code, Amp, and more |
342 validation rules (102 auto-fixable) derived from:
Supported files:
| File Type | Examples |
|---|---|
| Skills | SKILL.md, */SKILL.md |
| Memory | CLAUDE.md, AGENTS.md, .github/CLAUDE.md |
| Hooks | .claude/settings.json, hooks configuration |
| MCP | *.mcp.json, MCP server configs |
| Cursor | .cursor/rules/*.mdc, .cursorrules |
| Copilot | .github/copilot-instructions.md |
| Kiro | .kiro/steering/**/*.md, .kiro/agents/*.json, .kiro/hooks/*.kiro.hook, POWER.md |
| Windsurf | .windsurf/rules/**/*.md, .windsurf/workflows/**/*.md, .windsurfrules |
| Roo Code | .roo/rules/*.md, .roo/rules-{mode}/*.md, .roomodes, .rooignore, .roorules |
| Gemini CLI | GEMINI.md, .gemini/settings.json, gemini-extension.json |
| OpenCode | opencode.json |
| Amp | .agents/checks/**/*.md, .amp/settings.json |
CI/CD Integration:
agnix outputs SARIF format for GitHub Code Scanning. Add it to your workflow:
- name: Lint agent configs
  run: agnix --format sarif > results.sarif
- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
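For reference, the SARIF 2.1.0 payload that GitHub Code Scanning ingests has roughly this shape. This is a hand-built minimal example; agnix's real output carries more metadata (rule descriptions, fingerprints, etc.):

```javascript
// Build a minimal SARIF 2.1.0 document from generic findings.
// Shape follows the SARIF spec; field choices beyond that are illustrative.
function toSarif(findings, toolName) {
  return {
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: toolName } },
        results: findings.map((f) => ({
          ruleId: f.ruleId,
          level: f.level, // "error" | "warning" | "note"
          message: { text: f.message },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: f.file },
                region: { startLine: f.line },
              },
            },
          ],
        })),
      },
    ],
  };
}

const sarif = toSarif(
  [{ ruleId: "skill-missing-frontmatter", level: "error",
     message: "SKILL.md lacks YAML frontmatter", file: "SKILL.md", line: 1 }],
  "agnix"
);
console.log(JSON.stringify(sarif, null, 2));
```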
Usage:
/agnix # Validate current project
/agnix --fix # Auto-fix fixable issues
/agnix --strict # Treat warnings as errors
/agnix --target claude-code # Only Claude Code rules
/agnix --format sarif # Output for GitHub Code Scanning
Agent: agnix-agent (sonnet model)
External tool: Requires agnix CLI
npm install -g agnix # Install via npm
# or
cargo install agnix-cli # Install via Cargo
# or
brew install agnix # Install via Homebrew (macOS)
Why use agnix:
Purpose: Takes your current branch from "ready to commit" to "merged PR."
What happens when you run it:
Platform Detection:
| Type | Detected |
|---|---|
| CI | GitHub Actions, GitLab CI, CircleCI, Jenkins, Travis |
| Deploy | Railway, Vercel, Netlify, Fly.io, Render |
| Project | Node.js, Python, Rust, Go, Java |
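Detection of this kind reduces to file-existence checks, no LLM involved. A simplified sketch — the marker paths are a representative subset, not the detector's full list:

```javascript
// Detect CI / deploy / project type from which files exist at the repo root.
// Marker tables are an illustrative subset of what /ship actually checks.
const MARKERS = {
  ci: { "GitHub Actions": ".github/workflows", "GitLab CI": ".gitlab-ci.yml", CircleCI: ".circleci/config.yml" },
  deploy: { Vercel: "vercel.json", "Fly.io": "fly.toml", Netlify: "netlify.toml" },
  project: { "Node.js": "package.json", Rust: "Cargo.toml", Go: "go.mod", Python: "pyproject.toml" },
};

function detectPlatforms(files) {
  const present = new Set(files);
  const result = {};
  for (const [kind, markers] of Object.entries(MARKERS)) {
    result[kind] = Object.entries(markers)
      .filter(([, markerPath]) => present.has(markerPath))
      .map(([name]) => name);
  }
  return result;
}

console.log(detectPlatforms(["package.json", ".github/workflows", "fly.toml"]));
// { ci: [ 'GitHub Actions' ], deploy: [ 'Fly.io' ], project: [ 'Node.js' ] }
```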
Review Comment Handling:
Every comment gets addressed. No exceptions. The workflow categorizes comments and handles each:
If something can't be fixed, the workflow replies explaining why and resolves the thread.
Usage:
/ship # Full workflow
/ship --dry-run # Preview without executing
/ship --strategy rebase # Use rebase instead of squash
Purpose: Finds AI slop—debug statements, placeholder text, verbose comments, TODOs—and removes it.
How detection works:
Three phases run in sequence:
Phase 1: Regex Patterns (HIGH certainty)
- Debug statements: console.log, print(), dbg!(), println!()
- Marker comments: // TODO, // FIXME, // HACK
Phase 2: Multi-Pass Analyzers (MEDIUM certainty)
Phase 3: CLI Tools (LOW certainty, optional)
Languages supported: JavaScript/TypeScript, Python, Rust, Go, Java
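Phase 1 is plain regex over source lines. A minimal sketch using an illustrative subset of patterns, not the shipped rule set:

```javascript
// Phase-1 slop scan: pre-indexed regexes, findings tagged HIGH certainty.
// The pattern list is a small illustrative subset.
const PATTERNS = [
  { rule: "debug-statement", re: /\bconsole\.log\s*\(/ },
  { rule: "debug-statement", re: /\bdbg!\s*\(/ },
  { rule: "todo-comment", re: /\/\/\s*(TODO|FIXME|HACK)\b/ },
];

function scanSource(source) {
  const findings = [];
  source.split("\n").forEach((line, i) => {
    for (const { rule, re } of PATTERNS) {
      if (re.test(line)) findings.push({ rule, line: i + 1, certainty: "HIGH" });
    }
  });
  return findings;
}

console.log(scanSource('console.log("debug");\nconst x = 1; // TODO remove\n'));
// two HIGH findings: a debug-statement on line 1, a todo-comment on line 2
```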
Usage:
/deslop # Report only (safe)
/deslop apply # Fix HIGH certainty issues
/deslop apply src/ 10 # Fix 10 issues in src/
Thoroughness levels:
- quick - Phase 1 only (fastest)
- normal - Phase 1 + Phase 2 (default)
- deep - All phases if tools available
Purpose: Structured performance investigation with baselines, profiling, and evidence-backed decisions.
10-phase methodology (based on recorded real performance investigation sessions):
Agents and skills:
| Component | Role |
|---|---|
| perf-orchestrator | Coordinates all phases |
| perf-theory-gatherer | Generates hypotheses from git history and code |
| perf-theory-tester | Validates hypotheses with controlled experiments |
| perf-analyzer | Synthesizes findings into recommendations |
| perf-code-paths | Maps entrypoints and likely hot paths |
| perf-investigation-logger | Structured evidence logging |
Usage:
/perf # Start new investigation
/perf --resume # Resume previous investigation
Phase flags (advanced):
/perf --phase baseline --command "npm run bench" --version v1.2.0
/perf --phase breaking-point --param-min 1 --param-max 500
/perf --phase constraints --cpu 1 --memory 1GB
/perf --phase hypotheses --hypotheses-file perf-hypotheses.json
/perf --phase optimization --change "reduce allocations"
/perf --phase decision --verdict stop --rationale "no measurable improvement"
Purpose: Compares your documentation and plans to what's actually in the code.
The problem it solves:
Your roadmap says "user authentication: done." But is it actually implemented? Your GitHub issue says "add dark mode." Is it already in the codebase? Plans drift from reality. This command finds the drift.
How it works:
JavaScript collectors gather data (fast, token-efficient)
Single Opus call performs semantic analysis (e.g., matching a claim like "user authentication" against files such as auth/, login.js, session.ts)
Why this approach:
Multi-agent collection wastes tokens on coordination. JavaScript collectors are fast and deterministic. One well-prompted LLM call does the actual analysis. Result: 77% token reduction vs multi-agent approaches.
Tested on 1,000+ repositories before release.
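A deterministic collector of this kind can be as simple as keyword-to-path matching; only the structured result reaches the LLM. The shapes below are illustrative, not drift-detect's actual internals:

```javascript
// Deterministic collector: match a plan item's keywords against file paths,
// then hand the structured evidence to a single LLM call for judgment.
// The planItem shape and keyword list are illustrative.
function collectEvidence(planItem, filePaths) {
  const keywords = planItem.keywords.map((k) => k.toLowerCase());
  const matches = filePaths.filter((p) =>
    keywords.some((k) => p.toLowerCase().includes(k))
  );
  return { claim: planItem.claim, candidateFiles: matches, matched: matches.length > 0 };
}

const evidence = collectEvidence(
  { claim: "user authentication: done", keywords: ["auth", "login", "session"] },
  ["src/auth/index.js", "src/login.js", "src/ui/button.js"]
);
console.log(evidence.candidateFiles); // [ 'src/auth/index.js', 'src/login.js' ]
```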
Usage:
/drift-detect # Full analysis
/drift-detect --depth quick # Quick scan
Purpose: Multi-agent code review that iterates until issues are resolved.
What happens when you run it:
Up to 10 specialized role-based agents run based on your project:
| Agent | When Active | Focus Area |
|---|---|---|
| code-quality-reviewer | Always | Code quality, error handling |
| security-expert | Always | Vulnerabilities, auth, secrets |
| performance-engineer | Always | N+1 queries, memory, blocking ops |
| test-quality-guardian | Always | Coverage, edge cases, mocking |
| architecture-reviewer | If 50+ files | Modularity, patterns, SOLID |
| database-specialist | If DB detected | Queries, indexes, transactions |
| api-designer | If API detected | REST, errors, pagination |
| frontend-specialist | If frontend detected | Components, state, UX |
| backend-specialist | If backend detected | Services, domain logic |
| devops-reviewer | If CI/CD detected | Pipelines, configs, secrets |
Findings are collected and categorized by severity (critical/high/medium/low). All non-false-positive issues get fixed automatically. The loop repeats until no open issues remain.
Usage:
/audit-project # Full review
/audit-project --quick # Single pass
/audit-project --resume # Resume from queue file
/audit-project --domain security # Security focus only
/audit-project --recent # Only recent changes
Purpose: Analyzes your prompts, plugins, agents, docs, hooks, and skills for improvement opportunities.
Seven analyzers run in parallel:
| Analyzer | What it checks |
|---|---|
| plugin-enhancer | Plugin structure, MCP tool definitions, security patterns |
| agent-enhancer | Agent frontmatter, prompt quality |
| claudemd-enhancer | CLAUDE.md/AGENTS.md structure, token efficiency |
| docs-enhancer | Documentation readability, RAG optimization |
| prompt-enhancer | Prompt engineering patterns, clarity, examples |
| hooks-enhancer | Hook frontmatter, structure, safety |
| skills-enhancer | SKILL.md structure, trigger phrases |
Each finding includes:
Auto-learning: Detects obvious false positives (pattern docs, workflow gates) and saves them for future runs. Reduces noise over time without manual suppression files.
Usage:
/enhance # Run all analyzers
/enhance --focus=agent # Just agent prompts
/enhance --apply # Apply HIGH certainty fixes
/enhance --show-suppressed # Show what's being filtered
/enhance --no-learn # Analyze but don't save false positives
Purpose: Unified static analysis - git history, AST symbols, and project metadata in one plugin.
What it provides:
Output is cached at {state-dir}/repo-intel.json and {state-dir}/repo-map.json.
Why it matters:
Tools like /drift-detect, /onboard, /can-i-help, and planners consume this data instead of re-scanning the repo every time. 9 plugins use repo-intel data automatically.
Usage:
/repo-intel init # First-time scan
/repo-intel update # Incremental update
/repo-intel query hotspots # Most active files
/repo-intel query ownership src/ # Who owns a path
/repo-intel query bus-factor # Knowledge risk
Backed by agent-analyzer Rust binary.
Purpose: Sync documentation with actual code changes—find outdated refs, update CHANGELOG, flag stale examples.
The problem it solves:
You refactor auth.js into auth/index.js. Your README still says import from './auth'. You rename a function. Three docs still reference the old name. You ship a feature. CHANGELOG doesn't mention it. Documentation drifts from code. This command finds the drift.
What it detects:
| Category | Examples |
|---|---|
| Broken references | Imports to moved/renamed files, deleted exports |
| Version mismatches | Doc says v2.0, package.json says v2.1 |
| Stale code examples | Import paths that no longer exist |
| Missing CHANGELOG | feat: and fix: commits without entries |
Auto-fixable vs flagged:
| Auto-fixable (apply mode) | Flagged for review |
|---|---|
| Version number updates | Removed exports referenced in docs |
| CHANGELOG entries for commits | Code examples needing context |
| Function renames |
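The version-mismatch detector reduces to comparing versions mentioned in docs against the manifest. A sketch of that single check (the real command covers many more cases):

```javascript
// Find doc-mentioned semver strings that disagree with package.json.
// One detector from the table above, sketched for illustration.
function findVersionMismatches(docText, manifestVersion) {
  const mentioned = [...docText.matchAll(/\bv?(\d+\.\d+\.\d+)\b/g)].map((m) => m[1]);
  return mentioned.filter((v) => v !== manifestVersion);
}

const stale = findVersionMismatches(
  "Install v2.0.0 via npm. Requires Node 18.", // "Node 18" is not x.y.z, ignored
  "2.1.0"
);
console.log(stale); // [ '2.0.0' ]
```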
Usage:
/sync-docs # Check what docs need updates (safe)
/sync-docs apply # Apply safe fixes
/sync-docs report src/ # Check docs related to src/
/sync-docs --all # Full codebase scan
Purpose: Research any topic online and create a comprehensive learning guide with RAG-optimized indexes.
What it does:
Depth levels:
| Depth | Sources | Use Case |
|---|---|---|
| brief | 10 | Quick overview |
| medium | 20 | Default, balanced |
| deep | 40 | Comprehensive |
Output structure:
agent-knowledge/
CLAUDE.md # Master index (updated each run)
AGENTS.md # Index for OpenCode/Codex
recursion.md # Topic-specific guide
resources/
recursion-sources.json # Source metadata with quality scores
Usage:
/learn recursion # Default (20 sources)
/learn react hooks --depth=deep # Comprehensive (40 sources)
/learn kubernetes --depth=brief # Quick overview (10 sources)
/learn python async --no-enhance # Skip enhancement pass
Agent: learn-agent (sonnet model)
Purpose: Get a second opinion from another AI CLI tool without leaving your current session.
What it does:
Supports resuming a previous consultation (--continue)
Supported tools:
| Tool | Default Model (high) | Reasoning Control |
|---|---|---|
| Claude | claude-opus-4-6 | max-turns |
| Gemini | gemini-3.1-pro-preview | built-in |
| Codex | gpt-5.3-codex | model_reasoning_effort |
| OpenCode | (user-selected or default) | --variant |
| Copilot | (default) | none |
Usage:
/consult "Is this the right approach?" --tool=gemini --effort=high
/consult "Review for performance issues" --tool=codex
/consult "Suggest alternatives" --tool=claude --effort=max
/consult "Continue from where we left off" --continue
/consult "Explain this error" --context=diff --tool=gemini
Agent: consult-agent (sonnet model for orchestration)
Purpose: Stress-test ideas through structured multi-round debate between two AI CLI tools.
What it does:
Usage:
# Natural language
/debate codex vs gemini about microservices vs monolith
/debate with claude and codex about our auth implementation
/debate thoroughly gemini vs codex about database schema design
/debate codex vs gemini 3 rounds about event sourcing
# Explicit flags
/debate "Should we use event sourcing?" --tools=claude,gemini --rounds=3 --effort=high
/debate "Valkey vs PostgreSQL for caching" --tools=codex,opencode
# With codebase context
/debate "Is our current approach correct?" --tools=gemini,codex --context=diff
Options:
| Flag | Description |
|---|---|
--tools=TOOL1,TOOL2 | Proposer and challenger (comma-separated) |
--rounds=N | Number of debate rounds, 1–5 (default: 2) |
--effort=low|medium|high|max | Reasoning depth per tool call |
--context=diff|file=PATH|none | Codebase context passed to both tools |
Agent: debate-orchestrator (opus model for orchestration)
Purpose: Browser automation for AI agents - navigate, authenticate, and interact with web pages.
How it works:
Each invocation is a single Node.js process using Playwright. No daemon, no MCP server. Session state persists via Chrome's userDataDir with AES-256-GCM encrypted storage.
Agent calls skill -> node scripts/web-ctl.js <args> -> Playwright API -> JSON result
Session lifecycle:
- session start <name> - Create session (encrypted profile directory)
- session auth <name> --url <login-url> - Opens headed Chrome for human login (2FA, CAPTCHAs). Polls for success URL/selector, encrypts cookies on completion
- run <name> <action> - Headless actions using persisted cookies
- session end <name> - Cleanup
Actions:
| Action | Description | Key flag |
|---|---|---|
goto <url> | Navigate to URL | |
snapshot | Get accessibility tree (primary page inspection) | |
click <sel> | Click element | --wait-stable |
click-wait <sel> | Click and wait for DOM + network stability | --timeout <ms> |
type <sel> <text> | Type with human-like delays | |
read <sel> | Read element text content | |
fill <sel> <value> | Clear field and set value | |
wait <sel> | Wait for element to appear | --timeout <ms> |
evaluate <js> | Execute JS in page context | --allow-evaluate |
screenshot | Full-page screenshot | --path <file> |
network | Capture network requests | --filter <pattern> |
checkpoint | Open headed browser for user (CAPTCHAs) | --timeout <sec> |
click-wait waits for network idle + no DOM mutations for 500ms before returning. Cuts SPA interactions from multiple agent turns to one.
Error handling:
All errors return classified codes with actionable recovery suggestions:
| Code | Recovery suggestion |
|---|---|
element_not_found | Snapshot included in response for selector discovery |
timeout | Increase --timeout |
browser_closed | session start <name> |
network_error | Check URL; verify cookies with session status |
no_display | Use --vnc flag |
session_expired | Re-authenticate |
Security: Output sanitization (cookies/tokens redacted), prompt injection defense ([PAGE_CONTENT: ...] delimiters), AES-256-GCM encryption at rest, anti-bot measures (webdriver=false, random delays), read-only agent (no Write/Edit tools).
Selector syntax: role=button[name='Submit'], css=div.class, text=Click here, #id
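Parsing that selector syntax is a small dispatch on the engine prefix. A best-effort sketch inferred from the documented forms, not web-ctl's actual parser:

```javascript
// Split a web-ctl selector into { engine, value }.
// Recognizes role=, css=, text= prefixes; anything else (including bare
// #id shorthand) is treated as a CSS selector. Inferred, not authoritative.
function parseSelector(sel) {
  const m = sel.match(/^(role|css|text)=(.*)$/s);
  if (m) return { engine: m[1], value: m[2] };
  return { engine: "css", value: sel };
}

console.log(parseSelector("role=button[name='Submit']"));
// { engine: 'role', value: "button[name='Submit']" }
console.log(parseSelector("#id")); // { engine: 'css', value: '#id' }
```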
Usage:
/web-ctl goto https://example.com
/web-ctl auth twitter --url https://x.com/i/flow/login
/web-ctl # describe what you want to do, agent orchestrates it
Install:
agentsys install web-ctl
npm install playwright
npx playwright install chromium
Agent: web-session (sonnet model)
Skills: web-auth (human-in-the-loop auth), web-browse (headless actions)
Versioned release with automatic ecosystem and tooling detection
/release # Patch release (auto-discovers how this repo releases)
/release minor # Minor version bump
/release major --dry-run # Preview what would happen
The release agent discovers how your repo releases before executing:
Existing release tooling (a release: target, an npm release script, scripts/release.*)
Supports 12+ ecosystems: npm, cargo, python, go, maven, gradle, ruby, nuget, dart, hex, packagist, swift.
Agent: release-agent (sonnet model)
Skill: release (generic fallback workflow)
Learn from your workflow patterns and suggest automations
/skillers show # Display current config and knowledge stats
/skillers compact # Analyze recent transcripts, extract patterns
/skillers compact --days=14 # Analyze older transcripts
/skillers recommend # Get automation suggestions from accumulated knowledge
Reads your Claude Code conversation transcripts, identifies recurring patterns (pain points, repeated workflows, wishes), clusters them into weighted themes, and suggests skills, hooks, or agents to automate them.
No per-turn overhead - it reads transcripts that Claude Code already saves.
Agents: skillers-compactor (sonnet), skillers-recommender (opus)
Skills: skillers-compact, recommend
Purpose: Get oriented in any codebase in under 3 minutes.
What happens when you run it:
74% fewer tokens than manual onboarding. Validated on 100 repos across JS/TS, Rust, Go, Python, C/C++, Java, and Deno.
Depth levels:
| Level | Time | Data |
|---|---|---|
| quick | ~2s | Manifest + README + structure |
| normal | ~5s | + CLAUDE.md/AGENTS.md + CI + repo-intel |
| deep | ~15s | + repo-intel AST symbols |
Supported manifests: package.json, Cargo.toml, go.mod, pyproject.toml, deno.json, CMakeLists.txt, meson.build, setup.py, pom.xml, build.gradle. Detects monorepos (npm/pnpm/lerna/Cargo workspaces, Python libs/, Deno workspaces).
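Manifest detection is a lookup from filename to ecosystem. A sketch covering a subset of the supported manifests listed above:

```javascript
// Map root-level manifest files to ecosystems; a subset of what /onboard
// detects (monorepo/workspace detection omitted for brevity).
const MANIFESTS = {
  "package.json": "Node.js",
  "Cargo.toml": "Rust",
  "go.mod": "Go",
  "pyproject.toml": "Python",
  "deno.json": "Deno",
  "pom.xml": "Java (Maven)",
};

function detectEcosystems(rootFiles) {
  return rootFiles.filter((f) => f in MANIFESTS).map((f) => MANIFESTS[f]);
}

console.log(detectEcosystems(["package.json", "go.mod", "LICENSE"]));
// [ 'Node.js', 'Go' ]
```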
Usage:
/onboard # Current repo
/onboard /path/to/repo # Specific repo
/onboard --depth=deep # Include AST data
Agent: onboard-agent (opus model)
Purpose: Match a contributor's skills to specific areas where they can help.
What happens when you run it:
Matching:
| Developer profile | Gets recommended |
|---|---|
| New to stack | Good-first areas with clear patterns |
| Experienced | Hard problems in pain-point areas |
| Test-focused | Test gaps in frequently-changed files |
| Bug-focused | Bugspot files + relevant open issues |
| Docs-focused | Stale documentation with code examples |
Usage:
/can-i-help # Current repo
/can-i-help /path/to/repo # Specific repo
/can-i-help --depth=deep # Include AST data
Agent: can-i-help-agent (opus model)
Standalone use:
/deslop apply # Just clean up your code
/sync-docs # Just check if docs need updates
/prepare-delivery # Run all quality gates (no ship)
/ship # Just ship this branch
/gate-and-ship # Quality gates + ship in one command
/audit-project # Just review the codebase
Composable delivery chain:
/prepare-delivery = quality gates only (deslop, review, validation, docs)
/ship = PR + CI + merge only
/gate-and-ship = /prepare-delivery + /ship
/next-task = full workflow (discovery → implementation → /prepare-delivery → /ship)
Full integrated workflow:
When you run /next-task, it orchestrates everything:
/next-task picks task → explores codebase → plans implementation
↓
implementation-agent writes code
↓
deslop-agent + prepare-delivery:test-coverage-checker + /simplify (parallel)
↓
review loop iterates until approved
↓
prepare-delivery:delivery-validator checks requirements
↓
sync-docs-agent syncs documentation
↓
/ship creates PR → monitors CI → merges
The workflow tracks state so you can resume from any point.
Frontier models write good code. That's solved. What's not solved:
1. One agent, one job, done extremely well
Same principle as good code: single responsibility. The exploration-agent explores. The implementation-agent implements. Phase 9 spawns multiple focused reviewers. No agent tries to do everything. Specialized agents, each with narrow scope and clear success criteria.
2. Pipeline with gates, not a monolith
Same principle as DevOps. Each step must pass before the next begins. Can't push before review. Can't merge before CI passes. Hooks enforce this—agents literally cannot skip phases.
3. Tools do tool work, agents do agent work
If static analysis, regex, or a shell command can do it, don't ask an LLM. Pattern detection uses pre-indexed regex. File discovery uses glob. Platform detection uses file existence checks. The LLM only handles what requires judgment.
4. Agents don't need to know how tools work
The slop detector returns findings with certainty levels. The agent doesn't need to understand the three-phase pipeline, the regex patterns, or the analyzer heuristics. Good tool design means the consumer doesn't need implementation details.
5. Build tools where tools don't exist
Many tasks lack existing tools. JavaScript collectors for drift-detect. Multi-pass analyzers for slop detection. The result: agents receive structured data, not raw problems to figure out.
6. Research-backed prompt engineering
Documented techniques that measurably improve results:
7. Validate plan and results, not every step
Approve the plan. See the results. The middle is automated. One plan approval unlocks autonomous execution through implementation, review, cleanup, and shipping.
8. Right model for the task
Match model capability to task complexity:
Quality compounds. Poor exploration → poor plan → poor implementation → review cycles. Early phases deserve the best model.
9. Persistent state survives sessions
Two JSON files track everything: what task, what phase. Sessions can die and resume. Multiple sessions run in parallel on different tasks using separate worktrees.
10. Delegate everything automatable
Agents don't just write code. They:
If it can be specified, it can be delegated.
11. Orchestrator stays high-level
The main workflow orchestrator doesn't read files, search code, or write implementations. It launches specialized agents and receives their outputs. Keeps the orchestrator's context window available for coordination rather than filled with file contents.
12. Composable, not monolithic
Every command works standalone. /deslop cleans code without needing /next-task. /ship merges PRs without needing the full workflow. Pieces compose together, but each piece is useful on its own.
/plugin marketplace add agent-sh/agentsys
/plugin install next-task@agentsys
/plugin install ship@agentsys
npm install -g agentsys && agentsys
Interactive installer for Claude Code, OpenCode, Codex CLI, Cursor, and Kiro.
# Non-interactive install
agentsys --tool claude # Single tool
agentsys --tool cursor # Cursor (project-scoped skills + commands)
agentsys --tool kiro # Kiro (project-scoped steering + skills + agents)
agentsys --tools "claude,opencode" # Multiple tools
agentsys --development # Dev mode (bypasses marketplace)
Required:
For GitHub workflows:
- GitHub CLI (gh) authenticated
For GitLab workflows:
- GitLab CLI (glab) authenticated
For /repo-intel:
For /agnix:
- agnix CLI installed (npm install -g agnix, cargo install agnix-cli, or brew install agnix)
Local diagnostics (optional):
npm run detect # Platform detection (CI, deploy, project type)
npm run verify # Tool availability + versions
The system is built on research, not guesswork.
Knowledge base (agent-docs/): 8,000 lines of curated documentation from Anthropic, OpenAI, Google, and Microsoft covering:
Testing:
Methodology:
- /perf investigation phases based on recorded real performance investigation sessions

| Topic | Link |
|---|---|
| Installation | docs/INSTALLATION.md |
| Cross-Platform Setup | docs/CROSS_PLATFORM.md |
| Usage Examples | docs/USAGE.md |
| Architecture | docs/ARCHITECTURE.md |
| Workflow | Link |
|---|---|
| /next-task Flow | docs/workflows/NEXT-TASK.md |
| /ship Flow | docs/workflows/SHIP.md |
| Topic | Link |
|---|---|
| Slop Patterns | docs/reference/SLOP-PATTERNS.md |
| Agent Reference | docs/reference/AGENTS.md |
MIT License | Made by Avi Fenesh