# labrat

Autonomous research & experiment loops for coding agents.

Point your agent at a problem, go to sleep, wake up to results. Labrat runs an autonomous experiment loop — it modifies code, evaluates the result, keeps what works, throws away what doesn't, and repeats. Works with any measurable optimization target or open-ended research question.
Inspired by karpathy/autoresearch and pi-autoresearch, and generalized beyond ML: it works on any codebase with any agent that supports the Agent Skills standard (Claude Code, Codex CLI, etc.).
## Quick Start

```sh
claude plugin add github:pawanpaudel93/labrat
npx labrat-agent init
```

Then tell your agent what to work on:
```
Optimize my API response time. Target: src/api/handler.ts. Eval: npm run bench. Metric: response_time_ms (minimize). Constraint: memory under 512MB.
```
The agent creates a `labrat/api-perf` branch, runs a baseline, then starts iterating autonomously.
## How It Works
Labrat adapts to what you give it. Give it a metric and an eval command, and it runs a tight experiment loop. Give it a research question, and it explores. If a metric emerges during exploration, it transitions to the experiment loop automatically.
### The experiment loop
When a metric + eval command are available, the agent runs a keep/discard loop:
1. Pick an experiment idea (informed by what worked/failed before)
2. Modify target files
3. Quick checks (lint, typecheck) — fail fast if broken
4. Run benchmark, extract metric
5. If improved: run correctness checks, check constraints
6. Keep (commit) or discard (revert target files)
7. Log everything to `.labrat/labrat-results.jsonl`
8. Repeat until interrupted or context limit
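The keep/discard core of the steps above can be sketched as a tiny shell loop. The metric values are stubbed with fixed numbers so the control flow is visible; this is an illustration, not labrat's actual implementation:

```bash
#!/usr/bin/env bash
# Sketch of the keep/discard decision (lower metric = better).
# Stubbed metrics stand in for `npm run bench` output.

best=1000                        # baseline metric from the initial run
log=labrat-results.jsonl
: > "$log"                       # start a fresh results log

for metric in 950 1100 800; do   # stand-ins for successive benchmark runs
  if [ "$metric" -lt "$best" ]; then
    decision=kept                # keep: a commit would happen here
    best=$metric
  else
    decision=discarded           # discard: revert the target files
  fi
  printf '{"metric": %s, "decision": "%s"}\n' "$metric" "$decision" >> "$log"
done

echo "best=$best"
```

A real run would replace the stubbed values with the eval command's output and commit or revert the target files between iterations.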
### The exploration loop
When no metric exists, the agent researches via web search, reads docs, tries different approaches, documents findings, and writes up an analysis. If a concrete metric and eval command are identified during exploration, the agent announces the transition and switches to the experiment loop.
The agent also tracks diminishing returns, combines near-misses, avoids repeating dead ends, and creates checkpoints at notable improvements.
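One of those heuristics, the diminishing-returns check, might be sketched as follows. The 1% threshold and the integer arithmetic are assumptions for illustration, not labrat's documented rule:

```bash
#!/usr/bin/env bash
# Hypothetical diminishing-returns check: given the previous and current
# kept metrics (lower = better), succeed when the relative improvement has
# fallen below a threshold. Illustrative logic only.
diminishing() {
  local prev=$1 curr=$2
  # improvement in parts-per-thousand, to stay in integer arithmetic
  local gain=$(( (prev - curr) * 1000 / prev ))
  [ "$gain" -lt 10 ]   # under 1% improvement counts as diminishing
}
```

An agent could call this on each kept result and switch strategies, or wrap up the session, once it starts returning true.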
## Features
- Adapts automatically — give it a metric and it experiments, give it a question and it explores
- Domain-agnostic — works on any codebase, not just ML
- Cross-agent — Claude Code, Codex CLI, or any Agent Skills tool
- Survives context resets — session state persisted in `.labrat/` files; the agent compacts and continues
- Clean git history — every commit on the research branch is a successful experiment
## Usage Examples
Metric-driven optimization:

```
Optimize my API response time. Target: src/api/handler.ts. Eval: npm run bench. Metric: response_time_ms (minimize).
```

Open-ended research:

```
Research the best approach for real-time sync in our app. Look at CRDTs, OT, and simple polling. Target: src/sync/.
```

Starting open, then narrowing:

```
Investigate why our bundle size grew 40%. Target: webpack.config.js, src/index.ts. I think there's a metric here but I'm not sure what to measure yet.
```
## Session Files

Everything lives in `.labrat/` on the research branch (gitignored on main):

| File | Purpose |
|------|---------|
| `labrat-config.json` | Session config — targets, metric, constraints |
| `labrat-results.jsonl` | Every experiment logged (kept, discarded, or crashed) |
| `labrat-journal.md` | Agent's running notes — strategy, dead ends, next ideas |
| `labrat-run.sh` | Wraps your eval command with timeout and structured output |
| `labrat-checks.sh` | Correctness checks (tests, lint) — optional |
| `labrat-report.md` | Findings report generated at session end |
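As a rough illustration of what a wrapper like `labrat-run.sh` might do (the `LABRAT_METRIC` marker, the 300-second limit, and the function name are assumptions, not labrat's actual interface):

```bash
#!/usr/bin/env bash
# Hypothetical eval wrapper: run the eval command under a timeout and emit
# the metric on one structured, easy-to-parse line. Illustrative only.
labrat_run() {
  local cmd=$1 out metric
  if ! out=$(timeout 300 bash -c "$cmd" 2>&1); then
    echo "LABRAT_METRIC=error"   # eval failed or timed out
    return 1
  fi
  # pull a line like "response_time_ms: 123" out of the eval output
  metric=$(printf '%s\n' "$out" | awk -F': ' '/response_time_ms/ {print $2; exit}')
  echo "LABRAT_METRIC=${metric:-missing}"
}
```

A structured marker like this lets the agent extract the metric reliably even when the eval command is noisy.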
These files are plain text and agent-agnostic — you can start a session in Claude Code and pick it up in Codex CLI.
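For illustration, a `labrat-config.json` for the Quick Start prompt might look like this; the exact field names are assumptions based on the fields listed above, not labrat's documented schema:

```json
{
  "targets": ["src/api/handler.ts"],
  "eval": "npm run bench",
  "metric": { "name": "response_time_ms", "direction": "minimize" },
  "constraints": ["memory under 512MB"]
}
```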
## Commands

| Command | Description |
|---------|-------------|
| `/labrat:start` | Set up and run a research session |
| `/labrat:status` | Check progress (experiments, best result, improvement %) |
| `/labrat:report` | Generate a findings report |
## Installation Options

```sh
npx labrat-agent init                 # default install
npx labrat-agent init claude          # Claude Code
npx labrat-agent init codex           # Codex CLI
npx labrat-agent init codex-project
npx labrat-agent uninstall
```
Or manually: clone this repo, copy each skill folder (`skills/labrat/start/`, `report/`, `status/`) to your skills directory as `labrat-start/`, `labrat-report/`, `labrat-status/`, and copy `CLAUDE.md` (or `AGENTS.md` for Codex) to your project root.
## Inspired By

- karpathy/autoresearch
- pi-autoresearch
## Contributing
Issues and PRs welcome. Please open an issue first for larger changes.
## License

MIT