
darwin-agents
AI agents that improve themselves. Self-evolving prompts via A/B testing, multi-model critics, safety gates, and pattern detection.
Part of the StudioMeyer MCP Stack — Built in Mallorca 🌴 · ⭐ if you use it
Build AI agent teams that learn from every run.
Self-evolving prompts. A/B tested. Safety-gated.
Quick Start · Agents · How It Works · CLI · FAQ
npm install darwin-agents better-sqlite3
export ANTHROPIC_API_KEY=sk-ant-... # or OPENAI_API_KEY, or use Claude CLI
npx darwin run writer "Explain quantum computing simply"
We have been building tools and systems for ourselves for the past two years. The fact that this repo is small and has few stars is not because it is new. It is because we only just decided to share what we have built. It is not a fresh experiment, it is a long story with a recent commit.
We love building things and sharing them. We do not love social media tactics, growth hacks, or chasing stars and followers. So this repo is small. The code is real, it gets used, issues get answered. Judge for yourself.
If it helps you, sharing, testing, and feedback help us. If it could be better, an issue is more useful. If you build something with it, tell us at hello@studiomeyer.io. That genuinely makes our day.
From a small studio in Palma de Mallorca.
Darwin is a TypeScript framework for building AI agents that automatically optimize their own prompts through experimentation, evaluation, and evolution.
Traditional AI agents use static prompts. You write them once, and they never improve. Darwin changes that:
You run an agent
│
▼
Darwin measures quality
│
▼
Patterns emerge over time
│
▼
New prompt variant generated
│
▼
A/B tested against current
│
▼
Winner becomes default
│
Your agent got better.
You did nothing.
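The last two steps of the loop above (A/B test, then promote the winner) can be sketched as a small decision function. This is a minimal illustration, not Darwin's internal code: `VariantStats`, `decide`, and the comparison by plain mean are assumptions. The 5-run minimum and the 20% regression limit are taken from the FAQ further down.

```typescript
// Hypothetical sketch: these names are illustrative, not Darwin's API.
interface VariantStats {
  promptVersion: string;
  scores: number[]; // critic scores on a 1-10 scale
}

type Decision = "promote" | "keep-current" | "rollback";

function mean(values: number[]): number {
  if (values.length === 0) throw new Error("no scores recorded");
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function decide(
  current: VariantStats,
  candidate: VariantStats,
  minRuns = 5, // runs required per variant before comparing
  maxRegression = 0.2, // safety gate: a drop of more than 20% triggers rollback
): Decision {
  // Not enough data yet: keep serving the current prompt.
  if (current.scores.length < minRuns || candidate.scores.length < minRuns) {
    return "keep-current";
  }
  const baseline = mean(current.scores);
  const challenger = mean(candidate.scores);
  // Safety gate: the candidate is clearly worse than the baseline.
  if (challenger < baseline * (1 - maxRegression)) return "rollback";
  // Otherwise promote only on a strict improvement.
  return challenger > baseline ? "promote" : "keep-current";
}
```

Comparing plain means ignores variance, so a real implementation would likely want a larger sample or a significance test before promoting.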
# Install
npm install darwin-agents better-sqlite3
# Set your API key (or use Claude CLI if installed)
export ANTHROPIC_API_KEY=sk-ant-...
# Run your first agent
npx darwin run writer "Explain the CAP theorem in simple terms"
# Enable evolution
npx darwin evolve writer --enable
# Watch it improve over time
npx darwin status writer
import { defineAgent } from 'darwin-agents';

export default defineAgent({
  name: 'summarizer',
  role: 'Text Summarizer',
  description: 'Summarizes text into key points.',
  systemPrompt: `Summarize the given text in 3 bullet points.
Be concise. No fluff. Capture the essence.`,
  evolution: {
    enabled: true,
    evaluator: 'critic',
  },
});
| Agent | What it does | Needs |
|---|---|---|
| writer | Content writing, explanations, copy | Nothing (zero-config) |
| researcher | Web research with source citations | Tavily API key |
| critic | Evaluates other agents' output (1-10) | Nothing |
| analyst | Code quality analysis | Filesystem access |
Each agent ships with a dedicated multi-critic set that scores the output by the right criteria for that agent type (research = source quality + analytical depth + completeness, analyst = technical accuracy with file:line refs + pattern recognition + recommendation quality, etc.). Custom agents can register their own critic sets — see examples/custom-agent.ts and src/evolution/multi-critic.ts.
Two production patterns Darwin users commonly need but had to build themselves:
examples/closed-loop-feedback.ts: pipe critic findings into your own memory store so the next run sees them. Symmetric (writes both successes and failures), backend-agnostic. Aligned with reflective self-improvement patterns like GEPA (ICLR 2026 Oral) and NousResearch's hermes-agent-self-evolution loop.
examples/staleness-monitor.ts: detect agents that stopped firing, or were configured but never fired. Pure classifier + format helpers + ready-made SQL. Wire to your own cron + alert webhook.
Closes the loop in three lines. Defaults to zero-config local memory; one config switch points at Mem0 / Zep / Letta / Cognee / a self-hosted MCP server / your own.
Existing self-evolving agent frameworks pick one memory backend and stay
there. Existing MCP-memory servers (Mem0, Zep, Letta, MemPalace,
agentmemory, brainctl) optimize for storage, not for closed-loop critic
feedback. Darwin v0.4.7 is the first MIT-licensed, TypeScript-native,
MCP-native combination of pluggable memory + symmetric self-evolution
(score < 5 → mistake, score ≥ 8 → pattern, mediocre middle band → not
persisted). No vendor lock-in, no cloud required by default, swappable to
Mem0/Zep/Letta with two config lines.
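The symmetric thresholds described above (score < 5 is a mistake, score ≥ 8 is a pattern, the middle band is dropped) can be sketched as a pure classifier. The type and function names here are hypothetical and chosen for illustration; only the thresholds come from the text.

```typescript
// Hypothetical sketch: names are illustrative, thresholds come from the README.
type LessonKind = "mistake" | "pattern";

interface CriticFinding {
  agentName: string;
  score: number; // critic score on a 1-10 scale
  note: string;
}

interface Lesson {
  kind: LessonKind;
  agentName: string;
  note: string;
}

// Returns the lesson to persist, or null for the mediocre middle band.
function classifyFinding(finding: CriticFinding): Lesson | null {
  if (finding.score < 5) {
    return { kind: "mistake", agentName: finding.agentName, note: finding.note };
  }
  if (finding.score >= 8) {
    return { kind: "pattern", agentName: finding.agentName, note: finding.note };
  }
  return null; // 5 <= score < 8: not persisted
}
```

Keeping the classifier free of I/O makes the thresholds easy to unit test apart from whichever memory backend is configured.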
import { localMemory, remoteMemory } from 'darwin-agents/memory/bridge';
import { runClosedLoopTurn } from 'darwin-agents/memory/closed-loop';

// Default: spawn @studiomeyer/local-memory-mcp via npx (zero cloud, zero keys)
const memory = localMemory();

// Or any remote MCP-Memory server
// const memory = remoteMemory('https://your-mcp.example.com/mcp', { authHeader: `Bearer ${KEY}` });

// Or Mem0 with the built-in preset, which handles tool names + arg shape for you
// import { mem0Preset } from 'darwin-agents/memory/bridge';
// const memory = remoteMemory('https://api.mem0.ai/mcp', {
//   authHeader: `Bearer ${process.env.MEM0_KEY}`,
//   ...mem0Preset({ userId: 'darwin-agent', defaultMetadata: { project: 'darwin' } }),
// });

const result = await runClosedLoopTurn(
  { agentName: 'analyst', topic: 'Audit module X' },
  { runner: yourAgentRunner, store: memory },
);
// Run 1 sees zero lessons. Run 2 sees Run 1's findings as context.
| Provider | writeTool | readTool | Notes |
|---|---|---|---|
| @studiomeyer/local-memory-mcp (default) | memory_learn | memory_search | zero-config, single SQLite file, no cloud |
| Any self-hosted MCP-Memory server | memory_learn | memory_search | same wire, remote endpoint |
| Mem0 MCP (mem0ai/mem0-mcp) | add_memory | search_memories | use ...mem0Preset({ userId }); handles tool names + arg shape + the memory field in result rows |
| Zep MCP | zep_add | zep_search | optional mapWriteArgs for group_id |
| Letta MCP | archival_insert | archival_search | optional mapReadResult for their envelope |
| Cognee MCP | cognee_add | cognee_search | optional mappers |
Why an MCP-shaped bridge? Because the wire is the same — only tool names and arg shapes vary. One bridge, one reconnect path, one timeout policy. The pattern matches the MCP Bridge proxy paper (arXiv 2504.08999) but stays inside the Darwin process — no extra service to deploy.
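To make "only tool names and arg shapes vary" concrete, here is a minimal sketch of a provider profile that produces an MCP `tools/call` request. `BridgeProfile`, `buildWriteCall`, and both argument shapes are assumptions made for illustration. They are not Darwin's actual types, and the Mem0-like argument names are not verified against the real server. Only the tool names come from the table above.

```typescript
// Hypothetical sketch: profile type and arg shapes are illustrative assumptions.
interface MemoryRecord {
  content: string;
  tags: string[];
}

interface BridgeProfile {
  writeTool: string;
  readTool: string;
  // Optional hook to reshape arguments for a specific provider.
  mapWriteArgs?: (record: MemoryRecord) => Record<string, unknown>;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

const localProfile: BridgeProfile = {
  writeTool: "memory_learn",
  readTool: "memory_search",
};

const mem0LikeProfile: BridgeProfile = {
  writeTool: "add_memory",
  readTool: "search_memories",
  // Assumed arg shape, for illustration only.
  mapWriteArgs: (record) => ({ text: record.content, user_id: "darwin-agent" }),
};

// Same JSON-RPC envelope for every provider; only name and arguments differ.
function buildWriteCall(
  profile: BridgeProfile,
  record: MemoryRecord,
  id: number,
): ToolCallRequest {
  const args: Record<string, unknown> = profile.mapWriteArgs
    ? profile.mapWriteArgs(record)
    : { content: record.content, tags: record.tags };
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: profile.writeTool, arguments: args },
  };
}
```

Because the envelope is identical, reconnect and timeout handling can live in one place while each provider contributes only a small profile object.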
Spec-compliant transport. Every HTTP request now carries the
MCP-Protocol-Version: 2025-11-25 header, per MCP spec 2025-11-25
§"HTTP Protocol Versioning". Strict servers MAY return 400 without
it; pre-v0.4.9 only sent the version inside the initialize payload.
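As a rough sketch of what sending the version header on every request can look like, the helper below builds the header set for an MCP HTTP POST. `buildMcpHeaders` is a hypothetical name, not part of Darwin. The `Accept` value reflects the MCP Streamable HTTP transport as I understand it, so check it against the spec revision you target.

```typescript
// Hypothetical helper: not part of darwin-agents.
const MCP_PROTOCOL_VERSION = "2025-11-25";

function buildMcpHeaders(authHeader?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    // Streamable HTTP clients accept both plain JSON and SSE responses.
    Accept: "application/json, text/event-stream",
    // Sent on every request, not only inside the initialize payload.
    "MCP-Protocol-Version": MCP_PROTOCOL_VERSION,
  };
  if (authHeader) headers["Authorization"] = authHeader;
  return headers;
}
```

The result can be passed directly as the `headers` option of `fetch`.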
Typed errors. Bridge errors are now instances of
McpBridgeProtocolError (JSON-RPC errors from the server, numeric
code) or McpBridgeTransportError (local timeouts, EPIPE, network
resets, child exits — stable string code). Branch on instanceof
to decide retry vs fail-loud without parsing message text.
import {
  McpBridgeProtocolError,
  McpBridgeTransportError,
} from 'darwin-agents/memory/bridge';

try {
  await memory.save(record);
} catch (err) {
  if (err instanceof McpBridgeTransportError && err.code === 'timeout') {
    // local timeout: safe to retry
  } else if (err instanceof McpBridgeProtocolError && err.code === -32602) {
    // server said our args are invalid: fail loud, don't retry
    throw err;
  } else {
    // anything else: rethrow instead of silently swallowing it
    throw err;
  }
}
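The retry-versus-fail-loud decision can be wrapped in a small helper. The sketch below defines stand-in error classes so that it runs on its own; in real code the classes would be the ones imported from 'darwin-agents/memory/bridge'. `isRetryable`, `withRetry`, and the exponential backoff policy are assumptions, not part of the library.

```typescript
// Stand-ins for the bridge error classes so this sketch is self-contained.
class TransportError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

class ProtocolError extends Error {
  readonly code: number;
  constructor(code: number, message: string) {
    super(message);
    this.code = code;
  }
}

// Only local timeouts are treated as safe to retry in this sketch.
function isRetryable(err: unknown): boolean {
  return err instanceof TransportError && err.code === "timeout";
}

// Hypothetical helper: retries retryable failures with exponential backoff.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Protocol errors and exhausted retries propagate to the caller.
      if (!isRetryable(err) || attempt >= maxAttempts) throw err;
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Usage would look like `await withRetry(() => memory.save(record))`. Branching on the class and code rather than on message text keeps the policy stable if server wording changes.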
Per-call timeouts. save() and fetchRelevant() accept a
timeoutMs override that beats the bridge-level default, mirroring
the MCP SDK's client.callTool(..., { timeout }). Useful for one-off
slow embedding searches without raising requestTimeoutMs globally.
await memory.fetchRelevant({ query: 'audit', limit: 5, timeoutMs: 30_000 });
await memory.save(record, { timeoutMs: 5_000 });
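One way a per-call override can take precedence over a bridge-level default is sketched below. `resolveTimeout`, `callWithTimeout`, and `TimeoutError` are hypothetical names; Darwin's actual implementation may differ.

```typescript
// Hypothetical sketch of "per-call timeout beats the default".
class TimeoutError extends Error {}

// The per-call value wins whenever it is provided.
function resolveTimeout(defaultMs: number, overrideMs?: number): number {
  return overrideMs ?? defaultMs;
}

function callWithTimeout<T>(
  op: () => Promise<T>,
  defaultMs: number,
  overrideMs?: number,
): Promise<T> {
  const timeoutMs = resolveTimeout(defaultMs, overrideMs);
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new TimeoutError(`timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
    op().then(
      (value) => {
        clearTimeout(timer); // settle first, then stop the timer
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

Note that this only stops waiting for the result. It does not cancel the underlying operation, which would need something like an `AbortSignal`.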
Mem0 preset. ...mem0Preset({ userId }) wires the right tool
names (add_memory + search_memories) and arg shapes for the
official mem0ai/mem0-mcp server. See the example above.
See examples/memory-darwin-integration.ts
for the full closed-loop pattern: fetch relevant lessons → render them as
prompt context → run the agent → persist critic findings → next run sees
last run's lessons.
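The five steps above can be sketched with an in-memory store standing in for the memory bridge. Everything here is a simplified assumption made so the example runs on its own: `InMemoryStore`, `renderLessons`, and `closedLoopTurn` are hypothetical, and the `save` and `fetchRelevant` signatures are reduced versions of the calls shown earlier.

```typescript
// Hypothetical, self-contained sketch of the closed-loop pattern.
interface Lesson {
  kind: "mistake" | "pattern";
  note: string;
}

interface RunResult {
  output: string;
  score: number; // critic score on a 1-10 scale
  critique: string;
}

class InMemoryStore {
  private lessons: Lesson[] = [];
  async fetchRelevant(query: { limit: number }): Promise<Lesson[]> {
    return this.lessons.slice(-query.limit);
  }
  async save(lesson: Lesson): Promise<void> {
    this.lessons.push(lesson);
  }
}

// Render stored lessons as a block of prompt context.
function renderLessons(lessons: Lesson[]): string {
  if (lessons.length === 0) return "";
  const lines = lessons.map((l) => `- [${l.kind}] ${l.note}`);
  return ["Lessons from earlier runs:", ...lines].join("\n");
}

type Runner = (topic: string, context: string) => Promise<RunResult>;

async function closedLoopTurn(
  topic: string,
  runner: Runner,
  store: InMemoryStore,
): Promise<RunResult> {
  const lessons = await store.fetchRelevant({ limit: 5 }); // 1. fetch
  const context = renderLessons(lessons); // 2. render
  const result = await runner(topic, context); // 3. run
  // 4. persist: low scores as mistakes, high scores as patterns
  if (result.score < 5) {
    await store.save({ kind: "mistake", note: result.critique });
  } else if (result.score >= 8) {
    await store.save({ kind: "pattern", note: result.critique });
  }
  return result; // 5. the next turn fetches what was saved here
}
```

With this shape, the first turn receives an empty context and the second turn receives the first turn's critique, provided the score fell outside the middle band.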
These are actual metrics from our development — not synthetic benchmarks.
Agents: writer, researcher, marketing, blog-writer
Total Runs: 300+
Success Rate: 100%
Writer: 7.2/10 (120 runs across tech, webdesign, market)
Marketing: 7.8/10 (70 runs across LinkedIn, Instagram)
Researcher: 7.6/10 (50+ runs, web research with citations)
platform-compliance ████████░░ 8/10
scroll-stopping ████████░░ 8/10
conversion-intent ████████░░ 8/10
| Feature | Darwin | EvoAgentX | DSPy | CrewAI | AutoGen |
|---|---|---|---|---|---|
| Self-evolving prompts | Yes | Yes | Yes (compiler) | No | No |
| A/B testing | Yes | No | No | No | No |
| Safety gate + rollback | Yes | No | No | No | No |
| TypeScript native | Yes | No (Python) | No (Python) | No (Python) | No (Python) |
| Zero-config first agent | Yes | No | No | No | Partial |
| MCP native | Yes | No | No | No | No |
| File-based (no DB required) | Yes | No | No | No | No |
| Built-in Critic agent | Yes | No | No | No | No |
darwin/
├── src/
│ ├── core/ # Agent runner, config, MCP handling
│ ├── memory/ # SQLite storage (experiments, prompts, learnings)
│ ├── evolution/ # Darwin loop, A/B testing, safety gate, patterns
│ ├── agents/ # Built-in agents (writer, researcher, critic, analyst)
│ └── cli/ # CLI commands (run, status, evolve, create)
Darwin uses SQLite by default — zero config, single file, no database to install.
.darwin/
├── darwin.db # All experiments, prompts, learnings
└── reports/ # Markdown reports per run
    ├── exp-writer-2026-03-08-001.md
    └── exp-researcher-2026-03-08-002.md
Want semantic search, cross-agent learnings, and analytics? Upgrade to Darwin Pro for PostgreSQL + pgvector support.
darwin run <agent> "task" # Run an agent
darwin run writer "Hello" --task-type tech # With task categorization
darwin run analyst --path ./src # Analyze a codebase
darwin status # Overview of all agents
darwin status writer # Detailed agent stats + evolution history
darwin evolve writer --enable # Enable self-evolution
darwin evolve writer --reset # Reset to v1
darwin create my-agent # Scaffold a new agent
The free version uses SQLite — great for getting started, handles thousands of experiments.
For teams and production use, Darwin Pro adds:
| Feature | Free (SQLite) | Pro (PostgreSQL) |
|---|---|---|
| Experiment tracking | ✓ | ✓ |
| Prompt versioning | ✓ | ✓ |
| A/B testing | ✓ | ✓ |
| Safety gate | ✓ | ✓ |
| Keyword search | ✓ | ✓ |
| Semantic search (pgvector) | — | ✓ |
| Cross-agent learnings | — | ✓ |
| Analytics & time series | — | ✓ |
| Contradiction detection | — | ✓ |
| Team support (multi-user) | — | ✓ |
| Data export (CSV/JSON) | — | ✓ |
| Learning decay | — | ✓ |
Coming soon. Follow the repo for updates.
What do I need to run Darwin?
Node.js 20+ and one of: Claude CLI (default provider), ANTHROPIC_API_KEY, OPENAI_API_KEY, or a local Ollama instance. For storage, install better-sqlite3 (default) or use PostgreSQL via DARWIN_POSTGRES_URL.
Does Darwin work with models other than Claude?
Yes! Darwin supports multiple providers: Claude CLI (default), Anthropic API, OpenAI/compatible APIs, and Ollama (local). Set provider in your config or use DARWIN_PROVIDER env var.
How many runs until I see improvement?
Around 10 runs. The first 5 establish a baseline, then Darwin generates a variant and A/B tests it over the next 5 runs.
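Is my data safe?
Darwin's own data stays local: a SQLite file on your disk, no telemetry, no cloud storage. Prompts and outputs still go to whichever model provider you configure, so use a local Ollama instance if nothing should leave your machine.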
Can I use this for non-English tasks?
Yes. Agents detect language automatically. Darwin's evaluation is language-agnostic.
What if Darwin makes my agent worse?
The safety gate prevents regressions. If a new variant scores >20% lower, Darwin automatically rolls back to the last known-good version.
computeDynamicMinRuns() adjusts sample sizes based on variance, but p-values are on the roadmap.
PRs welcome. See CONTRIBUTING.md.
StudioMeyer is an AI and design studio based in Palma de Mallorca, working with clients worldwide. We build custom websites and AI infrastructure for small and medium businesses. Production stack on Claude Agent SDK, MCP and n8n, with Sentry, Langfuse and LangGraph for observability and an in-house guard layer.
MIT — use freely, commercially or personally.