Stress-test your decisions before you commit. An MCP server that runs adversarial AI debates between frontier models, grounded in live web search.

Most AI tools optimize for consensus. Debate MCP optimizes for finding where your plan breaks.

How It Works

You describe your plan
        |
        v
  [Web Search] -- gathers current facts, laws, regulations
        |
        v
  +-----------+          +-----------+
  |  SKEPTIC  |          | STEELMAN  |
  |  (GPT)    |          | (Gemini)  |
  |           |          |           |
  | Attacks   |          | Finds the |
  | your plan |          | strongest |
  | ruthlessly|          | version,  |
  |           |          | then      |
  |           |          | stress-   |
  |           |          | tests it  |
  +-----------+          +-----------+
        |    Round 2: they     |
        |    read each other   |
        |    (anonymized) and  |
        +--- argue back -------+
                  |
                  v
        [Structured synthesis]
        Recommendation + Crux +
        What Would Falsify +
        Unresolved disagreements
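
The flow above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not debate-mcp's actual source: `runDebate`, `DebateDeps`, `DebateTranscript`, and the prompt wording are all hypothetical, and the search and model calls are injected as plain functions so the sketch runs without API keys.

```typescript
// Hypothetical sketch: none of these names are taken from debate-mcp's source.
type Analyst = (prompt: string) => Promise<string>;

interface DebateDeps {
  search: (query: string) => Promise<string[]>; // evidence snippets from web search
  skeptic: Analyst;  // e.g. backed by an OpenAI-compatible API
  steelman: Analyst; // e.g. backed by Gemini
}

interface DebateTranscript {
  evidence: string[];
  round1: { skeptic: string; steelman: string };
  round2: { skeptic: string; steelman: string };
}

async function runDebate(plan: string, deps: DebateDeps): Promise<DebateTranscript> {
  // Step 1: gather evidence before either model sees the plan.
  const evidence = await deps.search(plan);
  const brief = `PLAN:\n${plan}\n\nVERIFIED evidence:\n${evidence.join("\n")}`;

  // Step 2 (Round 1): the two roles argue independently, in parallel.
  const [skeptic1, steelman1] = await Promise.all([
    deps.skeptic(`${brief}\n\nAttack this plan.`),
    deps.steelman(`${brief}\n\nBuild the strongest version of this plan, then stress-test it.`),
  ]);

  // Step 3 (Round 2): each role reads the other's argument under a neutral
  // label, with the original brief restated so the thread does not drift.
  const rebuttal = (other: string): string =>
    `${brief}\n\nAnother analyst wrote:\n${other}\n\nConcede what holds and rebut what does not.`;
  const [skeptic2, steelman2] = await Promise.all([
    deps.skeptic(rebuttal(steelman1)),
    deps.steelman(rebuttal(skeptic1)),
  ]);

  return {
    evidence,
    round1: { skeptic: skeptic1, steelman: steelman1 },
    round2: { skeptic: skeptic2, steelman: steelman2 },
  };
}
```

The sketch stops at the transcript, on the assumption that the calling model (for example Claude) writes the structured synthesis from it.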

Quick Start

1. Install

npx debate-mcp

2. Add to Claude Code

claude mcp add debate npx debate-mcp \
  -e OPENAI_API_KEY=sk-... \
  -e GEMINI_API_KEY=AI...

3. Use it

Just tell Claude: "debate this", "what am I missing", "stress-test this plan", or "is this the right call".

[!TIP] For a more targeted debate, state a `domain` and your `current_leaning` in the request: "Debate this as a tax attorney. I'm leaning toward electing S-Corp."

What Makes This Different

Feature	Why it matters
Asymmetric roles	One model attacks (Skeptic), one defends then stress-tests (Steelman). Research on multi-agent debate reports that this outperforms giving both models the same prompt ("Peacemaker or Troublemaker: How Sycophancy Shapes Multi-Agent Debate", 2025).
Anonymized cross-examination	In Round 2, each model sees the other's work labeled "another analyst" to prevent identity bias. Based on "When Identity Skews Debate" (NeurIPS 2025).
Web search grounding	Before the debate, the server searches for current facts, laws, and regulations. Both models receive this as VERIFIED evidence and must flag ungrounded claims as UNVERIFIED.
Confirmation bias attack	Tell it what you're leaning toward. The Skeptic will specifically attack that leaning.
Domain expertise	Pass `domain: "tax attorney"` or `"systems architect"` to make both analysts domain-specific.
Constrained synthesis	The synthesis must follow a fixed structure: Recommendation, Crux of Disagreement, What Would Falsify, Risk of Acting vs Waiting. This prevents the AI from smoothing real disagreements into false consensus.
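
One way to enforce a constrained synthesis is to check the final text for every required section before accepting it. The sketch below is illustrative only: the section names come from the table row above, while `REQUIRED_SECTIONS` and `missingSections` are hypothetical helpers, not part of debate-mcp.

```typescript
// Hypothetical helper: section names are taken from the README table,
// the checking logic is an assumption about how enforcement could work.
const REQUIRED_SECTIONS = [
  "Recommendation",
  "Crux of Disagreement",
  "What Would Falsify",
  "Risk of Acting vs Waiting",
] as const;

// Returns the required section names that do not appear in the synthesis
// (case-insensitive substring match). An empty array means all are present.
function missingSections(synthesis: string): string[] {
  const lower = synthesis.toLowerCase();
  return REQUIRED_SECTIONS.filter((name) => !lower.includes(name.toLowerCase()));
}

const draft = [
  "Recommendation: do not elect.",
  "Crux of Disagreement: whether compliance costs exceed the savings.",
  "What Would Falsify: a materially higher net profit.",
].join("\n");

// The draft omits the fourth section, so the caller could ask for a rewrite.
const gaps = missingSections(draft);
```

A caller could reject or regenerate any synthesis for which `missingSections` returns a non-empty list.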

Example

Input: "Should we elect S-Corp status? Net profit $40K, based in NYC."
Domain: tax attorney
Current leaning: "I think S-Corp will save on self-employment tax"

What happens:

1. Web search pulls current NYC tax rates, QBI rules, and IRS thresholds.
2. Skeptic leads with: "At $40K net profit in NYC, S-Corp election is mathematically guaranteed to lose you money" and explains exactly why.
3. Steelman finds the strongest case for S-Corp, then stress-tests it against NYC-specific tax penalties.
4. Cross-examination: Skeptic concedes the QBI interaction point; Steelman concedes that compliance costs erase the savings.
5. Synthesis: Don't elect. Here's the specific profit threshold where it flips.

Configuration

Environment Variables

Required (at minimum):

Variable	Description
`OPENAI_API_KEY`	API key for the Skeptic model (OpenAI by default)
`GEMINI_API_KEY`	API key for the Steelman model (Gemini by default)

Model configuration:

Variable	Default	Description
`SKEPTIC_MODEL`	`gpt-5.4`	Model for the Skeptic role
`SKEPTIC_BASE_URL`	OpenAI default	Base URL for the Skeptic API (change to use Grok, Groq, Mistral, etc.)
`STEELMAN_MODEL`	`gemini-3.1-pro-preview`	Model for the Steelman role
`STEELMAN_PROVIDER`	`gemini`	Set to `openai` to use any OpenAI-compatible API for Steelman
`STEELMAN_BASE_URL`	-	Base URL when using `STEELMAN_PROVIDER=openai`
`STEELMAN_API_KEY`	Falls back to `GEMINI_API_KEY`	API key when using `STEELMAN_PROVIDER=openai`
`CALL_TIMEOUT_MS`	`90000`	Timeout per API call (ms)
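
The defaults and fallbacks in the tables above can be read as a small resolution function. The sketch below is an assumption about how such resolution might look, not the server's actual code: `loadConfig`, `DebateConfig`, and the error messages are hypothetical, while the variable names and default values are those listed above. The environment is passed in as a plain object (in Node you would pass `process.env`).

```typescript
// Hypothetical sketch of env-var resolution; only the variable names and
// defaults come from the README tables.
type Env = Record<string, string | undefined>;

interface DebateConfig {
  skepticModel: string;
  skepticBaseUrl: string | undefined; // undefined means "use the OpenAI default"
  skepticApiKey: string;
  steelmanModel: string;
  steelmanProvider: "gemini" | "openai";
  steelmanBaseUrl: string | undefined;
  steelmanApiKey: string;
  callTimeoutMs: number;
}

function loadConfig(env: Env): DebateConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  };

  const steelmanProvider: "gemini" | "openai" =
    env["STEELMAN_PROVIDER"] === "openai" ? "openai" : "gemini";

  const callTimeoutMs = Number(env["CALL_TIMEOUT_MS"] || "90000");
  if (!Number.isFinite(callTimeoutMs) || callTimeoutMs <= 0) {
    throw new Error("CALL_TIMEOUT_MS must be a positive number of milliseconds");
  }

  return {
    skepticModel: env["SKEPTIC_MODEL"] || "gpt-5.4",
    skepticBaseUrl: env["SKEPTIC_BASE_URL"],
    skepticApiKey: required("OPENAI_API_KEY"),
    steelmanModel: env["STEELMAN_MODEL"] || "gemini-3.1-pro-preview",
    steelmanProvider,
    steelmanBaseUrl: env["STEELMAN_BASE_URL"],
    // STEELMAN_API_KEY only applies to the OpenAI-compatible provider and
    // falls back to GEMINI_API_KEY when unset.
    steelmanApiKey:
      steelmanProvider === "openai"
        ? env["STEELMAN_API_KEY"] || required("GEMINI_API_KEY")
        : required("GEMINI_API_KEY"),
    callTimeoutMs,
  };
}
```

Failing fast on a missing key or a malformed timeout keeps configuration errors out of the debate itself, where they would otherwise surface as opaque API failures.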

Use Any Model Provider

The Skeptic role works with any OpenAI-compatible API out of the box. Just change the base URL:

# Grok (xAI)
SKEPTIC_BASE_URL=https://api.x.ai/v1 SKEPTIC_MODEL=grok-3 OPENAI_API_KEY=xai-...

# Groq
SKEPTIC_BASE_URL=https://api.groq.com/openai/v1 SKEPTIC_MODEL=llama-4-scout OPENAI_API_KEY=gsk_...

# Ollama (local, free)
SKEPTIC_BASE_URL=http://localhost:11434/v1 SKEPTIC_MODEL=llama3 OPENAI_API_KEY=ollama

# Mistral
SKEPTIC_BASE_URL=https://api.mistral.ai/v1 SKEPTIC_MODEL=mistral-large OPENAI_API_KEY=...

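The Steelman role uses Gemini by default (for Google Search grounding). To use a different provider, set `STEELMAN_PROVIDER=openai`, point `STEELMAN_BASE_URL` at the provider's OpenAI-compatible endpoint, and set `STEELMAN_API_KEY` if the key differs from `GEMINI_API_KEY`.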

MCP Configuration (`.mcp.json`)

{
  "mcpServers": {
    "debate": {
      "command": "npx",
      "args": ["-y", "debate-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "GEMINI_API_KEY": "AI..."
      }
    }
  }
}

[!NOTE] Bring your own API keys. Debate MCP calls OpenAI and Google APIs directly. You are responsible for your own API usage and costs. A typical debate uses ~20,000-30,000 tokens across both providers.

Tool Parameters

Parameter	Required	Description
`context`	Yes	The plan, decision, or situation to debate. Include all relevant details.
`question`	No	Specific question to focus the debate on.
`domain`	No	Domain expertise: `"tax attorney"`, `"systems architect"`, `"financial advisor"`, etc.
`current_leaning`	No	What you're leaning toward. The Skeptic attacks this to counter confirmation bias.

The Research Behind It

Debate MCP's design is based on peer-reviewed research on multi-agent debate:

Asymmetric roles outperform identical prompts ("Peacemaker or Troublemaker: How Sycophancy Shapes Multi-Agent Debate", 2025)
Anonymized cross-examination prevents identity bias ("When Identity Skews Debate", NeurIPS 2025)
Steelmanning before disagreeing forces genuine engagement (Kahneman's Adversarial Collaboration framework)
Re-stating the original question each round prevents context drift ("Talk Isn't Always Cheap", ICML 2025)
Caller-model synthesis avoids positional commitment bias from debaters ("Auditing Multi-Agent LLM Reasoning Trees", 2025)
Ray Dalio's triangulation method: get independent expert opinions, map convergence and divergence, then decide

When To Use It

Good for: Taxes, legal decisions, financial planning, business strategy, architecture choices, investment analysis, contract terms, hiring decisions, production deployments.

Not for: Simple coding tasks, quick lookups, routine bug fixes, or questions with obvious answers.

License

MIT

Package last updated on 11 Apr 2026