Latest version on npm: 3.1.0 (1 maintainer)

selenium-ai-agent

AI-powered Selenium MCP server for browser automation — 75 tools with accessibility tree discovery, selector teaching, BiDi cross-browser support, Selenium Grid parallel execution, test generation & self-healing pipeline, and session tracing.

Install

npm install -g selenium-ai-agent

Or run directly without installing:

npx selenium-ai-agent

Requirements

  • Node.js 18+
  • Chrome browser (or Firefox/Edge)
  • ChromeDriver is automatically managed by selenium-webdriver

Quick Start

Add to your MCP client config:

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"]
    }
  }
}

Then ask your AI assistant: "Navigate to https://example.com and take a screenshot"

Client Setup

Claude Code

claude mcp add selenium-mcp -- npx selenium-ai-agent

Or add to your project .mcp.json:

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"]
    }
  }
}

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"]
    }
  }
}

Config paths per OS:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

Cursor

Add to .cursor/mcp.json (project) or ~/.cursor/mcp.json (global):

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"]
    }
  }
}

GitHub Copilot (VS Code 1.99+)

Add to .vscode/mcp.json:

{
  "servers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"],
      "type": "stdio"
    }
  }
}

Note: Copilot uses "servers" instead of "mcpServers".

Cline

Open the MCP Servers panel in Cline, click Configure, then Advanced MCP Settings, and add:

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"]
    }
  }
}

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json (global) or .windsurf/mcp_config.json (project):

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"]
    }
  }
}

Tool Auto-Approval (Reducing "Yes" Prompts)

All tools include MCP annotations (readOnlyHint, destructiveHint, etc.) that help clients auto-approve safe tools. Read-only tools like capture_page, recording_status, and grid_status are marked as non-destructive and can be auto-approved by clients that support annotations.
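
To make the annotation mechanism concrete, here is a minimal client-side sketch of how an MCP client could decide whether to auto-approve a call. It is not the policy of any client listed below. It relies only on the annotation fields defined in the MCP specification (readOnlyHint, destructiveHint, idempotentHint, openWorldHint); the ToolInfo shape and the approvalFor function are hypothetical, and the two example tools' annotation values are illustrative.

```typescript
// Annotation hints defined by the MCP specification (all optional).
interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

// Hypothetical view of a tool as a client sees it after tools/list.
interface ToolInfo {
  name: string;
  annotations?: ToolAnnotations;
}

type Approval = "auto" | "prompt";

// Hypothetical policy: auto-approve only tools that explicitly declare
// readOnlyHint: true, and only for servers the user trusts, because
// annotations are self-reported hints rather than guarantees.
function approvalFor(tool: ToolInfo, serverTrusted: boolean): Approval {
  if (!serverTrusted) return "prompt";
  // readOnlyHint is treated as false when absent (the spec's default).
  return tool.annotations?.readOnlyHint === true ? "auto" : "prompt";
}

// Illustrative annotation values, not taken from the server.
const tools: ToolInfo[] = [
  { name: "capture_page", annotations: { readOnlyHint: true } },
  { name: "click_element", annotations: { readOnlyHint: false } },
];
for (const t of tools) {
  console.log(`${t.name}: ${approvalFor(t, true)}`);
}
```

Requiring an explicit true keeps unannotated tools behind a prompt, which is the conservative choice.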

Claude Desktop

When Claude Desktop asks to run a tool, choose "Always allow" to stop future prompts for that tool. Tools marked readOnlyHint: true may be auto-approved by the client.

Claude Code

Use --allow-mcp selenium-mcp to pre-approve all tools from this server:

claude --allow-mcp selenium-mcp

Or configure in .claude/settings.json:

{
  "permissions": {
    "allow": ["mcp__selenium-mcp__*"]
  }
}

Cursor / Cline / Windsurf

These clients typically allow you to configure auto-approval per tool or per server in their settings. Check your client's MCP settings for "auto-approve" or "always allow" options.

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| SELENIUM_GRID_URL | (none) | Grid hub URL (enables parallel features) |
| SELENIUM_BROWSER | chrome | Browser to use (chrome, firefox, edge) |
| SELENIUM_HEADLESS | false | Run browser in headless mode |
| SELENIUM_STEALTH | false | Enable stealth mode (hide automation indicators) |
| SELENIUM_MCP_OUTPUT_MODE | stdout | Output mode: stdout (return data to the LLM) or file (save to disk) |
| SELENIUM_MCP_OUTPUT_DIR | auto | Output directory for generated files (auto-detected from the project root) |
| SELENIUM_MCP_SAVE_TRACE | false | Save session trace JSON to <output>/traces/ |
| SELENIUM_MCP_UNRESTRICTED_FILES | false | Bypass workspace path validation (allow writing outside the output directory) |
| SE_AVOID_STATS | (none) | Set to true to disable Selenium usage statistics |

Pass env vars in your MCP config:

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"],
      "env": {
        "SELENIUM_HEADLESS": "true",
        "SELENIUM_STEALTH": "true",
        "SE_AVOID_STATS": "true"
      }
    }
  }
}

CLI Flags

npx selenium-ai-agent [flags]

| Flag | Description |
| --- | --- |
| --stealth | Enable stealth mode |
| --headless | Run browser headless |
| --save-trace | Save session trace JSON |
| --output-mode=<mode> | Set output mode: stdout or file |
| --output-dir=<path> | Set output directory |
| --grid-url=<url> | Set Selenium Grid hub URL |
| --allow-unrestricted-file-access | Bypass workspace file path validation |
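
The environment variables and CLI flags above describe largely the same settings. The sketch below shows one way a server could merge them into a single config object. It is illustrative only: ServerConfig and resolveConfig are hypothetical names, and the rule that CLI flags override environment variables is an assumption, not documented behavior.

```typescript
type OutputMode = "stdout" | "file";
type BrowserName = "chrome" | "firefox" | "edge";

// Hypothetical resolved configuration.
interface ServerConfig {
  gridUrl?: string;
  browser: BrowserName;
  headless: boolean;
  stealth: boolean;
  outputMode: OutputMode;
  outputDir?: string; // undefined = auto-detect from the project root
  saveTrace: boolean;
  unrestrictedFiles: boolean;
}

type Env = Record<string, string | undefined>;

const BROWSERS: readonly BrowserName[] = ["chrome", "firefox", "edge"];
const isTrue = (v: string | undefined): boolean => v?.toLowerCase() === "true";

function resolveConfig(env: Env, argv: string[]): ServerConfig {
  // 1. Environment variables, falling back to the documented defaults.
  const config: ServerConfig = {
    gridUrl: env.SELENIUM_GRID_URL || undefined,
    browser: BROWSERS.find((b) => b === env.SELENIUM_BROWSER?.toLowerCase()) ?? "chrome",
    headless: isTrue(env.SELENIUM_HEADLESS),
    stealth: isTrue(env.SELENIUM_STEALTH),
    outputMode: env.SELENIUM_MCP_OUTPUT_MODE === "file" ? "file" : "stdout",
    outputDir: env.SELENIUM_MCP_OUTPUT_DIR || undefined,
    saveTrace: isTrue(env.SELENIUM_MCP_SAVE_TRACE),
    unrestrictedFiles: isTrue(env.SELENIUM_MCP_UNRESTRICTED_FILES),
  };

  // 2. CLI flags, applied afterwards so they take precedence (assumption).
  for (const arg of argv) {
    const eq = arg.indexOf("=");
    const flag = eq === -1 ? arg : arg.slice(0, eq);
    const value = eq === -1 ? undefined : arg.slice(eq + 1);
    switch (flag) {
      case "--stealth": config.stealth = true; break;
      case "--headless": config.headless = true; break;
      case "--save-trace": config.saveTrace = true; break;
      case "--allow-unrestricted-file-access": config.unrestrictedFiles = true; break;
      case "--output-mode":
        if (value === "stdout" || value === "file") config.outputMode = value;
        break;
      case "--output-dir":
        if (value) config.outputDir = value;
        break;
      case "--grid-url":
        if (value) config.gridUrl = value;
        break;
    }
  }
  return config;
}
```

In Node.js this would be called as resolveConfig(process.env, process.argv.slice(2)); passing the environment in explicitly keeps the function easy to test.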

Tools (75)

Navigation (5)

| Tool | Description |
| --- | --- |
| navigate_to | Navigate the browser to a URL. Starts browser automatically if not running. |
| go_back | Navigate back in browser history. |
| go_forward | Navigate forward in browser history. |
| refresh_page | Refresh the current page. |
| scroll_page | Scroll the page in a direction (up/down/left/right) by pixel amount, or scroll a specific element into view by CSS selector. |

Page Analysis (2)

| Tool | Description |
| --- | --- |
| capture_page | Capture the current page state as an accessibility tree: returns elements with ARIA roles, semantic hierarchy, and refs (e1, e2, ...). Discovers up to 300 elements with visibility-aware selectors, ancestor scoping, and Shadow DOM traversal. Read-only. |
| take_screenshot | Take a screenshot (viewport, full-page, or element). Uses BiDi when available for full-page/element screenshots, falls back to classic API. Params: origin (viewport/document), ref (element), format (png/jpeg), quality. |

Elements (5)

| Tool | Description |
| --- | --- |
| click_element | Click an element using its ref from the page snapshot. |
| hover_element | Hover over an element using its ref. |
| select_option | Select a dropdown option by value, text, or index. |
| drag_drop | Drag from one element to another using refs. |
| teach_selector | Teach the system a preferred CSS selector for an element. Saved as Phase 0 (highest priority) in future element discovery on that domain. Auto-scopes to site-wide for header/nav/footer elements, or path-specific for content. |

Input (3)

| Tool | Description |
| --- | --- |
| input_text | Type text into an input field or textarea. |
| key_press | Press a keyboard key, optionally with modifiers (ctrl, alt, shift, meta). |
| file_upload | Upload a file through a file input element. |

Mouse (3)

| Tool | Description |
| --- | --- |
| mouse_move | Move mouse to specific coordinates. |
| mouse_click | Click at coordinates with specified button (left, right, middle). |
| mouse_drag | Drag from one position to another. |

Tabs (4)

| Tool | Description |
| --- | --- |
| tab_list | List all open browser tabs with titles and URLs. Read-only. |
| tab_select | Switch to a specific browser tab. |
| tab_new | Open a new browser tab, optionally navigating to a URL. |
| tab_close | Close a specific browser tab. |

Verification (4)

| Tool | Description |
| --- | --- |
| verify_element_visible | Verify that an element is visible on the page (with timeout). Read-only. |
| verify_text_visible | Verify that specific text is visible on the page (with timeout). Read-only. |
| verify_value | Verify that an input element has the expected value. Read-only. |
| verify_list_visible | Verify that multiple text items are all visible on the page. Read-only. |

Browser (7)

| Tool | Description |
| --- | --- |
| wait_for | Wait for a condition: element visible, clickable, present, URL contains, or title contains. |
| execute_javascript | Execute JavaScript code in the browser context with optional return value. |
| resize_window | Resize the browser window to specified dimensions. |
| dialog_handle | Handle browser dialogs (alert, confirm, prompt). |
| console_logs | Get or clear browser console logs. Uses BiDi event collector when available for cross-browser support, falls back to classic log API. |
| network_monitor | Monitor network requests: get requests, clear, or toggle offline mode. |
| pdf_generate | Generate a PDF from the current page. Uses BiDi printPage for cross-browser support (Chrome, Firefox, Edge), falls back to CDP. Params: format, landscape, scale, pageRanges. Optional filePath; omit it to return the PDF as a base64 resource. |

Session (3)

| Tool | Description |
| --- | --- |
| close_browser | Close the browser and end the session. |
| reset_session | Reset the browser session (close and restart). |
| set_stealth_mode | Enable or disable stealth mode, which hides navigator.webdriver, patches plugins, and sets realistic languages. |

Recording (4)

| Tool | Description |
| --- | --- |
| start_recording | Start recording browser actions for test script generation. |
| stop_recording | Stop recording and return full action log with element locators and framework hint. |
| recording_status | Check if recording is active and show recent actions. Read-only. |
| clear_recording | Clear all recorded browser actions. |

Test Planner (3)

| Tool | Description |
| --- | --- |
| planner_setup_page | Initialize test planning: navigate to app and start exploring. |
| planner_explore_page | Explore a page in detail, discovering elements, forms, and links. |
| planner_save_plan | Save completed test plan to a markdown file. |

Test Generator (6)

| Tool | Description |
| --- | --- |
| generator_setup_page | Initialize test generation session: navigate to app, start recording, set framework. |
| generator_read_log | Retrieve the action log from the recording session. Read-only. |
| generator_write_test | Save generated test code and update .test-manifest.json. Supports verify (validates selectors against live page) and specFile (links to spec). |
| generator_write_seed | Write a seed/bootstrap test (auth, fixtures, env setup) and register in manifest under seedTests[]. |
| generator_save_spec | Save a structured markdown spec to <output>/specs/. |
| generator_read_spec | Read a spec file. Read-only. |

Test Healer (5)

| Tool | Description |
| --- | --- |
| healer_run_tests | Execute tests and return output. Supports manifest mode (reads .test-manifest.json) or explicit mode (provide command + args). Runs seed tests first when present. |
| healer_debug_test | Run a single test in verbose mode with detailed output (up to 15 KB of stdout and 8 KB of stderr). |
| healer_fix_test | Apply a fix to a test file with .bak backup. Supports verify (validates selectors in fixed code). |
| healer_inspect_page | Inspect the current page against expected locators: reports found, missing, and changed elements with suggested updated locators. Use after a test failure to understand UI drift. |
| browser_generate_locator | Generate a robust locator strategy for an element by description. Read-only. |

Regression Analyzer (6)

| Tool | Description |
| --- | --- |
| analyzer_setup | Initialize regression analysis session with product URL and business context. |
| analyzer_import_context | Import additional context from files, inline text, or URLs. |
| analyzer_scan_product | Explore the product using process walking and page scanning. |
| analyzer_build_risk_profile | Build risk profile from discovered features and context. Read-only. |
| analyzer_save_profile | Save risk profile to YAML or JSON file. |
| analyzer_generate_documentation | Generate product discovery documentation with screenshots. |

Batch (1)

| Tool | Description |
| --- | --- |
| batch_execute | Execute up to 20 tool steps in a single round trip. Intermediate steps skip snapshots for speed. |
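
The batching rule (at most 20 steps per call, snapshots skipped on intermediate steps) can be sketched as a small planning function. The BatchStep shape, the field names tool and args, the example arguments, and the choice to snapshot only after the final step are assumptions for illustration.

```typescript
// Hypothetical shape of one step in a batch_execute request.
interface BatchStep {
  tool: string;
  args: Record<string, unknown>;
}

interface PlannedStep extends BatchStep {
  includeSnapshot: boolean;
}

const MAX_BATCH_STEPS = 20;

// Validate the batch size and decide which steps capture a snapshot.
function planBatch(steps: BatchStep[]): PlannedStep[] {
  if (steps.length === 0) {
    throw new Error("batch must contain at least one step");
  }
  if (steps.length > MAX_BATCH_STEPS) {
    throw new Error(`batch has ${steps.length} steps; the limit is ${MAX_BATCH_STEPS}`);
  }
  // Intermediate steps skip the comparatively expensive page snapshot;
  // only the final step reports the resulting page state.
  return steps.map((step, i) => ({ ...step, includeSnapshot: i === steps.length - 1 }));
}

// Illustrative arguments; the real parameter names may differ.
const plan = planBatch([
  { tool: "navigate_to", args: { url: "https://example.com/login" } },
  { tool: "input_text", args: { ref: "e3", text: "user@example.com" } },
  { tool: "click_element", args: { ref: "e7" } },
]);
console.log(plan.map((s) => `${s.tool}:${s.includeSnapshot}`).join(" "));
```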

Grid Management (4)

| Tool | Description |
| --- | --- |
| grid_status | Check Grid status: nodes, browsers, capacity. Read-only. |
| grid_start | Start Selenium Grid via Docker Compose with configurable Chrome/Firefox node counts. |
| grid_stop | Stop Selenium Grid. |
| grid_scale | Scale Grid to desired number of browser nodes. |

Grid Sessions (5)

| Tool | Description |
| --- | --- |
| session_create | Create a new browser session on the Grid. |
| session_select | Select a grid session as active browser for all subsequent tool calls. |
| session_list | List all active Grid sessions, optionally filtered by tags. Read-only. |
| session_destroy | Destroy a specific Grid session. |
| session_destroy_all | Destroy all Grid sessions, optionally filtered by tags. |

Grid Parallel Execution (3)

| Tool | Description |
| --- | --- |
| parallel_explore | Explore multiple URLs in parallel; each target gets its own Grid session. |
| parallel_execute | Execute multiple task sequences in parallel across Grid sessions. |
| planner_generate_plan | Generate structured test plan from parallel exploration results. |

Grid Exploration Analysis (2)

| Tool | Description |
| --- | --- |
| exploration_merge | Merge multiple exploration results, deduplicate pages, build site map. Read-only. |
| exploration_diff | Compare two exploration results to find added, removed, and changed pages. Read-only. |
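
exploration_merge and exploration_diff both need a notion of "the same page". The sketch below keys pages by a normalized URL (fragment removed, trailing slash ignored) and compares title and element count to detect changes. The PageSummary fields, the normalization rules, and the "keep the richer capture" merge rule are assumptions, not the tools' documented behavior.

```typescript
// Hypothetical per-page summary produced by one exploration session.
interface PageSummary {
  url: string;
  title: string;
  elementCount: number;
}

interface ExplorationDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

// Treat "https://x.com/a/", "https://x.com/a" and "https://x.com/a#top" as one page.
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  u.hash = "";
  if (u.pathname.length > 1 && u.pathname.endsWith("/")) {
    u.pathname = u.pathname.slice(0, -1);
  }
  return u.toString();
}

// Merge results from several sessions, deduplicating by normalized URL.
function mergeExplorations(results: PageSummary[][]): Map<string, PageSummary> {
  const merged = new Map<string, PageSummary>();
  for (const result of results) {
    for (const page of result) {
      const key = normalizeUrl(page.url);
      const existing = merged.get(key);
      // When two sessions saw the same page, keep the richer capture.
      if (!existing || page.elementCount > existing.elementCount) {
        merged.set(key, { ...page, url: key });
      }
    }
  }
  return merged;
}

// Compare two explorations page by page.
function diffExplorations(before: PageSummary[], after: PageSummary[]): ExplorationDiff {
  const oldPages = mergeExplorations([before]);
  const newPages = mergeExplorations([after]);
  const diff: ExplorationDiff = { added: [], removed: [], changed: [] };
  for (const [url, page] of newPages) {
    const old = oldPages.get(url);
    if (!old) diff.added.push(url);
    else if (old.title !== page.title || old.elementCount !== page.elementCount) diff.changed.push(url);
  }
  for (const url of oldPages.keys()) {
    if (!newPages.has(url)) diff.removed.push(url);
  }
  return diff;
}
```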

Expectation System

Every tool accepts an optional expectation parameter to control what data is included in the response:

{
  "expectation": {
    "includeSnapshot": true,
    "includeConsole": true,
    "includeNetwork": true,
    "snapshotOptions": { "selector": "#main", "maxLength": 5000 },
    "consoleOptions": { "levels": ["error", "warn"], "maxMessages": 10 },
    "diffOptions": { "enabled": true, "format": "unified" }
  }
}

| Option | Description |
| --- | --- |
| includeSnapshot | Include page snapshot (element list) in the response |
| includeConsole | Include browser console logs |
| includeNetwork | Include network request summary (requires BiDi) |
| snapshotOptions.selector | CSS selector to scope element discovery |
| snapshotOptions.maxLength | Truncate snapshot text at this length |
| consoleOptions.levels | Filter by log level: error, warn, info, log |
| consoleOptions.maxMessages | Maximum number of console messages to include |
| diffOptions.enabled | Return only changes since last snapshot |
| diffOptions.format | Diff format: minimal or unified |

Each tool category has sensible defaults (e.g., navigation tools include snapshot, verification tools don't).
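
One way to read the defaulting rule: each field the caller sets wins, and anything left unset falls back to the tool category's default. A minimal sketch follows. CATEGORY_DEFAULTS only encodes the two examples given (navigation includes a snapshot, verification does not); the function names are hypothetical, and keeping the most recent messages when maxMessages truncates is an assumption.

```typescript
type LogLevel = "error" | "warn" | "info" | "log";

interface ConsoleMessage {
  level: LogLevel;
  text: string;
}

// Subset of the expectation parameter shown above.
interface Expectation {
  includeSnapshot?: boolean;
  includeConsole?: boolean;
  includeNetwork?: boolean;
  consoleOptions?: { levels?: LogLevel[]; maxMessages?: number };
}

// Hypothetical per-category defaults.
const CATEGORY_DEFAULTS: Record<string, Expectation> = {
  navigation: { includeSnapshot: true },
  verification: { includeSnapshot: false },
};

// Caller-supplied fields win; unset fields fall back to the category default.
function resolveExpectation(category: string, requested: Expectation = {}): Expectation {
  const base = CATEGORY_DEFAULTS[category] ?? {};
  return {
    includeSnapshot: requested.includeSnapshot ?? base.includeSnapshot ?? false,
    includeConsole: requested.includeConsole ?? base.includeConsole ?? false,
    includeNetwork: requested.includeNetwork ?? base.includeNetwork ?? false,
    consoleOptions: requested.consoleOptions ?? base.consoleOptions,
  };
}

// Apply consoleOptions.levels and consoleOptions.maxMessages.
function selectConsoleMessages(messages: ConsoleMessage[], exp: Expectation): ConsoleMessage[] {
  if (!exp.includeConsole) return [];
  const levels = exp.consoleOptions?.levels;
  const filtered = levels ? messages.filter((m) => levels.includes(m.level)) : messages;
  const max = exp.consoleOptions?.maxMessages;
  // Keep the newest messages when truncating (assumes oldest-first input).
  return max === undefined ? filtered : filtered.slice(Math.max(filtered.length - max, 0));
}
```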

BiDi Cross-Browser Features

The server uses the WebDriver BiDi protocol (always enabled) for cross-browser features that go beyond what the classic WebDriver API offers:

  • Full-page screenshots: take_screenshot with origin: "document" captures the entire scrollable page, not just the viewport
  • Element screenshots: take_screenshot with ref: "e5" captures a specific element
  • Cross-browser PDF: pdf_generate works on Chrome, Firefox, and Edge (previously Chrome-only via CDP)
  • Console events: console_logs uses BiDi LogInspector for real-time console events across supported browsers
  • Network monitoring: BiDi network events provide request/response tracking
  • Stealth mode: injects preload scripts via BiDi script.addPreloadScript to mask automation indicators

BiDi features degrade gracefully — if a browser doesn't support a specific BiDi feature, the tool falls back to the classic API.

Selector Teaching & Hints

The teach_selector tool lets you override auto-computed selectors with your own preferred CSS selectors. Taught selectors are persisted to <output>/selector-hints.json and loaded as Phase 0 (highest priority) during element discovery on matching pages.

How It Works

  • Call teach_selector with a description and CSS selector while on the page
  • The selector is validated in-browser (must match exactly 1 visible element)
  • Scope is auto-determined: header/nav/footer elements default to site-wide (*), content elements default to the current path pattern
  • On subsequent page snapshots, matching hints are loaded and used before any auto-computation

Example

teach_selector({
  description: "the NL language link",
  css: "a[href='/nl/']",
  scope: "*"  // optional — auto-determined if omitted
})

Hints file structure (selector-hints.json):

{
  "example.com": {
    "*": [
      { "css": "a[href='/nl/']", "tag": "a", "text": "NL" }
    ],
    "/blog/*": [
      { "css": "#post-title", "tag": "h1", "text": "My Post" }
    ]
  }
}
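
Given the file layout above, loading the hints for a page is a lookup by hostname followed by a match on each scope pattern. The sketch assumes simple glob semantics ("*" matches every path, a trailing "/*" matches everything under that prefix, anything else must match exactly) and a hypothetical rule for the auto-determined scope (first path segment plus "/*"). Neither rule is confirmed by the docs, and all function names are illustrative.

```typescript
interface SelectorHint {
  css: string;
  tag: string;
  text: string;
}

// domain -> scope pattern -> hints, mirroring selector-hints.json.
type HintsFile = Record<string, Record<string, SelectorHint[]>>;

// Assumed glob semantics for scope patterns.
function scopeMatches(scope: string, pathname: string): boolean {
  if (scope === "*") return true;
  if (scope.endsWith("/*")) return pathname.startsWith(scope.slice(0, -1)); // "/blog/*" -> "/blog/"
  return scope === pathname;
}

// Collect every hint whose domain and scope match the page URL.
function hintsForPage(hints: HintsFile, pageUrl: string): SelectorHint[] {
  const { hostname, pathname } = new URL(pageUrl);
  const byScope: Record<string, SelectorHint[]> = hints[hostname] ?? {};
  const result: SelectorHint[] = [];
  for (const [scope, list] of Object.entries(byScope)) {
    if (scopeMatches(scope, pathname)) result.push(...list);
  }
  return result;
}

// Hypothetical default-scope rule: header/nav/footer content is site-wide,
// anything else is scoped to the first path segment of the current page.
function defaultScope(landmarkAncestors: string[], pathname: string): string {
  if (landmarkAncestors.some((tag) => tag === "header" || tag === "nav" || tag === "footer")) {
    return "*";
  }
  const first = pathname.split("/").filter(Boolean)[0];
  return first ? `/${first}/*` : "/";
}
```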

Element Discovery

The server uses a 16-phase selector computation engine that produces human-readable, semantically meaningful CSS and XPath selectors for every discovered element.

Selector Priority (Phases 0–16)

| Phase | Strategy | Example |
| --- | --- | --- |
| 0 | Taught hints | User-taught a[href="/nl/"] |
| 1 | By ID | #login-form |
| 2 | By test ID | [data-testid="submit-btn"] |
| 2b | By descendant test ID | form:has([data-testid="email"]) |
| 3 | By role + name | button[aria-label="Close"] |
| 4 | By label | //label[normalize-space()='Email']//input |
| 5 | By placeholder | input[placeholder="Search..."] |
| 6 | By text | //a[normalize-space()='Sign In'] |
| 7 | By attribute | a[hreflang="nl"], img[alt="Logo"] |
| 9 | By ARIA role | [role="dialog"] |
| 10 | By state | dialog[open], [aria-expanded] |
| 11 | By table cell | #data-table > tbody > tr:nth-child(2) > td:nth-child(3) |
| 12 | By compound attrs | input[type="email"][name="user"] |
| 13 | By semantic class | button.primary-action |
| 14 | By position | #sidebar > ul > li:nth-of-type(3) |
| 15 | By text (loose) | //span[contains(normalize-space(),'Welcome')] |
| 16 | By positional index | (//button[normalize-space()='Save'])[2] |
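
The table is read in priority order: the first phase that yields a selector matching exactly one visible element wins. The sketch below shows that loop over a simplified, hypothetical in-memory element model (not the DOM) with only five of the phases. It does not escape quotes or special characters in values, which a real engine must do (for example with CSS.escape).

```typescript
// Simplified stand-in for a DOM element (hypothetical).
interface El {
  tag: string;
  visible: boolean;
  id?: string;
  testId?: string;
  ariaLabel?: string;
  placeholder?: string;
  text?: string;
}

interface Candidate {
  selector: string;
  matches: (other: El) => boolean; // which elements this selector would select
}

// A subset of the phases, in priority order.
const PHASES: Array<(el: El) => Candidate | null> = [
  // Phase 1: by ID
  (el) => (el.id ? { selector: `#${el.id}`, matches: (o) => o.id === el.id } : null),
  // Phase 2: by test ID
  (el) => (el.testId ? { selector: `[data-testid="${el.testId}"]`, matches: (o) => o.testId === el.testId } : null),
  // Phase 3: by role + accessible name (aria-label here)
  (el) => (el.ariaLabel
    ? { selector: `${el.tag}[aria-label="${el.ariaLabel}"]`, matches: (o) => o.tag === el.tag && o.ariaLabel === el.ariaLabel }
    : null),
  // Phase 5: by placeholder
  (el) => (el.placeholder
    ? { selector: `${el.tag}[placeholder="${el.placeholder}"]`, matches: (o) => o.tag === el.tag && o.placeholder === el.placeholder }
    : null),
  // Phase 6: by exact text (XPath)
  (el) => (el.text
    ? { selector: `//${el.tag}[normalize-space()='${el.text}']`, matches: (o) => o.tag === el.tag && o.text === el.text }
    : null),
];

// Return the first candidate that is unique among visible elements only,
// so hidden duplicates do not force a fallback to a weaker phase.
function computeSelector(target: El, page: El[]): string | null {
  const visible = page.filter((e) => e.visible);
  for (const phase of PHASES) {
    const candidate = phase(target);
    if (candidate && visible.filter(candidate.matches).length === 1) {
      return candidate.selector;
    }
  }
  // A full engine would continue with ancestor scoping and positional phases.
  return null;
}
```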

Key Capabilities

  • Visibility-aware — only visible elements are counted for uniqueness, preventing hidden duplicates from causing fallbacks to fragile selectors
  • Ancestor scoping — when a selector isn't unique globally, it's scoped to the nearest ancestor with an ID, test attribute, or landmark (nav[aria-label="Main"] a[href="/"])
  • Shadow DOM — traverses open shadow roots, scopes CSS within shadow boundaries, uses >>> notation for cross-boundary selectors
  • Two-pass discovery — semantic elements (links, buttons, headings) get refs first; generic elements with test attributes fill remaining budget
  • Non-semantic class filtering — auto-skips CSS-in-JS hashes, Tailwind utilities, and framework-generated classes
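
Non-semantic class filtering is a heuristic. The patterns below are an illustrative guess at what such a filter might look for (Emotion, styled-components, styled-jsx and CSS Modules hashes, plus common Tailwind utility shapes); they are not the server's actual rules, and any fixed list will have false positives and negatives.

```typescript
// Hashed class names emitted by CSS-in-JS and CSS Modules tooling.
const HASHED_CLASS_PATTERNS: RegExp[] = [
  /^css-(?=[a-z0-9]*\d)[a-z0-9]+$/i, // Emotion, e.g. "css-1q2w3e"
  /^sc-[A-Za-z0-9]+$/, // styled-components, e.g. "sc-bdVaJa"
  /^jsx-\d+$/, // styled-jsx, e.g. "jsx-2738391"
  /^[A-Za-z]+_[A-Za-z]+__[A-Za-z0-9_-]{5,}$/, // CSS Modules, e.g. "Button_primary__3xK9a"
];

// Common Tailwind utility shapes.
const UTILITY_CLASS_PATTERNS: RegExp[] = [
  /:/, // variants, e.g. "hover:bg-blue-600", "md:flex"
  /\[.*\]/, // arbitrary values, e.g. "w-[200px]"
  /^-?[mp][trblxy]?-(\d+(\.\d+)?|px|auto)$/, // spacing, e.g. "mt-2", "px-4"
  /^(text|bg|border)-[a-z]+(-\d{2,3})?$/, // typography/colors, e.g. "text-sm", "bg-blue-500"
];

const UTILITY_WORDS = new Set(["flex", "grid", "block", "inline", "hidden", "relative", "absolute", "fixed", "sticky"]);

function isNonSemanticClass(cls: string): boolean {
  return (
    UTILITY_WORDS.has(cls) ||
    HASHED_CLASS_PATTERNS.some((re) => re.test(cls)) ||
    UTILITY_CLASS_PATTERNS.some((re) => re.test(cls))
  );
}

// Keep only classes worth using in a selector such as button.primary-action.
function semanticClasses(classList: string[]): string[] {
  return classList.filter((cls) => cls.length > 0 && !isNonSemanticClass(cls));
}
```

BEM names such as card__title survive the filter because the CSS Modules pattern requires a block_modifier prefix before the double underscore.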

Test Generation & Healing Pipeline

The generator and healer tools form a complete test automation pipeline:

1. Plan

planner_setup_page → planner_explore_page → planner_save_plan

2. Record & Generate

generator_setup_page → [interact with app] → stop_recording → generator_write_test
  • Recording captures actions with element locators (id, name, text, aria-label)
  • generator_write_test validates selectors against the live page before saving
  • A .test-manifest.json is created alongside tests with framework, run command, and test list
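
To show how a recorded action log can map onto test code, here is a minimal generator targeting the selenium-webdriver JavaScript API (driver.get, driver.findElement, By.id, By.name, By.css, By.xpath, click and sendKeys all exist in that library). The RecordedAction shape and the locator preference order (id, then name, then aria-label, then text) are assumptions for illustration; the server's real output format and framework handling are not documented here.

```typescript
// Hypothetical shape of one entry in the recorded action log.
interface Locator {
  id?: string;
  name?: string;
  ariaLabel?: string;
  text?: string;
}

interface RecordedAction {
  type: "navigate" | "click" | "type";
  url?: string;
  value?: string;
  locator?: Locator;
}

// JSON.stringify yields a valid, escaped JavaScript string literal.
const lit = (s: string): string => JSON.stringify(s);

// Prefer stable attributes first (assumed order). Values are not escaped
// for CSS/XPath quoting, which a production generator would need to handle.
function byExpression(loc: Locator): string {
  if (loc.id) return `By.id(${lit(loc.id)})`;
  if (loc.name) return `By.name(${lit(loc.name)})`;
  if (loc.ariaLabel) return `By.css(${lit(`[aria-label="${loc.ariaLabel}"]`)})`;
  if (loc.text) return `By.xpath(${lit(`//*[normalize-space()='${loc.text}']`)})`;
  throw new Error("recorded action has no usable locator");
}

// Turn the action log into selenium-webdriver statements, one per action.
function toSeleniumSteps(actions: RecordedAction[]): string[] {
  return actions.map((a) => {
    if (a.type === "navigate") return `await driver.get(${lit(a.url ?? "")});`;
    const element = `driver.findElement(${byExpression(a.locator ?? {})})`;
    return a.type === "click"
      ? `await ${element}.click();`
      : `await ${element}.sendKeys(${lit(a.value ?? "")});`;
  });
}

console.log(
  toSeleniumSteps([
    { type: "navigate", url: "https://example.com/login" },
    { type: "type", locator: { id: "email" }, value: "user@example.com" },
    { type: "click", locator: { name: "submit" } },
  ]).join("\n"),
);
```

The emitted lines assume a surrounding async test in which By is imported from selenium-webdriver and driver was created with new Builder().forBrowser("chrome").build().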

3. Heal

healer_run_tests → healer_inspect_page → healer_fix_test → healer_run_tests
  • healer_run_tests reads .test-manifest.json to auto-discover how to run tests
  • healer_inspect_page compares expected locators against the live page to find UI drift
  • healer_fix_test validates selectors in the fixed code before writing
  • Seed tests (auth, fixtures) are run automatically before the main test when registered in the manifest
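
The manifest is what lets healer_run_tests run tests without being told how. The sketch below assumes a particular JSON shape for .test-manifest.json (the docs mention a framework, a run command, a test list and seedTests[], but not the exact field names) and builds ordered command lines, running seed tests first and only once. parseManifest and planRuns are hypothetical names.

```typescript
// Assumed shape of .test-manifest.json; field names are hypothetical.
interface TestManifest {
  framework: string;
  runCommand: string[]; // e.g. ["npx", "mocha"]
  tests: string[];
  seedTests: string[];
}

interface PlannedRun {
  file: string;
  seed: boolean;
  argv: string[];
}

// Parse and minimally validate the manifest JSON.
function parseManifest(json: string): TestManifest {
  const data = JSON.parse(json) as Partial<TestManifest>;
  if (typeof data.framework !== "string" || !Array.isArray(data.runCommand) || !Array.isArray(data.tests)) {
    throw new Error("invalid .test-manifest.json");
  }
  return {
    framework: data.framework,
    runCommand: data.runCommand,
    tests: data.tests,
    seedTests: Array.isArray(data.seedTests) ? data.seedTests : [],
  };
}

// Seed tests (auth, fixtures, env setup) run first; a file listed as both
// a seed and a regular test is only run once, as a seed.
function planRuns(manifest: TestManifest, only?: string): PlannedRun[] {
  const targets = only === undefined ? manifest.tests : manifest.tests.filter((t) => t === only);
  if (only !== undefined && targets.length === 0) {
    throw new Error(`test not found in manifest: ${only}`);
  }
  const toRun = (file: string, seed: boolean): PlannedRun => ({
    file,
    seed,
    argv: [...manifest.runCommand, file],
  });
  return [
    ...manifest.seedTests.map((f) => toRun(f, true)),
    ...targets.filter((t) => !manifest.seedTests.includes(t)).map((f) => toRun(f, false)),
  ];
}
```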

Spec Files

Save structured requirements as markdown specs before generating tests:

generator_save_spec → generator_write_test (with specFile param)

Session Tracing

Enable tracing to record every tool call and result as structured JSON:

npx selenium-ai-agent --save-trace

Or via env var:

{
  "env": { "SELENIUM_MCP_SAVE_TRACE": "true" }
}

Traces are saved to <output>/traces/session-<timestamp>.json on session close. Each trace entry records:

  • Tool name and parameters
  • Result content and error status
  • Timestamps for performance analysis
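
A trace entry only needs the fields listed above. The sketch below records them with a begin/end pair around each tool call, which works for both synchronous and asynchronous tools. The TraceEntry field names, the injectable clock, and the exact timestamp format in the file name are assumptions; only the session-<timestamp>.json pattern comes from the docs.

```typescript
// One recorded tool call (field names are assumptions).
interface TraceEntry {
  tool: string;
  params: unknown;
  result: unknown;
  isError: boolean;
  startedAt: string; // ISO 8601
  durationMs: number;
}

class SessionTracer {
  private readonly entries: TraceEntry[] = [];
  private readonly clock: () => number;

  // The clock is injectable so timings can be tested deterministically.
  constructor(clock: () => number = () => Date.now()) {
    this.clock = clock;
  }

  // Call when a tool starts; invoke the returned callback when it finishes.
  begin(tool: string, params: unknown): (result: unknown, isError?: boolean) => void {
    const start = this.clock();
    return (result, isError = false) => {
      this.entries.push({
        tool,
        params,
        result,
        isError,
        startedAt: new Date(start).toISOString(),
        durationMs: this.clock() - start,
      });
    };
  }

  get size(): number {
    return this.entries.length;
  }

  // session-<timestamp>.json, with ":" and "." replaced so the name is valid on every OS.
  fileName(sessionStart: number): string {
    return `session-${new Date(sessionStart).toISOString().replace(/[:.]/g, "-")}.json`;
  }

  serialize(): string {
    return JSON.stringify(this.entries, null, 2);
  }
}
```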

Workspace Isolation

By default, all file-writing tools (screenshots, PDFs, test files, plans, analyzer output) validate that paths resolve within the output directory. This prevents accidental writes to system paths.

To override (e.g., for CI/CD or trusted environments):

npx selenium-ai-agent --allow-unrestricted-file-access

The healer_fix_test tool is exempt — it modifies existing project test files by design.
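
The workspace check comes down to resolving the requested path against the output directory and rejecting anything that lands outside it. A minimal sketch using Node's path module follows. It deliberately uses POSIX path rules so it behaves identically on every OS, it does not resolve symlinks (a stricter check would compare fs.realpath results), and the function name and unrestricted parameter are hypothetical.

```typescript
import * as path from "node:path";

// Resolve `requested` relative to `outputDir` and refuse paths that escape it,
// unless unrestricted access was enabled (cf. --allow-unrestricted-file-access).
function resolveInWorkspace(outputDir: string, requested: string, unrestricted = false): string {
  const root = path.posix.resolve(outputDir);
  const target = path.posix.resolve(root, requested);
  if (unrestricted) return target;
  const rel = path.posix.relative(root, target);
  // "" means the root itself; "..", "../x" or an absolute result means outside.
  if (rel === ".." || rel.startsWith("../") || path.posix.isAbsolute(rel)) {
    throw new Error(`path is outside the workspace: ${requested}`);
  }
  return target;
}
```

Comparing with path.relative instead of a plain string prefix check avoids treating a sibling such as /work/out-evil as inside /work/out.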

Selenium Grid

For parallel browser automation across multiple browsers, set SELENIUM_GRID_URL:

{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["selenium-ai-agent"],
      "env": {
        "SELENIUM_GRID_URL": "http://localhost:4444"
      }
    }
  }
}

Quick Start with Docker Compose

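The project includes a Docker Compose file for local Grid setup. Either ask your assistant to call the grid_start tool, or run Docker Compose yourself:

# MCP tool: start Grid with 4 Chrome + 1 Firefox nodes
grid_start

# Or use Docker Compose directly (run from the directory containing the Compose file)
docker compose up -d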

Parallel Workflows

Parallel exploration — explore multiple sections of a site simultaneously:

session_create (x3) → parallel_explore → exploration_merge

Parallel execution — run test steps across browsers:

session_create (chrome + firefox) → parallel_execute

Cross-browser testing — same actions on different browsers:

session_create (chrome) → session_create (firefox) → parallel_execute

See the project README for Docker Compose setup and Grid architecture details.

Output Mode

Control how binary data (screenshots, PDFs) is returned:

| Mode | Behavior |
| --- | --- |
| stdout (default) | Return base64-encoded data to the LLM for inline display |
| file | Save to disk in <output>/screenshots/ or <output>/pdfs/ |

npx selenium-ai-agent --output-mode=file

Architecture

selenium-mcp-server/src/
├── server.ts              # MCP server, tool routing, expectation system, tracing
├── context.ts             # Browser session state, EventCollector, SessionTracer
├── types.ts               # Core types (ToolResult, BrowserConfig, Expectation, Grid types)
├── types/
│   └── manifest.ts        # Shared test manifest types (generator ↔ healer)
├── bidi/
│   ├── event-collector.ts # BiDi event subscriptions (console, network, navigation)
│   └── index.ts
├── trace/
│   ├── session-tracer.ts  # Tool call + result recording
│   └── index.ts
├── utils/
│   ├── bidi-helpers.ts    # BiDi WebSocket URL rewriting + context factory
│   ├── chrome-options.ts  # Chrome options builder + stealth scripts
│   ├── element-discovery/   # Accessibility tree discovery (e1-e300)
│   │   ├── index.ts         # Barrel exports
│   │   ├── discover.ts      # discoverElements() with selector hints
│   │   ├── selector-scripts.ts # Browser-side computeSelector() (15 phases)
│   │   ├── tree-scripts.ts  # Browser-side accessibility tree walker
│   │   ├── format-tree.ts   # formatAccessibilityTree()
│   │   └── element-scripts.ts # extractElementInfo(), findElementByInfo()
│   ├── selector-hints.ts    # Persistent domain-scoped selector hint storage
│   ├── paths.ts           # Output directory resolution
│   ├── sandbox.ts         # Workspace path validation
│   ├── selector-validation.ts # Extract + validate selectors from test code
│   ├── schema.ts          # Zod → JSON Schema converter
│   └── docker.ts          # Docker Compose helpers
├── grid/
│   ├── grid-client.ts     # Grid REST API client
│   ├── grid-session.ts    # Remote browser session
│   ├── session-pool.ts    # Session lifecycle management
│   ├── session-context.ts # Context adapter for grid sessions
│   └── exploration-coordinator.ts
└── tools/                 # 75 tools grouped by domain
    ├── base.ts            # BaseTool abstract class + MCP annotations
    ├── index.ts           # Tool registry
    ├── navigation/        # navigate_to, go_back, go_forward, refresh_page, scroll_page
    ├── page/              # capture_page, take_screenshot
    ├── elements/          # click, hover, select, drag_drop, teach_selector
    ├── input/             # input_text, key_press, file_upload
    ├── mouse/             # mouse_move, mouse_click, mouse_drag
    ├── tabs/              # tab_list, tab_select, tab_new, tab_close
    ├── verification/      # verify_element_visible, verify_text, verify_value, verify_list
    ├── browser/           # wait, javascript, resize, dialog, console, network, pdf
    ├── session/           # close_browser, reset_session, set_stealth_mode
    ├── recording/         # start, stop, status, clear
    ├── agents/            # planner, generator, healer, spec tools
    ├── analyzer/          # setup, import, scan, risk, save, documentation
    ├── batch/             # batch_execute
    └── grid/              # 14 grid management + parallel execution tools

License

MIT

Package last updated on 24 Mar 2026
