
selenium-ai-agent
AI-powered Selenium MCP server for browser automation — 75 tools with accessibility tree discovery, selector teaching, BiDi cross-browser support, Selenium Grid parallel execution, test generation & self-healing pipeline, and session tracing.
npm install -g selenium-ai-agent
Or run directly without installing:
npx selenium-ai-agent
Add to your MCP client config:
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"]
}
}
}
Then ask your AI assistant: "Navigate to https://example.com and take a screenshot"
claude mcp add selenium-mcp -- npx selenium-ai-agent
Or add to your project .mcp.json:
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"]
}
}
}
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"]
}
}
}
Config paths per OS:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json

Add to .cursor/mcp.json (project) or ~/.cursor/mcp.json (global):
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"]
}
}
}
Add to .vscode/mcp.json:
{
"servers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"],
"type": "stdio"
}
}
}
Note: Copilot uses "servers" instead of "mcpServers".
Open the MCP Servers panel in Cline, click Configure, then Advanced MCP Settings, and add:
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"]
}
}
}
Add to ~/.codeium/windsurf/mcp_config.json (global) or .windsurf/mcp_config.json (project):
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"]
}
}
}
All tools include MCP annotations (readOnlyHint, destructiveHint, etc.) that help clients auto-approve safe tools. Read-only tools like capture_page, recording_status, and grid_status are marked as non-destructive and can be auto-approved by clients that support annotations.
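How a client might act on these annotations can be sketched as follows. This is a hypothetical client-side policy, not the behavior of any particular MCP client: the `ToolInfo` shape, the `canAutoApprove` helper, and the annotation values given to `close_browser` are assumptions made for illustration. Only the annotation names `readOnlyHint` and `destructiveHint` come from the text above.

```typescript
// Hypothetical client-side approval policy (an assumption, not a real client's logic).
interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
}

interface ToolInfo {
  name: string;
  annotations?: ToolAnnotations;
}

// Auto-approve only tools that explicitly declare themselves read-only
// and do not declare themselves destructive.
function canAutoApprove(tool: ToolInfo): boolean {
  const a = tool.annotations ?? {};
  return a.readOnlyHint === true && a.destructiveHint !== true;
}

const sampleTools: ToolInfo[] = [
  { name: "capture_page", annotations: { readOnlyHint: true, destructiveHint: false } },
  // Annotation values below are assumed for the example.
  { name: "close_browser", annotations: { readOnlyHint: false, destructiveHint: true } },
  // No annotations at all: fall back to prompting the user.
  { name: "unannotated_tool" },
];

const autoApproved = sampleTools.filter(canAutoApprove).map((t) => t.name);
console.log(autoApproved);
```

Treating a missing annotation as "prompt the user" is the conservative choice here, since annotations are hints rather than guarantees.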
After the first approval, click "Always allow" for each tool to stop future prompts. Tools marked readOnlyHint: true may be auto-approved by the client.
Use --allow-mcp selenium-mcp to pre-approve all tools from this server:
claude --allow-mcp selenium-mcp
Or configure in .claude/settings.json:
{
"permissions": {
"allow": ["mcp__selenium-mcp__*"]
}
}
These clients typically allow you to configure auto-approval per tool or per server in their settings. Check your client's MCP settings for "auto-approve" or "always allow" options.
| Variable | Default | Description |
|---|---|---|
SELENIUM_GRID_URL | — | Grid hub URL (enables parallel features) |
SELENIUM_BROWSER | chrome | Browser to use (chrome, firefox, edge) |
SELENIUM_HEADLESS | false | Run browser in headless mode |
SELENIUM_STEALTH | false | Enable stealth mode (hide automation indicators) |
SELENIUM_MCP_OUTPUT_MODE | stdout | Output mode: stdout (return data to LLM) or file (save to disk) |
SELENIUM_MCP_OUTPUT_DIR | auto | Output directory for generated files (auto-detected from project root) |
SELENIUM_MCP_SAVE_TRACE | false | Save session trace JSON to <output>/traces/ |
SELENIUM_MCP_UNRESTRICTED_FILES | false | Bypass workspace path validation (allow writing outside output dir) |
SE_AVOID_STATS | — | Set to true to disable Selenium usage statistics |
Pass env vars in your MCP config:
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"],
"env": {
"SELENIUM_HEADLESS": "true",
"SELENIUM_STEALTH": "true",
"SE_AVOID_STATS": "true"
}
}
}
}
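To make the defaults in the table concrete, here is a minimal sketch of how such variables could be resolved into a typed configuration. The `ServerConfig` shape and `resolveConfig` function are hypothetical and not taken from the package source. The sketch also assumes that only the literal string "true" enables a boolean option, which the table does not confirm.

```typescript
type OutputMode = "stdout" | "file";

// Hypothetical config shape; field names are assumptions for this example.
interface ServerConfig {
  gridUrl: string | undefined;
  browser: string;
  headless: boolean;
  stealth: boolean;
  outputMode: OutputMode;
  saveTrace: boolean;
}

function resolveConfig(env: Record<string, string | undefined>): ServerConfig {
  // Assumption: only the exact string "true" switches a boolean option on.
  const isTrue = (name: string): boolean => env[name] === "true";
  return {
    gridUrl: env["SELENIUM_GRID_URL"],
    browser: env["SELENIUM_BROWSER"] ?? "chrome", // default from the table
    headless: isTrue("SELENIUM_HEADLESS"),
    stealth: isTrue("SELENIUM_STEALTH"),
    outputMode: env["SELENIUM_MCP_OUTPUT_MODE"] === "file" ? "file" : "stdout",
    saveTrace: isTrue("SELENIUM_MCP_SAVE_TRACE"),
  };
}

// Same values as the "env" block in the MCP config above.
const envConfig = resolveConfig({
  SELENIUM_HEADLESS: "true",
  SELENIUM_STEALTH: "true",
  SE_AVOID_STATS: "true",
});
console.log(envConfig);
```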
npx selenium-ai-agent [flags]
| Flag | Description |
|---|---|
--stealth | Enable stealth mode |
--headless | Run browser headless |
--save-trace | Save session trace JSON |
--output-mode=stdout\|file | Set output mode |
--output-dir=<path> | Set output directory |
--grid-url=<url> | Set Selenium Grid hub URL |
--allow-unrestricted-file-access | Bypass workspace file path validation |
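
The flag syntax above (bare switches plus `--name=value` pairs) can be parsed with a few lines of code. The sketch below is an illustration only: `CliOptions` and `parseFlags` are hypothetical names, and silently ignoring unknown flags is an assumption, not documented behavior of the CLI.

```typescript
// Hypothetical option bag; one field per flag in the table.
interface CliOptions {
  stealth: boolean;
  headless: boolean;
  saveTrace: boolean;
  outputMode: string | undefined;
  outputDir: string | undefined;
  gridUrl: string | undefined;
  allowUnrestrictedFileAccess: boolean;
}

function parseFlags(argv: string[]): CliOptions {
  const opts: CliOptions = {
    stealth: false,
    headless: false,
    saveTrace: false,
    outputMode: undefined,
    outputDir: undefined,
    gridUrl: undefined,
    allowUnrestrictedFileAccess: false,
  };
  for (const arg of argv) {
    // Split on the first "=" only, so URLs in values stay intact.
    const eq = arg.indexOf("=");
    const name = eq === -1 ? arg : arg.slice(0, eq);
    const value = eq === -1 ? undefined : arg.slice(eq + 1);
    switch (name) {
      case "--stealth": opts.stealth = true; break;
      case "--headless": opts.headless = true; break;
      case "--save-trace": opts.saveTrace = true; break;
      case "--output-mode": opts.outputMode = value; break;
      case "--output-dir": opts.outputDir = value; break;
      case "--grid-url": opts.gridUrl = value; break;
      case "--allow-unrestricted-file-access": opts.allowUnrestrictedFileAccess = true; break;
      default: break; // assumption: unknown flags are ignored in this sketch
    }
  }
  return opts;
}

const parsedFlags = parseFlags(["--headless", "--output-mode=file", "--grid-url=http://localhost:4444"]);
console.log(parsedFlags.headless, parsedFlags.outputMode, parsedFlags.gridUrl);
```
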
| Tool | Description |
|---|---|
navigate_to | Navigate the browser to a URL. Starts browser automatically if not running. |
go_back | Navigate back in browser history. |
go_forward | Navigate forward in browser history. |
refresh_page | Refresh the current page. |
scroll_page | Scroll the page in a direction (up/down/left/right) by pixel amount, or scroll a specific element into view by CSS selector. |
| Tool | Description |
|---|---|
capture_page | Capture the current page state as an accessibility tree — returns elements with ARIA roles, semantic hierarchy, and refs (e1, e2, ...). Discovers up to 300 elements with visibility-aware selectors, ancestor scoping, and Shadow DOM traversal. Read-only. |
take_screenshot | Take a screenshot (viewport, full-page, or element). Uses BiDi when available for full-page/element screenshots, falls back to classic API. Params: origin (viewport/document), ref (element), format (png/jpeg), quality. |
| Tool | Description |
|---|---|
click_element | Click an element using its ref from the page snapshot. |
hover_element | Hover over an element using its ref. |
select_option | Select a dropdown option by value, text, or index. |
drag_drop | Drag from one element to another using refs. |
teach_selector | Teach the system a preferred CSS selector for an element. Saved as Phase 0 (highest priority) in future element discovery on that domain. Auto-scopes to site-wide for header/nav/footer elements, or path-specific for content. |
| Tool | Description |
|---|---|
input_text | Type text into an input field or textarea. |
key_press | Press a keyboard key, optionally with modifiers (ctrl, alt, shift, meta). |
file_upload | Upload a file through a file input element. |
| Tool | Description |
|---|---|
mouse_move | Move mouse to specific coordinates. |
mouse_click | Click at coordinates with specified button (left, right, middle). |
mouse_drag | Drag from one position to another. |
| Tool | Description |
|---|---|
tab_list | List all open browser tabs with titles and URLs. Read-only. |
tab_select | Switch to a specific browser tab. |
tab_new | Open a new browser tab, optionally navigating to a URL. |
tab_close | Close a specific browser tab. |
| Tool | Description |
|---|---|
verify_element_visible | Verify that an element is visible on the page (with timeout). Read-only. |
verify_text_visible | Verify that specific text is visible on the page (with timeout). Read-only. |
verify_value | Verify that an input element has the expected value. Read-only. |
verify_list_visible | Verify that multiple text items are all visible on the page. Read-only. |
| Tool | Description |
|---|---|
wait_for | Wait for a condition: element visible, clickable, present, URL contains, or title contains. |
execute_javascript | Execute JavaScript code in the browser context with optional return value. |
resize_window | Resize the browser window to specified dimensions. |
dialog_handle | Handle browser dialogs (alert, confirm, prompt). |
console_logs | Get or clear browser console logs. Uses BiDi event collector when available for cross-browser support, falls back to classic log API. |
network_monitor | Monitor network requests: get requests, clear, or toggle offline mode. |
pdf_generate | Generate a PDF from the current page. Uses BiDi printPage for cross-browser support (Chrome, Firefox, Edge), falls back to CDP. Params: format, landscape, scale, pageRanges. Optional filePath — omit to return as base64 resource. |
| Tool | Description |
|---|---|
close_browser | Close the browser and end the session. |
reset_session | Reset the browser session (close and restart). |
set_stealth_mode | Enable/disable stealth mode — hides navigator.webdriver, patches plugins, sets realistic languages. |
| Tool | Description |
|---|---|
start_recording | Start recording browser actions for test script generation. |
stop_recording | Stop recording and return full action log with element locators and framework hint. |
recording_status | Check if recording is active and show recent actions. Read-only. |
clear_recording | Clear all recorded browser actions. |
| Tool | Description |
|---|---|
planner_setup_page | Initialize test planning — navigate to app and start exploring. |
planner_explore_page | Explore a page in detail, discovering elements, forms, and links. |
planner_save_plan | Save completed test plan to a markdown file. |
| Tool | Description |
|---|---|
generator_setup_page | Initialize test generation session — navigate to app, start recording, set framework. |
generator_read_log | Retrieve the action log from the recording session. Read-only. |
generator_write_test | Save generated test code and update .test-manifest.json. Supports verify (validates selectors against live page) and specFile (links to spec). |
generator_write_seed | Write a seed/bootstrap test (auth, fixtures, env setup) and register in manifest under seedTests[]. |
generator_save_spec | Save a structured markdown spec to <output>/specs/. |
generator_read_spec | Read a spec file. Read-only. |
| Tool | Description |
|---|---|
healer_run_tests | Execute tests and return output. Supports manifest mode (reads .test-manifest.json) or explicit mode (provide command + args). Runs seed tests first when present. |
healer_debug_test | Run a single test in verbose mode with detailed output (15KB stdout, 8KB stderr). |
healer_fix_test | Apply a fix to a test file with .bak backup. Supports verify (validates selectors in fixed code). |
healer_inspect_page | Inspect current page against expected locators — reports found, missing, and changed elements with suggested updated locators. Use after test failure to understand UI drift. |
browser_generate_locator | Generate robust locator strategy for an element by description. Read-only. |
| Tool | Description |
|---|---|
analyzer_setup | Initialize regression analysis session with product URL and business context. |
analyzer_import_context | Import additional context from files, inline text, or URLs. |
analyzer_scan_product | Explore product using process walking and page scanning. |
analyzer_build_risk_profile | Build risk profile from discovered features and context. Read-only. |
analyzer_save_profile | Save risk profile to YAML or JSON file. |
analyzer_generate_documentation | Generate product discovery documentation with screenshots. |
| Tool | Description |
|---|---|
batch_execute | Execute up to 20 tool steps in a single round trip. Intermediate steps skip snapshots for speed. |
| Tool | Description |
|---|---|
grid_status | Check Grid status — nodes, browsers, capacity. Read-only. |
grid_start | Start Selenium Grid via Docker Compose with configurable Chrome/Firefox node counts. |
grid_stop | Stop Selenium Grid. |
grid_scale | Scale Grid to desired number of browser nodes. |
| Tool | Description |
|---|---|
session_create | Create a new browser session on the Grid. |
session_select | Select a grid session as active browser for all subsequent tool calls. |
session_list | List all active Grid sessions, optionally filtered by tags. Read-only. |
session_destroy | Destroy a specific Grid session. |
session_destroy_all | Destroy all Grid sessions, optionally filtered by tags. |
| Tool | Description |
|---|---|
parallel_explore | Explore multiple URLs in parallel — each target gets its own Grid session. |
parallel_execute | Execute multiple task sequences in parallel across Grid sessions. |
planner_generate_plan | Generate structured test plan from parallel exploration results. |
| Tool | Description |
|---|---|
exploration_merge | Merge multiple exploration results, deduplicate pages, build site map. Read-only. |
exploration_diff | Compare two exploration results — find added, removed, and changed pages. Read-only. |
Every tool accepts an optional expectation parameter to control what data is included in the response:
{
"expectation": {
"includeSnapshot": true,
"includeConsole": true,
"includeNetwork": true,
"snapshotOptions": { "selector": "#main", "maxLength": 5000 },
"consoleOptions": { "levels": ["error", "warn"], "maxMessages": 10 },
"diffOptions": { "enabled": true, "format": "unified" }
}
}
| Option | Description |
|---|---|
includeSnapshot | Include page snapshot (element list) in the response |
includeConsole | Include browser console logs |
includeNetwork | Include network request summary (requires BiDi) |
snapshotOptions.selector | CSS selector to scope element discovery |
snapshotOptions.maxLength | Truncate snapshot text at this length |
consoleOptions.levels | Filter by log level: error, warn, info, log |
diffOptions.enabled | Return only changes since last snapshot |
diffOptions.format | Diff format: minimal or unified |
Each tool category has sensible defaults (e.g., navigation tools include snapshot, verification tools don't).
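One way to picture per-category defaults is a merge of the caller's expectation over a category baseline. The sketch below is an assumption about how this could work, not the server's implementation. `resolveExpectation`, the `ToolCategory` type, and the default values for `includeConsole` and `includeNetwork` are hypothetical; only the snapshot defaults for navigation and verification come from the sentence above.

```typescript
interface ExpectationFlags {
  includeSnapshot?: boolean;
  includeConsole?: boolean;
  includeNetwork?: boolean;
}

type ToolCategory = "navigation" | "verification";

// Snapshot defaults follow the README; console/network defaults are assumed.
const CATEGORY_DEFAULTS: Record<ToolCategory, Required<ExpectationFlags>> = {
  navigation: { includeSnapshot: true, includeConsole: false, includeNetwork: false },
  verification: { includeSnapshot: false, includeConsole: false, includeNetwork: false },
};

// Caller-supplied values win; anything left undefined falls back to the category default.
function resolveExpectation(
  category: ToolCategory,
  requested: ExpectationFlags = {},
): Required<ExpectationFlags> {
  const d = CATEGORY_DEFAULTS[category];
  return {
    includeSnapshot: requested.includeSnapshot ?? d.includeSnapshot,
    includeConsole: requested.includeConsole ?? d.includeConsole,
    includeNetwork: requested.includeNetwork ?? d.includeNetwork,
  };
}

console.log(resolveExpectation("navigation"));
console.log(resolveExpectation("verification", { includeSnapshot: true }));
```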
The server uses the WebDriver BiDi protocol (always enabled) for cross-browser features that go beyond what the classic WebDriver API offers:
- take_screenshot with origin: "document" captures the entire scrollable page, not just the viewport
- take_screenshot with ref: "e5" captures a specific element
- pdf_generate works on Chrome, Firefox, and Edge (was Chrome-only with CDP)
- console_logs uses BiDi LogInspector for real-time console events across all browsers
- Stealth mode uses script.addPreloadScript to mask automation indicators

BiDi features degrade gracefully: if a browser doesn't support a specific BiDi feature, the tool falls back to the classic API.
The teach_selector tool lets you override auto-computed selectors with your own preferred CSS selectors. Taught selectors are persisted to <output>/selector-hints.json and loaded as Phase 0 (highest priority) during element discovery on matching pages.
- Call teach_selector with a description and CSS selector while on the page
- Scope is auto-determined when omitted: header/nav/footer elements default to site-wide (*), content elements default to the current path pattern

teach_selector({
description: "the NL language link",
css: "a[href='/nl/']",
scope: "*" // optional — auto-determined if omitted
})
Hints file structure (selector-hints.json):
{
"example.com": {
"*": [
{ "css": "a[href='/nl/']", "tag": "a", "text": "NL" }
],
"/blog/*": [
{ "css": "#post-title", "tag": "h1", "text": "My Post" }
]
}
}
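Given the file structure above, looking up the hints that apply to a page could work roughly as sketched below. The matching rules are assumptions: this example treats `*` as site-wide and a scope ending in `/*` as a path prefix, which fits the sample file but is not confirmed by the package. `HintsFile`, `scopeMatches`, and `hintsFor` are hypothetical names.

```typescript
interface SelectorHint {
  css: string;
  tag: string;
  text: string;
}

// domain -> scope pattern -> hints, mirroring selector-hints.json
type HintsFile = Record<string, Record<string, SelectorHint[]>>;

// Assumed semantics: "*" matches every path, "/blog/*" matches paths under "/blog/".
function scopeMatches(scope: string, path: string): boolean {
  if (scope === "*") return true;
  if (scope.endsWith("/*")) return path.startsWith(scope.slice(0, -1));
  return scope === path;
}

function hintsFor(hints: HintsFile, domain: string, path: string): SelectorHint[] {
  const byScope = hints[domain] ?? {};
  const result: SelectorHint[] = [];
  for (const scope of Object.keys(byScope)) {
    if (scopeMatches(scope, path)) {
      result.push(...(byScope[scope] ?? []));
    }
  }
  return result;
}

const sampleHints: HintsFile = {
  "example.com": {
    "*": [{ css: "a[href='/nl/']", tag: "a", text: "NL" }],
    "/blog/*": [{ css: "#post-title", tag: "h1", text: "My Post" }],
  },
};

console.log(hintsFor(sampleHints, "example.com", "/blog/hello").map((h) => h.css));
console.log(hintsFor(sampleHints, "example.com", "/about").map((h) => h.css));
```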
The server uses a 16-phase selector computation engine that produces human-readable, semantically meaningful CSS and XPath selectors for every discovered element.
| Phase | Strategy | Example |
|---|---|---|
| 0 | Taught hints | User-taught a[href="/nl/"] |
| 1 | By ID | #login-form |
| 2 | By test ID | [data-testid="submit-btn"] |
| 2b | By descendant test ID | form:has([data-testid="email"]) |
| 3 | By role + name | button[aria-label="Close"] |
| 4 | By label | //label[normalize-space()='Email']//input |
| 5 | By placeholder | input[placeholder="Search..."] |
| 6 | By text | //a[normalize-space()='Sign In'] |
| 7 | By attribute | a[hreflang="nl"], img[alt="Logo"] |
| 9 | By ARIA role | [role="dialog"] |
| 10 | By state | dialog[open], [aria-expanded] |
| 11 | By table cell | #data-table > tbody > tr:nth-child(2) > td:nth-child(3) |
| 12 | By compound attrs | input[type="email"][name="user"] |
| 13 | By semantic class | button.primary-action |
| 14 | By position | #sidebar > ul > li:nth-of-type(3) |
| 15 | By text (loose) | //span[contains(normalize-space(),'Welcome')] |
| 16 | By positional index | (//button[normalize-space()='Save'])[2] |
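
The phase ordering suggests a priority-based fallback: try the most stable strategy first and move down the list until one produces a selector. The sketch below illustrates that idea with three of the phases from the table. It is a simplified assumption, not the engine's code: it works on plain objects instead of the DOM, skips the uniqueness checks a real engine would need, and `ElementInfo`, `STRATEGIES`, and `pickSelector` are hypothetical names.

```typescript
// Plain-object stand-in for a DOM element (assumption for this example).
interface ElementInfo {
  tag: string;
  id?: string;
  testId?: string;
  placeholder?: string;
}

interface Strategy {
  phase: string;
  compute: (el: ElementInfo) => string | undefined;
}

// Ordered by priority; only phases 1, 2, and 5 from the table are shown.
const STRATEGIES: Strategy[] = [
  { phase: "1", compute: (el) => (el.id ? `#${el.id}` : undefined) },
  { phase: "2", compute: (el) => (el.testId ? `[data-testid="${el.testId}"]` : undefined) },
  {
    phase: "5",
    compute: (el) => (el.placeholder ? `${el.tag}[placeholder="${el.placeholder}"]` : undefined),
  },
];

// Return the first selector any strategy yields, along with the phase that produced it.
function pickSelector(el: ElementInfo): { phase: string; selector: string } | undefined {
  for (const strategy of STRATEGIES) {
    const selector = strategy.compute(el);
    if (selector !== undefined) return { phase: strategy.phase, selector };
  }
  return undefined;
}

console.log(pickSelector({ tag: "form", id: "login-form", testId: "login" }));
console.log(pickSelector({ tag: "input", placeholder: "Search..." }));
```
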

- Ancestor scoping: e.g. nav[aria-label="Main"] a[href="/"]
- Shadow DOM: >>> notation for cross-boundary selectors

The generator and healer tools form a complete test automation pipeline:
planner_setup_page → planner_explore_page → planner_save_plan
generator_setup_page → [interact with app] → stop_recording → generator_write_test
- generator_write_test validates selectors against the live page before saving
- .test-manifest.json is created alongside tests with framework, run command, and test list

healer_run_tests → healer_inspect_page → healer_fix_test → healer_run_tests
- healer_run_tests reads .test-manifest.json to auto-discover how to run tests
- healer_inspect_page compares expected locators against the live page to find UI drift
- healer_fix_test validates selectors in the fixed code before writing

Save structured requirements as markdown specs before generating tests:
generator_save_spec → generator_write_test (with specFile param)
Enable tracing to record every tool call and result as structured JSON:
npx selenium-ai-agent --save-trace
Or via env var:
{
"env": { "SELENIUM_MCP_SAVE_TRACE": "true" }
}
Traces are saved to <output>/traces/session-<timestamp>.json on session close. Each trace entry records a tool call and its result.
By default, all file-writing tools (screenshots, PDFs, test files, plans, analyzer output) validate that paths resolve within the output directory. This prevents accidental writes to system paths.
To override (e.g., for CI/CD or trusted environments):
npx selenium-ai-agent --allow-unrestricted-file-access
The healer_fix_test tool is exempt — it modifies existing project test files by design.
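The kind of check described here (a path must resolve inside the output directory) can be sketched as below. This is not the package's sandbox.ts. It is a simplified, POSIX-only illustration with hypothetical helper names (`normalizePosix`, `resolveInside`), and it does not follow symlinks. A real implementation would more likely rely on Node's path utilities.

```typescript
// Normalize a POSIX-style absolute path, resolving "." and ".." segments.
function normalizePosix(path: string): string {
  const out: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// Resolve `requested` against `outputDir` and reject anything that lands outside it.
function resolveInside(outputDir: string, requested: string): string {
  const base = normalizePosix(outputDir);
  const target = requested.startsWith("/")
    ? normalizePosix(requested)
    : normalizePosix(base + "/" + requested);
  // Compare with a trailing separator so "/work/outside" is not mistaken for "/work/out".
  const prefix = base === "/" ? "/" : base + "/";
  if (target !== base && !target.startsWith(prefix)) {
    throw new Error(`Path escapes output directory: ${requested}`);
  }
  return target;
}

console.log(resolveInside("/work/out", "screenshots/home.png"));
```

Calling `resolveInside("/work/out", "../secrets.txt")` throws in this sketch, because the normalized target falls outside the base directory.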
For parallel browser automation across multiple browsers, set SELENIUM_GRID_URL:
{
"mcpServers": {
"selenium-mcp": {
"command": "npx",
"args": ["selenium-ai-agent"],
"env": {
"SELENIUM_GRID_URL": "http://localhost:4444"
}
}
}
}
The project includes a Docker Compose file for local Grid setup:
# Start Grid with 4 Chrome + 1 Firefox nodes
grid_start
# Or use docker-compose directly
docker compose up -d
Parallel exploration — explore multiple sections of a site simultaneously:
session_create (x3) → parallel_explore → exploration_merge
Parallel execution — run test steps across browsers:
session_create (chrome + firefox) → parallel_execute
Cross-browser testing — same actions on different browsers:
session_create (chrome) → session_create (firefox) → parallel_execute
See the project README for Docker Compose setup and Grid architecture details.
Control how binary data (screenshots, PDFs) is returned:
| Mode | Behavior |
|---|---|
stdout (default) | Return base64-encoded data to the LLM for inline display |
file | Save to disk in <output>/screenshots/ or <output>/pdfs/ |
npx selenium-ai-agent --output-mode=file
selenium-mcp-server/src/
├── server.ts # MCP server, tool routing, expectation system, tracing
├── context.ts # Browser session state, EventCollector, SessionTracer
├── types.ts # Core types (ToolResult, BrowserConfig, Expectation, Grid types)
├── types/
│ └── manifest.ts # Shared test manifest types (generator ↔ healer)
├── bidi/
│ ├── event-collector.ts # BiDi event subscriptions (console, network, navigation)
│ └── index.ts
├── trace/
│ ├── session-tracer.ts # Tool call + result recording
│ └── index.ts
├── utils/
│ ├── bidi-helpers.ts # BiDi WebSocket URL rewriting + context factory
│ ├── chrome-options.ts # Chrome options builder + stealth scripts
│ ├── element-discovery/ # Accessibility tree discovery (e1-e300)
│ │ ├── index.ts # Barrel exports
│ │ ├── discover.ts # discoverElements() with selector hints
│ │ ├── selector-scripts.ts # Browser-side computeSelector() (15 phases)
│ │ ├── tree-scripts.ts # Browser-side accessibility tree walker
│ │ ├── format-tree.ts # formatAccessibilityTree()
│ │ └── element-scripts.ts # extractElementInfo(), findElementByInfo()
│ ├── selector-hints.ts # Persistent domain-scoped selector hint storage
│ ├── paths.ts # Output directory resolution
│ ├── sandbox.ts # Workspace path validation
│ ├── selector-validation.ts # Extract + validate selectors from test code
│ ├── schema.ts # Zod → JSON Schema converter
│ └── docker.ts # Docker Compose helpers
├── grid/
│ ├── grid-client.ts # Grid REST API client
│ ├── grid-session.ts # Remote browser session
│ ├── session-pool.ts # Session lifecycle management
│ ├── session-context.ts # Context adapter for grid sessions
│ └── exploration-coordinator.ts
└── tools/ # 75 tools grouped by domain
├── base.ts # BaseTool abstract class + MCP annotations
├── index.ts # Tool registry
├── navigation/ # navigate_to, go_back, go_forward, refresh_page, scroll_page
├── page/ # capture_page, take_screenshot
├── elements/ # click, hover, select, drag_drop, teach_selector
├── input/ # input_text, key_press, file_upload
├── mouse/ # mouse_move, mouse_click, mouse_drag
├── tabs/ # tab_list, tab_select, tab_new, tab_close
├── verification/ # verify_element_visible, verify_text, verify_value, verify_list
├── browser/ # wait, javascript, resize, dialog, console, network, pdf
├── session/ # close_browser, reset_session, set_stealth_mode
├── recording/ # start, stop, status, clear
├── agents/ # planner, generator, healer, spec tools
├── analyzer/ # setup, import, scan, risk, save, documentation
├── batch/ # batch_execute
└── grid/ # 14 grid management + parallel execution tools
MIT