agent-browser
Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.
Installation
npm (recommended)
npm install -g agent-browser
agent-browser install
From Source
git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
pnpm build:native
pnpm link --global
agent-browser install
Linux Dependencies
On Linux, install system dependencies:
agent-browser install --with-deps
Quick Start
agent-browser open example.com
agent-browser snapshot
agent-browser click @e2
agent-browser fill @e3 "test@example.com"
agent-browser get text @e1
agent-browser screenshot page.png
agent-browser close
Traditional Selectors (also supported)
agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"
Commands
Core Commands
agent-browser open <url>
agent-browser click <sel>
agent-browser dblclick <sel>
agent-browser focus <sel>
agent-browser type <sel> <text>
agent-browser fill <sel> <text>
agent-browser press <key>
agent-browser keydown <key>
agent-browser keyup <key>
agent-browser hover <sel>
agent-browser select <sel> <val>
agent-browser check <sel>
agent-browser uncheck <sel>
agent-browser scroll <dir> [px]
agent-browser scrollintoview <sel>
agent-browser drag <src> <tgt>
agent-browser upload <sel> <files>
agent-browser screenshot [path]
agent-browser pdf <path>
agent-browser snapshot
agent-browser eval <js>
agent-browser connect <port>
agent-browser close
Get Info
agent-browser get text <sel>
agent-browser get html <sel>
agent-browser get value <sel>
agent-browser get attr <sel> <attr>
agent-browser get title
agent-browser get url
agent-browser get count <sel>
agent-browser get box <sel>
Check State
agent-browser is visible <sel>
agent-browser is enabled <sel>
agent-browser is checked <sel>
Find Elements (Semantic Locators)
agent-browser find role <role> <action> [value]
agent-browser find text <text> <action>
agent-browser find label <label> <action> [value]
agent-browser find placeholder <ph> <action> [value]
agent-browser find alt <text> <action>
agent-browser find title <text> <action>
agent-browser find testid <id> <action> [value]
agent-browser find first <sel> <action> [value]
agent-browser find last <sel> <action> [value]
agent-browser find nth <n> <sel> <action> [value]
Actions: click, fill, check, hover, text
Examples:
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text
Wait
agent-browser wait <selector>
agent-browser wait <ms>
agent-browser wait --text "Welcome"
agent-browser wait --url "**/dash"
agent-browser wait --load networkidle
agent-browser wait --fn "window.ready === true"
Load states: load, domcontentloaded, networkidle
Mouse Control
agent-browser mouse move <x> <y>
agent-browser mouse down [button]
agent-browser mouse up [button]
agent-browser mouse wheel <dy> [dx]
Browser Settings
agent-browser set viewport <w> <h>
agent-browser set device <name>
agent-browser set geo <lat> <lng>
agent-browser set offline [on|off]
agent-browser set headers <json>
agent-browser set credentials <u> <p>
agent-browser set media [dark|light]
Cookies & Storage
agent-browser cookies
agent-browser cookies set <name> <val>
agent-browser cookies clear
agent-browser storage local
agent-browser storage local <key>
agent-browser storage local set <k> <v>
agent-browser storage local clear
agent-browser storage session
Network
agent-browser network route <url>
agent-browser network route <url> --abort
agent-browser network route <url> --body <json>
agent-browser network unroute [url]
agent-browser network requests
agent-browser network requests --filter api
Tabs & Windows
agent-browser tab
agent-browser tab new [url]
agent-browser tab <n>
agent-browser tab close [n]
agent-browser window new
Frames
agent-browser frame <sel>
agent-browser frame main
Dialogs
agent-browser dialog accept [text]
agent-browser dialog dismiss
Debug
agent-browser trace start [path]
agent-browser trace stop [path]
agent-browser console
agent-browser console --clear
agent-browser errors
agent-browser errors --clear
agent-browser highlight <sel>
agent-browser state save <path>
agent-browser state load <path>
Navigation
agent-browser back
agent-browser forward
agent-browser reload
Setup
agent-browser install
agent-browser install --with-deps
Sessions
Run multiple isolated browser instances:
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com
AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
agent-browser session list
agent-browser session
Each session has its own:
- Browser instance
- Cookies and storage
- Navigation history
- Authentication state
Persistent Profiles
By default, browser state (cookies, localStorage, login sessions) is ephemeral and lost when the browser closes. Use --profile to persist state across browser restarts:
agent-browser --profile ~/.myapp-profile open myapp.com
agent-browser --profile ~/.myapp-profile open myapp.com/dashboard
AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
The profile directory stores:
- Cookies and localStorage
- IndexedDB data
- Service workers
- Browser cache
- Login sessions
Tip: Use different profile paths for different projects to keep their browser state isolated.
Snapshot Options
The snapshot command supports filtering to reduce output size:
agent-browser snapshot
agent-browser snapshot -i
agent-browser snapshot -c
agent-browser snapshot -d 3
agent-browser snapshot -s "#main"
agent-browser snapshot -i -c -d 5
| Option | Description |
|--------|-------------|
| -i, --interactive | Only show interactive elements (buttons, links, inputs) |
| -c, --compact | Remove empty structural elements |
| -d, --depth <n> | Limit tree depth |
| -s, --selector <sel> | Scope to CSS selector |
Options
| Option | Description |
|--------|-------------|
| --session <name> | Use isolated session (or AGENT_BROWSER_SESSION env) |
| --profile <path> | Persistent browser profile directory (or AGENT_BROWSER_PROFILE env) |
| --headers <json> | Set HTTP headers scoped to the URL's origin |
| --executable-path <path> | Custom browser executable (or AGENT_BROWSER_EXECUTABLE_PATH env) |
| --args <args> | Browser launch args, comma or newline separated (or AGENT_BROWSER_ARGS env) |
| --user-agent <ua> | Custom User-Agent string (or AGENT_BROWSER_USER_AGENT env) |
| --proxy <url> | Proxy server URL with optional auth (or AGENT_BROWSER_PROXY env) |
| --proxy-bypass <hosts> | Hosts to bypass proxy (or AGENT_BROWSER_PROXY_BYPASS env) |
| -p, --provider <name> | Cloud browser provider (or AGENT_BROWSER_PROVIDER env) |
| --json | JSON output (for agents) |
| --full, -f | Full page screenshot |
| --name, -n | Locator name filter |
| --exact | Exact text match |
| --headed | Show browser window (not headless) |
| --cdp <port> | Connect via Chrome DevTools Protocol |
| --ignore-https-errors | Ignore HTTPS certificate errors (useful for self-signed certs) |
| --debug | Debug output |
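The --args option takes launch arguments separated by commas or newlines. The following is a minimal sketch of how such a value could be split, assuming entries are trimmed and empty ones dropped. The helper name splitLaunchArgs is hypothetical, and the CLI's actual parsing may differ.

```typescript
// Hypothetical helper (not part of agent-browser): split an --args or
// AGENT_BROWSER_ARGS value into individual browser launch arguments.
function splitLaunchArgs(raw: string): string[] {
  return raw
    .split(/[,\n]/) // comma or newline separated, per the option description
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

const launchArgs = splitLaunchArgs("--no-sandbox, --disable-gpu\n--lang=en-US");
```

Under this simple rule, an argument whose own value contains a comma would be split in two, so putting such arguments on separate lines is not enough in this sketch; it only illustrates the separator format.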
Selectors
Refs (Recommended for AI)
Refs provide deterministic element selection from snapshots:
agent-browser snapshot
agent-browser click @e2
agent-browser fill @e3 "test@example.com"
agent-browser get text @e1
agent-browser hover @e4
Why use refs?
- Deterministic: Ref points to exact element from snapshot
- Fast: No DOM re-query needed
- AI-friendly: Snapshot + ref workflow is optimal for LLMs
CSS Selectors
agent-browser click "#id"
agent-browser click ".class"
agent-browser click "div > button"
Text & XPath
agent-browser click "text=Submit"
agent-browser click "xpath=//button"
Semantic Locators
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
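The selector forms above can be told apart by their prefixes. As a rough sketch, assuming refs always look like @e followed by digits and that anything without a text= or xpath= prefix is treated as CSS, a classifier might look like this. The function classifySelector is hypothetical and is not how the CLI is known to resolve selectors.

```typescript
type SelectorKind = "ref" | "text" | "xpath" | "css";

// Hypothetical helper: classify a selector string by the prefixes shown above.
function classifySelector(sel: string): SelectorKind {
  if (/^@e\d+$/.test(sel)) return "ref"; // snapshot ref such as @e2
  if (/^text=/.test(sel)) return "text"; // text=Submit
  if (/^xpath=/.test(sel)) return "xpath"; // xpath=//button
  return "css"; // assumption: everything else is a CSS selector
}
```

A wrapper script could use a check like this to warn when a ref is used before any snapshot has been taken.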
Agent Mode
Use --json for machine-readable output:
agent-browser snapshot --json
agent-browser get text @e1 --json
agent-browser is visible @e2 --json
Optimal AI Workflow
agent-browser open example.com
agent-browser snapshot -i --json
agent-browser click @e2
agent-browser fill @e3 "input text"
agent-browser snapshot -i --json
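When an agent drives this workflow from code, it helps to build each invocation as an argument list rather than a shell string. The sketch below assumes the global --session flag goes before the command and --json after it, as in the examples in this README. The names RunOptions and buildArgv are hypothetical, and actually spawning the process is left out.

```typescript
interface RunOptions {
  session?: string; // maps to --session <name>
  json?: boolean; // maps to --json
}

// Hypothetical helper: build the argument list for one agent-browser invocation.
function buildArgv(command: string[], options: RunOptions = {}): string[] {
  const argv = ["agent-browser"];
  if (options.session) argv.push("--session", options.session);
  argv.push(...command);
  if (options.json) argv.push("--json");
  return argv;
}

// The open / snapshot / interact / re-snapshot loop from above, as argument lists.
const workflow: string[][] = [
  buildArgv(["open", "example.com"], { session: "agent1" }),
  buildArgv(["snapshot", "-i"], { session: "agent1", json: true }),
  buildArgv(["click", "@e2"], { session: "agent1" }),
  buildArgv(["snapshot", "-i"], { session: "agent1", json: true }),
];
```

Passing argument lists to a process spawner avoids shell quoting problems when values such as fill text contain spaces or quotes.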
Headed Mode
Show the browser window for debugging:
agent-browser open example.com --headed
This opens a visible browser window instead of running headless.
Authenticated Sessions
Use --headers to set HTTP headers for a specific origin, enabling authentication without login flows:
agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
agent-browser snapshot -i --json
agent-browser click @e2
agent-browser open other-site.com
This is useful for:
- Skipping login flows - Authenticate via headers instead of UI
- Switching users - Start new sessions with different auth tokens
- API testing - Access protected endpoints directly
- Security - Headers are scoped to the origin, not leaked to other domains
To set headers for multiple origins, use --headers with each open command:
agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
For global headers (all domains), use set headers:
agent-browser set headers '{"X-Custom-Header": "value"}'
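To make the origin scoping concrete, here is a small conceptual model, not the tool's implementation. It assumes the bare hostnames in the examples resolve to https origins, and the names scopedHeaders, originOf, and headersForRequest are hypothetical.

```typescript
type HeaderMap = { [name: string]: string };

// Hypothetical model: each origin has its own header set, as if set via --headers.
const scopedHeaders: { [origin: string]: HeaderMap } = {
  "https://api.example.com": { Authorization: "Bearer token1" },
  "https://api.acme.com": { Authorization: "Bearer token2" },
};

// Extract scheme://host[:port] from an absolute http(s) URL, or null otherwise.
function originOf(url: string): string | null {
  const match = /^https?:\/\/[^\/?#]+/i.exec(url);
  return match ? match[0].toLowerCase() : null;
}

// Return only the headers registered for the request's origin; others get none.
function headersForRequest(url: string): HeaderMap {
  const origin = originOf(url);
  const found = origin !== null ? scopedHeaders[origin] : undefined;
  return found !== undefined ? found : {};
}
```

In this model a request to https://other-site.com/ receives an empty header set, which is the property the Security bullet describes.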
Custom Browser Executable
Use a custom browser executable instead of the bundled Chromium. This is useful for:
- Serverless deployment: Use lightweight Chromium builds like @sparticuz/chromium (~50MB vs ~684MB)
- System browsers: Use an existing Chrome/Chromium installation
- Custom builds: Use modified browser builds
CLI Usage
agent-browser --executable-path /path/to/chromium open example.com
AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
Serverless Example (Vercel/AWS Lambda)
import chromium from '@sparticuz/chromium';
import { BrowserManager } from 'agent-browser';

export async function handler() {
  const browser = new BrowserManager();
  // Launch the lightweight Chromium build instead of the bundled browser
  await browser.launch({
    executablePath: await chromium.executablePath(),
    headless: true,
  });
}
CDP Mode
Connect to an existing browser via Chrome DevTools Protocol:
agent-browser connect 9222
agent-browser snapshot
agent-browser tab
agent-browser close
agent-browser --cdp 9222 snapshot
agent-browser --cdp "wss://your-browser-service.com/cdp?token=..." snapshot
The --cdp flag accepts either:
- A port number (e.g., 9222) for local connections via http://localhost:{port}
- A full WebSocket URL (e.g., wss://... or ws://...) for remote browser services
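The two accepted forms can be summarized in a few lines. This is an illustrative sketch of the rule just described, assuming any other input is rejected; resolveCdpEndpoint is a hypothetical name, not a function exported by agent-browser.

```typescript
// Hypothetical helper: turn a --cdp value into the endpoint it refers to.
function resolveCdpEndpoint(value: string): string {
  if (/^\d+$/.test(value)) {
    return "http://localhost:" + value; // bare port: local connection
  }
  if (/^wss?:\/\//.test(value)) {
    return value; // full WebSocket URL: used as-is
  }
  throw new Error("Expected a port number or a ws:// or wss:// URL, got: " + value);
}
```

For example, resolveCdpEndpoint("9222") yields http://localhost:9222, while a wss:// URL is passed through unchanged.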
This enables control of:
- Electron apps
- Chrome/Chromium instances with remote debugging
- WebView2 applications
- Any browser exposing a CDP endpoint
Streaming (Browser Preview)
Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
Enable Streaming
Set the AGENT_BROWSER_STREAM_PORT environment variable:
AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
This starts a WebSocket server on the specified port that streams the browser viewport and accepts input events.
WebSocket Protocol
Connect to ws://localhost:9223 to receive frames and send input:
Receive frames:
{
  "type": "frame",
  "data": "<base64-encoded-jpeg>",
  "metadata": {
    "deviceWidth": 1280,
    "deviceHeight": 720,
    "pageScaleFactor": 1,
    "offsetTop": 0,
    "scrollOffsetX": 0,
    "scrollOffsetY": 0
  }
}
Send mouse events:
{
  "type": "input_mouse",
  "eventType": "mousePressed",
  "x": 100,
  "y": 200,
  "button": "left",
  "clickCount": 1
}
Send keyboard events:
{
  "type": "input_keyboard",
  "eventType": "keyDown",
  "key": "Enter",
  "code": "Enter"
}
Send touch events:
{
  "type": "input_touch",
  "eventType": "touchStart",
  "touchPoints": [{ "x": 100, "y": 200 }]
}
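A client consuming this stream needs to recognize frame messages and serialize input events. The sketch below types the frame message after the JSON example above and validates only its top-level fields. The names FrameMessage, parseFrame, and mousePressedMessage are hypothetical, and opening the WebSocket itself is left out.

```typescript
// Shape of a frame message, copied from the JSON example above.
interface FrameMessage {
  type: "frame";
  data: string; // base64-encoded JPEG
  metadata: {
    deviceWidth: number;
    deviceHeight: number;
    pageScaleFactor: number;
    offsetTop: number;
    scrollOffsetX: number;
    scrollOffsetY: number;
  };
}

// Hypothetical parser: returns the frame, or null for anything that is not one.
function parseFrame(raw: string): FrameMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (_err) {
    return null; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const msg = parsed as { type?: unknown; data?: unknown; metadata?: unknown };
  if (msg.type !== "frame" || typeof msg.data !== "string") return null;
  if (typeof msg.metadata !== "object" || msg.metadata === null) return null;
  return parsed as FrameMessage;
}

// Hypothetical builder for the mouse event shown above (left button, single click).
function mousePressedMessage(x: number, y: number): string {
  return JSON.stringify({
    type: "input_mouse",
    eventType: "mousePressed",
    x: x,
    y: y,
    button: "left",
    clickCount: 1,
  });
}
```

A client would call parseFrame on each incoming message and send the string from mousePressedMessage over the same socket.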
Programmatic API
For advanced use, control streaming directly via the protocol:
import { BrowserManager } from 'agent-browser';

const browser = new BrowserManager();
await browser.launch({ headless: true });
await browser.navigate('https://example.com');

await browser.startScreencast((frame) => {
  console.log('Frame received:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
}, {
  format: 'jpeg',
  quality: 80,
  maxWidth: 1280,
  maxHeight: 720,
});

await browser.injectMouseEvent({
  type: 'mousePressed',
  x: 100,
  y: 200,
  button: 'left',
});

await browser.injectKeyboardEvent({
  type: 'keyDown',
  key: 'Enter',
  code: 'Enter',
});

await browser.stopScreencast();
Architecture
agent-browser uses a client-daemon architecture:
- Rust CLI (fast native binary) - Parses commands, communicates with daemon
- Node.js Daemon - Manages Playwright browser instance
- Fallback - If native binary unavailable, uses Node.js directly
The daemon starts automatically on first command and persists between commands for fast subsequent operations.
Browser Engine: Uses Chromium by default. The daemon also supports Firefox and WebKit via the Playwright protocol.
Platforms
| Platform | Binary | Fallback |
|----------|--------|----------|
| macOS ARM64 | Native Rust | Node.js |
| macOS x64 | Native Rust | Node.js |
| Linux ARM64 | Native Rust | Node.js |
| Linux x64 | Native Rust | Node.js |
| Windows x64 | Native Rust | Node.js |
Usage with AI Agents
Just ask the agent
The simplest approach - just tell your agent to use it:
Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
The --help output is comprehensive and most agents can figure it out from there.
AI Coding Assistants
Add the skill to your AI coding assistant for richer context:
npx skills add vercel-labs/agent-browser
This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf.
AGENTS.md / CLAUDE.md
For more consistent results, add to your project or global instructions file:
## Browser Automation
Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.
Core workflow:
1. `agent-browser open <url>` - Navigate to page
2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
4. Re-snapshot after page changes
Integrations
Browserbase
Browserbase provides remote browser infrastructure that makes it easy to deploy browsing agents. Use it when running the agent-browser CLI in an environment where a local browser isn't feasible.
To enable Browserbase, use the -p flag:
export BROWSERBASE_API_KEY="your-api-key"
export BROWSERBASE_PROJECT_ID="your-project-id"
agent-browser -p browserbase open https://example.com
Or use environment variables for CI/scripts:
export AGENT_BROWSER_PROVIDER=browserbase
export BROWSERBASE_API_KEY="your-api-key"
export BROWSERBASE_PROJECT_ID="your-project-id"
agent-browser open https://example.com
When enabled, agent-browser connects to a Browserbase session instead of launching a local browser. All commands work identically.
Get your API key and project ID from the Browserbase Dashboard.
Browser Use
Browser Use provides cloud browser infrastructure for AI agents. Use it when running agent-browser in environments where a local browser isn't available (serverless, CI/CD, etc.).
To enable Browser Use, use the -p flag:
export BROWSER_USE_API_KEY="your-api-key"
agent-browser -p browseruse open https://example.com
Or use environment variables for CI/scripts:
export AGENT_BROWSER_PROVIDER=browseruse
export BROWSER_USE_API_KEY="your-api-key"
agent-browser open https://example.com
When enabled, agent-browser connects to a Browser Use cloud session instead of launching a local browser. All commands work identically.
Get your API key from the Browser Use Cloud Dashboard. Free credits are available to get started, with pay-as-you-go pricing afterward.
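Since each provider depends on its own environment variables, a CI script may want to check them before invoking the CLI. This is a minimal pre-flight sketch covering the two providers described so far. The names requiredEnv and missingProviderEnv are hypothetical, and the CLI may report missing variables differently.

```typescript
type EnvMap = { [key: string]: string | undefined };

// Environment variables each provider section above asks you to export.
const requiredEnv: { [provider: string]: string[] } = {
  browserbase: ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
  browseruse: ["BROWSER_USE_API_KEY"],
};

// Hypothetical pre-flight check: list the variables still missing for a provider.
function missingProviderEnv(provider: string, env: EnvMap): string[] {
  if (!Object.prototype.hasOwnProperty.call(requiredEnv, provider)) {
    throw new Error("Unknown provider: " + provider);
  }
  const needed = requiredEnv[provider] || [];
  return needed.filter((key) => !env[key]); // unset or empty counts as missing
}
```

In a Node.js script, passing process.env as the second argument would check the real environment.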
Kernel
Kernel provides cloud browser infrastructure for AI agents with features like stealth mode and persistent profiles.
To enable Kernel, use the -p flag:
export KERNEL_API_KEY="your-api-key"
agent-browser -p kernel open https://example.com
Or use environment variables for CI/scripts:
export AGENT_BROWSER_PROVIDER=kernel
export KERNEL_API_KEY="your-api-key"
agent-browser open https://example.com
Optional configuration via environment variables:
| Variable | Description | Default |
|----------|-------------|---------|
| KERNEL_HEADLESS | Run browser in headless mode (true/false) | false |
| KERNEL_STEALTH | Enable stealth mode to avoid bot detection (true/false) | true |
| KERNEL_TIMEOUT_SECONDS | Session timeout in seconds | 300 |
| KERNEL_PROFILE_NAME | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) |
When enabled, agent-browser connects to a Kernel cloud session instead of launching a local browser. All commands work identically.
Profile Persistence: When KERNEL_PROFILE_NAME is set, the profile will be created if it doesn't already exist. Cookies, logins, and session data are automatically saved back to the profile when the browser session ends, making them available for future sessions.
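The defaults in the Kernel configuration table can be expressed as a small resolver. This sketch assumes that only the literal string "true" enables a boolean option and that a non-numeric timeout falls back to the default; neither detail is confirmed by this README. KernelConfig and resolveKernelConfig are hypothetical names.

```typescript
interface KernelConfig {
  headless: boolean;
  stealth: boolean;
  timeoutSeconds: number;
  profileName: string | null;
}

type KernelEnv = { [key: string]: string | undefined };

// Hypothetical resolver applying the documented defaults to unset variables.
function resolveKernelConfig(env: KernelEnv): KernelConfig {
  // Assumption: only the literal string "true" turns a flag on.
  const flag = (value: string | undefined, fallback: boolean): boolean =>
    value === undefined ? fallback : value === "true";
  const timeout = env["KERNEL_TIMEOUT_SECONDS"];
  return {
    headless: flag(env["KERNEL_HEADLESS"], false), // default: false
    stealth: flag(env["KERNEL_STEALTH"], true), // default: true
    timeoutSeconds:
      timeout !== undefined && /^\d+$/.test(timeout) ? parseInt(timeout, 10) : 300,
    profileName: env["KERNEL_PROFILE_NAME"] || null, // default: none
  };
}
```

With an empty environment this yields a headed, stealth-enabled session with a 300-second timeout and no profile.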
Get your API key from the Kernel Dashboard.
License
Apache-2.0