
Security News
Frontier AI Is Now Critical Infrastructure
The Fable shutdown shows how quickly model access can become a business continuity risk for AI-dependent engineering teams.
Vision-based desktop automation MCP server. Control any application via screenshot + AI vision.
![]() macOS |
![]() Windows |
|
Finder File management |
Explorer File operations |
System Settings macOS & Windows |
![]() Chrome 200-300+ elements |
![]() Brave Full CDP support |
Edge, Arc, Opera Chromium-based |
Note: Chrome 136+ requires automatic profile sync (~20-30s) due to CDP security changes.
"If you can see it, OScribe can click it."
OScribe is your fallback when traditional automation tools fail:
Claude plays through the entire first chapter of Helltaker using OScribe MCP tools - navigating menus, solving puzzles, and progressing through dialogue, all via screenshot + vision.
Run our interactive installer that checks and installs all prerequisites for you:
# macOS/Linux
curl -fsSL https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/install.mjs | node
# Windows (PowerShell as Administrator)
irm https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/install.mjs -OutFile install.mjs; node install.mjs
The installer will:
If you prefer manual installation or already have prerequisites:
npm install -g oscribe
Then configure your MCP client (see MCP Integration below).
OScribe uses robotjs for native mouse/keyboard control, which requires compilation tools:
Node.js 22+ - Download
Python 3.x - Download (check "Add to PATH" during install)
Visual Studio Build Tools - Install with C++ workload:
# Option 1: Via npm (recommended)
npm install -g windows-build-tools
# Option 2: Manual install
# Download from https://visualstudio.microsoft.com/visual-cpp-build-tools/
# Select "Desktop development with C++" workload
Node.js 22+ - Download or brew install node
Xcode Command Line Tools:
xcode-select --install
Python 3.x - Usually pre-installed, verify with python3 --version
Before installing, run the diagnostic script to check all prerequisites:
# macOS/Linux - Run directly without installation
curl -fsSL https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs | node
# Windows (PowerShell)
irm https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs -OutFile doctor.mjs; node doctor.mjs
The doctor script checks:
It provides step-by-step fix instructions for any missing prerequisites.
After OScribe is installed, you can also run:
oscribe doctor
# Global installation
npm install -g oscribe
# Verify installation
oscribe --version
git clone https://github.com/mikealkeal/oscribe.git
cd oscribe
npm install
npm run build
npm link # Makes 'oscribe' command available globally
| Platform | Status |
|---|---|
| Windows | ✅ Fully supported |
| macOS | ✅ Supported |
| Linux | 🚧 Not tested yet |
ax-reader binary)oscribe click "Submit button" # Click by description - the magic!
oscribe click "File menu" # Works on any visible element
oscribe click "Export as PNG" --screen 1 # Target specific monitor
oscribe click "Close" --dry-run # Preview without clicking
oscribe type "hello world" # Type text
oscribe hotkey "ctrl+c" # Press keyboard shortcut
oscribe hotkey "ctrl+shift+esc" # Multiple modifiers
oscribe screenshot # Capture primary screen
oscribe screenshot -o capture.png # Save to file
oscribe screenshot --screen 1 # Capture second monitor
oscribe screenshot --list # List available screens
oscribe screenshot --describe # Describe screen content with AI
oscribe windows # List open windows
oscribe focus "Chrome" # Focus window by name
oscribe focus "Calculator" # Works with partial matches
oscribe serve # Start MCP server (stdio transport)
--verbose, -v # Detailed output
--dry-run # Simulate without executing
--quiet, -q # Minimal output
--screen N # Target specific screen (default: 0)
# Take screenshot and save
oscribe screenshot -o desktop.png
# Type with delay between keystrokes
oscribe type "slow typing" --delay 100
# Use second monitor
oscribe screenshot --screen 1 --describe
# Dry run to see what would happen
oscribe type "test" --dry-run
OScribe exposes tools via Model Context Protocol for AI agents. Works with Claude Desktop, Claude Code, Cursor, Windsurf, and any MCP-compatible client.
Edit your config file:
| OS | Config Path |
|---|---|
| Windows | %APPDATA%\Claude\claude_desktop_config.json |
| macOS | ~/Library/Application Support/Claude/claude_desktop_config.json |
Add OScribe to mcpServers:
{
"mcpServers": {
"oscribe": {
"command": "npx",
"args": ["-y", "oscribe", "serve"]
}
}
}
Or if installed globally (npm install -g oscribe):
{
"mcpServers": {
"oscribe": {
"command": "oscribe",
"args": ["serve"]
}
}
}
Then restart Claude Desktop. You'll see a 🔌 icon indicating MCP tools are available.
Add a .mcp.json file in your project root:
{
"mcpServers": {
"oscribe": {
"command": "npx",
"args": ["-y", "oscribe", "serve"]
}
}
}
Or if installed globally:
{
"mcpServers": {
"oscribe": {
"command": "oscribe",
"args": ["serve"]
}
}
}
| Tool | Description | Parameters |
|---|---|---|
os_screenshot | 📸 Capture screenshot + cursor position | screen? (default: 0) |
os_inspect | 🔍 Get UI elements via Windows UI Automation | window? |
os_inspect_at | 🎯 Get element info at coordinates | x, y |
os_move | Move mouse cursor | x, y |
os_click | Click at current cursor position | window?, button? |
os_click_at | Move + click in one action | x, y, window?, button? |
os_type | Type text | text |
os_hotkey | Press keyboard shortcut | keys (e.g., "ctrl+c") |
os_scroll | Scroll in direction | direction, amount? |
os_windows | List open windows + screens | - |
os_focus | Focus window by name | window |
os_wait | Wait for duration (UI loading) | ms (max 30000) |
os_nvda_status | Check NVDA screen reader status (Electron support) | - |
os_nvda_install | Download NVDA portable for Electron apps | - |
os_nvda_start | Start NVDA in silent mode | - |
os_nvda_stop | Stop NVDA screen reader | - |
Once configured, Claude can automate your desktop:
"Take a screenshot and describe what you see"
"Inspect the UI elements and click the Submit button"
"List all windows and focus on Chrome"
"Type 'hello world' and press Ctrl+Enter"
Workflow: Claude uses os_screenshot to see the screen, os_inspect to get element coordinates, then os_move + os_click for precise interaction.
Config directory: ~/.oscribe/
config.json - Application settings{
"defaultScreen": 0,
"dryRun": false,
"logLevel": "info",
"cursorSize": 128
}
| Option | Type | Default | Description |
|---|---|---|---|
defaultScreen | number | 0 | Default monitor to capture |
dryRun | boolean | false | Simulate actions without executing |
logLevel | string | "info" | Log level: debug, info, warn, error |
cursorSize | number | 128 | Cursor size in screenshots (32-256) |
nvda.autoDownload | boolean | false | Auto-download NVDA when needed |
nvda.autoStart | boolean | true | Auto-start NVDA for Electron apps |
nvda.customPath | string | - | Custom NVDA installation path |
OScribe uses a multi-layer approach for desktop automation (Windows):
Screenshot Layer - Captures screen using PowerShell + .NET System.Drawing
UI Automation Layer - Gets element coordinates via Windows accessibility tree:
Input Layer - Uses robotjs for:
Best strategy: Use os_screenshot which returns UI elements with coordinates, then os_move + os_click for precise interaction.
git clone https://github.com/mikealkeal/oscribe.git
cd oscribe
npm install
npm run build # Build TypeScript
npm run dev # Development mode (watch)
npm run typecheck # Type check only
npm run lint # Run ESLint
npm run lint:fix # Fix linting issues
npm run format # Format with Prettier
npm run clean # Remove dist folder
oscribe/
├── bin/
│ └── oscribe.ts # CLI entry point
├── src/
│ ├── core/
│ │ ├── screenshot.ts # Multi-platform screen capture
│ │ ├── input.ts # Mouse/keyboard control (robotjs)
│ │ ├── windows.ts # Window management
│ │ └── uiautomation.ts # Windows UI Automation (accessibility)
│ ├── cli/
│ │ ├── commands/ # CLI command implementations
│ │ └── index.ts # Command registration
│ ├── mcp/
│ │ └── server.ts # MCP server (12 tools)
│ ├── config/
│ │ └── index.ts # Config management with Zod
│ └── index.ts # Main exports
├── package.json
├── tsconfig.json
├── .env.example
└── LICENSE
npm install fails with node-gyp errors:
First, run the diagnostic script (no installation required):
# macOS/Linux
curl -fsSL https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs | node
# Windows (PowerShell)
irm https://raw.githubusercontent.com/mikealkeal/oscribe/main/scripts/doctor.mjs -OutFile doctor.mjs; node doctor.mjs
This is usually due to missing build tools. robotjs requires native compilation.
# Error examples:
# - "gyp ERR! find Python"
# - "gyp ERR! find VS"
# - "node-pre-gyp ERR! build error"
Windows fix:
# 1. Install Python (if missing)
# Download from https://www.python.org/downloads/
# IMPORTANT: Check "Add Python to PATH" during installation
# 2. Install Visual Studio Build Tools
npm install -g windows-build-tools
# Or manually: download from https://visualstudio.microsoft.com/visual-cpp-build-tools/
# Select "Desktop development with C++" workload
# 3. Retry installation
npm install -g oscribe
macOS fix:
# 1. Install Xcode Command Line Tools
xcode-select --install
# 2. Retry installation
npm install -g oscribe
Still failing? Try clearing npm cache:
npm cache clean --force
npm install -g oscribe
Server not starting:
node --version (requires 22+)npm run buildTools not appearing in Claude Desktop:
claude_desktop_config.json syntax (valid JSON)Clicks not working:
UI elements not detected:
os_screenshot to see what's visibleElectron apps showing few UI elements:
Electron/Chromium apps require NVDA screen reader to expose their full accessibility tree:
# Install NVDA portable (one-time)
oscribe nvda install
# Start NVDA silently (no audio)
oscribe nvda start
Or via MCP tools: os_nvda_install → os_nvda_start
NVDA runs in silent mode (no speech, no sounds). The agent will prompt to install NVDA when needed.
Manual NVDA installation:
If you prefer to install NVDA yourself, download from nvaccess.org and set the path in config:
{
"nvda": {
"customPath": "C:/Program Files/NVDA"
}
}
BSL 1.1 (Business Source License 1.1)
See LICENSE for full terms.
Contributions are welcome! Please feel free to submit a Pull Request.
npm run build succeedsnpm run typecheckOScribe is built on top of these great open-source projects:
Maintained by Mickaël Bellun
FAQs
Vision-based desktop automation engine
We found that oscribe demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Security News
The Fable shutdown shows how quickly model access can become a business continuity risk for AI-dependent engineering teams.

Security News
AI agents are pulling packages into environments no scanner is watching, creating exposure before security teams can see it.

Security News
GitHub Actions checkout now blocks risky pull_request_target checkouts by default to help prevent pwn request supply chain attacks.