agent-rdp

CLI tool for AI agents to control Windows Remote Desktop sessions

Latest version: 0.6.5 (npm)
Maintainers: 1

A CLI tool for AI agents to control Windows Remote Desktop sessions, built on IronRDP.

Demo

Claude Code automating SQLite database and table creation via RDP:

https://github.com/user-attachments/assets/91892b39-4edb-412b-b265-55ccd75d7421

Features

  • Connect to RDP servers - Full RDP protocol support with TLS and CredSSP authentication
  • Take screenshots - Capture the remote desktop as PNG or JPEG
  • Mouse control - Click, double-click, right-click, drag, scroll
  • Keyboard input - Type text, press key combinations (Ctrl+C, Alt+Tab, etc.)
  • Clipboard sync - Copy/paste text between local machine and remote Windows
  • Drive mapping - Map local directories as network drives on the remote machine
  • UI Automation - Interact with Windows applications via accessibility API (click, select, toggle, expand)
  • OCR text location - Find text on screen using OCR when UI Automation isn't available
  • JSON output - Structured output for AI agent consumption
  • Session management - Multiple named sessions with automatic daemon lifecycle

Installation

From npm

npm install -g agent-rdp

As a Claude Code skill

npx add-skill https://github.com/thisnick/agent-rdp

From source

git clone https://github.com/thisnick/agent-rdp
cd agent-rdp
pnpm install
pnpm build      # Build native binary
pnpm build:ts   # Build TypeScript

Usage

Connect to an RDP Server

# Using command line (password visible in process list - not recommended)
agent-rdp connect --host 192.168.1.100 --username Administrator --password 'secret'

# Using environment variables (recommended)
export AGENT_RDP_USERNAME=Administrator
export AGENT_RDP_PASSWORD=secret
agent-rdp connect --host 192.168.1.100

# Using stdin (most secure)
echo 'secret' | agent-rdp connect --host 192.168.1.100 --username Administrator --password-stdin

Take a Screenshot

# Save to file
agent-rdp screenshot --output desktop.png

# Output as base64 (for AI agents)
agent-rdp screenshot --base64

# With JSON output
agent-rdp --json screenshot --base64

Mouse Operations

# Click at position
agent-rdp mouse click 500 300

# Right-click
agent-rdp mouse right-click 500 300

# Double-click
agent-rdp mouse double-click 500 300

# Move cursor
agent-rdp mouse move 100 200

# Drag from (100,100) to (500,500)
agent-rdp mouse drag 100 100 500 500

Keyboard Operations

# Type text (supports Unicode)
agent-rdp keyboard type "Hello, World!"

# Press key combinations
agent-rdp keyboard press "ctrl+c"
agent-rdp keyboard press "alt+tab"
agent-rdp keyboard press "ctrl+shift+esc"

# Press single keys (use press command)
agent-rdp keyboard press enter
agent-rdp keyboard press escape
agent-rdp keyboard press f5
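
The combination strings above follow a modifier+modifier+key shape. As a rough sketch of how a caller might validate such a string before passing it to keyboard press, the hypothetical helper below splits it into modifiers and a final key. The modifier list is an assumption drawn from the combinations shown in this README; agent-rdp itself may accept other names.

```typescript
// Hypothetical helper, not part of agent-rdp.
// Assumed modifier names: ctrl, alt, shift (shown in this README) plus win.
const MODIFIERS = ["ctrl", "alt", "shift", "win"];

interface KeyCombo {
  modifiers: string[];
  key: string;
}

function parseKeyCombo(combo: string): KeyCombo {
  const parts = combo.toLowerCase().split("+").map((p) => p.trim());
  // The last segment is the key; everything before it must be a modifier.
  const key = parts[parts.length - 1];
  const modifiers = parts.slice(0, -1);
  const unknown = modifiers.filter((m) => MODIFIERS.indexOf(m) === -1);
  if (key === "" || unknown.length > 0) {
    throw new Error(`Unrecognized key combination: ${combo}`);
  }
  return { modifiers, key };
}

console.log(parseKeyCombo("ctrl+shift+esc"));
console.log(parseKeyCombo("enter"));
```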

Scroll

agent-rdp scroll up --amount 3
agent-rdp scroll down --amount 5
agent-rdp scroll left
agent-rdp scroll right

Locate (OCR)

Find text on screen using OCR (powered by ocrs). Useful when UI Automation can't access certain elements (WebView content, some dialogs).

# Find lines containing text
agent-rdp locate "Cancel"

# Pattern matching (glob-style)
agent-rdp locate "Save*" --pattern

# Get all text on screen
agent-rdp locate --all

# JSON output
agent-rdp locate "OK" --json

Returns text lines with coordinates for clicking:

Found 1 line(s) containing 'Cancel':
  'Cancel Button' at (650, 420) size 80x14 - center: (690, 427)

To click the first match: agent-rdp mouse click 690 427
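
If only the plain-text output is available, a result line can be turned into click coordinates with a small parser. The sketch below is a hypothetical helper that assumes every result line has exactly the shape of the sample above; where possible, the --json output is likely a sturdier input than scraped text.

```typescript
// Hypothetical helper, not part of agent-rdp. Parses one result line in the
// text format shown in the sample output.
interface LocateMatch {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
  centerX: number;
  centerY: number;
}

const LOCATE_LINE =
  /'(.*)' at \((\d+), (\d+)\) size (\d+)x(\d+) - center: \((\d+), (\d+)\)/;

function parseLocateLine(line: string): LocateMatch | null {
  const m = LOCATE_LINE.exec(line);
  if (m === null) return null; // header lines and hints do not match
  const [x, y, width, height, centerX, centerY] = m.slice(2).map(Number);
  return { text: m[1], x, y, width, height, centerX, centerY };
}

const match = parseLocateLine(
  "  'Cancel Button' at (650, 420) size 80x14 - center: (690, 427)"
);
if (match !== null) {
  // In the sample, the center equals the box origin plus half its size.
  console.log(`agent-rdp mouse click ${match.centerX} ${match.centerY}`);
}
```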

Clipboard

# Set clipboard text (available when you paste on Windows)
agent-rdp clipboard set "Hello from CLI"

# Get clipboard text (after copying on Windows)
agent-rdp clipboard get

# With JSON output
agent-rdp --json clipboard get

Drive Mapping

Map local directories as network drives on the remote Windows machine. Drives must be mapped at connect time. Multiple drives can be specified.

# Map local directories during connection
agent-rdp connect --host 192.168.1.100 -u Administrator -p secret \
  --drive /home/user/documents:Documents \
  --drive /tmp/shared:Shared

# List mapped drives
agent-rdp drive list

On the remote Windows machine, mapped drives appear in File Explorer as network locations.
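
Each --drive value above pairs a local path with a share name. As an illustration, the hypothetical helper below converts such a string into the { path, name } shape used by the drives option in the Node.js API example. Splitting on the last colon is an assumption made for this sketch, not a statement about how the CLI parses the flag.

```typescript
// Hypothetical helper, not part of agent-rdp.
interface DriveMapping {
  path: string;
  name: string;
}

function parseDriveSpec(spec: string): DriveMapping {
  // Split on the last colon so a path that itself contains a colon survives.
  const i = spec.lastIndexOf(":");
  if (i <= 0 || i === spec.length - 1) {
    throw new Error(`Expected <local path>:<name>, got: ${spec}`);
  }
  return { path: spec.slice(0, i), name: spec.slice(i + 1) };
}

const drives = ["/home/user/documents:Documents", "/tmp/shared:Shared"].map(
  (spec) => parseDriveSpec(spec)
);
console.log(drives);
```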

UI Automation

Interact with Windows applications programmatically via the Windows UI Automation API using native patterns (InvokePattern, SelectionItemPattern, TogglePattern, etc.). When enabled, a PowerShell agent is injected into the remote session that captures the accessibility tree and performs actions. Communication between the CLI and the agent uses a Dynamic Virtual Channel (DVC) for fast bidirectional IPC.

For detailed documentation, see AUTOMATION.md.

# Connect with automation enabled
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --enable-win-automation

# Take an accessibility tree snapshot (refs are always included)
agent-rdp automate snapshot

# Snapshot filtering options (like agent-browser)
agent-rdp automate snapshot -i              # Interactive elements only
agent-rdp automate snapshot -c              # Compact (remove empty structural elements)
agent-rdp automate snapshot -d 3            # Limit depth to 3 levels
agent-rdp automate snapshot -s "~*Notepad*" # Scope to a window/element
agent-rdp automate snapshot -i -c -d 5      # Combine options

# Pattern-based element operations (refs use @eN format)
agent-rdp automate click "#SaveButton"     # Click button
agent-rdp automate click "@e5"             # Click by ref number from snapshot
agent-rdp automate click "@e5" -d          # Double-click (for file list items)
agent-rdp automate select "@e10"           # Select item (SelectionItemPattern)
agent-rdp automate toggle "@e7"            # Toggle checkbox (TogglePattern)
agent-rdp automate expand "@e3"            # Expand menu (ExpandCollapsePattern)
agent-rdp automate context-menu "@e5"      # Open context menu (Shift+F10)

# Fill text fields
agent-rdp automate fill ".Edit" "Hello World"

# Window operations
agent-rdp automate window list
agent-rdp automate window focus "~*Notepad*"

# Run PowerShell commands
agent-rdp automate run "Get-Process" --wait
agent-rdp automate run "Get-Process" --wait --process-timeout 5000  # With 5s timeout

Selector Types:

  • @e5 or @5 - Reference number from snapshot (e prefix recommended)
  • #SaveButton - Automation ID
  • .Edit - Win32 class name
  • ~*pattern* - Wildcard name match
  • File - Element name (exact match)
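
The prefix rules in this list can be restated as a small classifier. The sketch below is only an illustration of the rules as written here; the type and function names are hypothetical and this is not the tool's own selector parser.

```typescript
// Hypothetical classifier mirroring the selector list above.
type Selector =
  | { kind: "ref"; ref: number }
  | { kind: "automationId"; id: string }
  | { kind: "className"; className: string }
  | { kind: "wildcardName"; pattern: string }
  | { kind: "exactName"; name: string };

function classifySelector(selector: string): Selector {
  // "@e5" and "@5" both refer to snapshot reference 5.
  const ref = /^@e?(\d+)$/.exec(selector);
  if (ref !== null) return { kind: "ref", ref: Number(ref[1]) };
  if (selector.startsWith("#")) return { kind: "automationId", id: selector.slice(1) };
  if (selector.startsWith(".")) return { kind: "className", className: selector.slice(1) };
  if (selector.startsWith("~")) return { kind: "wildcardName", pattern: selector.slice(1) };
  // Anything without a recognized prefix is treated as an exact element name.
  return { kind: "exactName", name: selector };
}

for (const s of ["@e5", "@5", "#SaveButton", ".Edit", "~*Notepad*", "File"]) {
  console.log(s, "->", classifySelector(s).kind);
}
```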

Snapshot Output Format:

- Window "Notepad" [ref=e1, id=Notepad]
  - MenuBar "Application" [ref=e2]
    - MenuItem "File" [ref=e3]
  - Edit "Text Editor" [ref=e5, value="Hello"]
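
An agent that consumes the text snapshot may want it as structured data. The sketch below is a hypothetical parser that assumes every line follows the four-line sample above: two spaces of indentation per level, a role, a quoted name, and a bracketed attribute list. Real snapshots may contain shapes this does not handle.

```typescript
// Hypothetical parser, not part of agent-rdp.
interface SnapshotNode {
  depth: number;
  role: string;
  name: string;
  attrs: Record<string, string>;
}

const NODE_LINE = /^( *)- (\w+) "([^"]*)" \[(.*)\]$/;

function parseSnapshot(text: string): SnapshotNode[] {
  const nodes: SnapshotNode[] = [];
  for (const line of text.split("\n")) {
    const m = NODE_LINE.exec(line);
    if (m === null) continue; // skip lines that do not match the assumed shape
    const attrs: Record<string, string> = {};
    // Attributes look like key=value or key="quoted value", comma-separated.
    const attrRe = /(\w+)=(?:"([^"]*)"|([^,\]]+))/g;
    let a: RegExpExecArray | null;
    while ((a = attrRe.exec(m[4])) !== null) {
      attrs[a[1]] = a[2] !== undefined ? a[2] : a[3];
    }
    nodes.push({ depth: m[1].length / 2, role: m[2], name: m[3], attrs });
  }
  return nodes;
}

const sample = [
  '- Window "Notepad" [ref=e1, id=Notepad]',
  '  - MenuBar "Application" [ref=e2]',
  '    - MenuItem "File" [ref=e3]',
  '  - Edit "Text Editor" [ref=e5, value="Hello"]',
].join("\n");

// Collect selectors for every Edit control in the snapshot.
const editRefs = parseSnapshot(sample)
  .filter((n) => n.role === "Edit")
  .map((n) => "@" + n.attrs.ref);
console.log(editRefs);
```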

Session Management

# List active sessions
agent-rdp session list

# Get current session info
agent-rdp session info

# Close a session
agent-rdp session close

# Use a named session
agent-rdp --session work connect --host work-pc.local ...
agent-rdp --session work screenshot

Disconnect

agent-rdp disconnect

Web Viewer

Open the web-based viewer to see the remote desktop in your browser:

# Open viewer (connects to default streaming port 9224)
agent-rdp view

# Specify the port explicitly (use the port your session streams on)
agent-rdp view --port 9224

The viewer requires WebSocket streaming to be enabled. Start a session with streaming:

agent-rdp --stream-port 9224 connect --host 192.168.1.100 -u Admin -p secret
agent-rdp view

JSON Output

All commands support --json for structured output:

agent-rdp --json screenshot --base64

Success response:

{
  "success": true,
  "data": {
    "type": "screenshot",
    "width": 1920,
    "height": 1080,
    "format": "png",
    "base64": "iVBORw0KGgo..."
  }
}

Error response:

{
  "success": false,
  "error": {
    "code": "not_connected",
    "message": "Not connected to an RDP server"
  }
}
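
The two response shapes can be modeled as a discriminated union keyed on success. The sketch below shows one way a caller might unwrap the CLI's JSON output. The type names and the unwrap helper are hypothetical, and the field list for screenshot data is taken only from the sample above.

```typescript
// Hypothetical types and helper for the response envelope shown above.
interface RdpSuccess<T> {
  success: true;
  data: T;
}

interface RdpFailure {
  success: false;
  error: { code: string; message: string };
}

type RdpResponse<T> = RdpSuccess<T> | RdpFailure;

interface ScreenshotData {
  type: string;
  width: number;
  height: number;
  format: string;
  base64: string;
}

// Return the payload on success; throw with the error code on failure.
function unwrap<T>(raw: string): T {
  const parsed = JSON.parse(raw) as RdpResponse<T>;
  if (parsed.success) return parsed.data;
  throw new Error(`${parsed.error.code}: ${parsed.error.message}`);
}

const ok =
  '{"success":true,"data":{"type":"screenshot","width":1920,"height":1080,"format":"png","base64":"iVBORw0KGgo..."}}';
const shot = unwrap<ScreenshotData>(ok);
console.log(`${shot.width}x${shot.height} ${shot.format}`);
```

Checking success before touching data keeps error handling in one place, which suits an agent loop that runs many commands in sequence.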

Environment Variables

Variable                 Description
AGENT_RDP_HOST           RDP server hostname or IP
AGENT_RDP_PORT           RDP server port (default: 3389)
AGENT_RDP_USERNAME       RDP username
AGENT_RDP_PASSWORD       RDP password
AGENT_RDP_SESSION        Session name (default: "default")
AGENT_RDP_STREAM_PORT    WebSocket streaming port (0 = disabled)
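
A wrapper script may want to read these variables itself, for example to log which session it is about to use. The sketch below applies the documented defaults for the port and session name. Treating an unset AGENT_RDP_STREAM_PORT as 0 is an assumption of this sketch, and readEnv is a hypothetical helper rather than part of the package.

```typescript
// Hypothetical helper, not part of agent-rdp.
interface EnvConfig {
  host: string | undefined;
  port: number;
  username: string | undefined;
  password: string | undefined;
  session: string;
  streamPort: number; // 0 means streaming is disabled
}

type Env = Record<string, string | undefined>;

function readEnv(env: Env): EnvConfig {
  return {
    host: env.AGENT_RDP_HOST,
    port: Number(env.AGENT_RDP_PORT || "3389"), // documented default
    username: env.AGENT_RDP_USERNAME,
    password: env.AGENT_RDP_PASSWORD,
    session: env.AGENT_RDP_SESSION || "default", // documented default
    streamPort: Number(env.AGENT_RDP_STREAM_PORT || "0"), // assumed when unset
  };
}

const config = readEnv({
  AGENT_RDP_HOST: "192.168.1.100",
  AGENT_RDP_SESSION: "work",
});
console.log(config.host, config.port, config.session, config.streamPort);
```

In a Node.js script, process.env can be passed as the argument.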

Node.js API

Use agent-rdp programmatically from Node.js/TypeScript:

import { RdpSession } from 'agent-rdp';

const rdp = new RdpSession({ session: 'default' });

await rdp.connect({
  host: '192.168.1.100',
  username: 'Administrator',
  password: 'secret',
  width: 1280,
  height: 800,
  drives: [{ path: '/tmp/share', name: 'Share' }],
  enableWinAutomation: true,  // Enable UI Automation
});

// Screenshot
const { base64, width, height } = await rdp.screenshot({ format: 'png' });

// Mouse
await rdp.mouse.click({ x: 100, y: 200 });
await rdp.mouse.rightClick({ x: 100, y: 200 });
await rdp.mouse.doubleClick({ x: 100, y: 200 });
await rdp.mouse.move({ x: 150, y: 250 });
await rdp.mouse.drag({ from: { x: 100, y: 100 }, to: { x: 500, y: 500 } });

// Keyboard
await rdp.keyboard.type({ text: 'Hello World' });
await rdp.keyboard.press({ keys: 'ctrl+c' });
await rdp.keyboard.press({ keys: 'enter' });  // Single keys use press()

// Scroll
await rdp.scroll.up();                    // Default amount: 3
await rdp.scroll.down({ amount: 5 });     // Custom amount
await rdp.scroll.up({ x: 500, y: 300 });  // Scroll at position

// Clipboard
await rdp.clipboard.set({ text: 'text to copy' });
const text = await rdp.clipboard.get();

// Locate text using OCR
const matches = await rdp.locate({ text: 'Cancel' });
if (matches.length > 0) {
  await rdp.mouse.click({ x: matches[0].center_x, y: matches[0].center_y });
}

// Get all text on screen
const allText = await rdp.locate({ all: true });

// Automation (requires enableWinAutomation: true at connect; --enable-win-automation on the CLI)
const snapshot = await rdp.automation.snapshot({ interactive: true });
await rdp.automation.click('@e5');           // Click button by ref
await rdp.automation.click('@e5', { doubleClick: true }); // Double-click
await rdp.automation.select('@e10');         // Select item
await rdp.automation.toggle('@e7');          // Toggle checkbox
await rdp.automation.expand('@e3');          // Expand menu
await rdp.automation.contextMenu('@e5');     // Open context menu
await rdp.automation.fill('#input', 'text'); // Fill text field
await rdp.automation.run('notepad.exe');     // Run command
await rdp.automation.waitFor('#SaveButton', { timeout: 5000 });

// Window management
const windows = await rdp.automation.listWindows();
await rdp.automation.focusWindow('~*Notepad*');
await rdp.automation.maximizeWindow();

// Drives
const drives = await rdp.drives.list();

// Session info
const info = await rdp.getInfo();

// Disconnect
await rdp.disconnect();

WebSocket Streaming

Enable WebSocket streaming for real-time screen capture and bidirectional clipboard support:

const rdp = new RdpSession({
  session: 'viewer',
  streamPort: 9224,  // Enable streaming
});

await rdp.connect({...});

// Connect your WebSocket client to receive JPEG frames
const streamUrl = rdp.getStreamUrl(); // "ws://localhost:9224"

For the complete WebSocket protocol specification (message types, clipboard flow, input handling), see WEBSOCKET.md.

Architecture

agent-rdp uses a daemon-per-session architecture:

  • CLI (agent-rdp) - Parses commands and communicates with the daemon
  • Daemon - Maintains the RDP connection and processes commands
  • IPC - Unix sockets (macOS/Linux) or TCP (Windows)

The daemon is automatically started on the first command and persists until explicitly closed or the session times out.

Limitations

UI Automation

  • WebViews: UI Automation cannot interact with WebView content (e.g., Windows Start menu search, Edge browser content, Electron apps). Use Win+R or automate run to launch programs directly instead of clicking through menus.
  • UAC Dialogs: User Account Control elevation prompts run on a secure desktop and are not accessible via UI Automation. There is no good workaround - the remote user must interact with UAC manually, or UAC must be disabled (not recommended for security reasons).

OCR Fallback

When UI Automation cannot access certain elements, the locate command provides OCR-based text detection:

agent-rdp locate "Button Text"    # Find text and get coordinates
agent-rdp mouse click <x> <y>     # Click at returned coordinates

OCR is not fully reliable: it can misread characters, miss text, or return imprecise coordinates. It may still work for simple cases such as dialog buttons.
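
In the Node.js API, the same fallback could be sketched as follows. The method names (automation.click, locate, mouse.click) and the center_x / center_y fields mirror the Node.js API examples in this README, but the SessionLike types are assumptions, clickWithFallback is a hypothetical helper, and the sketch assumes automation.click rejects when the element cannot be reached.

```typescript
// Minimal structural view of the session methods used here (assumed types).
interface OcrMatch {
  center_x: number;
  center_y: number;
}

interface SessionLike {
  automation: { click(selector: string): Promise<unknown> };
  locate(query: { text: string }): Promise<OcrMatch[]>;
  mouse: { click(point: { x: number; y: number }): Promise<unknown> };
}

// Hypothetical helper: try UI Automation first, then fall back to OCR.
async function clickWithFallback(
  rdp: SessionLike,
  selector: string,
  visibleText: string
): Promise<"automation" | "ocr" | "not-found"> {
  try {
    await rdp.automation.click(selector);
    return "automation";
  } catch {
    // Automation failed (assumed to reject); look for the text on screen.
    const matches = await rdp.locate({ text: visibleText });
    if (matches.length === 0) return "not-found";
    await rdp.mouse.click({ x: matches[0].center_x, y: matches[0].center_y });
    return "ocr";
  }
}
```

Returning which path succeeded lets the caller log how often OCR was needed and take a verification screenshot after OCR-based clicks.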

Screenshot Coordinate Detection

Claude models (in non-computer-use mode, such as Claude Code) are poor at estimating pixel coordinates from screenshots. Do not ask Claude to look at a screenshot and guess where to click - it will likely be inaccurate.

Gemini models are generally good at pixel coordinate estimation from images.

If you need vision-based coordinate detection with Claude, implement your own harness using Claude's Computer Use Tool, which is designed for this purpose.

Requirements

  • Rust 1.75 or later
  • Target RDP server with Network Level Authentication (NLA) enabled

License

MIT OR Apache-2.0 (same as IronRDP)

Keywords

rdp

Package last updated on 26 Feb 2026
