
ocr-click-plugin
An Appium plugin that uses OCR (Optical Character Recognition) to find and click text elements on mobile device screens. This plugin leverages Tesseract.js for text recognition and Sharp for image enhancement to provide accurate and consistent text detection.
# Clone the repository
git clone <your-repo-url>
cd ocr-click-plugin
# Install dependencies
npm install
# Build the plugin
npm run build
# Install plugin to Appium
npm run install-plugin
# Run development server (uninstall, build, install, and start server)
npm run dev
# Or run individual commands
npm run build
npm run reinstall-plugin
npm run run-server
npm run dev
This starts the Appium server at http://localhost:4723/wd/hub with the OCR click plugin active.
POST /session/{sessionId}/appium/plugin/textclick
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | - | Text to search for and click |
| index | number | No | 0 | Index of the match to click (if multiple matches are found) |
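As a minimal sketch of how a client might assemble the request body described in the table, the helper below applies the documented default for `index` and rejects a missing `text`. The function name `buildTextClickPayload` and its validation rules are hypothetical and are not part of the plugin.

```javascript
// Hypothetical client-side helper (not part of ocr-click-plugin).
// Builds the JSON body for POST /session/{sessionId}/appium/plugin/textclick.
function buildTextClickPayload(text, index = 0) {
  // `text` is the only required parameter.
  if (typeof text !== 'string' || text.length === 0) {
    throw new TypeError('text is required and must be a non-empty string');
  }
  // Assumption: `index` is a zero-based, non-negative integer.
  if (!Number.isInteger(index) || index < 0) {
    throw new RangeError('index must be a non-negative integer');
  }
  return { text, index };
}

// Usage: the second argument is optional and defaults to 0.
const loginPayload = buildTextClickPayload('Login');
const secondMatchPayload = buildTextClickPayload('Submit', 1);
```

Validating on the client side is a design choice here; the server may apply its own checks.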
{
"success": true,
"message": "Clicked on text 'Login' at index 0",
"totalMatches": 2,
"confidence": 87.5
}
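A caller will usually want to check the response before continuing a test. The sketch below reads the fields shown in the sample response above. The helper name `summarizeClickResult` is hypothetical, and the shape of a failure response (`success: false` with a `message`) is an assumption, since only the success case is documented here.

```javascript
// Hypothetical helper (not part of ocr-click-plugin).
// Turns a textclick response into a small summary, or throws on failure.
function summarizeClickResult(result) {
  // Assumption: a failed click is reported as success !== true.
  if (!result || result.success !== true) {
    const reason = result && result.message ? result.message : 'unknown error';
    throw new Error(`OCR click failed: ${reason}`);
  }
  return {
    clicked: true,
    // More than one match means the `index` parameter decided which one was clicked.
    ambiguous: result.totalMatches > 1,
    confidence: result.confidence,
  };
}

// Usage with the sample response from this README.
const summary = summarizeClickResult({
  success: true,
  message: "Clicked on text 'Login' at index 0",
  totalMatches: 2,
  confidence: 87.5,
});
```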
// JavaScript example
const { remote } = require('webdriverio');
const driver = await remote({
hostname: 'localhost',
port: 4723,
path: '/wd/hub',
capabilities: {
platformName: 'Android', // or 'iOS'
automationName: 'UiAutomator2', // or 'XCUITest'
deviceName: 'Your Device',
app: '/path/to/your/app.apk'
}
});
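// Click on the "Login" button by sending a POST request to the plugin endpoint
// (uses the global fetch available in Node.js 18+)
const response = await fetch(
  `http://localhost:4723/wd/hub/session/${driver.sessionId}/appium/plugin/textclick`,
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: 'Login', index: 0 })
  }
);
const result = await response.json();
console.log(result); // { success: true, message: "Clicked on text 'Login' at index 0", ... }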
# First create a session, then use the session ID
curl -X POST http://localhost:4723/wd/hub/session/{sessionId}/appium/plugin/textclick \
-H "Content-Type: application/json" \
-d '{
"text": "Sign Up",
"index": 0
}'
from appium import webdriver
import requests
# Create Appium session
driver = webdriver.Remote(
'http://localhost:4723/wd/hub',
{
'platformName': 'Android',
'automationName': 'UiAutomator2',
'deviceName': 'Your Device',
'app': '/path/to/your/app.apk'
}
)
# Use the OCR click plugin
session_id = driver.session_id
response = requests.post(
f'http://localhost:4723/wd/hub/session/{session_id}/appium/plugin/textclick',
json={
'text': 'Submit',
'index': 0
}
)
result = response.json()
print(f"Clicked with confidence: {result['confidence']}%")
The plugin uses an optimized Tesseract configuration:
const TESSERACT_CONFIG = {
lang: 'eng',
tessedit_char_whitelist: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 .,!?-_@#$%^&*()',
tessedit_pageseg_mode: '6', // Uniform text block
preserve_interword_spaces: '1',
// ... other optimizations
};
The default minimum confidence threshold is 60%. Words recognized with a confidence below this value are filtered out:
const MIN_CONFIDENCE_THRESHOLD = 60;
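The filtering step can be illustrated with a short sketch. The word shape (`{ text, confidence }`) and the function name `filterByConfidence` are assumptions made for illustration; the plugin's actual implementation may differ.

```javascript
const MIN_CONFIDENCE_THRESHOLD = 60;

// Hypothetical sketch: keep only words whose OCR confidence meets the threshold.
// Assumes each recognized word looks like { text: string, confidence: number }.
function filterByConfidence(words, threshold = MIN_CONFIDENCE_THRESHOLD) {
  return words.filter((word) => word.confidence >= threshold);
}

// Illustrative input, not real OCR output.
const recognizedWords = [
  { text: 'Login', confidence: 87.5 },
  { text: 'L0g1n', confidence: 42.1 },
  { text: 'Sign', confidence: 60 },
];

// Words below 60 are dropped; a word at exactly 60 is kept.
const keptWords = filterByConfidence(recognizedWords).map((word) => word.text);
```

Passing a lower `threshold` keeps more low-confidence words, which trades precision for recall when text is hard to read.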
The plugin applies several image processing steps:
If you encounter Sharp compilation errors during installation, especially with Node.js v24+:
# Method 1: Use environment variable
SHARP_IGNORE_GLOBAL_LIBVIPS=1 npm install ocr-click-plugin
# Method 2: Install Sharp separately first
SHARP_IGNORE_GLOBAL_LIBVIPS=1 npm install --include=optional sharp
npm install ocr-click-plugin
# Method 3: For Appium plugin installation
SHARP_IGNORE_GLOBAL_LIBVIPS=1 appium plugin install ocr-click-plugin
Adjust MIN_CONFIDENCE_THRESHOLD if text is not being detected.
ocr-click-plugin/
├── src/
│ └── index.ts # Main plugin implementation
├── dist/ # Compiled JavaScript
├── package.json # Dependencies and scripts
├── tsconfig.json # TypeScript configuration
└── README.md # This file
npm run build
npm test
npm run dev # Full development workflow
npm run build # Compile TypeScript
npm run install-plugin # Install to Appium
npm run reinstall-plugin # Uninstall and reinstall
npm run run-server # Start Appium server
npm run uninstall # Remove from Appium
git checkout -b feature/amazing-feature
git commit -m 'Add some amazing feature'
git push origin feature/amazing-feature
This project is licensed under the ISC License - see the LICENSE file for details.