ocr-click-plugin

An Appium plugin that uses OCR (Optical Character Recognition) to find and click text elements on mobile device screens

2.1.6
OCR Click Plugin

An Appium plugin that uses OCR (Optical Character Recognition) to find and click text elements on mobile device screens. This plugin leverages Tesseract.js for text recognition and Sharp for image enhancement to provide accurate and consistent text detection.

Features

  • 🔍 Advanced OCR: Uses Tesseract.js with optimized configuration for mobile screens
  • 🖼️ Image Enhancement: Preprocessing with Sharp for better text recognition
  • 🎯 Confidence Filtering: Only considers text matches above configurable confidence threshold
  • 📱 Cross-Platform: Works with both iOS (XCUITest) and Android (UiAutomator2) drivers
  • 🔧 Configurable: Customizable OCR parameters and image processing options
  • 📊 Detailed Logging: Progress tracking and confidence scores for debugging

Installation

Prerequisites

  • Node.js 14+
  • Appium 2.x
  • iOS/Android drivers installed

Install the Plugin

# Clone the repository
git clone <your-repo-url>
cd ocr-click-plugin

# Install dependencies
npm install

# Build the plugin
npm run build

# Install plugin to Appium
npm run install-plugin

Development Setup

# Run development server (uninstall, build, install, and start server)
npm run dev

# Or run individual commands
npm run build
npm run reinstall-plugin
npm run run-server

Usage

Starting the Server

npm run dev

This starts the Appium server at http://localhost:4723/wd/hub with the OCR click plugin active.

API Endpoint

POST /session/{sessionId}/appium/plugin/textclick

Parameters

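| Parameter | Type   | Required | Default | Description                                                  |
|-----------|--------|----------|---------|--------------------------------------------------------------|
| text      | string | Yes      | -       | Text to search for and click                                 |
| index     | number | No       | 0       | Index of the match to click (if multiple matches are found)  |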

Response

{
  "success": true,
  "message": "Clicked on text 'Login' at index 0",
  "totalMatches": 2,
  "confidence": 87.5
}
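
On the client side, the response fields above can be read defensively. The following is a minimal sketch, not part of the plugin: `TextClickResult` and `parse_textclick_response` are hypothetical names introduced for illustration, and the unwrapping of a `value` envelope is an assumption (Appium commonly wraps command results that way, but this README does not say whether the endpoint does).

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class TextClickResult:
    """Fields taken from the sample response shown above."""
    success: bool
    message: str
    total_matches: int
    confidence: float


def parse_textclick_response(body: Dict[str, Any]) -> TextClickResult:
    """Turn the decoded JSON body into a typed result.

    Assumption: if the server wraps the payload as {"value": {...}},
    unwrap it; otherwise treat the body itself as the payload.
    """
    payload = body["value"] if isinstance(body.get("value"), dict) else body
    return TextClickResult(
        success=bool(payload.get("success", False)),
        message=str(payload.get("message", "")),
        total_matches=int(payload.get("totalMatches", 0)),
        confidence=float(payload.get("confidence", 0.0)),
    )


# The sample response from this README, as a Python dict.
sample = {
    "success": True,
    "message": "Clicked on text 'Login' at index 0",
    "totalMatches": 2,
    "confidence": 87.5,
}
result = parse_textclick_response(sample)
print(result.total_matches, result.confidence)
```

Missing fields fall back to neutral defaults here, so a caller can check `result.success` without guarding every key.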

Examples

Using with WebDriver Client

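// JavaScript example (global fetch requires Node.js 18+)
const { remote } = require('webdriverio');

async function main() {
  const driver = await remote({
    hostname: 'localhost',
    port: 4723,
    path: '/wd/hub',
    capabilities: {
      platformName: 'Android', // or 'iOS'
      'appium:automationName': 'UiAutomator2', // or 'XCUITest'
      'appium:deviceName': 'Your Device',
      'appium:app': '/path/to/your/app.apk'
    }
  });

  // Click on the "Login" button by POSTing to the plugin endpoint
  const response = await fetch(
    `http://localhost:4723/wd/hub/session/${driver.sessionId}/appium/plugin/textclick`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: 'Login', index: 0 })
    }
  );
  const result = await response.json();

  console.log(result); // { success: true, message: "Clicked on text 'Login' at index 0", ... }

  // End the session when done
  await driver.deleteSession();
}

main().catch(console.error);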

Using with cURL

# First create a session, then use the session ID
curl -X POST http://localhost:4723/wd/hub/session/{sessionId}/appium/plugin/textclick \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Sign Up",
    "index": 0
  }'

Python Example

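from appium import webdriver
from appium.options.android import UiAutomator2Options
import requests

# Create Appium session (recent Appium-Python-Client releases take
# capabilities through an options object rather than a plain dict)
options = UiAutomator2Options().load_capabilities({
    'platformName': 'Android',
    'appium:automationName': 'UiAutomator2',
    'appium:deviceName': 'Your Device',
    'appium:app': '/path/to/your/app.apk'
})
driver = webdriver.Remote('http://localhost:4723/wd/hub', options=options)

# Use the OCR click plugin
session_id = driver.session_id
response = requests.post(
    f'http://localhost:4723/wd/hub/session/{session_id}/appium/plugin/textclick',
    json={
        'text': 'Submit',
        'index': 0
    }
)
response.raise_for_status()  # Fail loudly on HTTP errors

result = response.json()
print(f"Clicked with confidence: {result['confidence']}%")

# End the session when done
driver.quit()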

Configuration

OCR Settings

The plugin uses optimized Tesseract configuration:

const TESSERACT_CONFIG = {
  lang: 'eng',
  tessedit_char_whitelist: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 .,!?-_@#$%^&*()',
  tessedit_pageseg_mode: '6', // Uniform text block
  preserve_interword_spaces: '1',
  // ... other optimizations
};
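
One practical consequence of a character whitelist is that search text containing a character outside it is unlikely to be recognized verbatim. The sketch below is a hypothetical pre-flight check, not something the plugin provides: `chars_outside_whitelist` is an illustrative name, and the whitelist string is copied from the configuration above on the assumption that it is applied exactly as written.

```python
# Whitelist copied from the TESSERACT_CONFIG shown above.
WHITELIST = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    " .,!?-_@#$%^&*()"
)


def chars_outside_whitelist(text: str, whitelist: str = WHITELIST) -> list:
    """Return the distinct characters of `text` that are not whitelisted,
    in order of first appearance."""
    allowed = set(whitelist)
    outside = []
    for ch in text:
        if ch not in allowed and ch not in outside:
            outside.append(ch)
    return outside


print(chars_outside_whitelist("Sign Up"))      # every character is whitelisted
print(chars_outside_whitelist("Total: 5/10"))  # ':' and '/' are not
```

If the check returns anything, searching for a shorter fragment of the label that avoids those characters may be more reliable.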

Confidence Threshold

The default minimum confidence threshold is 60%. Words recognized with a lower confidence are filtered out:

const MIN_CONFIDENCE_THRESHOLD = 60;
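
The filtering and index selection described here can be sketched as follows. This is an illustration under stated assumptions, not the plugin's source: the word records (`text`, `confidence`) are an assumed shape modeled on typical OCR word output, `find_matches` and `pick_match` are hypothetical names, and the sketch compares whole words case-insensitively because the README does not say whether the plugin matches whole words or substrings.

```python
MIN_CONFIDENCE_THRESHOLD = 60  # default value quoted above


def find_matches(words, target, min_confidence=MIN_CONFIDENCE_THRESHOLD):
    """Keep words at or above the threshold whose text equals `target`,
    ignoring case and surrounding whitespace."""
    wanted = target.strip().lower()
    return [
        w for w in words
        if w["confidence"] >= min_confidence
        and w["text"].strip().lower() == wanted
    ]


def pick_match(words, target, index=0, min_confidence=MIN_CONFIDENCE_THRESHOLD):
    """Return the match at `index`, mirroring the `index` request parameter."""
    matches = find_matches(words, target, min_confidence)
    if not 0 <= index < len(matches):
        raise IndexError(
            f"index {index} out of range: {len(matches)} match(es) for {target!r}"
        )
    return matches[index]


# Made-up OCR output for illustration.
words = [
    {"text": "Login", "confidence": 91.2},
    {"text": "login", "confidence": 42.0},   # below threshold, dropped
    {"text": "LOGIN", "confidence": 60.0},   # exactly at threshold, kept
    {"text": "Logout", "confidence": 95.0},  # different word
]
print(len(find_matches(words, "login")))
print(pick_match(words, "Login", index=1)["text"])
```

Lowering `min_confidence` admits the 42.0 entry as a third match, which is the trade-off the threshold controls: more candidates, but less certainty that each one is real text.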

Image Enhancement

The plugin applies several image processing steps:

  • Grayscale conversion - Removes color information, leaving only brightness for OCR
  • Normalization - Enhances contrast
  • Sharpening - Improves text clarity
  • Gamma correction - Better text contrast
  • Median filtering - Removes noise
  • Binary thresholding - Clear text separation
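
To make the effect of these steps concrete, here is a dependency-free sketch that applies four of them (grayscale, normalization, gamma, binary threshold) to a tiny grid of RGB pixels. It is not the plugin's Sharp code and does not use Sharp's API: the function names, the gamma formula and value, and the cutoff of 128 are all assumptions chosen for illustration, and sharpening and median filtering are omitted for brevity.

```python
def to_grayscale(rgb_rows):
    """Collapse RGB pixels to luma using the ITU-R BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_rows]


def normalize(gray):
    """Stretch values linearly so the darkest pixel is 0 and the brightest 255."""
    flat = [v for row in gray for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in gray]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in gray]


def gamma_correct(gray, gamma=1.5):
    """Apply out = 255 * (in / 255) ** gamma (gamma > 1 darkens midtones)."""
    return [[round(255 * (v / 255) ** gamma) for v in row] for row in gray]


def threshold(gray, cutoff=128):
    """Binary threshold: values at or above the cutoff become 255, others 0."""
    return [[255 if v >= cutoff else 0 for v in row] for row in gray]


def enhance(rgb_rows, gamma=1.5, cutoff=128):
    """Run the four steps in the order listed above."""
    return threshold(gamma_correct(normalize(to_grayscale(rgb_rows)), gamma), cutoff)


# A 3x3 patch: light, slightly uneven background with one dark "ink" pixel.
image = [
    [(200, 200, 200), (210, 210, 210), (200, 200, 200)],
    [(205, 205, 205), (40, 40, 40), (205, 205, 205)],
    [(200, 200, 200), (210, 210, 210), (200, 200, 200)],
]
for row in enhance(image):
    print(row)
```

After thresholding, the uneven background collapses to a single white level and only the dark pixel stays black, which is the "clear text separation" the last step aims for.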

Troubleshooting

Sharp Installation Issues

If you encounter Sharp compilation errors during installation, especially with Node.js v24+:

# Method 1: Use environment variable
SHARP_IGNORE_GLOBAL_LIBVIPS=1 npm install ocr-click-plugin

# Method 2: Install Sharp separately first
SHARP_IGNORE_GLOBAL_LIBVIPS=1 npm install --include=optional sharp
npm install ocr-click-plugin

# Method 3: For Appium plugin installation
SHARP_IGNORE_GLOBAL_LIBVIPS=1 appium plugin install ocr-click-plugin

Text Not Found

  • Check confidence threshold: Lower MIN_CONFIDENCE_THRESHOLD if text is not being detected
  • Verify text spelling: Ensure the text matches exactly (matching is case-insensitive)
  • Check image quality: Poor screenshots may affect OCR accuracy

Inconsistent Results

  • Image enhancement: The plugin includes advanced preprocessing to improve consistency
  • Confidence filtering: Only high-confidence matches are considered
  • Character whitelist: Limits recognition to expected characters

Performance Issues

  • Reduce image size: Large screenshots take longer to process
  • Optimize configuration: Adjust Tesseract parameters for your use case
  • Check device performance: Ensure adequate resources

Development

Project Structure

ocr-click-plugin/
├── src/
│   └── index.ts          # Main plugin implementation
├── dist/                 # Compiled JavaScript
├── package.json          # Dependencies and scripts
├── tsconfig.json         # TypeScript configuration
└── README.md            # This file

Building

npm run build

Testing

npm test

Available Scripts

npm run dev               # Full development workflow
npm run build             # Compile TypeScript
npm run install-plugin    # Install to Appium
npm run reinstall-plugin  # Uninstall and reinstall
npm run run-server        # Start Appium server
npm run uninstall         # Remove from Appium

Technical Details

Dependencies

  • @appium/base-plugin: Appium plugin framework
  • tesseract.js: OCR engine
  • sharp: Image processing
  • typescript: Development language

Supported Platforms

  • ✅ Android (UiAutomator2)
  • ✅ iOS (XCUITest)

Image Processing Pipeline

  • Capture screenshot via Appium driver
  • Convert to grayscale for better OCR
  • Apply normalization and sharpening
  • Gamma correction for text contrast
  • Noise reduction with median filter
  • Binary threshold for clear text separation
  • OCR recognition with Tesseract
  • Confidence filtering and text matching
  • Coordinate calculation and click action
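
The final coordinate calculation can be sketched as taking the centre of the matched word's bounding box. This is a hedged illustration, not the plugin's code: the `bbox` shape (`x0`, `y0`, `x1`, `y1` in screenshot pixels) is an assumed OCR output format, `click_point` is a hypothetical name, and the `scale` parameter reflects the assumption that some devices report tap coordinates in units smaller than screenshot pixels.

```python
def click_point(bbox, scale=1.0):
    """Return the (x, y) centre of a bounding box, divided by `scale`.

    `bbox` holds the left/top (x0, y0) and right/bottom (x1, y1) corners
    in screenshot pixels. `scale` is the assumed ratio of screenshot
    pixels to the coordinate units the driver expects for taps.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    centre_x = (bbox["x0"] + bbox["x1"]) / 2
    centre_y = (bbox["y0"] + bbox["y1"]) / 2
    return round(centre_x / scale), round(centre_y / scale)


# Made-up bounding box for a word found in a screenshot.
bbox = {"x0": 300, "y0": 1200, "x1": 480, "y1": 1260}
print(click_point(bbox))           # centre in screenshot pixels
print(click_point(bbox, scale=3))  # same point if taps use 1/3-size units
```

Tapping the centre rather than a corner leaves the most margin for small errors in the detected box.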

Contributing

  • Fork the repository
  • Create your feature branch (git checkout -b feature/amazing-feature)
  • Commit your changes (git commit -m 'Add some amazing feature')
  • Push to the branch (git push origin feature/amazing-feature)
  • Open a Pull Request

License

This project is licensed under the ISC License - see the LICENSE file for details.

Changelog

Version 1.0.0

  • Initial release with OCR text detection and clicking
  • Advanced image preprocessing for better accuracy
  • Confidence-based filtering for consistent results
  • Support for multiple text matches with index selection
  • Comprehensive logging and error handling

Keywords

appium

Package last updated on 23 Jun 2025
