n8n-nodes-scan-docx

The Auto Scan DOCX and Image node scans DOCX and image files, extracts text via OCR, and classifies the content. Notifications can be sent after processing.

Version: 0.1.0 (latest)
Source: npm
Weekly downloads: 5
Maintainers: 0

Auto Scan DOCX and Image Node for n8n

The Auto Scan DOCX and Image node for n8n extracts text from DOCX files and from image files (via Optical Character Recognition, OCR), then classifies the extracted content. Because it handles both input types, it can serve as a single entry point in document automation workflows.

Features:

  • OCR for Image Files: Automatically scan and extract text from image files using OCR.
  • DOCX Extraction: Extract text content from DOCX files for further processing.
  • Document Classification: Classify extracted content by predefined attributes such as department and priority.
  • Notification Support: Optionally send notifications (e.g., via email, SMS, or print) once processing is complete.

Installation

To install this custom node, follow these steps:

  • Clone or download this repository.
  • Follow the official n8n custom node installation guide.
  • Install the necessary dependencies for OCR and DOCX extraction:
    • For OCR: Ensure Tesseract.js is installed and properly configured.
    • For DOCX extraction: Install Mammoth.js.
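The split between the two libraries can be pictured as a small dispatch step. The following is a minimal sketch, not the node's actual code: the names `InputType`, `Extractor`, `selectExtractor`, `stubExtractors`, and `extractText` are hypothetical, and the extractors are stubs so the example runs without Tesseract.js or Mammoth.js installed.

```typescript
// Hypothetical dispatch between the two extraction paths. None of these
// names come from the package itself.
type InputType = "image" | "docx";

// An extractor takes a file path or URL plus an OCR language code and
// resolves to the extracted text. The language only matters for OCR.
type Extractor = (source: string, language: string) => Promise<string>;

type ExtractorMap = Record<InputType, Extractor>;

// Pick the extractor that matches the configured input type.
function selectExtractor(inputType: InputType, extractors: ExtractorMap): Extractor {
  return extractors[inputType];
}

// Stub extractors so the sketch runs without extra packages. In a real setup
// the "image" entry would wrap an OCR library such as Tesseract.js and the
// "docx" entry a DOCX reader such as Mammoth.js.
const stubExtractors: ExtractorMap = {
  image: async (source, language) => `[ocr:${language}] text from ${source}`,
  docx: async (source) => `[docx] text from ${source}`,
};

// Run the selected extractor; "eng" mirrors the documented default language.
async function extractText(
  inputType: InputType,
  source: string,
  language: string = "eng",
): Promise<string> {
  const extractor = selectExtractor(inputType, stubExtractors);
  return extractor(source, language);
}

extractText("docx", "https://example.com/document.docx").then((text) => console.log(text));
```

Keeping the extractors behind a common function type means the rest of a workflow does not need to know which library produced the text.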

Configuration

Node Settings

Once you add the Auto Scan DOCX and Image node to your workflow, you can configure the following parameters:

  • Input Type (options):

    • Choose between Image (for OCR) or DOCX (for document extraction).
    • Default: docx
    • Description: Select the input type for scanning (Image or DOCX file).
  • File URL or Path (string):

    • Provide the URL or local file path to the document or image file you wish to process.
    • Default: empty string
    • Description: The path or URL to the file.
  • Language (options):

    • English (eng) or Vietnamese (vie).
    • Default: eng
    • Description: The language to use for OCR processing on image files.
  • Send Notification (boolean):

    • Determines whether to send a notification after the document processing is complete.
    • Default: false
    • Description: If set to true, a notification will be sent after processing is finished.
  • Output Format (options):

    • Choose between JSON or Plain Text for the output format of the extracted data.
    • Default: json
    • Description: Choose the output format for the extracted data.
  • Department Routing (boolean):

    • Automatically route the document to the correct department based on the extracted content.
    • Default: true
    • Description: If set to true, the node will classify the document and route it to the appropriate department.
  • Notification Method (options):

    • Choose the method to notify users or departments about the document status after processing.
    • Options: Email, SMS, or Print.
    • Default: email
    • Description: Select the notification method for alerting users or departments.
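The settings above can be summarized as a typed object with the documented defaults. This is a sketch under assumptions: the field names and the non-default option values (`"image"`, `"text"`, `"sms"`, `"print"`) are hypothetical, since only the defaults `docx`, `eng`, `json`, and `email` are given explicitly.

```typescript
// Hypothetical parameter shape mirroring the Node Settings list.
// Field names are illustrative; the node's internal names may differ.
type InputType = "image" | "docx";
type Language = "eng" | "vie";
type OutputFormat = "json" | "text";
type NotificationMethod = "email" | "sms" | "print";

interface ScanParameters {
  inputType: InputType;
  filePath: string; // URL or local path to the file
  language: Language; // used for OCR on image files
  sendNotification: boolean;
  outputFormat: OutputFormat;
  departmentRouting: boolean;
  notificationMethod: NotificationMethod;
}

// Defaults as documented in the Node Settings list.
const DEFAULT_PARAMETERS: ScanParameters = {
  inputType: "docx",
  filePath: "",
  language: "eng",
  sendNotification: false,
  outputFormat: "json",
  departmentRouting: true,
  notificationMethod: "email",
};

// Merge user-supplied values over the documented defaults.
function withDefaults(overrides: Partial<ScanParameters>): ScanParameters {
  return { ...DEFAULT_PARAMETERS, ...overrides };
}

// Example: OCR a Vietnamese-language image, leaving everything else at default.
const params = withDefaults({ inputType: "image", language: "vie" });
console.log(params);
```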

Example Workflow

Input Data:

{
  "documentUrl": "https://example.com/document.docx",
  "documentType": "docx",
  "outputFormat": "json",
  "departmentRouting": true,
  "notificationMethod": "email"
}

Node Configuration:

Output Data (Example):

If the document is a DOCX file, the output might look like this:

{
  "extractedText": "This document contains financial data that needs to be routed to the finance department.",
  "classifiedData": {
    "department": "Finance",
    "priority": "High",
    "summary": "Extracted key financial information."
  },
  "notification": "Notification sent via email."
}

In this example:

  • extractedText: The raw text extracted from the document or image.
  • classifiedData: A summary of the classification (e.g., department, priority).
  • notification: A message indicating that a notification was sent.
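The output shape can be written down as a type, and department classification can be illustrated with a simple keyword match. The README does not say how the node classifies content, so the keyword table and the `classifyDepartment` function below are purely hypothetical stand-ins, not the node's method.

```typescript
// Output shape taken from the example output in this section.
interface ClassifiedData {
  department: string;
  priority: string;
  summary: string;
}

interface ScanOutput {
  extractedText: string;
  classifiedData: ClassifiedData;
  notification?: string; // assumed optional, since notifications can be disabled
}

// Hypothetical keyword rules; the real classification logic is not documented.
const DEPARTMENT_RULES: { department: string; keywords: string[] }[] = [
  { department: "Finance", keywords: ["financial", "invoice", "budget"] },
  { department: "HR", keywords: ["employee", "payroll"] },
];

// Return the first department whose keywords appear in the text,
// or "Unclassified" when nothing matches.
function classifyDepartment(text: string): string {
  const lower = text.toLowerCase();
  for (const rule of DEPARTMENT_RULES) {
    if (rule.keywords.some((keyword) => lower.includes(keyword))) {
      return rule.department;
    }
  }
  return "Unclassified";
}

const sampleText =
  "This document contains financial data that needs to be routed to the finance department.";
console.log(classifyDepartment(sampleText));
```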

Keywords

n8n-community-node-package

Package last updated on 08 Dec 2024