n8n-nodes-n8ntools-document-processor

Version 4.4.2 (latest), published on npm. Weekly downloads: 51 (-61.36%). Maintainers: 1.

N8N Tools - Document Processor

Process and analyze documents with OCR, text extraction, and format conversion capabilities. This N8N community node provides comprehensive document processing through the N8N Tools platform.

✨ Features

  • 📄 Text Extraction: Extract text from various document formats
  • 🔍 OCR Processing: Extract text from images and scanned documents
  • 🔄 Format Conversion: Convert between PDF, DOCX, TXT, HTML, MD, RTF
  • 📊 Metadata Extraction: Get document properties and information
  • ✂️ Page Splitting: Split documents into individual pages
  • 🔗 Document Merging: Combine multiple documents
  • 🌍 Multi-language OCR: Support for Portuguese, English, Spanish, French, German
  • 💰 Cost Tracking: Usage monitoring and budget controls

🚀 Quick Start

Installation

Install this node in your N8N instance:

  • Go to Settings > Community Nodes in your N8N interface
  • Click Install a community node
  • Enter n8n-nodes-n8ntools-document-processor
  • Click Install

Via npm

npm install n8n-nodes-n8ntools-document-processor

Setup Credentials

  • Sign up at N8N Tools and get your API key
  • In N8N, create new N8N Tools API credentials
  • Enter your API URL: https://api.n8ntools.io
  • Enter your API key

📖 Usage

Supported Operations

| Operation | Description | Input | Output |
|---|---|---|---|
| Extract Text | Extract text content | PDF, DOCX, DOC, RTF | Plain text |
| Extract Metadata | Get document properties | Any document | JSON metadata |
| Convert Format | Change document format | Various formats | PDF, DOCX, TXT, HTML, MD, RTF |
| Split Pages | Split into individual pages | PDF, DOCX | ZIP with pages |
| Merge Documents | Combine multiple documents | Multiple files | Single document |
| OCR Processing | Extract text from images | PDF, images | Text with OCR |

Example Workflow

[File Trigger] → [N8N Tools Document Processor] → [Extract Data] → [Database/Email]

Configuration Example

Invoice Text Extraction:

{
  "operation": "extractText",
  "inputSource": "binaryData",
  "binaryPropertyName": "data",
  "advancedOptions": {
    "extractImages": true,
    "extractTables": true,
    "preserveFormatting": true
  }
}
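
When many workflows share the same settings, it can help to generate this parameter object instead of typing it by hand. The following is a minimal sketch, not part of the package: the helper name `build_extract_text_params` is hypothetical, and only the keys shown in this README (`operation`, `inputSource`, `binaryPropertyName`, `advancedOptions`, and the `password` option) are used.

```python
import json
from typing import Any, Optional


def build_extract_text_params(
    binary_property: str = "data",
    *,
    extract_images: bool = False,
    extract_tables: bool = False,
    preserve_formatting: bool = False,
    password: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble parameters shaped like the README's extractText example.

    Hypothetical helper: it only builds a dict and does not call any API.
    """
    if not binary_property:
        raise ValueError("binary_property must be a non-empty string")

    advanced: dict[str, Any] = {
        "extractImages": extract_images,
        "extractTables": extract_tables,
        "preserveFormatting": preserve_formatting,
    }
    # The password key is only added when one is supplied.
    if password is not None:
        advanced["password"] = password

    return {
        "operation": "extractText",
        "inputSource": "binaryData",
        "binaryPropertyName": binary_property,
        "advancedOptions": advanced,
    }


params = build_extract_text_params(
    extract_images=True, extract_tables=True, preserve_formatting=True
)
print(json.dumps(params, indent=2))
```

Keeping the options as keyword-only arguments makes each call site read like the JSON it produces.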

⚙️ Node Parameters

Input Configuration

  • Input Source: Binary Data, File URL, or Base64
  • Binary Property: Name of binary property (default: "data")
  • File URL: Direct URL to document file
  • Base64 Data: Base64 encoded document content
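
For the Base64 input source, the document bytes have to be encoded before they reach the node. The sketch below shows one way to do that with Python's standard library. The `inputSource` value `"base64"` and the field name `base64Data` are assumptions made for illustration; this README does not spell out the exact identifiers, so check the node's parameter panel before relying on them.

```python
import base64


def to_base64(document: bytes) -> str:
    """Encode raw document bytes as an ASCII Base64 string."""
    return base64.b64encode(document).decode("ascii")


# Stand-in bytes; in practice these would come from a file or an upstream step.
sample = b"%PDF-1.4 minimal sample bytes"
encoded = to_base64(sample)

params = {
    "operation": "extractText",
    "inputSource": "base64",   # assumed identifier, not confirmed by the README
    "base64Data": encoded,     # assumed field name, not confirmed by the README
}
print(params["base64Data"])
```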

Operation-Specific Options

Format Conversion

  • Target Format: PDF, DOCX, TXT, HTML, MD, RTF

Page Splitting

  • Page Range: Specific pages (e.g., "1-5") or "all"
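
To validate a page range before spending a credit, the value can be expanded locally. This is a sketch of one plausible interpretation of the `"1-5"` and `"all"` forms; how the service itself parses the field is not documented here, and the comma-separated form accepted below is an added assumption. The function name `parse_page_range` is hypothetical.

```python
def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Expand a page range such as "1-5" or "all" into 1-based page numbers.

    Comma-separated parts ("2, 4-5") are accepted as an assumption; the
    README only shows a single range and the keyword "all".
    """
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))

    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start or end > page_count:
            raise ValueError(f"invalid page range {part!r} for {page_count} pages")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_range("1-5", page_count=10))
print(parse_page_range("all", page_count=3))
```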

OCR Processing

  • Language: Portuguese, English, Spanish, French, German, Auto-detect

Advanced Options

  • Extract Images: Include images from document
  • Extract Tables: Parse table data
  • Preserve Formatting: Maintain original formatting
  • Password: For password-protected documents

📤 Output Data

Text Extraction Result

{
  "text": "This is the extracted text content...",
  "wordCount": 1250,
  "pageCount": 3,
  "hasImages": true,
  "hasTables": true,
  "images": [
    {
      "page": 1,
      "base64": "iVBORw0KGgoAAAANSUhEUgAA...",
      "format": "png"
    }
  ],
  "tables": [
    {
      "page": 2,
      "rows": 5,
      "columns": 3,
      "data": [["Header1", "Header2", "Header3"], ...]
    }
  ],
  "success": true,
  "operation": "extractText",
  "creditsUsed": 2,
  "originalFilename": "invoice.pdf"
}
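
A downstream step usually needs the tables as records and the images as bytes. The sketch below post-processes a result shaped like the one above. It assumes that the first row of each table's `data` holds the column headers (suggested by the `Header1` placeholders, but not stated), and the sample values are made up for the demonstration.

```python
import base64


def table_to_records(table: dict) -> list[dict]:
    """Turn a table's row arrays into dicts keyed by the first (header) row."""
    header, *rows = table["data"]
    return [dict(zip(header, row)) for row in rows]


def decode_images(result: dict) -> list[tuple[int, str, bytes]]:
    """Return (page, format, raw bytes) for every image in the result."""
    return [
        (image["page"], image["format"], base64.b64decode(image["base64"]))
        for image in result.get("images", [])
    ]


# Made-up sample in the same shape as the documented result.
result = {
    "text": "Invoice 42",
    "images": [
        {"page": 1, "base64": base64.b64encode(b"fake-png").decode("ascii"), "format": "png"}
    ],
    "tables": [
        {"page": 2, "rows": 3, "columns": 2,
         "data": [["Item", "Price"], ["Widget", "9.99"], ["Gadget", "19.50"]]}
    ],
    "success": True,
}

records = [table_to_records(table) for table in result.get("tables", [])]
images = decode_images(result)
print(records[0])
print(images[0][:2])
```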

Format Conversion Result

Returns the converted document as binary data with metadata:

{
  "success": true,
  "operation": "convertFormat",
  "originalFilename": "document.pdf",
  "convertedFilename": "document.docx",
  "targetFormat": "docx",
  "creditsUsed": 1
}
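
If a later step needs to know the output filename in advance, it can be derived from the original name and the target format. This is a sketch under assumptions: the lowercase identifiers other than `docx` are guessed from the list of output formats, and the service may name converted files differently. The helper `converted_filename` is hypothetical.

```python
from pathlib import PurePosixPath

# "docx" appears in the README; the other lowercase identifiers are assumed.
OUTPUT_FORMATS = {"pdf", "docx", "txt", "html", "md", "rtf"}


def converted_filename(original: str, target_format: str) -> str:
    """Swap the file extension for the target format after validating it."""
    fmt = target_format.strip().lower()
    if fmt not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported target format: {target_format!r}")
    return str(PurePosixPath(original).with_suffix("." + fmt))


print(converted_filename("document.pdf", "docx"))
```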

Metadata Extraction Result

{
  "filename": "report.pdf",
  "fileSize": 2048000,
  "mimeType": "application/pdf",
  "pageCount": 15,
  "author": "John Doe",
  "title": "Annual Report 2024",
  "subject": "Company Performance",
  "keywords": ["business", "report", "annual"],
  "creationDate": "2024-01-15T10:30:00Z",
  "modificationDate": "2024-01-16T14:20:00Z",
  "hasPassword": false,
  "isEncrypted": false,
  "success": true
}
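
Metadata like this is often used to route or classify documents. The sketch below derives a few convenience values from the fields shown above; the output keys (`sizeMB`, `hoursBetweenEdits`, `needsPassword`) are names chosen for this example, and the size is computed with decimal megabytes (1 MB = 1,000,000 bytes) as an assumption.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_metadata(meta: dict) -> dict:
    """Derive routing-friendly values from a metadata extraction result."""
    created = parse_timestamp(meta["creationDate"])
    modified = parse_timestamp(meta["modificationDate"])
    return {
        "sizeMB": round(meta["fileSize"] / 1_000_000, 2),
        "hoursBetweenEdits": round((modified - created).total_seconds() / 3600, 2),
        "needsPassword": bool(meta["hasPassword"] or meta["isEncrypted"]),
    }


meta = {
    "fileSize": 2048000,
    "creationDate": "2024-01-15T10:30:00Z",
    "modificationDate": "2024-01-16T14:20:00Z",
    "hasPassword": False,
    "isEncrypted": False,
}
summary = summarize_metadata(meta)
print(summary)
```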

🔧 Supported File Formats

Input Formats

  • PDF: PDF documents (including password-protected)
  • Microsoft Word: DOCX, DOC
  • Text: TXT, RTF
  • Web: HTML, XML
  • Images: PNG, JPG, TIFF (for OCR)

Output Formats

  • PDF: Portable Document Format
  • DOCX: Microsoft Word (newer format)
  • TXT: Plain text
  • HTML: HyperText Markup Language
  • MD: Markdown
  • RTF: Rich Text Format

🔍 OCR Capabilities

Supported Languages

  • Portuguese (por): Optimized for Brazilian Portuguese
  • English (eng): US and UK English
  • Spanish (spa): Latin American and Iberian Spanish
  • French (fra): French language support
  • German (deu): German language support
  • Auto-detect (auto): Automatic language detection

OCR Example

{
  "operation": "ocrProcessing",
  "inputSource": "fileUrl",
  "fileUrl": "https://example.com/scanned-invoice.pdf",
  "ocrLanguage": "por",
  "advancedOptions": {
    "extractTables": true,
    "preserveFormatting": true
  }
}
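
The language names and codes listed above can be kept in one lookup so that a workflow accepts either form. This is a minimal sketch; the helper names `resolve_language` and `build_ocr_params` are hypothetical, and the code only builds the parameter object without contacting the service.

```python
# Language names and codes as listed under "Supported Languages".
OCR_LANGUAGE_CODES = {
    "portuguese": "por",
    "english": "eng",
    "spanish": "spa",
    "french": "fra",
    "german": "deu",
    "auto-detect": "auto",
}


def resolve_language(value: str) -> str:
    """Accept a language name or code and return the OCR code."""
    key = value.strip().lower()
    if key in OCR_LANGUAGE_CODES.values():
        return key
    if key in OCR_LANGUAGE_CODES:
        return OCR_LANGUAGE_CODES[key]
    raise ValueError(f"unsupported OCR language: {value!r}")


def build_ocr_params(file_url: str, language: str = "auto", *,
                     extract_tables: bool = False,
                     preserve_formatting: bool = False) -> dict:
    """Build parameters shaped like the OCR example in this README."""
    return {
        "operation": "ocrProcessing",
        "inputSource": "fileUrl",
        "fileUrl": file_url,
        "ocrLanguage": resolve_language(language),
        "advancedOptions": {
            "extractTables": extract_tables,
            "preserveFormatting": preserve_formatting,
        },
    }


ocr_params = build_ocr_params(
    "https://example.com/scanned-invoice.pdf", "Portuguese",
    extract_tables=True, preserve_formatting=True,
)
print(ocr_params["ocrLanguage"])
```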

🛠️ Advanced Use Cases

Invoice Processing Pipeline

[Email Trigger] → [Download Attachment] → [Extract Text] → [Parse Data] → [Update CRM]

Document Classification

[File Upload] → [Extract Metadata] → [Classify Type] → [Route to Process]

Bulk Document Conversion

[File Monitor] → [Document Processor] → [Convert to PDF] → [Archive]

Contract Analysis

[Document Input] → [Extract Text] → [Find Key Terms] → [Generate Summary]

📊 Processing Examples

Extract Contract Details

// Extract specific information from legal documents
{
  "operation": "extractText",
  "advancedOptions": {
    "extractTables": true,
    "preserveFormatting": true
  }
}
// Then use regex or NLP to find specific clauses
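
As an illustration of the regex step, the sketch below searches extracted text for a few contract terms. The patterns, the clause names, and the sample sentence are all invented for this example; real contracts vary widely, so treat the expressions as a starting point rather than a reliable parser.

```python
import re

# Illustrative patterns only; each captures one value in group 1.
CLAUSE_PATTERNS = {
    "effective_date": re.compile(r"effective as of (\w+ \d{1,2}, \d{4})", re.IGNORECASE),
    "term_months": re.compile(r"term of (\d+) months", re.IGNORECASE),
    "governing_law": re.compile(r"governed by the laws of ([A-Z][\w ]+?)[.,]"),
}


def find_key_terms(text: str) -> dict[str, str]:
    """Return the first match for each clause pattern found in the text."""
    found = {}
    for name, pattern in CLAUSE_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1)
    return found


# Stand-in for the "text" field of an extractText result.
extracted_text = (
    "This Agreement is effective as of March 1, 2024 and continues for a "
    "term of 12 months. It is governed by the laws of New York."
)
terms = find_key_terms(extracted_text)
print(terms)
```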

Convert Legacy Documents

// Convert old DOC files to modern formats
{
  "operation": "convertFormat",
  "targetFormat": "docx"
}

Process Scanned Forms

// OCR processing for form data extraction; extractTables is enabled to capture form fields
{
  "operation": "ocrProcessing",
  "ocrLanguage": "eng",
  "advancedOptions": {
    "extractTables": true
  }
}

💸 Pricing & Limits

  • Text Extraction: 1 credit per document
  • Format Conversion: 1 credit per conversion
  • OCR Processing: 2 credits per document
  • Page Splitting: 1 credit per document
  • Document Merging: 1 credit per operation
  • File Size Limit: 100MB per document
  • Page Limit: 500 pages per document
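
The figures above are enough to estimate a batch's cost and reject oversized files before they are submitted. The sketch below does both. It assumes that 100MB means 100 × 1024 × 1024 bytes, which this README does not specify, and the function names are hypothetical.

```python
# Credit costs copied from the pricing list above.
CREDITS_PER_OPERATION = {
    "Text Extraction": 1,
    "Format Conversion": 1,
    "OCR Processing": 2,
    "Page Splitting": 1,
    "Document Merging": 1,
}
MAX_FILE_BYTES = 100 * 1024 * 1024  # assumption: binary megabytes
MAX_PAGES = 500


def estimate_credits(job_counts: dict[str, int]) -> int:
    """Sum the credits for a batch given {operation label: number of jobs}."""
    return sum(CREDITS_PER_OPERATION[name] * count for name, count in job_counts.items())


def check_limits(size_bytes: int, page_count: int) -> list[str]:
    """Return a list of limit violations; an empty list means the file fits."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds the 100MB size limit")
    if page_count > MAX_PAGES:
        problems.append("document exceeds the 500 page limit")
    return problems


batch = {"OCR Processing": 10, "Format Conversion": 5}
print(estimate_credits(batch))
print(check_limits(size_bytes=150 * 1024 * 1024, page_count=120))
```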

🚨 Error Handling

Common errors and solutions:

// Password-protected document
{
  "error": "Document is password protected",
  "success": false,
  "suggestion": "Provide password in advancedOptions"
}

// Unsupported format
{
  "error": "Unsupported file format: .xyz",
  "success": false,
  "suggestion": "Check supported input formats"
}

// OCR language not detected
{
  "error": "Could not detect document language",
  "success": false,
  "suggestion": "Specify OCR language manually"
}
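
A caller can turn these error objects into exceptions so that failures do not pass silently through a pipeline. This is a sketch built only on the `success`, `error`, and `suggestion` fields shown above; the exception class and helper names are hypothetical, and matching on the error text is a fragile assumption because the exact wording may change.

```python
from typing import Optional


class DocumentProcessingError(Exception):
    """Raised when a result reports success: false."""

    def __init__(self, message: str, suggestion: Optional[str] = None):
        super().__init__(message)
        self.suggestion = suggestion


def ensure_success(result: dict) -> dict:
    """Return the result unchanged, or raise with the service's suggestion."""
    if result.get("success"):
        return result
    raise DocumentProcessingError(
        result.get("error", "Unknown error"), result.get("suggestion")
    )


def needs_password(result: dict) -> bool:
    """Heuristic check based on the documented error message text."""
    return "password protected" in result.get("error", "").lower()


failed = {
    "error": "Document is password protected",
    "success": False,
    "suggestion": "Provide password in advancedOptions",
}
try:
    ensure_success(failed)
except DocumentProcessingError as exc:
    print(f"{exc} -> {exc.suggestion}")
```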

Password-Protected Documents

{
  "advancedOptions": {
    "password": "your-document-password"
  }
}

🔄 Integration Examples

With PDF Generator

[Data] → [Generate PDF] → [Extract Text] → [Validate Content]

With Web Scraper

[Scrape URLs] → [Download PDFs] → [Process Documents] → [Store Data]

With Email

[Email Attachment] → [Process Document] → [Extract Key Info] → [Reply with Summary]

📋 Requirements

  • N8N version 0.174.0 or higher
  • N8N Tools account and API key
  • Node.js 18+ (for development)

📄 License

MIT License - see LICENSE file for details.

Part of the N8N Tools ecosystem.

Keywords

n8n

Package last updated on 16 Sep 2025
