n8n-nodes-n8ntools-document-processor


Version: 4.4.2

Version published: 2 months ago

Weekly downloads: 51

Maintainers: 1

Created: 3 months ago

N8N Tools - Document Processor

Process and analyze documents with OCR, text extraction, and format conversion. This n8n community node handles document processing through the N8N Tools platform.

✨ Features

📄 Text Extraction: Extract text from various document formats
🔍 OCR Processing: Extract text from images and scanned documents
🔄 Format Conversion: Convert between PDF, DOCX, TXT, HTML, MD, RTF
📊 Metadata Extraction: Get document properties and information
✂️ Page Splitting: Split documents into individual pages
🔗 Document Merging: Combine multiple documents
🌍 Multi-language OCR: Support for Portuguese, English, Spanish, French, German
💰 Cost Tracking: Usage monitoring and budget controls

🚀 Quick Start

Installation

Install this node in your n8n instance:

Via Community Nodes (Recommended)

1. Go to Settings > Community Nodes in your n8n interface
2. Click Install a community node
3. Enter n8n-nodes-n8ntools-document-processor
4. Click Install

Via npm

npm install n8n-nodes-n8ntools-document-processor

Setup Credentials

1. Sign up at N8N Tools and get your API key
2. In n8n, create new N8N Tools API credentials
3. Enter your API URL: https://api.n8ntools.io
4. Enter your API key

📖 Usage

Supported Operations

| Operation | Description | Input | Output |
| --- | --- | --- | --- |
| Extract Text | Extract text content | PDF, DOCX, DOC, RTF | Plain text |
| Extract Metadata | Get document properties | Any document | JSON metadata |
| Convert Format | Change document format | Various formats | PDF, DOCX, TXT, HTML, MD, RTF |
| Split Pages | Split into individual pages | PDF, DOCX | ZIP with pages |
| Merge Documents | Combine multiple documents | Multiple files | Single document |
| OCR Processing | Extract text from images | PDF, images | Text with OCR |

Example Workflow

[File Trigger] → [N8N Tools Document Processor] → [Extract Data] → [Database/Email]

Configuration Example

Invoice Text Extraction:

{
  "operation": "extractText",
  "inputSource": "binaryData",
  "binaryPropertyName": "data",
  "advancedOptions": {
    "extractImages": true,
    "extractTables": true,
    "preserveFormatting": true
  }
}

⚙️ Node Parameters

Input Configuration

Input Source: Binary Data, File URL, or Base64
Binary Property: Name of binary property (default: "data")
File URL: Direct URL to document file
Base64 Data: Base64 encoded document content

Operation-Specific Options

Format Conversion

Target Format: PDF, DOCX, TXT, HTML, MD, RTF

Page Splitting

Page Range: Specific pages (e.g., "1-5") or "all"
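
The page range format above can be checked before a workflow runs. The sketch below is a hypothetical client-side helper, not the node's own parser: `parsePageRange` is an invented name, and accepting a single page number (for example "3") is an assumption beyond the two forms this README shows.

```typescript
// Hypothetical helper: expands a page-range string into 1-based page numbers.
// Handles "all", "start-end", and (as an assumption) a single page number.
function parsePageRange(range: string, pageCount: number): number[] {
  const trimmed = range.trim().toLowerCase();
  let start = 1;
  let end = pageCount;
  if (trimmed !== "all") {
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(trimmed);
    if (!match) {
      throw new Error("Unrecognized page range: " + range);
    }
    start = Number(match[1]);
    end = match[2] !== undefined ? Number(match[2]) : start;
  }
  // Reject ranges that run backwards or fall outside the document.
  if (start < 1 || end < start || end > pageCount) {
    throw new Error("Page range " + range + " is outside 1-" + pageCount);
  }
  const pages: number[] = [];
  for (let page = start; page <= end; page++) {
    pages.push(page);
  }
  return pages;
}

const firstFive = parsePageRange("1-5", 12);
// firstFive -> [1, 2, 3, 4, 5]
```

Validating the range up front avoids spending a credit on a request that would be rejected for asking for pages the document does not have.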

OCR Processing

Language: Portuguese, English, Spanish, French, German, Auto-detect

Advanced Options

Extract Images: Include images from document
Extract Tables: Parse table data
Preserve Formatting: Maintain original formatting
Password: For password-protected documents
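
For workflows that build parameters programmatically, the options above can be captured as types and checked before use. This is a sketch under stated assumptions: the field names come from the JSON examples in this README, but the `"base64"` input source identifier, the lowercase format identifiers other than `"docx"`, and the `validateParams` helper itself are hypothetical.

```typescript
// Parameter names mirror the JSON examples in this README. Anything marked
// "assumed" does not appear there and may differ in the real node.
type Operation = "extractText" | "convertFormat" | "ocrProcessing";
type InputSource = "binaryData" | "fileUrl" | "base64"; // "base64" is assumed

interface AdvancedOptions {
  extractImages?: boolean;
  extractTables?: boolean;
  preserveFormatting?: boolean;
  password?: string;
}

interface ProcessorParams {
  operation: Operation;
  inputSource: InputSource;
  binaryPropertyName?: string; // README default: "data"
  fileUrl?: string;
  targetFormat?: string;
  ocrLanguage?: string;
  advancedOptions?: AdvancedOptions;
}

// Lowercase format identifiers; only "docx" is confirmed by the README.
const TARGET_FORMATS = ["pdf", "docx", "txt", "html", "md", "rtf"];

// Returns human-readable problems; an empty array means nothing was flagged.
function validateParams(params: ProcessorParams): string[] {
  const problems: string[] = [];
  if (params.inputSource === "fileUrl" && !params.fileUrl) {
    problems.push("fileUrl is required when inputSource is \"fileUrl\"");
  }
  if (params.operation === "convertFormat") {
    if (!params.targetFormat) {
      problems.push("targetFormat is required for convertFormat");
    } else if (TARGET_FORMATS.indexOf(params.targetFormat.toLowerCase()) === -1) {
      problems.push("unsupported targetFormat: " + params.targetFormat);
    }
  }
  return problems;
}

const conversionProblems = validateParams({
  operation: "convertFormat",
  inputSource: "binaryData",
  binaryPropertyName: "data",
  targetFormat: "odt",
});
// conversionProblems -> ["unsupported targetFormat: odt"]
```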

📤 Output Data

Text Extraction Result

{
  "text": "This is the extracted text content...",
  "wordCount": 1250,
  "pageCount": 3,
  "hasImages": true,
  "hasTables": true,
  "images": [
    {
      "page": 1,
      "base64": "iVBORw0KGgoAAAANSUhEUgAA...",
      "format": "png"
    }
  ],
  "tables": [
    {
      "page": 2,
      "rows": 5,
      "columns": 3,
      "data": [["Header1", "Header2", "Header3"], ...]
    }
  ],
  "success": true,
  "operation": "extractText",
  "creditsUsed": 2,
  "originalFilename": "invoice.pdf"
}
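
Downstream nodes usually want table rows as keyed objects rather than nested arrays. The sketch below assumes the first row of `data` is a header row (as the `"Header1"` sample suggests) and that every cell is a string; `tableToRecords` is a hypothetical helper, not part of the node.

```typescript
// Shape of one entry in the "tables" array shown above.
interface ExtractedTable {
  page: number;
  rows: number;
  columns: number;
  data: string[][];
}

// Converts a table into one object per data row, keyed by the header row.
// Assumption: data[0] holds the column headers.
function tableToRecords(table: ExtractedTable): Array<Record<string, string>> {
  if (table.data.length === 0) {
    return [];
  }
  const header = table.data[0];
  const records: Array<Record<string, string>> = [];
  for (let r = 1; r < table.data.length; r++) {
    const row = table.data[r];
    const record: Record<string, string> = {};
    for (let c = 0; c < header.length; c++) {
      // Short rows are padded with empty strings instead of undefined.
      record[header[c]] = c < row.length ? row[c] : "";
    }
    records.push(record);
  }
  return records;
}

const lineItems = tableToRecords({
  page: 2,
  rows: 3,
  columns: 3,
  data: [
    ["Item", "Qty", "Price"],
    ["Widget", "2", "9.99"],
    ["Gadget", "1"],
  ],
});
// lineItems -> [
//   { Item: "Widget", Qty: "2", Price: "9.99" },
//   { Item: "Gadget", Qty: "1", Price: "" }
// ]
```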

Format Conversion Result

Returns the converted document as binary data with metadata:

{
  "success": true,
  "operation": "convertFormat",
  "originalFilename": "document.pdf",
  "convertedFilename": "document.docx",
  "targetFormat": "docx",
  "creditsUsed": 1
}

Metadata Extraction Result

{
  "filename": "report.pdf",
  "fileSize": 2048000,
  "mimeType": "application/pdf",
  "pageCount": 15,
  "author": "John Doe",
  "title": "Annual Report 2024",
  "subject": "Company Performance",
  "keywords": ["business", "report", "annual"],
  "creationDate": "2024-01-15T10:30:00Z",
  "modificationDate": "2024-01-16T14:20:00Z",
  "hasPassword": false,
  "isEncrypted": false,
  "success": true
}

🔧 Supported File Formats

Input Formats

PDF: PDF documents (including password-protected)
Microsoft Word: DOCX, DOC
Text: TXT, RTF
Web: HTML, XML
Images: PNG, JPG, TIFF (for OCR)

Output Formats

PDF: Portable Document Format
DOCX: Microsoft Word (newer format)
TXT: Plain text
HTML: HyperText Markup Language
MD: Markdown
RTF: Rich Text Format

🔍 OCR Capabilities

Supported Languages

Portuguese (por): Optimized for Brazilian Portuguese
English (eng): US and UK English
Spanish (spa): Latin American and Iberian Spanish
French (fra): French language support
German (deu): German language support
Auto-detect (auto): Automatic language detection
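
When the language comes from user input or another system, it may help to map a language name to the codes listed above. This is a minimal sketch: `ocrLanguageCode` is a hypothetical helper, and falling back to `"auto"` for unknown names is a design choice made here, not documented node behavior.

```typescript
// Language names and codes as listed in this README.
const OCR_LANGUAGES: Record<string, string> = {
  portuguese: "por",
  english: "eng",
  spanish: "spa",
  french: "fra",
  german: "deu",
};

// Maps a language name to its OCR code, falling back to auto-detection.
function ocrLanguageCode(languageName: string): string {
  const code = OCR_LANGUAGES[languageName.trim().toLowerCase()];
  return code !== undefined ? code : "auto";
}

const invoiceLanguage = ocrLanguageCode("Portuguese");
// invoiceLanguage -> "por"
```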

OCR Example

{
  "operation": "ocrProcessing",
  "inputSource": "fileUrl",
  "fileUrl": "https://example.com/scanned-invoice.pdf",
  "ocrLanguage": "por",
  "advancedOptions": {
    "extractTables": true,
    "preserveFormatting": true
  }
}

🛠️ Advanced Use Cases

Invoice Processing Pipeline

[Email Trigger] → [Download Attachment] → [Extract Text] → [Parse Data] → [Update CRM]

Document Classification

[File Upload] → [Extract Metadata] → [Classify Type] → [Route to Process]
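
The classification step in this pipeline could be a small function over the metadata output. The sketch below is illustrative only: the field names follow the Metadata Extraction Result above, while the route names and the matching rules are hypothetical and would need to reflect your own document types.

```typescript
// Subset of the metadata fields shown in the Metadata Extraction Result.
interface DocumentMetadata {
  filename: string;
  mimeType: string;
  pageCount: number;
  keywords: string[];
  hasPassword: boolean;
}

// Hypothetical route names for the "Route to Process" step.
type Route = "needs-password" | "ocr" | "invoice" | "general";

function classifyDocument(meta: DocumentMetadata): Route {
  if (meta.hasPassword) {
    return "needs-password";
  }
  // Image inputs carry no text layer, so they go to OCR.
  if (meta.mimeType.indexOf("image/") === 0) {
    return "ocr";
  }
  const haystack = (meta.filename + " " + meta.keywords.join(" ")).toLowerCase();
  if (haystack.indexOf("invoice") !== -1) {
    return "invoice";
  }
  return "general";
}

const reportRoute = classifyDocument({
  filename: "report.pdf",
  mimeType: "application/pdf",
  pageCount: 15,
  keywords: ["business", "report", "annual"],
  hasPassword: false,
});
// reportRoute -> "general"
```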

Bulk Document Conversion

[File Monitor] → [Document Processor] → [Convert to PDF] → [Archive]

Contract Analysis

[Document Input] → [Extract Text] → [Find Key Terms] → [Generate Summary]

📊 Processing Examples

Extract Contract Details

// Extract specific information from legal documents
{
  "operation": "extractText",
  "advancedOptions": {
    "extractTables": true,
    "preserveFormatting": true
  }
}
// Then use regex or NLP to find specific clauses
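
As a rough illustration of the regex step, the sketch below collects the sentences of the extracted `text` that mention given terms. `findClauses` is a hypothetical helper; the sentence splitting is naive and will break on abbreviations and decimal numbers, so treat it as a starting point rather than a clause parser.

```typescript
// Returns, for each term, the sentences that contain it (case-insensitive).
function findClauses(text: string, terms: string[]): Record<string, string[]> {
  // Naive sentence split: runs of non-terminators followed by . ! or ?
  const sentences: string[] = text.match(/[^.!?]+[.!?]*/g) || [];
  const hits: Record<string, string[]> = {};
  for (const term of terms) {
    const needle = term.toLowerCase();
    hits[term] = [];
    for (const sentence of sentences) {
      if (sentence.toLowerCase().indexOf(needle) !== -1) {
        hits[term].push(sentence.trim());
      }
    }
  }
  return hits;
}

const contractText =
  "This Agreement may be terminated with thirty days notice. " +
  "Payment is due within fifteen days. " +
  "The governing law is that of Brazil.";

const clauseHits = findClauses(contractText, ["terminated", "payment"]);
// clauseHits -> {
//   terminated: ["This Agreement may be terminated with thirty days notice."],
//   payment: ["Payment is due within fifteen days."]
// }
```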

Convert Legacy Documents

// Convert old DOC files to modern formats
{
  "operation": "convertFormat",
  "targetFormat": "docx"
}

Process Scanned Forms

// OCR processing for form data extraction
{
  "operation": "ocrProcessing",
  "ocrLanguage": "eng",
  "advancedOptions": {
    "extractTables": true // For form fields
  }
}

💸 Pricing & Limits

Text Extraction: 1 credit per document
Format Conversion: 1 credit per conversion
OCR Processing: 2 credits per document
Page Splitting: 1 credit per document
Document Merging: 1 credit per operation
File Size Limit: 100MB per document
Page Limit: 500 pages per document
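
The rates and limits above are enough to estimate a batch before running it. This sketch makes several assumptions: the `splitPages` and `mergeDocuments` identifiers are invented for illustration, "100MB" is read as 100,000,000 bytes, and actual charges may differ (the Text Extraction Result sample above reports `creditsUsed: 2`).

```typescript
type BillableOperation =
  | "extractText"
  | "convertFormat"
  | "ocrProcessing"
  | "splitPages" // assumed identifier
  | "mergeDocuments"; // assumed identifier

// Credits per operation as listed in this README.
const CREDITS_PER_OPERATION: Record<BillableOperation, number> = {
  extractText: 1,
  convertFormat: 1,
  ocrProcessing: 2,
  splitPages: 1,
  mergeDocuments: 1,
};

const MAX_FILE_BYTES = 100 * 1000 * 1000; // assumes "100MB" means 10^8 bytes
const MAX_PAGES = 500;

interface PlannedJob {
  filename: string;
  operation: BillableOperation;
  fileSizeBytes: number;
  pageCount: number;
}

interface Estimate {
  credits: number;
  rejected: string[]; // filenames that exceed a documented limit
}

function estimateCredits(jobs: PlannedJob[]): Estimate {
  const estimate: Estimate = { credits: 0, rejected: [] };
  for (const job of jobs) {
    if (job.fileSizeBytes > MAX_FILE_BYTES || job.pageCount > MAX_PAGES) {
      estimate.rejected.push(job.filename);
      continue;
    }
    estimate.credits += CREDITS_PER_OPERATION[job.operation];
  }
  return estimate;
}

const batchEstimate = estimateCredits([
  { filename: "invoice.pdf", operation: "extractText", fileSizeBytes: 2048000, pageCount: 3 },
  { filename: "scan.tiff", operation: "ocrProcessing", fileSizeBytes: 5000000, pageCount: 1 },
  { filename: "archive.pdf", operation: "splitPages", fileSizeBytes: 1000000, pageCount: 900 },
]);
// batchEstimate -> { credits: 3, rejected: ["archive.pdf"] }
```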

🚨 Error Handling

Common errors and solutions:

// Password-protected document
{
  "error": "Document is password protected",
  "success": false,
  "suggestion": "Provide password in advancedOptions"
}

// Unsupported format
{
  "error": "Unsupported file format: .xyz",
  "success": false,
  "suggestion": "Check supported input formats"
}

// OCR language not detected
{
  "error": "Could not detect document language",
  "success": false,
  "suggestion": "Specify OCR language manually"
}
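
A workflow can branch on these responses to decide what to do next. The sketch below matches on the error messages shown above, which is brittle if the wording changes; the `NextStep` names and the `nextStepFor` helper are hypothetical.

```typescript
// Shape of the error responses shown above.
interface ProcessorError {
  success: false;
  error: string;
  suggestion?: string;
}

// Hypothetical follow-up actions for a workflow to take.
type NextStep = "retryWithPassword" | "retryWithOcrLanguage" | "skipFile" | "fail";

function nextStepFor(err: ProcessorError): NextStep {
  const message = err.error.toLowerCase();
  if (message.indexOf("password protected") !== -1) {
    return "retryWithPassword";
  }
  if (message.indexOf("could not detect document language") !== -1) {
    return "retryWithOcrLanguage";
  }
  if (message.indexOf("unsupported file format") !== -1) {
    return "skipFile";
  }
  // Anything unrecognized is surfaced rather than retried blindly.
  return "fail";
}

const passwordStep = nextStepFor({
  success: false,
  error: "Document is password protected",
  suggestion: "Provide password in advancedOptions",
});
// passwordStep -> "retryWithPassword"
```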

Password-Protected Documents

{
  "advancedOptions": {
    "password": "your-document-password"
  }
}

🔄 Integration Examples

With PDF Generator

[Data] → [Generate PDF] → [Extract Text] → [Validate Content]

With Web Scraper

[Scrape URLs] → [Download PDFs] → [Process Documents] → [Store Data]

With Email

[Email Attachment] → [Process Document] → [Extract Key Info] → [Reply with Summary]

🔗 Related Packages

PDF Generator: Create PDFs from processed data
Web Scraper: Scrape documents from websites

📋 Requirements

n8n version 0.174.0 or higher
N8N Tools account and API key
Node.js 18+ (for development)

🆘 Support

📧 Email: support@n8ntools.io
📖 Documentation: docs.n8ntools.io
💬 Community: Discord
🐛 Issues: GitHub

📄 License

MIT License - see LICENSE file for details.

Part of the N8N Tools ecosystem

Keywords

n8n

n8n-community-node-package

Package last updated on 16 Sep 2025
