@mazix/n8n-nodes-converter-documents

n8n node to convert various document formats (DOCX, XML, YML, XLS, XLSX, CSV, PDF, TXT, PPT, PPTX, HTML, JSON, ODT, ODP, ODS) to JSON or text format

Latest version: 1.0.22 (npm) · 172 weekly downloads · 1 maintainer

📄 n8n Document Converter Node

🚀 n8n community node for converting various document formats to JSON/text with AI-friendly output

✨ Features

🎯 Core Features

  • ✅ 12+ file formats supported
  • ✅ Automatic file type detection
  • ✅ Hybrid processing (primary + fallback)
  • ✅ Stream processing for large files
  • ✅ Promise pooling for concurrency control
  • ✅ Comprehensive error handling
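Promise pooling limits how many conversions run at the same time. This keeps memory use bounded when a workflow passes many files at once.

The sketch below illustrates the general technique. `runWithPool` is a hypothetical helper, not the package's actual implementation.

```typescript
/**
 * Run async tasks with at most `limit` in flight at once.
 * Results come back in the same order as `tasks`.
 * Hypothetical helper for illustration; not part of the package's API.
 */
async function runWithPool<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: T[] = new Array(tasks.length);
  let next = 0; // index of the next task that has not been started yet

  // Each worker claims the next unstarted task until none remain.
  // Claiming is safe without locks: JavaScript runs this check-and-increment
  // synchronously, with no await in between.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, tasks.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}
```

If one task rejects, the returned promise rejects, but tasks that have already started keep running. A production version might collect errors per item instead.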

🔒 Security & Performance

  • ✅ Input validation & sanitization
  • ✅ XSS protection (sanitize-html)
  • ✅ Path traversal protection
  • ✅ Memory-efficient streaming
  • ✅ Configurable file size limits (up to 100MB)
  • ✅ JSON structure normalization

📚 Supported Formats

Category | Formats | Status
--- | --- | ---
Text Documents | DOCX, ODT, TXT, PDF | ✅ Full support
Spreadsheets | XLSX, ODS, CSV | ✅ Multi-sheet support
Presentations | PPTX, ODP | ✅ Full support
Web & Data | HTML, HTM, XML, JSON | ✅ Full support
E-commerce | YML (Yandex Market) | ✅ Specialized parsing
Legacy | DOC, PPT, XLS | ❌ Not supported*

*Legacy formats require conversion to modern formats (DOCX, PPTX, XLSX)
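Legacy DOC, XLS and PPT files use the Compound File Binary (CFB/OLE2) container. The modern Office and OpenDocument formats are ZIP archives. The two can therefore be told apart from a file's first bytes.

Below is a minimal sketch of such a check. `detectContainer` and `assertNotLegacyOffice` are hypothetical names, and the package's own `checkCFBFormat()` helper may work differently.

```typescript
/** Container types that can be distinguished from a file's leading bytes. */
type Container = "zip" | "cfb" | "pdf" | "unknown";

// Well-known file signatures ("magic numbers").
const SIGNATURES: Array<[Container, number[]]> = [
  ["cfb", [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]], // legacy DOC/XLS/PPT
  ["zip", [0x50, 0x4b, 0x03, 0x04]], // DOCX/XLSX/PPTX/ODT/ODS/ODP ("PK\x03\x04")
  ["pdf", [0x25, 0x50, 0x44, 0x46, 0x2d]], // "%PDF-"
];

/** Hypothetical helper: classify a buffer by its leading signature bytes. */
function detectContainer(data: Uint8Array): Container {
  for (const [container, signature] of SIGNATURES) {
    if (
      data.length >= signature.length &&
      signature.every((byte, i) => data[i] === byte)
    ) {
      return container;
    }
  }
  return "unknown";
}

/** Hypothetical guard: reject legacy binary Office files before parsing. */
function assertNotLegacyOffice(data: Uint8Array): void {
  if (detectContainer(data) === "cfb") {
    throw new Error(
      "Legacy DOC/XLS/PPT files are not supported; convert to DOCX, XLSX or PPTX first",
    );
  }
}
```

A ZIP signature alone does not reveal which format is inside. Telling DOCX from XLSX, for example, requires looking at the archive's contents.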

📊 DOCX to HTML Conversion (v1.0.21+)

Latest: Node renamed to "Document Converter" in v1.0.22

🎨 Choose Your Output Format

📝 Plain Text (Default)

Best for:

  • Simple text extraction
  • Minimal output size
  • Maximum speed
  • Backward compatibility

Output size (sample document): ~3,600 chars

🌐 HTML Format

Best for:

  • Documents with tables
  • AI/LLM processing
  • Preserving formatting
  • Structured content

Output size (same sample document): ~58,000 chars (+1,591%)

📋 Usage in n8n

1. Add the "Document Converter" node
2. Select the "Output Format (DOCX)" parameter:
   • Plain Text → Simple extraction
   • HTML → Tables + formatting preserved

💡 Example Output

Plain Text Output
{
  "text": "Situation: Often search by one field\nAction: Create index on that field"
}
HTML Output (with tables)
{
  "text": "<table><tr><td><strong>Situation</strong></td><td><strong>Action</strong></td></tr><tr><td>Often search by one field</td><td>Create index on that field</td></tr></table>"
}

🎯 HTML Format Features

Feature | Description
--- | ---
Tables | <table>, <tr>, <td>: full structure preserved
Formatting | <strong>, <em>, <h1>-<h6>
Lists | <ul>, <ol>, <li>
Paragraphs | <p> tags for structure
AI-Friendly | ✅ Understood by ChatGPT, Claude, Gemini

📊 XLSX Multi-Sheet Processing

🗂️ How It Works

{
  "sheets": {
    "Products": [
      { "A": "ID", "B": "Name", "C": "Price" },
      { "A": 1, "B": "Apple", "C": 100 },
      { "A": 2, "B": "Banana", "C": 50 }
    ],
    "Orders": [
      { "A": "Order", "B": "Quantity" },
      { "A": 101, "B": 5 }
    ]
  }
}

📌 Key Features

Feature | Details
--- | ---
Multiple Sheets | Each sheet is a separate array in the sheets object
Column Names | A, B, C... Z (Excel-style)
Row Format | Array of objects (rows)
Empty Cells | Skipped (only filled cells included)
Size Limit | 10,000 rows per sheet (configurable)
Memory Safe | Large files auto-limited to prevent OOM
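The row format above can be sketched independently of any spreadsheet library: Excel-style column letters as keys, empty cells omitted, and a per-sheet row cap.

This sketch assumes the sheet is already available as a 2D array of cell values. `columnLetter`, `rowsToObjects` and the `maxRows` default of 10,000 mirror the table but are illustrative, not the package's code. In Excel-style lettering, columns after Z continue as AA, AB and so on, and the sketch follows that convention.

```typescript
type Cell = string | number | boolean | null | undefined;
type RowObject = Record<string, string | number | boolean>;

/** Zero-based column index to Excel-style letters: 0 → "A", 25 → "Z", 26 → "AA". */
function columnLetter(index: number): string {
  let letters = "";
  let n = index + 1; // Excel letters are a 1-based "bijective base 26" system
  while (n > 0) {
    const remainder = (n - 1) % 26;
    letters = String.fromCharCode(65 + remainder) + letters;
    n = Math.floor((n - 1) / 26);
  }
  return letters;
}

/**
 * Hypothetical helper: map a 2D array of cells to row objects keyed by column letter.
 * Empty cells (null, undefined, "") are skipped, so a row with no filled cells becomes {}.
 * Only the first `maxRows` rows are kept (assumed default: 10,000).
 */
function rowsToObjects(rows: Cell[][], maxRows = 10_000): RowObject[] {
  const result: RowObject[] = [];
  for (const row of rows.slice(0, maxRows)) {
    const obj: RowObject = {};
    row.forEach((cell, col) => {
      if (cell !== null && cell !== undefined && cell !== "") {
        obj[columnLetter(col)] = cell;
      }
    });
    result.push(obj);
  }
  return result;
}
```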

🚀 Installation

Option 1: Community Node

Via n8n web interface:

Settings → Community nodes → Install
Package name: @mazix/n8n-nodes-converter-documents

Or via command line:

npm install @mazix/n8n-nodes-converter-documents

Option 2: Standalone Version

# 1. Clone and build
git clone https://github.com/mazixs/n8n-node-converter-documents.git
cd n8n-node-converter-documents
npm install
npm run standalone

# 2. Copy to n8n
cp -r ./standalone ~/.n8n/custom-nodes/n8n-node-converter-documents
cd ~/.n8n/custom-nodes/n8n-node-converter-documents
npm install

# 3. Restart n8n

Option 3: Manual Installation

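# Run from the repository root after building (npm run build), so that dist/ contains the compiled files
mkdir -p ~/.n8n/custom-nodes/n8n-node-converter-documents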
cp dist/*.js dist/*.svg ~/.n8n/custom-nodes/n8n-node-converter-documents/
cp package.json ~/.n8n/custom-nodes/n8n-node-converter-documents/
cd ~/.n8n/custom-nodes/n8n-node-converter-documents
npm install --production

📖 Usage Examples

Text Document Output

{
  "text": "Extracted text content...",
  "metadata": {
    "fileName": "document.docx",
    "fileSize": 12345,
    "fileType": "docx",
    "processedAt": "2024-06-01T12:00:00.000Z"
  }
}

Excel Spreadsheet Output

{
  "sheets": {
    "Sheet1": [
      { "A": "Name", "B": "Age", "C": "City" },
      { "A": "Alice", "B": 30, "C": "Moscow" },
      { "A": "Bob", "B": 25, "C": "SPB" }
    ]
  },
  "metadata": {
    "fileName": "data.xlsx",
    "fileSize": 23456,
    "fileType": "xlsx"
  }
}
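The two output shapes above can be written as TypeScript types, which helps when post-processing results in code.

This sketch is inferred only from the examples. The interface and function names are hypothetical. `fileSize` is assumed to be in bytes, and `processedAt` is optional because only the text example shows it.

```typescript
/** Metadata fields seen in the examples above (shape inferred, not documented). */
interface ConversionMetadata {
  fileName: string;
  fileSize: number; // assumption: size in bytes
  fileType: string;
  processedAt?: string; // ISO 8601 timestamp; only present in the text example
}

type SheetRow = Record<string, string | number | boolean>;

interface TextResult {
  text: string;
  metadata: ConversionMetadata;
  warning?: string;
}

interface SheetResult {
  sheets: Record<string, SheetRow[]>;
  metadata: ConversionMetadata;
}

type ConversionResult = TextResult | SheetResult;

/** Type guard: spreadsheet results carry a `sheets` object instead of `text`. */
function isSheetResult(result: ConversionResult): result is SheetResult {
  return "sheets" in result;
}

/** Hypothetical helper: one-line summary of either result shape. */
function describeResult(result: ConversionResult): string {
  if (isSheetResult(result)) {
    const counts = Object.entries(result.sheets).map(
      ([name, rows]) => `${name}: ${rows.length} rows`,
    );
    return `${result.metadata.fileName} (${counts.join(", ")})`;
  }
  return `${result.metadata.fileName} (${result.text.length} chars)`;
}
```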

JSON Normalization

Input:

{
  "user": {
    "name": "John",
    "address": { "city": "Moscow" }
  }
}

Output (flattened):

{
  "text": "{\n  \"user.name\": \"John\",\n  \"user.address.city\": \"Moscow\"\n}",
  "warning": "Multi-level JSON structure was converted to flat object"
}
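The flattening step joins nested object keys with dots. Below is a minimal sketch that reproduces the example above.

It assumes arrays are kept as leaf values, since how the package treats arrays is not documented here. `flattenJson` and `normalizeJson` are hypothetical names.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

/** Recursively flatten nested objects into dot-separated keys; arrays are kept as values. */
function flattenJson(
  value: { [key: string]: Json },
  prefix = "",
  out: { [key: string]: Json } = {},
): { [key: string]: Json } {
  for (const [key, child] of Object.entries(value)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (child !== null && typeof child === "object" && !Array.isArray(child)) {
      flattenJson(child, path, out); // descend into nested objects
    } else {
      out[path] = child; // primitives, null and arrays become leaf values
    }
  }
  return out;
}

/** Hypothetical wrapper producing the { text, warning } shape shown in the example. */
function normalizeJson(input: { [key: string]: Json }): { text: string; warning?: string } {
  const flat = flattenJson(input);
  const wasNested = Object.values(input).some(
    (v) => v !== null && typeof v === "object" && !Array.isArray(v),
  );
  const result: { text: string; warning?: string } = {
    text: JSON.stringify(flat, null, 2), // pretty-printed with 2-space indent
  };
  if (wasNested) {
    result.warning = "Multi-level JSON structure was converted to flat object";
  }
  return result;
}
```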

๐Ÿ—๏ธ Architecture

Strategy Pattern Implementation

DOCX Processing Flow:
┌────────────────────────────────────────┐
│ 1. If outputFormat === 'html':         │
│    → mammoth.convertToHtml()           │
│    → [Success] Return HTML             │
│    → [Fail] Fallback to text           │
│                                        │
│ 2. Text mode (default):                │
│    → officeparser (primary)            │
│    → mammoth.extractRawText (fallback) │
│    → XML direct parsing (last)         │
└────────────────────────────────────────┘
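The flow above is an ordered list of strategies, each tried until one succeeds. Below is a library-agnostic sketch of such a chain.

The strategy functions are stand-ins; the real node calls officeparser, mammoth and an XML parser. `convertWithFallback` and `Strategy` are hypothetical names. Treating blank text as a failure is an assumption of this sketch.

```typescript
/** A named extraction strategy; `run` may throw or return blank text on failure. */
interface Strategy {
  name: string;
  run: (data: Uint8Array) => Promise<string>;
}

/**
 * Try each strategy in order and return the first non-blank result.
 * If every strategy fails, throw one error listing each failure.
 */
async function convertWithFallback(
  data: Uint8Array,
  strategies: Strategy[],
): Promise<{ text: string; strategy: string }> {
  const failures: string[] = [];
  for (const strategy of strategies) {
    try {
      const text = await strategy.run(data);
      if (text.trim() !== "") {
        return { text, strategy: strategy.name };
      }
      failures.push(`${strategy.name}: empty result`); // assumption: blank output counts as failure
    } catch (err) {
      failures.push(`${strategy.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All strategies failed (${failures.join("; ")})`);
}
```

In HTML mode, the HTML conversion would be the first entry and text extraction would follow it, matching step 1 of the diagram.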

Technology Stack

Core Libraries

  • officeparser (v5.1.1) - Primary parser
  • mammoth (v1.9.1) - DOCX processor
  • exceljs (v4.4.0) - Excel handler
  • pdf-parse (v1.1.1) - PDF fallback
  • papaparse (v5.5.3) - CSV parser

Build & Quality

  • TypeScript 5.8 (strict mode)
  • Jest (80 tests passing)
  • ESLint (TypeScript rules)
  • Webpack bundling
  • CommonJS modules

Security Features

Feature | Implementation
--- | ---
Input Validation | Strict type & structure checks
XSS Protection | sanitize-html library
Path Traversal | File name sanitization
Memory Limits | 10K rows/sheet, 50MB default
Dependency Audit | Regular npm audit checks
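File name sanitization against path traversal usually means reducing a user-supplied name to a bare base name before it is used in any file path.

Below is a minimal sketch of that idea. `sanitizeFileName` is a hypothetical name, and the package's actual rules may be stricter or different.

```typescript
/**
 * Hypothetical helper: reduce an untrusted file name to a safe base name.
 * Strips directory parts (both "/" and "\"), control characters and
 * Windows-reserved characters, and rejects names that end up empty or all dots.
 */
function sanitizeFileName(name: string, maxLength = 255): string {
  // Keep only the last path segment, so "../../etc/passwd" becomes "passwd".
  const base = name.split(/[\\/]/).pop() ?? "";
  const cleaned = base
    .replace(/[\u0000-\u001f\u007f]/g, "") // control characters, including NUL
    .replace(/[<>:"|?*]/g, "_") // characters reserved on Windows
    .trim()
    .slice(0, maxLength);
  if (cleaned === "" || /^\.+$/.test(cleaned)) {
    throw new Error(`Invalid file name: ${JSON.stringify(name)}`);
  }
  return cleaned;
}
```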

💻 Development

Quick Start

npm install        # Install dependencies
npm run dev        # Watch mode
npm run build      # Compile
npm test           # Run 80 tests
npm run lint       # Check code quality

Build Commands

Command | Description
--- | ---
npm run build | TypeScript → JavaScript
npm run bundle | Webpack bundling
npm run standalone | Standalone with deps
npm run test:coverage | Coverage report
npm run lint:fix | Auto-fix issues

Project Structure

├── src/
│   ├── FileToJsonNode.node.ts   # Main node (Strategy Pattern)
│   ├── helpers.ts               # Utilities
│   └── errors.ts                # Custom errors
├── test/
│   ├── unit/                    # Unit tests
│   ├── integration/             # Integration tests
│   └── samples/                 # Test files
├── docs/                        # Documentation
│   ├── SOLUTION.md
│   ├── HTML_CONVERSION_PLAN.md
│   └── MAMMOTH_ANALYSIS.md
└── dist/                        # Compiled output

📈 Latest Updates

🎉 v1.0.22 (Current - 2025-10-10)

🎨 UI & Quality

  • ✅ Node renamed: "Document Converter"
  • ✅ Icon fixed: 60×60 (proper size)
  • ✅ Code refactored: -78 lines
  • ✅ Zero duplication: 100% eliminated
  • ✅ Full error handling: PPTX fixed

📚 Docs & Tests

  • ✅ README redesign: Badges, TOC, tables
  • ✅ 80 tests passing (+7 XLSX)
  • ✅ Full JSDoc: All functions documented
  • ✅ Better IntelliSense: IDE support improved
  • ✅ Professional look: Visual tables & icons

What's New:

+ Node renamed to "Document Converter" (better UX)
+ Icon size fixed: 2048×1853 → 60×60
+ Code quality: eliminated all duplication
+ BaseConverterError class (DRY principle)
+ checkCFBFormat() helper (unified CFB check)
+ processViaOfficeParser() helper (unified error handling)
+ Full JSDoc documentation added
+ README complete visual redesign
+ 7 new XLSX multi-sheet tests

Previous Versions

v1.0.21 - DOCX to HTML Conversion
  • DOCX to HTML conversion with table support
  • outputFormat parameter (text | html)
  • Table preservation in HTML
  • AI/LLM friendly output
  • 73 tests passing
v1.0.20 - TextBox & Shapes Support
  • Extract text from TextBoxes and shapes
  • ONLYOFFICE document fix
  • 62 tests passing
v1.0.19 - ONLYOFFICE Parser Fix
  • Fixed XML namespace extraction
  • No more schema URLs in output
  • 61 tests passing

📚 Documentation

Document | Description
--- | ---
CHANGELOG.md | Complete version history
SOLUTION.md | Architecture overview
HTML_CONVERSION_PLAN.md | DOCX to HTML implementation
MAMMOTH_ANALYSIS.md | Library research findings
optimization_plan.md | Performance strategies
security.md | Security features

🔧 Troubleshooting

Common Issues

Error: Cannot find module 'exceljs'

# Solution 1: Use the standalone version (recommended), then copy it into n8n as in Option 2
npm run standalone

# Solution 2: Check dependencies
cd ~/.n8n/custom-nodes/n8n-node-converter-documents
npm list
npm install

Large files causing OOM

  • Split files into smaller parts
  • Reduce maxFileSize parameter
  • Use streaming for CSV/TXT formats

⚠️ Limitations

Limitation | Details | Workaround
--- | --- | ---
Legacy formats | DOC, PPT, XLS not supported | Convert to DOCX, PPTX, XLSX
Memory | Large PDF/XLSX files are loaded into RAM | Split files or increase memory
File size | Default 50MB limit | Configurable up to 100MB
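The file size limits above (50 MB by default, configurable up to 100 MB) come down to two steps: clamp the configured value, then check the input size before any parser loads the file.

This sketch assumes the limit is configured in megabytes, with 1 MB = 1024 × 1024 bytes. `resolveMaxFileSize` and `assertWithinLimit` are hypothetical names, not the package's API.

```typescript
const DEFAULT_LIMIT_MB = 50; // default limit from the table above
const HARD_LIMIT_MB = 100; // maximum configurable value
const BYTES_PER_MB = 1024 * 1024; // assumption: binary megabytes

/**
 * Hypothetical helper: turn an optional setting (in MB) into a byte limit.
 * Missing, non-finite or non-positive values fall back to the 50 MB default;
 * larger values are capped at 100 MB.
 */
function resolveMaxFileSize(requestedMb?: number): number {
  const mb =
    requestedMb === undefined || !Number.isFinite(requestedMb) || requestedMb <= 0
      ? DEFAULT_LIMIT_MB
      : Math.min(requestedMb, HARD_LIMIT_MB);
  return mb * BYTES_PER_MB;
}

/** Hypothetical guard: fail fast before the file is parsed into memory. */
function assertWithinLimit(sizeBytes: number, limitBytes: number): void {
  if (sizeBytes > limitBytes) {
    const sizeMb = (sizeBytes / BYTES_PER_MB).toFixed(1);
    const limitMb = (limitBytes / BYTES_PER_MB).toFixed(1);
    throw new Error(`File is ${sizeMb} MB; the limit is ${limitMb} MB`);
  }
}
```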

📊 Statistics

  • 12+ file formats supported
  • 80 tests passing
  • 5 specialized parsers
  • 10K rows per sheet limit
  • 100MB max file size
  • 0 critical vulnerabilities

๐Ÿค Contributing

Issues and pull requests are welcome!

๐Ÿ“ License

MIT ยฉ mazix

Made with โค๏ธ for the n8n community

If you find this helpful, please โญ star the repository!

Keywords

n8n-community-node-package

Package last updated on 10 Oct 2025
