A Node.js library to parse text out of any office file. Currently supports docx, pptx, xlsx, odt, odp, ods, and pdf files.
Yet another library to extract text from MS Office and PDF files
docx parser
Extend MDAST by parsing embedded HTML in Markdown. Converts HTML into structured MDAST nodes compatible with @m2d/core for DOCX generation.
JavaScript SDK for Sensible, the developer-first platform for extracting structured data from documents, so that you can build document-automation features into your SaaS products
TypeScript definitions and functions for using Docling output.
A simple library that converts .docx files to plain text in the browser
A modern JavaScript library for parsing and processing Microsoft Word DOCX documents with support for both buffer and stream operations. Features incremental parsing, checkbox detection, footnote support, and document validation.
Core engine that parses HTML into an intermediate DocumentElement tree and exposes a plugin registry so external adapters can convert that tree into DOCX, PDF, XLSX, Markdown and more.
A NodeJS library to parse pdf, txt, doc and docx files to JSON and CSV
Extracts comments and other data from docx files
A Node.js library for reading and extracting text from various document formats (PDF, DOCX, DOC, PPT, PPTX, TXT)
Web components for displaying Docling output.
Convert documents to Markdown text content. Originally inspired by Microsoft's MarkItDown Python library.
A dead simple docx parser.
A Node script that fills DOCX placeholders and converts the result to PDF
Finds the number of occurrences of one or more phrases in a directory of .doc, .docx, and .pdf files.
A **React**-friendly, **Vite-powered** package for extracting and processing document data (e.g., PDF, DOCX, TXT).
Fork of office-text-extractor with unreleased changes that include browser support
Primitives for building and extending readers, including definitions for context, parsers, modes, commands, plugins, and more.
A text-extraction package for docx, pdf, and pptx files
Docx parser for JavaScript/TypeScript
A lightweight library to parse .docx files in Cloudflare Workers
This npm package offers a straightforward method to extract text content from various binary and text file formats. The package comes with a pre-built configuration that works out-of-the-box, requiring no additional setup. It is designed for use in Browse
**Note:** This repository is automatically generated from the [main parser monorepo](https://github.com/TrialAndErrorOrg/parsers). Please submit any issues or pull requests there.
Plain text parser that allows readers to extract and process words from plain text or `.txt` files.
A TypeScript library to convert DOCX files to WYSIWYG HTML or plain text formats while preserving styles.
JavaScript port of python-docx.
The Structured Parser JS/TS SDK allows developers to easily integrate Structured Parser's advanced structured data extraction capabilities from unstructured documents such as PDF, DOCX, XLSX.