Yet another library to extract text from MS Office and PDF files
DOCX parser
Extend MDAST by parsing embedded HTML in Markdown. Converts HTML into structured MDAST nodes compatible with @m2d/core for DOCX generation.
JavaScript SDK for Sensible, the developer-first platform for extracting structured data from documents so that you can build document-automation features into your SaaS products
TypeScript definitions and functions for using Docling output.
A simple library that converts .docx files to plain text in the browser
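Several of the packages listed here turn `.docx` files into plain text. A `.docx` file is a ZIP archive, and its body text lives in `word/document.xml` as `<w:t>` text runs grouped into `<w:p>` paragraphs. The sketch below shows roughly what that extraction involves. It assumes the XML has already been read out of the archive and uses regular expressions instead of a real XML parser, so tables, tabs, breaks, headers, and footnotes are not handled. `docxXmlToParagraphs` is a hypothetical helper and is not the API of any listed package.

```typescript
// Minimal sketch: recover paragraph text from the XML of word/document.xml.
// Assumes the XML was already read out of the .docx ZIP archive (not shown).
// Regex matching is a simplification: tables, tabs, breaks, fields and
// other document parts (headers, footnotes) are not handled.

// Decode the five predefined XML entities; "&amp;" goes last so that
// "&amp;lt;" correctly becomes "&lt;" rather than "<".
// Numeric character references (e.g. "&#xA0;") are left as-is.
function decodeXmlEntities(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Return one string per <w:p> paragraph, concatenating its <w:t> text runs.
function docxXmlToParagraphs(xml: string): string[] {
  const paragraphs: string[] = [];
  // "<w:p" followed by whitespace or ">" skips <w:pPr>, <w:proofErr>, etc.
  const paraMatches = xml.match(/<w:p[\s>][\s\S]*?<\/w:p>/g) || [];
  for (const para of paraMatches) {
    // Fresh global regex per paragraph so exec() starts at index 0.
    // "<w:t" must be followed by whitespace or ">", which skips <w:tab/>.
    const runRe = /<w:t(?:\s[^>]*)?>([\s\S]*?)<\/w:t>/g;
    let text = "";
    let m: RegExpExecArray | null;
    while ((m = runRe.exec(para)) !== null) {
      text += decodeXmlEntities(m[1]);
    }
    paragraphs.push(text);
  }
  return paragraphs;
}
```

A production extractor would stream the ZIP entry through a proper XML parser. The regex version only shows where the text lives in the package.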
Core engine that parses HTML into an intermediate DocumentElement tree and exposes a plugin registry so external adapters can convert that tree into DOCX, PDF, XLSX, Markdown and more.
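The parse-once, convert-via-plugins design described above can be sketched as an intermediate tree plus a registry of output adapters. This is a hypothetical illustration, not the package's actual API. The shape given to `DocumentElement`, and the names `Adapter`, `AdapterRegistry`, and `toMarkdown`, are assumptions. The HTML parsing step is also skipped, so the tree has to be built by hand.

```typescript
// Hypothetical sketch: an intermediate tree plus a registry of output adapters.
// None of these names or shapes are the real package's API.

// Shape assumed for the intermediate tree (illustrative only).
interface DocumentElement {
  type: "document" | "heading" | "paragraph" | "text";
  level?: number; // heading level, used by "heading"
  text?: string; // literal text, used by "text"
  children?: DocumentElement[];
}

// An adapter converts the whole tree into one output format.
type Adapter<Out> = (root: DocumentElement) => Out;

class AdapterRegistry {
  // Null-prototype object so keys like "constructor" are never pre-populated.
  private adapters: Record<string, Adapter<unknown>> = Object.create(null);

  register<Out>(format: string, adapter: Adapter<Out>): void {
    if (this.adapters[format]) {
      throw new Error(`Adapter already registered for format: ${format}`);
    }
    this.adapters[format] = adapter;
  }

  convert<Out>(format: string, root: DocumentElement): Out {
    const adapter = this.adapters[format];
    if (!adapter) {
      throw new Error(`No adapter registered for format: ${format}`);
    }
    // The caller chooses Out; the registry cannot check it at runtime.
    return adapter(root) as Out;
  }
}

// Concatenate all text beneath an element.
function textOf(el: DocumentElement): string {
  if (el.type === "text") return el.text || "";
  return (el.children || []).map(textOf).join("");
}

// Toy Markdown adapter: headings get "#" prefixes, other blocks become plain lines.
// Array(n + 1).join("#") builds a string of n "#" characters.
const toMarkdown: Adapter<string> = (root) =>
  (root.children || [])
    .map((el) =>
      el.type === "heading"
        ? `${Array((el.level || 1) + 1).join("#")} ${textOf(el)}`
        : textOf(el),
    )
    .join("\n\n");
```

With a registry, the core never imports DOCX, PDF, or XLSX code. Each adapter depends only on the shared tree type, so new formats can be added without touching the parser.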
A Node.js library to parse PDF, TXT, DOC, and DOCX files into JSON and CSV
Web components for displaying Docling output.
Extracts comments and other data from docx files
A Node.js script that fills DOCX placeholders and converts the result to PDF
This npm package offers a straightforward method to extract text content from various binary and text file formats. The package comes with a pre-built configuration that works out-of-the-box, requiring no additional setup. It is designed for use in Browse
A lightweight library to parse .docx files in Cloudflare Workers
Docx parser for JavaScript/TypeScript
Fork of office-text-extractor with unreleased changes that include browser support
Plain text parser that allows readers to extract and process words from plain text or `.txt` files.
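A plain-text word extractor like the one described above can be approximated with a Unicode-aware regular expression. This minimal sketch makes three assumptions: a "word" is a run of letters or digits, an apostrophe may join two such runs (as in "don't"), and case is folded. `extractWords` and `countWords` are hypothetical names and are not that package's API.

```typescript
// Minimal sketch of word extraction from plain text (hypothetical helpers).
// A word is a run of Unicode letters/digits, optionally joined by an ASCII or
// typographic apostrophe. The regex is built from a string so older compile
// targets accept it; the "u" flag and \p{...} escapes need an ES2018+ runtime.
const WORD_RE = new RegExp("[\\p{L}\\p{N}]+(?:['\\u2019][\\p{L}\\p{N}]+)*", "gu");

// Return every word in order, lowercased.
function extractWords(text: string): string[] {
  // String.prototype.match with a global regex resets lastIndex and returns all matches.
  return (text.match(WORD_RE) || []).map((w) => w.toLowerCase());
}

// Count occurrences of each lowercased word.
function countWords(text: string): Record<string, number> {
  // Null-prototype object so words like "constructor" start from zero.
  const counts: Record<string, number> = Object.create(null);
  for (const word of extractWords(text)) {
    counts[word] = (counts[word] || 0) + 1;
  }
  return counts;
}
```

Folding case and counting into a plain object keeps the sketch dependency-free. A real reader may also need stemming or stop-word removal, which are out of scope here.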
> **Note**
> This repository is automatically generated from the [main parser monorepo](https://github.com/TrialAndErrorOrg/parsers). Please submit any issues or pull requests there.
JavaScript port of python-docx.
Convert documents to Markdown text content. Originally inspired by Microsoft's markitdown Python library.
A dead simple docx parser.
Primitives for building and extending readers, including definitions for context, parsers, modes, commands, plugins, and more.
A text-extraction package for DOCX, PDF, and PPTX files
The Structured Parser JS/TS SDK lets developers integrate Structured Parser's structured-data extraction from unstructured documents such as PDF, DOCX, and XLSX files.