Yet another library to extract text from MS Office and PDF files
docx parser
JavaScript SDK for Sensible, the developer-first platform for extracting structured data from documents so that you can build document-automation features into your SaaS products
Extend MDAST by parsing embedded HTML in Markdown. Converts HTML into structured MDAST nodes compatible with @m2d/core for DOCX generation.
A simple library that converts .docx files to plain text in the browser
Web components for displaying Docling output.
TypeScript definitions and functions for using Docling output.
A Node.js library to parse pdf, txt, doc and docx files to JSON and CSV
Extracts comments and other data from docx files
Fork of office-text-extractor with unreleased changes that include browser support
This npm package offers a straightforward method to extract text content from various binary and text file formats. The package comes with a pre-built configuration that works out-of-the-box, requiring no additional setup. It is designed for use in Browse
A lightweight library to parse .docx files in Cloudflare Workers
Convert documents to Markdown text content. Originally inspired by Microsoft's markitdown Python library.
Plain text parser that allows readers to extract and process words from plain text or `.txt` files.
A Node script that fills DOCX placeholders and converts the result to PDF
A dead simple docx parser.
Docx parser for JavaScript/TypeScript
Primitives for building and extending readers, including definitions for context, parsers, modes, commands, plugins, and more.
A text-extraction package for docx, pdf and pptx files
The Structured Parser JS/TS SDK lets developers integrate Structured Parser's structured-data extraction for unstructured documents such as PDF, DOCX, and XLSX files.
> **Note** > This repository is automatically generated from the [main parser monorepo](https://github.com/TrialAndErrorOrg/parsers). Please submit any issues or pull requests there.
JavaScript port of python-docx.