
parseflow-core
Document parsing library for ParseFlow - Extract text and data from PDF, Word (docx), and Excel (xlsx) files
Core PDF parsing library for ParseFlow - Extract text, metadata, images, and TOC from PDF files.
npm install parseflow-core
Or using pnpm:
pnpm add parseflow-core
Or using yarn:
yarn add parseflow-core
import { PDFParser } from 'parseflow-core';
const parser = new PDFParser();
// Extract all text
const result = await parser.extractText('path/to/document.pdf');
console.log(result.text);
// Extract specific page
const page2 = await parser.extractText('path/to/document.pdf', { page: 2 });
// Extract page range
const pages = await parser.extractText('path/to/document.pdf', { range: '1-5' });
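The `range: '1-5'` option above is given without a definition of its semantics. As an illustration only, the sketch below shows one plausible reading: 1-based, inclusive on both ends, clamped to the document length. `parsePageRange` is a hypothetical helper, not part of parseflow-core, and the library may interpret ranges differently.

```typescript
// Hypothetical helper: NOT part of parseflow-core. It illustrates one plausible
// reading of a range string such as '1-5' (1-based, inclusive on both ends).
function parsePageRange(range: string, pageCount: number): number[] {
  const match = /^\s*(\d+)\s*-\s*(\d+)\s*$/.exec(range);
  if (!match) {
    throw new Error(`Invalid page range: "${range}"`);
  }
  const start = Number(match[1]);
  const end = Number(match[2]);
  if (start < 1 || end < start) {
    throw new Error(`Invalid page range: "${range}"`);
  }
  // Clamp to the document length so '1-5' on a 3-page file yields [1, 2, 3].
  const last = Math.min(end, pageCount);
  const pages: number[] = [];
  for (let page = start; page <= last; page++) {
    pages.push(page);
  }
  return pages;
}
```

Validating the string before passing it to `extractText` can surface a malformed range early, whatever the library does with it internally.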
const metadata = await parser.getMetadata('path/to/document.pdf');
console.log(metadata);
// {
// title: 'Document Title',
// author: 'Author Name',
// pageCount: 10,
// creationDate: '2025-01-01',
// ...
// }
const results = await parser.searchPDF('path/to/document.pdf', 'keyword', {
  caseSensitive: false,
  maxResults: 10
});
results.forEach(result => {
  console.log(`Found on page ${result.page}: ${result.context}`);
});
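To make the `caseSensitive` and `maxResults` options concrete, here is a minimal sketch of a keyword search over page text that is already in memory. It is not parseflow-core's implementation: `searchPages`, `SearchOptions`, `SearchHit`, and the 20-character context window are assumptions for illustration. Only the `page` and `context` field names come from the example above.

```typescript
// Hypothetical types and helper; not parseflow-core's implementation.
interface SearchOptions {
  caseSensitive?: boolean;
  maxResults?: number;
}

interface SearchHit {
  page: number;    // 1-based page number
  context: string; // text surrounding the match
}

function searchPages(pages: string[], query: string, options: SearchOptions = {}): SearchHit[] {
  const { caseSensitive = false, maxResults = Infinity } = options;
  const hits: SearchHit[] = [];
  if (query.length === 0) return hits;
  const needle = caseSensitive ? query : query.toLowerCase();

  for (let i = 0; i < pages.length; i++) {
    // Assumes lowercasing does not change string length (true for ASCII text).
    const haystack = caseSensitive ? pages[i] : pages[i].toLowerCase();
    let from = 0;
    while (hits.length < maxResults) {
      const at = haystack.indexOf(needle, from);
      if (at === -1) break;
      // Keep up to 20 characters on either side of the match as context.
      const start = Math.max(0, at - 20);
      const end = Math.min(pages[i].length, at + query.length + 20);
      hits.push({ page: i + 1, context: pages[i].slice(start, end) });
      from = at + needle.length;
    }
    if (hits.length >= maxResults) break;
  }
  return hits;
}
```

In this sketch `maxResults` caps the total number of hits across all pages, not the hits per page. Whether `searchPDF` counts the same way is not stated in the source.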
import { ImageExtractorExternal } from 'parseflow-core';
const extractor = new ImageExtractorExternal();
const images = await extractor.extract('path/to/document.pdf', './output', {
  format: 'png'
});
import { TOCExtractorExternal } from 'parseflow-core';
const tocExtractor = new TOCExtractorExternal();
const toc = await tocExtractor.extract('path/to/document.pdf');
console.log(toc);
PDFParser
Main parser class for PDF operations.
extractText(path, options?) - Extract text from PDF
getMetadata(path) - Get PDF metadata
searchPDF(path, query, options?) - Search for keywords
ImageExtractorExternal
Extract images from PDF using external tools.
isAvailable() - Check if pdfimages is available
extract(pdfPath, outputDir, options?) - Extract images
TOCExtractorExternal
Extract table of contents from PDF.
isAvailable() - Check if pdftk/pdfinfo is available
extract(pdfPath, options?) - Extract TOC
Some features require external tools:
Image extraction (pdfimages, part of Poppler):
Windows:
Linux:
sudo apt-get install poppler-utils
macOS:
brew install poppler
TOC extraction (pdftk/pdfinfo):
Windows:
Linux:
sudo apt-get install poppler-utils pdftk
macOS:
brew install poppler pdftk-java
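Because the image and TOC extractors depend on these external tools, it can help to call `isAvailable()` before `extract()`. The sketch below shows that guard pattern. `ExternalTool` and `extractIfAvailable` are hypothetical names, and the return types are assumptions: the source lists the two methods but does not say what they return or whether they are asynchronous.

```typescript
// Structural type covering only the two methods named in the API list; the
// return types are assumptions, since the source does not state them.
interface ExternalTool<T> {
  isAvailable(): boolean | Promise<boolean>;
  extract(pdfPath: string, ...rest: unknown[]): T | Promise<T>;
}

// Hypothetical helper: resolves to null instead of calling extract()
// when the underlying command-line tool is missing.
async function extractIfAvailable<T>(
  tool: ExternalTool<T>,
  pdfPath: string,
  ...rest: unknown[]
): Promise<T | null> {
  // `await` works whether isAvailable() is synchronous or returns a promise.
  if (!(await tool.isAvailable())) {
    return null;
  }
  return await tool.extract(pdfPath, ...rest);
}
```

If the library's classes match this shape, a call might look like `await extractIfAvailable(new ImageExtractorExternal(), 'path/to/document.pdf', './output', { format: 'png' })`, with a `null` result signalling that the tools above still need to be installed.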
For complete documentation, visit:
Contributions are welcome! Please see CONTRIBUTING.md for details.
MIT © Libres-coder
Made with ❤️ by ParseFlow Team