
parseflow-core
Document parsing library for ParseFlow - Extract text and data from PDF, Word (docx), and Excel (xlsx) files
Core PDF parsing library for ParseFlow - Extract text, metadata, images, and TOC from PDF files.
npm install parseflow-core
Or using pnpm:
pnpm add parseflow-core
Or using yarn:
yarn add parseflow-core
import { PDFParser } from 'parseflow-core';
const parser = new PDFParser();
// Extract all text
const result = await parser.extractText('path/to/document.pdf');
console.log(result.text);
// Extract specific page
const page2 = await parser.extractText('path/to/document.pdf', { page: 2 });
// Extract page range
const pages = await parser.extractText('path/to/document.pdf', { range: '1-5' });
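The `range` option above is passed as a string such as `'1-5'`. How the library handles a malformed range is not stated here, so a caller may want to check the string before the call. The following is a minimal sketch: `parsePageRange` is a hypothetical helper, not part of parseflow-core, and it assumes pages are 1-based and the range is inclusive.

```javascript
// Hypothetical helper (not part of parseflow-core): validate a "start-end"
// page range string before passing it as the `range` option.
// Assumes 1-based page numbers and an inclusive range.
function parsePageRange(range) {
  const match = /^(\d+)-(\d+)$/.exec(range.trim());
  if (!match) {
    throw new Error(`Invalid page range: "${range}"`);
  }
  const start = Number(match[1]);
  const end = Number(match[2]);
  if (start < 1 || end < start) {
    throw new Error(`Invalid page range: "${range}"`);
  }
  return { start, end };
}
```

Calling `parsePageRange('1-5')` before `extractText` lets a bad range fail early with a clear message instead of depending on the library's behavior.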
const metadata = await parser.getMetadata('path/to/document.pdf');
console.log(metadata);
// {
// title: 'Document Title',
// author: 'Author Name',
// pageCount: 10,
// creationDate: '2025-01-01',
// ...
// }
const results = await parser.searchPDF('path/to/document.pdf', 'keyword', {
  caseSensitive: false,
  maxResults: 10
});
results.forEach(result => {
  console.log(`Found on page ${result.page}: ${result.context}`);
});
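When a keyword appears several times, it can help to collect the hits per page rather than print them one by one. The sketch below assumes each search result carries the `page` and `context` fields used in the example above; `groupByPage` is a hypothetical helper and not part of the library.

```javascript
// Hypothetical helper (not part of parseflow-core): group search hits by page.
// Assumes each hit has `page` and `context` fields, as in the example above.
function groupByPage(results) {
  const byPage = new Map();
  for (const { page, context } of results) {
    if (!byPage.has(page)) {
      byPage.set(page, []);
    }
    byPage.get(page).push(context);
  }
  return byPage;
}
```

The returned `Map` keeps pages in the order they first appear in `results`, so iterating it preserves the original result order.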
import { ImageExtractorExternal } from 'parseflow-core';
const extractor = new ImageExtractorExternal();
const images = await extractor.extract('path/to/document.pdf', './output', {
  format: 'png'
});
import { TOCExtractorExternal } from 'parseflow-core';
const tocExtractor = new TOCExtractorExternal();
const toc = await tocExtractor.extract('path/to/document.pdf');
console.log(toc);
PDFParser
Main parser class for PDF operations.
extractText(path, options?) - Extract text from PDF
getMetadata(path) - Get PDF metadata
searchPDF(path, query, options?) - Search for keywords
ImageExtractorExternal
Extract images from PDF using external tools.
isAvailable() - Check if pdfimages is available
extract(pdfPath, outputDir, options?) - Extract images
TOCExtractorExternal
Extract table of contents from PDF.
isAvailable() - Check if pdftk/pdfinfo is available
extract(pdfPath, options?) - Extract TOC
Some features require external tools:
Image extraction (pdfimages)
Windows:
Linux:
sudo apt-get install poppler-utils
macOS:
brew install poppler
TOC extraction (pdftk/pdfinfo)
Windows:
Linux:
sudo apt-get install poppler-utils pdftk
macOS:
brew install poppler pdftk-java
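Because image and TOC extraction depend on tools that may not be installed, it is reasonable to call `isAvailable()` before `extract()`. The sketch below shows one way to do that. `extractIfAvailable` is a hypothetical helper, and it assumes `isAvailable()` returns or resolves to a boolean, which this README does not specify.

```javascript
// Hypothetical helper (not part of parseflow-core): run an extractor only
// when its external tool is installed.
// Assumes `isAvailable()` returns, or resolves to, a boolean.
async function extractIfAvailable(extractor, ...args) {
  const available = await extractor.isAvailable();
  if (!available) {
    return null; // the caller decides how to report the missing tool
  }
  return extractor.extract(...args);
}
```

For example, `await extractIfAvailable(new ImageExtractorExternal(), 'path/to/document.pdf', './output', { format: 'png' })` would yield `null` on a machine without `pdfimages`, under the assumption stated above.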
For complete documentation, visit:
Contributions are welcome! Please see CONTRIBUTING.md for details.
MIT © Libres-coder
Made with ❤️ by ParseFlow Team