
xpdf-wrapper
A powerful Node.js wrapper for Xpdf command-line tools
Extract text, images, fonts, and metadata from PDF files with ease
xpdf-wrapper brings the power of Xpdf's battle-tested PDF processing tools to Node.js. Whether you need to extract text for search indexing, convert PDFs to images, or analyze document metadata, this library provides a clean, modern API with full TypeScript support.
| Feature | Description |
|---|---|
| 📄 Complete Xpdf Suite | All 9 tools included: pdftotext, pdftops, pdftoppm, pdftopng, pdftohtml, pdfinfo, pdfimages, pdffonts, pdfdetach |
| 🔄 Buffer Support | Process PDFs directly from memory - no need to save temporary files |
| 📝 Direct Text Output | pdftotext returns extracted text directly in result.text |
| 🎯 TypeScript First | Complete type definitions for all tools and options |
| ⚡ Zero Config | Xpdf binaries are automatically downloaded on install |
| 🔀 Flexible API | Choose between standalone functions or the unified Xpdf class |
| 🚀 Batch Processing | Process multiple PDFs or run multiple operations concurrently |
# Using npm
npm install xpdf-wrapper
# Using yarn
yarn add xpdf-wrapper
# Using pnpm
pnpm add xpdf-wrapper
Note: Xpdf binaries are automatically downloaded for your platform (Windows, macOS, Linux) during installation.
import { pdftotext } from "xpdf-wrapper";
// Extract text from a PDF file
const result = await pdftotext("./document.pdf");
console.log(result.text);
import { pdftotext } from "xpdf-wrapper";
import { readFileSync } from "fs";
// Process PDF directly from a Buffer
const pdfBuffer = readFileSync("./document.pdf");
const result = await pdftotext(pdfBuffer);
console.log(result.text);
import { pdfinfo } from "xpdf-wrapper";
const result = await pdfinfo("./document.pdf");
console.log(result.stdout);
// Output:
// Creator: Microsoft Word
// Producer: Adobe PDF Library
// CreationDate: Wed Dec 25 12:00:00 2024
// Pages: 5
// File size: 102400 bytes
// ...
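The raw `result.stdout` shown above is line-oriented `Key: value` text. As a rough sketch, assuming that format holds for every line, it could be turned into a plain object as follows. `parsePdfInfo` is a hypothetical helper written for illustration and is not part of xpdf-wrapper; the `Xpdf` class described later already returns a parsed `info` object.

```typescript
// Hypothetical helper: not part of xpdf-wrapper's API.
// Parses `Key: value` lines (as printed by pdfinfo) into a plain object.
function parsePdfInfo(stdout: string): Record<string, string> {
  const info: Record<string, string> = {};
  for (const line of stdout.split(/\r?\n/)) {
    // Split on the first colon only, so values such as times keep their colons.
    const sep = line.indexOf(":");
    if (sep <= 0) continue; // skip blank or malformed lines
    const key = line.slice(0, sep).trim();
    const value = line.slice(sep + 1).trim();
    if (key) info[key] = value;
  }
  return info;
}
```

Calling `parsePdfInfo(result.stdout)` would then allow lookups such as `info["Pages"]`. All values stay strings; converting `Pages` to a number is left to the caller.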
xpdf-wrapper provides wrappers for all 9 Xpdf command-line tools:
| Tool | Function | Description |
|---|---|---|
| pdftotext | pdftotext() | Extract text content from PDF |
| pdftops | pdftops() | Convert PDF to PostScript |
| pdftoppm | pdftoppm() | Convert PDF pages to PPM images |
| pdftopng | pdftopng() | Convert PDF pages to PNG images |
| pdftohtml | pdftohtml() | Convert PDF to HTML |
| pdfinfo | pdfinfo() | Get PDF metadata and information |
| pdfimages | pdfimages() | Extract embedded images from PDF |
| pdffonts | pdffonts() | List fonts used in PDF |
| pdfdetach | pdfdetach() | Extract file attachments from PDF |
All tool wrappers accept either a file path (string) or a Buffer as input:
import { readFileSync } from "fs";
import {
  pdftotext,
  pdftops,
  pdftoppm,
  pdftopng,
  pdftohtml,
  pdfinfo,
  pdfimages,
  pdffonts,
  pdfdetach
} from "xpdf-wrapper";
// File path input, with options
const text = await pdftotext("./document.pdf", undefined, { layout: true });
// Buffer input, with options
const buffer = readFileSync("./document.pdf");
const info = await pdfinfo(buffer, { rawDates: true });
// File path input, no options
const fonts = await pdffonts("./document.pdf");
For more structured results and batch operations, use the Xpdf class:
import { Xpdf } from "xpdf-wrapper";
import { readFileSync } from "fs";
const xpdf = new Xpdf();
// Extract text with parsed result
const textResult = await xpdf.pdfToText("./document.pdf");
console.log(textResult.text);
// Get PDF info with parsed metadata
const infoResult = await xpdf.pdfInfo("./document.pdf");
console.log(infoResult.info.Pages); // 5
console.log(infoResult.info.Creator); // "Microsoft Word"
// List fonts with parsed output
const fontsResult = await xpdf.pdfFonts("./document.pdf");
console.log(fontsResult.fonts); // Array of font objects
// Works with Buffers too
const buffer = readFileSync("./document.pdf");
const result = await xpdf.pdfInfo(buffer);
Pass an array to process multiple PDF files:
import { Xpdf } from "xpdf-wrapper";
import { readFileSync } from "fs";
const xpdf = new Xpdf();
// Process multiple PDFs
const results = await xpdf.pdfInfo([
  "./document1.pdf",
  "./document2.pdf",
  "./document3.pdf"
]);
// results is an array with one entry per input file
results.forEach((result, index) => {
  console.log(`Document ${index + 1}: ${result.info.Pages} pages`);
});
// Mix file paths and Buffers
const buffer = readFileSync("./document2.pdf");
const mixedResults = await xpdf.pdfToText([
  "./document1.pdf",
  buffer,
  "./document3.pdf"
]);
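This README does not say whether array inputs are processed with any concurrency cap. For very large batches, a caller may prefer to bound how many files are in flight at once. The following is a minimal sketch of a generic bounded-concurrency map; `mapWithLimit` is a hypothetical helper, not part of xpdf-wrapper, and the function passed to it stands in for any per-file call.

```typescript
// Hypothetical helper: runs `fn` over `items` with at most `limit` calls in flight.
// Results are returned in the same order as the inputs.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item to claim

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as T, i);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With this helper, something like `mapWithLimit(paths, 4, (p) => xpdf.pdfInfo(p))` would keep at most four calls running at a time, assuming `xpdf.pdfInfo` accepts a single path as shown earlier.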
Run multiple operations on the same PDF(s) concurrently:
import { Xpdf } from "xpdf-wrapper";
const xpdf = new Xpdf();
// Run multiple operations on a single PDF
const results = await xpdf.batch("./document.pdf", [
  "pdfInfo",
  "pdfFonts",
  "pdfToText"
]);
// Access results by operation name
console.log("Page count:", results.pdfInfo?.info.Pages);
console.log("Fonts used:", results.pdfFonts?.fonts);
console.log("Text content:", results.pdfToText?.text);
| Variable | Default | Description |
|---|---|---|
| `NODE_XPDF_BIN_DIR` | `<package>/bin` | Custom path to Xpdf binaries |
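To illustrate the override-with-default behaviour in the table above, here is a small sketch. `resolveBinDir` is a hypothetical function, and the rule that a non-empty `NODE_XPDF_BIN_DIR` wins over `<package>/bin` is an assumption based on the table, not a description of xpdf-wrapper's internals.

```typescript
// Hypothetical helper: illustrates an environment override with a default.
// The lookup order is an assumption, not xpdf-wrapper's documented behaviour.
function resolveBinDir(
  env: Record<string, string | undefined>,
  packageDir: string
): string {
  const override = env["NODE_XPDF_BIN_DIR"]?.trim();
  if (override) return override; // a non-empty override wins
  // Default: <package>/bin, with any trailing slashes on packageDir removed.
  return `${packageDir.replace(/[\\/]+$/, "")}/bin`;
}
```

In a Node.js program the first argument would typically be `process.env`.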
Configure the Xpdf class with custom options:
import { Xpdf } from "xpdf-wrapper";
const xpdf = new Xpdf({
  // Custom binary directory
  binDir: "/opt/xpdf/bin",
  // Runtime options
  run: {
    timeoutMs: 30000 // 30 second timeout
  }
});
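How `timeoutMs` is enforced internally is not described here. As a general illustration of the idea, a promise can be raced against a timer. `withTimeout` below is a hypothetical helper, not xpdf-wrapper's implementation, and unlike a real process timeout it does not stop the underlying work.

```typescript
// Generic sketch: rejects if `work` has not settled within `timeoutMs`.
// It only illustrates the idea of a timeout; it does not cancel `work`.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  // Whichever promise settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    }
  );
}
```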
Each tool supports its own set of options matching the Xpdf CLI:
import { pdftotext, pdfinfo, pdftopng } from "xpdf-wrapper";
// pdftotext options
await pdftotext("./doc.pdf", undefined, {
  firstPage: 1,
  lastPage: 10,
  layout: true, // Maintain original layout
  table: true, // Table mode
  lineEnd: "unix", // Line endings: "unix" | "dos" | "mac"
  enc: "UTF-8", // Output encoding
  ownerPassword: "secret",
  userPassword: "secret"
});
// pdfinfo options
await pdfinfo("./doc.pdf", {
  firstPage: 1,
  lastPage: 5,
  box: true, // Print page box info
  meta: true, // Print metadata
  rawDates: true // Print dates in raw format
});
// pdftopng options
await pdftopng("./doc.pdf", "./output", {
  firstPage: 1,
  lastPage: 1,
  resolution: 300, // DPI
  mono: true, // Monochrome output (alternative to gray)
  gray: true // Grayscale output (alternative to mono)
});
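Since the options mirror the Xpdf CLI, it can help to see how an options object might translate into command-line flags. The sketch below maps the pdftotext options shown above onto the corresponding Xpdf flags (`-f`, `-l`, `-layout`, `-table`, `-eol`, `-enc`, `-opw`, `-upw`). `PdfToTextOptions` and `buildPdfToTextArgs` are hypothetical names, and the mapping is an assumption about how such a wrapper could work, not xpdf-wrapper's actual code.

```typescript
// Hypothetical option shape, mirroring the pdftotext options shown above.
interface PdfToTextOptions {
  firstPage?: number;
  lastPage?: number;
  layout?: boolean;
  table?: boolean;
  lineEnd?: "unix" | "dos" | "mac";
  enc?: string;
  ownerPassword?: string;
  userPassword?: string;
}

// Translates options into Xpdf pdftotext flags, skipping anything unset.
function buildPdfToTextArgs(options: PdfToTextOptions = {}): string[] {
  const args: string[] = [];
  if (options.firstPage !== undefined) args.push("-f", String(options.firstPage));
  if (options.lastPage !== undefined) args.push("-l", String(options.lastPage));
  if (options.layout) args.push("-layout");
  if (options.table) args.push("-table");
  if (options.lineEnd) args.push("-eol", options.lineEnd);
  if (options.enc) args.push("-enc", options.enc);
  if (options.ownerPassword) args.push("-opw", options.ownerPassword);
  if (options.userPassword) args.push("-upw", options.userPassword);
  return args;
}
```

Returning an argument array rather than a single command string avoids shell quoting problems if the arguments are later passed to a process-spawning API.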
The examples/ directory contains working examples:
| Example | Description |
|---|---|
| buffer-example.ts | Working with PDF Buffers |
| pdftotext-example.ts | Text extraction examples |
| pdfinfo-example.ts | Getting PDF metadata |
| batch-example.ts | Batch processing examples |
# First, build the project
npm run build
# Then run an example
npx tsx examples/buffer-example.ts
npx tsx examples/pdftotext-example.ts
npx tsx examples/pdfinfo-example.ts
npx tsx examples/batch-example.ts
# Clone the repository
git clone https://github.com/iqbal-rashed/xpdf-wrapper.git
cd xpdf-wrapper
# Install dependencies
npm install
# Build the project
npm run build
# Run tests
npm test
# Run tests in watch mode
npm run test:watch
# Lint the code
npm run lint
# Format the code
npm run format
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
1. Create a feature branch (git checkout -b feature/amazing-feature)
2. Commit your changes (git commit -m 'Add some amazing feature')
3. Push the branch (git push origin feature/amazing-feature)

This project is licensed under the MIT License. See the LICENSE file for details.
Made with ❤️ by Rashed Iqbal
⭐ Star this repo if you find it helpful! ⭐
FAQs
Node.js wrapper for Xpdf command-line tools
We found that xpdf-wrapper demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.