Exciting news!Announcing our $4.6M Series Seed. Learn more
Socket
LoveBlogFAQ
Install
Log in

pdf-stream

Package Overview
Dependencies
1
Maintainers
1
Versions
11
Issues
File Explorer

Advanced tools

pdf-stream

Creates a stream from PDF

    1.3.2latest

Version published
Maintainers
1
Yearly downloads
16,331
increased by138.72%

Weekly downloads

Readme

Source

pdf-stream

Code Climate

Creates a stream from PDF

Node.js module for streaming PDF text content.

Based on PDF.js library.

Table of Contents

Install

npm i pdf-stream --save

Usage

Basic

Text stream from PDF file

'use strict'; const text = require('pdf-stream').text; // Load file contents to ArrayBuffer synchronously let file = './example.pdf'; let pdf = new Uint8Array(fs.readFileSync(file)); // Stream PDF text to stdout text(pdf) .pipe(process.stdout);

You need the XMLHttpRequest as global variable. Install the xhr2 library locally:

npm i xhr2 --save 'use strict'; const text = require('pdf-stream').text; global.XMLHttpRequest = require('xhr2'); // for PDFJS let pdf = 'https://mozilla.github.io/pdf.js/web/compressed.tracemonkey-pldi-09.pdf'; text(pdf) .pipe(process.stdout);

If you got error:

ReferenceError: DOMParser is not defined

You need the DOMParser as global variable, because PDF.js use it for XML metadata parsing. Install the xmldom library locally:

npm i xmldom --save 'use strict'; const text = require('pdf-stream').text; global.XMLHttpRequest = require('xhr2'); // File download global.DOMParser = require('xmldom').DOMParser; // XML Metadata parsing let pdf = 'https://mozilla.github.io/pdf.js/web/compressed.tracemonkey-pldi-09.pdf'; text(pdf) .pipe(process.stdout);

Advanced

Create transform class for replacing string

'use strict'; const Transform = require('stream').Transform; const pdf_stream = require('pdf-stream'); const PDFReadable = pdf_stream.PDFReadable; const PDFStringifyTransform = pdf_stream.PDFStringifyTransform; let url = 'https://mozilla.github.io/pdf.js/web/compressed.tracemonkey-pldi-09.pdf'; // Transform class for replacing strings class ReplaceTransform extends Transform { constructor(options) { super({ writableObjectMode: true, readableObjectMode: true }); this.from = options.from; this.to = options.to; } // For every object _transform(obj, encoding, cb) { // Get text content items if (typeof obj.textContent !== 'undefined' && Array.isArray(obj.textContent.items)) { obj.textContent.items.forEach((item, i) => { // Working with the PDF.js `textContent` object // Replace substring to another obj.textContent.items[i].str = item.str.replace(this.from, this.to); }); } this.push(obj); cb(); } } // Pipeline new PDFReadable(url) .pipe(new ReplaceTransform({ from: /trace/gi, to: ':-)' })) .pipe(new PDFStringifyTransform()) // Convert stream from object to string .pipe(process.stdout);

API

All methods are streams, use them with .pipe().

text(options)

alternative usage: text(pdf, whitespace)

Gets text stream from PDF.

Convert PDF to text, optionally can replace whitespaces.

Options:

  • pdf — URL or ArrayBuffer;
  • whitespace — the string that replaces the whitespace . Replacement disabled by default.

In the PDF.js viewer whitespaces is an empty string. For making output comparable with the viewer use: text(pdf, '')

Return: {stream.Readable}

new PDFReadable(options)

alternative usage: new PDFReadable(pdf)

Making the Readable stream in object mode from PDF text content.

Options:

  • pdf — URL or ArrayBuffer;
  • inherit from stream.Readable options.

Return: {stream.Readable}

new PDFStringifyTransform(options)

alternative usage: new PDFStringifyTransform(whitespace)

Transform PDF text content object to string.

Options:

  • whitespace — the string that replaces the whitespace . Replacement disabled by default;
  • inherit from stream.Transform options.

Return: {stream.Readable}

Contribute

Contributors are welcome. Open an issue or submit pull request.

Small note: If editing the README, please conform to the standard-readme specification.

License

Apache 2.0

© Sergey N

Keywords

Socket

Product

Subscribe to our newsletter

Get open source security insights delivered straight into your inbox. Be the first to learn about new features and product updates.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc