Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More →

pdfreader

Package Overview

Dependencies

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

pdfreader

Utility for simplifying the development of scripted / rule-based parsing of PDF files, including tabular data (tables, with automatic column detection).

0.0.7
Source
npm

Version published: 9 years ago

Weekly downloads: 39K; increased by3.92%

Maintainers: 1

Weekly downloads

Created: 10 years ago

Source

pdfreader

NPM module for simplifying the development of scripted / rule-based parsing of PDF files, including tabular data (tables, with automatic column detection).

Installation, tests and CLI usage

npm install pdfreader
cd node_modules/pdfreader
npm test
node parse.js test/sample.pdf

Raw PDF reading

The PdfReader class reads a PDF file, and calls a function on each item found while parsing that file.

An item object can match one of the following objects:

null, when the parsing is over, or an error occured.
{file:{path:string}}, when a PDF file is being opened.
{page:integer}, when a new page is being parsed, provides the page number, starting at 1.
{text:string, x:float, y:float, w:float, h:float...}, represents each text with its position.

Example:

new PdfReader().parseFileItems("sample.pdf", function(err, item){
  if (err)
    callback(err);
  else if (!item)
    callback();
  else if (item.text)
    console.log(item.text);
});

Rule-based data extraction

The Rule class can be used to define and process data extraction rules, while parsing a PDF document.

Rule instances expose "accumulators": methods that defines the data extraction strategy to be used for each rule.

Example:

var processItem = Rule.makeItemProcessor([
  Rule.on(/^Hello \"(.*)\"$/).extractRegexpValues().then(displayValue),
  Rule.on(/^Value\:/).parseNextItemValue().then(displayValue),
  Rule.on(/^c1$/).parseTable(3).then(displayTable),
  Rule.on(/^Values\:/).accumulateAfterHeading().then(displayValue),
]);
new PdfReader().parseFileItems("sample.pdf", function(err, item){
  processItem(item);
});

Keywords

FAQs

What is pdfreader?

Is pdfreader popular?

Is pdfreader well maintained?

Package last updated on 04 Sep 2015

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

pdfreader

pdfreader

Installation, tests and CLI usage

Raw PDF reading

Rule-based data extraction

Keywords

Related posts

Author Typosquatting on npm: Attackers Impersonate Sindre Sorhus with Malicious ‘chalk-node’ Package

Supply Chain Attack on LottieFiles Player Caused by Compromised npmjs Credentials