pdf-parse

Versions: 11

Pure javascript cross-platform module to extract text from PDFs.

  • Version: 1.1.1 (latest)
  • Weekly downloads: 465K (down 17.02%)
  • Maintainers: 1

What is pdf-parse?

The pdf-parse npm package extracts text and metadata from PDF files. Given a PDF as a buffer, it returns a promise that resolves to an object exposing the extracted text (data.text) and the document metadata (data.info), which makes it useful for data extraction, text analysis, and document processing.

What are pdf-parse's main functionalities?

Extract Text from PDF

This feature extracts the text content of a PDF file. The code sample reads a PDF from the filesystem, passes the buffer to pdf-parse, and prints the extracted text.

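const fs = require('fs');
const pdf = require('pdf-parse');

// Read the whole PDF into memory as a Buffer
const dataBuffer = fs.readFileSync('example.pdf');

pdf(dataBuffer).then(function(data) {
    console.log(data.text); // Print the text content of the PDF
}).catch(function(err) {
    console.error(err); // Parsing failed (for example, a corrupt file)
});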
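
Once the text is available as a string, a little post-processing often helps before analysis. The sketch below is not part of pdf-parse: summarizeText is a hypothetical helper that assumes its argument is a string shaped like data.text, and it simply counts non-empty lines and whitespace-separated words.

```javascript
// Hypothetical helper (not part of pdf-parse): summarize a text string
// such as the `data.text` value printed in the sample above.
function summarizeText(text) {
  // Split on line breaks, trim each line, and drop the empty ones.
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  // Count whitespace-separated tokens across all remaining lines.
  const words = lines.join(' ').split(/\s+/).filter(Boolean);

  return {
    lineCount: lines.length,
    wordCount: words.length,
    firstLine: lines[0] || '',
  };
}
```

It could be called inside the then callback, for example summarizeText(data.text).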

Extract Metadata from PDF

This feature extracts metadata from a PDF file. The code sample reads a PDF from the filesystem, passes the buffer to pdf-parse, and prints the metadata object, which holds fields such as title, author, and creation date.

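const fs = require('fs');
const pdf = require('pdf-parse');

// Read the whole PDF into memory as a Buffer
const dataBuffer = fs.readFileSync('example.pdf');

pdf(dataBuffer).then(function(data) {
    console.log(data.info); // Print the metadata of the PDF
}).catch(function(err) {
    console.error(err); // Parsing failed (for example, a corrupt file)
});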
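
To pick individual fields out of the metadata object, something like the following sketch may help. It assumes the object mirrors the PDF Info dictionary, with keys named Title, Author, and CreationDate, and that the creation date uses the PDF date string form (for example "D:20181024093000Z"). Both describeInfo and parsePdfDate are hypothetical helpers, and this simplified date parser ignores any timezone offset and treats the value as UTC.

```javascript
// Hypothetical helper: pull common fields out of a metadata object.
// Assumes keys follow the PDF Info dictionary; any field may be missing.
function describeInfo(info) {
  const safe = info || {};
  return {
    title: safe.Title || null,
    author: safe.Author || null,
    created: parsePdfDate(safe.CreationDate),
  };
}

// Convert a PDF date string such as "D:20181024093000Z" to a Date.
// Simplification: the timezone offset is ignored and the value is read as UTC.
function parsePdfDate(value) {
  const match = /^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?/.exec(value || '');
  if (!match) return null;
  // Missing components fall back to the start of the period.
  const [year, month = '01', day = '01', hour = '00', minute = '00', second = '00'] =
    match.slice(1);
  return new Date(Date.UTC(+year, +month - 1, +day, +hour, +minute, +second));
}
```

Fields that are absent come back as null, so callers can tell a missing title apart from an empty one.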

Extract Text and Metadata from PDF

This feature extracts both the text content and the metadata of a PDF file in a single call. The code sample reads a PDF from the filesystem, passes the buffer to pdf-parse, and prints the text followed by the metadata.

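const fs = require('fs');
const pdf = require('pdf-parse');

// Read the whole PDF into memory as a Buffer
const dataBuffer = fs.readFileSync('example.pdf');

pdf(dataBuffer).then(function(data) {
    console.log(data.text); // Print the text content of the PDF
    console.log(data.info); // Print the metadata of the PDF
}).catch(function(err) {
    console.error(err); // Parsing failed (for example, a corrupt file)
});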
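
The same flow can also be written with async/await. The sketch below is one possible shape, not the package's own API: extractTextAndInfo is a hypothetical wrapper, and the parser is passed in as an argument (in real use, the function returned by require('pdf-parse')) so the helper stays independent of how the dependency is loaded.

```javascript
// Hypothetical wrapper: await a pdf-parse style function and return
// a plain result object instead of throwing on failure.
// `parse` is any function that takes a Buffer and returns a promise
// resolving to an object with `text` and `info` properties.
async function extractTextAndInfo(dataBuffer, parse) {
  try {
    const data = await parse(dataBuffer);
    return { ok: true, text: data.text, info: data.info };
  } catch (err) {
    // A rejected parse is reported to the caller rather than rethrown.
    return { ok: false, error: err.message };
  }
}

// Usage (assumes pdf-parse is installed and example.pdf exists):
// const fs = require('fs');
// const pdf = require('pdf-parse');
// extractTextAndInfo(fs.readFileSync('example.pdf'), pdf).then(console.log);
```

Returning an ok flag rather than throwing is a design choice for this sketch; rethrowing the error would work equally well if the caller already has its own error handling.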


Package last updated on 24 Oct 2018
