
pdfdataextract
Extract data from a PDF with pure JavaScript.
The PdfData wrapper around PdfDataExtractor is inspired by https://www.npmjs.com/package/pdf-parse, which is currently unmaintained. PdfDataExtractor itself is a simple interface for extracting individual pieces of data from a PDF file.
npm install pdfdataextract
Full documentation is available in the project wiki.
PdfData is a wrapper around PdfDataExtractor that returns all extracted data directly as a single JSON structure.
```ts
import { PdfData, VerbosityLevel } from 'pdfdataextract';
import { readFileSync } from 'fs';

const file_data = readFileSync('some_pdf_file.pdf');

// all options are optional
PdfData.extract(file_data, {
  password: '123456', // password of the pdf file
  pages: 1, // how many pages should be read at most
  sort: true, // sort the text by text coordinates
  verbosity: VerbosityLevel.ERRORS, // set the verbosity level for parsing
  get: { // enable or disable data extraction (all are optional and enabled by default)
    pages: true, // get number of pages
    text: true, // get text of each page
    fingerprint: true, // get fingerprint
    outline: true, // get outline
    metadata: true, // get metadata
    info: true, // get info
    permissions: true, // get permissions
  },
}).then((data) => {
  data.pages; // the number of pages
  data.text; // an array of text pages
  data.fingerprint; // fingerprint of the pdf document
  data.outline; // outline data of the pdf document
  data.info; // information of the pdf document, such as Author
  data.metadata; // metadata of the pdf document
  data.permissions; // permissions for the document
});
```
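The `sort` option orders the extracted text by its coordinates on the page. This README does not specify the exact ordering rule, so the following is only a minimal sketch of one common approach, not pdfdataextract's implementation: the `TextItem` shape, the `toReadingOrder` function, and the `lineTolerance` parameter are all hypothetical. It assumes PDF user-space coordinates, where a larger `y` is higher on the page, groups fragments whose baselines lie within `lineTolerance` of each other into one line, and then reads lines top to bottom and fragments left to right.

```ts
// Hypothetical positioned text fragment (not a pdfdataextract type).
interface TextItem {
  str: string; // the text content
  x: number; // horizontal position in PDF user space
  y: number; // baseline position; a larger y is higher on the page
}

// Hypothetical helper: returns the fragments as text in reading order,
// one output line per detected text line.
function toReadingOrder(items: TextItem[], lineTolerance = 2): string {
  // Copy before sorting so the caller's array is not mutated; highest line first.
  const byY = items.slice().sort((a, b) => b.y - a.y);

  // Group fragments whose baselines are within `lineTolerance` of the
  // first fragment of the current line.
  const lines: TextItem[][] = [];
  for (const item of byY) {
    const current = lines[lines.length - 1];
    if (current !== undefined && Math.abs(current[0].y - item.y) <= lineTolerance) {
      current.push(item);
    } else {
      lines.push([item]);
    }
  }

  // Within each line, order fragments left to right and join them with spaces.
  return lines
    .map((line) => line.sort((a, b) => a.x - b.x).map((item) => item.str).join(' '))
    .join('\n');
}

const fragments: TextItem[] = [
  { str: 'world', x: 120, y: 700 },
  { str: 'Second line', x: 72, y: 680 },
  { str: 'Hello', x: 72, y: 701 },
];
console.log(toReadingOrder(fragments));
// Hello world
// Second line
```

A fixed tolerance keeps the sketch short; layouts with mixed font sizes, multiple columns, or rotated text usually need more careful heuristics.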
PdfDataExtractor opens the document once and lets you request individual pieces of data. Close it only after every request has finished.
```ts
import { PdfDataExtractor, VerbosityLevel } from 'pdfdataextract';
import { readFileSync } from 'fs';

const file_data = readFileSync('some_pdf_file.pdf');

// all options are optional
PdfDataExtractor.get(file_data, {
  password: '123456', // password of the pdf file
  verbosity: VerbosityLevel.ERRORS, // set the verbosity level for parsing
}).then(async (extractor) => {
  extractor.pages; // the number of pages
  extractor.fingerprint; // fingerprint of the pdf document
  try {
    // an array of text pages (only one page and sorted)
    const firstPageSorted = await extractor.getText(1, true);
    // an array of text pages (only the second page)
    const secondPage = await extractor.getText([2]);
    const outline = await extractor.getOutline(); // outline data of the pdf document
    const metadata = await extractor.getMetadata(); // metadata of the pdf document
    const permissions = await extractor.getPermissions(); // permissions for the document
  } finally {
    // close the document only after every request above has finished
    extractor.close();
  }
});
```
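The two `getText` calls differ in their first argument: the number `1` limits extraction to one page, while the array `[2]` selects only the second page. A hypothetical helper that normalizes either form into a list of 1-based page numbers might look like the sketch below. `resolvePages` and `PageSelection` are not part of pdfdataextract, and the handling of an omitted selection (all pages), out-of-range or non-integer page numbers (dropped), and duplicates (removed) is an assumption, not documented library behavior.

```ts
// Hypothetical selection type: a number means "at most this many pages
// from the start"; an array lists explicit 1-based page numbers.
type PageSelection = number | number[];

// Returns [1, 2, ..., n].
function firstPages(n: number): number[] {
  const pages: number[] = [];
  for (let page = 1; page <= n; page++) pages.push(page);
  return pages;
}

// Hypothetical helper, not part of pdfdataextract.
function resolvePages(selection: PageSelection | undefined, totalPages: number): number[] {
  if (selection === undefined) {
    // Assumption: no selection means every page of the document.
    return firstPages(totalPages);
  }
  if (typeof selection === 'number') {
    // Read at most `selection` pages, never more than the document has.
    return firstPages(Math.max(0, Math.min(Math.floor(selection), totalPages)));
  }
  // Assumption: keep valid integer pages only, drop duplicates, sort ascending.
  return selection
    .filter((page) => Math.floor(page) === page && page >= 1 && page <= totalPages)
    .filter((page, index, all) => all.indexOf(page) === index)
    .sort((a, b) => a - b);
}

const onlyFirst = resolvePages(1, 5); // [1]
const onlySecond = resolvePages([2], 5); // [2]
const everyPage = resolvePages(undefined, 3); // [1, 2, 3]
```

Accepting both a count and an explicit list keeps the common case ("just the first N pages") short while still allowing arbitrary page picks.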
Run the test suite with:
npm test
FAQs
The npm package pdfdataextract receives a total of 4,014 weekly downloads, so its popularity is classified as popular.
We found that pdfdataextract has a healthy version release cadence and project activity: the last version was released less than a year ago. The project has one open source maintainer.