What is pdfjs-dist?
The pdfjs-dist package is a pre-built distribution of the PDF.js library, a general-purpose, web standards-based platform for parsing and rendering PDFs. It lets you display PDF files in your web pages without a native PDF reader. Its features include rendering PDF pages, reading PDF metadata, and working with PDF content programmatically.
What are pdfjs-dist's main functionalities?
Rendering PDF pages to a canvas element
This code sample demonstrates how to render the first page of a PDF to a canvas element in the browser. It uses the getDocument method to load the PDF and the getPage method to access a specific page, then renders it to the provided canvas context.
const pdfjsLib = require('pdfjs-dist/legacy/build/pdf.js');
async function renderPage(url, canvasContext) {
  // getDocument returns a loading task; its promise resolves to the document.
  const pdf = await pdfjsLib.getDocument(url).promise;
  // Page numbers are 1-based.
  const page = await pdf.getPage(1);
  const viewport = page.getViewport({ scale: 1.5 });
  const renderContext = {
    canvasContext: canvasContext,
    viewport: viewport
  };
  // render returns a task; wait for its promise before using the canvas.
  await page.render(renderContext).promise;
}
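The sample above draws into whatever canvas context it receives, so the canvas has to be large enough for the scaled page. A minimal sketch of one way to handle that is shown here: it sizes the canvas from the viewport before rendering. The function name renderFirstPageToCanvas and the choice to pass pdfjsLib in as a parameter are assumptions made for illustration and are not part of the package.

```javascript
// Sketch only: `renderFirstPageToCanvas` is a hypothetical helper name.
// `pdfjsLib` is passed in so the caller decides which build to load,
// for example require('pdfjs-dist/legacy/build/pdf.js').
async function renderFirstPageToCanvas(pdfjsLib, url, canvas, scale = 1.5) {
  const pdf = await pdfjsLib.getDocument(url).promise;
  const page = await pdf.getPage(1);
  const viewport = page.getViewport({ scale });

  // Match the canvas bitmap to the viewport so the page is not clipped.
  canvas.width = Math.ceil(viewport.width);
  canvas.height = Math.ceil(viewport.height);

  await page.render({
    canvasContext: canvas.getContext('2d'),
    viewport,
  }).promise;

  // Return the final size so callers can lay out the canvas.
  return { width: canvas.width, height: canvas.height };
}
```

In a browser this could be called with a canvas element taken from the page, such as one returned by document.querySelector('canvas').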
Extracting text from a PDF page
This code sample shows how to extract text content from the first page of a PDF. It retrieves the text items from the page and then maps them to strings, joining them to form the full text content of the page.
const pdfjsLib = require('pdfjs-dist/legacy/build/pdf.js');
async function extractTextFromPage(url) {
  const pdf = await pdfjsLib.getDocument(url).promise;
  // Page numbers are 1-based.
  const page = await pdf.getPage(1);
  const textContent = await page.getTextContent();
  // Each text item carries its string in `str`; join them with spaces.
  return textContent.items.map(item => item.str).join(' ');
}
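The same approach extends from the first page to the whole document. The following is a minimal sketch, assuming a document object already loaded with getDocument as in the sample above. The function name extractTextFromAllPages is hypothetical, and the sketch reads the page count from the document's numPages property.

```javascript
// Sketch only: `extractTextFromAllPages` is a hypothetical helper name.
// `pdf` is the object resolved by pdfjsLib.getDocument(url).promise.
async function extractTextFromAllPages(pdf) {
  const pages = [];
  // Page numbers are 1-based, so iterate from 1 to numPages inclusive.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const textContent = await page.getTextContent();
    // One string per page, built the same way as in the single-page sample.
    pages.push(textContent.items.map(item => item.str).join(' '));
  }
  return pages;
}
```

Pages are read one at a time here to keep the sketch simple. The result is an array with one string per page, which the caller can join or process separately.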
Other packages similar to pdfjs-dist
pdf-parse
pdf-parse is a package that provides functionality to extract text from PDF files. It is similar to pdfjs-dist in its text extraction capabilities but does not include the ability to render PDFs to a canvas element.
pdf-lib
pdf-lib is a package that allows you to create and modify PDF documents in any JavaScript environment. While pdfjs-dist focuses on rendering and reading PDFs, pdf-lib provides more features for manipulating PDF files, such as adding pages, drawing shapes, and inserting images.
PDF.js
PDF.js is a Portable Document Format (PDF) library that is built with HTML5.
Our goal is to create a general-purpose, web standards-based platform for
parsing and rendering PDFs.
This is a pre-built version of the PDF.js source code. It is automatically
generated by the build scripts.
For usage with older browsers or environments, without support for modern
features such as async/await, optional chaining, nullish coalescing,
and private class fields/methods; please see the legacy/ folder.
See https://github.com/mozilla/pdf.js for learning and contributing.