pdfbeads

  • 1.1.3
  • Rubygems

PDFBeads -- convert scanned images to a single PDF file
Version 1.0 (November 2010)

Copyright (C) 2010 Alexey Kryukov (amkryukov@gmail.com). All rights reserved.

PDFBeads is a small utility written in Ruby which takes scanned page images and converts them into a single PDF file. Unlike other PDF creation tools, PDFBeads attempts to implement the approach typically used for DjVu books. Its key feature is separating scanned text (typically black, but indexed images with a small number of colors are also accepted) from halftone pictures. Each type of graphical data is encoded into its own layer with a specific compression method and resolution.

The name `PDFBeads' was chosen for the package because building PDF files from separate images is comparable to threading beads on a string. It also seems to be a good choice for a Ruby application.

Here are a few operations you can perform with PDFBeads:

  • encode B&W images using either CCITT Group 4 Fax or JBIG2 compression method (you'll need Adam Langley's jbig2 utility, available at https://github.com/agl/jbig2enc/ , for JBIG2 compression);

  • combine halftone or indexed pictures with previously binarized text pages, placing them into the background layer. Several compression methods for background images (JPEG2000, JPEG or PNG-style deflate compression) are supported;

  • split mixed images where binarized text is combined with color or grayscale pictures (such pages may be produced with ScanTailor -- an interactive post-processing tool for scanned pages, available at http://scantailor.sourceforge.net) and encode each layer separately;

  • correctly process indexed images with a limited number of colors, encoding each color separately into the foreground layer;

  • split color images into background and foreground layers (similar to BG44 and FG44 chunks in a DjVu file) according to a given mask;

  • create PDF files with TOC and metadata;

  • read text from hOCR files and create a hidden text layer in the PDF file.
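
The mask-based split mentioned in the list above can be illustrated with a small stand-alone sketch. It is not PDFBeads code: PDFBeads works on real image files through RMagick, whereas here a page is simply a 2-D array of [r, g, b] pixels. The name split_layers and the choice to fill removed pixels with white are assumptions made for the illustration only.

```ruby
# Illustrative sketch only (not taken from PDFBeads).
# A "page" is a 2-D array of [r, g, b] pixels; mask[y][x] == true marks
# a pixel that belongs to the text (foreground) layer.
WHITE = [255, 255, 255].freeze

# Hypothetical helper: returns [foreground, background], each the same
# size as the input page. Pixels moved to the other layer are replaced
# with white, which is an assumption of this sketch.
def split_layers(page, mask)
  foreground = []
  background = []
  page.each_with_index do |row, y|
    fg_row = []
    bg_row = []
    row.each_with_index do |pixel, x|
      if mask[y][x]
        fg_row << pixel
        bg_row << WHITE
      else
        fg_row << WHITE
        bg_row << pixel
      end
    end
    foreground << fg_row
    background << bg_row
  end
  [foreground, background]
end

page = [
  [[0, 0, 0], [200, 180, 160]],
  [[210, 190, 170], [10, 10, 10]]
]
mask = [
  [true, false],
  [false, true]
]

fg, bg = split_layers(page, mask)
```

Once the two layers are separated, each one can be compressed with a method suited to its content, which is the idea behind the layered approach described earlier.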

Note that PDFBeads is intended for creating PDF files from previously processed images, so it cannot perform some operations (e.g. converting color or grayscale scans to B&W). These should typically be done with a dedicated scan processing application, such as ScanTailor.

PDFBeads requires RMagick (the Ruby bindings for the popular ImageMagick image processing library). The hpricot extension is not required but is highly recommended, since without it PDFBeads cannot read data from hOCR files.
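
To give an idea of what reading an hOCR file involves, here is a minimal sketch that pulls words and their bounding boxes out of hOCR markup. It is not how PDFBeads does it: PDFBeads relies on hpricot, while this sketch uses a plain regular expression so that it runs without extra gems. It assumes word elements look like the `ocrx_word` spans emitted by Tesseract, with the class attribute before the title attribute and no nested tags inside the word. The names HOCR_WORD and hocr_words are hypothetical.

```ruby
# Illustrative sketch only (not taken from PDFBeads, which uses hpricot).
# Assumes: <span class='ocrx_word' title='bbox x0 y0 x1 y1; ...'>word</span>
# with class before title and plain text inside the span.
HOCR_WORD = %r{
  <span[^>]*class=['"]ocrx_word['"][^>]*
  title=['"][^'"]*bbox\ (\d+)\ (\d+)\ (\d+)\ (\d+)[^'"]*['"][^>]*>
  ([^<]*)</span>
}x

# Hypothetical helper: returns an array of { text:, bbox: } hashes,
# where bbox is [x0, y0, x1, y1] in pixels.
def hocr_words(html)
  html.scan(HOCR_WORD).map do |x0, y0, x1, y1, text|
    { text: text.strip, bbox: [x0, y0, x1, y1].map(&:to_i) }
  end
end

sample = <<~HOCR
  <div class='ocr_page' title='bbox 0 0 600 800'>
    <span class='ocr_line' title='bbox 36 92 300 116'>
      <span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
      <span class='ocrx_word' title='bbox 110 92 180 116; x_wconf 91'>world</span>
    </span>
  </div>
HOCR

words = hocr_words(sample)
words.each { |w| puts "#{w[:text]} at #{w[:bbox].inspect}" }
```

A real parser is more robust than a regular expression against attribute order and nested markup, which is one reason a proper HTML library is recommended for this task.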

Package last updated on 31 Jan 2022
