=========
pdfreader
=========

Pythonic API for parsing PDF files (version 0.1.15 on PyPI).

:Info: See `the tutorials & documentation <https://pdfreader.readthedocs.io>`_ for more information.
:Author & Maintainer: Maksym Polshcha <maxp@sterch.net>

See `GitHub <https://github.com/maxpmaxp/pdfreader>`_ for the latest source.

About
=====

pdfreader is a Pythonic API for:

  • extracting texts, images and other data from PDF documents (plain or protected)
  • accessing different objects within PDF documents

pdfreader is NOT a tool (maybe one day it will become one!):

  • to create or update PDF files
  • to split PDF files into pages or other pieces
  • to convert PDFs to any other format

Nevertheless, it can be used as part of such tools.

See `Tutorials & Documentation <https://pdfreader.readthedocs.io>`_.
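As a rough illustration of accessing document objects, the sketch below opens a file with ``PDFDocument`` and pulls only the first few pages. The helper names ``first_pages`` and ``describe`` are hypothetical and not part of pdfreader. ``PDFDocument``, ``pages()`` and ``header.version`` are used here as the pdfreader tutorials describe them, so check the documentation for your installed version.

```python
from itertools import islice


def first_pages(document, limit=3):
    """Return up to `limit` pages from an object exposing `pages()`.

    islice stops pulling from the iterator once `limit` pages are
    collected, so later pages are never requested.
    """
    return list(islice(document.pages(), limit))


def describe(path, limit=3):
    """Print the PDF version and how many of the first pages were read."""
    # Imported here so first_pages() stays usable without pdfreader installed.
    from pdfreader import PDFDocument

    with open(path, "rb") as fd:  # the file must stay open while reading
        doc = PDFDocument(fd)
        pages = first_pages(doc, limit)
        print("PDF version:", doc.header.version)
        print("pages read:", len(pages))
```

Calling ``describe("example.pdf")`` on a local file (the file name is a placeholder) reads at most three pages, however long the document is.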

Features
========

  • Extracts texts (plain text and formatted text objects)
  • Extracts PDF form data (plain strings and formatted text objects)
  • Supports all PDF encodings, CMaps and predefined cmaps
  • Extracts images and image masks as `Pillow/PIL Images <https://pillow.readthedocs.io/en/stable/reference/Image.html>`_
  • Supports encrypted and password-protected PDF documents
  • Lets you browse any document objects and resources and extract any data you need (fonts, annotations, metadata, multimedia, etc.)
  • Follows the `PDF-1.7 specification <https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf>`_
  • Lazy object access makes it possible to process huge PDF documents quite fast
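A minimal sketch of the text extraction feature, assuming the ``SimplePDFViewer`` interface shown in the pdfreader tutorials (``navigate``, ``render`` and ``canvas.strings``, plus a ``password`` keyword for protected files). The functions ``page_strings`` and ``extract_text`` are hypothetical helpers written for this example, and page numbers are assumed to start at 1.

```python
def page_strings(viewer, page_number):
    """Render one page and return its plain-text strings as a list."""
    viewer.navigate(page_number)  # assumption: pages are numbered from 1
    viewer.render()               # fills viewer.canvas with page content
    return list(viewer.canvas.strings)


def extract_text(path, page_number=1, password=None):
    """Return the text of one page, optionally using a password."""
    # Imported here so page_strings() stays usable without pdfreader installed.
    from pdfreader import SimplePDFViewer

    # Pass the password only when one is given, keeping the library default.
    kwargs = {} if password is None else {"password": password}
    with open(path, "rb") as fd:  # the file must stay open while rendering
        viewer = SimplePDFViewer(fd, **kwargs)
        return "".join(page_strings(viewer, page_number))
```

Keeping ``page_strings`` separate from file handling means it can be exercised with any viewer-like object, which is convenient for testing without a real PDF.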

Installation
============

pdfreader can be installed with `pip <http://pypi.python.org/pypi/pip>`_::

    $ python -m pip install pdfreader

Or ``easy_install`` from `setuptools <http://pypi.python.org/pypi/setuptools>`_::

    $ python -m easy_install pdfreader

You can also download the project source and do::

    $ python setup.py install

Tutorial and Documentation
==========================

`Tutorial, real-life examples and documentation <https://pdfreader.readthedocs.io>`_

Support, Bugs & Feature Requests
================================

pdfreader uses `GitHub issues <https://github.com/maxpmaxp/pdfreader/issues>`_ to keep track of bugs, feature requests, etc.

Related Projects
================

  • `pdfminer <https://github.com/euske/pdfminer>`_
  • `PyPDF2 <https://github.com/py-pdf/PyPDF2>`_
  • `xpdf <http://www.foolabs.com/xpdf/>`_
  • `pdfbox <http://pdfbox.apache.org/>`_
  • `mupdf <http://mupdf.com/>`_

References
==========

  • `Document management - Portable document format - PDF 1.7 <https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf>`_
  • `Adobe CMap and CIDFont Files Specification <https://www.adobe.com/content/dam/acom/en/devnet/font/pdfs/5014.CIDFont_Spec.pdf>`_
  • `PostScript Language Reference Manual <https://www-cdf.fnal.gov/offline/PostScript/PLRM2.pdf>`_
  • `Adobe CMap resources <https://github.com/adobe-type-tools/cmap-resources>`_
  • `Adobe glyph list specification (AGL) <https://github.com/adobe-type-tools/agl-specification>`_

Donation
========

If this project is helpful, you can treat me to coffee :-)

.. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif
    :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=VMVFZSDHDFVK6&item_name=PDFReader+support&currency_code=USD&source=url
