diff-pdf-visually: Find visual differences between two PDFs
============================================================

.. image:: https://img.shields.io/pypi/v/diff-pdf-visually.svg
   :target: https://pypi.python.org/pypi/diff-pdf-visually/

.. image:: https://img.shields.io/pypi/l/diff-pdf-visually.svg
   :target: https://pypi.python.org/pypi/diff-pdf-visually/

.. image:: https://img.shields.io/badge/commitizen-friendly-brightgreen.svg
   :alt: Commitizen friendly
   :target: https://commitizen.github.io/cz-cli/

.. contents:: Table of Contents
   :backlinks: none

This script checks whether two PDFs are visually the same. So:

This is in contrast to most other tools, which tend to extract the text stream out of a PDF, and then diff those texts. Such tools include:

- `pdf-diff <https://github.com/JoshData/pdf-diff>`_ by Joshua Tauberer

There seem to be some tools similar to the one you're looking at now, although I have experience with none of these:

- `an open source one <https://github.com/vslavik/diff-pdf>`_
- `this SuperUser thread <https://superuser.com/questions/46123/how-to-compare-the-differences-between-two-pdf-files-on-windows>`_

The strength of this script is that it's simple to use on the command line, and it's easy to reuse in scripts:

.. code-block:: python

    from diff_pdf_visually import pdf_similar

    # Returns True or False
    pdf_similar("a.pdf", "b.pdf")

Or use it from the command line:

.. code-block:: shell

    $ pip3 install --user diff-pdf-visually
    $ diff-pdf-visually a.pdf b.pdf

You can install this tool with ``pip3``, but we need the ImageMagick and Poppler programs.

On Ubuntu Linux:

1. Run::

       sudo apt update
       sudo apt install python3-pip imagemagick poppler-utils
       pip3 install --user diff-pdf-visually

2. If you have never before used ``pip3 install --user`` to install something, then log out totally from Linux and log in again. (This is to refresh the ``PATH``.)
3. You can now run ``diff-pdf-visually``.

On macOS:

1. Run ``brew install poppler imagemagick``.
2. Run ``pip3 install --user diff-pdf-visually``.
3. If you have never before used ``pip3 install --user`` to install something, then close your terminal and open a new one. (This is to refresh the ``PATH``.)
4. You can now run ``diff-pdf-visually``.

On Windows, I've never tried this, but I think the following will work. Give it a go and let me know (at bram at bram dot xyz) if it worked! Unfortunately, it takes quite a while to get everything installed.

1. Install Windows Subsystem for Linux (WSL) and Ubuntu 18.04, for instance with `this tutorial <https://docs.microsoft.com/en-us/windows/wsl/install-win10>`_.
2. Initialize Ubuntu 18.04 (`tutorial <https://docs.microsoft.com/en-us/windows/wsl/initialize-distro>`_).
3. Now proceed with the Ubuntu Linux instructions.

Let me know (at bram at bram dot xyz) if this worked!

Lars Olafsson suggested that the following might work:

- Install ``diff-pdf-visually`` via Pip.
- Edit the ``Path`` variable to add the ``bin`` folder that was extracted.
- Run ``diff-pdf-visually``.

We use ``pdftocairo`` to convert both PDFs to a series of PNG images in a temporary directory. The number of pages and the dimensions of the pages must be exactly the same. Then we call ``compare`` from ImageMagick to check how similar they are: if any single page differs by more than a certain threshold (set with ``--threshold``), the PDFs are reported as different; otherwise they are reported as the same.

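
The per-page decision can be sketched in pure Python. This is a minimal sketch, not the package's own code: it assumes each rasterised page is available as a flat list of 8-bit greyscale pixel values, it uses the textbook PSNR formula (ImageMagick's exact scaling may differ), and the helper names ``psnr`` and ``pdfs_look_same`` are hypothetical. Rasterisation with ``pdftocairo`` is left out.

```python
import math
from typing import Sequence, Tuple


def psnr(page_a: Sequence[int], page_b: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio (dB) between two equally sized 8-bit pages."""
    if len(page_a) != len(page_b) or not page_a:
        raise ValueError("pages must be non-empty and have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(page_a, page_b)) / len(page_a)
    if mse == 0:
        return math.inf  # identical pages: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical page representation: (width, height, flat pixel values).
Page = Tuple[int, int, Sequence[int]]


def pdfs_look_same(pages_a: Sequence[Page], pages_b: Sequence[Page],
                   threshold: float = 100) -> bool:
    # The number of pages must match exactly.
    if len(pages_a) != len(pages_b):
        return False
    for (wa, ha, pix_a), (wb, hb, pix_b) in zip(pages_a, pages_b):
        # The page dimensions must match exactly.
        if (wa, ha) != (wb, hb):
            return False
        # A PSNR below the threshold counts as a significant difference,
        # so a higher threshold flags smaller changes (more sensitive).
        if psnr(pix_a, pix_b) < threshold:
            return False
    return True
```

Identical pages give an infinite PSNR and so never fall below any threshold, while raising the threshold makes ever smaller pixel changes count as significant, in line with the ``--threshold`` description in the help output.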
You must have ImageMagick and Poppler already installed.

Call ``diff-pdf-visually`` without parameters (or run ``python3 -m diff_pdf_visually``) to see its command line arguments. Import it as ``diff_pdf_visually`` to use its functions from Python.

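
For example, to check a whole folder of regenerated PDFs against a previous build, you could loop over ``pdf_similar``. Only ``pdf_similar(a, b)`` returning ``True`` or ``False`` is taken from this README; the ``changed_pdfs`` helper, its ``compare`` parameter and the one-folder-per-build layout are hypothetical.

```python
from pathlib import Path
from typing import Callable, List, Optional


def changed_pdfs(old_dir: str, new_dir: str,
                 compare: Optional[Callable[[str, str], bool]] = None) -> List[str]:
    """Names of PDFs in new_dir that are missing from old_dir or look different.

    Hypothetical helper; only pdf_similar(a, b) comes from the README.
    """
    if compare is None:
        # Imported lazily, so a stand-in comparator can be passed for testing.
        from diff_pdf_visually import pdf_similar as compare
    changed = []
    for new_pdf in sorted(Path(new_dir).glob("*.pdf")):
        old_pdf = Path(old_dir) / new_pdf.name
        # A PDF counts as changed if it is new or not visually similar.
        if not old_pdf.exists() or not compare(str(old_pdf), str(new_pdf)):
            changed.append(new_pdf.name)
    return changed


# Usage (needs the package plus ImageMagick and Poppler; folder names are made up):
# print(changed_pdfs("build-old", "build-new"))
```

Injecting the comparator keeps the loop testable without ImageMagick or Poppler installed.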
There are some options that you can use either from the command line or from Python::

    $ diff-pdf-visually -h
    usage: diff-pdf-visually [-h] [--silent] [--verbose] [--threshold THRESHOLD]
                             [--dpi DPI] [--time TIME]
                             a.pdf b.pdf

    Compare two PDFs visually. The exit code is 0 if they are the same, and 2 if
    there are significant differences.

    positional arguments:
      a.pdf
      b.pdf

    optional arguments:
      -h, --help            show this help message and exit
      --silent, -q          silence output (can be used only once)
      --verbose, -v         show more information (can be used 2 times)
      --threshold THRESHOLD
                            PSNR threshold to consider a change significant,
                            higher is more sensitive (default: 100)
      --dpi DPI             resolution for the rasterised files (default: 50)
      --time TIME           number of seconds to wait before discarding temporary
                            files, or 0 to immediately discard

These "temporary files" include a PNG image of where any differences are, per page, as well as the log output of ImageMagick. If you want to get a feeling for thresholds, there are some example PDFs in the ``tests/`` directory.

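
Since the exit code is 0 when the PDFs look the same and 2 when there are significant differences, the command is also easy to drive from another program. A minimal sketch, assuming ``diff-pdf-visually`` is on the ``PATH``; the helper names are hypothetical, and treating every other exit code as an error is an assumption:

```python
import subprocess
from typing import List


def build_command(a_pdf: str, b_pdf: str, threshold: float = 100, dpi: int = 50) -> List[str]:
    # Uses only flags listed in the help output above.
    return ["diff-pdf-visually", "--silent",
            "--threshold", str(threshold), "--dpi", str(dpi),
            a_pdf, b_pdf]


def interpret_exit_code(code: int) -> bool:
    """True if the PDFs look the same, False if they differ significantly."""
    if code == 0:
        return True
    if code == 2:
        return False
    # Assumption: any other exit code means the tool itself failed.
    raise RuntimeError(f"diff-pdf-visually failed with exit code {code}")


def pdfs_look_same_cli(a_pdf: str, b_pdf: str, threshold: float = 100, dpi: int = 50) -> bool:
    result = subprocess.run(build_command(a_pdf, b_pdf, threshold, dpi))
    return interpret_exit_code(result.returncode)
```

Keeping command construction and exit-code handling in separate functions lets both be checked without running the tool.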
There is also an environment variable:

- ``COMPARE``: override the path of ImageMagick ``compare``. By default, we first try ``compare`` and then ``magick compare`` (for Windows).

Personally, I've used this a couple of times to refactor my LaTeX documents: I just simplify or remove some macro definitions, and if nothing changes, apparently it's safe to make that change.

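
The lookup order for ``compare`` can be sketched as follows. This is an illustration, not the package's implementation: the function name ``compare_command`` is hypothetical, and using ``shutil.which`` to "try" each command is an assumption.

```python
import os
import shutil
from typing import List, Mapping, Optional


def compare_command(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Sketch of the COMPARE lookup order described above."""
    env = os.environ if env is None else env
    override = env.get("COMPARE")
    if override:
        return [override]  # an explicit path always wins
    if shutil.which("compare"):
        return ["compare"]
    if shutil.which("magick"):
        return ["magick", "compare"]  # the Windows-style invocation
    raise FileNotFoundError("ImageMagick compare not found; set COMPARE")
```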
At the moment, this program/module works best for finding whether two PDFs are visually different.
This project will not work on Python 2.

The code is dual-licensed under both `MIT License <https://choosealicense.com/licenses/mit>`_ and `Apache License, Version 2.0 <https://choosealicense.com/licenses/apache-2.0>`_, at your option.

The versions that are regularly tested can be found `here <https://github.com/bgeron/diff-pdf-visually/blob/main/tox.ini>`_; that's probably Python 3.8 and Python 3.9.

For your convenience, we declare more Python versions acceptable in ``pyproject.toml``, but the non-tested versions could potentially break from time to time. My goal is to support basically any Python 3.x; please let me know if something doesn't work on an older version.