
ocrodjvu is a wrapper for OCR systems that allows you to perform OCR on DjVu_ files.

.. _DjVu: http://djvu.org/

.. code:: console

   $ wget -q 'https://sources.debian.org/data/main/o/ocropus/0.3.1-3/data/pages/alice_1.png'
   $ gm convert -threshold 50% 'alice_1.png' 'alice.pbm'
   $ cjb2 'alice.pbm' 'alice.djvu'
   $ ocrodjvu --in-place 'alice.djvu'
   Processing 'alice.djvu':


The following software is required to run ocrodjvu:

* Python_ 3
* an OCR engine (for example Cuneiform_, Ocrad_, GOCR_, or Tesseract_)
* DjVuLibre_ ≥ 3.5.26
* djvulibre-python_ ≥ 0.9
* lxml_ ≥ 2.0


Additionally, some optional features require the following software:

* PyICU_ ≥ 1.0.1, required for the ``--word-segmentation=uax29`` option
* html5lib_, required for the ``--html5`` option

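
To check which of the Python modules listed above are available in the current interpreter, something like the following minimal sketch can be used. It is not part of ocrodjvu: the helper names ``is_importable`` and ``report`` are hypothetical, and it assumes the usual import names of these packages (``djvu.decode`` for djvulibre-python, ``lxml.etree`` for lxml, ``icu`` for PyICU). It does not check for DjVuLibre command-line tools or an OCR engine binary.

.. code:: python

   """Report which of ocrodjvu's Python dependencies can be imported.

   Hypothetical helper script, not shipped with ocrodjvu. The import
   names below are assumptions about how each package is usually imported.
   """
   import importlib

   # Import name -> package that provides it (from the lists above).
   REQUIRED = {
       "djvu.decode": "djvulibre-python",
       "lxml.etree": "lxml",
   }
   # Import name -> package and the option that needs it.
   OPTIONAL = {
       "icu": "PyICU (--word-segmentation=uax29)",
       "html5lib": "html5lib (--html5)",
   }


   def is_importable(module_name: str) -> bool:
       """Return True if module_name can be imported in this interpreter."""
       try:
           importlib.import_module(module_name)
       except ImportError:  # also covers ModuleNotFoundError
           return False
       return True


   def report(modules: dict) -> dict:
       """Map each import name in modules to whether it is importable."""
       return {name: is_importable(name) for name in modules}


   if __name__ == "__main__":
       for title, modules in (("required", REQUIRED), ("optional", OPTIONAL)):
           for name, ok in report(modules).items():
               status = "ok" if ok else "MISSING"
               print(f"{title:8} {modules[name]:40} {status}")

Run it with the same interpreter you intend to use for ocrodjvu (e.g. ``python3 check_deps.py``, where the file name is only an example); a missing optional module only disables the corresponding option.
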

The following software is required to rebuild the manual pages from source:

* xsltproc_
* `DocBook XSL stylesheets`_


.. _Python: https://www.python.org/
.. _Cuneiform: https://launchpad.net/cuneiform-linux
.. _Ocrad: https://www.gnu.org/software/ocrad/
.. _GOCR: https://www-e.uni-magdeburg.de/jschulen/ocr/
.. _Tesseract: https://github.com/tesseract-ocr/tesseract
.. _DjVuLibre: http://djvu.sourceforge.net/
.. _djvulibre-python: https://github.com/FriedrichFroebel/python-djvulibre
.. _lxml: https://lxml.de/
.. _PyICU: https://pypi.org/project/PyICU/
.. _html5lib: https://github.com/html5lib/html5lib-python
.. _xsltproc: http://xmlsoft.org/XSLT/xsltproc2.html
.. _DocBook XSL stylesheets: https://github.com/docbook/xslt10-stylesheets


The easiest way to install ocrodjvu is from PyPI::

   pip install ocrodjvu

Alternatively, you can use ocrodjvu without installing it, straight out of an unpacked source tarball or a VCS checkout.

It's also possible to install it from source for the current interpreter with::

   pip install .


The man pages can be deployed using::

   make install_manpage

By default, ``make install_manpage`` installs them to ``/usr/local/``. You can specify a different installation prefix by setting the ``PREFIX`` variable, e.g.::

   make install_manpage PREFIX="$HOME/.local"


This repository is a port of the original repository to Python 3.

The port was made with the 2to3 tool, followed by manual fixes until the existing tests passed. Although this port started from scratch so that it would include the latest upstream changes from the outset, the fork by `@rmast`_, which collected earlier porting attempts, was a great help (see also `Issue #39`_).

Since the upstream repository has been archived (`Issue #46`_), this fork is now maintained independently. Please note that I currently have no plans to implement completely new features. Nevertheless, I will try to keep at least the parts I use regularly in working order.


.. _@rmast: https://github.com/rmast/ocrodjvu/tree/python3
.. _Issue #46: https://github.com/jwilk-archive/ocrodjvu/issues/46
.. _Issue #39: https://github.com/jwilk-archive/ocrodjvu/issues/39

ocrodjvu development was supported by the Polish Ministry of Science and Higher Education's grant no. N N519 384036 (2009–2012, https://bitbucket.org/jsbien/ndt).