Python modules implementing OCR-D specs and related tools
This repository contains the Python packages that form the base for tools within the OCR-D ecosystem.
All packages are also published to PyPI.
NOTE: Unless you want to contribute to OCR-D/core, we recommend installing it as part of ocrd_all, which installs a complete stack of OCR-D-related software.
The easiest way to install is via pip:
pip install ocrd
# or just the functionality you need, e.g.
pip install ocrd_modelfactory
All Python software released by OCR-D requires Python 3.8 or higher.
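The version requirement above can be checked before installing. The following is a minimal sketch; the helper name python_is_supported is hypothetical and not part of any OCR-D package.

```python
import sys

# Minimum interpreter version stated in the text above.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version meets the stated minimum.

    Hypothetical helper for illustration; it is not an OCR-D API.
    """
    return tuple(version_info[:2]) >= MIN_PYTHON


if not python_is_supported():
    print("Python %d.%d or higher is required" % MIN_PYTHON)
```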
NOTE: Some OCR-D tools (or even test cases) might behave unexpectedly if your environment has specific modifications, like:
NOTE: All OCR-D CLI tools support a --help flag which shows usage and supported flags, options and arguments.
ocrd CLI
ocrd-dummy CLI
A minimal OCR-D processor that copies from -I/--input-file-grp to -O/--output-file-grp
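As a rough illustration of how the two file group options fit together, the sketch below only assembles the command line for ocrd-dummy and prints it; nothing is executed. The helper name dummy_command and the file group names OCR-D-IMG and OCR-D-COPY are assumptions chosen for the example.

```python
import shlex


def dummy_command(input_grp, output_grp):
    """Build the argument list for an ocrd-dummy call (nothing is executed).

    Hypothetical helper; only the -I and -O options come from the text above.
    """
    return ["ocrd-dummy", "-I", input_grp, "-O", output_grp]


# File group names are made up for this example.
cmd = dummy_command("OCR-D-IMG", "OCR-D-COPY")
print(shlex.join(cmd))
```

To actually run the processor, the list could be passed to subprocess.run(cmd, check=True), assuming ocrd is installed and the call is made from within a workspace.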
Almost all behaviour of the OCR-D/core software is configured via CLI options and flags, which can be listed with the --help flag that all CLIs support.
Some parts of the software are configured via environment variables:
OCRD_METS_CACHING: If set to true, access to the METS file is cached, speeding up in-memory search and modification.
OCRD_PROFILE: Configures the built-in CPU and memory profiling. If empty, no profiling is done. Otherwise, it is expected to contain any of the following tokens:
  CPU: Enable CPU profiling of processor runs
  RSS: Enable RSS memory profiling
  PSS: Enable proportionate memory profiling
OCRD_PROFILE_FILE: If set, the CPU profile is written to this file for later perusal with analysis tools like snakeviz.
PATH: Search path for processor executables (affects ocrd process and ocrd resmgr).
HOME: Directory to look for ocrd_logging.conf, fallback for unset XDG variables (see below).
XDG_CONFIG_HOME: Directory to look for ./ocrd/resources.yml (i.e. ocrd resmgr user database) – defaults to $HOME/.config.
XDG_DATA_HOME: Directory to look for ./ocrd-resources/* (i.e. ocrd resmgr data location) – defaults to $HOME/.local/share.
OCRD_DOWNLOAD_RETRIES: Number of times to retry failed attempts for downloads of workspace files.
OCRD_DOWNLOAD_TIMEOUT: Timeout in seconds for connecting or reading (comma-separated) when downloading.
OCRD_METS_CACHING: Whether to enable in-memory storage of OcrdMets data structures for speedup during processing or workspace operations.
OCRD_MAX_PROCESSOR_CACHE: Maximum number of processor instances (for each set of parameters) to be kept in memory (including loaded models) for processing workers or processor servers.
OCRD_NETWORK_SERVER_ADDR_PROCESSING: Default address of Processing Server to connect to (for ocrd network client processing).
OCRD_NETWORK_SERVER_ADDR_WORKFLOW: Default address of Workflow Server to connect to (for ocrd network client workflow).
OCRD_NETWORK_SERVER_ADDR_WORKSPACE: Default address of Workspace Server to connect to (for ocrd network client workspace).
OCRD_NETWORK_RABBITMQ_CLIENT_CONNECT_ATTEMPTS: Number of attempts for a worker to create its queue. Helpful if the rabbitmq-server needs time to be fully started.
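To make the fallback rules and value formats above more concrete, here is a minimal sketch that reads a few of these variables from a mapping. It does not use or reproduce OCR-D's own configuration code. All function names are hypothetical, and two behaviours are assumptions: tokens in OCRD_PROFILE are detected by substring match, and a single OCRD_DOWNLOAD_TIMEOUT value is applied to both connecting and reading.

```python
import os
from pathlib import Path

# Tokens listed for OCRD_PROFILE in the text above.
KNOWN_PROFILE_TOKENS = {"CPU", "RSS", "PSS"}


def profile_tokens(env=os.environ):
    """Return the profiling tokens found in OCRD_PROFILE (empty set: no profiling).

    Assumption: a token counts as present if it occurs anywhere in the value.
    """
    value = env.get("OCRD_PROFILE", "")
    return {token for token in KNOWN_PROFILE_TOKENS if token in value}


def download_timeout(env=os.environ):
    """Split OCRD_DOWNLOAD_TIMEOUT into (connect, read) seconds, or None if unset.

    Assumption: a single value without a comma applies to both phases.
    """
    raw = env.get("OCRD_DOWNLOAD_TIMEOUT")
    if not raw:
        return None
    parts = [float(part) for part in raw.split(",")]
    if len(parts) == 1:
        return (parts[0], parts[0])
    return (parts[0], parts[1])


def resmgr_paths(env=os.environ):
    """Resolve the resmgr user database and data location with the stated defaults."""
    home = Path(env.get("HOME") or Path.home())
    # Unset XDG variables fall back to locations below HOME.
    config_home = Path(env.get("XDG_CONFIG_HOME") or home / ".config")
    data_home = Path(env.get("XDG_DATA_HOME") or home / ".local" / "share")
    return config_home / "ocrd" / "resources.yml", data_home / "ocrd-resources"


# Example with a made-up environment instead of the real one.
example_env = {"HOME": "/home/alice", "OCRD_PROFILE": "CPU,RSS"}
user_db, data_dir = resmgr_paths(example_env)
print(user_db)
print(data_dir)
print(sorted(profile_tokens(example_env)))
```

Passing a plain dictionary instead of os.environ keeps the helpers easy to try out without changing the real environment.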
Contains utilities and constants, e.g. for logging, path normalization, coordinate calculation etc.
See README for ocrd_utils for further information.
Contains file format wrappers for PAGE-XML, METS, EXIF metadata etc.
See README for ocrd_models for further information.
Code to instantiate models from existing data.
See README for ocrd_modelfactory for further information.
Schemas and routines for validating BagIt, ocrd-tool.json, workspaces, METS, page, CLI parameters etc.
See README for ocrd_validators for further information.
Components related to the OCR-D Web API.
See README for ocrd_network for further information.
Depends on all of the above, also contains decorators and classes for creating OCR-D processors and CLIs.
Also contains the command line tool ocrd.
See README for ocrd for further information.
Builds a bash script that can be sourced by other bash scripts to create OCR-D-compliant CLIs.
See README for bashlib for further information.
Download assets (make assets)
Test with local files: make test
make test OCRD_BASEURL='https://github.com/OCR-D/assets/raw/master/data/'
make docs