ocrd-doxa

OCR-D wrapper for DoxaPy image binarization via locally adaptive thresholding


Introduction

This offers OCR-D compliant workspace processors for binarization via Doxa (using its native Python bindings).

It is itself written in Python and relies heavily on the OCR-D core API, which handles METS/PAGE and provides the OCR-D CLI.
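To illustrate what "locally adaptive thresholding" means, here is a minimal pure-Python sketch of Sauvola's rule, one of the algorithms the processor offers. Every pixel is compared against its own threshold T = m · (1 + k · (s / R − 1)), where m and s are the mean and standard deviation of the window × window neighbourhood and R = 128 is the dynamic range of s for 8-bit images. This is not Doxa's implementation (Doxa is a native library reached through its Python bindings); the function name `sauvola_binarize` is hypothetical, and the naive neighbourhood loop is quadratic in the window size, whereas practical implementations typically use integral images.

```python
from math import sqrt


def sauvola_binarize(image, window=75, k=0.2, R=128.0):
    """Binarize a grayscale image given as rows of ints in [0, 255].

    Illustrative sketch only (hypothetical helper, not Doxa's code).
    Returns rows of 0 (ink) and 255 (background).
    """
    height, width = len(image), len(image[0])
    half = window // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Neighbourhood is clipped at the image borders.
            ys = range(max(0, y - half), min(height, y + half + 1))
            xs = range(max(0, x - half), min(width, x + half + 1))
            values = [image[j][i] for j in ys for i in xs]
            n = len(values)
            mean = sum(values) / n
            std = sqrt(sum((v - mean) ** 2 for v in values) / n)
            # Sauvola: threshold drops below the local mean in flat areas
            # and rises toward it where local contrast (std) is high.
            threshold = mean * (1 + k * (std / R - 1))
            row.append(255 if image[y][x] > threshold else 0)
        result.append(row)
    return result
```

For example, a single dark pixel (20) on a light background (200) comes out as the only ink pixel with `window=3`, because its local threshold (about 160) lies far above its own value while the surrounding light pixels stay above it.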

Installation

Create and activate a virtual environment as usual.

To install Python dependencies:

make deps

This is equivalent to:

pip install -r requirements.txt

To install this module, then do:

make install

This is equivalent to:

pip install .

Usage

OCR-D processor interface ocrd-doxa-binarize

To be used with PAGE-XML documents in an OCR-D annotation workflow.
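When a workflow step is scripted from Python, the processor can be called like any other CLI. The following is a minimal sketch that assembles an `ocrd-doxa-binarize` call from options listed in its help output (`-m`, `-I`, `-O`, `-g`, `-p`, `--overwrite`) and runs it with `subprocess`. The helpers `build_command` and `run_binarize` are hypothetical, and joining several page IDs with commas for `-g` follows the usual OCR-D CLI convention but is an assumption here.

```python
import json
import subprocess
from typing import List, Optional, Sequence


def build_command(mets: str, input_grp: str, output_grp: str,
                  params: Optional[dict] = None,
                  page_ids: Optional[Sequence[str]] = None,
                  overwrite: bool = False) -> List[str]:
    """Assemble an ocrd-doxa-binarize call (hypothetical helper)."""
    cmd = ["ocrd-doxa-binarize",
           "-m", mets,          # METS of the workspace to process
           "-I", input_grp,     # input file group
           "-O", output_grp]    # output file group
    if page_ids:
        # Assumption: several physical page IDs are joined with commas,
        # following the usual OCR-D CLI convention.
        cmd += ["-g", ",".join(page_ids)]
    if params:
        # -p accepts a verbatim JSON string (or a JSON file path).
        cmd += ["-p", json.dumps(params)]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


def run_binarize(**kwargs) -> None:
    """Run the processor; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(**kwargs), check=True)
```

A call might look like `run_binarize(mets="mets.xml", input_grp="OCR-D-IMG", output_grp="OCR-D-BIN", params={"algorithm": "Sauvola"})`, where the file group names are hypothetical examples. Passing the argument list (rather than a shell string) avoids quoting problems with the JSON parameter.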

ocrd-doxa-binarize -h

Usage: ocrd-doxa-binarize [OPTIONS]

  binarize via locally adaptive thresholding

Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME
  -L, --list-resources            List names of processor resources
  -J, --dump-json                 Dump tool description as JSON and exit
  -h, --help                      This help message
  -V, --version                   Show version

Parameters:
   "dpi" [number - 0]
    pixel density in dots per inch (overrides any meta-data in the
    images); disabled when zero
   "level-of-operation" [string - "page"]
    PAGE XML hierarchy level to operate on
    Possible values: ["page", "region", "line"]
   "algorithm" [string - "ISauvola"]
    Thresholding algorithm to use.
    Possible values: ["Otsu", "Bernsen", "Niblack", "Sauvola", "Wolf",
    "Gatos", "NICK", "Su", "Singh", "Bataineh", "ISauvola", "WAN"]
   "parameters" [object - {}]
    Dictionary of algorithm-specific parameters. Unless overridden here,
    the following defaults are used:
      Bernsen:        {'window': 75, 'threshold': 100, 'contrast-limit': 25}
      NICK:           {'window': 75, 'k': -0.2}
      Niblack:        {'window': 75, 'k': 0.2}
      Singh:          {'window': 75, 'k': 0.2}
      Gatos:          {'glyph': 60}
      Sauvola:        {'window': 75, 'k': 0.2}
      Wolf:           {'window': 75, 'k': 0.2}
      WAN:            {'window': 75, 'k': 0.2}
      Su:             {'window': 0 (based on stroke size),
                       'minN': windowSize (roughly based on size of window)}

   (window/glyph sizes are in px, threshold/limits in uint8 [0,255])
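Because the `parameters` object only needs the keys that should change, the following is a minimal sketch of how the effective settings come about. It assumes a per-key overlay of the user-supplied dictionary on the defaults listed in the help output, which is how we read "unless overridden here". `DEFAULTS` and `effective_parameters` are hypothetical names, not the processor's code. Su is left out because its defaults are derived from the image. The last lines build the JSON string that could be passed via `-p`.

```python
import json

# Defaults as listed in the help output; Su is omitted because its
# defaults are derived from the image (stroke size, window), not fixed.
DEFAULTS = {
    "Bernsen": {"window": 75, "threshold": 100, "contrast-limit": 25},
    "NICK": {"window": 75, "k": -0.2},
    "Niblack": {"window": 75, "k": 0.2},
    "Singh": {"window": 75, "k": 0.2},
    "Gatos": {"glyph": 60},
    "Sauvola": {"window": 75, "k": 0.2},
    "Wolf": {"window": 75, "k": 0.2},
    "WAN": {"window": 75, "k": 0.2},
}


def effective_parameters(algorithm: str, overrides: dict) -> dict:
    """Overlay user-supplied keys on the listed defaults (assumed per-key)."""
    merged = dict(DEFAULTS.get(algorithm, {}))  # copy, don't mutate DEFAULTS
    merged.update(overrides)
    return merged


# Processor parameters as one JSON object, suitable for -p:
processor_params = {"algorithm": "Sauvola", "parameters": {"window": 51}}
print(json.dumps(processor_params))
# prints {"algorithm": "Sauvola", "parameters": {"window": 51}}
```

Under this reading, `effective_parameters("Sauvola", {"window": 51})` yields `{"window": 51, "k": 0.2}`: the window is narrowed while `k` keeps its default.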

Testing

none yet
