
:warning: Please note that this is a wrapper around the doctr library that provides an ONNX pipeline for docTR. For feature requests that are not directly related to the ONNX pipeline, please refer to the base project.
Optical Character Recognition made seamless & accessible to anyone, powered by Onnx
What you can expect from this repository:
- efficient ways to parse textual information (localize and identify each word) from your documents
- an ONNX pipeline for docTR, a wrapper around the doctr library - no PyTorch or TensorFlow dependencies
- a more lightweight package with lower inference latency and fewer required resources
- 8-bit quantized models for faster inference on CPU

## Installation

### Prerequisites

Python 3.10 (or higher) and pip are required to install OnnxTR.

### Latest release

You can then install the latest release of the package from PyPI as follows:

NOTE:

The currently supported execution providers by default are: CPU, CUDA (NVIDIA GPU), and OpenVINO (Intel CPU | GPU).
For GPU support, please take a look at: ONNX Runtime.

- Prerequisites: CUDA & cuDNN need to be installed before installing the GPU variant (see the ONNX Runtime version table for compatible versions). A quick sanity check is shown after the install commands below.
```bash
# standard cpu support
pip install "onnxtr[cpu]"
pip install "onnxtr[cpu-headless]"  # same as cpu but with opencv-headless

# with gpu support
pip install "onnxtr[gpu]"
pip install "onnxtr[gpu-headless]"  # same as gpu but with opencv-headless

# OpenVINO support for Intel CPUs | GPUs
pip install "onnxtr[openvino]"
pip install "onnxtr[openvino-headless]"  # same as openvino but with opencv-headless

# with HTML support
pip install "onnxtr[html]"

# with support for visualization
pip install "onnxtr[viz]"

# with support for all dependencies
pip install "onnxtr[html, gpu, viz]"
```
Recommendation:

If you have:

- an NVIDIA GPU, use one of the `gpu` variants
- an Intel CPU or GPU, use one of the `openvino` variants
- otherwise, use one of the `cpu` variants
OpenVINO:

By default, OnnxTR running with the OpenVINO execution provider backend uses the CPU device with FP32 precision. To change the device or for further configuration, please refer to the ONNX Runtime OpenVINO documentation.
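As a minimal sketch of such a device change (assuming the `EngineConfig` provider format shown in the advanced configuration section below; `device_type` is an ONNX Runtime OpenVINO EP option whose valid values depend on your onnxruntime / OpenVINO version):

```python
from onnxtr.models import ocr_predictor, EngineConfig

# Select the Intel GPU device for the OpenVINO execution provider (assumed option)
openvino_cfg = EngineConfig(
    providers=[("OpenVINOExecutionProvider", {"device_type": "GPU"})]
)
predictor = ocr_predictor(
    det_engine_cfg=openvino_cfg,
    reco_engine_cfg=openvino_cfg,
    clf_engine_cfg=openvino_cfg,
)
```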
## Reading files

Documents can be interpreted from PDFs, images, webpages, or multi-page images using the following code snippet:
```python
from onnxtr.io import DocumentFile

# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Webpage
webpage_doc = DocumentFile.from_url("https://www.yoursite.com")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
```
## Putting it together

Let's use the default `ocr_predictor` model for an example:
```python
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor, EngineConfig

model = ocr_predictor(
    det_arch='fast_base',  # detection architecture
    reco_arch='vitstr_base',  # recognition architecture
    det_bs=2,  # detection batch size
    reco_bs=512,  # recognition batch size
    assume_straight_pages=True,  # set to False if the pages are not straight (rotation, perspective, etc.)
    straighten_pages=False,  # set to True if the pages should be straightened before processing
    export_as_straight_boxes=False,  # set to True if the boxes should be exported as straight boxes
    preserve_aspect_ratio=True,  # keep the aspect ratio of the pages when resizing
    symmetric_pad=True,  # pad the pages symmetrically
    detect_orientation=False,  # set to True if the orientation of the pages should be detected
    detect_language=False,  # set to True if the language of the pages should be detected
    disable_crop_orientation=False,  # set to True to disable crop orientation classification
    disable_page_orientation=False,  # set to True to disable page orientation classification
    resolve_lines=True,  # group words into lines
    resolve_blocks=False,  # group lines into blocks
    paragraph_break=0.035,  # relative length of the minimum space separating paragraphs
    load_in_8_bit=False,  # set to True to load 8-bit quantized models
    det_engine_cfg=EngineConfig(),  # detection engine configuration (optional)
    reco_engine_cfg=EngineConfig(),  # recognition engine configuration (optional)
    clf_engine_cfg=EngineConfig(),  # classification (orientation) engine configuration (optional)
)
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
result = model(doc)

result.show()
```

Or even rebuild the original document from its predictions:

```python
import matplotlib.pyplot as plt

synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()
```

The `ocr_predictor` returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`).
To get a better understanding of the document model, check out the documentation.
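For example, a minimal sketch of walking this hierarchy to collect the recognized words (attribute names follow the docTR document model):

```python
# Iterate over the nested structure and print each word with its confidence
for page in result.pages:
    for block in page.blocks:
        for line in block.lines:
            for word in line.words:
                print(f"{word.value} (confidence: {word.confidence:.2%})")
```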
You can also export the results as a nested dict (more appropriate for JSON), render them as text, or export them as XML (hOCR format):
```python
json_output = result.export()  # nested dict
text_output = result.render()  # plain text
xml_output = result.export_as_xml()  # hOCR (XML)
for output in xml_output:
    xml_bytes_string = output[0]
    xml_element = output[1]
```
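Since the exported dict mirrors the same Page/Block/Line/Word hierarchy, it can be processed with plain Python; a small sketch (key names assumed from the docTR export format):

```python
# Collect all recognized words from the exported dict
words = [
    word["value"]
    for page in json_output["pages"]
    for block in page["blocks"]
    for line in block["lines"]
    for word in line["words"]
]
print(" ".join(words))
```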
## Advanced engine configuration options

You can also define advanced engine configurations for the models / predictors:
```python
from onnxruntime import SessionOptions

from onnxtr.models import ocr_predictor, EngineConfig

general_options = SessionOptions()  # onnxruntime session options
general_options.enable_cpu_mem_arena = False

# Providers are defined as a list of tuples: (provider name, provider options)
providers = [("CUDAExecutionProvider", {"device_id": 0, "cudnn_conv_algo_search": "DEFAULT"})]

engine_config = EngineConfig(
    session_options=general_options,
    providers=providers
)

# The engine configurations are optional and can be set per predictor
predictor = ocr_predictor(
    det_engine_cfg=engine_config,
    reco_engine_cfg=engine_config,
    clf_engine_cfg=engine_config
)
```
## Loading custom exported models

You can also load docTR custom exported models. For exporting, please take a look at the doctr documentation.
```python
from onnxtr.models import ocr_predictor, linknet_resnet18, parseq

# recognition model with a custom vocab
reco_model = parseq("path_to_custom_model.onnx", vocab="ABC")
# detection model
det_model = linknet_resnet18("path_to_custom_model.onnx")
model = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
```
## Loading models from HuggingFace Hub

You can also load models from the HuggingFace Hub:
```python
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor, from_hub

img = DocumentFile.from_images(['<image_path>'])
# Load your model from the hub
model = from_hub('onnxtr/my-model')

# Pass it to the predictor
# If your model is a recognition model:
predictor = ocr_predictor(
    det_arch='db_mobilenet_v3_large',
    reco_arch=model
)
# If your model is a detection model:
predictor = ocr_predictor(
    det_arch=model,
    reco_arch='crnn_mobilenet_v3_small'
)

# Get your predictions
res = predictor(img)
```
HF Hub search: here.
Collection: here
Or push your own models to the hub:
```python
from onnxtr.models import linknet_resnet18, parseq, push_to_hf_hub, login_to_hub
from onnxtr.utils.vocabs import VOCABS

login_to_hub()

# recognition model
model = parseq("~/onnxtr-parseq-multilingual-v1.onnx", vocab=VOCABS["multilingual"])
push_to_hf_hub(
    model,
    model_name="onnxtr-parseq-multilingual-v1",
    task="recognition",  # the task for which the model is used
    arch="parseq",  # the architecture of the model
    override=False  # set to True to override an existing model on the hub
)

# detection model
model = linknet_resnet18("~/onnxtr-linknet-resnet18.onnx")
push_to_hf_hub(
    model,
    model_name="onnxtr-linknet-resnet18",
    task="detection",
    arch="linknet_resnet18",
    override=True
)
```
## Models architectures

Credits where it's due: this repository provides ONNX models for the following architectures, converted from the docTR models.

The full lists of supported text detection and text recognition architectures can be queried at runtime:
```python
predictor = ocr_predictor()
predictor.list_archs()
```

Output:

```python
{
    'detection archs': [
        'db_resnet34',
        'db_resnet50',
        'db_mobilenet_v3_large',
        'linknet_resnet18',
        'linknet_resnet34',
        'linknet_resnet50',
        'fast_tiny',
        'fast_small',
        'fast_base'
    ],
    'recognition archs': [
        'crnn_vgg16_bn',
        'crnn_mobilenet_v3_small',
        'crnn_mobilenet_v3_large',
        'sar_resnet31',
        'master',
        'vitstr_small',
        'vitstr_base',
        'parseq'
    ]
}
```
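Any combination from these lists can be passed by name; for example, pairing `db_resnet50` with `crnn_vgg16_bn` and their 8-bit quantized variants (using the `load_in_8_bit` parameter shown earlier):

```python
from onnxtr.models import ocr_predictor

# 8-bit quantized variants of the chosen architectures
predictor = ocr_predictor(
    det_arch="db_resnet50",
    reco_arch="crnn_vgg16_bn",
    load_in_8_bit=True,
)
```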
## Documentation

This repository is kept in sync with the doctr library, which provides a high-level API to perform OCR on documents, and stays up to date with its latest features and improvements. For more detailed information, you can therefore refer to the doctr documentation.

NOTE:

- `pretrained` is the default in OnnxTR and is not available as a parameter.
- docTR-specific environment variables need to be replaced with the `ONNXTR_` prefix (e.g. `DOCTR_CACHE_DIR` becomes `ONNXTR_CACHE_DIR`).
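For example, a minimal sketch of redirecting the model cache via the renamed environment variable (the path is a placeholder):

```python
import os

# ONNXTR_CACHE_DIR replaces docTR's DOCTR_CACHE_DIR; set it before models are loaded
os.environ["ONNXTR_CACHE_DIR"] = "/path/to/your/cache"

from onnxtr.models import ocr_predictor  # noqa: E402

predictor = ocr_predictor()
```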
## Benchmarks

The CPU benchmarks were measured on an i7-14700K Intel CPU.
The GPU benchmarks were measured on an RTX 4080 NVIDIA GPU.

Benchmarking was performed on the FUNSD dataset and the CORD dataset.
The docTR / OnnxTR models used for the benchmarks are `fast_base` (full precision) | `db_resnet50` (8-bit variant) for detection and `crnn_vgg16_bn` for recognition.

For comparison, the smallest combination in OnnxTR (docTR) of `db_mobilenet_v3_large` and `crnn_mobilenet_v3_small` takes ~0.17s / Page on the FUNSD dataset and ~0.12s / Page on the CORD dataset in full precision on CPU (a construction sketch follows the tables below).
| Library | FUNSD (199 pages) | CORD (900 pages) |
|---|---|---|
| docTR (CPU) - v0.8.1 | ~1.29s / Page | ~0.60s / Page |
| OnnxTR (CPU) - v0.6.0 | ~0.57s / Page | ~0.25s / Page |
| OnnxTR (CPU) 8-bit - v0.6.0 | ~0.38s / Page | ~0.14s / Page |
| OnnxTR (CPU-OpenVINO) - v0.6.0 | ~0.15s / Page | ~0.14s / Page |
| EasyOCR (CPU) - v1.7.1 | ~1.96s / Page | ~1.75s / Page |
| PyTesseract (CPU) - v0.3.10 | ~0.50s / Page | ~0.52s / Page |
| Surya (line) (CPU) - v0.4.4 | ~48.76s / Page | ~35.49s / Page |
| PaddleOCR (CPU) - no cls - v2.7.3 | ~1.27s / Page | ~0.38s / Page |
| Library | FUNSD (199 pages) | CORD (900 pages) |
|---|---|---|
| docTR (GPU) - v0.8.1 | ~0.07s / Page | ~0.05s / Page |
| docTR (GPU) float16 - v0.8.1 | ~0.06s / Page | ~0.03s / Page |
| OnnxTR (GPU) - v0.6.0 | ~0.06s / Page | ~0.04s / Page |
| OnnxTR (GPU) float16 - v0.6.0 | ~0.05s / Page | ~0.03s / Page |
| EasyOCR (GPU) - v1.7.1 | ~0.31s / Page | ~0.19s / Page |
| Surya (GPU) float16 - v0.4.4 | ~3.70s / Page | ~2.81s / Page |
| PaddleOCR (GPU) - no cls - v2.7.3 | ~0.08s / Page | ~0.03s / Page |
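For reference, the lightweight combination mentioned above can be built like this (a minimal sketch):

```python
from onnxtr.models import ocr_predictor

# smallest detection / recognition combination, full precision on CPU
predictor = ocr_predictor(
    det_arch="db_mobilenet_v3_large",
    reco_arch="crnn_mobilenet_v3_small",
)
```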
## Citation

If you wish to cite, please refer to the base project citation; feel free to use this BibTeX reference:
```bibtex
@misc{doctr2021,
    title={docTR: Document Text Recognition},
    author={Mindee},
    year={2021},
    publisher={GitHub},
    howpublished={\url{https://github.com/mindee/doctr}}
}

@misc{onnxtr2024,
    title={OnnxTR: Optical Character Recognition made seamless & accessible to anyone, powered by Onnx},
    author={Felix Dittrich},
    year={2024},
    publisher={GitHub},
    howpublished={\url{https://github.com/felixdittrich92/OnnxTR}}
}
```
## License

Distributed under the Apache 2.0 License. See `LICENSE` for more information.