tabledetector

End-to-End table structure detector

Version 1.0.2 on PyPI

Tabledetector

Tabledetector is a Python package that takes PDFs or images as input, checks their alignment, re-aligns them if required, detects the table structure, extracts the data, and returns it as a pandas DataFrame for further use. The current implementation handles bordered, semibordered, and unbordered table structures.

Features

PDF/Image Input: Accepts PDF or image files as input for table detection.
Alignment Check: Verifies the alignment of the input and corrects it if required.
Table Detection: Identifies bordered, semibordered, and unbordered tables in the PDF or image file.
Table Extraction: Extracts the tabular data as a pandas DataFrame.

Libraries Used

Python 3.x
OpenCV
NumPy
pdf2image
Pillow
scipy
jinja2
easyocr
pandas

Create and Activate Environment

conda create -n <env_name> python=3.7
conda activate <env_name>

Installation of the package using pip

pip install tabledetector

Clone the repository for the latest development release

git clone https://github.com/rajban94/TableDetector.git

Dependency

To utilize this library on Windows, ensure that Poppler is installed and its path is added to the environment variables.
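
pdf2image, one of the libraries listed above, relies on Poppler's command-line tools such as pdftoppm. The following is a minimal sketch, not part of tabledetector itself, for checking whether Poppler is reachable before running detection. The helper name poppler_available is hypothetical, and looking for pdftoppm is an assumption about what counts as a usable installation.

```python
import shutil


def poppler_available() -> bool:
    """Return True if Poppler's pdftoppm executable is found on PATH.

    Hypothetical helper: it only checks that the executable can be
    located, not that it runs correctly.
    """
    return shutil.which("pdftoppm") is not None


if __name__ == "__main__":
    if poppler_available():
        print("Poppler found on PATH.")
    else:
        print("Poppler not found; install it and add its bin directory to PATH.")
```

If the check fails on Windows, the usual cause is that the Poppler bin directory has not been added to the PATH environment variable, or that the terminal was opened before the variable was changed.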

Usage

Detection

For bordered table detection when rotation is not required:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", type="bordered", rotation=False, method='detect')

For semibordered table detection when rotation is not required:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", type="semibordered", rotation=False, method='detect')

For unbordered table detection when rotation is not required:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", type="unbordered", rotation=False, method='detect')

Extraction

For bordered table detection and extraction when rotation is not required:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", type="bordered", rotation=False, method='extract')

For semibordered table detection and extraction when rotation is not required:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", type="semibordered", rotation=False, method='extract')

For unbordered table detection and extraction when rotation is not required:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", type="unbordered", rotation=False, method='extract')

If no table type is specified, all types are checked and the result is returned accordingly. If rotation is required, set rotation=True.
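
The calls above differ only in the table type and in whether data is extracted, so they can be driven from one place. The sketch below is illustrative only: detect_kwargs and run_all are hypothetical helpers, not part of tabledetector, and the keyword names (pdf_path, type, rotation, method) are taken from the bordered examples in this README rather than verified against the package.

```python
# Table types named in this README.
TABLE_TYPES = ("bordered", "semibordered", "unbordered")


def detect_kwargs(pdf_path, table_type, rotation=False, extract=False):
    """Build keyword arguments for td.detect (hypothetical helper)."""
    if table_type not in TABLE_TYPES:
        raise ValueError(f"unknown table type: {table_type!r}")
    return {
        "pdf_path": pdf_path,
        "type": table_type,
        "rotation": rotation,
        # 'extract' also returns the data; 'detect' only locates tables.
        "method": "extract" if extract else "detect",
    }


def run_all(pdf_path, rotation=False):
    """Run extraction once per table type and collect the results.

    tabledetector is imported here so that the helper above can be
    used and tested without the package installed.
    """
    import tabledetector as td

    return {
        table_type: td.detect(
            **detect_kwargs(pdf_path, table_type, rotation, extract=True)
        )
        for table_type in TABLE_TYPES
    }
```

For example, detect_kwargs("pdf_path", "bordered") produces the same arguments as the first detection call in the Usage section. Rejecting unknown table types early is a design choice of this sketch, not documented behavior of the package.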
