tabledetector

End-to-End table structure detector

  • 1.0.2
  • PyPI

Tabledetector

Tabledetector is a Python package that takes PDFs or images as input, checks their alignment, re-aligns them if required, detects the table structure, extracts the data, and returns it as a pandas DataFrame for further use. The current implementation focuses on bordered, semibordered, and unbordered table structures.

Features

  • PDF/Image Input: Accepts PDF or image files as input for table detection.
  • Alignment Check: Verifies and adjusts the alignment of the input.
  • Table Detection: Identifies bordered, semibordered, and unbordered tables in the PDF or image file.
  • Table Extraction: Extracts the tabular data as a pandas DataFrame.

Libraries Used

  • Python 3.x
  • OpenCV
  • NumPy
  • pdf2image
  • Pillow
  • scipy
  • jinja2
  • easyocr
  • pandas

Create and Activate Environment

conda create -n <env_name> python=3.7
conda activate <env_name>

Install the package using pip

pip install tabledetector

Clone the repository for the latest development release

git clone https://github.com/rajban94/TableDetector.git

Dependency

To utilize this library on Windows, ensure that Poppler is installed and its path is added to the environment variables.
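Before running detection, it can help to confirm that Poppler is actually reachable. The following is a minimal sketch, not part of tabledetector: the helper name `missing_poppler_tools` is hypothetical, and it assumes that `pdftoppm` and `pdfinfo` (command-line tools shipped with Poppler and used by pdf2image) are the executables that need to be on PATH.

```python
import shutil

# Poppler command-line tools that pdf2image relies on (assumption: these two
# are sufficient as a presence check for a working Poppler install).
POPPLER_TOOLS = ("pdftoppm", "pdfinfo")


def missing_poppler_tools(tools=POPPLER_TOOLS):
    """Return the names of the given executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_poppler_tools()
    if missing:
        print("Poppler tools not found on PATH:", ", ".join(missing))
    else:
        print("Poppler appears to be installed and on PATH.")
```

If the check reports missing tools on Windows, the usual cause is that the Poppler directory containing the executables has not been added to the environment variables.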

Usage

Detection

For bordered table detection when rotation is not required:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", type="bordered", rotation=False, method='detect')

For semibordered table detection when rotation is not required:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", type="semibordered", rotation=False, method='detect')

For unbordered table detection when rotation is not required:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", type="unbordered", rotation=False, method='detect')

Extraction

For bordered table detection and extraction when rotation is not required:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", type="bordered", rotation=False, method='extract')

For semibordered table detection and extraction when rotation is not required:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", type="semibordered", rotation=False, method='extract')

For unbordered table detection and extraction when rotation is not required:

import tabledetector as td
result = td.detect(pdf_path="pdf_path", type="unbordered", rotation=False, method='extract')

If no table type is specified, all three table types are checked and the result is returned accordingly. If rotation is required, set rotation=True.
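The calls in this section differ only in the table type, the rotation flag, and whether detection or extraction is requested. The sketch below shows one way to assemble those keyword arguments safely. It is illustrative only: `build_detect_kwargs` and the two `ALLOWED_*` sets are hypothetical names, not part of tabledetector, and the keyword names (`pdf_path`, `type`, `rotation`, `method`) are taken from the examples above. It also assumes that leaving out the table type lets the library try every type.

```python
# Values taken from the usage examples; the helper itself is hypothetical.
ALLOWED_TYPES = {"bordered", "semibordered", "unbordered"}
ALLOWED_METHODS = {"detect", "extract"}


def build_detect_kwargs(pdf_path, table_type=None, rotation=False, method="detect"):
    """Build the keyword arguments for td.detect, validating the choices first."""
    if table_type is not None and table_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown table type: {table_type!r}")
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unknown method: {method!r}")

    kwargs = {"pdf_path": pdf_path, "rotation": rotation, "method": method}
    # Only pass the type when one was chosen, so that omitting it leaves
    # the choice of table type to the library.
    if table_type is not None:
        kwargs["type"] = table_type
    return kwargs


# Example use (requires tabledetector to be installed):
#   import tabledetector as td
#   result = td.detect(**build_detect_kwargs("pdf_path", rotation=True, method="extract"))
```

Building the arguments in a dictionary avoids passing the same keyword twice, which Python rejects as a syntax error.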
