table2html

Detect and convert table image to html table

1.4.2

PyPI

Maintainers: 1

Table2HTML

A Python package that converts images of tables into HTML using object detection models and OCR.

Installation

pip install table2html

Usage

Initialize

from table2html import Table2HTML

table_config = {
    "model_path": r"table2html\models\det_table_v1.pt",
    "confidence_threshold": 0.25,
    "iou_threshold": 0.7,
}

row_config = {
    "model_path": r"table2html\models\det_row_v0.pt",
    "confidence_threshold": 0.25,
    "iou_threshold": 0.7,
    "task": "detect",
}

column_config = {
    "model_path": r"table2html\models\det_col_v0.pt",
    "confidence_threshold": 0.25,
    "iou_threshold": 0.7,
    "task": "detect",
}

table2html = Table2HTML(table_config, row_config, column_config)
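The three configuration dictionaries share the same keys, and the model paths above are written with Windows-style backslashes. As a minimal sketch, the configs could be built with `pathlib` so the same script also runs on Linux and macOS. The helper `build_detector_config` and the `models_dir` location are hypothetical and not part of table2html; only the dictionary keys come from the example above.

```python
from pathlib import Path
from typing import Optional


def build_detector_config(
    model_path: Path,
    confidence_threshold: float = 0.25,
    iou_threshold: float = 0.7,
    task: Optional[str] = None,
) -> dict:
    """Build one detector config dict (hypothetical helper, not a table2html API)."""
    config = {
        "model_path": str(model_path),
        "confidence_threshold": confidence_threshold,
        "iou_threshold": iou_threshold,
    }
    # The table config in the example has no "task" key, so only add it on request.
    if task is not None:
        config["task"] = task
    return config


# Assumed location of the model weights; adjust to where the .pt files actually live.
models_dir = Path("table2html") / "models"

table_config = build_detector_config(models_dir / "det_table_v1.pt")
row_config = build_detector_config(models_dir / "det_row_v0.pt", task="detect")
column_config = build_detector_config(models_dir / "det_col_v0.pt", task="detect")
```

The resulting dictionaries can be passed to `Table2HTML(table_config, row_config, column_config)` exactly as in the example above.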

Table Detection

import cv2

image = cv2.imread(r"table2html\images\sample.jpg")
detection_data = table2html.TableDetect(image)
# Output: [{"table_bbox": Tuple[int]}]

# Visualize table detection (first table)
from table2html.source import visualize_boxes
cv2.imwrite(
    "table_detection.jpg", 
    visualize_boxes(
        image, 
        [detection_data[0]["table_bbox"]], 
        color=(0, 0, 255),
        thickness=1
    )
)
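The visualization above indexes `detection_data[0]`, which raises an `IndexError` if the list is empty. The sketch below guards against that case. It assumes that `TableDetect` returns an empty list when no table is found, which the source does not state; `first_table_bbox` is a hypothetical helper, and the sample data is illustrative only.

```python
from typing import Dict, List, Optional, Tuple

BBox = Tuple[int, int, int, int]


def first_table_bbox(detection_data: List[Dict]) -> Optional[BBox]:
    """Return the first table's bounding box, or None if nothing was detected."""
    if not detection_data:
        return None
    x1, y1, x2, y2 = detection_data[0]["table_bbox"]
    return (int(x1), int(y1), int(x2), int(y2))


# Illustrative data shaped like the documented output, not a real detection.
sample_detection = [{"table_bbox": (40, 60, 520, 380)}]

bbox = first_table_bbox(sample_detection)
if bbox is None:
    print("No table detected; skipping visualization")
else:
    print("First table bbox:", bbox)
```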

Table detection result:

[Image: Table Detection Example]

Structure Detection

data = table2html.StructureDetect(image)
# Output: {
#   "cells": List[Dict],
#   "num_rows": int,
#   "num_cols": int,
#   "html": str
# }

# Visualize structure detection
from table2html.source import visualize_boxes
cv2.imwrite(
    "structure_detection.jpg", 
    visualize_boxes(
        image, 
        [cell['box'] for cell in data['cells']], 
        color=(0, 255, 0),
        thickness=1
    )
)

# Write HTML output
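# Use UTF-8 explicitly so non-ASCII OCR text is written correctly on every platform
with open('table.html', 'w', encoding='utf-8') as f: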
    f.write(data["html"])
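One way to sanity-check the result is to compare the shape of the generated HTML with `num_rows` and `num_cols`. The sketch below counts rows and cells using the standard-library `html.parser`. It assumes the HTML uses plain `<tr>`, `<td>`, and `<th>` tags and ignores `colspan`/`rowspan`; the markup that table2html actually emits is not shown in the source, and `TableShapeParser`, `table_shape`, and the sample string are illustrative.

```python
from html.parser import HTMLParser
from typing import Tuple


class TableShapeParser(HTMLParser):
    """Count <tr> rows and the widest row of <td>/<th> cells."""

    def __init__(self) -> None:
        super().__init__()
        self.num_rows = 0
        self.max_cols = 0
        self._cols_in_row = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == "tr":
            self.num_rows += 1
            self._cols_in_row = 0
        elif tag in ("td", "th"):
            self._cols_in_row += 1
            self.max_cols = max(self.max_cols, self._cols_in_row)


def table_shape(html: str) -> Tuple[int, int]:
    """Return (rows, widest row length) for an HTML table string."""
    parser = TableShapeParser()
    parser.feed(html)
    parser.close()
    return parser.num_rows, parser.max_cols


# Made-up HTML for illustration; real output would come from data["html"].
sample_html = (
    "<table>"
    "<tr><td>Name</td><td>Qty</td></tr>"
    "<tr><td>Apple</td><td>3</td></tr>"
    "</table>"
)
rows, cols = table_shape(sample_html)
```

With real output, `table_shape(data["html"])` could be compared against `(data["num_rows"], data["num_cols"])`; a mismatch may simply mean the table contains merged cells.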

Structure detection result:

[Image: Structure Detection Example]

HTML output: extracted html.

Full Pipeline

Note: The cell coordinates are relative to the cropped table image.

table_crop_padding = 15
detection_data = table2html(image, table_crop_padding)
# Output: [{
#   "table_bbox": Tuple[int],
#   "cells": List[Dict],
#   "num_rows": int,
#   "num_cols": int,
#   "html": str
# }]

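# Assumption: crop_image is importable from table2html.source, like visualize_boxes
from table2html.source import crop_image, visualize_boxes

for i, data in enumerate(detection_data):
    # Crop the detected table; the cell boxes are relative to this crop
    table_image = crop_image(image, data["table_bbox"], table_crop_padding)
    # Draw the table box on the full image (one file per table, so nothing is overwritten)
    cv2.imwrite(
        f"table_detection_{i}.jpg",
        visualize_boxes(
            image,
            [data["table_bbox"]],
            color=(0, 0, 255),
            thickness=1
        )
    )
    # Draw the cell boxes on the cropped table image
    cv2.imwrite(
        f"structure_detection_{i}.jpg",
        visualize_boxes(
            table_image,
            [cell['box'] for cell in data['cells']],
            color=(0, 255, 0),
            thickness=1
        )
    )

    # Write the HTML for this table; UTF-8 keeps non-ASCII OCR text intact
    with open(f"table_{i}.html", "w", encoding="utf-8") as f:
        f.write(data["html"])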
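Because the cell coordinates are relative to the cropped table image, drawing them on the original image requires shifting them back. The sketch below shows one way to do that. It assumes the crop starts `padding` pixels above and to the left of the table box and is clamped at the image edge; the source does not document how `crop_image` handles borders, so treat this as an approximation. `crop_origin` and `cell_box_to_image_coords` are hypothetical helpers, and the numbers are made up.

```python
from typing import Tuple

BBox = Tuple[int, int, int, int]


def crop_origin(table_bbox: BBox, padding: int) -> Tuple[int, int]:
    """Top-left corner of the padded crop, assuming it is clamped at the image edge."""
    x1, y1, _, _ = table_bbox
    return max(0, x1 - padding), max(0, y1 - padding)


def cell_box_to_image_coords(cell_box: BBox, table_bbox: BBox, padding: int) -> BBox:
    """Shift a crop-relative cell box into full-image coordinates."""
    off_x, off_y = crop_origin(table_bbox, padding)
    cx1, cy1, cx2, cy2 = cell_box
    return (cx1 + off_x, cy1 + off_y, cx2 + off_x, cy2 + off_y)


# Illustrative values only.
table_bbox = (100, 200, 500, 400)
cell_box = (15, 15, 115, 45)
full_image_box = cell_box_to_image_coords(cell_box, table_bbox, padding=15)
print(full_image_box)
```

Here the crop origin is (85, 185), so a cell starting at (15, 15) in the crop lands exactly on the table's top-left corner in the full image.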

Input

image: numpy.ndarray (OpenCV/cv2 image format)

Outputs

The full pipeline returns a list of extracted tables. Each table is a dictionary with the following fields:

table_bbox: Tuple[int] - Bounding box coordinates (x1, y1, x2, y2) of the table
cells: List[Dict] - List of cell dictionaries, where each dictionary contains:
    - row: int - Row index
    - column: int - Column index
    - box: Tuple[int] - Bounding box coordinates (x1, y1, x2, y2)
    - text: str - Cell text content
num_rows: int - Number of rows in the table
num_cols: int - Number of columns in the table
html: str - HTML representation of the table
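The `cells` list can also be consumed directly, without parsing the HTML. As a minimal sketch, the function below arranges the cell texts into a two-dimensional grid. It assumes `row` and `column` are zero-based indices, which the source does not specify; `cells_to_grid` is a hypothetical helper, and the sample table is made up to match the documented field names.

```python
from typing import Dict, List


def cells_to_grid(cells: List[Dict], num_rows: int, num_cols: int) -> List[List[str]]:
    """Arrange cell texts into a num_rows x num_cols grid; missing cells stay empty."""
    grid = [["" for _ in range(num_cols)] for _ in range(num_rows)]
    for cell in cells:
        row, col = cell["row"], cell["column"]
        # Skip cells whose indices fall outside the reported table size.
        if 0 <= row < num_rows and 0 <= col < num_cols:
            grid[row][col] = cell["text"]
    return grid


# Made-up table shaped like one entry of the documented output.
sample_table = {
    "table_bbox": (40, 60, 520, 380),
    "cells": [
        {"row": 0, "column": 0, "box": (0, 0, 100, 30), "text": "Name"},
        {"row": 0, "column": 1, "box": (100, 0, 200, 30), "text": "Qty"},
        {"row": 1, "column": 0, "box": (0, 30, 100, 60), "text": "Apple"},
        {"row": 1, "column": 1, "box": (100, 30, 200, 60), "text": "3"},
    ],
    "num_rows": 2,
    "num_cols": 2,
}

grid = cells_to_grid(
    sample_table["cells"], sample_table["num_rows"], sample_table["num_cols"]
)
for row in grid:
    print(row)
```

If the indices turn out to be one-based, subtracting 1 from `row` and `col` before the bounds check would be the only change needed.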

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
