textract-trp

Parser for Amazon Textract results.

  • 0.1.3
  • PyPI
Amazon Textract Results Parser - textract-trp

The Amazon Textract Results Parser, or trp module, packaged and improved for ease of use.

TL;DR

pip install textract-trp

Requires Python 3.6 or newer.

Usage

import boto3
import trp

textract_client = boto3.client('textract')
results = textract_client.analyze_document(... your file and other params ...)
doc = trp.Document(results)
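
The elided arguments depend on where your document lives. As a minimal sketch, assuming a local image file and that both tables and form fields are wanted, the request could be built as follows. The helper names build_analyze_request and analyze_local_file are hypothetical and not part of trp or boto3; Document, Bytes and FeatureTypes are parameter names of the Textract AnalyzeDocument API.

```python
def build_analyze_request(document_bytes, features=("TABLES", "FORMS")):
    """Build keyword arguments for textract_client.analyze_document().

    Hypothetical helper for illustration, not part of trp or boto3.
    """
    return {
        "Document": {"Bytes": document_bytes},
        "FeatureTypes": list(features),
    }


def analyze_local_file(path):
    """Send a local file to Textract and wrap the response in trp.Document."""
    # Imported lazily so build_analyze_request works without boto3 or trp installed.
    import boto3
    import trp

    with open(path, "rb") as f:
        request = build_analyze_request(f.read())
    results = boto3.client("textract").analyze_document(**request)
    return trp.Document(results)
```

Requesting TABLES matters if you want to read tables afterwards: Textract should only return table blocks when that feature type is requested.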

Now you can examine doc.pages. For example, print all the text detected on the first page:

print(doc.pages[0].text)

Or print the first detected table in CSV format:

for row in doc.pages[0].tables[0].rows:
    # Join the cells with commas so each row has no trailing separator
    print(",".join(cell.text.strip() for cell in row.cells))
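
Separating cell text with plain commas produces invalid CSV as soon as a cell itself contains a comma or a quote. A minimal sketch using the standard csv module, which quotes such cells, is shown here. The function rows_to_csv and the sample data are illustrative assumptions; with trp the rows would come from the cell.text values of doc.pages[0].tables[0].rows.

```python
import csv
import io


def rows_to_csv(rows):
    """Render an iterable of rows (each a list of strings) as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Strip the padding whitespace around each cell before writing
        writer.writerow([text.strip() for text in row])
    return buffer.getvalue()


# Stand-in data; with trp this could be built as
# [[cell.text for cell in row.cells] for row in doc.pages[0].tables[0].rows]
sample_rows = [["Item ", "Price "], ["Widget, large", "4.50 "]]
print(rows_to_csv(sample_rows), end="")
```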

Or retrieve text from a given position on the page. To do that, create a BoundingBox with the required coordinates, expressed relative to the page.

# Coordinates are from top-left corner [0,0] to bottom-right [1,1]
bbox = trp.BoundingBox(width=0.220, height=0.085, left=0.734, top=0.140)
lines = doc.pages[0].getLinesInBoundingBox(bbox)

# Print only the lines contained in the Bounding Box
for line in lines:
    print(line.text)
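
If you know the region in pixels rather than as fractions of the page, the values have to be scaled first. The sketch below assumes you know the pixel size of the page image; to_relative_bbox is a hypothetical helper, not part of trp. It returns the same four keyword arguments that trp.BoundingBox takes in the example above.

```python
def to_relative_bbox(left_px, top_px, width_px, height_px,
                     page_width_px, page_height_px):
    """Convert a pixel rectangle into page-relative fractions between 0 and 1."""
    if page_width_px <= 0 or page_height_px <= 0:
        raise ValueError("page dimensions must be positive")
    return {
        "width": width_px / page_width_px,
        "height": height_px / page_height_px,
        "left": left_px / page_width_px,
        "top": top_px / page_height_px,
    }


# A 220 x 85 px region whose top-left corner is at (734, 140)
# on a 1000 x 1000 px page image
kwargs = to_relative_bbox(734, 140, 220, 85, 1000, 1000)
print(kwargs)
# With trp installed: bbox = trp.BoundingBox(**kwargs)
```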

Refer to the Textract blog post and to the amazon-textract-code-samples GitHub repository for more details.

Background

The Amazon blog post about Textract refers to a Python module, trp.py, which used to be quite hard to find. There are many posts on the internet from people looking for the module, often confused by the "other trp module" that has nothing to do with Textract.

Hence I decided to package and publish the trp.py module from the aws-samples/amazon-textract-code-samples repository. Fortunately its MIT license permits that.

Over time I have made some improvements to the module for ease of use.

Maintainer

Michael Ludvig
