transtractor

Universal PDF bank statement parsing library

PyPI package · version 0.9.1 · 1 maintainer

The Transtractor


Universal PDF bank statement parsing

The Transaction Extractor, or 'Transtractor', aspires to be a universal library for extracting transaction data from PDF bank statements. Key features:

  • Written in Rust (fast)
  • Python API (user friendly)
  • AI-free (lightweight)
  • Rules-based extraction (100% predictable and accurate)
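To illustrate what rules-based extraction means in general, here is a minimal sketch in which a single regular-expression rule turns statement text lines into transactions. It is not Transtractor's implementation or configuration format: the line layout, the DD/MM/YYYY date format and the names LINE_RULE, Transaction and extract_transactions are hypothetical. A line that matches the rule is always extracted the same way, and anything else is skipped, which is where the predictability of a rules-based approach comes from.

```python
import re
from datetime import date, datetime
from decimal import Decimal
from typing import List, NamedTuple

# Hypothetical rule for lines laid out as: DD/MM/YYYY  description  amount
LINE_RULE = re.compile(r"^(\d{2}/\d{2}/\d{4})\s+(.+?)\s+(-?[\d,]+\.\d{2})$")


class Transaction(NamedTuple):
    posted_on: date
    description: str
    amount: Decimal


def extract_transactions(lines: List[str]) -> List[Transaction]:
    """Apply LINE_RULE to each text line; lines that do not match are skipped."""
    transactions = []
    for line in lines:
        match = LINE_RULE.match(line.strip())
        if match is None:
            continue  # headers, balances, page footers, etc.
        day, description, amount = match.groups()
        transactions.append(
            Transaction(
                posted_on=datetime.strptime(day, "%d/%m/%Y").date(),
                description=description,
                # Decimal avoids float rounding; strip thousands separators first
                amount=Decimal(amount.replace(",", "")),
            )
        )
    return transactions


sample = [
    "Opening balance 100.00",
    "03/01/2024  Coffee shop  -4.50",
    "05/01/2024  Salary  2,500.00",
]
for txn in extract_transactions(sample):
    print(txn)
```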

Installation

Compile from source

  • Install Rust: Download and install Rust from rustup.rs

  • Install Maturin: Install the Python build tool for Rust extensions

    pip install maturin
    
  • Build and install Transtractor: Clone the repository and build it (maturin develop installs into the currently activated virtual environment)

    git clone https://github.com/gravytoast/transtractor.git
    cd transtractor
    maturin develop --release
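To confirm that the build landed in the environment you are actually using, a small standard-library check like the one below can help. It is not part of the Transtractor tooling, and it assumes the installed distribution is registered under the name transtractor, as it is on PyPI; installed_version is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # After `maturin develop --release` this should print a version string;
    # None suggests the build went into a different environment.
    print(installed_version("transtractor"))
```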
    

Basic usage (Python)

  • Import and initialise the parser

    from transtractor import Parser
    
    parser = Parser()
    
  • Convert PDF to CSV: All CSV files are written in a standard format

    parser.parse('statement.pdf').to_csv('statement.csv')
    
  • Convert PDF to DataFrame: Load into a DataFrame for analysis

    import pandas as pd
    
    data = parser.parse('statement.pdf').to_pandas_dict()
    df = pd.DataFrame(data)
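To convert a whole batch of statements, the parse(...).to_csv(...) call can be wrapped in a loop. The sketch below is hypothetical: convert_all is not part of Transtractor, and it assumes that an unsupported or unreadable statement makes parse raise an exception, which is not documented here. Passing the parser in as an argument keeps the helper independent of how the parser was configured and easy to test with a stand-in object.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def convert_all(parser, pdf_paths: Iterable[str], out_dir: str) -> Tuple[List[Path], List[Path]]:
    """Convert each PDF to a CSV with the same base name in out_dir.

    `parser` is any object exposing parse(path).to_csv(path), such as a
    transtractor Parser. out_dir must already exist.
    Returns (CSV paths written, PDFs that raised an error).
    """
    written: List[Path] = []
    failed: List[Path] = []
    for pdf in map(Path, pdf_paths):
        csv_path = Path(out_dir) / pdf.with_suffix(".csv").name
        try:
            parser.parse(str(pdf)).to_csv(str(csv_path))
        except Exception:
            # Assumption: unsupported or unreadable statements raise an exception.
            failed.append(pdf)
        else:
            written.append(csv_path)
    return written, failed


# Usage (requires transtractor; folder names are placeholders):
#   from transtractor import Parser
#   done, skipped = convert_all(Parser(), sorted(Path("statements").glob("*.pdf")), "csv_out")
```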
    

Supported statements

See the documentation for the current list of supported statements. You can also write your own parsing configuration file by following the configuration instructions in the documentation, then load it as follows:

    from transtractor import Parser

    parser = Parser()
    parser.load('my_config.json')  # load your custom parsing configuration
    parser.parse('statement.pdf').to_csv('statement.csv')
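Hand-written configuration files are easy to break with a stray comma. The hypothetical helper below (not part of Transtractor) runs the file's text through Python's json module first, so syntax errors are reported with a line and column before the file is handed to parser.load. It checks JSON syntax only and knows nothing about Transtractor's configuration schema.

```python
import json
from typing import Any


def validate_config_text(text: str, source: str = "<config>") -> Any:
    """Parse JSON config text, raising ValueError with the error location on failure.

    Only JSON syntax is checked, not the structure transtractor expects.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"{source}: invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err


# Usage (file name as in the example above):
#   from pathlib import Path
#   validate_config_text(Path("my_config.json").read_text(encoding="utf-8"), "my_config.json")
#   parser.load("my_config.json")
```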

Contributions

New, well-tested configuration files are especially welcome. Please submit a pull request that adds them to the python/transtractor/configs directory, or email them to develop@transtractor.net.

