The REWE eBon Parser is a Python package designed to parse REWE eBons (receipts) from PDF files and convert them into structured JSON or CSV. The package can also output the raw text extracted from the PDFs for debugging purposes. This project is a rewrite of the rewe-ebon-parser TypeScript library; the example PDFs are borrowed from that library.
The package depends on pdfplumber.
You can install the package using pip:
pip install rewe-ebon-parser
You can find PDF receipt files to test with in the examples/eBons folder of this repo, borrowed from rewe-ebon-parser.
rewe-ebon-parser [--file] <input_pdf_path> [output_json_path]
Example:
rewe-ebon-parser examples/eBons/1.pdf
rewe-ebon-parser [--folder] <input_folder> [output_folder] [--nthreads <number_of_threads>]
Example:
rewe-ebon-parser examples/eBons/
rewe-ebon-parser [--file] <input_pdf_path> [output_csv_path] [--csv-table]
Example:
rewe-ebon-parser examples/eBons/1.pdf --csv-table
rewe-ebon-parser [--folder] <input_folder> [output_folder] [--nthreads <number_of_threads>] [--csv-table]
Example (the module automatically detects whether it is a folder of PDFs or JSONs):
rewe-ebon-parser examples/eBons/ --csv-table
Note: the module will fail if the folder contains both JSON and PDF files to avoid duplicating the same data.
rewe-ebon-parser [--folder] <input_folder> [output_csv_path] [--combine-json] [--nthreads <number_of_threads>]
Example (the module automatically detects whether it is a folder of PDFs or JSONs):
rewe-ebon-parser examples/eBons/ --csv-table
Note: the module will fail if the folder contains both JSON and PDF files to avoid duplicating the same data.
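The mixed-folder rule in the note above can be illustrated with a short sketch. This is not the package's own code: the function name detect_folder_kind and the use of file suffixes to tell PDFs from JSONs are assumptions made for illustration only.

```python
from pathlib import Path


def detect_folder_kind(folder):
    """Return 'pdf' or 'json' for a folder; raise if it mixes both or has neither.

    Hypothetical helper: the real module may detect input types differently.
    """
    # Collect the lower-cased suffixes of all regular files in the folder.
    suffixes = {p.suffix.lower() for p in Path(folder).iterdir() if p.is_file()}
    has_pdf = ".pdf" in suffixes
    has_json = ".json" in suffixes
    if has_pdf and has_json:
        # Mirrors the documented behaviour: refuse mixed input to avoid duplicates.
        raise ValueError("folder contains both PDF and JSON files")
    if has_pdf:
        return "pdf"
    if has_json:
        return "json"
    raise ValueError("folder contains neither PDF nor JSON files")
```

Running such a check before a batch run makes it easy to see why a folder would be rejected.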
--file: Explicitly specify if the input and output paths are files.
--folder: Explicitly specify if the input and output paths are folders.
--nthreads: Number of concurrent threads to use for processing files.
--rawtext-file: Output raw text extracted from the PDF files to .txt files (mostly for debugging).
--rawtext-stdout: Print raw text extracted from the PDF files to the console (mostly for debugging).
--csv-table: Output parsed data as a CSV table.
--version: Show module version.
-h, --help: Show help.
If neither --file nor --folder is specified, the script will automatically detect if the input path is a file or a folder and process accordingly.
If output_json_path is not specified for a single file, the output will be saved in the same directory as the input file with a .json extension.
If output_folder is not specified for a folder, a subfolder named rewe_json_out will be created in the input folder, and the output JSON files will be saved there.
A detailed log of processing results will be saved in the output folder as processing_log.csv, containing information on which files were successfully processed and which failed, along with error messages if any.
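The default output locations described above can be expressed with pathlib. The sketch below only restates the documented rules; the helper names are hypothetical and are not part of the package's API.

```python
from pathlib import Path


def default_json_path(input_pdf):
    """Same directory and file name as the input, with a .json extension."""
    return Path(input_pdf).with_suffix(".json")


def default_output_folder(input_folder):
    """Subfolder rewe_json_out inside the input folder."""
    return Path(input_folder) / "rewe_json_out"


def default_log_path(output_folder):
    """processing_log.csv inside the output folder."""
    return Path(output_folder) / "processing_log.csv"
```

For example, default_json_path("examples/eBons/1.pdf") yields the path examples/eBons/1.json.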
from rewe_ebon_parser.parse import parse_pdf_ebon
parse_pdf_ebon("examples/eBons/1.pdf")
from rewe_ebon_parser.parse import parse_ebon
# here the function is once again getting the data from a file,
# but input can come from anywhere
def process_pdf(pdf_path):
    with open(pdf_path, 'rb') as f:
        data = f.read()
    result = parse_ebon(data)
    return result
process_pdf("examples/eBons/1.pdf")
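Because parse_ebon takes raw bytes, several receipts can be processed concurrently from your own code, similar in spirit to the --nthreads option. The following is a minimal sketch, not the package's implementation: parse_many is a hypothetical helper, and the parsing function is passed in as an argument so the sketch makes no assumptions about what it returns.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parse_many(pdf_paths, parse_fn, nthreads=4):
    """Apply parse_fn to the bytes of each file using a thread pool.

    Returns (results, errors), two dicts keyed by the path as a string.
    Hypothetical helper; parse_fn is any callable that accepts bytes.
    """
    def _one(path):
        return parse_fn(Path(path).read_bytes())

    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        # Submit every file first, then collect results in submission order.
        futures = {str(p): pool.submit(_one, p) for p in pdf_paths}
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:
                # Keep going on failure and record the error message per file.
                errors[path] = str(exc)
    return results, errors
```

With the package installed, a call could look like parse_many(paths, parse_ebon), using the parse_ebon import shown above. Collecting errors per file instead of stopping at the first failure follows the same idea as the processing log produced by the command-line tool.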
This project is licensed under the MIT License. For details, see the LICENSE file.
So far the module reliably parses the items, but sometimes fails on PAYBACK points, as these are often presented differently in REWE receipts.