Kohlrahbi generates machine-readable files from AHB documents. Its sister project is MIG_mose.
If you're looking for a tool to process the official BDEW XML files for AHBs (available since 2024), check out fundamend.
German utilities exchange data using EDIFACT. This is called market communication (MaKo).
The Forum Datenformate of the BDEW publishes the technical regulations of the EDIFACT-based market communication on edi-energy.de.
These rules are not stable: they change twice a year in theory and several times a year in practice.
Specific rules, which are binding for every German utility, are more or less formalised in so-called "Anwendungshandbücher" (AHB). These AHBs are basically long tables that describe:
As a utility, if I want to exchange data about business process XYZ with a market partner, then I have to provide the following information: [...]
In total the regulations from these Anwendungshandbücher span several thousand pages. And by pages, we really mean pages. EDIFACT communication is basically the API between German utilities for most of their B2B processes. However, the technical specifications of this API are
The Anwendungshandbücher are the epitome of digitization with some good intentions.
Although the AHBs are publicly available as PDF or Word files on edi-energy.de, they are hardly accessible in a technical sense:
The root cause of all this inaccessibility is a technical one: information that is inherently structured is published in an unstructured format (PDF or Word), which is not suited for technical specifications in IT.
KohlrAHBi helps you break those chains and access the AHBs the way you would expect to access technical specs: easily and automatically, instead of through hours of mindless manual work.
KohlrAHBi takes the .docx files published by edi-energy.de as input and returns truly machine-readable data in a variety of formats (JSON, CSV, ...).
Hence, KohlrAHBi is the key to unlocking any automation potential that relies on information hidden in the Anwendungshandbücher.
We're all hoping for the day of true digitization on which this repository will become obsolete.
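The sketch below illustrates what "machine-readable" buys you compared to a table in a Word file. It writes the same AHB rows as JSON and as CSV using only the standard library. The `AhbRow` class, its field names and the sample row are hypothetical illustrations. They are not kohlrahbi's actual output schema.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class AhbRow:
    """One line of an AHB table (hypothetical field names, for illustration only)."""
    segment: str
    data_element: str
    name: str
    ahb_expression: str


def to_json(rows: list[AhbRow]) -> str:
    # ensure_ascii=False keeps German umlauts readable in the output
    return json.dumps([asdict(r) for r in rows], ensure_ascii=False, indent=2)


def to_csv(rows: list[AhbRow]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(AhbRow)])
    writer.writeheader()
    writer.writerows(asdict(r) for r in rows)
    return buffer.getvalue()


# Illustrative sample row, not copied from a real AHB
rows = [AhbRow("UNH", "0062", "Nachrichten-Referenznummer", "X")]
print(to_json(rows))
print(to_csv(rows))
```

Once the data is in this shape, any downstream tool can filter, diff or validate it without manual copying.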
Kohlrahbi is a Python-based tool, so make sure that Python is installed on your machine.
We recommend using a virtual environment to keep your system clean.
Create a new virtual environment with

```bash
python -m venv .venv
```
How you activate the virtual environment depends on your operating system.

On Windows:

```
.venv\Scripts\activate
```

On Linux and macOS:

```bash
source .venv/bin/activate
```
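You can double-check from Python that the virtual environment is actually active by comparing `sys.prefix` with `sys.base_prefix`. Inside a venv the two differ. This small standard-library sketch is not part of kohlrahbi, and the function name is our own.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a virtual environment.

    A venv created with `python -m venv` sets sys.prefix to the venv directory,
    while sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```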
Finally, install the package with

```bash
pip install kohlrahbi
```
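Kohlrahbi is a command line tool. You can use it in three different ways:

- `kohlrahbi ahb` to extract the AHB tables,
- `kohlrahbi conditions` to extract the conditions and packages,
- `kohlrahbi changehistory` to extract the change history.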
You can run the following command to get an overview of all available commands and options:

```bash
kohlrahbi --help
```
> [!NOTE]
> For the following steps we assume that you cloned our edi_energy_mirror into a neighbouring directory. The edi_energy_mirror contains the .docx files of the AHBs. The folder structure should look like this:
>
> ```
> .
> ├── edi_energy_mirror
> └── kohlrahbi
> ```
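Before running any of the commands, you can check this folder structure with a few lines of standard-library Python. The helper `find_docx_files` is hypothetical and not part of kohlrahbi. It only confirms that the mirror directory exists and lists the .docx files below it.

```python
from pathlib import Path


def find_docx_files(mirror_path: Path) -> list[Path]:
    """Return all .docx files below the mirror directory (hypothetical helper).

    Raises FileNotFoundError if the directory does not exist, which usually
    means edi_energy_mirror was not cloned next to the kohlrahbi directory.
    """
    if not mirror_path.is_dir():
        raise FileNotFoundError(f"edi_energy_mirror not found at {mirror_path.resolve()}")
    # rglob searches recursively, so per-format-version subfolders are included
    return sorted(mirror_path.rglob("*.docx"))
```

Run from inside the `kohlrahbi` directory, `find_docx_files(Path("../edi_energy_mirror"))` should return a non-empty list if the mirror was cloned as shown.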
To extract all AHB tables for each Prüfi of a specific format version, run the following command:

```bash
kohlrahbi ahb --edi-energy-mirror-path ../edi_energy_mirror/ --output-path ./output/ --file-type csv --format-version FV2310
```
To extract the AHB tables for a specific Prüfi of a specific format version, run the following command (`-eemp` is the short form of `--edi-energy-mirror-path`):

```bash
kohlrahbi ahb -eemp ../edi_energy_mirror/ --output-path ./output/ --file-type csv --pruefis 13002 --format-version FV2310
```
You can also provide multiple Prüfis:

```bash
kohlrahbi ahb -eemp ../edi_energy_mirror/ --output-path ./output/ --file-type csv --pruefis 13002 --pruefis 13003 --pruefis 13005 --format-version FV2310
```
You can also provide multiple file types:

```bash
kohlrahbi ahb -eemp ../edi_energy_mirror/ --output-path ./output/ --file-type csv --file-type xlsx --file-type flatahb --pruefis 13002 --format-version FV2310
```
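If you drive kohlrahbi from a Python script, for example in a CI job, it can help to build the argument list programmatically. The helper `build_ahb_command` below is a hypothetical sketch and not a kohlrahbi API. It uses only the options shown in the commands above, and it repeats `--file-type` and `--pruefis` once per value, as the examples do.

```python
import shlex
from pathlib import Path
from typing import Iterable, Optional


def build_ahb_command(
    mirror_path: Path,
    output_path: Path,
    format_version: str,
    file_types: Iterable[str] = ("csv",),
    pruefis: Optional[Iterable[str]] = None,
) -> list[str]:
    """Assemble the arguments for a `kohlrahbi ahb` call (hypothetical helper)."""
    cmd = [
        "kohlrahbi",
        "ahb",
        "--edi-energy-mirror-path",
        str(mirror_path),
        "--output-path",
        str(output_path),
    ]
    # Options that accept several values are repeated once per value.
    for file_type in file_types:
        cmd += ["--file-type", file_type]
    for pruefi in pruefis or ():
        cmd += ["--pruefis", pruefi]
    cmd += ["--format-version", format_version]
    return cmd


if __name__ == "__main__":
    command = build_ahb_command(
        Path("../edi_energy_mirror"),
        Path("./output"),
        "FV2310",
        file_types=["csv", "xlsx"],
        pruefis=["13002", "13003"],
    )
    # Print a copy-pasteable shell command instead of running it.
    print(shlex.join(command))
```

To execute the command, pass the list to `subprocess.run(command, check=True)`. The list form avoids shell quoting issues.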
To extract all conditions for each format of a specific format version, run the following command:

```bash
kohlrahbi conditions -eemp ../edi_energy_mirror/ --output-path ./output/ --format-version FV2310
```
This will provide you with all conditions and packages found in all AHBs (including the condition texts from package tables) within the specified folder of .docx files.
The output is saved separately for each EDIFACT format as `conditions.json` and `packages.json` in the specified output path.
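This sketch does not assume the directory layout below the output path. Instead it searches the output path recursively for `conditions.json` and `packages.json` and groups the results by parent directory. It parses each file as plain JSON without assuming a schema. The function name `load_condition_outputs` is hypothetical.

```python
import json
from pathlib import Path


def load_condition_outputs(output_path: Path) -> dict[str, dict[str, object]]:
    """Map each directory holding conditions.json/packages.json to its parsed files.

    Hypothetical helper: the nesting below output_path is not assumed, and the
    parsed JSON is returned as-is without interpreting its structure.
    """
    results: dict[str, dict[str, object]] = {}
    for name in ("conditions.json", "packages.json"):
        for path in sorted(output_path.rglob(name)):
            key = str(path.parent.relative_to(output_path))
            with path.open(encoding="utf-8") as f:
                results.setdefault(key, {})[name] = json.load(f)
    return results
```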
Please note that the condition information collected here may be more comprehensive than the information collected for the AHBs above. This is because `conditions` uses a different routine than `ahb`.
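To extract the change history (Änderungshistorie) of a specific format version, run the following command:

```bash
kohlrahbi changehistory -eemp ../edi_energy_mirror/ --output-path ./output/ --format-version FV2310
```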
.docx Data Sources

kohlrahbi internally relies on a specific naming schema for the .docx files: the file name holds information about the EDIFACT format and the validity period of the AHBs contained in the file.
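A naming schema like this can be parsed with a regular expression. The pattern below is purely an assumption for illustration: `<document name>_<valid to>_<valid from>.docx`, with both dates written as `YYYYMMDD`. It is not taken from kohlrahbi's source code, and the example file name is made up. Check the actual files in edi_energy_mirror before relying on anything like it.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# ASSUMPTION (illustrative only): "<document name>_<valid to>_<valid from>.docx"
# with both dates written as YYYYMMDD. Not taken from kohlrahbi's source code.
FILENAME_PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<valid_to>\d{8})_(?P<valid_from>\d{8})\.docx$"
)


class DocxFileInfo(NamedTuple):
    document_name: str  # assumed to contain the EDIFACT format, e.g. "UTILMD"
    valid_from: date
    valid_to: date


def parse_docx_filename(filename: str) -> Optional[DocxFileInfo]:
    """Split a file name into document name and validity period, or return None."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None

    def to_date(value: str) -> date:
        return datetime.strptime(value, "%Y%m%d").date()

    return DocxFileInfo(
        document_name=match["name"],
        valid_from=to_date(match["valid_from"]),
        valid_to=to_date(match["valid_to"]),
    )
```

For the hypothetical name `UTILMDAHBStrom_99991231_20231001.docx`, this yields a document valid from 2023-10-01 with an open-ended validity (9999-12-31).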
The easiest way to comply with this naming schema is to clone our edi_energy_mirror repository to your local machine.
A kohlrahbi-based CI pipeline runs from the edi_energy_mirror mentioned above to the repository machine-readable_anwendungshandbuecher, where you can find the scraped AHBs as JSON, CSV or Excel files.
```mermaid
flowchart TB
    S[Start] --> RD[Read docx]
    RD --> RPT[Read all paragraphs <br> and tables]
    RPT --> I[Start iterating]
    I --> NI[Read next item]
    %% check for text paragraph %%
    NI --> CTP{Text Paragraph?}
    CTP -- Yes --> NI
    CTP -- No --> CCST{Is item just<br>Chapter or Section Title?}
    CCST -- Yes --> CTAenderunghistorie{Is Chapter Title<br>'Änderungshistorie'?}
    CTAenderunghistorie -- Yes --> EXPORT[Export Extract]
    CCST -- No --> CT{Is item a table<br>with prüfis?}
    CT -- Yes --> Extract[Create Extract]
```
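The visible part of the flowchart can be sketched as a simple loop over document items. The classes `TextParagraph`, `Heading` and `Table` are hypothetical stand-ins for what is read from the .docx file; kohlrahbi's real types differ. The branches that the flowchart does not show are handled by assumption: a non-"Änderungshistorie" title moves on to the next item, and tables without Prüfis are skipped.

```python
from dataclasses import dataclass, field
from typing import Union


# Hypothetical stand-ins for the items read from the .docx file.
@dataclass
class TextParagraph:
    text: str


@dataclass
class Heading:
    title: str  # a chapter or section title


@dataclass
class Table:
    pruefis: list[str]
    rows: list[list[str]] = field(default_factory=list)


Item = Union[TextParagraph, Heading, Table]


def collect_extracts(items: list[Item]) -> list[Table]:
    """Walk the items in document order, following the visible flowchart branches."""
    extracts: list[Table] = []
    for item in items:
        if isinstance(item, TextParagraph):
            continue  # "Text Paragraph? Yes": read the next item
        if isinstance(item, Heading):
            if item.title == "Änderungshistorie":
                break  # change history reached: stop and export the extracts
            continue  # any other title (assumed): read the next item
        if item.pruefis:
            extracts.append(item)  # "table with Prüfis? Yes": create an extract
        # Tables without Prüfis are skipped here (assumption; the flowchart is cut off).
    return extracts
```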
The following table shows the number of pages of the AHB documents for each format in format version FV2310.
| Format | Pages | Hint |
|---|---|---|
| UTILMD Strom | 1064 | |
| UTILMD Gas | 345 | |
| REQOTE | 264 | together with QUOTES, ORDERS, ORDRSP, ORDCHG |
| QUOTES | 264 | together with REQOTE, ORDERS, ORDRSP, ORDCHG |
| ORDRSP | 264 | together with REQOTE, QUOTES, ORDERS, ORDCHG |
| ORDERS | 264 | together with REQOTE, QUOTES, ORDRSP, ORDCHG |
| ORDCHG | 264 | together with REQOTE, QUOTES, ORDERS, ORDRSP |
| MSCONS | 164 | |
| UTILMD MaBis | 133 | |
| REMADV | 91 | together with INVOIC |
| INVOIC | 91 | together with REMADV |
| IFTSTA | 82 | |
| CONTRL | 72 | together with APERAK, contains no Prüfis |
| APERAK | 72 | together with CONTRL, contains no Prüfis |
| PARTIN | 69 | |
| UTILTS | 34 | |
| ORDRSP | 30 | together with ORDERS |
| ORDERS | 30 | together with ORDRSP |
| PRICAT | 25 | |
| COMDIS | 10 | good test case for tables that appear above the change history |
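Several formats share one AHB document (see the Hint column), so summing the Pages column directly counts those documents more than once. The sketch below uses the figures from the table and counts each document only once. It keys each document on the set of formats it covers. The `ROWS` layout is our own choice.

```python
# Rows of the FV2310 table: (format, pages, formats sharing the same document).
# ORDERS and ORDRSP each appear in two different documents (264 and 30 pages).
ROWS: list[tuple[str, int, tuple[str, ...]]] = [
    ("UTILMD Strom", 1064, ()),
    ("UTILMD Gas", 345, ()),
    ("REQOTE", 264, ("QUOTES", "ORDERS", "ORDRSP", "ORDCHG")),
    ("QUOTES", 264, ("REQOTE", "ORDERS", "ORDRSP", "ORDCHG")),
    ("ORDRSP", 264, ("REQOTE", "QUOTES", "ORDERS", "ORDCHG")),
    ("ORDERS", 264, ("REQOTE", "QUOTES", "ORDRSP", "ORDCHG")),
    ("ORDCHG", 264, ("REQOTE", "QUOTES", "ORDERS", "ORDRSP")),
    ("MSCONS", 164, ()),
    ("UTILMD MaBis", 133, ()),
    ("REMADV", 91, ("INVOIC",)),
    ("INVOIC", 91, ("REMADV",)),
    ("IFTSTA", 82, ()),
    ("CONTRL", 72, ("APERAK",)),
    ("APERAK", 72, ("CONTRL",)),
    ("PARTIN", 69, ()),
    ("UTILTS", 34, ()),
    ("ORDRSP", 30, ("ORDERS",)),
    ("ORDERS", 30, ("ORDRSP",)),
    ("PRICAT", 25, ()),
    ("COMDIS", 10, ()),
]


def total_unique_pages(rows: list[tuple[str, int, tuple[str, ...]]]) -> int:
    """Sum page counts, counting each shared AHB document only once."""
    documents: dict[frozenset[str], int] = {}
    for fmt, pages, shared_with in rows:
        # All formats of one document produce the same key, regardless of row order.
        documents[frozenset((fmt, *shared_with))] = pages
    return sum(documents.values())


print(total_unique_pages(ROWS))  # 2383
```

With these figures, the unique documents add up to 2,383 pages for FV2310. A naive sum over all rows gives 3,632.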
To set up the development environment, install the dev dependencies:

```bash
tox -e dev
```
To run the tests, use tox:

```bash
tox
```
See our Python Template Repository for detailed explanations.
You are very welcome to contribute to this repository by opening a pull request against the main branch.
This repository is part of the Hochfrequenz Libraries and Tools for a truly digitized market communication.