read-protobuf
A small library for reading serialized protobuf(s) directly into a Pandas DataFrame.
It is intended as a simple shortcut for translating serialized
protobuf bytes or files directly into a DataFrame.
Install
Available via pip:
$ pip install read-protobuf
Usage
Run the demo-notebook for an interactive demo.
import demo_pb2
from read_protobuf import read_protobuf
MessageType = demo_pb2.MessageType()
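df = read_protobuf(b'\x00\x00', MessageType)  # read serialized protobuf bytes
df = read_protobuf([b'\x00\x00', b'\x00\x00'], MessageType)  # read multiple serialized protobufs
df = read_protobuf('demo.pb', MessageType)  # read from a file
df = read_protobuf(['demo.pb', 'demo2.pb'], MessageType)  # read from multiple files
df = read_protobuf('demo.pb', MessageType, flatten=False)  # do not flatten nested messages
df = read_protobuf('demo.pb', MessageType, prefix_nested=True)  # prefix nested columns with the parent column name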
To compile a .proto file into a Python Message class, use:
$ protoc --python_out="." demo.proto
Alternatives
protobuf-to-dict
https://github.com/benhodgson/protobuf-to-dict
This library was developed earlier to convert protobufs to JSON via a dict.
MessageToDict, MessageToJson
The Google protobuf library comes with utilities to convert messages to a dict
or JSON, which can then be loaded by Pandas.
from google.protobuf.json_format import MessageToJson
from google.protobuf.json_format import MessageToDict
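A minimal sketch of the MessageToDict route, for comparison. It assumes pandas and protobuf are installed, and it uses FileDescriptorProto (a message class that ships with the protobuf package) as a stand-in for a generated class such as demo_pb2.MessageType. The helper name messages_to_dataframe is hypothetical and is not part of read_protobuf or protobuf.

```python
import pandas as pd
from google.protobuf import descriptor_pb2
from google.protobuf.json_format import MessageToDict

# Stand-in data: two serialized messages built from a class bundled with protobuf.
# In practice these bytes would come from your own generated Message class.
serialized = [
    descriptor_pb2.FileDescriptorProto(name="a.proto", package="demo").SerializeToString(),
    descriptor_pb2.FileDescriptorProto(name="b.proto", package="demo").SerializeToString(),
]


def messages_to_dataframe(payloads, message_cls):
    """Parse each serialized payload and collect the resulting dicts into a DataFrame."""
    rows = []
    for payload in payloads:
        message = message_cls()
        message.ParseFromString(payload)
        # Keep the field names as written in the .proto file instead of lowerCamelCase.
        rows.append(MessageToDict(message, preserving_proto_field_name=True))
    return pd.DataFrame(rows)


df = messages_to_dataframe(serialized, descriptor_pb2.FileDescriptorProto)
print(df)
```

Each message becomes one row; nested messages stay as dict-valued cells unless they are flattened separately, for example with pandas.json_normalize.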
In brief tests, the read_protobuf package is about 2x as fast as using
MessageToDict and 3x as fast as MessageToJson.
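A rough sketch of how such a comparison could be timed with the standard timeit module. This is not the benchmark behind the figures above: the message is a stand-in class bundled with protobuf, the helper time_candidates is hypothetical, and only the two google.protobuf converters are measured. A read_protobuf call on your own data could be added as another entry in the candidates dict.

```python
import json
import timeit

from google.protobuf import descriptor_pb2
from google.protobuf.json_format import MessageToDict, MessageToJson

# Stand-in message (an assumption); swap in an instance of your own Message class.
message = descriptor_pb2.FileDescriptorProto(name="demo.proto", package="demo")


def via_dict():
    """Convert the message straight to a dict."""
    return MessageToDict(message)


def via_json():
    """Convert the message to a JSON string, then parse it back into a dict."""
    return json.loads(MessageToJson(message))


def time_candidates(candidates, number=1000):
    """Return the total seconds taken by `number` calls of each candidate."""
    return {name: timeit.timeit(func, number=number) for name, func in candidates.items()}


timings = time_candidates({"MessageToDict": via_dict, "MessageToJson": via_json})
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f}s")
```

Absolute numbers depend on the message shape, the protobuf backend, and the machine, so ratios measured this way may differ from the ones quoted above.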
Develop
To install a development version of the package, run from the root directory:
$ pip install -e .
To install development dependencies, use the optional [dev] dependencies:
$ pip install -e ".[dev]"
Format
Uses black and isort to format files.
$ make black
$ make isort
Lint
Uses ruff to lint the application.
$ make ruff
Test
Uses pytest to run unit tests. From the root of the repository, run:
$ make pytest
$ pytest -k "TestRead::test_read_bytes"
Code Coverage
Uses coverage to monitor code coverage during tests.
To record coverage while running tests, run:
$ make pytest-cov
License
MIT License