A fast and correct bencode serialize/deserialize library
introduction
Why yet another bencode package in python?
because I need a bencode library:
1. Correct
It should fully validate its inputs, both encoded bencode bytes, or python object to be
encoded.
And it should not decode bencode bytes to str
by default.
Bencode doesn't have a utf-8 str type, only bytes,
so many decoder try to decode bytes to str and fallback to bytes,
this package won't, it parse bencode bytes value as python bytes.
It may be attempting to parse all dictionary keys as string,
but for BitTorrent v2 torrent, the keys in pieces root
dictionary is still sha256 hash
instead of ascii/utf-8 string.
If you prefer string as dictionary keys, write a dedicated function to convert parsing
result.
Also be careful! Even file name or torrent name may not be valid utf-8 string.
2. Fast enough
this package is written with c++ in CPython.
3. still cross implement
This package sill have a pure python wheel bencode2-${version}-py3-none-any.whl
wheel
on pypi.
Which means you can still use it in non-cpython python with same behavior.
install
pip install bencode2
basic usage
import bencode2
assert bencode2.bdecode(b"d4:spaml1:a1:bee") == {b"spam": [b"a", b"b"]}
assert bencode2.bencode({'hello': 'world'}) == b'd5:hello5:worlde'
Decoding
bencode type | python type |
---|
integer | int |
string | bytes |
array | list |
dictionary | dict |
bencode have 4 native types, integer, string, array and dictionary.
This package will decode integer to int
, array to list
and
dictionary to dict
.
Because bencode string is not defined as utf-8 string, and will contain raw bytes
bencode2 will decode bencode string to python bytes
.
Encoding
python type | bencode type |
---|
bool | integer 0/1 |
int , enum.IntEnum | integer |
str , enum.StrEnum | string |
bytes , bytearray ,memoryview | string |
list , tuple , NamedTuple | array |
dict , OrderedDict | dictionary |
types.MaapingProxy | dictionary |
dataclasses | dictionary |
free threading
bencode2 have a free threading wheel on pypi, build with GIL disabled.
When encoding or decoding, it will not acquire GIL and may call non-thread-safy c-api,
which mean it's the caller's responsibility to ensure thread safety.
When calling bencode
, it's safe to encode same object in multiple threading,
but it's not safe to encoding a object and change it in another thread at same time.
Also, when decoding, bytes
objects are immutable so it's safe to be used in multiple
threading,
but memoryview
and bytearray
maybe not, please make sure underlay data doesn't
change when decoding.
Development
This project use meson for building.
For testing pure python library,
make sure all so/pyd files in src/bencode2
are removed, then run
PYTHONPATH=src pytest --assert-pkg-compiled=false
.
For testing native extension, meson-python doesn't provide same function with
python setup.py build_ext --inplace
.
So you will need to run command like this:
meson setup build -Dbuildtype=release -Dpython.allow_limited_api=false
meson compile -C build
ninja -C build copy
ninja will need to build so/pyd with meson and copy it to src/bencode2
,
then run tests with PYTHONPATH=src pytest --assert-pkg-compiled=true
.