=============
Python-Blosc2
=============

A Python wrapper for the extremely fast Blosc2 compression library

:Author: The Blosc development team
:Contact: blosc@blosc.org
:Github: https://github.com/Blosc/python-blosc2
:Actions: |actions|
:PyPi: |version|
:NumFOCUS: |numfocus|
:Code of Conduct: |Contributor Covenant|


.. |version| image:: https://img.shields.io/pypi/v/blosc2.svg
   :target: https://pypi.python.org/pypi/blosc2

.. |Contributor Covenant| image:: https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg
   :target: https://github.com/Blosc/community/blob/master/code_of_conduct.md

.. |numfocus| image:: https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A
   :target: https://numfocus.org

.. |actions| image:: https://github.com/Blosc/python-blosc2/actions/workflows/build.yml/badge.svg
   :target: https://github.com/Blosc/python-blosc2/actions/workflows/build.yml


What it is
==========


`C-Blosc2 <https://github.com/Blosc/c-blosc2>`_ is the new major version of
`C-Blosc <https://github.com/Blosc/c-blosc>`_, and is backward compatible with
both the C-Blosc1 API and its in-memory format. Python-Blosc2 is a Python package
that wraps C-Blosc2, the newest version of the Blosc compressor.

Python-Blosc2 already reproduces the API of
`Python-Blosc <https://github.com/Blosc/python-blosc>`_, so it can be
used as a drop-in replacement. However, there are `a few exceptions to full
compatibility <https://github.com/Blosc/python-blosc2/blob/main/RELEASE_NOTES.md#changes-from-python-blosc-to-python-blosc2>`_.

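
As a minimal sketch of the drop-in usage, the snippet below compresses a plain byte buffer with ``blosc2.compress()`` and restores it with ``blosc2.decompress()``. It deliberately sticks to the ``typesize`` and ``clevel`` keywords; keywords for selecting the shuffle filter or the codec may differ between Python-Blosc and Python-Blosc2, so check the release notes linked above before porting such calls. The input data (an integer ramp built with the standard ``array`` module) is an arbitrary example.

.. code-block:: python

    import array

    import blosc2

    # 1 million 8-byte signed integers forming a simple (highly redundant) ramp.
    data = array.array("q", range(1_000_000)).tobytes()

    # typesize=8 tells the default shuffle filter that items are 8 bytes wide.
    cdata = blosc2.compress(data, typesize=8, clevel=9)
    restored = blosc2.decompress(cdata)

    print(f"{len(data)} bytes -> {len(cdata)} bytes")
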

In addition, Python-Blosc2 aims to leverage the full C-Blosc2 functionality to support
super-chunks (`SChunk <https://www.blosc.org/python-blosc2/reference/schunk_api.html>`_),
multi-dimensional arrays
(`NDArray <https://www.blosc.org/python-blosc2/reference/ndarray_api.html>`_),
metadata, serialization and other bells and whistles introduced in C-Blosc2.

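
To illustrate the metadata support, here is a minimal sketch that attaches variable-length user metadata to a super-chunk through its ``vlmeta`` attribute. The metadata names (``description``, ``units``) and their values are arbitrary examples, not names the library expects.

.. code-block:: python

    import numpy as np

    import blosc2

    data = np.arange(100_000, dtype=np.int64)
    schunk = blosc2.SChunk(chunksize=data.nbytes, data=data, cparams={"typesize": 8})

    # Variable-length user metadata stored alongside the compressed data.
    schunk.vlmeta["description"] = "a simple integer ramp"
    schunk.vlmeta["units"] = {"x": "seconds"}

    # Reading a name back returns the original Python object.
    print(schunk.vlmeta["description"], schunk.vlmeta["units"])
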
Note: Python-Blosc2 is meant to be backward compatible with Python-Blosc data.
That means that it can read data generated with Python-Blosc, but the opposite
is not true (i.e. there is no forward compatibility).

SChunk: a 64-bit compressed store
=================================


An `SChunk <https://www.blosc.org/python-blosc2/reference/schunk_api.html>`_ is a simple data
container that handles setting, expanding and getting data and metadata. Unlike chunks,
a super-chunk can update and resize the data it contains, supports user metadata,
and does not have the 2 GB storage limitation.

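
The following sketch exercises the three basic operations on a super-chunk: getting, setting and expanding data. It assumes NumPy is installed, and the chunk size (200,000 ``int32`` items per chunk) is an arbitrary choice for illustration. Slicing an ``SChunk`` indexes items of ``typesize`` bytes and returns raw ``bytes``, hence the ``np.frombuffer()`` calls.

.. code-block:: python

    import numpy as np

    import blosc2

    # 10 chunks of 200_000 int32 items each (800_000 bytes per chunk).
    nchunks = 10
    data = np.arange(200_000 * nchunks, dtype=np.int32)
    schunk = blosc2.SChunk(chunksize=200_000 * 4, data=data, cparams={"typesize": 4})

    # Expanding: append one more chunk (its size must match chunksize).
    extra = np.arange(200_000, dtype=np.int32)
    schunk.append_data(extra)

    # Getting: the slice is expressed in items and comes back as bytes.
    head = np.frombuffer(schunk[0:5], dtype=np.int32)

    # Setting: overwrite a range of items in place.
    schunk[0:5] = np.full(5, -1, dtype=np.int32)
    new_head = np.frombuffer(schunk[0:5], dtype=np.int32)

    # The first items of the appended chunk.
    tail_start = np.frombuffer(schunk[2_000_000:2_000_003], dtype=np.int32)

    print(schunk.nchunks, head, new_head, tail_start)
    print(f"cratio: {schunk.nbytes / schunk.cbytes:.1f}x")
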

Additionally, you can convert an SChunk into a contiguous, serialized buffer (aka
`cframe <https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst>`_)
and vice versa; as a bonus, the serialization/deserialization process also works with NumPy
arrays and PyTorch/TensorFlow tensors at a blazing speed:


.. |compress| image:: https://github.com/Blosc/python-blosc2/blob/main/images/linspace-compress.png?raw=true
   :width: 100%
   :alt: Compression speed for different codecs

.. |decompress| image:: https://github.com/Blosc/python-blosc2/blob/main/images/linspace-decompress.png?raw=true
   :width: 100%
   :alt: Decompression speed for different codecs

+----------------+---------------+
| |compress|     | |decompress|  |
+----------------+---------------+

while reaching excellent compression ratios:

.. image:: https://github.com/Blosc/python-blosc2/blob/main/images/pack-array-cratios.png?raw=true
   :width: 75%
   :align: center
   :alt: Compression ratio for different codecs

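
A minimal sketch of the cframe round trip with a NumPy array, assuming NumPy is installed: it builds an ``SChunk`` from the array, serializes it with ``SChunk.to_cframe()`` and rebuilds it with ``blosc2.schunk_from_cframe()``. This is an illustration only, not the benchmark code behind the figures above; the array contents and the chunk size are arbitrary.

.. code-block:: python

    import numpy as np

    import blosc2

    # About 8 MB of smooth float64 data, in the spirit of the linspace plots.
    data = np.linspace(0, 100, 1_000_000)

    # Split it into 4 chunks of 250_000 float64 items (2_000_000 bytes) each.
    schunk = blosc2.SChunk(chunksize=250_000 * 8, data=data, cparams={"typesize": 8})

    # Serialize: the whole super-chunk becomes one contiguous cframe (bytes).
    cframe = schunk.to_cframe()

    # Deserialize: copy=True makes the new SChunk independent of `cframe`.
    schunk2 = blosc2.schunk_from_cframe(cframe, copy=True)

    # Read every item back; slicing an SChunk returns raw bytes.
    restored = np.frombuffer(schunk2[:], dtype=np.float64)

    print(f"{data.nbytes} bytes -> cframe of {len(cframe)} bytes")
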

Also, if you own a Mac M1/M2, do yourself a favor and use its native arm64 architecture
(yes, we distribute Mac arm64 wheels too; you are welcome ;-):


.. |pack_arm| image:: https://github.com/Blosc/python-blosc2/blob/main/images/M1-i386-vs-arm64-pack.png?raw=true
   :width: 100%
   :alt: Compression speed for different codecs on Apple M1

.. |unpack_arm| image:: https://github.com/Blosc/python-blosc2/blob/main/images/M1-i386-vs-arm64-unpack.png?raw=true
   :width: 100%
   :alt: Decompression speed for different codecs on Apple M1

+------------+--------------+
| |pack_arm| | |unpack_arm| |
+------------+--------------+

Read more about SChunk features in our blog entry at:
https://www.blosc.org/posts/python-blosc2-improvements


NDArray: an N-Dimensional store
===============================


One of the latest and most exciting additions in Python-Blosc2 is the
`NDArray <https://www.blosc.org/python-blosc2/reference/ndarray_api.html>`_ object.
It can write and read n-dimensional datasets in an extremely efficient way thanks
to an n-dim 2-level partitioning, which allows slicing and dicing arbitrarily large,
compressed data in a more fine-grained way:


.. image:: https://github.com/Blosc/python-blosc2/blob/main/images/b2nd-2level-parts.png?raw=true
   :width: 75%

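
A minimal sketch of choosing the two partition levels explicitly with ``blosc2.asarray()``: ``chunks`` sets the first-level partition and ``blocks`` the second-level one inside each chunk. The shapes used here are arbitrary illustrations, not tuned recommendations; when they are omitted, Python-Blosc2 picks them automatically.

.. code-block:: python

    import numpy as np

    import blosc2

    a = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)

    # Level 1: 200x200 chunks; level 2: 50x50 blocks inside each chunk.
    b = blosc2.asarray(a, chunks=(200, 200), blocks=(50, 50))
    print(b.shape, b.chunks, b.blocks)

    # Getting a slice returns a regular NumPy array; a small slice like this
    # one only overlaps a single chunk and a few of its blocks.
    sub = b[10:20, 300:310]
    print(type(sub), sub.shape)
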

To whet your appetite, here is how the NDArray object performs when getting
slices orthogonal to the different axes of a 4-dim dataset:

.. image:: https://github.com/Blosc/python-blosc2/blob/main/images/Read-Partial-Slices-B2ND.png?raw=true
   :width: 75%

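
The slicing pattern in the plot can be reproduced on a toy scale with the sketch below, which builds a small 4-dim ``NDArray`` and extracts one slice orthogonal to each axis. It uses ``7:8`` instead of a bare integer so every result keeps all four dimensions, and it checks the results against NumPy; it does not measure timings, and the shape, ``chunks`` and ``blocks`` values are arbitrary.

.. code-block:: python

    import numpy as np

    import blosc2

    shape = (20, 20, 20, 20)
    a = np.arange(np.prod(shape), dtype=np.float32).reshape(shape)
    b = blosc2.asarray(a, chunks=(10, 10, 10, 10), blocks=(5, 5, 5, 5))

    # Fix position 7 along each axis in turn; each result is a slice
    # orthogonal to that axis (length 1 along it).
    slices = []
    for axis in range(4):
        key = [slice(None)] * 4
        key[axis] = slice(7, 8)
        slices.append(b[tuple(key)])
        print(axis, slices[-1].shape)
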

We have blogged about this: https://www.blosc.org/posts/blosc2-ndim-intro

We also have a `~2 min explanatory video on why slicing pineapple-style (aka double
partition) is useful <https://www.youtube.com/watch?v=LvP9zxMGBng>`_:

.. image:: https://github.com/Blosc/blogsite/blob/master/files/images/slicing-pineapple-style.png?raw=true
   :width: 50%
   :alt: Slicing a dataset in pineapple-style
   :target: https://www.youtube.com/watch?v=LvP9zxMGBng


Installing
==========


Blosc now offers Python wheels for the main operating systems (Windows, Mac and Linux)
and platforms. You can install binary packages from PyPI using ``pip``:

.. code-block:: console

    pip install blosc2

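
After installing, a quick way to confirm which version was picked up is sketched below; the printed output depends on your environment.

.. code-block:: python

    import blosc2

    # Version string of the installed Python-Blosc2 package.
    print("python-blosc2", blosc2.__version__)

    # Print version details for Python-Blosc2 and the software it relies on.
    blosc2.print_versions()
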

Documentation
=============


The documentation is here:
https://blosc.org/python-blosc2/python-blosc2.html

Some examples are also available at:
https://github.com/Blosc/python-blosc2/tree/main/examples


Building from sources
=====================


``python-blosc2`` ships with the C-Blosc2 sources and can be built in-place:

.. code-block:: console

    git clone https://github.com/Blosc/python-blosc2/
    cd python-blosc2
    git submodule update --init --recursive
    python -m pip install -r requirements-build.txt
    python setup.py build_ext --inplace

That's all. You can now proceed with the testing section.


Testing
=======


After compiling, you can quickly check that the package is sane by
running the tests (add ``-v`` to the ``pytest`` command for verbose mode):

.. code-block:: console

    python -m pip install -r requirements-tests.txt
    python -m pytest


Benchmarking
============


If curious, you may want to run a small benchmark that compares a plain
NumPy array copy against compression through different compressors in
your Blosc build:

.. code-block:: console

    PYTHONPATH=. python bench/pack_compress.py

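
If you prefer a self-contained starting point, the sketch below follows the same idea on a much smaller scale: it times a plain NumPy copy against ``blosc2.compress()`` with three of the available codecs. It is not the ``bench/pack_compress.py`` script; the array size, ``clevel=5`` and the codec selection are arbitrary assumptions, and absolute numbers will vary from machine to machine.

.. code-block:: python

    import functools
    import time

    import numpy as np

    import blosc2

    # Roughly 16 MB of smooth float64 data.
    arr = np.linspace(0, 100, 2_000_000)


    def best_time(func, repeats=3):
        """Return the best wall-clock time, in seconds, out of `repeats` runs."""
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            func()
            best = min(best, time.perf_counter() - t0)
        return best


    # Baseline: a plain NumPy memory copy.
    timings = {"numpy copy": best_time(arr.copy)}
    ratios = {}

    for codec in (blosc2.Codec.BLOSCLZ, blosc2.Codec.LZ4, blosc2.Codec.ZSTD):
        compress = functools.partial(blosc2.compress, arr, typesize=8, clevel=5, codec=codec)
        cdata = compress()
        # Make sure the round trip is lossless before trusting the timing.
        restored = np.frombuffer(blosc2.decompress(cdata), dtype=arr.dtype)
        assert np.array_equal(restored, arr)
        timings[codec.name] = best_time(compress)
        ratios[codec.name] = arr.nbytes / len(cdata)

    for name, seconds in timings.items():
        speed = arr.nbytes / seconds / 1e9
        extra = f", cratio {ratios[name]:.1f}x" if name in ratios else ""
        print(f"{name:>10}: {speed:6.2f} GB/s{extra}")
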

License
=======


The software is licensed under a 3-Clause BSD license. A copy of the
python-blosc2 license can be found in
`LICENSE.txt <https://github.com/Blosc/python-blosc2/tree/main/LICENSE.txt>`_.


Mailing list
============


Discussion about this module is welcome in the Blosc list:

blosc@googlegroups.com

https://groups.google.es/group/blosc

Please follow `@Blosc2 <https://twitter.com/Blosc2>`_ to stay informed about
the latest developments.


Citing Blosc
============


You can cite our work on the different libraries under the Blosc umbrella as:

.. code-block:: bibtex

    @ONLINE{blosc,
        author = {{Blosc Development Team}},
        title = "{A fast, compressed and persistent data store library}",
        year = {2009-2023},
        note = {https://blosc.org}
    }

Enjoy!