
Product
Introducing Socket Fix for Safe, Automated Dependency Upgrades
Automatically fix and test dependency updates with socket fix—a new CLI tool that turns CVE alerts into safe, automated upgrades.
.. image:: https://img.shields.io/github/actions/workflow/status/soxofaan/dahuffman/lint-and-test.yml :target: https://github.com/soxofaan/dahuffman/actions/workflows/lint-and-test.yml
.. image:: https://img.shields.io/badge/license-MIT-blue.svg :target: https://github.com/soxofaan/dahuffman/blob/master/LICENSE.txt
.. image:: https://img.shields.io/pypi/v/dahuffman :target: https://pypi.org/project/dahuffman
.. image:: https://img.shields.io/pypi/pyversions/dahuffman :target: https://pypi.org/project/dahuffman
.. image:: https://img.shields.io/pypi/wheel/dahuffman :target: https://pypi.org/project/dahuffman :alt: PyPI - Wheel
dahuffman is a pure Python module for Huffman encoding and decoding, commonly used for lossless data compression.
The name of the module refers to the full name of the inventor of the Huffman code tree algorithm: David Albert Huffman (August 9, 1925 – October 7, 1999).
.. code-block:: bash
pip install dahuffman
Basic usage example, where the code table is built based on given symbol frequencies::
>>> from dahuffman import HuffmanCodec
>>> codec = HuffmanCodec.from_frequencies(
... {"e": 100, "n": 20, "x": 1, "i": 40, "q": 3}
... )
>>> codec.print_code_table()
Bits Code Value Symbol
5 00000 0 _EOF
5 00001 1 'x'
4 0001 1 'q'
3 001 1 'n'
2 01 1 'i'
1 1 1 'e'
Encode a string, get the encoded data as bytes
and decode again::
>>> encoded = codec.encode("exeneeeexniqneieini")
>>> encoded
b'\x86|%\x13i@'
>>> len(encoded)
6
>>> codec.decode(encoded)
'exeneeeexniqneieini'
If desired: work with byte values directly:
>>> list(encoded)
[134, 124, 37, 19, 105, 64]
>>> codec.decode([134, 124, 37, 19, 105, 64])
'exeneeeexniqneieini'
You can also "train" the codec by providing it data directly::
>>> codec = HuffmanCodec.from_data(
... "hello world how are you doing today foo bar lorem ipsum"
... )
>>> codec.encode("do lo er ad od")
b'^O\x1a\xc4S\xab\x80'
>>> len(_)
7
Using it with sequences of symbols (country codes in this example)::
>>> countries = ["FR", "UK", "BE", "IT", "FR", "IT", "GR", "FR", "NL", "BE", "DE"]
>>> codec = HuffmanCodec.from_data(countries)
>>> encoded = codec.encode(["FR", "IT", "BE", "FR", "UK"])
>>> encoded
b'L\xca'
>>> len(encoded)
2
>>> codec.decode(encoded)
['FR', 'IT', 'BE', 'FR', 'UK']
Doing it in a streaming fashion (generators)::
>>> import random
>>> def sample(n, symbols):
... for i in range(n):
... if (n-i) % 5 == 1:
... print(i)
... yield random.choice(symbols)
...
>>> codec = HuffmanCodec.from_data(countries)
>>> encoded = codec.encode_streaming(sample(16, countries))
>>> encoded
<generator object encode_streaming at 0x108bd82d0>
>>> decoded = codec.decode_streaming(encoded)
>>> decoded
<generator object decode_streaming at 0x108bd8370>
>>> list(decoded)
0
5
10
15
['DE', 'BE', 'FR', 'GR', 'UK', 'BE', 'UK', 'IT', 'UK', 'FR', 'DE', 'IT', 'NL', 'IT', 'FR', 'UK']
Pre-trained codecs
The ``dahuffman.codecs`` package contains a bunch of pre-trained code tables.
The codecs can be loaded as follows::
>>> from dahuffman import load_shakespeare
>>> codec = load_shakespeare()
>>> codec.print_code_table()
Bits Code Value Symbol
4 0000 0 'n'
4 0001 1 's'
4 0010 2 'h'
5 00110 6 'u'
7 0011100 28 'k'
9 001110100 116 'Y'
14 00111010100000 3744 '0'
...
>>> len(codec.encode('To be, or not to be; that is the question;'))
24
FAQs
Pure Python Huffman encoder and decoder module
We found that dahuffman demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 2 open source maintainers collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Product
Automatically fix and test dependency updates with socket fix—a new CLI tool that turns CVE alerts into safe, automated upgrades.
Security News
CISA denies CVE funding issues amid backlash over a new CVE foundation formed by board members, raising concerns about transparency and program governance.
Product
We’re excited to announce a powerful new capability in Socket: historical data and enhanced analytics.