Huffnpuff
Huffnpuff
implements a general purpose Huffman Coding system that uses a construction string which can implement any system.
This also includes the bitfifo
gem which allows variable length bit sequences to address the internal table of the
Huffnpuff
object instance. Huffman coding is essential for compression techniques such as MPEG, and PKZIP.
Installation
Add this line to your application's Gemfile:
gem 'huffnpuff'
And then execute:
$ bundle
Or install it yourself as:
$ gem install huffnpuff
Usage
Huffman coding systems are widely use on compression engines such as MPEG and PKZIP.
The theory suggests that symbols that repeat can be represented by fewer bits if we sacrifice rare symbols with larger bit sequences.
If you look at the game of Scrabble, some letters have higher points than others because they don't appear as often.
Morse Code uses variable length dits and dahs to represent letters, but is not a valid Huffman system as you need gaps between letters
to separate them. A valid Huffman system allows for a stream of data to be able to deterministically separate each token (group of bits).
Below is an example of a valid system:
0 # 0
1000 # 1
1001 # 2
1010 # 3
1011 # 4
11000 # 5
11001 # 6
11010 # 7
11011 # 8
111 # 9
The above system has an order of 10, meaning it can represent or index 10 items.
This system can be represented by the the string: "0:10xx:110xx:111"
.
The string represents 4 bins where each bin is separated by a colon.
The bin's preamble is defined as the sequence of bits before an 'x'
is encountered.
It is the preamble that makes it possible to separate tokens.
A valid string may only include the following characters: '0', '1', 'x', ':'
.
The character 'x'
may only appear after the preamble; also, '1'
or '0'
may not appear after 'x'
.
The character 'x'
represents two possible bit values of '0'
and '1'
.
Consecutive 'xx'
means there are 4 possible values: '00'
, '01'
, '10'
, '11'
.
Each consecutive 'x'
doubles the possible values.
In order for a system to be valid, a preamble may not overlap another preamble.
For example, the string "0:10xxx:101xx"
is invalid because the bit sequence '10101'
would be found in more than one bin.
Coming up with a system can be tricky. The best way to come up with a system is to create a histogram of your data set,
then sort it in descending order. An evenly distributed histogram means that Huffman encoding will not be effective.
If there are large differences, then Huffman encoding will be very effective.
The Huffnpuff
gem has a utility method that can generate the system string needed to create a system.
This is best seen by example as follows:
require 'huffnpuff'
hist = [200,50,35,27,22,15,13,12,9,8,8,8,8,8,7,6,5,5,4,4,4,3,3,3,3,3,2,2,2,1,1,1,1,1,1,1,1]
tight = Huffnpuff.get_system_string_from_histogram(hist, 0.20)
medium = Huffnpuff.get_system_string_from_histogram(hist, 0.50)
loose = Huffnpuff.get_system_string_from_histogram(hist, 0.75)
The first parameter is our histogram array sorted in descending order.
The second parameter is the grouping factor which ranges from 0.1 to 0.9 inclusively.
The size of the array is equal to the number of elements we wish to represent with a Huffman code.
The first element of the hist
array is the most likely to occur token, and is encoded first at the beginning of our system string.
Note that a system string need not to have its bins numerically sorted.
The algorithm sorts based on the length of bits needed for the bin.
A Huffnpuff
object has an internal lookup table which maps the tokens to an entry in the table.
This is best seen by example as follows:
bits = Bitfifo.new
lut = ['Apple', 'Banana', 'Cherry', 'Dog', 'Elephant', 'Fortune', 'Grape']
huff = Huffnpuff.new(loose, bits, lut)
huff.push 'House'
bits.put 0, 1
bits.put 0b10, 2
bits.put 0b1100, 4
bits.put 0b1101, 4
bits.put 0b1110, 4
bits.put 0b1111110, 7
bits.put 0b1111000, 7
bits.put 0b1111001, 7
bits.size
bits.to_i.to_s(2)
huff.get
huff.get
huff.get
huff.get
huff.get
huff.get
huff.get
huff.get
huff.get
bits.put 0b1100, 4
huff.get
Huffnpuff has the following methods:
Revision History
Version 0.1.1
- fixed some error messages
- fixed systemize!
- more test cases
Development
After checking out the repo, run bin/setup
to install dependencies. Then, run rake spec
to run the tests. You can also run bin/console
for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install
.
Contributing
I need to control this for the time being.
License
The gem is available as open source under the terms of the MIT License.