Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

huffnpuff

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

huffnpuff

  • 0.1.1
  • Rubygems
  • Socket score

Version published
Maintainers
1
Created
Source

Huffnpuff

Huffnpuff implements a general purpose Huffman Coding system that uses a construction string which can implement any system. This also includes the bitfifo gem which allows variable length bit sequences to address the internal table of the Huffnpuff object instance. Huffman coding is essential for compression techniques such as MPEG, and PKZIP.

Installation

Add this line to your application's Gemfile:

gem 'huffnpuff'

And then execute:

$ bundle

Or install it yourself as:

$ gem install huffnpuff

Usage

Huffman coding systems are widely use on compression engines such as MPEG and PKZIP. The theory suggests that symbols that repeat can be represented by fewer bits if we sacrifice rare symbols with larger bit sequences. If you look at the game of Scrabble, some letters have higher points than others because they don't appear as often. Morse Code uses variable length dits and dahs to represent letters, but is not a valid Huffman system as you need gaps between letters to separate them. A valid Huffman system allows for a stream of data to be able to deterministically separate each token (group of bits). Below is an example of a valid system:

0       # 0 
1000    # 1
1001    # 2
1010    # 3
1011    # 4
11000   # 5
11001   # 6
11010   # 7
11011   # 8
111     # 9

The above system has an order of 10, meaning it can represent or index 10 items. This system can be represented by the the string: "0:10xx:110xx:111". The string represents 4 bins where each bin is separated by a colon. The bin's preamble is defined as the sequence of bits before an 'x' is encountered. It is the preamble that makes it possible to separate tokens. A valid string may only include the following characters: '0', '1', 'x', ':'. The character 'x' may only appear after the preamble; also, '1' or '0' may not appear after 'x'. The character 'x' represents two possible bit values of '0' and '1'. Consecutive 'xx' means there are 4 possible values: '00', '01', '10', '11'. Each consecutive 'x' doubles the possible values. In order for a system to be valid, a preamble may not overlap another preamble. For example, the string "0:10xxx:101xx" is invalid because the bit sequence '10101' would be found in more than one bin.

Coming up with a system can be tricky. The best way to come up with a system is to create a histogram of your data set, then sort it in descending order. An evenly distributed histogram means that Huffman encoding will not be effective. If there are large differences, then Huffman encoding will be very effective. The Huffnpuff gem has a utility method that can generate the system string needed to create a system. This is best seen by example as follows:

require 'huffnpuff'

hist = [200,50,35,27,22,15,13,12,9,8,8,8,8,8,7,6,5,5,4,4,4,3,3,3,3,3,2,2,2,1,1,1,1,1,1,1,1]
tight  = Huffnpuff.get_system_string_from_histogram(hist, 0.20)
medium = Huffnpuff.get_system_string_from_histogram(hist, 0.50) 
loose  = Huffnpuff.get_system_string_from_histogram(hist, 0.75)

# tight  == "0x:10xxxx:1110xx:110xxxx"
# medium == "0:10xx:11110:110xx:1110xxxx:111110xx:1111110xxx"
# loose  == "0:10:110x:1110:1111110:11110xx:111110xxx:11111110xx:111111110xx:1111111110xx:11111111110xxx"

The first parameter is our histogram array sorted in descending order. The second parameter is the grouping factor which ranges from 0.1 to 0.9 inclusively. The size of the array is equal to the number of elements we wish to represent with a Huffman code. The first element of the hist array is the most likely to occur token, and is encoded first at the beginning of our system string. Note that a system string need not to have its bins numerically sorted. The algorithm sorts based on the length of bits needed for the bin. A Huffnpuff object has an internal lookup table which maps the tokens to an entry in the table. This is best seen by example as follows:

bits = Bitfifo.new
lut  = ['Apple', 'Banana', 'Cherry', 'Dog', 'Elephant', 'Fortune', 'Grape']
huff = Huffnpuff.new(loose, bits, lut)
huff.push 'House'  # or lut.push 'House' ... same thing

# prime the pump
bits.put 0, 1          # bin 0
bits.put 0b10, 2       # bin 1
bits.put 0b1100, 4     # bin 2 x==0
bits.put 0b1101, 4     # bin 2 x==1
bits.put 0b1110, 4     # bin 3
bits.put 0b1111110, 7  # bin 4
bits.put 0b1111000, 7  # bin 5 xx==00
bits.put 0b1111001, 7  # bin 5 xx==01

bits.size   # 36  ... number of bits in bitfifo

# binary representation of bits in fifo
bits.to_i.to_s(2)   # "10110011011110111111011110001111001"

# now get the data
huff.get   #  'Apple'
huff.get   #  'Banana'
huff.get   #  'Cherry'
huff.get   #  'Dog'
huff.get   #  'Elephant'
huff.get   #  'Fortune'
huff.get   #  'Grape'
huff.get   #  'House'
huff.get   # nil, >>> no address found in Bitfifo 'bits'
bits.put 0b1100, 4  # random access
huff.get   #  'Cherry' 

Huffnpuff has the following methods:

# class methods: 
# Huffnpuff.histogram_array_valid?(ary)  ... ensures histogram descending integers
# Huffnpuff.validate_system_string(str)  ... checks for syntax errors on system string
# Huffnpuff.get_system_string_from_histogram(ary, rat = 0.61) ... creates system string  0.1 <= rat <= 0.9
# Huffnpuff.new(sys_string, bitfifo, array=[])   ... creates new object

# class constants:
# Huffnpuff::CMP                         ... proc used to sort bins

# instance methods:
# huff.min    ... returns minimum number of bits needed to access data
# huff.max    ... returns maximum number of bits needed to access data
# huff.size   ... returns maximum number of addresses that may be used to look up data
# huff.sys    ... returns system string
# huff.systemize!(sys)  ... recreates system from system string, or replace existing system
# huff.push   ... adds data to internal lookup table, raises error if no room is available
# huff.push_each        ... adds elements of array or characters of a string to lookup table
# huff.get    ... returns data indexed by external bitfifo

Revision History

Version 0.1.1

  • fixed some error messages
  • fixed systemize!
  • more test cases

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install.

Contributing

I need to control this for the time being.

License

The gem is available as open source under the terms of the MIT License.

FAQs

Package last updated on 25 Jan 2018

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc