Python package to convert numerical series & numpy arrays into compressed strings
Simple way to compress and decompress numerical series & numpy arrays.
The compression algorithm is based on Google's Encoded Polyline Format. I modified it to preserve arbitrary precision and to work on any numerical series. The work was motivated by the usefulness of the time-aware polyline built by Arjun Attam at HyperTrack. After building this, I came across arrays, which are much more memory-efficient than lists. Consider using them instead of numcompress if you don't need to convert the data to a string for transmission or storage.
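To make the idea concrete, here is a minimal sketch of polyline-style encoding applied to a one-dimensional series. It is not numcompress's actual source: `encode_series` and `decode_series` are hypothetical names, and the sketch leaves out any header the library adds to its output (the strings in the examples in this README start with one extra character that appears to record the precision).

```python
def _encode_int(n):
    """Encode one signed integer as printable ASCII, 5 bits per character."""
    # Shift left by one bit; negative values are inverted so the sign lands in bit 0.
    n = ~(n << 1) if n < 0 else (n << 1)
    chars = []
    while n >= 0x20:
        # 0x20 marks "more chunks follow"; +63 moves the value into printable ASCII.
        chars.append(chr((0x20 | (n & 0x1F)) + 63))
        n >>= 5
    chars.append(chr(n + 63))
    return "".join(chars)


def encode_series(series, precision=3):
    """Scale to integers, then encode the difference between consecutive values."""
    scale = 10 ** precision
    out, prev = [], 0
    for value in series:
        scaled = round(value * scale)
        out.append(_encode_int(scaled - prev))
        prev = scaled
    return "".join(out)


def decode_series(text, precision=3):
    """Inverse of encode_series (assumes the same precision is supplied)."""
    scale = 10 ** precision
    values, prev, i = [], 0, 0
    while i < len(text):
        result = shift = 0
        while True:
            chunk = ord(text[i]) - 63
            i += 1
            result |= (chunk & 0x1F) << shift
            shift += 5
            if chunk < 0x20:  # no continuation bit: last chunk of this number
                break
        delta = ~(result >> 1) if result & 1 else (result >> 1)
        prev += delta
        values.append(prev / scale)
    return values


encoded = encode_series([145.7834, 127.5989, 135.2569], precision=2)
print(encoded)                              # cn[rpB{n@
print(decode_series(encoded, precision=2))  # [145.78, 127.6, 135.26]
```

Because only the differences between neighbouring values are encoded, small steps cost a single character each, which is where most of the savings come from.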
pip install numcompress
from numcompress import compress, decompress
# Integers
>>> compress([14578, 12759, 13525])
'B_twxZnv_nB_bwm@'
>>> decompress('B_twxZnv_nB_bwm@')
[14578.0, 12759.0, 13525.0]
# Floats - lossless compression
# the precision argument specifies how many decimal places to preserve; it defaults to 3
>>> compress([145.7834, 127.5989, 135.2569], precision=4)
'Csi~wAhdbJgqtC'
>>> decompress('Csi~wAhdbJgqtC')
[145.7834, 127.5989, 135.2569]
# Floats - lossy compression
>>> compress([145.7834, 127.5989, 135.2569], precision=2)
'Acn[rpB{n@'
>>> decompress('Acn[rpB{n@')
[145.78, 127.6, 135.26]
# compressing and decompressing numpy arrays
>>> from numcompress import compress_ndarray, decompress_ndarray
>>> import numpy as np
>>> series = np.random.randint(1, 100, 25).reshape(5, 5)
>>> compressed_series = compress_ndarray(series)
>>> decompressed_series = decompress_ndarray(compressed_series)
>>> series
array([[29, 95, 10, 48, 20],
       [60, 98, 73, 96, 71],
       [95, 59,  8,  6, 17],
       [ 5, 12, 69, 65, 52],
       [84,  6, 83, 20, 50]])
>>> compressed_series
'5*5,Bosw@_|_Cn_eD_fiA~tu@_cmA_fiAnyo@o|k@nyo@_{m@~heAnrbB~{BonT~lVotLoinB~xFnkX_o}@~iwCokuCn`zB_ry@'
>>> decompressed_series
array([[29., 95., 10., 48., 20.],
       [60., 98., 73., 96., 71.],
       [95., 59.,  8.,  6., 17.],
       [ 5., 12., 69., 65., 52.],
       [84.,  6., 83., 20., 50.]])
>>> (series == decompressed_series).all()
True
| Test | # of Numbers | Compression ratio |
|---|---|---|
| Integers | 10k | 91.14% |
| Floats | 10k | 81.35% |
Run the test suite with the -s switch to see the compression ratio. You can also modify the tests to see what compression ratio you get for your own input.
pytest -s
Here's a quick example showing compression ratio:
>>> import random, sys
>>> from numcompress import compress
>>> series = random.sample(range(1, 100000), 50000) # generate 50k random numbers between 1 and 100k
>>> text = compress(series) # apply compression
>>> original_size = sum(sys.getsizeof(i) for i in series)
>>> original_size
1200000
>>> compressed_size = sys.getsizeof(text)
>>> compressed_size
284092
>>> compression_ratio = ((original_size - compressed_size) * 100.0) / original_size
>>> compression_ratio
76.32566666666666
We get ~76% compression for 50k random numbers between 1 and 100k. The ratio is higher for real-world numerical series, because the differences between consecutive numbers tend to be smaller. Think of stock prices, monitoring data, and other time series.
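The effect of small differences can be illustrated with a short sketch. It assumes a polyline-style scheme in which each output character carries 5 bits of a delta; `chars_for_delta` and `encoded_length` are hypothetical helpers written for this illustration, not part of numcompress, and they ignore any header the library adds.

```python
def chars_for_delta(delta):
    """Characters needed for one signed integer delta, at 5 bits per character."""
    # Same sign handling as polyline encoding: shift left, invert if negative.
    n = ~(delta << 1) if delta < 0 else (delta << 1)
    count = 1
    while n >= 0x20:
        n >>= 5
        count += 1
    return count


def encoded_length(series):
    """Total characters for an integer series encoded as consecutive differences."""
    total, prev = 0, 0
    for value in series:
        total += chars_for_delta(value - prev)
        prev = value
    return total


smooth = [10000, 10003, 10001, 10007, 10004]  # small steps, like a slow-moving price
jumpy = [10000, 73412, 2951, 88020, 41277]    # large jumps, like random samples

print(encoded_length(smooth))  # 7
print(encoded_length(jumpy))   # 19
```

Both series have five values of similar magnitude, but the smooth one needs far fewer characters because every step after the first fits in a single character.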
If you see any problem, open an issue or send a pull request. You can write to me at amit.juschill@gmail.com