
gorillacompression
This is an implementation (with some adaptations) of the compression algorithm described in section 4.1 (Time series compression) of [1].
Gorilla compression is lossless.
This library can be used in three ways: to compress timestamps only, values only, or (timestamp, value) pairs.
In all three cases, the result of the encoding process is a dict with everything necessary for decoding (see Usage for examples). If you want to use this library for compressed message exchanges, you can serialize the result of the encoding process as you like (JSON, protobuf, etc.)
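As one possible way to exchange the result as JSON, the sketch below base64-encodes the bytes stored under 'encoded' before dumping the dict, since JSON has no bytes type. The helper names to_json and from_json are illustrative and not part of gorillacompression; the sample dict is the timestamps result shown in the usage examples.

```python
import base64
import json

# Sample encoder output, copied from the timestamps usage example.
content = {'encoded': b'\xc2\x17\xa4K\x08\xa1Q@', 'nb_timestamps': 4}


def to_json(content):
    """Serialize an encoder result to a JSON string (hypothetical helper)."""
    payload = dict(content)
    # JSON cannot carry raw bytes, so store them as base64 text.
    payload['encoded'] = base64.b64encode(content['encoded']).decode('ascii')
    return json.dumps(payload)


def from_json(text):
    """Rebuild the dict expected by the decoders (hypothetical helper)."""
    payload = json.loads(text)
    payload['encoded'] = base64.b64decode(payload['encoded'])
    return payload


message = to_json(content)
restored = from_json(message)
```

The restored dict has the same keys and bytes as the original, so it should be usable wherever the original result was.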
This implementation is based on section 4.1 of [1] and on Facebook's open source implementation [2] (the two differ in some respects).
To install the latest release:
$ pip install gorillacompression
You can also build a local package and install it:
$ make build
$ pip install dist/*.whl
Import the gorillacompression module.
>>> import gorillacompression as gc
Data to encode.
>>> timestamps = [1628164645, 1628164649, 1628164656, 1628164669]
>>> values = [18.95, 18.91, 17.01, 14.05]
>>> pairs = list(zip(timestamps, values))
>>> pairs
[(1628164645, 18.95), (1628164649, 18.91), (1628164656, 17.01), (1628164669, 14.05)]
In the three scenarios of compression (timestamps, values, pairs), you can use:
encode_all to encode all elements, or encode_next to encode element by element.
decode_all to decode everything.
encode_next returns True if the element was encoded correctly; otherwise it returns False and emits a warning explaining the reason.
The expected input timestamp is a POSIX timestamp less than 2147483647 (January 19, 2038, 03:14:07 UTC). The delta between two successive timestamps must be greater than or equal to 0.
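The two constraints above can be checked before encoding. The following is a minimal sketch, not part of the library: check_timestamps is a hypothetical helper that only tests the two stated rules (upper bound and non-negative deltas) and reports the offending positions.

```python
MAX_TIMESTAMP = 2147483647  # 2**31 - 1, the bound stated above


def check_timestamps(timestamps):
    """Return (index, reason) tuples for inputs that break the stated rules.

    Hypothetical pre-check; the library itself may report problems differently.
    """
    problems = []
    for i, ts in enumerate(timestamps):
        if ts >= MAX_TIMESTAMP:
            problems.append((i, "not less than 2147483647"))
        # Each timestamp is compared with its immediate predecessor.
        if i > 0 and ts - timestamps[i - 1] < 0:
            problems.append((i, "negative delta"))
    return problems


ok = check_timestamps([1628164645, 1628164649, 1628164656, 1628164669])
bad = check_timestamps([10, 5])
```

An empty list means the sequence satisfies both rules; repeated timestamps (delta of 0) are accepted.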
You can use encode_all to encode all timestamps:
>>> content = gc.TimestampsEncoder.encode_all(timestamps)
>>> content
{'encoded': b'\xc2\x17\xa4K\x08\xa1Q@', 'nb_timestamps': 4}
>>> gc.TimestampsDecoder.decode_all(content)
[1628164645, 1628164649, 1628164656, 1628164669]
Or you can use encode_next to encode one by one:
>>> ts_encoder = gc.TimestampsEncoder()
>>> for ts in timestamps:
...     ts_encoder.encode_next(ts)
>>> content = ts_encoder.get_encoded()
>>> content
{'encoded': b'\xc2\x17\xa4K\x08\xa1Q@', 'nb_timestamps': 4}
>>> gc.TimestampsDecoder.decode_all(content)
[1628164645, 1628164649, 1628164656, 1628164669]
You can use encode_all to encode all values:
>>> content = gc.ValuesEncoder.encode_all(values)
>>> content
{'encoded': b'@2\xf333333\xe7f\xf1\xbco\x1b\xc6\xee\xc7\xeaz\x9e\xa7\xa9\xeb\xaf^\x8d\x8bb\xd8\xb6,\x80', 'nb_values': 4, 'float_format': 'f64'}
>>> gc.ValuesDecoder.decode_all(content)
[18.95, 18.91, 17.01, 14.05]
Or you can use encode_next to encode one by one:
>>> values_encoder = gc.ValuesEncoder()
>>> for v in values:
...     values_encoder.encode_next(v)
>>> content = values_encoder.get_encoded()
>>> content
{'encoded': b'@2\xf333333\xe7f\xf1\xbco\x1b\xc6\xee\xc7\xeaz\x9e\xa7\xa9\xeb\xaf^\x8d\x8bb\xd8\xb6,\x80', 'nb_values': 4, 'float_format': 'f64'}
>>> gc.ValuesDecoder.decode_all(content)
[18.95, 18.91, 17.01, 14.05]
You can use encode_all to encode all pairs:
>>> content = gc.PairsEncoder.encode_all(pairs)
>>> content
{'encoded': b'\xc2\x17\xa4J\x80e\xe6ffffg\x08\xe7f\xf1\xbco\x1b\xc6\xd0\xb7c\xf5=OS\xd4\xf5\xa2\xeb\xd7\xa3b\xd8\xb6-\x8b ', 'nb_pairs': 4, 'float_format': 'f64'}
>>> gc.PairsDecoder.decode_all(content)
[(1628164645, 18.95), (1628164649, 18.91), (1628164656, 17.01), (1628164669, 14.05)]
Or you can use encode_next to encode one by one:
>>> pairs_encoder = gc.PairsEncoder()
>>> for (ts, v) in pairs:
...     pairs_encoder.encode_next(ts, v)
>>> content = pairs_encoder.get_encoded()
>>> content
{'encoded': b'\xc2\x17\xa4J\x80e\xe6ffffg\x08\xe7f\xf1\xbco\x1b\xc6\xd0\xb7c\xf5=OS\xd4\xf5\xa2\xeb\xd7\xa3b\xd8\xb6-\x8b ', 'nb_pairs': 4, 'float_format': 'f64'}
>>> gc.PairsDecoder.decode_all(content)
[(1628164645, 18.95), (1628164649, 18.91), (1628164656, 17.01), (1628164669, 14.05)]
Below is a brief explanation of the implemented method (refer to section 4.1 of [1] for the original explanation).
(a) Calculate the delta of delta
D = (t_n − t_(n−1)) − (t_(n−1) − t_(n−2))
(b) If D is zero, then store a single ‘0’ bit
(c) If D is in [-63, 64], store ‘10’ followed by the value (7 bits)
(d) If D is in [-255, 256], store ‘110’ followed by the value (9 bits)
(e) If D is in [-2047, 2048], store ‘1110’ followed by the value (12 bits)
(f) Otherwise store ‘1111’ followed by D using 32 bits
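Rules (a) to (f) can be sketched as follows. This is only an illustration of the bucket selection: delta_of_deltas and control_bits are hypothetical names, and the sketch deliberately does not show how the library writes the first timestamps or lays out the value bits.

```python
def delta_of_deltas(timestamps):
    """Compute D = (t_n - t_(n-1)) - (t_(n-1) - t_(n-2)) for n >= 2."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [d2 - d1 for d1, d2 in zip(deltas, deltas[1:])]


def control_bits(d):
    """Return (control prefix, number of value bits) for a delta of delta."""
    if d == 0:
        return ("0", 0)       # rule (b): a single '0' bit, no value
    if -63 <= d <= 64:
        return ("10", 7)      # rule (c)
    if -255 <= d <= 256:
        return ("110", 9)     # rule (d)
    if -2047 <= d <= 2048:
        return ("1110", 12)   # rule (e)
    return ("1111", 32)       # rule (f)


timestamps = [1628164645, 1628164649, 1628164656, 1628164669]
dods = delta_of_deltas(timestamps)           # deltas are 4, 7, 13
buckets = [control_bits(d) for d in dods]
```

For the sample timestamps, both deltas of deltas are small, so each would fall in the ‘10’ bucket with a 7-bit value.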
Notation
n bits:
+---- n ----+
|           |
+---- n' ---+
n bytes:
+==== n ====+
|           |
+==== n' ===+
`~` in place of `n` means a variable number of bytes or bits.
When it makes sense, n refers to the default value, and n' to the variable containing the value.
This explanation covers the f64 float format; for the other formats (f32, f16), some field sizes differ (refer to the code for details).
+======================= 8 =======================+
|  First value (IEEE 754, binary64, Big Endian)   |
+======================= 8 =======================+
+-- 1 --+
|   0   |
+-- 1 --+
+--- 2 ---+--- length of the meaningful XORed value ---+
|   10    |          [meaningful XORed value]          |
+--- 2 ---+--- length of the meaningful XORed value ---+
+--- 2 ---+------------- 5 -------------+------------------- 6 ------------------+--- length of the meaningful XORed value ---+
|   11    |   number of leading zeros   |  length of the meaningful XORed value  |          [meaningful XORed value]          |
+--- 2 ---+------------- 5 -------------+------------------- 6 ------------------+--- length of the meaningful XORed value ---+
+---- n ----+
|   0...0   |
+---- n ----+
n < 8
(*) The terms "meaningful bits" and "meaningful XORed value" used in the original paper may be confusing.
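The fields in the diagrams above can be illustrated with the sketch below. It assumes the f64 layout and the usual Gorilla definitions (XOR of the current value with the previous one, then counts of leading and trailing zero bits). The helper names are hypothetical and this is not the library's code; in particular, it does not choose between the '10' and '11' cases, which depends on the window used for the previous value.

```python
import struct


def float_to_bits(x):
    """Reinterpret a float as its 64-bit big-endian IEEE 754 integer pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_fields(prev, curr):
    """Return (leading zeros, meaningful length, meaningful bits), or None.

    None corresponds to the single '0' bit case (identical values).
    """
    xor = float_to_bits(prev) ^ float_to_bits(curr)
    if xor == 0:
        return None
    leading = 64 - xor.bit_length()
    trailing = (xor & -xor).bit_length() - 1  # index of the lowest set bit
    length = 64 - leading - trailing
    return leading, length, xor >> trailing


# The first value is stored as 8 raw big-endian bytes.
first = struct.pack(">d", 18.95)
fields = xor_fields(1.0, 2.0)
```

Here first matches the first 8 bytes of the encoded values shown in the usage examples, and 1.0 versus 2.0 differ only in their exponent bits, giving 1 leading zero and an 11-bit meaningful value.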
The encoding of a pair is the encoding of the timestamp followed by the encoding of the value.
Please open issues. PRs are very welcome!
$ git clone https://github.com/ghilesmeddour/gorilla-time-series-compression.git
$ cd gorilla-time-series-compression
$ make test
$ make cov
[1] Pelkonen, T., Franklin, S., Teller, J., Cavallaro, P., Huang, Q., Meza, J., & Veeraraghavan, K. (2015). Gorilla: A fast, scalable, in-memory time series database. Proceedings of the VLDB Endowment, 8(12), 1816-1827.