==========
python-hll
==========

.. image:: https://img.shields.io/pypi/v/python_hll.svg
    :target: https://pypi.python.org/pypi/python_hll

.. image:: https://readthedocs.org/projects/python-hll/badge/?version=latest
    :target: https://python-hll.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. image:: https://img.shields.io/badge/github-python--hll-yellow
    :target: https://github.com/AdRoll/python-hll

A Python implementation of `HyperLogLog <http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf>`_ whose goal is to be `storage compatible <https://github.com/aggregateknowledge/hll-storage-spec>`_ with `java-hll <https://github.com/aggregateknowledge/java-hll>`_, `js-hll <https://github.com/aggregateknowledge/js-hll>`_ and `postgresql-hll <https://github.com/citusdata/postgresql-hll>`_.

NOTE: This is a fairly literal translation/port of `java-hll <https://github.com/aggregateknowledge/java-hll>`_ to Python. Internally, bytes are represented as Java-style signed bytes (-128 to 127) rather than Python-style unsigned bytes (0 to 255). This implementation is also quite slow: for example, the Java ``HLLSerializationTest`` takes 12 seconds to run, while the Python ``test_hll_serialization`` takes 1.5 hours (roughly 450x slower).
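
If you need to move byte data between the two conventions, the rule is simply that a signed byte ``b`` corresponds to the unsigned value ``b & 0xFF``. The helpers below are a minimal stdlib sketch of that conversion; ``to_java_bytes`` and ``from_java_bytes`` are hypothetical names, not part of python-hll's API.

```python
from typing import Iterable, List


def to_java_bytes(data: bytes) -> List[int]:
    """Map unsigned Python bytes (0..255) to Java-style signed bytes (-128..127)."""
    return [b - 256 if b > 127 else b for b in data]


def from_java_bytes(signed: Iterable[int]) -> bytes:
    """Map Java-style signed bytes (-128..127) back to unsigned Python bytes."""
    # Masking with 0xFF turns -1 into 255 and -128 into 128, leaving 0..127 unchanged.
    return bytes(b & 0xFF for b in signed)


raw = bytes([0x12, 0x8D, 0x7F])
print(to_java_bytes(raw))  # [18, -115, 127]
```

The round trip is lossless, so ``from_java_bytes(to_java_bytes(data)) == data`` for any ``bytes`` value.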

Overview
--------

See `java-hll <https://github.com/aggregateknowledge/java-hll>`_ for an overview of what HLLs are and how they work.
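
For rough intuition only, here is a toy, pure-Python HyperLogLog. It is not python-hll's code and is not storage compatible with anything. It follows the scheme java-hll describes: the low ``log2m`` bits of a 64-bit hash pick a register, each register keeps the largest "rank" (1 + the number of trailing zero bits in the remaining hash bits) seen so far, and the estimate is a scaled harmonic mean of ``2**-register``. BLAKE2b from ``hashlib`` stands in for a real hash function (an assumption made to avoid third-party dependencies), ``ToyHLL`` and ``hash64`` are hypothetical names, and the large-range correction is omitted.

```python
import hashlib
import math


def hash64(value: str) -> int:
    """Unsigned 64-bit hash of a string (BLAKE2b is a stand-in, not what python-hll uses)."""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyHLL:
    """Toy dense HyperLogLog; illustrative only, not python-hll's implementation."""

    def __init__(self, log2m: int = 11, regwidth: int = 5) -> None:
        self.log2m = log2m
        self.m = 1 << log2m                      # number of registers
        self.max_rank = (1 << regwidth) - 1      # largest value a register can hold
        self.registers = [0] * self.m

    def add_raw(self, raw: int) -> None:
        raw &= (1 << 64) - 1                     # treat the input as unsigned 64-bit
        index = raw & (self.m - 1)               # low log2m bits select the register
        rest = raw >> self.log2m                 # remaining bits determine the rank
        if rest == 0:
            return                               # no set bit, nothing to record
        rank = (rest & -rest).bit_length()       # 1 + number of trailing zero bits
        self.registers[index] = max(self.registers[index], min(rank, self.max_rank))

    def cardinality(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # bias constant, valid for m >= 128
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            return m * math.log(m / zeros)       # linear counting for small cardinalities
        return estimate


hll = ToyHLL()
for i in range(10_000):
    hll.add_raw(hash64(f"user-{i}"))
print(round(hll.cardinality()))
```

With ``log2m=11`` the standard error is about ``1.04 / sqrt(2048)``, roughly 2.3%, so the printed value should land near 10000. Adding the same value again never changes a register, which is why duplicates do not inflate the estimate.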

Usage
-----

Hashing and adding a value to a new HLL::

    from python_hll.hll import HLL
    import mmh3

    value_to_hash = 'foo'
    hashed_value = mmh3.hash(value_to_hash)  # signed 32-bit MurmurHash3 value

    hll = HLL(13, 5)  # log2m=13, regwidth=5
    hll.add_raw(hashed_value)

Retrieving the cardinality of an HLL::

    cardinality = hll.cardinality()
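
``cardinality()`` returns an estimate, not an exact count. As a back-of-the-envelope guide taken from the HyperLogLog paper (not a guarantee documented by python-hll), the relative standard error of the probabilistic estimate is about ``1.04 / sqrt(2**log2m)``, and a dense HLL needs about ``2**log2m * regwidth`` bits of register storage. The helper names below are hypothetical.

```python
import math


def relative_standard_error(log2m: int) -> float:
    """Approximate relative standard error of the HyperLogLog estimate: 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(2 ** log2m)


def dense_register_bytes(log2m: int, regwidth: int) -> int:
    """Bytes needed to pack 2**log2m registers of regwidth bits each (header excluded)."""
    return math.ceil((2 ** log2m) * regwidth / 8)


# The parameters used in the examples above: log2m=13, regwidth=5.
print(f"{relative_standard_error(13):.4f}")  # 0.0115
print(dense_register_bytes(13, 5))           # 5120
```

Each increment of ``log2m`` doubles the register storage and shrinks the error by a factor of about ``sqrt(2)``.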

Unioning two HLLs together (and retrieving the resulting cardinality)::

    hll1 = HLL(13, 5)  # log2m=13, regwidth=5
    hll2 = HLL(13, 5)  # log2m=13, regwidth=5

    # ... (add values to both sets) ...

    hll1.union(hll2)  # modifies hll1 to contain the union
    cardinality_union = hll1.cardinality()
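
Conceptually, unioning two probabilistic HLLs built with the same ``log2m`` and ``regwidth`` keeps the larger value of each pair of corresponding registers, which is why both HLLs should use the same parameters. The sketch below shows only that register-wise step over plain lists; ``union_registers`` is a hypothetical helper, and python-hll's ``union`` additionally has to deal with the explicit and sparse storage types, which this sketch ignores.

```python
from typing import List, Sequence


def union_registers(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Register-wise maximum of two dense HLLs built with identical parameters."""
    if len(a) != len(b):
        raise ValueError("HLLs must have the same number of registers (same log2m)")
    return [max(x, y) for x, y in zip(a, b)]


print(union_registers([0, 3, 1, 0], [2, 1, 1, 0]))  # [2, 3, 1, 0]
```

Because ``max`` is commutative, associative and idempotent, the order in which HLLs are unioned does not matter, and unioning an HLL with itself leaves it unchanged.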

Reading an HLL from the hex representation defined by the `storage specification, v1.0.0 <https://github.com/aggregateknowledge/hll-storage-spec/blob/v1.0.0/STORAGE.md>`_ (for example, a value retrieved from a `PostgreSQL database <https://github.com/aggregateknowledge/postgresql-hll>`_)::

    from python_hll.hll import HLL
    from python_hll.util import NumberUtil

    hex_input = '\\x128D7FFFFFFFFFF6A5C420'  # hex text with a leading '\x'
    hex_string = hex_input[2:]  # strip the '\x' prefix
    hll = HLL.from_bytes(NumberUtil.from_hex(hex_string, 0, len(hex_string)))
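
To inspect such a value without python-hll, its header can be decoded by hand. The sketch below follows our reading of the v1.0.0 storage specification, so treat the bit layout as an assumption to check against the spec: the first byte holds the schema version (high nibble) and storage type (low nibble: 1 = EMPTY, 2 = EXPLICIT, 3 = SPARSE, 4 = FULL); the second byte holds ``regwidth - 1`` in its top 3 bits and ``log2m`` in its low 5 bits; the third byte holds a sparse-enabled flag and a 6-bit explicit cutoff. For the EXPLICIT type, the payload is assumed to be consecutive 8-byte big-endian signed integers. ``parse_hll_hex`` is a hypothetical name.

```python
import struct

# Storage types from the v1.0.0 spec (0 is "undefined").
TYPE_NAMES = {1: "EMPTY", 2: "EXPLICIT", 3: "SPARSE", 4: "FULL"}


def parse_hll_hex(text: str) -> dict:
    """Decode the header (and any EXPLICIT values) of a storage-spec v1.0.0 hex string."""
    hex_digits = text[2:] if text.startswith("\\x") else text  # drop the '\x' prefix
    data = bytes.fromhex(hex_digits)
    info = {
        "version": data[0] >> 4,                               # high nibble: schema version
        "type": TYPE_NAMES.get(data[0] & 0x0F, "UNDEFINED"),   # low nibble: storage type
        "regwidth": (data[1] >> 5) + 1,                        # top 3 bits store regwidth - 1
        "log2m": data[1] & 0x1F,                               # low 5 bits store log2m
        "sparse_enabled": bool(data[2] & 0x40),                # bit 6 of the cutoff byte
        "explicit_cutoff": data[2] & 0x3F,                     # low 6 bits; 63 means "auto"
    }
    if info["type"] == "EXPLICIT":
        # Assumed layout: consecutive 8-byte big-endian signed integers.
        info["values"] = [v for (v,) in struct.iter_unpack(">q", data[3:])]
    return info


print(parse_hll_hex("\\x128D7FFFFFFFFFF6A5C420"))
```

Under these assumptions, the example string decodes to an EXPLICIT HLL with ``log2m=13`` and ``regwidth=5`` (matching ``HLL(13, 5)`` above) holding a single value, ``-156908512``, which appears to be the sign-extended 32-bit ``mmh3.hash('foo')`` result from the first example.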

Writing an HLL to the hex representation defined by the `storage specification, v1.0.0 <https://github.com/aggregateknowledge/hll-storage-spec/blob/v1.0.0/STORAGE.md>`_ (for example, to insert it into a `PostgreSQL database <https://github.com/aggregateknowledge/postgresql-hll>`_)::

    hll_bytes = hll.to_bytes()
    output = "\\x" + NumberUtil.to_hex(hll_bytes, 0, len(hll_bytes))
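
Going the other way, an EXPLICIT-type value can be assembled by hand from the same assumed v1.0.0 layout: a version/type byte, a parameter byte, a cutoff byte, then the raw values as 8-byte big-endian signed integers, which we assume are written sorted and de-duplicated. ``build_explicit_hex`` is a hypothetical helper, not python-hll's serializer; it does not handle the EMPTY type an empty set would normally use, and it emits uppercase hex.

```python
import struct
from typing import Iterable


def build_explicit_hex(values: Iterable[int], log2m: int = 13, regwidth: int = 5,
                       sparse_enabled: bool = True, explicit_cutoff: int = 63) -> str:
    """Serialize raw 64-bit values as an EXPLICIT-type storage-spec v1.0.0 hex string."""
    version_and_type = (1 << 4) | 2                    # schema version 1, type EXPLICIT
    parameters = ((regwidth - 1) << 5) | log2m         # 3 bits regwidth-1, 5 bits log2m
    cutoff = (0x40 if sparse_enabled else 0) | explicit_cutoff
    header = bytes([version_and_type, parameters, cutoff])
    # Values are written as sorted, de-duplicated 8-byte big-endian signed integers.
    payload = b"".join(struct.pack(">q", v) for v in sorted(set(values)))
    return "\\x" + (header + payload).hex().upper()


print(build_explicit_hex([-156908512]))
```

Under these assumptions, the single value ``-156908512`` reproduces the ``\x128D7FFFFFFFFFF6A5C420`` string used in the reading example.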

Also see the `API documentation <https://python-hll.readthedocs.io/en/latest/docs/python_hll.html>`_.

Development
-----------

See `Contributing <https://python-hll.readthedocs.io/en/latest/contributing.html>`_ for how to get started building, testing, and deploying the code.

=======
History
=======

0.0.0 (2019-06-14)
------------------

* Submitted to AdRoll HackWeek.

0.1.0 (2019-09-12)
------------------

* First release on PyPI.

0.1.1 (2019-09-12)
------------------

* Add missing ``install_requires``: numpy

0.1.2 (2019-12-12)
------------------

0.1.3 (2021-01-22)
------------------
