
Security News
ESLint Adds Official Support for Linting HTML
ESLint now supports HTML linting with 48 new rules, expanding its language plugin system to cover more of the modern web development stack.
Sorted containers with state-of-the-art query performance and compressed memory usage
PyGM is a Python library that enables fast query operations on sorted lists of numbers (like integers and floats) with a tiny memory overhead. Internally, it features the PGM-index, a state-of-the-art learned data structure that robustly scales to billions of elements in just a few tens of megabytes.
Install with pip:
pip install pygm
PyGM supports both standard and other useful list and set operations:
>>> from pygm import SortedList, SortedSet
>>> sl = SortedList([0, 1, 34, 144, 1, 55, 233, 2, 3, 21, 89, 5, 8, 13])
>>> sl
SortedList([0, 1, 1, ..., 144, 233])
>>> sl.find_gt(9) # smallest element > 9
13
>>> sl.count(1) # frequency of 1
2
>>> 42 in sl # membership test
False
>>> list(sl.range(0, 21, inclusive=(False, True))) # elements 0 < x <= 21
[1, 1, 2, 3, 5, 8, 13, 21]
>>> sl[2:10:3] # slicing syntax support
SortedList([1, 5, 21])
>>> (sl + [-3, -2, -1]).rank(0) # number of elements <= 0
4
>>> ss = SortedSet([1, 2, 3, 4]) ^ {3, 4, 5} # set symmetric difference
>>> ss.find_lt(5)
2
The full documentation is available online and in the Python interpreter via the help()
built-in function.
PyGM is compatible with Python 3.3+. The easiest way to install it is through PyPI:
pip install pygm
Otherwise, you can clone the repo, build it from source and install it as follows:
if [[ "$(uname)" == "Darwin" ]]; then brew install libomp; fi
git clone https://github.com/gvinciguerra/PyGM.git
cd PyGM
git submodule update --init --recursive
pip install .
Remember to leave the source directory PyGM/
and its parent before running Python.
Here are some plots that compare the performance of PyGM with two popular libraries, sortedcontainers and blist, on synthetic data.
All the graphs are log-log and show, for increasing data sizes, the average time it takes to perform each operation (lower is better). In particular, the __init__
plot shows the construction time, __contains__
measures membership queries, and __getitem__
shows the cost of accessing an element given its position.
The interesting operations on sorted lists are: (i) index
, which returns the position of the first occurrence of a given element; (ii) count
, which returns the number of occurrences of a given element; (iii) bisect_left
, which returns the insertion point for a given value in the list to maintain the sorted order (and is used to implement find_[ge|gt|le|lt]
).
You can run and plot the experiments on your computer and your data with the notebook in tests/benchmark.ipynb.
This project is licensed under the terms of the Apache License 2.0.
If you use the library in an academic setting, please cite the following paper:
Paolo Ferragina and Giorgio Vinciguerra. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. PVLDB, 13(8): 1162-1175, 2020.
@article{Ferragina:2020pgm,
Author = {Paolo Ferragina and Giorgio Vinciguerra},
Title = {The {PGM-index}: a fully-dynamic compressed learned index with provable worst-case bounds},
Year = {2020},
Volume = {13},
Number = {8},
Pages = {1162--1175},
Doi = {10.14778/3389133.3389135},
Url = {https://pgm.di.unipi.it},
Issn = {2150-8097},
Journal = {{PVLDB}}}
FAQs
Sorted containers with state-of-the-art query performance and compressed memory usage
We found that pygm demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
ESLint now supports HTML linting with 48 new rules, expanding its language plugin system to cover more of the modern web development stack.
Security News
CISA is discontinuing official RSS support for KEV and cybersecurity alerts, shifting updates to email and social media, disrupting automation workflows.
Security News
The MCP community is launching an official registry to standardize AI tool discovery and let agents dynamically find and install MCP servers.