Jenkspy: Fast Fisher-Jenks breaks for Python
Compute "natural breaks" (Fisher-Jenks algorithm) on a list, tuple, array.array or numpy.ndarray of integers or floats.
The algorithm implemented by this library is also sometimes referred to as the Fisher-Jenks algorithm, the Jenks Optimisation Method or the Fisher exact optimization method. It is a deterministic method for calculating the optimal class boundaries.
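To make "optimal class boundaries" concrete, here is a minimal pure-Python sketch of the underlying idea: choose the classes that minimize the total within-class sum of squared deviations. It is not the C implementation shipped with jenkspy. The function name natural_breaks_sketch is hypothetical, and the simple dynamic programme below runs in O(n_classes * n^2), so it is only suitable for small inputs.

```python
def natural_breaks_sketch(values, n_classes):
    """Illustrative only: class limits minimizing within-class squared deviations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums let us compute the squared deviations of any slice in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean of data[i:j]."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best total cost when data[:j] is split into k classes.
    # start[k][j]: index where the last of those k classes begins.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    start[k][j] = i

    # Walk back through the table, collecting the upper bound of each class.
    limits = [data[-1]]
    j = n
    for k in range(n_classes, 1, -1):
        i = start[k][j]
        limits.append(data[i - 1])
        j = i
    limits.append(data[0])
    return limits[::-1]


print(natural_breaks_sketch(range(12), 4))
```

Like the values returned by the library, the result starts with the minimum of the series and ends with its maximum, with one upper class bound per class in between.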
Intended compatibility: CPython 3.7+
Wheels are provided via PyPI for Windows, macOS and Linux users. The package is also available on the conda-forge channel for Anaconda users.
Usage
Two ways of using jenkspy are available:
- by using the jenks_breaks function, which takes as input a list / tuple / array.array / numpy.ndarray of integers or floats and returns a list of values that correspond to the limits of the classes (starting with the minimum value of the series - the lower bound of the first class - and ending with its maximum value - the upper bound of the last class).
>>> import jenkspy
>>> import json
>>> with open('tests/test.json', 'r') as f:
...     data = json.loads(f.read())
...
>>> jenkspy.jenks_breaks(data, n_classes=5)
[0.0028109620325267315, 2.0935479691252112, 4.205495140049607, 6.178148351609707, 8.09175917180255, 9.997982932254672]
- by using the JenksNaturalBreaks class, which is inspired by scikit-learn classes.
The .fit and .group behavior is slightly different from jenks_breaks: values outside the range between the minimum and maximum of breaks_ are accepted, and the input size is retained. This means that fit and group use only the inner_breaks_. All values below the minimum bound are included in the first group, and all values above the maximum bound are included in the last group.
>>> from jenkspy import JenksNaturalBreaks
>>> x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> jnb = JenksNaturalBreaks(4)
>>> jnb.fit(x)
>>> print(jnb.labels_)
[0 0 0 1 1 1 2 2 2 3 3 3]
>>> print(jnb.groups_)
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8]), array([ 9, 10, 11])]
>>> print(jnb.breaks_)
[0.0, 2.0, 5.0, 8.0, 11.0]
>>> print(jnb.inner_breaks_)
[2.0, 5.0, 8.0]
>>> print(jnb.predict(15))
3
>>> print(jnb.predict([2.5, 3.5, 6.5]))
[1 1 2]
>>> print(jnb.group([2.5, 3.5, 6.5]))
[array([], dtype=float64), array([2.5, 3.5]), array([6.5]), array([], dtype=float64)]
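The role of inner_breaks_ described above can be illustrated with the standard library alone. The sketch below is an assumption about the behavior, not the library's code: predict_sketch and group_sketch are hypothetical names, and they return plain lists rather than numpy arrays. Each value is assigned to the first class whose upper bound is greater than or equal to it, so out-of-range values fall into the first or last class.

```python
from bisect import bisect_left


def predict_sketch(values, inner_breaks):
    """Return a class index per value; each inner break is an inclusive upper bound."""
    return [bisect_left(inner_breaks, v) for v in values]


def group_sketch(values, inner_breaks):
    """Split values into len(inner_breaks) + 1 lists, one per class."""
    groups = [[] for _ in range(len(inner_breaks) + 1)]
    for v in values:
        groups[bisect_left(inner_breaks, v)].append(v)
    return groups


inner_breaks = [2.0, 5.0, 8.0]  # taken from the example above

print(predict_sketch([2.5, 3.5, 6.5], inner_breaks))  # [1, 1, 2]
print(predict_sketch([-3, 15], inner_breaks))         # [0, 3]
print(group_sketch([2.5, 3.5, 6.5], inner_breaks))    # [[], [2.5, 3.5], [6.5], []]
```

With these inner breaks, the sketch reproduces the labels and groups printed in the example above, including the value 15 landing in the last class.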
Installation
- From PyPI:
pip install jenkspy
- From source:
git clone http://github.com/mthh/jenkspy
cd jenkspy/
pip install .
- For Anaconda users:
conda install -c conda-forge jenkspy
Requirements
Motivation:
- Making a C extension that is painless to install, so that it can be used more easily as a dependency in another package (and, along the way, learning how to build wheels, first using AppVeyor / Travis and now GitHub Actions).
- Getting the break values! (and fast!). No fancy functionality is provided, but contributions/forks/etc. are welcome.
- Other Python implementations exist, but they are not as fast or are not available on PyPI.