
classixclustering


Fast and explainable clustering based on sorting

1.2.7
PyPI

Maintainers: 1

.. image:: https://codecov.io/gh/nla-group/classix/branch/master/graph/badge.svg?token=D4MQZS67H1
    :target: https://codecov.io/gh/nla-group/classix
    :alt: codecov

.. image:: https://img.shields.io/pypi/v/ClassixClustering?color=orange
    :target: https://pypi.org/project/ClassixClustering/
    :alt: pypi

.. image:: https://static.pepy.tech/badge/ClassixClustering
    :target: https://pypi.org/project/ClassixClustering/
    :alt: Download Status

.. image:: https://readthedocs.org/projects/classix/badge/?version=latest
    :target: https://classix.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. image:: https://img.shields.io/badge/License-MIT-yellow.svg
    :target: https://github.com/nla-group/classix/blob/master/LICENSE
    :alt: License: MIT

CLASSIX is a fast and explainable clustering algorithm based on sorting. Here are a few highlights:

- Ability to cluster low and high-dimensional data of arbitrary shape efficiently.
- Ability to detect and deal with outliers in the data.
- Ability to provide textual explanations for the generated clusters.
- Full reproducibility of all tests in the accompanying paper.
- Support of Cython compilation.

CLASSIX is a contrived acronym of *CLustering by Aggregation with Sorting-based Indexing* and the letter *X* for *explainability*. CLASSIX clustering consists of two phases: a greedy aggregation phase, in which the sorted data is collected into groups of nearby data points, followed by a merging phase, in which groups are merged into clusters. The algorithm is controlled by two parameters: the distance parameter ``radius`` for the group aggregation, and the ``minPts`` parameter, which controls the minimal cluster size.
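
The two phases can be illustrated with a small pure-Python sketch. This is not the library's implementation. It assumes sorting by the first coordinate (a stand-in for the sorting used by CLASSIX), merges groups whose starting points lie within ``scale * radius`` of each other (the ``scale`` factor and this merging criterion are assumptions), and simply labels clusters smaller than ``min_pts`` as ``-1``. The function name ``classix_sketch`` is hypothetical.

.. code:: python

    import math

    def classix_sketch(points, radius, min_pts=1, scale=1.5):
        """Toy two-phase clustering in the spirit of CLASSIX (illustration only)."""
        n = len(points)
        # Assumption: sort by the first coordinate instead of a PCA score.
        order = sorted(range(n), key=lambda i: points[i][0])

        # Phase 1: greedy aggregation of the sorted points into groups.
        group_of = [-1] * n
        starts = []  # index of the starting point of each group
        for pos, i in enumerate(order):
            if group_of[i] != -1:
                continue
            g = len(starts)
            starts.append(i)
            group_of[i] = g
            for q in range(pos + 1, n):
                j = order[q]
                # Early exit: the sort key is a lower bound on the distance.
                if points[j][0] - points[i][0] > radius:
                    break
                if group_of[j] == -1 and math.dist(points[i], points[j]) <= radius:
                    group_of[j] = g

        # Phase 2: merge groups with nearby starting points (union-find).
        parent = list(range(len(starts)))

        def find(g):
            while parent[g] != g:
                parent[g] = parent[parent[g]]
                g = parent[g]
            return g

        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                if math.dist(points[starts[a]], points[starts[b]]) <= scale * radius:
                    parent[find(b)] = find(a)

        # Simplification: clusters smaller than min_pts are labelled -1.
        roots = [find(g) for g in group_of]
        sizes = {}
        for r in roots:
            sizes[r] = sizes.get(r, 0) + 1
        label_of = {}
        labels = []
        for r in roots:
            if sizes[r] < min_pts:
                labels.append(-1)
            else:
                labels.append(label_of.setdefault(r, len(label_of)))
        return labels

    demo = [(0, 0), (0.1, 0), (0.2, 0.1), (5, 5), (5.1, 5), (20, 20)]
    print(classix_sketch(demo, radius=0.5, min_pts=2))

Because the points are sorted, the inner loop can stop as soon as the sort key alone exceeds ``radius``. This early exit is what makes the aggregation phase cheap in the sketch.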

Installing and example

CLASSIX has the following dependencies for its clustering functionality:

- cython
- numpy
- scipy
- requests

and requires the following packages for data visualization:

- matplotlib
- pandas

To install the current CLASSIX release via pip, use:

.. code:: bash

    pip install classixclustering

To check the CLASSIX installation you can use:

.. code:: bash

    python -m pip show classixclustering
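
The same check can be done from within Python. The following is a minimal sketch using the standard library's ``importlib.metadata`` (Python 3.8 or later). The helper name ``installed_version`` is hypothetical and not part of CLASSIX.

.. code:: python

    from importlib import metadata

    def installed_version(dist_name):
        """Return the installed version of a distribution, or None if it is absent."""
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None

    # Prints the version string if the package is installed, otherwise None.
    print(installed_version("classixclustering"))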

To obtain the source code, clone the repository:

.. code:: bash

    git clone https://github.com/nla-group/classix.git

Example usage:

.. code:: python

    # scikit-learn is used here only to generate the demo data
    from sklearn import datasets
    from classix import CLASSIX

    # Generate synthetic data: 2,000,000 points in 10 dimensions, 4 blobs
    X, y = datasets.make_blobs(n_samples=2000000, centers=4, n_features=10, random_state=1)

    # Employ CLASSIX clustering
    clx = CLASSIX(sorting='pca', verbose=1)
    clx.fit(X)

Citation

.. code:: bibtex

    @techreport{CG22b,
      title   = {Fast and explainable clustering based on sorting},
      author  = {Chen, Xinye and G\"{u}ttel, Stefan},
      year    = {2022},
      number  = {arXiv:2202.01456},
      pages   = {25},
      institution = {The University of Manchester},
      address = {UK},
      type    = {arXiv EPrint},
      url     = {https://arxiv.org/abs/2202.01456}
    }
