metacluster

MetaCluster: An Open-Source Python Library for Metaheuristic-based Clustering Problems

PyPI

Version: 1.3.0

Maintainers: 1

MetaCluster


MetaCluster is the largest open-source library of nature-inspired optimization (metaheuristic algorithms) for clustering problems in Python.

Free software: GNU General Public License (GPL) v3
Provides 3 classes: MetaCluster, MhaKCentersClustering, and MhaKMeansTuner
Nature-inspired metaheuristic optimizers (metaheuristic algorithms): > 200
Objective functions (used as fitness): > 40
Supported datasets: 48, from scikit-learn, UCI, ELKI, KEEL...
Performance metrics: > 40
Methods for detecting the number of clusters K: >= 10
Documentation: https://metacluster.readthedocs.io/en/latest/
Python versions: >= 3.7.x
Dependencies: numpy, scipy, scikit-learn, pandas, mealpy, permetrics, plotly, kaleido

Citation Request

Please include these citations if you plan to use this library:

@article{VanThieu2023,
  author = {Van Thieu,  Nguyen and Oliva,  Diego and Pérez-Cisneros,  Marco},
  title = {MetaCluster: An open-source Python library for metaheuristic-based clustering problems},
  journal = {SoftwareX},
  year = {2023},
  pages = {101597},
  volume = {24},
  DOI = {10.1016/j.softx.2023.101597},
}

@article{van2023mealpy,
  title={MEALPY: An open-source library for latest meta-heuristic algorithms in Python},
  author={Van Thieu, Nguyen and Mirjalili, Seyedali},
  journal={Journal of Systems Architecture},
  year={2023},
  publisher={Elsevier},
  doi={10.1016/j.sysarc.2023.102871}
}

Installation

Install the current PyPI release:

$ pip install metacluster

After installation, check the version:

$ python
>>> import metacluster
>>> metacluster.__version__
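
If you prefer to check the installed version from a script without importing the package, the standard library can read it from the package metadata. This is a minimal sketch, not part of MetaCluster: the helper name `installed_version` is hypothetical, and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string if metacluster is installed, otherwise None
print(installed_version("metacluster"))
```

Returning None instead of raising makes the helper easy to use in setup scripts that need to decide whether to install the package first.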

Examples

We maintain a dedicated GitHub repository for examples at MetaCluster_examples.

Let's go through some basic examples from here:

1. First, load a dataset. You can use one of the datasets available in MetaCluster:

# Load available dataset from MetaCluster
from metacluster import get_dataset

# Try unknown data
get_dataset("unknown")
# Enter: 1      -> This will list all available datasets

data = get_dataset("Arrhythmia")

Or you can load your own dataset:

import pandas as pd
from metacluster import Data

# load X and y
# NOTE MetaCluster accepts numpy arrays only, hence use the .values attribute
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y, name="my-dataset")
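
The example above assumes every column, including the label, is already numeric. When the label column holds strings, one possible approach is to encode it as integers before splitting. The sketch below uses only pandas and numpy; the inline CSV text and its column names are made up for illustration, and the final `Data(...)` call is left as a comment because it is the same call shown above.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content standing in for 'examples/dataset.csv'
csv_text = """id,f1,f2,label
0,1.0,2.0,a
1,1.5,1.8,a
2,8.0,9.0,b
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Features: every column except the last, as a float numpy array
X = df.iloc[:, :-1].to_numpy(dtype=float)

# Labels: map each distinct string to an integer code (a -> 0, b -> 1)
y, classes = pd.factorize(df.iloc[:, -1])

print(X.shape, y, list(classes))
# data = Data(X, y, name="my-dataset")   # as in the example above
```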

2. Next, scale your features

Make sure your features are scaled or normalized before clustering. The calls below are alternatives; pick one rather than applying them in sequence:

# MinMaxScaler 
data.X, scaler = data.scale(data.X, method="MinMaxScaler", feature_range=(0, 1))

# StandardScaler 
data.X, scaler = data.scale(data.X, method="StandardScaler")

# MaxAbsScaler 
data.X, scaler = data.scale(data.X, method="MaxAbsScaler")

# RobustScaler 
data.X, scaler = data.scale(data.X, method="RobustScaler")

# Normalizer 
data.X, scaler = data.scale(data.X, method="Normalizer", norm="l2")   # "l1" or "l2" or "max"
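
To make the effect of scaling concrete, here is a small numpy-only sketch of the arithmetic behind min-max scaling and standardization. It is an illustration of the standard formulas, not MetaCluster's implementation of `data.scale`, and the toy array is made up.

```python
import numpy as np

# Toy feature matrix: two columns on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 800.0]])

# Min-max scaling to [0, 1]: subtract the column minimum, divide by the column range
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_minmax = (X - col_min) / (col_max - col_min)

# Standardization: zero mean and unit (population) standard deviation per column
X_standard = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax)
print(X_standard)
```

Without a step like this, distance-based objectives would be dominated by the second column simply because its values are larger.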

3. Next, select the metaheuristic algorithms, their parameters, the list of objectives, and the list of performance metrics

list_optimizer = ["BaseFBIO", "OriginalGWO", "OriginalSMA"]
list_paras = [
    {"name": "FBIO", "epoch": 10, "pop_size": 30},
    {"name": "GWO", "epoch": 10, "pop_size": 30},
    {"name": "SMA", "epoch": 10, "pop_size": 30}
]
list_obj = ["SI", "RSI"]
list_metric = ["BHI", "DBI", "DI", "CHI", "SSEI", "NMIS", "HS", "CS", "VMS", "HGS"]
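
In the lists above, each entry of `list_paras` appears to belong to the optimizer at the same position in `list_optimizer`. Assuming that positional pairing holds, a small helper can build both lists from one mapping so they cannot drift out of sync. The helper name `build_optimizer_config` is hypothetical and not part of MetaCluster.

```python
def build_optimizer_config(labels_by_optimizer, epoch=10, pop_size=30):
    """Build paired optimizer and parameter lists from one mapping.

    labels_by_optimizer maps an optimizer class name to the short label
    used in its parameter dict (assumed to be paired by position).
    """
    list_optimizer = list(labels_by_optimizer)
    list_paras = [
        {"name": label, "epoch": epoch, "pop_size": pop_size}
        for label in labels_by_optimizer.values()
    ]
    return list_optimizer, list_paras


list_optimizer, list_paras = build_optimizer_config(
    {"BaseFBIO": "FBIO", "OriginalGWO": "GWO", "OriginalSMA": "SMA"}
)
print(list_optimizer)
print(list_paras)
```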

You can check all supported metaheuristic algorithms at: https://github.com/thieu1995/mealpy. All supported clustering objectives and metrics are listed at: https://github.com/thieu1995/permetrics.

If you don't want to read the documentation, you can print out all supported information with:

from metacluster import MetaCluster 

# Get all supported methods and print them out
MetaCluster.get_support(name="all")

4. Next, create an instance of MetaCluster class and run it.

model = MetaCluster(list_optimizer=list_optimizer, list_paras=list_paras, list_obj=list_obj, n_trials=3, seed=10)

model.execute(data=data, cluster_finder="elbow", list_metric=list_metric, save_path="history", verbose=False)

model.save_boxplots()
model.save_convergences()

As you can see, you can define different datasets and run them with the same model. Remember to set a name for your dataset, because the folder that holds your results is named after it. More examples can be found here
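
The `cluster_finder="elbow"` argument in the run above selects the number of clusters with an elbow-style method. As a rough illustration of the general idea only, and not of how MetaCluster computes it, the sketch below runs scikit-learn's KMeans for several values of K on made-up data and picks the K where the inertia curve bends most sharply (largest second difference).

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three tight, well-separated groups of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in centers])

# Within-cluster sum of squares (inertia) for K = 1..6
ks = list(range(1, 7))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in ks
]

# Second difference of the inertia curve; its maximum marks the sharpest bend
curvature = [
    inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
    for i in range(1, len(ks) - 1)
]
best_k = ks[1 + int(np.argmax(curvature))]
print(best_k)
```

The second-difference rule is one simple way to locate the elbow; other heuristics exist and may disagree on noisier data.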

Support

Official links (questions, problems)

Official source code repo: https://github.com/thieu1995/metacluster
Official document: https://metacluster.readthedocs.io/
Download releases: https://pypi.org/project/metacluster/
Issue tracker: https://github.com/thieu1995/metacluster/issues
Notable changes log: https://github.com/thieu1995/metacluster/blob/master/ChangeLog.md
Official chat group: https://t.me/+fRVCJGuGJg1mNDg1
This project is also related to our other projects on optimization and machine learning. Check them here:

Supported links

1. https://jtemporal.com/kmeans-and-elbow-method/
2. https://medium.com/@masarudheena/4-best-ways-to-find-optimal-number-of-clusters-for-clustering-with-python-code-706199fa957c
3. https://github.com/minddrummer/gap/blob/master/gap/gap.py
4. https://www.tandfonline.com/doi/pdf/10.1080/03610927408827101
5. https://doi.org/10.1016/j.engappai.2018.03.013
6. https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Clustering-Dimensionality-Reduction/Clustering_metrics.ipynb
7. https://elki-project.github.io/
8. https://sci2s.ugr.es/keel/index.php
9. https://archive.ics.uci.edu/datasets
10. https://python-charts.com/distribution/box-plot-plotly/
11. https://plotly.com/python/box-plots/?_ga=2.50659434.2126348639.1688086416-114197406.1688086416#box-plot-styling-mean--standard-deviation
