
Feature Selection using Metaheuristics Made Easy: Open Source MAFESE Library in Python
MAFESE (Metaheuristic Algorithms for FEature SElection) is the largest open-source Python library dedicated to the feature selection (FS) problem using metaheuristic algorithms. It contains filter, wrapper, embedded, and unsupervised-based methods with modern optimization techniques. Whether you're tackling classification or regression tasks, MAFESE helps automate and enhance feature selection to improve model performance.
Dependencies: numpy, scipy, scikit-learn, pandas, mealpy, permetrics, plotly, kaleido
MAFESE provides state-of-the-art feature selection (FS) methods from four families (a minimal usage sketch follows this list):
🧠 Unsupervised-based FS
🔎 Filter-based FS
🌲 Embedded-based FS
⚙️ Wrapper-based FS
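As a quick orientation, here is a minimal sketch that ties one of these families to the API calls shown later on this page; it simply mirrors the data-loading and selector examples below, so treat it as a sketch rather than additional API:
from mafese import get_dataset, FilterSelector
# Load a bundled benchmark dataset and split it (see the data section below)
data = get_dataset("Arrhythmia")
data.split_train_test(test_size=0.2)
# A filter-based selector is the simplest family to start with
feat_selector = FilterSelector(problem='classification', method='SPEARMAN', n_features=5)
feat_selector.fit(data.X_train, data.y_train)
print(feat_selector.selected_feature_indexes)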
Please include these citations if you plan to use this incredible library:
@article{van2024feature,
title={Feature selection using metaheuristics made easy: Open source MAFESE library in Python},
author={Van Thieu, Nguyen and Nguyen, Ngoc Hung and Heidari, Ali Asghar},
journal={Future Generation Computer Systems},
year={2024},
publisher={Elsevier},
doi={10.1016/j.future.2024.06.006},
url={https://doi.org/10.1016/j.future.2024.06.006},
}
@article{van2023mealpy,
title={MEALPY: An open-source library for latest meta-heuristic algorithms in Python},
author={Van Thieu, Nguyen and Mirjalili, Seyedali},
journal={Journal of Systems Architecture},
year={2023},
publisher={Elsevier},
doi={10.1016/j.sysarc.2023.102871}
}
Install the latest release from PyPI:
$ pip install mafese
After installation, check the version:
$ python
>>> import mafese
>>> mafese.__version__
Use a built-in dataset:
from mafese import get_dataset
data = get_dataset("Arrhythmia")
Or load your own:
import pandas as pd
from mafese import Data
df = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = df[:, :-1], df[:, -1]
data = Data(X, y)
data.split_train_test(test_size=0.2)
print(data.X_train[:2].shape)
print(data.y_train[:2].shape)
data.X_train, scaler_X = data.scale(data.X_train, scaling_methods=("standard", "minmax"))
data.X_test = scaler_X.transform(data.X_test)
data.y_train, scaler_y = data.encode_label(data.y_train) # Classification only
data.y_test = scaler_y.transform(data.y_test)
## First way (recommended)
from mafese import UnsupervisedSelector, FilterSelector, LassoSelector, TreeSelector
from mafese import SequentialSelector, RecursiveSelector, MhaSelector, MultiMhaSelector
## Second way
from mafese.unsupervised import UnsupervisedSelector
from mafese.filter import FilterSelector
from mafese.embedded.lasso import LassoSelector
from mafese.embedded.tree import TreeSelector
from mafese.wrapper.sequential import SequentialSelector
from mafese.wrapper.recursive import RecursiveSelector
from mafese.wrapper.mha import MhaSelector, MultiMhaSelector
# Define a feature selector (each call below shows one alternative; keep only the one you need):
feat_selector = UnsupervisedSelector(problem='classification', method='DR', n_features=5)
feat_selector = FilterSelector(problem='classification', method='SPEARMAN', n_features=5)
feat_selector = LassoSelector(problem="classification", estimator="lasso", estimator_paras={"alpha": 0.1})
feat_selector = TreeSelector(problem="classification", estimator="tree")
feat_selector = SequentialSelector(problem="classification", estimator="knn", n_features=3, direction="forward")
feat_selector = RecursiveSelector(problem="classification", estimator="rf", n_features=5)
feat_selector = MhaSelector(problem="classification", obj_name="AS",
                            estimator="knn", estimator_paras=None,
                            optimizer="BaseGA", optimizer_paras=None,
                            mode='single', n_workers=None, termination=None, seed=None, verbose=True)
feat_selector = MultiMhaSelector(problem="classification", obj_name="AS",
                                 estimator="knn", estimator_paras=None,
                                 list_optimizers=("OriginalWOA", "OriginalGWO", "OriginalTLO", "OriginalGSKA"),
                                 list_optimizer_paras=[{"epoch": 10, "pop_size": 30}, ]*4,
                                 mode='single', n_workers=None, termination=None, seed=None, verbose=True)
feat_selector.fit(data.X_train, data.y_train)
# check selected features - True (or 1) is selected, False (or 0) is not selected
print(feat_selector.selected_feature_masks)
print(feat_selector.selected_feature_solution)
# check the index of selected features
print(feat_selector.selected_feature_indexes)
X_train_selected = feat_selector.transform(data.X_train)
X_test_selected = feat_selector.transform(data.X_test)
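The selected feature matrices can then be passed to any downstream model. For illustration only (not part of the MAFESE API), here is a minimal sketch with scikit-learn, which is already one of MAFESE's dependencies:
from sklearn.neighbors import KNeighborsClassifier
# Train any downstream estimator on the reduced feature set (illustrative sketch)
model = KNeighborsClassifier()
model.fit(X_train_selected, data.y_train)
print(model.score(X_test_selected, data.y_test))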
If you use the built-in evaluate() method instead, do not transform the data first; pass the data object as loaded:
feat_selector.evaluate(estimator="svm", data=data, metrics=["AS", "PS", "RS"])
## Here we pass the data object loaded above, which contains both the train and test sets, so the results will look like this:
{'AS_train': 0.77176, 'PS_train': 0.54177, 'RS_train': 0.6205, 'AS_test': 0.72636, 'PS_test': 0.34628, 'RS_test': 0.52747}
X_test, y_test = data.X_test, data.y_test
feat_selector.evaluate(estimator=None, data=data, metrics=["AS", "PS", "RS"])
For more usage examples, please look at the examples folder.
The supported metrics (such as "AS", "PS", "RS") come from the permetrics library. You can find the full list here: https://github.com/thieu1995/permetrics, or query it directly:
from mafese import MhaSelector
print(MhaSelector.SUPPORTED_REGRESSION_METRICS)
print(MhaSelector.SUPPORTED_CLASSIFICATION_METRICS)
print(feat_selector.SUPPORT)
Or, better yet, read the documentation at: https://mafese.readthedocs.io/en/latest/
You may encounter an error like this:
raise ValueError("Existed at least one new label in y_pred.")
ValueError: Existed at least one new label in y_pred.
This occurs only when you are working on a classification problem with a small dataset that has many classes. For instance, the "Zoo" dataset contains only 101 samples, but it has 7 classes. If you split the dataset into a training and testing set with a ratio of around 80% - 20%, there is a chance that one or more classes may appear in the testing set but not in the training set. As a result, when you calculate the performance metrics, you may encounter this error. You cannot predict or assign new data to a new label because you have no knowledge about the new label. There are several solutions to this problem.
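As a quick diagnostic (a plain NumPy sketch, not part of the MAFESE API), you can check whether the test split contains labels that never appear in the training split:
import numpy as np
# Labels present in the test set but missing from the training set
unseen = np.setdiff1d(data.y_test, data.y_train)
print(unseen)  # a non-empty array confirms the situation described above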
# First solution: oversample the rare classes with SMOTE (from the imbalanced-learn package) before splitting
from imblearn.over_sampling import SMOTE
import pandas as pd
from mafese import Data
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
X_new, y_new = SMOTE().fit_resample(X, y)
data = Data(X_new, y_new)
# Second solution: re-split with a different random_state so that every class lands in the training set
import pandas as pd
from mafese import Data
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y)
data.split_train_test(test_size=0.2, random_state=10)  # Try different random_state values
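A third option, if you prepare the split yourself with scikit-learn instead of Data.split_train_test, is a stratified split, which keeps every class present in both subsets. This is an illustrative sketch, not part of the MAFESE API, and it requires at least two samples per class:
from sklearn.model_selection import train_test_split
# stratify=y keeps class proportions the same in both splits,
# so no class can end up only in the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)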
Developed by: Thieu @ 2023