MAFESE: Metaheuristic Algorithm for Feature Selection - An Open Source Python Library
MAFESE (Metaheuristic Algorithms for FEature SElection) is the largest Python library for the feature selection (FS) problem using metaheuristic algorithms.
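Install the current release from PyPI:
$ pip install mafese==0.1.9
Or install from source:
$ git clone https://github.com/thieu1995/mafese.git
$ cd mafese
$ python setup.py install
Or install the development version directly from GitHub:
$ pip install git+https://github.com/thieu1995/mafese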
After installation, you can import MAFESE as any other Python module:
$ python
>>> import mafese
>>> mafese.__version__
docs
examples
mafese
    data/
        cls/
            aggregation.csv
            Arrhythmia.csv
            ...
        reg/
            boston-housing.csv
            diabetes.csv
            ...
    wrapper/
        mha.py
        recursive.py
        sequential.py
    embedded/
        lasso.py
        tree.py
    filter.py
    unsupervised.py
    utils/
        correlation.py
        data_loader.py
        encoder.py
        estimator.py
        mealpy_util.py
        transfer.py
        validator.py
    __init__.py
    selector.py
README.md
setup.py
Let's go through some examples.
# Load available dataset from MAFESE
from mafese import get_dataset
# Try unknown data
get_dataset("unknown")
# Enter: 1 -> This will list all available datasets
data = get_dataset("Arrhythmia")
import pandas as pd
from mafese import Data
# load X and y
# NOTE mafese accepts numpy arrays only, hence the .values attribute
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y)
data.split_train_test(test_size=0.2, inplace=True)
print(data.X_train[:2].shape)
print(data.y_train[:2].shape)
Make sure your dataset is scaled or normalized when the problem or estimator requires it, for example when using a neural network.
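As a minimal sketch of that scaling step, assuming plain NumPy arrays like the ones passed to Data above, the helper below standardizes features using statistics computed on the training split only. The name `standardize_train_test` is hypothetical and is not part of MAFESE.

```python
import numpy as np

def standardize_train_test(X_train, X_test, eps=1e-12):
    """Scale features to zero mean and unit variance using train statistics only."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Constant columns have zero spread; divide by 1 so they stay finite.
    std = np.where(std < eps, 1.0, std)
    return (X_train - mean) / std, (X_test - mean) / std

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(80, 4))
X_test = rng.normal(loc=50.0, scale=10.0, size=(20, 4))
X_train_s, X_test_s = standardize_train_test(X_train, X_test)
print(X_train_s.mean(axis=0).round(6), X_train_s.std(axis=0).round(6))
```

Fitting the statistics on the training split and reusing them for the test split avoids leaking test information into the selection process.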
## First way (recommended)
from mafese import UnsupervisedSelector, FilterSelector, LassoSelector, TreeSelector
from mafese import SequentialSelector, RecursiveSelector, MhaSelector, MultiMhaSelector
## Second way
from mafese.unsupervised import UnsupervisedSelector
from mafese.filter import FilterSelector
from mafese.embedded.lasso import LassoSelector
from mafese.embedded.tree import TreeSelector
from mafese.wrapper.sequential import SequentialSelector
from mafese.wrapper.recursive import RecursiveSelector
from mafese.wrapper.mha import MhaSelector, MultiMhaSelector
feat_selector = UnsupervisedSelector(problem='classification', method='DR', n_features=5)
feat_selector = FilterSelector(problem='classification', method='SPEARMAN', n_features=5)
feat_selector = LassoSelector(problem="classification", estimator="lasso", estimator_paras={"alpha": 0.1})
feat_selector = TreeSelector(problem="classification", estimator="tree")
feat_selector = SequentialSelector(problem="classification", estimator="knn", n_features=3, direction="forward")
feat_selector = RecursiveSelector(problem="classification", estimator="rf", n_features=5)
feat_selector = MhaSelector(problem="classification", estimator="knn",
                            optimizer="BaseGA", optimizer_paras=None,
                            transfer_func="vstf_01", obj_name="AS")
list_optimizers = ("OriginalWOA", "OriginalGWO", "OriginalTLO", "OriginalGSKA")
list_paras = [{"epoch": 10, "pop_size": 30}, ]*4
feat_selector = MultiMhaSelector(problem="classification", estimator="knn",
                                 list_optimizers=list_optimizers, list_optimizer_paras=list_paras,
                                 transfer_func="vstf_01", obj_name="AS")
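The `transfer_func` argument above reflects a general idea in metaheuristic feature selection: the optimizer searches over continuous vectors, and a transfer function turns each coordinate into a probability of keeping the corresponding feature. The sketch below illustrates this with a common V-shaped function, |tanh(x)|, and one common thresholding rule. It is an illustration only: whether it matches MAFESE's `vstf_01` is not claimed, and the names `v_shaped_tanh` and `position_to_mask` are hypothetical.

```python
import numpy as np

def v_shaped_tanh(x):
    """A common V-shaped transfer function mapping real values into [0, 1)."""
    return np.abs(np.tanh(x))

def position_to_mask(position, rng):
    """Turn a continuous position vector into a boolean feature mask."""
    prob = v_shaped_tanh(np.asarray(position, dtype=float))
    # Select feature i when a uniform random draw falls below its probability.
    mask = rng.random(prob.shape) < prob
    if not mask.any():
        # Guard against an empty selection: keep the most probable feature.
        mask[np.argmax(prob)] = True
    return mask

rng = np.random.default_rng(42)
position = np.array([-3.0, 0.0, 0.2, 2.5, -0.1])
mask = position_to_mask(position, rng)
print(mask.astype(int))
```

Coordinates far from zero are selected with high probability, while a coordinate at exactly zero is never selected by the random draw.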
feat_selector.fit(data.X_train, data.y_train)
# check selected features - True (or 1) is selected, False (or 0) is not selected
print(feat_selector.selected_feature_masks)
print(feat_selector.selected_feature_solution)
# check the index of selected features
print(feat_selector.selected_feature_indexes)
X_train_selected = feat_selector.transform(data.X_train)
X_test_selected = feat_selector.transform(data.X_test)
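To make the relationship between a feature mask and feature indexes concrete, here is a small NumPy sketch. It assumes that transforming a dataset amounts to keeping the selected columns; that is an assumption about the behavior, not a description of the library's implementation, and the arrays are made up for illustration.

```python
import numpy as np

# Toy data: 4 samples, 5 features.
X = np.arange(20, dtype=float).reshape(4, 5)

# A boolean mask marks selected features with True.
mask = np.array([True, False, True, False, True])

# The equivalent index form lists the positions of the True entries.
indexes = np.flatnonzero(mask)

# Both forms select the same columns.
X_by_mask = X[:, mask]
X_by_index = X[:, indexes]

print(indexes)          # [0 2 4]
print(X_by_mask.shape)  # (4, 3)
```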
If you use the built-in evaluate() method, pass the original data object; do not transform it first.
i) You can use a different estimator from the one used in the feature selection process
feat_selector.evaluate(estimator="svm", data=data, metrics=["AS", "PS", "RS"])
## Here we pass the Data object loaded above, which contains both the train and test sets, so the results look like this:
{'AS_train': 0.77176, 'PS_train': 0.54177, 'RS_train': 0.6205, 'AS_test': 0.72636, 'PS_test': 0.34628, 'RS_test': 0.52747}
ii) You can use the same estimator as in the feature selection process
X_test, y_test = data.X_test, data.y_test
feat_selector.evaluate(estimator=None, data=data, metrics=["AS", "PS", "RS"])
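For readers unfamiliar with the metric codes, the sketch below assumes that "AS", "PS", and "RS" stand for accuracy, precision, and recall scores, and computes them with macro averaging over classes. Both the interpretation of the codes and the averaging mode are assumptions here, and `accuracy_precision_recall` is a hypothetical helper, not a MAFESE function.

```python
import numpy as np

def accuracy_precision_recall(y_true, y_pred):
    """Accuracy plus macro-averaged precision and recall for class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    precisions, recalls = [], []
    for label in labels:
        tp = np.sum((y_pred == label) & (y_true == label))
        fp = np.sum((y_pred == label) & (y_true != label))
        fn = np.sum((y_pred != label) & (y_true == label))
        # Define the score as 0 when a class is never predicted or never present.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return {
        "AS": float(np.mean(y_true == y_pred)),
        "PS": float(np.mean(precisions)),
        "RS": float(np.mean(recalls)),
    }

scores = accuracy_precision_recall([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(scores)
```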
from mafese import MhaSelector
print(MhaSelector.SUPPORTED_REGRESSION_METRICS)
print(MhaSelector.SUPPORTED_CLASSIFICATION_METRICS)
print(feat_selector.SUPPORT)
Or, better, read the documentation at: https://mafese.readthedocs.io/en/latest/
raise ValueError("Existed at least one new label in y_pred.")
ValueError: Existed at least one new label in y_pred.
How to solve this?
This occurs only when you are working on a classification problem with a small dataset that has many classes. For instance, the "Zoo" dataset contains only 101 samples, but it has 7 classes. If you split the dataset into a training and testing set with a ratio of around 80% - 20%, there is a chance that one or more classes may appear in the testing set but not in the training set. As a result, when you calculate the performance metrics, you may encounter this error. You cannot predict or assign new data to a new label because you have no knowledge about the new label. There are several solutions to this problem.
1st: Use the SMOTE method to address imbalanced data and ensure that all classes have the same number of samples.
from imblearn.over_sampling import SMOTE
import pandas as pd
from mafese import Data
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
X_new, y_new = SMOTE().fit_resample(X, y)
data = Data(X_new, y_new)
2nd: Try a different random_state value in split_train_test().
import pandas as pd
from mafese import Data
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y)
data.split_train_test(test_size=0.2, random_state=10) # Try different random_state value
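The failure condition described above can be checked before fitting. The sketch below uses its own NumPy-based shuffle split rather than MAFESE's `split_train_test`, so the seeds it finds do not correspond to `random_state` values in the library. It tries seeds until every label in the test part also appears in the training part. The helper names and the toy labels are hypothetical.

```python
import numpy as np

def split_indices(n_samples, test_size, seed):
    """Shuffle sample indices and cut them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_size))
    return order[n_test:], order[:n_test]

def find_seed_with_seen_labels(y, test_size=0.2, max_seed=100):
    """Return the first seed whose test labels all occur in the training labels."""
    y = np.asarray(y)
    for seed in range(max_seed):
        train_idx, test_idx = split_indices(len(y), test_size, seed)
        # Labels present in the test part but absent from the training part.
        unseen = np.setdiff1d(y[test_idx], y[train_idx])
        if unseen.size == 0:
            return seed, train_idx, test_idx
    raise RuntimeError("No suitable split found; consider stratifying or resampling.")

# Toy imbalanced labels: two classes are very rare.
y = np.array([0] * 40 + [1] * 20 + [2] * 4 + [3] * 2)
seed, train_idx, test_idx = find_seed_with_seen_labels(y)
print(seed, np.unique(y[train_idx]), np.unique(y[test_idx]))
```

The same `np.setdiff1d` check can be applied to `data.y_test` and `data.y_train` after calling `split_train_test` to confirm that a chosen `random_state` is safe.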
For more usage examples, please look at the examples folder.
Official source code repo: https://github.com/thieu1995/mafese
Official document: https://mafese.readthedocs.io/
Download releases: https://pypi.org/project/mafese/
Issue tracker: https://github.com/thieu1995/mafese/issues
Notable changes log: https://github.com/thieu1995/mafese/blob/master/ChangeLog.md
Examples with different mealpy version: https://github.com/thieu1995/mafese/blob/master/examples.md
Official chat group: https://t.me/+fRVCJGuGJg1mNDg1
This project is also related to our other projects on "optimization" and "machine learning"; check them here:
Please include these citations if you plan to use this library:
@software{nguyen_van_thieu_2023_7969043,
author = {Nguyen Van Thieu and Ngoc Hung Nguyen and Ali Asghar Heidari},
title = {Feature Selection using Metaheuristics Made Easy: Open Source MAFESE Library in Python},
month = may,
year = 2023,
publisher = {Zenodo},
doi = {10.5281/zenodo.7969042},
url = {https://github.com/thieu1995/mafese}
}
@article{van2023mealpy,
title={MEALPY: An open-source library for latest meta-heuristic algorithms in Python},
author={Van Thieu, Nguyen and Mirjalili, Seyedali},
journal={Journal of Systems Architecture},
year={2023},
publisher={Elsevier},
doi={10.1016/j.sysarc.2023.102871}
}