
This Python package provides multiple feature importance scores and automatically suggests a feature selection based on the majority vote of all models.
Currently the following models for feature importance scoring are included:
The current feature importance models support numerical data only. Categorical data will need to be encoded to numerical features beforehand.
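As an illustration of the encoding step mentioned above, the following is a minimal, dependency-free sketch of one-hot encoding for a single categorical column. The helper `one_hot` and the column name `soil_type` are hypothetical and not part of selectio.

```python
def one_hot(values):
    """Encode a list of category labels as 0/1 indicator columns.

    Hypothetical helper for illustration; not part of selectio.
    """
    categories = sorted(set(values))
    # One indicator column per category, in alphabetical order
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in categories}


# Example categorical feature (made-up values)
soil_type = ["clay", "sand", "clay", "loam"]
encoded = one_hot(soil_type)
for name, column in encoded.items():
    print(name, column)
```

In practice, pandas.get_dummies or scikit-learn's OneHotEncoder provide the same kind of transformation and may be more convenient for whole dataframes.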
All model scores are normalized to unity, i.e., $\sum_{i=1}^{N_{features}} score_i = 1$
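The normalization to unity can be sketched as follows. This is only an illustration of the formula, assuming non-negative raw scores; the function name `normalize_scores` is hypothetical, and the package may compute this differently internally.

```python
def normalize_scores(scores):
    """Scale non-negative scores so that they sum to one.

    Hypothetical helper for illustration; not part of selectio.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must contain at least one positive value")
    return [s / total for s in scores]


raw = [2.0, 1.0, 1.0]          # made-up raw importance scores
normalized = normalize_scores(raw)
print(normalized)              # [0.5, 0.25, 0.25]
```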
This package includes multiple functions for visualisation of the importance scores and automatic feature ranking.
Feature-to-feature correlations are automatically clustered using hierarchical clustering of the Spearman correlation coefficients (for more details see utils.plot_feature_correlation_spearman).
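The majority-vote selection mentioned in the overview can be illustrated with a short sketch. The rule used here (a model votes for a feature if its normalized score exceeds a threshold, and a feature is selected if more than half of the models vote for it) is an assumption for illustration, as are the function name, model names, and scores; the package's actual selection criterion may differ.

```python
def majority_vote(scores_by_model, threshold):
    """Return the features that more than half of the models vote for.

    scores_by_model maps model name -> {feature name: normalized score}.
    Hypothetical helper for illustration; not part of selectio.
    """
    n_models = len(scores_by_model)
    votes = {}
    for scores in scores_by_model.values():
        for feature, score in scores.items():
            # A model votes for a feature if its score exceeds the threshold
            votes[feature] = votes.get(feature, 0) + (1 if score > threshold else 0)
    return sorted(f for f, v in votes.items() if v > n_models / 2)


# Made-up normalized scores from three placeholder models
scores_by_model = {
    "model_a": {"x1": 0.6, "x2": 0.3, "x3": 0.1},
    "model_b": {"x1": 0.5, "x2": 0.1, "x3": 0.4},
    "model_c": {"x1": 0.7, "x2": 0.25, "x3": 0.05},
}
selected = majority_vote(scores_by_model, threshold=0.2)
print(selected)  # ['x1', 'x2']
```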
pip install selectio
or for development in a conda environment:
conda env update --file environment.yaml
conda activate selectio
See file environment.yaml for more details.
There are multiple options to compute feature selection scores:
1) with a settings YAML file (template provided) that includes all processing and plotting functionality, e.g.:
from selectio import selectio
# Read in data from file, generate feature importance plots and save results as csv:
selectio.main('settings_featureimportance.yaml')
This will automatically save all scores and selections in a CSV file and create multiple score plots.
2) computed directly using the class selectio.Fsel, e.g.:
from selectio.selectio import Fsel
# Read in data X (nsample, nfeatures) and y (nsample)
fsel = Fsel(X, y)
# Score features and return results as dataframe:
dfres = fsel.score_models()
This returns a table with all scores and feature selections. For more details and visualisation of the scores, see "Option 2)" in the example notebook feature_selection.ipynb.
3) as a standalone script with a settings file:
cd selectio
python selectio.py -s <FILENAME>.yaml
User settings such as input/output paths and all other options are set in the settings file (default filename: settings_featureimportance.yaml). Alternatively, the settings file can be specified as a command line argument with '-s' or '--settings' followed by PATH-TO-FILE/FILENAME.yaml (e.g. python selectio.py -s settings/settings_featureimportance.yaml).
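The command line behaviour described above could be reproduced with argparse roughly as follows. This is a sketch under the assumption that the script uses a standard argument parser; the function name `parse_args` is hypothetical and the actual implementation in selectio.py may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the settings file option, falling back to the default filename."""
    parser = argparse.ArgumentParser(description="Feature importance scoring")
    parser.add_argument(
        "-s", "--settings",
        default="settings_featureimportance.yaml",
        help="path and filename of the settings YAML file",
    )
    return parser.parse_args(argv)


# Equivalent to: python selectio.py -s settings/settings_featureimportance.yaml
args = parse_args(["-s", "settings/settings_featureimportance.yaml"])
print(args.settings)
```

Passing an empty argument list falls back to the default filename.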
A settings file template is provided with the package.
The main settings are:
# Input data path:
inpath: ...
# File name with soil data and corresponding covariates:
infname: ...
# Output results path:
outpath: ...
# Name of target for prediction (column name in dataframe):
name_target: ...
# Name or List of features (column names in infname):
# (covariates to be considered)
name_features:
- ...
- ...
The selectio package provides the option to generate simulated data (see selectio.simdata) and includes multiple test functions (see selectio.tests), e.g.
from selectio import tests
tests.test_select()
For more examples and how to create simulated data via simdata.py, see the provided Jupyter notebook feature_selection.ipynb.
More models for feature scoring can be added in the folder 'models', following the existing scripts as examples. Each new model includes at least:
a __name__ and __fullname__ attribute
an entry for the new model in the __init_file__.py file in the folder models

Other models for feature selection have been considered, such as PCA or SVD-based methods or univariate screening methods (t-test, correlation, etc.). However, some of these models either consider only linear relationships or do not take into account the potential multivariate nature of the data structure (e.g., higher-order interactions between variables). Note that not all included models are completely generalizable, such as Bayesian regression and Spearman ranking, given their dependence on monotonic functional behavior.
Since most models have some limitations or rely on certain data assumptions, it is important to consider a variety of techniques for feature selection and to apply model cross-validations.
LGPL-3.0 License
Copyright (c) 2022 Sebastian Haan
Multi-model Feature Importance Scoring and Auto Feature Selection