
Heuristic for quick feature selection for tabular regression/classification using Shapley values
shap-select implements a heuristic for fast feature selection for tabular regression and classification models.
The basic idea is to run a linear or logistic regression of the target on the Shapley values of the original features, computed on the validation set, then discard the features with negative coefficients and rank and filter the rest by statistical significance. For motivation and details, refer to our research paper; for a worked example, see the example notebook.
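The idea can be illustrated with a minimal NumPy sketch. It is not the package's implementation: it assumes a plain linear model (for which the Shapley value of feature j is `w_j * (x_j - mean(x_j))` when features are treated as independent), hypothetical toy data, and a normal approximation for the p-values. The rule used for the `selected` flag is likewise an assumption based on the description above.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical toy data: x0 and x1 drive the target, x2 is pure noise.
    X = rng.normal(size=(n, 3))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)
    return X, y

def ols(design, y):
    """Least-squares fit; returns coefficients, t-values and two-sided p-values."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - k)
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))
    t = beta / std_err
    # Normal approximation to the t distribution (assumption; fine for large n).
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])
    return beta, t, p

X_train, y_train = make_data(500)
X_val, y_val = make_data(500)

# 1. Fit the model on the training set (a plain linear model in this sketch).
train_design = np.column_stack([np.ones(len(X_train)), X_train])
weights = ols(train_design, y_train)[0][1:]

# 2. Shapley values of that linear model, evaluated on the validation set.
shap_val = (X_val - X_train.mean(axis=0)) * weights

# 3. Regress the validation target on the Shapley values (plus an intercept).
val_design = np.column_stack([np.ones(len(X_val)), shap_val])
beta, t_values, p_values = ols(val_design, y_val)
coefs, t_values, p_values = beta[1:], t_values[1:], p_values[1:]

# 4. Flag features: -1 for a negative coefficient, 1 if significant, else 0.
threshold = 0.05
selected = np.where(coefs < 0, -1, np.where(p_values < threshold, 1, 0))

for j in np.argsort(-t_values):
    print(f"x{j}: t={t_values[j]:.2f} p={p_values[j]:.4f} "
          f"coef={coefs[j]:.3f} selected={selected[j]}")
```

For the informative features, the coefficient on each Shapley column should come out close to 1, because the Shapley values already carry the model's scaling of that feature. The coefficient on an uninformative feature is poorly determined and can be large in either direction.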
Earlier packages using Shapley values for feature selection exist; the advantages of this one are
from shap_select import shap_select
# Here model is any model supported by the shap library, fitted on a different (train) dataset
# Task can be regression, binary, or multiclass
selected_features_df = shap_select(model, X_val, y_val, task="multiclass", threshold=0.05)
| | feature name | t-value | stat.significance | coefficient | selected |
|---|---|---|---|---|---|
| 0 | x5 | 20.211299 | 0.000000 | 1.052030 | 1 |
| 1 | x4 | 18.315144 | 0.000000 | 0.952416 | 1 |
| 2 | x3 | 6.835690 | 0.000000 | 1.098154 | 1 |
| 3 | x2 | 6.457140 | 0.000000 | 1.044842 | 1 |
| 4 | x1 | 5.530556 | 0.000000 | 0.917242 | 1 |
| 5 | x6 | 2.390868 | 0.016827 | 1.497983 | 1 |
| 6 | x7 | 0.901098 | 0.367558 | 2.865508 | 0 |
| 7 | x8 | 0.563214 | 0.573302 | 1.933632 | 0 |
| 8 | x9 | -1.607814 | 0.107908 | -4.537098 | -1 |
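One plausible reading of the `selected` column, inferred from the table and the method description rather than from the package source, is: -1 for a negative coefficient, 1 for a positive coefficient whose significance is below the threshold, and 0 otherwise. The short check below applies that assumed rule to the rows above; `selection_flag` is a hypothetical helper, not part of shap-select.

```python
def selection_flag(coefficient: float, p_value: float, threshold: float = 0.05) -> int:
    """Assumed encoding of the `selected` column (not taken from the package)."""
    if coefficient < 0:
        return -1  # negative coefficient: feature is discarded
    return 1 if p_value < threshold else 0  # keep only significant features

# (feature name, stat.significance, coefficient, selected) copied from the table.
rows = [
    ("x5", 0.000000, 1.052030, 1),
    ("x4", 0.000000, 0.952416, 1),
    ("x3", 0.000000, 1.098154, 1),
    ("x2", 0.000000, 1.044842, 1),
    ("x1", 0.000000, 0.917242, 1),
    ("x6", 0.016827, 1.497983, 1),
    ("x7", 0.367558, 2.865508, 0),
    ("x8", 0.573302, 1.933632, 0),
    ("x9", 0.107908, -4.537098, -1),
]

flags = [selection_flag(coef, p) for _, p, coef, _ in rows]
matches = [flag == expected for flag, (_, _, _, expected) in zip(flags, rows)]
kept = [name for (name, _, _, _), flag in zip(rows, flags) if flag == 1]
print(kept)
```

Under this reading, x7 and x8 are dropped for lack of significance at the 0.05 threshold, while x9 is dropped because its coefficient is negative.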
If you use shap-select in your research, please cite our paper:
@misc{kraev2024shapselectlightweightfeatureselection,
title={Shap-Select: Lightweight Feature Selection Using SHAP Values and Regression},
author={Egor Kraev and Baran Koseoglu and Luca Traverso and Mohammed Topiwalla},
year={2024},
eprint={2410.06815},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2410.06815},
}