shap-select

Heuristic for quick feature selection for tabular regression/classification using Shapley values

Version 0.1.2 (PyPI), 3 maintainers

Overview

shap-select implements a heuristic for fast feature selection, for tabular regression and classification models.

The basic idea is to run a linear or logistic regression of the target on the Shapley values of the original features, using the validation set, then discard the features with negative coefficients and rank or filter the rest by statistical significance. For motivation and details, refer to our research paper or see the example notebook.
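
To make the idea concrete, here is a minimal sketch of the regression step for the regression task only. It is not the package's implementation: the function name `rank_shap_features`, the hand-written OLS, the normal approximation for p-values, and the synthetic "model" (a fixed weight vector whose Shapley values are computed as weight times centered feature, which holds for a linear model with independent features) are all assumptions made for illustration.

```python
import math
import numpy as np

def rank_shap_features(shap_values, y, names, threshold=0.05):
    """Regress y on Shapley-value columns (OLS) and flag each feature."""
    n, k = shap_values.shape
    design = np.column_stack([np.ones(n), shap_values])  # intercept first
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - k - 1)                  # residual variance
    std_err = np.sqrt(sigma2 * np.diag(xtx_inv))
    t_values = beta / std_err
    rows = []
    for j, name in enumerate(names, start=1):             # index 0 is the intercept
        # Two-sided p-value; normal approximation to the t distribution (assumption)
        p = math.erfc(abs(t_values[j]) / math.sqrt(2))
        if beta[j] < 0:
            selected = -1                                 # negative coefficient: discard
        elif p < threshold:
            selected = 1                                  # significant: keep
        else:
            selected = 0                                  # not significant: drop
        rows.append({"feature": name, "t_value": float(t_values[j]),
                     "p_value": p, "coefficient": float(beta[j]),
                     "selected": selected})
    return sorted(rows, key=lambda r: r["t_value"], reverse=True)

# Synthetic validation data: x3 is irrelevant, and x2 has a negative true effect.
rng = np.random.default_rng(0)
X_val = rng.normal(size=(500, 4))
y_val = 2.0 * X_val[:, 0] + X_val[:, 1] - 0.5 * X_val[:, 2] + rng.normal(size=500)

# Hypothetical fitted linear model that learned the wrong sign for x2.
model_weights = np.array([2.0, 1.0, 0.5, 0.3])
shap_vals = model_weights * (X_val - X_val.mean(axis=0))

rows = rank_shap_features(shap_vals, y_val, ["x0", "x1", "x2", "x3"])
for row in rows:
    print(row)
```

In this toy setup the model's contribution for x2 points the wrong way on the validation set, so its regression coefficient comes out negative and the feature is discarded, while x0 and x1 are kept.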

Earlier packages using Shapley values for feature selection exist; the advantages of this one are:

  • Regression on the validation set to combat overfitting
  • Only a single fit of the original model needed
  • A single intuitive hyperparameter for feature selection: statistical significance
  • Bonferroni correction for multiclass classification
  • Addresses collinearity of (Shapley value) features by repeated (linear/logistic) regression
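
As an illustration of the Bonferroni point above, the sketch below shows one common way to combine per-class p-values for a single feature: take the smallest one, multiply it by the number of classes, and cap the result at 1. Whether shap-select aggregates in exactly this way is an assumption here; the helper `bonferroni_adjust` and the p-values are hypothetical.

```python
def bonferroni_adjust(p_values_per_class):
    """Smallest per-class p-value, scaled by the number of classes, capped at 1."""
    n_classes = len(p_values_per_class)
    return min(1.0, n_classes * min(p_values_per_class))

# Hypothetical p-values of two features in three per-class regressions.
per_class_p = {
    "x1": [0.004, 0.30, 0.62],
    "x2": [0.03, 0.20, 0.50],
}
threshold = 0.05

adjusted = {name: bonferroni_adjust(p) for name, p in per_class_p.items()}
for name, p in adjusted.items():
    verdict = "significant" if p < threshold else "not significant"
    print(f"{name}: adjusted p = {p:.3f} ({verdict})")
```

Without the correction, x2 would pass a 0.05 threshold on the strength of one class alone; with it, the feature has to clear a stricter bar because three tests were run.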

Usage

from shap_select import shap_select
# Here model is any model supported by the shap library, fitted on a different (train) dataset
# Task can be regression, binary, or multiclass
selected_features_df = shap_select(model, X_val, y_val, task="multiclass", threshold=0.05)
   feature name    t-value  stat.significance  coefficient  selected
0            x5  20.211299           0.000000     1.052030         1
1            x4  18.315144           0.000000     0.952416         1
2            x3   6.835690           0.000000     1.098154         1
3            x2   6.457140           0.000000     1.044842         1
4            x1   5.530556           0.000000     0.917242         1
5            x6   2.390868           0.016827     1.497983         1
6            x7   0.901098           0.367558     2.865508         0
7            x8   0.563214           0.573302     1.933632         0
8            x9  -1.607814           0.107908    -4.537098        -1
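
The sample output suggests how the `selected` column can be read: 1 for features that pass the significance threshold, 0 for features with a positive but insignificant coefficient, and -1 for features with a negative coefficient. That reading is inferred from the rows shown, not from the package documentation. The sketch below applies it to the same numbers, held in plain tuples so that it runs without the package installed.

```python
# Rows copied from the sample output:
# (feature name, t-value, stat.significance, coefficient, selected)
rows = [
    ("x5", 20.211299, 0.000000, 1.052030, 1),
    ("x4", 18.315144, 0.000000, 0.952416, 1),
    ("x3", 6.835690, 0.000000, 1.098154, 1),
    ("x2", 6.457140, 0.000000, 1.044842, 1),
    ("x1", 5.530556, 0.000000, 0.917242, 1),
    ("x6", 2.390868, 0.016827, 1.497983, 1),
    ("x7", 0.901098, 0.367558, 2.865508, 0),
    ("x8", 0.563214, 0.573302, 1.933632, 0),
    ("x9", -1.607814, 0.107908, -4.537098, -1),
]

# Assumed meaning of the codes, inferred from the rows above.
kept = [name for name, _, _, _, sel in rows if sel == 1]
insignificant = [name for name, _, _, _, sel in rows if sel == 0]
negative = [name for name, _, _, _, sel in rows if sel == -1]

print("keep:", kept)
print("not significant:", insignificant)
print("negative coefficient:", negative)
```

With the returned DataFrame, the equivalent filter would be a boolean mask on the `selected` column, for example `selected_features_df[selected_features_df["selected"] == 1]`.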

Citation

If you use shap-select in your research, please cite our paper:

@misc{kraev2024shapselectlightweightfeatureselection,
      title={Shap-Select: Lightweight Feature Selection Using SHAP Values and Regression}, 
      author={Egor Kraev and Baran Koseoglu and Luca Traverso and Mohammed Topiwalla},
      year={2024},
      eprint={2410.06815},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.06815}, 
}
