You're Invited:Meet the Socket Team at BlackHat and DEF CON in Las Vegas, Aug 4-6.RSVP →

Book a Demo Install Sign in

ksfeatureselector

Package Overview

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

ksfeatureselector

A robust and flexible Python package designed for selecting the most discriminatory features in both **binary and multi-class classification problems** using the Kolmogorov-Smirnov (K-S) test. It provides advanced options for handling multi-class scenarios and aggregating p-values.

0.2.0

PyPI

Maintainers: 1

KSFeatureSelector

KSFeatureSelector is a robust and flexible Python package designed for selecting the most discriminatory features in both binary and multi-class classification problems using the Kolmogorov-Smirnov (K-S) test. It provides advanced options for handling multi-class scenarios and aggregating p-values.

Features

Uses the K-S test to rank features by their ability to separate classes.
Handles target variables with more than two categories (up to 10 classes internally).
Flexible Comparison Strategies:
- pairwise: Performs K-S tests between every unique pair of classes.
- one-vs-rest: Compares each class against all other classes combined.
Multiple P-Value Aggregation Methods:
- fisher: Uses Fisher's combined probability test (default, generally recommended).
- min: Takes the minimum p-value from all comparisons for a feature.
- max: Takes the maximum p-value from all comparisons for a feature.
Scikit-learn Style API: Offers a class-based interface (KSFeatureSelector with fit, transform) for seamless integration into machine learning pipelines.
Convenience Function: Provides a simple select_ks_features wrapper for quick, one-off feature selection.
Robust Validation & Warnings: Includes comprehensive input validation and issues UserWarning for data quality issues, such as categories with too few observations or insufficient samples for K-S tests.
Pure Python: Built using pandas, scipy, and numpy.

Installation

.. code-block:: bash

pip install ksfeatureselector

For local installation:

.. code-block:: bash

pip install -e .

Usage

.. code-block:: python

from ksfeatureselector import select_ks_features

significant_features = select_ks_features( df, x_cols, y_var, top_p=0.01, aggregation_method='one-vs-rest', p_value_aggregation_method='min' ) print(f"Significant features (one-vs-rest, min p-value <= 0.01): {significant_features}")

Example 3: Select top 3 features using 'pairwise' comparison

and 'max' p-value aggregation

top_3_features_max_agg = select_ks_features( df, x_cols, y_var, top_n=3, aggregation_method='pairwise', p_value_aggregation_method='max' ) print(f"Top 3 features (pairwise, max p-value): {top_3_features_max_agg}")

Arguments

df (pd.DataFrame):
The input DataFrame containing feature columns and the binary target column.
x_cols (List[str]):
A list of column names in df representing the features you want to evaluate.
y_var (str):
The name of the column in df representing the binary target variable (0/1 or similar).
top_p (float, optional):
If provided, only features with a K-S test p-value less than top_p will be selected.
top_n (int, optional):
If provided, the top n features with the lowest p-values will be selected.

.. note::

You can use either top_p or top_n, or both. If both are given, the function will apply top_p first, and then take the top n from that filtered list.

License

MIT License

Author

V Subrahmanya Raghu Ram Kishore Parupudi Email: pvsrrkishore@gmail.com

FAQs

What is ksfeatureselector?

Is ksfeatureselector well maintained?

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

ksfeatureselector

KSFeatureSelector

Features

Installation

Usage

Example 3: Select top 3 features using 'pairwise' comparison

and 'max' p-value aggregation

Arguments

License

Author

Related posts

npm Phishing Email Targets Developers with Typosquatted Domain

Knip Hits 500 Releases with v5.62.0, Improving TypeScript Config Detection and Plugin Integrations