
Research
PyPI Package Disguised as Instagram Growth Tool Harvests User Credentials
A deceptive PyPI package posing as an Instagram growth tool collects user credentials and sends them to third-party bot services.
A robust and flexible Python package designed for selecting the most discriminatory features in both **binary and multi-class classification problems** using the Kolmogorov-Smirnov (K-S) test. It provides advanced options for handling multi-class scenarios and aggregating p-values.
KSFeatureSelector
is a robust and flexible Python package designed for selecting the most discriminatory features in both binary and multi-class classification problems using the Kolmogorov-Smirnov (K-S) test. It provides advanced options for handling multi-class scenarios and aggregating p-values.
pairwise
: Performs K-S tests between every unique pair of classes.one-vs-rest
: Compares each class against all other classes combined.fisher
: Uses Fisher's combined probability test (default, generally recommended).min
: Takes the minimum p-value from all comparisons for a feature.max
: Takes the maximum p-value from all comparisons for a feature.KSFeatureSelector
with fit
, transform
) for seamless integration into machine learning pipelines.select_ks_features
wrapper for quick, one-off feature selection.UserWarning
for data quality issues, such as categories with too few observations or insufficient samples for K-S tests.pandas
, scipy
, and numpy
... code-block:: bash
pip install ksfeatureselector
For local installation:
.. code-block:: bash
pip install -e .
.. code-block:: python
from ksfeatureselector import select_ks_features
significant_features = select_ks_features( df, x_cols, y_var, top_p=0.01, aggregation_method='one-vs-rest', p_value_aggregation_method='min' ) print(f"Significant features (one-vs-rest, min p-value <= 0.01): {significant_features}")
top_3_features_max_agg = select_ks_features( df, x_cols, y_var, top_n=3, aggregation_method='pairwise', p_value_aggregation_method='max' ) print(f"Top 3 features (pairwise, max p-value): {top_3_features_max_agg}")
df (pd.DataFrame
):
The input DataFrame containing feature columns and the binary target column.
x_cols (List[str]
):
A list of column names in df
representing the features you want to evaluate.
y_var (str
):
The name of the column in df
representing the binary target variable (0/1 or similar).
top_p (float
, optional):
If provided, only features with a K-S test p-value less than top_p
will be selected.
top_n (int
, optional):
If provided, the top n
features with the lowest p-values will be selected.
.. note::
You can use either top_p
or top_n
, or both. If both are given, the function will apply top_p
first,
and then take the top n
from that filtered list.
MIT License
V Subrahmanya Raghu Ram Kishore Parupudi Email: pvsrrkishore@gmail.com
FAQs
A robust and flexible Python package designed for selecting the most discriminatory features in both **binary and multi-class classification problems** using the Kolmogorov-Smirnov (K-S) test. It provides advanced options for handling multi-class scenarios and aggregating p-values.
We found that ksfeatureselector demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
A deceptive PyPI package posing as an Instagram growth tool collects user credentials and sends them to third-party bot services.
Product
Socket now supports pylock.toml, enabling secure, reproducible Python builds with advanced scanning and full alignment with PEP 751's new standard.
Security News
Research
Socket uncovered two npm packages that register hidden HTTP endpoints to delete all files on command.