
auto-corr-feature-selection


Version 0.1.3 on PyPI

AutoCorrFeatureSelection

Automatically select the most relevant features based on correlation.

How it works

The AutoCorrFeatureSelection class uses correlation analysis to select relevant features from a dataset automatically. It works in three steps:

  1. Correlation Matrix:

The first step is to calculate the correlation matrix, which holds the pairwise correlation between every pair of features in the dataset. The matrix shows how strongly the features are related to one another. The example below uses the iris columns:

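|              | sepal.length | sepal.width | petal.length | petal.width | variety |
|--------------|--------------|-------------|--------------|-------------|---------|
| sepal.length | 1.0          | -0.11       | 0.87         | 0.81        | 0.72    |
| sepal.width  | -0.11        | 1.0         | -0.42        | -0.36       | -0.42   |
| petal.length | 0.87         | -0.42       | 1.0          | 0.96        | 0.94    |
| petal.width  | 0.81         | -0.36       | 0.96         | 1.0         | 0.95    |
| variety      | 0.72         | -0.42       | 0.94         | 0.95        | 1.0     |
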
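
As a rough illustration of what "pairwise correlation" means here, the sketch below computes Pearson coefficients for every pair of columns in plain Python. It is not the package's implementation: the helper names `pearson` and `correlation_matrix` and the toy data are made up for this example, and the choice of Pearson correlation is an assumption, since the README does not say which coefficient is used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {col_a: {col_b: r}} for every pair of columns (hypothetical helper)."""
    return {
        a: {b: pearson(xs, ys) for b, ys in columns.items()}
        for a, xs in columns.items()
    }


# Toy data, made up for illustration (these are not the iris values).
toy = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # rises with "a"
    "c": [4.0, 3.0, 2.0, 1.0],  # falls as "a" rises
}
matrix = correlation_matrix(toy)
```

With a pandas DataFrame, `df.corr()` produces the same kind of matrix in one call.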
  2. Threshold-based Selection:

Next, the class applies a threshold (for example 0.85) to the correlation matrix and keeps only the entries above it. Column pairs that exceed the threshold are considered highly correlated and may contain redundant or similar information. With a threshold of 0.85, and with the self-correlations on the diagonal left out, the matrix above reduces to:

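|              | sepal.length | sepal.width | petal.length | petal.width | variety |
|--------------|--------------|-------------|--------------|-------------|---------|
| sepal.length |              |             | 0.87         |             |         |
| sepal.width  |              |             |              |             |         |
| petal.length | 0.87         |             |              | 0.96        | 0.94    |
| petal.width  |              |             | 0.96         |              | 0.95    |
| variety      |              |             | 0.94         | 0.95        |         |
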
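
The thresholding step can be sketched as follows, using the correlation values from the first table. This is only an illustration under stated assumptions: `pairs_above_threshold` is a hypothetical helper, not part of the package, and it assumes a strict `>` comparison on the signed values with the diagonal skipped. The README does not say whether the package compares absolute values.

```python
# Correlation values copied from the correlation matrix in step 1.
COLUMNS = ["sepal.length", "sepal.width", "petal.length", "petal.width", "variety"]
VALUES = [
    [1.0, -0.11, 0.87, 0.81, 0.72],
    [-0.11, 1.0, -0.42, -0.36, -0.42],
    [0.87, -0.42, 1.0, 0.96, 0.94],
    [0.81, -0.36, 0.96, 1.0, 0.95],
    [0.72, -0.42, 0.94, 0.95, 1.0],
]


def pairs_above_threshold(columns, values, threshold):
    """Return {column: {other: r}} for off-diagonal entries with r > threshold.

    Assumption: signed values and a strict comparison; the diagonal is skipped.
    """
    result = {name: {} for name in columns}
    for i, row_name in enumerate(columns):
        for j, col_name in enumerate(columns):
            if i != j and values[i][j] > threshold:
                result[row_name][col_name] = values[i][j]
    return result


high = pairs_above_threshold(COLUMNS, VALUES, threshold=0.85)
```

Here `high["sepal.width"]` is empty because none of its correlations exceed 0.85, which matches the empty row in the thresholded table.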
  3. Selected Columns and Relationships:

The selected columns are then shown in a diagram that links each pair of highly correlated features, which makes it easy to see how these features are interconnected.

[Figure: iris_corr_diagram]
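
The diagram itself is not reproduced here, but the relationships it shows can be represented as a small undirected graph. The sketch below is one possible representation, not the package's own code: `build_adjacency` and `connected_group` are hypothetical helpers, and the edge list is taken from the column pairs above the 0.85 threshold in step 2.

```python
# Column pairs above the 0.85 threshold, taken from step 2.
EDGES = [
    ("sepal.length", "petal.length", 0.87),
    ("petal.length", "petal.width", 0.96),
    ("petal.length", "variety", 0.94),
    ("petal.width", "variety", 0.95),
]


def build_adjacency(edges):
    """Build an undirected adjacency map {node: {neighbour: correlation}}."""
    adjacency = {}
    for a, b, r in edges:
        adjacency.setdefault(a, {})[b] = r
        adjacency.setdefault(b, {})[a] = r
    return adjacency


def connected_group(adjacency, start):
    """Return every column reachable from `start` through high-correlation links."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency.get(node, {}):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


adjacency = build_adjacency(EDGES)
group = connected_group(adjacency, "petal.length")
```

In this representation, sepal.width never appears as a node because it has no correlation above the threshold, while the other four columns form a single connected group.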

By following these steps, the AutoCorrFeatureSelection class automates correlation-based feature selection, so you can identify and focus on the most informative, non-redundant features in your dataset.

Example

Examples can be found in examples/.


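# df is a pandas DataFrame; AutoCorrFeatureSelection comes from the installed
# package (see examples/ for the exact import)

# set up auto correlation
auto_corr = AutoCorrFeatureSelection(df)

# select columns using a correlation threshold of 0.85
selected_columns = auto_corr.select_columns_above_threshold(threshold=0.85)
filtered_df = df[selected_columns]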
