Security News
Research
Data Theft Repackaged: A Case Study in Malicious Wrapper Packages on npm
The Socket Research Team breaks down a malicious wrapper package that uses obfuscation to harvest credentials and exfiltrate sensitive data.
Edamame is inspired by packages such as pandas-profiling, pycaret, and yellowbrick. The goal of Edamame is to provide user-friendly functions for conducting exploratory data analysis (EDA) on datasets, as well as for training and analyzing batteries of models for regression or classification problems.
To install the package, run:
pip install edamame
The edamame package works correctly inside a Jupyter notebook. You can find the documentation of the package on the edamame-documentation page.
The package consists of three modules: eda, which performs exploratory data analysis, and regressor and classifier, which handle the training of machine learning models for regression and classification, respectively. For usage examples, see the examples folder in the repository.
import edamame.eda as eda
The eda module provides a wide range of functions for performing exploratory data analysis (EDA) on datasets. With this module you can explore and manipulate your data, compute descriptive statistics, run correlation analysis, and prepare your data for machine learning. The eda module offers the following functionalities:
Data Exploration and Manipulation functions:
Descriptive Statistics functions:
Correlation Analysis functions:
Useful functions:
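As a rough illustration of what the descriptive statistics and correlation analysis steps compute, the sketch below uses only the Python standard library. It does not call edamame: the helper names `describe` and `pearson` and the sample data are hypothetical, chosen for illustration only.

```python
import math
import statistics


def describe(values):
    """Summarize one numeric column (hypothetical helper, not an edamame function)."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up sample columns for illustration.
height = [1.0, 2.0, 3.0, 4.0, 5.0]
weight = [2.0, 4.1, 5.9, 8.2, 9.8]

summary = describe(height)
r = pearson(height, weight)
print(summary)
print(f"correlation: {r:.4f}")
```

A coefficient close to 1 or -1 indicates a strong linear relationship between two columns, which is the kind of signal a correlation analysis step is meant to surface before modeling.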
from edamame.regressor import TrainRegressor, regression_metrics
The TrainRegressor class is designed to be used as a pipeline for training and handling regression models.
The class provides several methods for fitting different regression models, computing model metrics, saving and loading models, and using AutoML to select the best model based on performance metrics. These methods include:
After saving a model with the save_model method, we can load it again with the load_model function of the eda module and evaluate its performance on new data using the regression_metrics function.
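To make "evaluating performance on new data" concrete, here is a minimal sketch of the scores commonly reported for regression models (MSE, RMSE, MAE, and R²), written with the standard library only. The exact set of metrics that regression_metrics reports is not listed here, so this selection is an assumption, and `regression_scores` is a hypothetical helper rather than part of edamame.

```python
import math


def regression_scores(y_true, y_pred):
    """Return common regression scores for two equal-length sequences.

    Hypothetical helper for illustration; not edamame's regression_metrics.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Made-up targets and predictions standing in for a held-out dataset.
scores = regression_scores([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(scores)
```

Computing these scores on data the model did not see during training is what separates an honest performance estimate from a measure of how well the model memorized its training set.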
from edamame.regressor import RegressorDiagnose
The RegressorDiagnose class provides several methods for diagnosing regression models and analyzing their performance. These methods include:
from sklearn.datasets import make_regression
from edamame.regressor import TrainRegressor
import pandas as pd
import edamame.eda as eda
from edamame.regressor import RegressorDiagnose
X, y = make_regression(n_samples=1000, n_features=5, n_targets=1, random_state=42)
X = pd.DataFrame(X, columns=["f1", "f2", "f3", "f4", "f5"])
y = pd.DataFrame(y, columns=["y"])
X_train, y_train, X_test, y_test = eda.setup(X, y)
X_train_s = eda.scaling(X_train)
X_test_s = eda.scaling(X_test)
regressor = TrainRegressor(X_train_s, y_train, X_test_s, y_test)
rf = regressor.random_forest()
regressor.model_metrics()
diagnose = RegressorDiagnose(X_train_s, y_train, X_test_s, y_test)
diagnose.random_forest_fi(model=rf)
diagnose.prediction_error(model=rf)
from edamame.classifier import TrainClassifier
The TrainClassifier class is designed to be used as a pipeline for training and handling classification models.
The class provides several methods for fitting different classification models, computing model metrics, saving and loading models, and using AutoML to select the best model based on performance metrics. These methods include:
After saving a model with the save_model method, we can load it again with the load_model function of the eda module and evaluate its performance on new data using the classifier_metrics function.
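The save, load, and evaluate cycle can be sketched with the standard library alone. This assumes the saved file is a pickle, which the svm.pkl file name in the example suggests; how edamame actually serializes models is not shown here. The toy nearest-centroid model, the helper names, and the data are all hypothetical stand-ins.

```python
import os
import pickle
import tempfile


def fit_centroids(X, y):
    """Toy nearest-centroid 'model': returns {label: mean feature vector}."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, X):
    """Assign each row the label of the closest centroid."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda lab: sqdist(centroids[lab], row)) for row in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Train on made-up data.
centroids = fit_centroids(
    [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0]],
    [0, 0, 1, 1],
)

# Save to disk and load back, as a stand-in for save_model / load_model.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    with open(path, "wb") as fh:
        pickle.dump(centroids, fh)
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

# Evaluate the restored model on data it has not seen.
X_new, y_new = [[0.1, 0.2], [1.0, 0.9], [0.8, 1.2]], [0, 1, 1]
predictions = predict(restored, X_new)
restored_accuracy = accuracy(y_new, predictions)
print(predictions, restored_accuracy)
```

Unpickling can execute arbitrary code, so a model file should only be loaded when it comes from a source you trust.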
from edamame.classifier import classifier_metrics
from edamame.classifier import TrainClassifier
from sklearn import datasets
import pandas as pd  # needed for the pd.DataFrame calls below
import edamame.eda as eda
iris = datasets.load_iris()
X = iris.data
X = pd.DataFrame(X, columns=iris.feature_names)
y = iris.target
y = pd.DataFrame(y, columns=['y'])
X_train, y_train, X_test, y_test = eda.setup(X,y)
X_train_s = eda.scaling(X_train)
X_test_s = eda.scaling(X_test)
classifier = TrainClassifier(X_train_s, y_train, X_test_s, y_test)
models = classifier.auto_ml()
svm = classifier.svm()
classifier.model_metrics(model_name="svm")
classifier.save_model(model_name="svm")
svm_upload = eda.load_model(path="svm.pkl")
classifier_metrics(svm_upload, X_train_s, y_train)
FAQs
Exploratory data analysis tools
We found that edamame demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.