sagemaker-scikit-learn-extension
Open source library extension of scikit-learn for Amazon SageMaker.
.. image:: https://img.shields.io/badge/License-Apache%202.0-blue.svg
   :target: https://opensource.org/licenses/Apache-2.0
   :alt: License

.. image:: https://img.shields.io/pypi/v/sagemaker-scikit-learn-extension.svg
   :target: https://pypi.python.org/pypi/sagemaker-scikit-learn-extension
   :alt: Latest Version

.. image:: https://img.shields.io/badge/code_style-black-000000.svg
   :target: https://github.com/python/black
   :alt: Code style: black

SageMaker Scikit-Learn Extension is a Python module for machine learning built on top of `scikit-learn <https://scikit-learn.org>`_.
This project contains standalone scikit-learn estimators and additional tools to support SageMaker Autopilot. Many of the additional estimators are based on existing scikit-learn estimators.
To install:

::

    # install from pip
    pip install sagemaker-scikit-learn-extension

To use the I/O functionality in the :code:`sagemaker_sklearn_extension.externals` module, you also need to install version 0.7 of the :code:`mlio` package. The :code:`mlio` package is currently available only through conda.

To install :code:`mlio`:

::

    # install mlio
    conda install -c mlio -c conda-forge mlio-py==0.7

For more information about mlio, see https://github.com/awslabs/ml-io.
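Because the pip and conda installs are separate steps, it can help to confirm that both packages are visible to the interpreter before using the I/O helpers. The following is a minimal sketch using only the standard library. The helper name ``missing_modules`` is hypothetical, and the import name ``mlio`` for the ``mlio-py`` conda package is an assumption.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that the interpreter cannot find."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: `pip install sagemaker-scikit-learn-extension` provides the
    # `sagemaker_sklearn_extension` module, and the conda package `mlio-py`
    # provides a module named `mlio`.
    missing = missing_modules(["sagemaker_sklearn_extension", "mlio"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All modules found.")
```

The check only looks up module specs and does not import the packages, so it will not catch a version mismatch such as an ``mlio`` release other than 0.7.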
You can also install from source by cloning this repository and running a pip install command in the root directory of the repository:

::

    # install from source
    git clone https://github.com/aws/sagemaker-scikit-learn-extension.git
    cd sagemaker-scikit-learn-extension
    pip install -e .

SageMaker scikit-learn extension supports Unix/Linux and Mac.
SageMaker scikit-learn extension is tested on:
This library is licensed under the Apache 2.0 License.
We welcome contributions from developers of all experience levels.
The SageMaker scikit-learn extension is meant to be a repository for scikit-learn estimators that don't meet scikit-learn's stringent inclusion criteria.
We recommend using conda for development and testing.
To download conda, go to the `conda installation guide <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_.
SageMaker scikit-learn extension contains an extensive suite of unit tests.
You can install the libraries needed to run the tests by running :code:`pip install --upgrade .[test]` or, for Zsh users, :code:`pip install --upgrade .\[test\]`.
For unit tests, tox will use pytest to run the unit tests in a Python 3.7 interpreter. tox will also run flake8 and pylint for style checks.
conda is needed because of the dependency on mlio 0.7.
To run the tests with tox, run:

::

    tox

To use sagemaker-scikit-learn-extension on SageMaker, you can build the `sagemaker-scikit-learn-extension-container <https://github.com/aws/sagemaker-scikit-learn-container>`_.
* :code:`sagemaker_sklearn_extension.decomposition`

  * :code:`RobustPCA` dimension reduction for dense and sparse inputs

* :code:`sagemaker_sklearn_extension.externals`

  * :code:`AutoMLTransformer` utility class encapsulating feature and target transformation functionality used in SageMaker Autopilot
  * :code:`Header` utility class to manage the header and target columns in tabular data
  * :code:`read_csv_data` reads comma separated data and returns a numpy array (uses mlio)

* :code:`sagemaker_sklearn_extension.feature_extraction.date_time`

  * :code:`DateTimeVectorizer` convert datetime objects or strings into numeric features

* :code:`sagemaker_sklearn_extension.feature_extraction.sequences`

  * :code:`TSFlattener` convert strings of sequences into numeric features
  * :code:`TSFreshFeatureExtractor` compute row-wise time series features from a numpy array (uses tsfresh)

* :code:`sagemaker_sklearn_extension.feature_extraction.text`

  * :code:`MultiColumnTfidfVectorizer` convert collections of raw documents to a matrix of TF-IDF features

* :code:`sagemaker_sklearn_extension.impute`

  * :code:`RobustImputer` imputer for missing values with customizable mask_function and multi-column constant imputation
  * :code:`RobustMissingIndicator` binary indicator for missing values with customizable mask_function

* :code:`sagemaker_sklearn_extension.preprocessing`

  * :code:`BaseExtremeValuesTransformer` customizable transformer for columns that contain "extreme" values (columns that are heavy tailed)
  * :code:`LogExtremeValuesTransformer` stateful log transformer for columns that contain "extreme" values (columns that are heavy tailed)
  * :code:`NALabelEncoder` encoder for transforming labels to NA values
  * :code:`QuadraticFeatures` generate and add quadratic features to feature matrix
  * :code:`QuantileExtremeValuesTransformer` stateful quantiles transformer for columns that contain "extreme" values (columns that are heavy tailed)
  * :code:`ThresholdOneHotEncoder` encode categorical integer features as a one-hot numeric array, with optional restrictions on feature encoding
  * :code:`RemoveConstantColumnsTransformer` removes constant columns
  * :code:`RobustLabelEncoder` encode labels for seen and unseen labels
  * :code:`RobustStandardScaler` standardization for dense and sparse inputs
  * :code:`WOEEncoder` weight of evidence supervised encoder
  * :code:`SimilarityEncoder` encode categorical values based on their descriptive string

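To make one of the listed ideas concrete, the sketch below illustrates the general notion of weight-of-evidence encoding for a binary target in plain Python. It is not the library's :code:`WOEEncoder` implementation. The function name ``woe_table`` and the additive ``smoothing`` parameter are assumptions made for illustration only.

```python
import math
from collections import Counter


def woe_table(categories, targets, smoothing=0.5):
    """Map each category to ln(P(category | y=1) / P(category | y=0)).

    `smoothing` is added to every count so that a category seen with only
    one class does not lead to a division by zero or log(0).
    """
    pos = Counter(c for c, y in zip(categories, targets) if y == 1)
    neg = Counter(c for c, y in zip(categories, targets) if y == 0)
    levels = sorted(set(categories))
    pos_total = sum(pos.values()) + smoothing * len(levels)
    neg_total = sum(neg.values()) + smoothing * len(levels)
    return {
        level: math.log(
            ((pos[level] + smoothing) / pos_total)
            / ((neg[level] + smoothing) / neg_total)
        )
        for level in levels
    }


if __name__ == "__main__":
    # "a" appears only with the positive class, "b" only with the negative
    # class, so "a" gets a positive weight and "b" a negative one.
    print(woe_table(["a", "a", "b", "b"], [1, 1, 0, 0]))
```

A positive value means the category is relatively more common among positive examples, a negative value means the opposite, and a value near zero means the category carries little evidence either way.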