autoprognosis

A system for automating the design of predictive modeling pipelines tailored for clinical prognosis.

  • 0.1.21
  • PyPI

AutoPrognosis - A system for automating the design of predictive modeling pipelines tailored for clinical prognosis.

:key: Features

  • :rocket: Automatically learns ensembles of pipelines for classification, regression or survival analysis tasks.
  • :cyclone: Easy-to-extend, plugin-based architecture.
  • :fire: Interpretability and uncertainty quantification tools.
  • :adhesive_bandage: Data imputation using HyperImpute.
  • :zap: Build demonstrators using Streamlit.
  • :notebook: Python and R tutorials available.
  • :book: Read the docs

:rocket: Installation

Using pip

The library can be installed from PyPI using

$ pip install autoprognosis

or from source, using

$ pip install .

AutoPrognosis can use Redis as a backend to improve the performance and quality of the searches.

For that, install the redis-server package following the steps described on the official site.
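
Before starting a long search, it can help to confirm that the Redis server is reachable. The following is a minimal sketch using only the Python standard library. `redis_reachable` is a hypothetical helper, not part of AutoPrognosis, and it only checks that a TCP connection can be opened on the default Redis address (127.0.0.1:6379), not that the service behind it is actually Redis.

```python
import socket


def redis_reachable(host: str = "127.0.0.1", port: int = 6379, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `redis_reachable()` with no arguments checks the default local server; pass another host or port if Redis runs elsewhere.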

Environment variables

The library can be configured from a set of environment variables.

| Variable | Description |
| --- | --- |
| N_OPT_JOBS | Number of cores to use for hyperparameter search. Default: 1 |
| N_LEARNER_JOBS | Number of cores to use by individual learners. Default: all CPUs |
| REDIS_HOST | IP address for the Redis database. Default: 127.0.0.1 |
| REDIS_PORT | Redis port. Default: 6379 |

Example: export N_OPT_JOBS=2 to use 2 cores for the hyperparameter search (the shell does not allow spaces around the = sign).
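
From Python, the same variables can be set through `os.environ`. The sketch below also shows how the documented defaults could be resolved. `resolve_settings` is a hypothetical helper for illustration only; how AutoPrognosis reads these variables internally is not described here, so setting them before importing the library is the conservative choice (an assumption).

```python
import os


def resolve_settings(env=None):
    """Resolve the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "N_OPT_JOBS": int(env.get("N_OPT_JOBS", 1)),
        # "all cpus" is approximated with os.cpu_count() (assumption).
        "N_LEARNER_JOBS": int(env.get("N_LEARNER_JOBS", os.cpu_count() or 1)),
        "REDIS_HOST": env.get("REDIS_HOST", "127.0.0.1"),
        "REDIS_PORT": int(env.get("REDIS_PORT", 6379)),
    }


# Environment values are always strings, so the number is written as "2".
os.environ["N_OPT_JOBS"] = "2"
print(resolve_settings()["N_OPT_JOBS"])  # 2
```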

:boom: Sample Usage

Advanced Python tutorials can be found in the Python tutorials section.

R examples can be found in the R tutorials section.

List the available classifiers

from autoprognosis.plugins.prediction.classifiers import Classifiers
print(Classifiers().list_available())

Create a study for classifiers

from sklearn.datasets import load_breast_cancer

from autoprognosis.studies.classifiers import ClassifierStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator


X, Y = load_breast_cancer(return_X_y=True, as_frame=True)

df = X.copy()
df["target"] = Y

study_name = "example"

study = ClassifierStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
)
model = study.fit()

# Predict the probabilities of each class using the model
model.predict_proba(X)

(Advanced) Customize the study for classifiers

from pathlib import Path

from sklearn.datasets import load_breast_cancer

from autoprognosis.studies.classifiers import ClassifierStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_estimator


X, Y = load_breast_cancer(return_X_y=True, as_frame=True)

df = X.copy()
df["target"] = Y

workspace = Path("workspace")
study_name = "example"

study = ClassifierStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
    num_iter=100,  # how many trials to do for each candidate
    timeout=60,  # seconds
    classifiers=["logistic_regression", "lda", "qda"],
    workspace=workspace,
)

study.run()

output = workspace / study_name / "model.p"
model = load_model_from_file(output)

# <model> contains the optimal architecture, but the model is not trained yet. You need to call fit() to use it.
# This way, we can further benchmark the selected model on the training set.
metrics = evaluate_estimator(model, X, Y)

print(f"model {model.name()} -> {metrics['str']}")

# Train the model
model.fit(X, Y)

# Predict the probabilities of each class using the model
model.predict_proba(X)
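
The advanced example relies on `study.run()` having written the selected model to `workspace / study_name / "model.p"`. A small guard can give a clearer error when that file is missing, for instance if the search did not complete. `model_path` and `require_model` are hypothetical helpers, not part of AutoPrognosis; only the path layout is taken from the example above.

```python
from pathlib import Path


def model_path(workspace: Path, study_name: str) -> Path:
    """Build the path used in the example: <workspace>/<study_name>/model.p."""
    return Path(workspace) / study_name / "model.p"


def require_model(workspace: Path, study_name: str) -> Path:
    """Return the model path, or raise a descriptive error if the file is missing."""
    path = model_path(workspace, study_name)
    if not path.is_file():
        raise FileNotFoundError(f"No saved model at {path}; did study.run() finish?")
    return path


print(model_path(Path("workspace"), "example"))
```

The returned path can then be passed to `load_model_from_file` as in the example.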

List the available regressors

from autoprognosis.plugins.prediction.regression import Regression
print(Regression().list_available())

Create a Regression study

# third party
import pandas as pd

# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_regression
from autoprognosis.studies.regression import RegressionStudy

# Load dataset
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat",
    header=None,
    sep="\t",
)
last_col = df.columns[-1]
y = df[last_col]
X = df.drop(columns=[last_col])

df = X.copy()
df["target"] = y

# Search the model
study_name = "regression_example"
study = RegressionStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
)
model = study.fit()

# Predict using the model
model.predict(X)

(Advanced) Customize the Regression study

# stdlib
from pathlib import Path

# third party
import pandas as pd

# autoprognosis absolute
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_regression
from autoprognosis.studies.regression import RegressionStudy

# Load dataset
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat",
    header=None,
    sep="\t",
)
last_col = df.columns[-1]
y = df[last_col]
X = df.drop(columns=[last_col])

df = X.copy()
df["target"] = y

# Search the model
workspace = Path("workspace")
workspace.mkdir(parents=True, exist_ok=True)

study_name = "regression_example"
study = RegressionStudy(
    study_name=study_name,
    dataset=df,  # pandas DataFrame
    target="target",  # the label column in the dataset
    num_iter=10,  # how many trials to do for each candidate. Default: 50
    num_study_iter=2,  # how many outer iterations to do. Default: 5
    timeout=50,  # timeout for optimization for each regressor. Default: 600 seconds
    regressors=["linear_regression", "xgboost_regressor"],
    workspace=workspace,
)

study.run()

# Test the model
output = workspace / study_name / "model.p"

model = load_model_from_file(output)
# <model> contains the optimal architecture, but the model is not trained yet. You need to call fit() to use it.
# This way, we can further benchmark the selected model on the training set.

metrics = evaluate_regression(model, X, y)

print(f"Model {model.name()} score: {metrics['str']}")

# Train the model
model.fit(X, y)


# Predict using the model
model.predict(X)

List available survival analysis estimators

from autoprognosis.plugins.prediction.risk_estimation import RiskEstimation
print(RiskEstimation().list_available())

Create a Survival analysis study

# third party
import numpy as np
from pycox import datasets

# autoprognosis absolute
from autoprognosis.studies.risk_estimation import RiskEstimationStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator

df = datasets.gbsg.read_df()
df = df[df["duration"] > 0]

# Drop both the time-to-event and the event label so X contains features only
X = df.drop(columns=["duration", "event"])
T = df["duration"]
Y = df["event"]

eval_time_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]

study_name = "example_risks"

study = RiskEstimationStudy(
    study_name=study_name,
    dataset=df,
    target="event",
    time_to_event="duration",
    time_horizons=eval_time_horizons,
)

model = study.fit()

# Predict using the model
model.predict(X, eval_time_horizons)
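
The expression `np.linspace(T.min(), T.max(), 5)[1:-1]` used above yields three evaluation horizons, at 25%, 50% and 75% of the observed duration range: `linspace` produces five evenly spaced points including both endpoints, and the slice drops the endpoints. A pure-Python equivalent, under the hypothetical name `interior_horizons`, makes the arithmetic explicit:

```python
def interior_horizons(t_min: float, t_max: float, n_points: int = 5) -> list:
    """Evenly spaced points between t_min and t_max, excluding both endpoints."""
    step = (t_max - t_min) / (n_points - 1)
    return [t_min + i * step for i in range(1, n_points - 1)]


print(interior_horizons(0.0, 100.0))  # [25.0, 50.0, 75.0]
```

Dropping the endpoints avoids evaluating at the minimum and maximum observed durations themselves.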

(Advanced) Customize the Survival analysis study

# stdlib
from pathlib import Path

# third party
import numpy as np
from pycox import datasets

# autoprognosis absolute
from autoprognosis.studies.risk_estimation import RiskEstimationStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator

df = datasets.gbsg.read_df()
df = df[df["duration"] > 0]

# Drop both the time-to-event and the event label so X contains features only
X = df.drop(columns=["duration", "event"])
T = df["duration"]
Y = df["event"]

eval_time_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]

workspace = Path("workspace")
study_name = "example_risks"

study = RiskEstimationStudy(
    study_name=study_name,
    dataset=df,
    target="event",
    time_to_event="duration",
    time_horizons=eval_time_horizons,
    num_iter=10,
    num_study_iter=1,
    timeout=10,
    risk_estimators=["cox_ph", "survival_xgboost"],
    score_threshold=0.5,
    workspace=workspace,
)

study.run()

output = workspace / study_name / "model.p"

model = load_model_from_file(output)
# <model> contains the optimal architecture, but the model is not trained yet. You need to call fit() to use it.
# This way, we can further benchmark the selected model on the training set.

metrics = evaluate_survival_estimator(model, X, T, Y, eval_time_horizons)

print(f"Model {model.name()} score: {metrics['str']}")

# Train the model
model.fit(X, T, Y)

# Predict using the model
model.predict(X, eval_time_horizons)

:high_brightness: Tutorials

Plugins

AutoML

Building a demonstrator

:zap: Plugins

Imputation methods

from autoprognosis.plugins.imputers import Imputers

imputer = Imputers().get(<NAME>)

| Name | Description |
| --- | --- |
| hyperimpute | Iterative imputer using both regression and classification methods based on linear models, trees, XGBoost, CatBoost and neural nets |
| mean | Replace the missing values using the mean along each column with SimpleImputer |
| median | Replace the missing values using the median along each column with SimpleImputer |
| most_frequent | Replace the missing values using the most frequent value along each column with SimpleImputer |
| missforest | Iterative imputation method based on Random Forests using IterativeImputer and ExtraTreesRegressor |
| ice | Iterative imputation method based on regularized linear regression using IterativeImputer and BayesianRidge |
| mice | Multiple imputations based on ICE using IterativeImputer and BayesianRidge |
| softimpute | Low-rank matrix approximation via nuclear-norm regularization |
| EM | Iterative procedure which uses other variables to impute a value (Expectation), then checks whether that is the value most likely (Maximization) - EM imputation algorithm |
| gain | GAIN: Missing Data Imputation using Generative Adversarial Nets |
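
To make the simplest entries concrete, the following pure-Python sketch shows what column-wise mean imputation does conceptually. It is an illustration only and not the plugin's implementation, which according to the list above relies on SimpleImputer. `impute_mean` is a hypothetical name, and missing values are assumed to be encoded as NaN.

```python
import math
from statistics import mean


def impute_mean(rows):
    """Replace NaN entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # A column with no observed values is left as NaN.
        col_means.append(mean(observed) if observed else math.nan)
    return [
        [col_means[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


data = [
    [1.0, 10.0],
    [math.nan, 20.0],
    [3.0, math.nan],
]
print(impute_mean(data))  # [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
```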

Preprocessing methods

from autoprognosis.plugins.preprocessors import Preprocessors

preprocessor = Preprocessors().get(<NAME>)

| Name | Description |
| --- | --- |
| maxabs_scaler | Scale each feature by its maximum absolute value. MaxAbsScaler |
| scaler | Standardize features by removing the mean and scaling to unit variance. StandardScaler |
| feature_normalizer | Normalize samples individually to unit norm. Normalizer |
| normal_transform | Transform features using quantile information. QuantileTransformer |
| uniform_transform | Transform features using quantile information. QuantileTransformer |
| minmax_scaler | Transform features by scaling each feature to a given range. MinMaxScaler |
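
The scaling entries reduce to simple per-feature formulas. The sketch below applies them to a single feature column in pure Python to show the arithmetic. It is not the plugins' implementation, which uses the scikit-learn classes named above, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def maxabs_scale(column):
    """Divide by the maximum absolute value, so results lie in [-1, 1]."""
    scale = max(abs(v) for v in column) or 1.0
    return [v / scale for v in column]


def minmax_scale(column, low=0.0, high=1.0):
    """Map the column linearly onto [low, high]."""
    c_min, c_max = min(column), max(column)
    span = (c_max - c_min) or 1.0
    return [low + (v - c_min) / span * (high - low) for v in column]


def standardize(column):
    """Remove the mean and scale to unit variance (population standard deviation)."""
    mu, sigma = mean(column), pstdev(column) or 1.0
    return [(v - mu) / sigma for v in column]


feature = [-4.0, 0.0, 2.0]
print(maxabs_scale(feature))  # [-1.0, 0.0, 0.5]
print(minmax_scale(feature))
print(standardize(feature))
```

The `or 1.0` fallbacks avoid a division by zero for constant columns, a choice made for this sketch.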

Classification

from autoprognosis.plugins.prediction.classifiers import Classifiers

classifier = Classifiers().get(<NAME>)

| Name | Description |
| --- | --- |
| neural_nets | PyTorch-based neural net classifier |
| logistic_regression | LogisticRegression |
| catboost | Gradient boosting on decision trees - CatBoost |
| random_forest | A random forest classifier. RandomForestClassifier |
| tabnet | TabNet: Attentive Interpretable Tabular Learning |
| xgboost | XGBoostClassifier |

Survival Analysis

from autoprognosis.plugins.prediction.risk_estimation import RiskEstimation

predictor = RiskEstimation().get(<NAME>)

| Name | Description |
| --- | --- |
| survival_xgboost | XGBoost Survival Embeddings |
| loglogistic_aft | Log-Logistic AFT model |
| deephit | DeepHit: A Deep Learning Approach to Survival Analysis with Competing Risks |
| cox_ph | Cox's proportional hazard model |
| weibull_aft | Weibull AFT model |
| lognormal_aft | Log-Normal AFT model |
| coxnet | CoxNet is a Cox proportional hazards model also referred to as DeepSurv |

Regression

from autoprognosis.plugins.prediction.regression import Regression

regressor = Regression().get(<NAME>)

| Name | Description |
| --- | --- |
| tabnet_regressor | TabNet: Attentive Interpretable Tabular Learning |
| catboost_regressor | Gradient boosting on decision trees - CatBoost |
| random_forest_regressor | RandomForestRegressor |
| xgboost_regressor | XGBoost regressor |
| neural_nets_regression | PyTorch-based neural net regressor |
| linear_regression | LinearRegression |

Explainers

from autoprognosis.plugins.explainers import Explainers

explainer = Explainers().get(<NAME>)

| Name | Description |
| --- | --- |
| risk_effect_size | Feature importance using Cohen's distance between probabilities |
| lime | Lime: Explaining the predictions of any machine learning classifier |
| symbolic_pursuit | Symbolic Pursuit (Learning outside the black-box: the pursuit of interpretable models) |
| shap_permutation_sampler | SHAP Permutation Sampler |
| kernel_shap | SHAP KernelExplainer |
| invase | INVASE: Instance-wise Variable Selection |

Uncertainty

from autoprognosis.plugins.uncertainty import UncertaintyQuantification
model = UncertaintyQuantification().get(<NAME>)

| Name | Description |
| --- | --- |
| cohort_explainer | |
| conformal_prediction | |
| jackknife | |
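
All plugin families in this section are accessed the same way: a loader object exposes `list_available()` to enumerate names and `get(<NAME>)` to obtain a plugin. The toy registry below illustrates that pattern in plain Python. It is a sketch of the general idea, not AutoPrognosis code: `PluginRegistry`, `register` and the two example plugins are hypothetical, and whether the real `get` returns a class or an instance is not shown in this document.

```python
class PluginRegistry:
    """Minimal name -> plugin class registry."""

    def __init__(self):
        self._plugins = {}

    def register(self, name):
        """Class decorator that stores the decorated class under `name`."""
        def decorator(cls):
            self._plugins[name] = cls
            return cls
        return decorator

    def list_available(self):
        """Return the registered plugin names in sorted order."""
        return sorted(self._plugins)

    def get(self, name, **kwargs):
        """Instantiate the plugin registered under `name`."""
        if name not in self._plugins:
            raise KeyError(f"Unknown plugin {name!r}; available: {self.list_available()}")
        return self._plugins[name](**kwargs)


scalers = PluginRegistry()


@scalers.register("identity")
class Identity:
    def transform(self, values):
        return list(values)


@scalers.register("doubling")
class Doubling:
    def transform(self, values):
        return [2 * v for v in values]


print(scalers.list_available())  # ['doubling', 'identity']
print(scalers.get("doubling").transform([1, 2, 3]))  # [2, 4, 6]
```

Looking plugins up by name is what lets a study accept plain strings such as `classifiers=["logistic_regression", "lda", "qda"]`.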

:hammer: Test

After installing the library, the tests can be executed using pytest

$ pip install ".[testing]"
$ pytest -vxs -m "not slow"

Citing

If you use this code, please cite the associated paper:

@misc{https://doi.org/10.48550/arxiv.2210.12090,
  doi = {10.48550/ARXIV.2210.12090},
  url = {https://arxiv.org/abs/2210.12090},
  author = {Imrie, Fergus and Cebere, Bogdan and McKinney, Eoin F. and van der Schaar, Mihaela},
  keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {AutoPrognosis 2.0: Democratizing Diagnostic and Prognostic Modeling in Healthcare with Automated Machine Learning},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

References

  1. AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning
  2. Prognostication and Risk Factors for Cystic Fibrosis via Automated Machine Learning
  3. Cardiovascular Disease Risk Prediction using Automated Machine Learning: A Prospective Study of 423,604 UK Biobank Participants
