pip install astrape
Astrape : https://en.wikipedia.org/wiki/Astrape_and_Bronte
Astrape is a package that helps you organize machine learning projects. It is written mostly in PyTorch Lightning (https://pytorchlightning.ai).
This project is motivated by the need for packages that require only "human-language-level" code, aimed at people who are not familiar with machine learning or programming. Even with high-level frameworks such as PyTorch (and PyTorch Lightning) or TensorFlow, it is difficult for beginners to run even a simple perceptron because of the surrounding boilerplate, e.g., code for saving results, hyperparameter tuning, etc. Even someone with a strong background in learning theory faces a long journey to conduct an experiment without basic programming skills.
Astrape eliminates most of the low-level code in the machine learning workflow, so that every step of an experiment can be carried out in nearly human-level language.
I am currently writing lecture notes/slides about machine learning, with practice sessions using astrape. Most of the materials originate from machine learning courses held at SK Hynix (May 2021 ~ Oct 2021: 16 days) and Hyundai NGV (Feb 2022: 10 days), where I served as a teaching assistant developing the lecture materials and practice sessions.
Updates

astrape.models.models_lightning (~ Aug, 2022)

:zap: Astrape :zap:
"Project" and "Experiment" conspire to the soul of astrape. The term "Project" here refers to "all set of possible machine learning experiments for analyzing the given data". Wait, what is an experiment anyway? An experiment here means "a process of train/validation/test phase with certain random state acquired for all random operations such as splitting scheme, initialization scheme, etc.". "Experiment" is a collection of experiments with the same random state.
For stability's sake, you will be tempted to (and should) conduct several "Experiments" with different random states to verify that your data analysis is indeed accurate. Astrape organizes such "Experiments" in a way that makes this sanity-checking process succinct and reproducible.
A Project is defined as a set of experiments, with a different random seed allocated to each experiment. It performs the A to Z of a machine learning experiment.
A Project can visualize the data according to its domain type, e.g., image data or points.
Check details in the project tutorial.
You can visualize data using the `.plot_data()` method. Depending on the domain of the data, such as image data or simple points in Euclidean space, `.plot_data()` automatically visualizes the data and saves the resulting figure. If the data is image data, you should specify the argument `domain_type` as `"image"`; if the data is points in Euclidean space, specify `domain_type` as `"points"`. When the dimensionality of the point data is higher than 3, `.plot_data()` plots a 2D figure along the 2 principal axes.
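A minimal sketch of the call; the `Project` constructor and its arguments here are assumptions for illustration, and only `.plot_data()` and `domain_type` come from the description above:

```python
import numpy as np
from astrape import Project  # import path assumed

X = np.random.randn(200, 10)           # 10-D points: plotted on 2 principal axes
y = np.random.randint(0, 2, size=200)

project = Project(X=X, y=y, project_name="demo")  # hypothetical signature
project.plot_data(domain_type="points")           # use "image" for image data
```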
You can create experiments with different random states using the `.create_experiments()` method. Random seeds are generated according to the number of experiments you want to create.
You can set identical models across the created experiments using the `.set_models()` method. You should pass the type (class) of the model and its hyperparameters.
You can train the models sequentially using the `.fit_experiments()` method.
The event file for real-time tracking of the experiment via TensorBoard is saved in {path}/{project_name}/FIT.
Astrape uses the Rich progress bar (PyTorch Lightning's `RichProgressBar`).
You can save the fitted models using `.save_stacks()`, which basically performs `.save_stack()` for each of the created experiments. Read 3-5. Saving Models for details.
You can also save the fitted models using the `.save_project()` method.
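Putting these steps together, a minimal sketch; only the method names come from the text, while the `Project` constructor, the `n_experiments` argument, and the `MLP` model class and hyperparameters are assumptions:

```python
import numpy as np
from astrape import Project                      # import path assumed
from astrape.models.models_lightning import MLP  # pre-defined model; name assumed

X = np.random.randn(500, 10)
y = np.random.randint(0, 2, size=500)

project = Project(X=X, y=y, project_name="demo")  # hypothetical signature

project.create_experiments(n_experiments=5)             # one random seed per experiment
project.set_models(MLP, n_layers=2, n_hidden_units=64)  # same model in every experiment
project.fit_experiments()  # trains sequentially; track progress via TensorBoard
project.save_stacks()      # runs .save_stack() for each experiment
```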
Astrape supports plotting the following results (see the sketch after this list):
- The `.plot_identical_model_type()` method plots and saves a figure showing the performance of a specified model type (e.g., MLP, UNet) in each experiment. (Figure: AUC among different random states.)
- The `.plot_identical_model_structure()` method plots and saves a figure showing the performance of a specified model structure in each experiment. (Figure: AUC among different random states.)
- The `.plot_all_model_structures()` method plots and saves two figures showing the performances of all model structures created in the project: one is a line plot of performances among different random states, and the other is a box plot of performances of all model structures. (Figure: the upper image shows performances of all model structures; the lower image shows a box plot of all model structures trained in the project.)
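Continuing the Project sketch above, the calls could look like this; the argument names are guesses, since only the method names appear in the list:

```python
project.plot_identical_model_type(MLP)  # one model type across random states
project.plot_identical_model_structure(MLP, n_layers=2, n_hidden_units=64)  # one structure
project.plot_all_model_structures()     # line plot + box plot over all structures
```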
When using Astrape, we expect you to conduct all experiments inside the `experiment.Experiment` class. This class takes a number of parameters; you can check the details in the tutorial.
Once you declare an experiment, all random operations are governed by the random seed you passed as a parameter to the experiment. When the experiment is initialized (with a given random state) and the train/validation/test data are specified, you can declare models for the task.
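For instance, declaring an experiment could look like the following sketch; only the `experiment.Experiment` class and the role of the random seed come from the text, so the constructor arguments are assumptions:

```python
import numpy as np
from astrape.experiment import Experiment  # module path from the text

X = np.random.randn(500, 10)
y = np.random.randint(0, 2, size=500)

exp = Experiment(
    X=X, y=y,         # data to be split into train/validation/test
    test_size=0.2,    # assumed argument name
    random_state=42,  # governs all random operations in this experiment
)
```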
Declare a model using the `.set_model()` method. There are pre-defined models in `astrape.models.models_lightning` that are easy to use, but you can also set any `pl.LightningModule` or descendant of `sklearn.base.BaseEstimator`.
You can also declare scikit-learn models and their variants (e.g., XGBoost) using `.set_model()`. Astrape is compatible with scikit-learn and PyTorch Lightning modules.
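A sketch of both flavors; the pre-defined model name `MLP` and its hyperparameter names are assumptions, while `RandomForestClassifier` is a standard scikit-learn estimator:

```python
from sklearn.ensemble import RandomForestClassifier
from astrape.models.models_lightning import MLP  # model name assumed

# Pre-defined Lightning model: pass the class and its hyperparameters.
exp.set_model(MLP, n_layers=2, n_hidden_units=64)  # hyperparameter names assumed

# Any descendant of sklearn.base.BaseEstimator works as well.
exp.set_model(RandomForestClassifier, n_estimators=100)
```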
PyTorch Lightning uses a `Trainer` for training, validating, and testing models. You can specify it using the `.set_trainer()` method, with trainer configurations as parameters. If you don't, default values are used for the `Trainer`. Check the tutorial for details.
You can fit the model using the `.fit()` method. If you didn't specify a `Trainer` in the previous step, default settings are used for fitting. Otherwise, you can specify the `Trainer` implicitly by passing the trainer configurations as parameters to `.fit()`.
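For example, a sketch of both options; the keyword arguments shown are ordinary `pl.Trainer` settings, and how astrape forwards them is an assumption:

```python
# Option 1: configure the Trainer explicitly, then fit.
exp.set_trainer(max_epochs=50, accelerator="cpu")
exp.fit()

# Option 2: skip .set_trainer() and pass the configuration to .fit() directly.
exp.fit(max_epochs=50, accelerator="cpu")
```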
The training and validation process of LightningModule-based models is visualized in real time using TensorBoard.
The `Experiment` class has `.stack` as an attribute. If `.stack_models` is set to `True`, fitted models are automatically saved to `.stack`. If `.stack_models` is set to `False`, astrape stops stacking fitted models, but it still saves the most recently fitted model, i.e., it keeps a memory of one fit. You can toggle `.stack_models` using the `.toggle_stack_models()` method.
Plus, you can check which model in the stack has the best performance using `.best_ckpt_in_stack()`.
You can save the current model using the `.save_ckpt()` method, or save the models in the stack using the `.save_stack()` method. After `.save_stack()`, `.stack` is flushed.
(Figure: an example of a `.stack`; `.stack` is flushed after `.save_stack()`.)
(Figure: an example of the resulting file tree.)
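A hedged sketch of the stacking workflow described above, using only the attribute and method names from the text; the return values are assumptions:

```python
exp.fit()                        # with .stack_models == True, the fitted model is stacked

exp.toggle_stack_models()        # flip .stack_models (True -> False here)
exp.fit()                        # not stacked, but the last fit is still remembered
exp.toggle_stack_models()        # back to stacking

best = exp.best_ckpt_in_stack()  # best-performing model currently in the stack

exp.save_ckpt()                  # save only the current model
exp.save_stack()                 # save all stacked models; .stack is flushed afterwards
```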
With the `.best_ckpt_thus_far()` method, you can check the best model saved locally thus far. It searches for the checkpoint with the best performance according to the validation metric you specify in the `val_metric` argument.
When defining a `pl.LightningModule`, you can make your model log metrics during each step/epoch, as in the following figure. (Figure: logging metrics using the `.log()` method of the `pl.LightningModule` class.)
The value passed to `val_metric` in `.best_ckpt_thus_far()` is the name of the metric you logged using `.log()`.
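As a concrete illustration (a sketch using standard PyTorch Lightning logging, not astrape's own models), here is a `pl.LightningModule` whose validation step logs `"val/acc"`; that logged name is what you would pass as `val_metric`:

```python
import torch
import pytorch_lightning as pl

class TinyClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(10, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.net(x), y)
        self.log("train/loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self.net(x)
        loss = torch.nn.functional.cross_entropy(logits, y)
        acc = (logits.argmax(dim=1) == y).float().mean()
        # The names logged here are what val_metric refers to.
        self.log("val/loss", loss)
        self.log("val/acc", acc)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Later, look up the best checkpoint by the logged name (method from the text):
# exp.best_ckpt_thus_far(val_metric="val/acc")
```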
Models in `astrape.models.models_lightning` log metrics in the following way:
- `"train/loss"`, `"train/acc"`, `"train/auc"`
- `"val/loss"`, `"val/acc"`, `"val/auc"`
- `"pred/loss"`, `"pred/acc"`, `"pred/auc"`
- `"test/loss"`, `"test/acc"`, `"test/auc"`
In regression, only `"{phase}/loss"` is logged.
For scikit-learn models and their variants, the metrics are logged after the entire training run.
You can perform (stratified) k-fold cross-validation using the `.cross_validation()` method. See details in the tutorial.
One of the arguments passed to `.cross_validation()` is `parameters`. `parameters` should be a dictionary whose keys are the names of the hyperparameters to tune and whose values are their configurations. Each value is itself a dictionary and should follow the format below.
Suppose you want to tune the hyperparameters with the following scheme.
| hyperparameter | values | distribution |
|---|---|---|
| batch_size | in {32, 64, 128, 256} | grid |
| lr | in [1e-4, 1e-2] | sampled from a uniform distribution (in log scale) |
For the batch size, the value corresponding to the key `"batch_size"` should be:
>>> {"values": [32,64,128,256]}
and for the learning rate, the value corresponding to the key `"lr"` should be:
>>> {"distribution" : "log_scale_uniform", "min" : -4, "max" : -2}
To sum up, this should be passed as the value of the argument `parameters`:
>>> parameters = {
...     "batch_size": {
...         "values": [32, 64, 128, 256]
...     },
...     "lr": {
...         "distribution": "log_scale_uniform",
...         "min": -4,
...         "max": -2
...     }
... }
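Calling cross-validation with this dictionary might then look as follows; only the `parameters` argument is documented above, so anything else would be an assumption:

>>> exp.cross_validation(parameters=parameters)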
Astrape: A STrategic, Reproducible & Accessible Project and Experiment.