An open source project from Data to AI Lab at MIT.
A simple, extensible backend for developing auto-tuning systems.
BTB ("Bayesian Tuning and Bandits") is a simple, extensible backend for developing auto-tuning systems such as AutoML systems. It provides an easy-to-use interface for tuning models and selecting between models.
It is currently being used in several AutoML systems:
If you want to quickly discover BTB, simply click the button below and follow the tutorials!
BTB has been developed and tested on Python 3.6, 3.7 and 3.8.
Also, although it is not strictly required, the usage of a virtualenv is highly recommended in order to avoid interfering with other software installed in the system where BTB is run.
The easiest and recommended way to install BTB is using pip:
pip install baytune
This will pull and install the latest stable release from PyPI.
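To confirm that the install succeeded, one option is to query the installed distribution's version. The sketch below uses only the standard library and assumes Python 3.8 or newer, since `importlib.metadata` is not available in 3.6 or 3.7; the helper name `installed_version` is made up for this example.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# 'baytune' is the distribution name used in the pip command above
version = installed_version('baytune')
print(version if version else 'baytune is not installed in this environment')
```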
If you want to install from source or contribute to the project please read the Contributing Guide.
In this short tutorial we will guide you through the necessary steps to get started using BTB to select between models and tune a model to solve a Machine Learning problem.
In particular, in this example we will be using BTBSession to solve the Wine classification problem by selecting between the DecisionTreeClassifier and the SGDClassifier models from scikit-learn while also searching for their best hyperparameter configuration.
The first step in order to use the BTBSession class is to develop a scoring function.
This is a Python function that, given a model name and a hyperparameter configuration, evaluates the performance of the model on your data and returns a score.
from sklearn.datasets import load_wine
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

dataset = load_wine()

# Map the model names used by the session to their scikit-learn classes
models = {
    'DTC': DecisionTreeClassifier,
    'SGDC': SGDClassifier,
}

def scoring_function(model_name, hyperparameter_values):
    """Return the mean cross-validated macro F1 score for one configuration."""
    model_class = models[model_name]
    model_instance = model_class(**hyperparameter_values)
    scores = cross_val_score(
        estimator=model_instance,
        X=dataset.data,
        y=dataset.target,
        scoring=make_scorer(f1_score, average='macro')
    )
    return scores.mean()
The second step is to define the hyperparameters that we want to tune for each model as Tunables.
from btb.tuning import Tunable
from btb.tuning import hyperparams as hp

tunables = {
    'DTC': Tunable({
        'max_depth': hp.IntHyperParam(min=3, max=200),
        'min_samples_split': hp.FloatHyperParam(min=0.01, max=1)
    }),
    'SGDC': Tunable({
        'max_iter': hp.IntHyperParam(min=1, max=5000, default=1000),
        'tol': hp.FloatHyperParam(min=1e-3, max=1, default=1e-3),
    })
}
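To make the idea of a bounded search space concrete, here is a minimal standard-library sketch that draws one random configuration from integer and float ranges like the ones above. It is an illustration only: `SPACE` and `sample_config` are hypothetical names, and the code does not use or reproduce the `btb.tuning` API.

```python
import random

# Hypothetical search-space description: name -> (type, min, max).
# The ranges mirror the DTC example above, but this is NOT the btb.tuning API.
SPACE = {
    'max_depth': (int, 3, 200),
    'min_samples_split': (float, 0.01, 1.0),
}


def sample_config(space, rng):
    """Draw one configuration uniformly at random from the given ranges."""
    config = {}
    for name, (kind, low, high) in space.items():
        if kind is int:
            config[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            config[name] = rng.uniform(low, high)
    return config


rng = random.Random(0)  # fixed seed so the draw is repeatable
example_config = sample_config(SPACE, rng)
print(example_config)
```

A configuration produced this way has the same shape as the `hyperparameter_values` dictionary that a scoring function receives.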
Once you have defined a scoring function and the tunable hyperparameters specification of your models, you can start searching for the best model and hyperparameter configuration by using the btb.BTBSession.
All you need to do is create an instance, passing the tunable hyperparameters specification and the scoring function.
from btb import BTBSession

session = BTBSession(
    tunables=tunables,
    scorer=scoring_function
)
And then call the run method, indicating how many tunable iterations you want the BTBSession to perform:
best_proposal = session.run(20)
The result will be a dictionary indicating the name of the best model that could be found and the hyperparameter configuration that was used:
{
    'id': '826aedc2eff31635444e8104f0f3da43',
    'name': 'DTC',
    'config': {
        'max_depth': 21,
        'min_samples_split': 0.044010284821858835
    },
    'score': 0.907229308339589
}
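To give a feel for what a select-and-tune loop does conceptually, the following sketch alternates between picking a model with a UCB1 bandit and proposing a configuration by random search, then keeps the best proposal seen. This is an assumption-laden toy, not BTBSession's implementation: `toy_scorer`, `run_session`, the model names `'A'` and `'B'`, and the single hyperparameter `x` are all invented here, and the returned dictionary omits the `id` field shown above.

```python
import math
import random


def toy_scorer(model_name, config):
    """Stand-in for a real scoring function (hypothetical models 'A' and 'B')."""
    target = {'A': 0.3, 'B': 0.7}[model_name]
    ceiling = {'A': 0.8, 'B': 0.9}[model_name]
    return ceiling - (config['x'] - target) ** 2


def run_session(model_names, scorer, iterations, seed=0):
    rng = random.Random(seed)
    scores = {name: [] for name in model_names}  # score history per model
    best = None
    for step in range(1, iterations + 1):
        untried = [n for n in model_names if not scores[n]]
        if untried:
            # Try every model once before applying the bandit rule
            name = untried[0]
        else:
            # UCB1: mean score plus an exploration bonus that shrinks
            # as a model accumulates trials
            name = max(
                model_names,
                key=lambda n: sum(scores[n]) / len(scores[n])
                + math.sqrt(2 * math.log(step) / len(scores[n])),
            )
        config = {'x': rng.uniform(0.0, 1.0)}  # random search as the "tuner"
        score = scorer(name, config)
        scores[name].append(score)
        if best is None or score > best['score']:
            best = {'name': name, 'config': config, 'score': score}
    return best


best = run_session(['A', 'B'], toy_scorer, iterations=20)
print(best)
```

The bandit decides where to spend the next trial, while the tuner decides which configuration to try for the chosen model; a real tuner would use past scores to make that proposal rather than sampling blindly.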
We have a comprehensive benchmarking framework that we use to evaluate the performance of our Tuners. For every release, we benchmark against hundreds of challenges, comparing tuners against each other in terms of number of wins.
The leaderboard from the latest release is presented below:
tuner | with ties | without ties |
---|---|---|
Ax.optimize | 220 | 32 |
BTB.GCPEiTuner | 139 | 2 |
BTB.GCPTuner | 252 | 90 |
BTB.GPEiTuner | 208 | 16 |
BTB.GPTuner | 213 | 24 |
BTB.UniformTuner | 177 | 1 |
HyperOpt.tpe | 186 | 6 |
SMAC.HB4AC | 180 | 4 |
SMAC.SMAC4HPO_EI | 220 | 31 |
SMAC.SMAC4HPO_LCB | 205 | 16 |
SMAC.SMAC4HPO_PI | 221 | 35 |
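One plausible reading of the two columns is sketched below: a tuner scores a win "with ties" whenever it reaches the best score on a challenge, and a win "without ties" only when it is the sole tuner to do so. This interpretation is an assumption, not a description of the benchmarking framework's code; `count_wins`, the tuner names `T1` to `T3`, and all scores are made up for illustration.

```python
def count_wins(results):
    """Count wins per tuner; results is {challenge: {tuner: score}}.

    Higher scores are assumed to be better.
    """
    with_ties = {}
    without_ties = {}
    for scores in results.values():
        top = max(scores.values())
        winners = [t for t, s in scores.items() if s == top]
        # Every tuner that reaches the top score gets a "with ties" win
        for tuner in winners:
            with_ties[tuner] = with_ties.get(tuner, 0) + 1
        # Only a unique top scorer gets a "without ties" win
        if len(winners) == 1:
            sole = winners[0]
            without_ties[sole] = without_ties.get(sole, 0) + 1
    return with_ties, without_ties


# Made-up scores for three hypothetical challenges
results = {
    'c1': {'T1': 0.90, 'T2': 0.90, 'T3': 0.85},
    'c2': {'T1': 0.70, 'T2': 0.75, 'T3': 0.60},
    'c3': {'T1': 0.80, 'T2': 0.80, 'T3': 0.80},
}
with_ties, without_ties = count_wins(results)
print(with_ties)     # {'T1': 2, 'T2': 3, 'T3': 1}
print(without_ties)  # {'T2': 1}
```

Under this reading, a large gap between the two columns means that many challenges end with several tuners sharing the best score.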
To tune hyperparameters, see our tuning tutorial and documentation.
For the types of hyperparameters we support, see our documentation.
For selection, see our selection tutorial and documentation.

For more details about BTB and all its possibilities and features, please check the project documentation site!
Also do not forget to have a look at the notebook tutorials.
If you use BTB, please consider citing the following paper:
@article{smith2019mlbazaar,
    author = {Smith, Micah J. and Sala, Carles and Kanter, James Max and Veeramachaneni, Kalyan},
    title = {The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development},
    journal = {arXiv e-prints},
    year = {2019},
    eid = {arXiv:1905.08942},
    pages = {arXiv:1905.08942},
    archivePrefix = {arXiv},
    eprint = {1905.08942},
}