Diego
Diego: Data in, IntElliGence Out.
A fast framework for rapidly building automated learning tasks. Create an automated learning study (Study), generate its trials (Trial), then run the code to get a machine learning model. Diego follows the Scikit-learn API glossary and uses Bayesian optimization and genetic algorithms for automated machine learning.
Inspired by Fast.ai and Microsoft NNI.

Installation
Install swig first, because some dependencies compile C/C++ interfaces. Installing with conda is recommended:
conda install --yes pip gcc swig libgcc=5.2.0
pip install diego
After installation, start with 6 lines of code to solve a machine learning classification problem.
Usage
Each task is considered to be a Study, and each Study consists of multiple Trials.
It is recommended to create a Study first and then generate a Trial from the Study:
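from diego.study import create_study
from sklearn import datasets, model_selection
# Load the digits dataset and hold out 25% of it for evaluation.
digits = datasets.load_digits()
X_train, X_test, y_train, y_test = model_selection.train_test_split(digits.data, digits.target, train_size=0.75, test_size=0.25)
# Create a study from the training data, then optimize against the held-out data.
s = create_study(X_train, y_train)
s.optimize(X_test, y_test)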
Roadmap
Ideas for future releases
Project Structure
study, trials
Study:
Trial:
If multiprocessing hangs, crashes, or freezes on OS X or Linux
Setting n_jobs>1 may cause the process to get stuck during parallelization. A similar problem is documented for [scikit-learn](https://scikit-learn.org/stable/faq.html#why-do-i-sometime-get-a-crash-freeze-with-n-jobs-1-under-osx-or-linux).
In Python 3.4+, one solution is to configure multiprocessing to use forkserver or spawn to start process pool management (instead of the default fork). For example, the forkserver mode can be enabled globally, directly in the code:
import multiprocessing

if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')
More info: multiprocessing documentation
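Not every platform offers forkserver, so a slightly more defensive variant of the snippet above may be useful. The sketch below is only an illustration: the helper name pick_start_method is hypothetical and not part of Diego. It relies on multiprocessing.get_all_start_methods(), which lists the start methods the current platform supports, with the platform default first.

```python
import multiprocessing


def pick_start_method(preferred=("forkserver", "spawn")):
    """Return the first preferred start method available on this platform.

    Hypothetical helper for illustration; not part of Diego.
    """
    available = multiprocessing.get_all_start_methods()
    for method in preferred:
        if method in available:
            return method
    # Fall back to the platform default, which is listed first.
    return available[0]


if __name__ == "__main__":
    # force=True avoids a RuntimeError if a start method was already set.
    multiprocessing.set_start_method(pick_start_method(), force=True)
    print(multiprocessing.get_start_method())
```

As with the original snippet, the call to set_start_method belongs under the `if __name__ == "__main__":` guard, before any process pool is created.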
core
storage
For each study, the data, the parameters, and the models are stored separately in a Storage object. This ensures that the Study only controls trials; each Trial writes its results to the storage when it finishes and updates the best result.
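The division of labour described above can be sketched in a few lines. This is a minimal illustration, not Diego's actual Storage class: the names TrialRecord, InMemoryStorage, and report are hypothetical, and the sketch assumes that a higher score is better.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TrialRecord:
    """Result of one finished trial (hypothetical structure)."""
    trial_id: int
    params: Dict[str, Any]
    score: float


@dataclass
class InMemoryStorage:
    """Hypothetical storage: keeps every trial record plus the best one."""
    records: List[TrialRecord] = field(default_factory=list)
    best: Optional[TrialRecord] = None

    def report(self, record: TrialRecord) -> None:
        # Called by a trial when it finishes.
        # Assumption: higher scores are better.
        self.records.append(record)
        if self.best is None or record.score > self.best.score:
            self.best = record


storage = InMemoryStorage()
storage.report(TrialRecord(0, {"C": 1.0}, 0.91))
storage.report(TrialRecord(1, {"C": 10.0}, 0.95))
storage.report(TrialRecord(2, {"C": 100.0}, 0.93))
print(storage.best.trial_id, len(storage.records))
```

Because trials only ever call report, the object that schedules them never needs to touch the stored results directly.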
update result
When creating a Study, you need to specify the direction of optimization: maximize or minimize. Also specify the metric to optimize when creating Trials. The default is to maximize accuracy.
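To make the role of the direction concrete, here is a small sketch of how a new score might be compared with the current best. The functions accuracy and is_better are hypothetical helpers written for this example and are not Diego's API; only the values maximize and minimize come from the text above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def is_better(candidate, incumbent, direction="maximize"):
    """Return True if candidate improves on incumbent for this direction."""
    if direction not in ("maximize", "minimize"):
        raise ValueError(f"unknown direction: {direction!r}")
    if incumbent is None:
        # The first result is always the best so far.
        return True
    if direction == "maximize":
        return candidate > incumbent
    return candidate < incumbent


score = accuracy([0, 1, 1, 0], [0, 1, 0, 0])  # 3 of 4 labels match
print(score, is_better(score, 0.5), is_better(score, 0.5, "minimize"))
```

A metric such as accuracy pairs with maximize, while a loss or error metric would pair with minimize.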
AutoML completion plan
overview
bayes opt
- fmfn/bayes
- auto-sklearn
grid search
- H2O.ai
tree parzen
- hyperopt
- mlbox
metaheuristics grid search
- pybrain
generation
- tpot
dl
- ms nni
issues
updates
TODO: update the documentation.