DreamML - Self Machine Learning ❤️
The next stage in the evolution of DS-Template
About DreamML
DreamML is a machine learning framework designed for industrial model development.
Its main goal is to select a simple model that balances model complexity against quality metrics.
We also recommend reviewing model quality in dedicated development reports and, for some tasks, in a validation report prepared according to the central bank's methodology.
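The README does not spell out how DreamML trades complexity against quality, so the following is only a minimal sketch of one common heuristic, not the framework's actual rule: among candidate models whose validation score is within a tolerance of the best score, keep the one with the fewest features. `Candidate`, `pick_simple_model`, and `tolerance` are hypothetical names, and the scores in the usage lines are made-up toy numbers.

```python
from dataclasses import dataclass


# Hypothetical helper types and functions, not part of the DreamML API.
@dataclass(frozen=True)
class Candidate:
    """A fitted candidate model summarised by its validation score and size."""
    name: str
    score: float      # validation metric, higher is better (e.g. Gini)
    n_features: int   # used here as a simple proxy for model complexity


def pick_simple_model(candidates: list[Candidate], tolerance: float = 0.01) -> Candidate:
    """Return the least complex candidate whose score is within `tolerance`
    of the best score; ties on complexity go to the higher score."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    best_score = max(c.score for c in candidates)
    # Keep only models that are "good enough" relative to the best one.
    acceptable = [c for c in candidates if best_score - c.score <= tolerance]
    # Among those, prefer fewer features first, then a higher score.
    return min(acceptable, key=lambda c: (c.n_features, -c.score))


# Usage with toy numbers:
models = [
    Candidate("boosting_120f", 0.612, 120),
    Candidate("boosting_40f", 0.605, 40),
    Candidate("logreg_15f", 0.571, 15),
]
print(pick_simple_model(models).name)  # boosting_40f
```

Raising `tolerance` accepts a larger quality loss in exchange for a simpler model. Using feature count as the complexity measure is an assumption; other proxies, such as tree depth or number of parameters, fit the same pattern.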
*This is the project's first open-source release cycle; we plan to publish more materials and keep improving the framework.
Get started
To develop a model, use the notebooks in notebooks/1. Model Development and choose the one that matches your task type.
To validate models, use the notebooks in notebooks/2. Validate Model.
To calibrate models, use the notebooks in notebooks/3. Calibration.
How to Use
Using the development notebooks (notebooks/1. Model Development)
- First, define the pipeline configuration.
- Then build the configuration storage and prepare the data for modeling:
```python
# `config` is the pipeline configuration defined in the previous step.
config_storage = ConfigStorage(config=config)
transformer = DataTransformer(config_storage)
data_storage = transformer.transform()
```
- Next, run the modeling pipeline:
```python
pipeline = MainPipeline(config_storage=config_storage, data_storage=data_storage)
pipeline.transform()
```
- For some tasks, you can also add a LightAutoML model and estimate the out-of-time (OOT) potential:
```python
lama = add_lama_model(data_storage.get_eval_set(), config_storage)
oot_potential = calculate_oot_metrics(data_storage.get_eval_set(), config_storage)
```
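Out-of-time (OOT) validation generally means holding out the most recent period and measuring the model's metric there. What exactly `calculate_oot_metrics` computes is not documented here, so the following is a stand-alone sketch of the general idea under assumed inputs: records are dicts with a date field, the split uses a fixed cutoff date, and quality is measured with the Gini coefficient (2 * AUC - 1). `oot_split`, `roc_auc`, and `gini` are hypothetical helpers, not DreamML functions, and the rows are toy data.

```python
from datetime import date


# Hypothetical helpers illustrating out-of-time validation; not the DreamML API.
def oot_split(rows, date_key, cutoff):
    """Split records into in-time (date < cutoff) and out-of-time (date >= cutoff)."""
    in_time = [r for r in rows if r[date_key] < cutoff]
    out_of_time = [r for r in rows if r[date_key] >= cutoff]
    return in_time, out_of_time


def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count as half). O(P*N), fine for a small illustration."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(y_true, y_score):
    """Gini coefficient, common in credit scoring: 2 * AUC - 1."""
    return 2.0 * roc_auc(y_true, y_score) - 1.0


# Usage with toy data: train on the earlier period, score the later one.
rows = [
    {"date": date(2023, 1, 15), "target": 0, "score": 0.2},
    {"date": date(2023, 2, 10), "target": 1, "score": 0.7},
    {"date": date(2023, 6, 5), "target": 1, "score": 0.8},
    {"date": date(2023, 6, 20), "target": 0, "score": 0.3},
    {"date": date(2023, 7, 1), "target": 0, "score": 0.1},
]
train, oot = oot_split(rows, "date", cutoff=date(2023, 6, 1))
print(len(train), len(oot))  # 2 3
print(gini([r["target"] for r in oot], [r["score"] for r in oot]))  # 1.0
```

A drop in Gini between the in-time and OOT periods is the usual signal that a model may not hold up over time; a real project would use a library metric such as scikit-learn's `roc_auc_score` instead of the quadratic loop above.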
- If needed, you can also save the modeling artifacts:
```python
saver = pipeline.artifact_saver
models = pipeline.prepared_model_dict

# `oot_potential` and `lama` come from the optional LightAutoML step above;
# skip the next two lines if you did not run it.
pipeline.oot_potential = oot_potential
models.update(lama)

nb_name = saver.get_notebook_path_and_save()
saver.save_artifacts(
    models=models,
    other_models=pipeline.other_model_dict,
    encoder=transformer.cat_transformer,
    ipynb_name=nb_name,
    feature_threshold=config_storage.feature_threshold,
)
saver.save_data(data=data_storage.get_eval_set(), dropped_data=data_storage.get_dropped_data())
```
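`pipeline.artifact_saver` takes care of persistence in DreamML; the stand-alone sketch below only illustrates the general pattern of storing a dictionary of models together with a JSON manifest, using the standard `pickle` and `json` modules. It is not DreamML's implementation: `save_artifacts`, `load_artifacts`, the `models/` sub-folder, and the `manifest.json` layout are hypothetical.

```python
import json
import pickle
import tempfile
from pathlib import Path


# Hypothetical helpers, not DreamML's artifact_saver.
def save_artifacts(out_dir, models, metadata):
    """Pickle each model to <out_dir>/models/<name>.pkl and write a JSON
    manifest listing the files alongside free-form metadata."""
    out = Path(out_dir)
    model_dir = out / "models"
    model_dir.mkdir(parents=True, exist_ok=True)
    files = {}
    for name, model in models.items():
        path = model_dir / f"{name}.pkl"
        with path.open("wb") as fh:
            pickle.dump(model, fh)
        files[name] = str(path.relative_to(out))
    manifest_path = out / "manifest.json"
    manifest_path.write_text(json.dumps({"models": files, "metadata": metadata}, indent=2))
    return manifest_path


def load_artifacts(out_dir):
    """Inverse of save_artifacts: read the manifest and unpickle every model.
    Only load pickles from sources you trust."""
    out = Path(out_dir)
    manifest = json.loads((out / "manifest.json").read_text())
    models = {}
    for name, rel in manifest["models"].items():
        with (out / rel).open("rb") as fh:
            models[name] = pickle.load(fh)
    return models, manifest["metadata"]


# Usage: round-trip a toy "model" through a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(tmp, {"toy_model": {"coef": [0.5, -1.2]}}, {"feature_threshold": 0.01})
    loaded, meta = load_artifacts(tmp)
    print(loaded["toy_model"], meta)
```

Keeping a manifest next to the pickles makes it easy to see which models and settings a run produced without unpickling anything.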
- Finally, generate a development report. By default, it is saved to the dreamml/results folder:
```python
get_report(
    pipeline=pipeline,
    config_storage=config_storage,
    data_storage=data_storage,
    encoder=transformer.cat_transformer,
)
```
Authors
LICENSE
This project is licensed under the Apache License, Version 2.0. See LICENSE for details.