easierai-trainer-library

This library contains AI code for training purposes.

0.1.72
PyPI

Maintainers: 1

Project hosting the trainer for generic estimation models.

AI EASIER.AI Trainer Library

To estimate the next possible values accurately from the context information, the trainer uses Keras, a Python-based framework that runs on top of TensorFlow, Google's deep learning framework.

Training the models

The trainer is packaged as a microservice so that it can be run and deployed easily in any use case and infrastructure. In addition, it is fully configurable and can be adapted to each scenario.

The data produced by the sensors every day is collected into a database for every entity/id, so that a predictive model can be produced for each entity. This model learns the patterns in the data so that, given some context input data, it can either forecast (predict) the next value(s) of the time series or estimate the value of a specific target feature.

To train the model, the data is read from an Elasticsearch database. Trained models are stored in MINIO so that other microservices can load them.

Configuration

Several parameters can be modified to change how the system learns. These parameters are defined in the file ./config/config_trainer_estimation.ini.sample and are explained here:

Configuration parameters guide

INFERENCE section:

data_type: Type of the data used to train the model, either timeseries or features. If data_type is features, the next two parameters are ignored.
num_forecasts: Number of values that the system outputs when a prediction is requested (only used when data_type is timeseries).
num_previous_measures: Length of the time series, that is, the number of previous values considered for the learning. This parameter affects the learning algorithm itself.
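
As a rough illustration of how num_previous_measures and num_forecasts relate to each other, the sketch below splits a time series into input windows and target windows. It is a minimal example in plain Python, not the trainer's own code: the helper name make_windows is hypothetical, and the assumption that the targets are the values immediately following each input window is ours.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float],
    num_previous_measures: int,
    num_forecasts: int,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Split a series into (input window, target window) pairs.

    Hypothetical helper: each input holds num_previous_measures values and
    each target holds the num_forecasts values that follow it (assumption).
    """
    values = list(series)
    n_samples = len(values) - num_previous_measures - num_forecasts + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window sizes")

    inputs: List[List[float]] = []
    targets: List[List[float]] = []
    for start in range(n_samples):
        split = start + num_previous_measures
        inputs.append(values[start:split])                   # context values
        targets.append(values[split:split + num_forecasts])  # values to predict
    return inputs, targets


inputs, targets = make_windows(list(range(10)), num_previous_measures=4, num_forecasts=2)
print(len(inputs), inputs[0], targets[0])
```

A longer num_previous_measures gives the model more context per example but yields fewer examples from the same amount of data.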

ML_INITIAL section:

initial_train: Whether to perform an initial training with all the data available if there is no previous model to load (true/false)
time_window_to_initial: Period of time to look back for data in Elasticsearch for the initial training. One year is 1y

ML section:

algorithm: Name of the neural network algorithm to use. Currently the 'lstm' (default), 'phasedlstm' and 'dense' models are available.
learning_rate: Float number indicating the learning rate for the model. It is recommended to start with a small number such as 0.001.
epoch_internal: Number of times the full training set is passed through the neural network for a specific batch size. Default is 50
epoch_external: Number of times the batch_size is increased and the neural network is retrained (epoch_internal times). Default is 1
batch_size: Number of examples in each of the batches into which every epoch_internal is divided. Default is 200 (increases with each epoch_external)
initial_validation_split: Fraction of the examples held out for validation when training (for the optimizing function). Default is 0.05
validation_split_multiplayer: Multiplier of the validation split used in each epoch_external. Default is 1.75
batch_size_multiplier: Multiplier of the size of the batches for every epoch external. Default is 1.5
minimum_samples: Minimum number of samples to train the system
training_size: Number of samples which will be used to train the model
time_window_to: Period of time to look back for data in Elasticsearch. One week is 1w. One month would be 1M
time_window_from: Point in time from which to start looking back for data in Elasticsearch. Default is now
resample: Whether to resample the time series (yes/no)
resample_time: Can be empty. Time delta between measures used to resample the time series. Data is placed every resample_time SECONDS (if there is no data, the previous value is used)
delta_max_std: Maximum time in SECONDS between measures to consider the time series as synchronous
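
The interaction between epoch_external, batch_size_multiplier and the validation split multiplier can be sketched as a simple schedule. This is only an illustration under stated assumptions: the function training_schedule is hypothetical, its argument names mirror the configuration keys above, and we assume the multipliers are applied after each external round and that the batch size is rounded to an integer. The trainer itself may behave differently.

```python
from typing import Any, Dict, List


def training_schedule(
    epoch_external: int = 1,
    epoch_internal: int = 50,
    batch_size: int = 200,
    batch_size_multiplier: float = 1.5,
    initial_validation_split: float = 0.05,
    validation_split_multiplayer: float = 1.75,  # key spelled as in the guide
) -> List[Dict[str, Any]]:
    """Return the settings assumed for each external round (hypothetical)."""
    rounds: List[Dict[str, Any]] = []
    current_batch = float(batch_size)
    current_split = initial_validation_split
    for external in range(epoch_external):
        rounds.append(
            {
                "round": external + 1,
                "epochs": epoch_internal,
                "batch_size": int(round(current_batch)),
                "validation_split": round(current_split, 6),
            }
        )
        # Assumption: both values grow only after a round has finished.
        current_batch *= batch_size_multiplier
        current_split *= validation_split_multiplayer
    return rounds


for settings in training_schedule(epoch_external=3):
    print(settings)
```

Under these assumptions, three external rounds with the default values would use batch sizes of 200, 300 and 450.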

The format of the time_window parameters follows the Date Math syntax of the Elasticsearch API.
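
To make the Date Math format more concrete, the sketch below builds an Elasticsearch range query body from time_window_from and time_window_to. The helper build_range_query is hypothetical, and the way the two parameters are combined (the window ends at time_window_from and reaches back by time_window_to) is our reading of the descriptions above, not confirmed behaviour of the trainer. The only Elasticsearch facts relied on are the range query structure and the Date Math rule that an anchor other than now must be followed by ||.

```python
from typing import Any, Dict


def build_range_query(
    time_index: str = "timestamp",
    time_window_from: str = "now",
    time_window_to: str = "1w",
) -> Dict[str, Any]:
    """Build a range query body covering the configured window (assumption)."""
    # Date Math: "now" can be followed directly by the offset; any other
    # anchor date must be separated from the offset with "||".
    separator = "" if time_window_from.startswith("now") else "||"
    lower_bound = f"{time_window_from}{separator}-{time_window_to}"
    return {
        "query": {
            "range": {
                time_index: {"gte": lower_bound, "lte": time_window_from}
            }
        }
    }


print(build_range_query())
print(build_range_query(time_window_from="2024-01-01", time_window_to="1M"))
```

With the defaults, the lower bound becomes now-1w, which selects the last week of data.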

ELASTIC section:

index_entities: Name of the index that stores the entities
index_data: Name of the index that stores the data
index_scalers: Name of the index that stores the scalers
index_predictions: Name of the index that stores the predictions
index_models: Name of the index that stores the models. It is overwritten by the environment variable TRAINING_RESULTS_ID
mapping_data: Name of the mapping that defines the format in the index of data
mapping_entities: Name of the mapping that defines the format in the index of entities
mapping_models: Name of the mapping that defines the format in the index of models
mapping_predictions: Name of the mapping that defines the format in the index of predictions

DATA section:

time_index: Column label used for indexing data (timestamp column), typically timestamp
inference_features: Name of the feature(s) that is/are going to be forecasted
dataset_features: Name of the other features used only for training

ELK section:

elastic_host: Hostname of Elasticsearch. Example: localhost
elastic_port: Port of communication with Elasticsearch. Example: 9200 (default of Elasticsearch)

MINIO section:

minio_host: Hostname of MINIO. Example: minio
minio_port: Port of communication with MINIO (default is 9000)
minio_access: Access key (username) configured in MINIO
minio_secret: Secret key (password) configured in MINIO
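
As an illustration of how such a file could be read, the sketch below parses a small INI fragment with Python's standard configparser module. The section headers and keys follow the guide above, but the exact spelling of the headers in the real file, the values for num_forecasts and num_previous_measures, and the helper load_settings are assumptions made for the example.

```python
import configparser
from typing import Any, Dict

# Illustrative fragment only; the real sample file may differ.
SAMPLE_CONFIG = """
[INFERENCE]
data_type = timeseries
num_forecasts = 1
num_previous_measures = 10

[ML]
algorithm = lstm
learning_rate = 0.001
epoch_internal = 50
batch_size = 200

[ELK]
elastic_host = localhost
elastic_port = 9200

[MINIO]
minio_host = minio
"""


def load_settings(text: str) -> Dict[str, Any]:
    """Parse the INI text and return a few typed settings (hypothetical)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "data_type": parser.get("INFERENCE", "data_type"),
        "num_forecasts": parser.getint("INFERENCE", "num_forecasts"),
        "algorithm": parser.get("ML", "algorithm", fallback="lstm"),
        "learning_rate": parser.getfloat("ML", "learning_rate"),
        "batch_size": parser.getint("ML", "batch_size", fallback=200),
        "elastic_host": parser.get("ELK", "elastic_host"),
        "elastic_port": parser.getint("ELK", "elastic_port", fallback=9200),
        # minio_port is missing in the fragment, so the documented default applies.
        "minio_port": parser.getint("MINIO", "minio_port", fallback=9000),
    }


print(load_settings(SAMPLE_CONFIG))
```

The fallback values used here are the defaults stated in the guide (lstm, 200, 9200 and 9000).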

Instructions

Use this command to launch the container:

docker run \
  -e ELASTIC_HOST=[$ELASTIC_HOST] \
  -e ELASTIC_PORT=[$ELASTIC_PORT] \
  -e MINIO_ACCESS_KEY=[$MINIO_ACCESS_KEY] \
  -e MINIO_SECRET_KEY=[$MINIO_SECRET_KEY] \
  -e MINIO_SERVICE_HOST=[$MINIO_HOST] \
  -e MINIO_SERVICE_PORT=[$MINIO_PORT] \
  -e TRAINING_RESULTS_ID=[$TRAINING_RESULTS_ID] \
  -e INPUT_FEATURES=[$INPUT_FEATURES] \
  -e PREDICTION_FEATURES=[$PREDICTION_FEATURES] \
  -v ./config/:/usr/app/src/config \
  --name trainer \
  easierai/trainer:1.0

Apart from the basic environment variables, you can perform a more advanced configuration by overriding the configuration file inside the trainer. To do so, pass a volume to the docker image by adding the -v flag to the docker command, so that ./config/config.ini is mapped to /usr/app/src/config/config.ini inside the container. Notice that the volume is a folder named config that contains a file named config.ini.

Notice:

The variables MINIO_ACCESS_KEY and MINIO_SECRET_KEY are, respectively, the username and password of the deployed MINIO service; check the configuration of this service to know more.
Apart from those variables, you must specify at least @tag (for example 1.1) and the folder that contains the configuration file as a volume. Make sure that the folder ./config contains a file called config.ini with the configuration explained above.
In addition, you should specify the environment variables for the Elasticsearch host, the Elasticsearch port and the MINIO host and port. If you do not provide them, the ones in the config.ini file will be used. You should also open the port used by the REST API, and make sure you use a different port than the inferencer if you plan to launch both on the same machine. As you can see, there are a few more variables you need to configure:
ELASTIC_HOST: Elasticsearch host IP or hostname.
ELASTIC_PORT: Elasticsearch port.
MINIO_SERVICE_HOST: MINIO host IP or hostname.
MINIO_SERVICE_PORT: MINIO port.
MINIO_ACCESS: MINIO access key (username for the MINIO repository)
MINIO_SECRET: MINIO secret key (password for the MINIO repository)
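
The precedence described above, where an environment variable wins and the value from config.ini is used otherwise, can be sketched as follows. The helper resolve_setting is hypothetical, and treating an empty variable as "not provided" is an assumption made for this example rather than documented behaviour.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    env_name: str,
    config_value: str,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the environment value if set, else the config.ini value."""
    env = os.environ if environ is None else environ
    value = env.get(env_name, "").strip()
    # Assumption: an unset or empty variable falls back to the config file.
    return value if value else config_value


# Example with an explicit mapping instead of the real process environment.
fake_env = {"ELASTIC_HOST": "10.0.0.5"}
print(resolve_setting("ELASTIC_HOST", "localhost", fake_env))  # from the environment
print(resolve_setting("ELASTIC_PORT", "9200", fake_env))       # from config.ini
```

Passing the environment as an argument keeps the function easy to test without changing the real process environment.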

You can also add this piece of code in your docker-compose file:

  trainer:
    image: easierai/trainer:1.0
    container_name: trainer
    environment:
      NODE_ENV: development
      ELASTIC_HOST: 127.0.0.1
      ELASTIC_PORT: 9200
      MINIO_SERVICE_HOST: 127.0.0.1
      MINIO_SERVICE_PORT: 9000
      MINIO_ACCESS: username
      MINIO_SECRET: password
      TRAINING_RESULTS_ID: experiment-001
      INPUT_FEATURES: ratio,free
      INFERENCE_FEATURES: ratio
