IrisML

Proof of concept for a simple framework to create an ML pipeline.

Features

  • Run ML training/inference with a simple JSON configuration.
  • Modularized interfaces for task components.
  • Cache task outputs for faster experiments.

Getting started

Installation

Prerequisite: Python 3.8+

# Install the core framework and standard tasks.
pip install irisml irisml-tasks irisml-tasks-training

Run an example job

# Install additional packages that are required for the example
pip install irisml-tasks-torchvision

# Run on local machine
irisml_run docs/examples/mobilenetv2_mnist_training.json

Available commands

# Run the specified pipeline. You can provide environment variables with the "-e" option; they will be accessible through the $env variable in the JSON config.
irisml_run <pipeline_json_path> [-e <ENV_NAME>=<env_value>] [--no_cache] [--no_cache_read] [-v]

# Show information about the specified task. If <task_name> is not provided, shows a list of available tasks in the current environment.
irisml_show [<task_name>]

# Manage a cache storage on Azure Blob Storage
# list - Show a list of matched blobs.
# download - Download matched blobs.
# remove - Remove matched blobs.
# show - Show the contents of matched blobs.
irisml_cache <list|download|remove|show> [--mtime <+|->N] [--name NAME]

Pipeline definition

PipelineDefinition = {"tasks": List[TaskDefinition], "on_error": Optional[List[TaskDefinition]]}

TaskDefinition = {
    "task": <task module name>,
    "name": <optional unique name of the task>,
    "inputs": <list of input objects>,
    "config": <config for the task. Use irisml_show command to find the available configurations.>
}

In TaskDefinition.inputs and TaskDefinition.config, you can use the following two variables.

  • $env.<variable_name> This variable will be replaced by the environment variable that was provided as an argument to the irisml_run command.
  • $outputs.<task_name>.<field_name> This variable will be replaced by the outputs of the specified previous task.

An exception is raised at runtime if the specified variable is not found.

If a task raises an exception, the tasks specified in the on_error field will be executed. The exception object will be assigned to the "$env.IRISML_EXCEPTION" variable.
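
As an illustration only, here is what such a definition might look like, written as a Python dict that can be dumped to JSON and passed to irisml_run. The task module names ("my_tasks.prepare_dataset", "my_tasks.report_failure") and their config fields are hypothetical placeholders, not real tasks; use irisml_show to discover real tasks and their options.

import json
import pathlib

# A minimal sketch of a pipeline definition (placeholder task names, for illustration only).
pipeline = {
    'tasks': [
        {
            'task': 'my_tasks.prepare_dataset',               # hypothetical task module
            'name': 'prepare_dataset',
            'config': {'dataset_name': '$env.DATASET_NAME'}   # filled from `-e DATASET_NAME=...`
        }
    ],
    'on_error': [
        # Runs only if a task above raises; the exception is exposed as $env.IRISML_EXCEPTION.
        {'task': 'my_tasks.report_failure', 'config': {'message': '$env.IRISML_EXCEPTION'}}
    ]
}

# Serialize the definition so it can be run with: irisml_run example_pipeline.json
pathlib.Path('example_pipeline.json').write_text(json.dumps(pipeline, indent=2))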

Patch definition (Experimental)

PatchesDefinition = {"patches": List[PatchDefinition], "patches_on_error": List[PatchDefinition]}  # At least one of the fields must be specified.

PatchDefinition = {  # One of the filtering conditions and one of the actions must be specified.
    # Filtering conditions
    "match": List[MatchCondition],
    "match_if_exists": List[MatchCondition],  # Matches the task if it exists. If not, the patch will be ignored.
    "match_oneof": List[MatchCondition],  # Matches the first task that matches one of the conditions.
    "top": bool,  # Matches the top of the pipeline. Used with "insert" action.
    "bottom": bool,  # Matches the bottom of the pipeline. Used with "insert" action.

    # Actions
    "insert": List[TaskDefinition],
    "remove": bool,
    "replace": Tuple[List[TaskDefinition], Dict[str, str]], # The second element is a mapping from the old output name to the new output name. All "$output" variables will be replaced by the new output name.
    "update": TaskDefinition
}

MatchCondition = {  # All fields are optional.
    "task": str,
    "name": str,
    "config": Dict[str, Any]
}

The available actions are as follows:

  • insert: Insert the specified tasks after the matched task.
  • remove: Remove the matched task.
  • replace: Replace the matched task with the specified tasks.
  • update: Update the matched task with the given configuration.

Note that the patch command doesn't guarantee the correctness of the patched pipeline. It is recommended to validate the patched pipeline.
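
As an illustration, a patch that inserts a task after a matched task might look like the following sketch. The matched task name "train_model" and the inserted task module "my_tasks.log_metrics" are hypothetical placeholders.

# A minimal sketch of a patch definition (Python dict form, placeholder names only).
patches = {
    'patches': [
        {
            # Filtering condition: match the task whose unique name is "train_model".
            'match': [{'name': 'train_model'}],
            # Action: insert this task right after the matched task.
            'insert': [
                {'task': 'my_tasks.log_metrics', 'config': {'verbose': True}}
            ]
        }
    ]
}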

Pipeline cache

With caching, you can modify and re-run a pipeline config at minimal cost. If the cache is enabled, IrisML calculates hash values for all task inputs/configs and uploads the task outputs to the specified storage. When it finds a task with the same hash values, it downloads the cached outputs and skips the task execution.

To enable caching, specify the cache storage location by setting the IRISML_CACHE_URL environment variable. Currently, Azure Blob Storage and the local filesystem are supported.

To use Azure Blob Storage, a container URL must be provided. If the URL contains a SAS token, it will be used for authentication. Otherwise, interactive authentication and Managed Identity authentication will be used.
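
For example, assuming the variable is read from the process environment when the pipeline runs, the cache location could be set like this (the path and URL below are placeholders):

import os

# Cache task outputs on the local filesystem (placeholder path).
os.environ['IRISML_CACHE_URL'] = '/tmp/irisml_cache'

# Or point at an Azure Blob Storage container URL, optionally including a SAS token (placeholder URL).
# os.environ['IRISML_CACHE_URL'] = 'https://<account>.blob.core.windows.net/<container>'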

Python API

To run a pipeline from Python code, you can use the following API.

import json
import pathlib
from irisml.core import JobRunner

# Load a pipeline definition from a JSON file and create a runner.
job_description = json.loads(pathlib.Path('example.json').read_text())
runner = JobRunner(job_description)

# Run the pipeline, providing environment variables as a dict.
runner.run({'DATASET_NAME': 'mnist'})

# The same pipeline can be re-run with different environment variables.
runner.run({'DATASET_NAME': 'cifar10'})

Available official tasks

To show the detailed help for each task, run the following command after installing the package.

irisml_show <task_name>

irisml-tasks

  • assertion: Assert the given input.
  • assign_class_to_strings: Assigns a class to a string based on the class name being present in the string.
  • branch: 'If' conditional branch.
  • calculate_cosine_similarity: Calculate cosine similarity between two sets of vectors.
  • check_model_parameters: Check Inf/NaN values in model parameters.
  • compare: Compare two values.
  • compare_ints: Compare two int values.
  • convert_detection_to_multilabel: Convert targets or predictions of object detection to multilabel.
  • convert_string_to_string_list: Convert a string to a list of strings.
  • deserialize_tensor: Deserialize a pytorch tensor.
  • divide_float: Floating point division.
  • download_azure_blob: Download a single blob from Azure Blob Storage.
  • emulate_fp8_quantization: Emulate FP8 quantization.
  • extract_image_bytes_from_dataset: Extract images from a dataset and convert them to bytes.
  • get_current_time: Get the current time in seconds since the epoch.
  • get_dataset_split: Get a train/val split of a dataset.
  • get_dataset_stats: Get statistics of a dataset.
  • get_dataset_subset: Get a subset of a dataset.
  • get_fake_image_classification_dataset: Generate a fake image classification dataset.
  • get_fake_image_text_classification_dataset: Generate a fake image-text classification dataset.
  • get_fake_object_detection_dataset: Generate a fake object detection dataset.
  • get_fake_phrase_grounding_dataset: Generate a fake phrase grounding dataset.
  • get_fake_visual_question_answering_dataset: Generate a fake visual question answering dataset.
  • get_int_from_json_strings: Get an integer from a JSON string.
  • get_int_list_from_json_strings: Get a list of ints from a JSON string.
  • get_item: Get an item from the given list.
  • get_key_and_int_list_from_json_string: Parse a JSON string and return a list of keys and a list of lists of ints.
  • get_kfold_cross_validation_dataset: Get train/test dataset for k-fold cross validation.
  • get_secret_from_azure_keyvault: Get a secret from Azure KeyVault.
  • get_topk: Get the largest Topk values and indices.
  • join_filepath: Join a given dir_path and a filename.
  • join_two_strings: Join two strings to one string.
  • load_coco_detections: Load coco detections from a JSON to a list of tensors.
  • load_float_tensor_jsonl: Load a 2D float tensor from a JSONL file.
  • load_state_dict: Load a state_dict from various sources.
  • load_str_list_jsonl: Load a list of strings from a JSONL file.
  • load_strs_from_json_file: Load strings from a JSON file.
  • load_tensor_list: Load a list of tensors from file.
  • make_cached_dataset: Save dataset cache on disk.
  • make_prompt_for_each_string: Make a prompt for each string.
  • make_prompt_list_with_strings: Make a list of prompts from a template and a list of strings.
  • make_prompt_with_strings: Make a prompt with a list of strings.
  • make_random_choice_text_transform: Make a text transform function that randomly chooses one of the substrings separated by the delimiter.
  • make_text_transform: Make a text transform function.
  • map_int_list: Map a list of integers to a list of integers.
  • pickling_object: Pickling an object.
  • print: Print or Pretty Print the input object.
  • print_environment_info: Print various environment information to stdout/stderr.
  • read_file: Reads a file and returns its contents as bytes.
  • repeat_tasks: Repeat the given tasks for multiple times.
  • run_parallel: Run the given tasks in parallel. A new process will be forked for each task. Each task must have an unique name.
  • run_profiler: Run profiler on the given tasks.
  • run_sequential: Run the given tasks in sequence. Each task must have an unique name.
  • save_file: Save the given input binary to a file.
  • save_float_tensor_jsonl: Save a 2D float tensor to a JSONL file.
  • save_images_from_dataset: Save images from a dataset to disk.
  • save_jit_model: Save an offline version of a pytorch model. torch.jit.save()
  • save_state_dict: Save the model's state_dict to the specified file.
  • save_str_list_jsonl: Save a list of strings to a JSONL file.
  • search_grid_sequential: Grid search hyperparameters. Tasks are run in sequence.
  • serialize_tensor: Serialize a pytorch tensor.
  • split_string: Split string to a list of strings.
  • switch_pick: Pick from vals based on conditions. Task will return the first val with condition being True.
  • upload_azure_blob: Upload a binary file to Azure Storage Blob.
  • upload_azure_blob_directory: Upload a directory to Azure Blob Storage.

irisml-tasks-training

This package contains tasks related to pytorch training.

  • append_classifier: Append a classifier model to a given model. A predictor and a loss module will be added, too.
  • benchmark_dataset: Benchmark dataset loading and preprocessing.
  • benchmark_model: Benchmark a given model using a given dataset.
  • benchmark_model_with_grad_cache: Benchmark a given model using a given dataset with grad caching. Useful for cases which require sub batching.
  • build_classification_prompt_dataset: Create a classification prompt dataset.
  • build_zero_shot_classifier: Create a zero-shot classification layer.
  • concatenate_datasets: Concatenate the given two datasets together.
  • convert_vqa_dataset_to_image_text_classification_dataset: Convert VQA dataset to image text classification dataset.
  • create_classification_prompt_generator: Create a prompt generator for a classification task.
  • create_prompt_generator: Create a prompt generator that returns a list of prompts for a given label.
  • evaluate_accuracy: Calculate accuracy of the given prediction results.
  • evaluate_captioning: Evaluate captioning prediction results.
  • evaluate_detection_average_precision: Calculate mean average precision for object detection task results.
  • evaluate_phrase_grounding: Calculate precision/recall for phrase grounding.
  • evaluate_phrase_grounding_recall: Calculate recall for phrase grounding.
  • evaluate_string_matching_accuracy: Calculate accuracy of string matching.
  • exclude_negative_samples_from_classification_dataset: Exclude negative samples from classification dataset.
  • export_coco_from_torch_dataset: Export coco dataset from a given torch dataset. Support IC and OD only.
  • export_onnx: Export the given model as ONNX.
  • extract_val_by_key_from_jsonl: Extract value for each entry in a JSONL by a key.
  • find_incorrect_classification_indices: Find incorrect classification indices.
  • find_incorrect_classification_multilabel_indices: Find incorrect classification indices for multilabel classification.
  • flatten_captioning_dataset: Flatten a captioning dataset with multiple targets per image into a dataset with a single target per image.
  • get_questions_from_vqa_dataset: Extracts questions from a VQA dataset.
  • get_subclass_dataset: Get the sub-dataset with given class ids from a dataset.
  • get_targets_from_dataset: Extract only targets from a given Dataset.
  • load_jsonl_vqa_dataset: Load a VQA dataset from a jsonl file.
  • load_simple_classification_dataset: Load a simple classification dataset from a directory of images and an index file.
  • make_classification_dataset_from_object_detection: Convert an object detection dataset into a classification dataset.
  • make_classification_dataset_from_predictions: Make a classification dataset from predictions.
  • make_detection_dataset_from_predictions: Make a detection dataset from predictions.
  • make_feature_extractor_model: Make a wrapper model to extract a feature vector from a vision model.
  • make_fixed_prompt_image_transform: Make a transform function for image and a fixed prompt.
  • make_fixed_text_dataset: Create a dataset with a list of strings.
  • make_image_text_contrastive_model: Make a model for image-text contrastive training.
  • make_image_text_transform: Make a transform function for image-text classification.
  • make_oversampled_dataset: Make an oversampled dataset.
  • make_phrase_grounding_image_transform: Make phrase grounding image transform.
  • make_prompt_list_image_transform: Make a transform function for image and prompt list.
  • make_vqa_collate_function: Creates a collate_function for Visual Question Answering (VQA) and Phrase Grounding task.
  • make_vqa_image_transform: Creates a transform function for VQA task.
  • map_classification_predictions_to_detection: Map classification predictions back to detection predictions or targets.
  • num_iters_to_epochs: Convert number of iterations to number of epochs. Min value is 1.
  • predict: Predict using a given model.
  • remove_empty_images_from_dataset: Remove empty images from dataset.
  • sample_few_shot_dataset: Few-shot sampling of a IC/OD dataset.
  • save_jsonl_vqa_dataset: Save a VQA dataset to a JSONL file.
  • split_image_text_model: Split a image-text model into an image model and a text model.
  • train: Train a pytorch model.
  • train_with_gradient_cache: Train a model using gradient cache. Useful for contrastive learning with a large model.

irisml-tasks-azure-computervision

  • create_azure_computervision_caption_model: Create Azure Computer Vision Caption Model.
  • create_azure_computervision_classification_model: Create Azure Computer Vision Classification Model.
  • create_azure_computervision_custom_model: Create a model that runs inference with a custom model in Azure Computer Vision.
  • create_azure_computervision_ocr_model: Create Azure Computer Vision OCR model.
  • create_azure_computervision_product_recognizer_model: Create a model that runs inference with a product recognizer model in Azure Computer Vision.
  • create_azure_computervision_vectorization_model: Create Azure Computer Vision Vectorization Model.
  • delete_azure_computervision_custom_model: Delete Azure Computer Vision Custom Model.
  • train_azure_computervision_custom_model: Train Azure Computer Vision Custom Model.

irisml-tasks-azure-customvision

  • create_azure_customvision_docker_model: Create a model from an exported Azure Custom Vision Docker image.
  • create_azure_customvision_model: Create a prediction model from an Azure Custom Vision project.
  • create_azure_customvision_project: Create a new Azure Custom Vision project.
  • delete_azure_customvision_project: Delete an Azure Custom Vision project.
  • export_azure_customvision_model: Export a model from an Azure Custom Vision project.
  • train_azure_customvision_project: Train an Azure Custom Vision project.

irisml-tasks-azure-openai

  • call_azure_openai_completion: Call Azure OpenAI Text Completion API.
  • create_azure_openai_chat_model: Create a model that generates text using Azure OpenAI completion API.
  • create_azure_openai_completion_model: Create a model that generates text using Azure OpenAI completion API.

irisml-tasks-azureml

  • run_azureml_child: Run tasks as a new child AzureML Run.

irisml-tasks-fiftyone

  • launch_fiftyone: Launch a fiftyone app.

irisml-tasks-llava

  • create_llava_model: Create a LLaVA model from pretrained weights.

irisml-tasks-onnx

Adapter tasks for OnnxRuntime library.

  • benchmark_onnx: Benchmark a given onnx model using onnxruntime.
  • predict_onnx: Predict using a given onnx model traced with the export_onnx task.

irisml-tasks-timm

Adapter for models in timm library.

  • create_timm_model: Create a timm model.
  • create_timm_transform: Create timm transforms.

irisml-tasks-torchmetrics

Adapter tasks for torchmetrics library.

  • evaluate_torchmetrics_classification_multiclass: Evaluate prediction results using torchmetrics classification metrics for multiclass classification problems.
  • evaluate_torchmetrics_classification_multilabel: Evaluate prediction results using torchmetrics classification metrics for multilabel classification problems.

irisml-tasks-torchvision

Adapter tasks for torchvision library.

  • create_torchvision_model: Create a torchvision model.
  • create_torchvision_transform: Create transform objects in torchvision library.
  • create_torchvision_transform_v2: Create torchvision transform v2 object from string expressions.
  • load_torchvision_dataset: Load a dataset from torchvision package.

irisml-tasks-transformers

Adapter tasks for HuggingFace transformers library.

  • cache_transformers_model_on_azure_blob: Cache a model from transformers on Azure Blob Storage.
  • create_transformers_model: Create a model using transformers library.
  • create_transformers_raw_tokenizer: Create a Tokenizer using transformers library. Return the tokenizer as-is.
  • create_transformers_text_model: Create a text-generation model using transformers library.
  • create_transformers_tokenizer: Create a Tokenizer using transformers library.

Development

Create a new task

To create a Task, you must define a module that contains a "Task" class. Here is a simple example:

# irisml/tasks/my_custom_task.py
import dataclasses
import irisml.core

class Task(irisml.core.TaskBase):  # The class name must be "Task".
  VERSION = '1.0.0'
  CACHE_ENABLED = True  # (default: True) This is optional.

  @dataclasses.dataclass
  class Inputs:  # You can remove this class if the task doesn't require inputs.
    int_value: int
    float_value: float

  @dataclasses.dataclass
  class ChildConfig:  # If you'd like to define a nested config, you can define another dataclass like this one.
    nested_float: float = 0.0

  @dataclasses.dataclass
  class Config:  # If there is no configuration, you can remove this class. All fields must be JSON-serializable.
    another_float: float
    child_dataclass: 'Task.ChildConfig' = dataclasses.field(default_factory=lambda: Task.ChildConfig())  # Nested config referencing the dataclass defined above.

  @dataclasses.dataclass
  class Outputs:  # Can be removed if the task doesn't have outputs.
    float_value: float = 0  # If dry_run() is not implemented, Outputs fields must have a default value or default factory.

  def execute(self, inputs: Inputs) -> Outputs:
    return self.Outputs(inputs.int_value * inputs.float_value * self.config.another_float)

  def dry_run(self, inputs: Inputs) -> Outputs:  # This method is optional.
    return self.Outputs(0)  # Must return immediately without actual processing.

Each Task must define an "execute" method. The base class provides empty implementations for Inputs, Config, Outputs, and dry_run(). For details, please see the documentation for the TaskBase class.
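
Once the module is importable as irisml.tasks.my_custom_task, it can be referenced from a pipeline by its module name. The sketch below is illustrative only: the upstream task "my_tasks.produce_numbers" and its output field names are hypothetical placeholders, and the exact shape of each task's inputs should be confirmed with irisml_show.

from irisml.core import JobRunner

# Hypothetical pipeline wiring an upstream task's outputs into my_custom_task.
pipeline = {
    'tasks': [
        {'task': 'my_tasks.produce_numbers', 'name': 'produce_numbers'},   # placeholder upstream task
        {
            'task': 'my_custom_task',
            'name': 'my_custom_task',
            'inputs': {
                'int_value': '$outputs.produce_numbers.int_value',
                'float_value': '$outputs.produce_numbers.float_value'
            },
            'config': {'another_float': 0.5}
        }
    ]
}

# The same definition can also be saved as JSON and run with irisml_run.
runner = JobRunner(pipeline)
runner.run({})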
