InstructLab Training Library

To simplify the process of fine-tuning models with the LAB method, this library provides a simple training interface.

Installing the library

To get started with the library, you can install it directly from PyPI, or clone this repository and install your local copy for development.

Install the library:

pip install instructlab-training 

Alternatively, from a clone of this repository, install the library for development:

pip install -e ./training

Additional NVIDIA packages

This library uses the flash-attn package, along with other packages that rely on NVIDIA-specific CUDA tooling being installed. If you are using NVIDIA hardware with CUDA, you need to install the following additional dependencies.

Basic install

pip install .[cuda]

Editable install (development)

pip install -e .[cuda]

Using the library

You can utilize this training library by importing the necessary items.

from instructlab.training import (
    run_training,
    TorchrunArgs,
    TrainingArgs,
    DeepSpeedOptions
)

You can then define various training arguments, which will serve as the parameters for your training runs. The sections below describe the available options:

Learning about training arguments

The TrainingArgs class provides most of the customization options for training jobs. There are a number of options you can specify, such as setting DeepSpeed config values or running a LoRA training job instead of a full fine-tune.

TrainingArgs

| Field | Description |
| --- | --- |
| model_path | Either a reference to a HuggingFace repo or a path to a model saved in the HuggingFace format. |
| data_path | A path to the .jsonl training dataset. This is expected to be in the messages format. |
| ckpt_output_dir | Directory where trained model checkpoints will be saved. |
| data_output_dir | Directory where the processed training data is stored (post filtering/tokenization/masking). |
| max_seq_len | The maximum sequence length to be included in the training set. Samples exceeding this length will be dropped. |
| max_batch_len | Maximum number of tokens per GPU handled in a single step. Used as part of the multipack calculation. If you run into out-of-memory errors, try lowering this value, but do not go below max_seq_len. |
| num_epochs | Number of epochs to run through before stopping. |
| effective_batch_size | The number of samples in a batch to see before we update the model parameters. |
| save_samples | Number of samples the model should see before saving a checkpoint. Consider this to be the checkpoint save frequency. |
| learning_rate | How fast we optimize the weights during gradient descent. Higher values may lead to unstable learning performance. It's generally recommended to have a low learning rate with a high effective batch size. |
| warmup_steps | The number of steps a model should go through before reaching the full learning rate. We start at 0 and linearly climb up to learning_rate. |
| is_padding_free | Boolean value to indicate whether or not we're training a padding-free transformer model such as Granite. |
| random_seed | The random seed PyTorch will use. |
| mock_data | Whether or not to use mock, randomly generated data during training. For debug purposes. |
| mock_data_len | Max length of a single mock data sample. Equivalent to max_seq_len but for mock data. |
| deepspeed_options | Config options to specify for the DeepSpeed optimizer. |
| lora | Options to specify if you intend to perform a LoRA train instead of a full fine-tune. |
| chat_tmpl_path | Specifies the chat template / special tokens for training. |
| checkpoint_at_epoch | Whether or not we should save a checkpoint at the end of each epoch. |
| fsdp_options | The settings for controlling FSDP when it's selected as the distributed backend. |
| distributed_backend | Specifies which distributed training backend to use. Supported options are "fsdp" and "deepspeed". |
| disable_flash_attn | Disables flash attention when set to true. This allows for training on older devices. |
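
As a quick, hedged sketch of how these fields fit together (the values below are illustrative placeholders, not recommendations, and depending on the library version additional fields may be required), a minimal configuration might look like:

from instructlab.training import TrainingArgs

# minimal sketch; paths and hyperparameters are placeholders
training_args = TrainingArgs(
    model_path = "ibm-granite/granite-7b-base",
    data_path = "path/to/dataset.jsonl",
    ckpt_output_dir = "data/saved_checkpoints",
    data_output_dir = "data/outputs",
    max_seq_len = 4096,
    max_batch_len = 60000,
    num_epochs = 1,
    effective_batch_size = 128,
    save_samples = 1000,
    learning_rate = 2e-6,
    warmup_steps = 100,
    distributed_backend = "fsdp",  # or "deepspeed"
    checkpoint_at_epoch = True,    # also save a checkpoint at each epoch boundary
    disable_flash_attn = False,    # set to True on hardware without flash attention
)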

DeepSpeedOptions

This library currently supports only a few options in DeepSpeedOptions. The default is to run with DeepSpeed, so these options currently only allow you to customize aspects of the ZeRO stage 2 optimizer.

| Field | Description |
| --- | --- |
| cpu_offload_optimizer | Whether or not to do CPU offloading in DeepSpeed stage 2. |
| cpu_offload_optimizer_ratio | Floating point between 0 & 1. Specifies the ratio of parameters updating (i.e. optimizer step) on the CPU side. |
| cpu_offload_optimizer_pin_memory | If true, offload to page-locked CPU memory. This could boost throughput at the cost of extra memory overhead. |
| save_samples | The number of samples to see before saving a DeepSpeed checkpoint. |

For more information about DeepSpeed, see deepspeed.ai
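
As a brief sketch of how these fields are used (field names are taken from the table above; this is illustrative and omits the other required arguments), you could enable optimizer CPU offloading like this:

from instructlab.training import DeepSpeedOptions, TrainingArgs

training_args = TrainingArgs(
    # ... other fields as in the examples below ...
    distributed_backend = "deepspeed",
    deepspeed_options = DeepSpeedOptions(
        cpu_offload_optimizer = True,             # move optimizer state to the CPU
        cpu_offload_optimizer_ratio = 1.0,        # update all parameters on the CPU side
        cpu_offload_optimizer_pin_memory = True,  # use page-locked CPU memory for throughput
    ),
)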

FSDPOptions

As with DeepSpeed, we expose only a small number of parameters for you to modify with FSDP. They are listed below:

| Field | Description |
| --- | --- |
| cpu_offload_params | When set to true, offload parameters from the accelerator onto the CPU. This is an all-or-nothing option. |
| sharding_strategy | Specifies the model sharding strategy that FSDP should use. Valid options are: FULL_SHARD (ZeRO-3), HYBRID_SHARD (ZeRO-3*), SHARD_GRAD_OP (ZeRO-2), and NO_SHARD. |

[!NOTE] For sharding_strategy - Only SHARD_GRAD_OP has been extensively tested and is actively supported by this library.
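
As an illustrative sketch (assuming FSDPOptions is importable from instructlab.training like the other option classes, and that sharding_strategy accepts the string names listed above), selecting FSDP with the tested strategy might look like:

from instructlab.training import FSDPOptions, TrainingArgs  # FSDPOptions import path assumed

training_args = TrainingArgs(
    # ... other fields as in the examples below ...
    distributed_backend = "fsdp",
    fsdp_options = FSDPOptions(
        cpu_offload_params = False,           # keep parameters on the accelerator
        sharding_strategy = "SHARD_GRAD_OP",  # the strategy noted above as actively supported
    ),
)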

loraOptions

LoRA options currently supported:

| Field | Description |
| --- | --- |
| rank | The rank parameter for LoRA training. |
| alpha | The alpha parameter for LoRA training. |
| dropout | The dropout rate for LoRA training. |
| target_modules | The list of target modules for LoRA training. |
| quantize_data_type | The data type for quantization in LoRA training. Valid options are None and "nf4". |

Example run with LoRA options

If you'd like to do a LoRA train, you can specify LoRA options to TrainingArgs via the LoraOptions object.

from instructlab.training import LoraOptions, TrainingArgs

training_args = TrainingArgs(
    lora = LoraOptions(
        rank = 4,
        alpha = 32,
        dropout = 0.1,
    ),
    # ...
)

Learning about TorchrunArgs arguments

When running the training script, we always invoke torchrun.

If you are running a single-GPU system or something that doesn't otherwise require distributed training configuration, you can create a default object:

run_training(
    torchrun_args=TorchrunArgs(),
    training_args=TrainingArgs(
        # ...
    ),
)

However, if you want to specify a more complex configuration, the library currently supports all the options that torchrun accepts today.

[!NOTE] For more information about the torchrun arguments, please consult the torchrun documentation.

Example training run with TorchrunArgs arguments

For example, on an 8-GPU, 2-machine system (4 GPUs per machine), we would specify the following torchrun configuration:

import os

MASTER_ADDR = os.getenv('MASTER_ADDR')
MASTER_PORT = os.getenv('MASTER_PORT')
RDZV_ENDPOINT = f'{MASTER_ADDR}:{MASTER_PORT}'

# on machine 1
torchrun_args = TorchrunArgs(
    nnodes = 2, # number of machines 
    nproc_per_node = 4, # num GPUs per machine
    node_rank = 0, # node rank for this machine
    rdzv_id = 123,
    rdzv_endpoint = RDZV_ENDPOINT
)

run_training(
    torchrun_args=torchrun_args,
    training_args=training_args
)

import os

MASTER_ADDR = os.getenv('MASTER_ADDR')
MASTER_PORT = os.getenv('MASTER_PORT')
RDZV_ENDPOINT = f'{MASTER_ADDR}:{MASTER_PORT}'

# on machine 2
torchrun_args = TorchrunArgs(
    nnodes = 2, # number of machines 
    nproc_per_node = 4, # num GPUs per machine
    node_rank = 1, # node rank for this machine
    rdzv_id = 123,
    rdzv_endpoint = RDZV_ENDPOINT
)

run_training(
    torchrun_args=torchrun_args,
    training_args=training_args
)

Example training run with arguments

Define the training arguments which will serve as the parameters for our training run:

# define training-specific arguments
training_args = TrainingArgs(
    # define data-specific arguments
    model_path = "ibm-granite/granite-7b-base",
    data_path = "path/to/dataset.jsonl",
    ckpt_output_dir = "data/saved_checkpoints",
    data_output_dir = "data/outputs",

    # define model-training parameters
    max_seq_len = 4096,
    max_batch_len = 60000,
    num_epochs = 10,
    effective_batch_size = 3840,
    save_samples = 250000,
    learning_rate = 2e-6,
    warmup_steps = 800,
    is_padding_free = True, # set this to true when using Granite-based models
    random_seed = 42,
)

We'll also need to define the settings for running a multi-process job via torchrun. To do this, create a TorchrunArgs object.

[!TIP] For single-GPU jobs, you can simply set nnodes = 1 and nproc_per_node = 1.

torchrun_args = TorchrunArgs(
    nnodes = 1, # number of machines 
    nproc_per_node = 8, # num GPUs per machine
    node_rank = 0, # node rank for this machine
    rdzv_id = 123,
    rdzv_endpoint = '127.0.0.1:12345'
)

Finally, you can just call run_training and this library will handle the rest 🙂.

run_training(
    torchrun_args=torchrun_args,
    training_args=training_args,
)

Example training with separate data pre-processing

If the machines in the example above have shared storage, users can pre-process the training dataset a single time and then distribute it to each machine by making the following updates.

from instructlab.training import (
    run_training,
    TorchrunArgs,
    TrainingArgs,
    DeepSpeedOptions,
    DataProcessArgs,
    data_process as dp
)

training_args = TrainingArgs(
    # define data-specific arguments
    model_path = "ibm-granite/granite-7b-base",
    data_path = "path/to/dataset.jsonl",
    ckpt_output_dir = "data/saved_checkpoints",
    data_output_dir = "data/outputs",

    # define model-training parameters
    max_seq_len = 4096,
    max_batch_len = 60000,
    num_epochs = 10,
    effective_batch_size = 3840,
    save_samples = 250000,
    learning_rate = 2e-6,
    warmup_steps = 800,
    is_padding_free = True, # set this to true when using Granite-based models
    random_seed = 42,
    process_data = True,
)
...

data_process_args = DataProcessArgs(
    data_output_path = training_args.data_output_dir,
    model_path = training_args.model_path,
    data_path = training_args.data_path,
    max_seq_len = training_args.max_seq_len,
    chat_tmpl_path =  training_args.chat_tmpl_path
)

dp.main(data_process_args)
run_training(
    torchrun_args=torchrun_args,
    training_args=training_args,
)
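
On the remaining machines, the already-processed data under data_output_dir can then be reused instead of being re-tokenized. A hedged sketch of that follow-up run, assuming that setting process_data = False tells run_training to skip the data processing step:

# sketch: reuse the pre-processed data on the other machines (assumes
# process_data = False skips re-processing inside run_training)
training_args.process_data = False

run_training(
    torchrun_args=torchrun_args,
    training_args=training_args,
)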
