

Motivation
Data augmentation
is a technique commonly used for training machine-learning models in
computer vision, where the amount of image data is increased by
creating transformed copies of the original images.
In the object detection sub-field, the same transformation must also be applied
to the target rectangular bounding-boxes. However, such functionality is not
readily available in frameworks such as TensorFlow and PyTorch.
While other powerful augmentation tools exist, many of them
do not work well with the
TPU
when accessed from Google Colab or
Kaggle Notebooks,
which are popular options for people who do not have their
own hardware resources.
Targetran fills this gap.
What is Targetran?
- A light-weight data augmentation library to assist object detection or
image classification model training.
- Provides a simple Python API to transform both the images and the target rectangular
bounding-boxes.
- Uses a dataset-idiomatic approach for TensorFlow and PyTorch.
- Can be used with the TPU for acceleration (TensorFlow Dataset only).

(Figure produced by the example code here.)
Installation
Tested for Python 3.9, 3.10, and 3.11.
The best way to install Targetran with its dependencies is from PyPI:
python3 -m pip install --upgrade targetran
Alternatively, to obtain the latest version from this repository:
git clone https://github.com/bhky/targetran.git
cd targetran
python3 -m pip install .
Usage
Notations
NDFloatArray: NumPy float array type. The values are converted to np.float32 internally.
tf.Tensor: General TensorFlow Tensor type. The values are converted to tf.float32 internally.
Data format
For object detection model training, which is the primary use case here, the following data are needed.
image_seq (Sequence of NDFloatArray or tf.Tensor of shape (height, width, num_channels)):
- images in channel-last format;
- image sizes can be different.
bboxes_seq (Sequence of NDFloatArray or tf.Tensor of shape (num_bboxes_per_image, 4)):
- each bboxes array/tensor provides the bounding-boxes associated with an image;
- each single bounding-box is given as [top_left_x, top_left_y, bbox_width, bbox_height];
- an empty array/tensor means no bounding-boxes (and labels) for that image.
labels_seq (Sequence of NDFloatArray or tf.Tensor of shape (num_bboxes_per_image,)):
- each labels array/tensor provides the bounding-box labels associated with an image;
- an empty array/tensor means no labels (and bounding-boxes) for that image.
Some dummy data are created below for illustration. Please note the required format.
import numpy as np
image_seq = [np.random.rand(480, 512, 3) for _ in range(3)]
bboxes_seq = [
    np.array([
        [214, 223, 10, 11],
        [345, 230, 21, 9],
    ]),
    np.array([]),
    np.array([
        [104, 151, 22, 10],
        [99, 132, 20, 15],
        [340, 220, 31, 12],
    ]),
]
labels_seq = [
    np.array([0, 1]),
    np.array([]),
    np.array([2, 3, 0]),
]
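As a quick sanity check on this format, one can verify that each image has one label per bounding-box, including the empty-array case. The check_data_format helper below is hypothetical, not part of Targetran:

```python
import numpy as np

def check_data_format(image_seq, bboxes_seq, labels_seq):
    """Verify the sequences follow the expected input format."""
    assert len(image_seq) == len(bboxes_seq) == len(labels_seq)
    for image, bboxes, labels in zip(image_seq, bboxes_seq, labels_seq):
        assert image.ndim == 3  # Channel-last: (height, width, num_channels).
        bboxes = bboxes.reshape(-1, 4)  # An empty array has shape (0,).
        assert labels.shape == (len(bboxes),)  # One label per bounding-box.

image_seq = [np.random.rand(480, 512, 3) for _ in range(3)]
bboxes_seq = [
    np.array([[214, 223, 10, 11], [345, 230, 21, 9]]),
    np.array([]),
    np.array([[104, 151, 22, 10], [99, 132, 20, 15], [340, 220, 31, 12]]),
]
labels_seq = [np.array([0, 1]), np.array([]), np.array([2, 3, 0])]
check_data_format(image_seq, bboxes_seq, labels_seq)
```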
Design principles
- Bounding-boxes will always be rectangular with sides parallel to the image frame.
- After transformation, each resulting bounding-box is determined by the smallest
rectangle (with sides parallel to the image frame) enclosing the original transformed bounding-box.
- After transformation, resulting bounding-boxes with their centroids outside the
image frame will be removed, together with the corresponding labels.
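The last two principles can be sketched in plain NumPy: rotate the corners of a box, take the smallest axis-aligned rectangle enclosing them, and drop the result when its centroid leaves the image frame. This is only an illustration of the stated behaviour under assumed conventions (rotation about the image centre, image_size given as (width, height)), not Targetran's internal code:

```python
import numpy as np

def enclosing_bbox_after_rotation(bbox, angle_rad, image_size):
    """Rotate a [top_left_x, top_left_y, width, height] bbox about the image
    centre and return the smallest axis-aligned enclosing bbox, or None when
    its centroid falls outside the (width, height) image frame."""
    x, y, w, h = bbox
    corners = np.array(
        [[x, y], [x + w, y], [x, y + h], [x + w, y + h]], dtype=np.float64
    )
    centre = np.array(image_size, dtype=np.float64) / 2.0
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[c, -s], [s, c]])
    rotated = (corners - centre) @ rotation.T + centre
    x_min, y_min = rotated.min(axis=0)
    x_max, y_max = rotated.max(axis=0)
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    if not (0.0 <= cx <= image_size[0] and 0.0 <= cy <= image_size[1]):
        return None  # Centroid outside the frame: bbox and label are removed.
    return [x_min, y_min, x_max - x_min, y_max - y_min]
```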
TensorFlow Dataset
import tensorflow as tf
from targetran.tf import (
    to_tf_dataset,
    TFCombineAffine,
    TFRandomFlipLeftRight,
    TFRandomFlipUpDown,
    TFRandomRotate,
    TFRandomShear,
    TFRandomTranslate,
    TFRandomCrop,
    TFResize,
)
# Convert the data sequences into a TensorFlow Dataset.
ds = to_tf_dataset(image_seq, bboxes_seq, labels_seq)
# Alternatively, image file paths can be given instead of loaded images.
ds = to_tf_dataset(image_paths, bboxes_seq, labels_seq, image_seq_is_paths=True)
# Combine affine transformations into a single operation for better performance.
affine_transform = TFCombineAffine(
    [TFRandomRotate(probability=0.8),
     TFRandomShear(probability=0.6),
     TFRandomTranslate(),
     TFRandomFlipLeftRight(),
     TFRandomFlipUpDown()],
    probability=1.0
)
# Alternatively, randomly select a subset of the given transformations each time,
# optionally weighted by selection probabilities and keeping the original order.
affine_transform = TFCombineAffine(
    [TFRandomRotate(),
     TFRandomShear(),
     TFRandomTranslate(),
     TFRandomFlipLeftRight(),
     TFRandomFlipUpDown()],
    num_selected_transforms=2,
    selected_probabilities=[0.5, 0.0, 0.3, 0.2, 0.0],
    keep_order=True,
    probability=1.0
)
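As a rough picture of what num_selected_transforms, selected_probabilities, and keep_order mean, the selection step can be sketched in plain Python. This is my reading of the documented semantics, not the library's implementation:

```python
import random

def select_transforms(transforms, probabilities, num_selected, keep_order):
    """Pick `num_selected` distinct transforms, weighted by `probabilities`;
    with keep_order=True, return them in their original listed order."""
    weights = list(probabilities)
    chosen = []
    while len(chosen) < num_selected:
        (idx,) = random.choices(range(len(transforms)), weights=weights)
        chosen.append(idx)
        weights[idx] = 0.0  # Sample without replacement.
    if keep_order:
        chosen.sort()
    return [transforms[i] for i in chosen]

# Names standing in for the transform callables above.
names = ["rotate", "shear", "translate", "flip_lr", "flip_ud"]
picked = select_transforms(names, [0.5, 0.0, 0.3, 0.2, 0.0], 2, keep_order=True)
```

Entries with zero selection probability are never picked, and with keep_order=True the chosen transforms are applied in their listed order.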
auto_tune = tf.data.AUTOTUNE
ds = (
    ds
    .map(TFRandomCrop(probability=0.5), num_parallel_calls=auto_tune)
    .map(affine_transform, num_parallel_calls=auto_tune)
    .map(TFResize((256, 256)), num_parallel_calls=auto_tune)
)
# Pad the ragged bboxes and labels so that each batch has a uniform shape.
ds = ds.padded_batch(batch_size=2, padding_values=-1.0)
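Because the numbers of bounding-boxes and labels differ per image, padded_batch fills them with the -1.0 value given above; downstream code typically masks that padding back out. A NumPy sketch of such an unpadding step (the unpad helper is hypothetical):

```python
import numpy as np

def unpad(padded_bboxes, padded_labels, pad_value=-1.0):
    """Recover per-image bboxes/labels from a padded batch by dropping
    rows whose entries all equal the padding value."""
    bboxes_list, labels_list = [], []
    for bboxes, labels in zip(padded_bboxes, padded_labels):
        keep = ~np.all(bboxes == pad_value, axis=-1)
        bboxes_list.append(bboxes[keep])
        labels_list.append(labels[keep])
    return bboxes_list, labels_list

# A batch of 2 images: the second has one real bbox plus one padded row.
batch_bboxes = np.array([
    [[214.0, 223.0, 10.0, 11.0], [345.0, 230.0, 21.0, 9.0]],
    [[104.0, 151.0, 22.0, 10.0], [-1.0, -1.0, -1.0, -1.0]],
])
batch_labels = np.array([[0.0, 1.0], [2.0, -1.0]])
bboxes_list, labels_list = unpad(batch_bboxes, batch_labels)
```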
Using with KerasCV
The KerasCV API can be confusing in terms of its input data format:
the requirement differs between a preprocessing layer and a model.
Targetran provides easy conversion tools to make the process smoother.
import keras_cv
from targetran.tf import to_keras_cv_dict, to_keras_cv_model_input
# Convert to the dictionary format expected by KerasCV preprocessing layers.
ds = to_keras_cv_dict(ds, batch_size=2)
jittered_resize = keras_cv.layers.JitteredResize(
target_size=(640, 640),
scale_factor=(0.8, 1.25),
bounding_box_format="xywh",
)
ds = ds.map(jittered_resize)
# Convert to the input format expected by KerasCV models.
ds = to_keras_cv_model_input(ds)
PyTorch Dataset
from typing import Optional, Sequence, Tuple
import numpy.typing
from torch.utils.data import Dataset
from targetran.np import (
    CombineAffine,
    RandomFlipLeftRight,
    RandomFlipUpDown,
    RandomRotate,
    RandomShear,
    RandomTranslate,
    RandomCrop,
    Resize,
)
from targetran.utils import Compose
NDFloatArray = numpy.typing.NDArray[numpy.float64]
class PTDataset(Dataset):
    """
    A very simple PyTorch Dataset.
    As per common practice, transforms are done on NumPy arrays.
    """

    def __init__(
            self,
            image_seq: Sequence[NDFloatArray],
            bboxes_seq: Sequence[NDFloatArray],
            labels_seq: Sequence[NDFloatArray],
            transforms: Optional[Compose]
    ) -> None:
        self.image_seq = image_seq
        self.bboxes_seq = bboxes_seq
        self.labels_seq = labels_seq
        self.transforms = transforms

    def __len__(self) -> int:
        return len(self.image_seq)

    def __getitem__(
            self,
            idx: int
    ) -> Tuple[NDFloatArray, NDFloatArray, NDFloatArray]:
        if self.transforms:
            return self.transforms(
                self.image_seq[idx],
                self.bboxes_seq[idx],
                self.labels_seq[idx]
            )
        return (
            self.image_seq[idx],
            self.bboxes_seq[idx],
            self.labels_seq[idx]
        )
# Combine affine transformations into a single operation for better performance.
affine_transform = CombineAffine(
    [RandomRotate(probability=0.8),
     RandomShear(probability=0.6),
     RandomTranslate(),
     RandomFlipLeftRight(),
     RandomFlipUpDown()],
    probability=1.0
)
# Alternatively, randomly select a subset of the given transformations each time,
# optionally weighted by selection probabilities and keeping the original order.
affine_transform = CombineAffine(
    [RandomRotate(),
     RandomShear(),
     RandomTranslate(),
     RandomFlipLeftRight(),
     RandomFlipUpDown()],
    num_selected_transforms=2,
    selected_probabilities=[0.5, 0.0, 0.3, 0.2, 0.0],
    keep_order=True,
    probability=1.0
)
transforms = Compose([
    RandomCrop(probability=0.5),
    affine_transform,
    Resize((256, 256)),
])
ds = PTDataset(image_seq, bboxes_seq, labels_seq, transforms=transforms)
from torch.utils.data import DataLoader
from targetran.utils import collate_fn
# collate_fn handles batches in which the numbers of bboxes and labels vary per image.
data_loader = DataLoader(ds, batch_size=2, collate_fn=collate_fn)
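A custom collate_fn is needed because PyTorch's default collate tries to stack the per-image bboxes and labels into single tensors and fails when their counts differ. The general pattern, shown below as an illustration rather than the library's actual code, is to return per-sample sequences instead of stacked tensors:

```python
def ragged_collate(samples):
    """Group a batch of (image, bboxes, labels) samples into per-field
    lists, instead of stacking ragged arrays into one tensor."""
    images, bboxes, labels = zip(*samples)
    return list(images), list(bboxes), list(labels)

# Dummy samples: the second image has no bboxes or labels.
samples = [("img0", [[1, 2, 3, 4]], [0]), ("img1", [], [])]
images, bboxes, labels = ragged_collate(samples)
```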
Image classification
While the tools here are primarily designed for object detection tasks, they can
also be used for image classification, in which only the images are to be transformed,
e.g., given a dataset that returns (image, label) examples, or even image-only examples.
The image_only function can be used to convert a transformation class for this purpose.
If the dataset returns a tuple (image, ...) in each iteration, only the image
will be transformed; any parameters that follow, such as in (..., label, weight),
will be returned untouched.
If the dataset returns only an image (not a tuple), then only the transformed image will be returned.
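This pass-through behaviour can be pictured with a minimal wrapper. It is an illustration of the semantics only; Targetran's actual image_only may differ:

```python
def image_only_wrapper(transform):
    """Apply `transform` to the image and pass everything else through."""
    def wrapped(*args):
        if len(args) == 1:      # Dataset yields the image alone.
            return transform(args[0])
        image, *rest = args     # Dataset yields (image, label, ...).
        return (transform(image), *rest)
    return wrapped

# A toy "transform" that doubles its input.
double = image_only_wrapper(lambda image: image * 2)
```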
from targetran.utils import image_only
ds = to_tf_dataset(image_seq)
ds = (
    ds
    .map(image_only(TFRandomCrop()))
    .map(image_only(affine_transform))
    .map(image_only(TFResize((256, 256))))
    .batch(32)
)
transforms = Compose([
    image_only(RandomCrop()),
    image_only(affine_transform),
    image_only(Resize((256, 256))),
])
ds = PTDataset(..., transforms=transforms)
data_loader = DataLoader(ds, batch_size=32)
Examples
API
See here for API details.