The Datagen SDK Python Package
About Datagen
Datagen is powering the AI revolution by providing high-performance synthetic data, with a focus on data for human-centric computer vision applications.
Datagen provides a powerful platform for generating high-quality, high-variance, domain-specific synthetic data, letting you simulate dynamic humans and objects in their context. With Datagen, CV engineers have unparalleled flexibility to control visual outcomes across a broad spectrum of 3D environments.
About this package
Datagen has developed a set of Python-based user tools for accessing and loading the rendered images and metadata that you generate on our platform. This package is designed to load our synthetic data into your Python environment, whether as part of your training set or as part of your testing set.
The abstraction layer we designed gives you access to all of the modalities we offer: facial landmarks, eye gaze vectors, depth and normal maps, and more.
Installation
Before installing this package, we recommend that you create a new Conda environment using the following command:
conda create --name datagen-env
Next, activate the environment using this command:
conda activate datagen-env
Finally, use this command to install the Datagen package itself:
pip install datagen-tech
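To confirm that the package is visible in the active environment, a quick check along the following lines may help. This is a minimal sketch that uses only the Python standard library. The helper name is_installed is hypothetical, and the check only confirms that a module named datagen can be found, not that it works correctly.

```python
import importlib.util


def is_installed(module_name="datagen"):
    """Return True if a top-level module with this name can be found.

    Hypothetical helper: it does not import the module, it only looks it up.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("datagen importable:", is_installed())
```

If this reports False, the most likely cause is that the Conda environment created above is not the one currently active.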
Using This Package
This package contains Python objects and functions that are designed to process datasets generated with the Datagen Platform. If you are a Datagen customer, you have already been given access to sample datasets and Jupyter notebook tutorials. If you are not a Datagen customer, contact our support team at support@datagen.tech.
Importing the SDK
To import the SDK, use this command:
import datagen
Or, if you prefer, use this one:
import datagen as dg
The rest of this tutorial uses the latter.
Datasets
A "dataset" is any collection of datapoints that were generated on the Datagen platform. Each datapoint is a single visual spectrum image accompanied by visual and textual metadata.
Loading a dataset
The dg.load() command loads one or more datasets into a Dataset object. It can accept any number of target folders at once, merging them into one large dataset so you can conduct operations on all of them together.
This command is fully backwards-compatible. If you use the platform's newest features in your dataset, you can still load that dataset together with older datasets that predate those features.
When using this command, you should target the top-level folder of each dataset (the folder that contains one or more subfolders named "scene"). For example:
dataset = dg.load("/path/to/datagen/datasets/eye-gaze-forward", "/path/to/datagen/datasets/expression-smiles")
The above line of sample code loads a dataset found in a folder named "eye-gaze-forward" as well as a dataset found in a folder named "expression-smiles", merging their datapoints into a single dataset.
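If you keep many datasets under one parent folder, you may prefer to collect the top-level folders programmatically rather than typing each path. The following is a minimal sketch, not part of the SDK. The helper name find_dataset_roots is hypothetical, and the rule that scene subfolder names start with "scene" is an assumption based on the description above. Adjust the prefixes argument if your folders are named differently.

```python
from pathlib import Path


def find_dataset_roots(parent, prefixes=("scene",)):
    """Return the subfolders of `parent` that look like top-level dataset folders.

    Hypothetical helper. A folder qualifies if it directly contains at least
    one subfolder whose name starts with one of `prefixes` (an assumption
    about how scene folders are named).
    """
    prefixes = tuple(prefixes)
    roots = []
    for candidate in sorted(Path(parent).iterdir()):
        if not candidate.is_dir():
            continue
        has_scene_folder = any(
            child.is_dir() and child.name.lower().startswith(prefixes)
            for child in candidate.iterdir()
        )
        if has_scene_folder:
            roots.append(str(candidate))
    return roots


# Usage (requires the datagen package and real datasets on disk):
# import datagen as dg
# dataset = dg.load(*find_dataset_roots("/path/to/datagen/datasets"))
```

The paths are passed to dg.load() as separate arguments, matching the call shown in the example above.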
Working with your dataset
You can iterate over your dataset in three different ways: by scene, by camera, or by datapoint.
Scenes
What is a scene?
A "scene" refers to a 3D environment that contains one or more 3D objects. For example, each scene that is created in our Faces generator consists of a single human face, wearing a specific expression, and located in a static position.
When generating a dataset on our Faces platform, you have the option of rendering each scene in multiple ways: through more than one camera, each with different settings; and under more than one lighting scenario, each with different backgrounds. Therefore, depending on the settings you used when you generated this dataset, each scene can be depicted in anywhere between one and over 30 rendered images.
The Scene object
Each dataset contains an iterable set of Scene objects. To retrieve this set, use dataset.scenes. To refer to an individual scene in a dataset, use an index, for example dataset.scenes[0].
The Scene object contains a set of rendered images that all depict the same subject, but from different angles and under different lighting scenarios. Each of these images, along with its metadata, is called a "datapoint". You can access a scene's datapoints by index:
scene = dataset.scenes[0]
datapoint = scene[0]
Or by iterating over the scene:
for datapoint in scene:
    ...
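As an illustration of scene iteration, the sketch below counts how many datapoints each scene holds. It relies only on the behavior described above (dataset.scenes is iterable, and iterating over a scene yields its datapoints). The helper name datapoints_per_scene is hypothetical and not part of the SDK.

```python
def datapoints_per_scene(dataset):
    """Return a list with the number of datapoints in each scene.

    Hypothetical helper. `dataset` is any object exposing an iterable
    `scenes` attribute whose items are themselves iterable.
    """
    counts = []
    for scene in dataset.scenes:
        # Iterating over a scene yields its datapoints one at a time.
        counts.append(sum(1 for _ in scene))
    return counts


# Usage (requires the datagen package and a loaded dataset):
# print(datapoints_per_scene(dataset))
```

Counting by iteration avoids assuming that Scene objects support len(), which the text above does not state.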
Note: In older datasets generated by our platform, scenes were named "environments". The package supports both formats.
Cameras
What is a camera?
A "camera" refers to a set of rendered images of a specific scene, all taken from the same angle with the same camera settings, with each image showing the scene under a different lighting scenario. When you generated this dataset, you selected one or more lighting scenarios; that selection determines the number of rendered images per camera.
The Camera object
Each scene contains an iterable list of Camera objects. To retrieve this list, use scene.cameras. To refer to an individual camera in a scene, use an index, for example scene.cameras[0].
The Camera object contains a set of rendered images that all depict the same subject from the same angle, but under different lighting scenarios. Each of these images, along with its metadata, is called a "datapoint". You can access a camera's datapoints by index:
scene = dataset.scenes[0]
camera = scene.cameras[0]
datapoint = camera[0]
Or by iterating over the camera:
for datapoint in camera:
    ...
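To work with datapoints grouped by the camera that produced them, the scene and camera levels can be walked together. The following is a minimal sketch that uses only dataset.scenes, scene.cameras, and camera iteration as described above. The helper name datapoints_by_camera and the (scene index, camera index) key format are hypothetical choices made for illustration.

```python
def datapoints_by_camera(dataset):
    """Map (scene_index, camera_index) to the list of that camera's datapoints.

    Hypothetical helper. Indices follow the iteration order of
    dataset.scenes and scene.cameras.
    """
    grouped = {}
    for scene_index, scene in enumerate(dataset.scenes):
        for camera_index, camera in enumerate(scene.cameras):
            # Iterating over a camera yields one datapoint per lighting scenario.
            grouped[(scene_index, camera_index)] = list(camera)
    return grouped


# Usage (requires the datagen package and a loaded dataset):
# for key, datapoints in datapoints_by_camera(dataset).items():
#     print(key, len(datapoints))
```

Because each camera holds one datapoint per lighting scenario, the list lengths should match the number of lighting scenarios you selected when generating the dataset.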
Datapoints
What is a datapoint?
A "datapoint" refers to a single rendered image of a synthetic human subject, along with all of its metadata. That metadata includes all of Datagen's modalities: normal and depth maps, facial landmark coordinates, camera and actor metadata, and more.
The Datapoint object
A Datapoint object consists of a visual spectrum image annotated by additional visual and textual data files. For example, each Datapoint object contains 2D coordinates for facial landmarks, identifying where those landmarks can be found in the visual spectrum image.
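You can use tab autocompletion to view the full list of attributes available in the Datapoint object. For example, you can use the following command to access the visual spectrum image of the subject (imshow is not part of this package; it is assumed here to be an image display function, such as matplotlib.pyplot.imshow, that you have imported separately):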
imshow(datapoint.visible_spectrum)
And this one to access a normal map of the subject:
imshow(datapoint.normals_map)
Textual annotations are organized in a hierarchy. For example, datapoint.actor_metadata.face_expression gives you the description of the subject's facial expression, while datapoint.semantic_segmentation_metadata.nose gives you the RGB value that represents the nose in the accompanying semantic segmentation map.
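The textual annotations can also be aggregated across a whole dataset. The sketch below tallies facial expressions using the datapoint.actor_metadata.face_expression attribute mentioned above. The helper name expression_counts is hypothetical, and because the exact type of face_expression is not stated here, the sketch converts each value to a string before counting, which is an assumption about how you would want to group them.

```python
from collections import Counter


def expression_counts(dataset):
    """Count datapoints per facial expression across all scenes.

    Hypothetical helper. It assumes str(face_expression) gives a usable
    label, since the attribute's exact type is not documented here.
    """
    counts = Counter()
    for scene in dataset.scenes:
        for datapoint in scene:
            label = str(datapoint.actor_metadata.face_expression)
            counts[label] += 1
    return counts


# Usage (requires the datagen package and a loaded dataset):
# print(expression_counts(dataset).most_common())
```

A tally like this can be a quick way to check that a merged dataset has the balance of expressions you intended before training on it.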