Dataset Insights
Unity Dataset Insights is a Python package for downloading, parsing, and analyzing synthetic datasets generated using the Unity Perception package.
Installation
Dataset Insights is published on PyPI. To install it, run pip install datasetinsights in a supported Python environment.
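To confirm that the installation succeeded, you can query the installed distribution from Python. The following is a minimal sketch that relies only on the standard library (Python 3.8 or later); the helper name installed_version is hypothetical and is not part of datasetinsights.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("datasetinsights")
    if found is None:
        print("datasetinsights is not installed in this environment")
    else:
        print(f"datasetinsights {found} is installed")
```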
Getting Started
Dataset Statistics
We provide a sample notebook to help you load synthetic datasets generated using the Perception package and visualize dataset statistics. We plan to support other sample Unity projects in the future.
Load Datasets
The Unity Perception package provides datasets under this schema. The datasetinsights package also provides convenient Python modules to parse datasets.
For example, you can load AnnotationDefinitions into a Python dictionary by providing the corresponding annotation definition ID:
from datasetinsights.datasets.unity_perception import AnnotationDefinitions
# dest is the local directory that holds the downloaded dataset (see Download Datasets)
annotation_def = AnnotationDefinitions(data_root=dest, version="my_schema_version")
definition_dict = annotation_def.get_definition(def_id="my_definition_id")
Similarly, for MetricDefinitions:
from datasetinsights.datasets.unity_perception import MetricDefinitions
metric_def = MetricDefinitions(data_root=dest, version="my_schema_version")
definition_dict = metric_def.get_definition(def_id="my_definition_id")
The Captures table provides the collection of simulation captures and annotations. You can load these records directly as a pandas DataFrame:
from datasetinsights.datasets.unity_perception import Captures
captures = Captures(data_root=dest, version="my_schema_version")
captures_df = captures.filter(def_id="my_definition_id")
The Metrics table can store simulation metrics for a capture or annotation. You can also load these records as a pandas DataFrame:
from datasetinsights.datasets.unity_perception import Metrics
metrics = Metrics(data_root=dest, version="my_schema_version")
metrics_df = metrics.filter_metrics(def_id="my_definition_id")
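The Captures and Metrics tables above are loaded with the same data_root and version, so it can be convenient to load them together. The sketch below wraps only the calls shown above; the function name load_tables and its two separate ID parameters are assumptions for illustration (the examples above use a single placeholder ID), and the import is deferred so the module can be imported even where datasetinsights is not installed.

```python
def load_tables(data_root, version, capture_def_id, metric_def_id):
    """Load the Captures and Metrics tables from one dataset as two DataFrames.

    Hypothetical helper: it only combines the Captures.filter and
    Metrics.filter_metrics calls shown in the examples above.
    """
    # Deferred import: datasetinsights is required only when the function runs.
    from datasetinsights.datasets.unity_perception import Captures, Metrics

    # Both tables are read from the same dataset directory and schema version.
    captures = Captures(data_root=data_root, version=version)
    metrics = Metrics(data_root=data_root, version=version)

    captures_df = captures.filter(def_id=capture_def_id)
    metrics_df = metrics.filter_metrics(def_id=metric_def_id)
    return captures_df, metrics_df
```

A call would look like load_tables(dest, "my_schema_version", "my_definition_id", "my_definition_id"), using the same placeholder values as the examples above.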
Download Datasets
You can download the datasets using the download command:
datasetinsights download --source-uri=<xxx> --output=$HOME/data
The download command supports HTTP(S) and GCS sources.
Alternatively, you can download a dataset directly from the Python interface.
GCSDatasetDownloader can download a dataset from GCS locations:
from datasetinsights.io.downloader import GCSDatasetDownloader
source_uri = "gs://url/to/file.zip"
dest = "~/data"
downloader = GCSDatasetDownloader()
downloader.download(source_uri=source_uri, output=dest)
HTTPDatasetDownloader can download a dataset from any HTTP(S) URL:
from datasetinsights.io.downloader import HTTPDatasetDownloader
source_uri = "http://url.to.file.zip"
dest = "~/data"
downloader = HTTPDatasetDownloader()
downloader.download(source_uri=source_uri, output=dest)
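When the source may be either a GCS or an HTTP(S) location, the choice between the two downloader classes can be made from the URI scheme. The following is a sketch using only the standard library; pick_downloader_name and resolve_dest are hypothetical helpers, not part of datasetinsights. Python does not expand "~" inside a plain string, and whether the downloaders expand it themselves is not stated here, so the sketch expands it explicitly as a precaution.

```python
import os
from urllib.parse import urlparse

# Map URI schemes to the names of the downloader classes described above.
DOWNLOADER_BY_SCHEME = {
    "gs": "GCSDatasetDownloader",
    "http": "HTTPDatasetDownloader",
    "https": "HTTPDatasetDownloader",
}


def pick_downloader_name(source_uri):
    """Return the downloader class name that matches the URI scheme."""
    scheme = urlparse(source_uri).scheme.lower()
    if scheme not in DOWNLOADER_BY_SCHEME:
        raise ValueError(f"Unsupported source URI scheme: {scheme!r}")
    return DOWNLOADER_BY_SCHEME[scheme]


def resolve_dest(dest):
    """Expand a leading "~" and return an absolute output path."""
    return os.path.abspath(os.path.expanduser(dest))
```

The returned name can then be looked up on the datasetinsights.io.downloader module (for example with getattr) to construct the downloader, and the download call itself stays the same as in the examples above.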
Convert Datasets
If you are interested in converting the synthetic dataset to COCO format for annotations that COCO supports, you can run the convert command:
datasetinsights convert -i <input-directory> -o <output-directory> -f COCO-Instances
or
datasetinsights convert -i <input-directory> -o <output-directory> -f COCO-Keypoints
You will need to provide the 2D bounding box definition ID in the synthetic dataset. We currently support only 2D bounding box and human keypoint annotations for the COCO format.
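If you drive the conversion from a script, the command line above can be assembled programmatically. This is a minimal sketch; build_convert_command is a hypothetical helper that uses only the -i, -o, and -f options and the two format names shown above. The option for passing the bounding box definition ID is not shown in this section, so it is deliberately left out rather than guessed.

```python
import shlex

# The two target formats named in the convert examples above.
SUPPORTED_FORMATS = ("COCO-Instances", "COCO-Keypoints")


def build_convert_command(input_dir, output_dir, fmt):
    """Return the argument list for the datasetinsights convert command."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}")
    return ["datasetinsights", "convert", "-i", input_dir, "-o", output_dir, "-f", fmt]


if __name__ == "__main__":
    cmd = build_convert_command("/data/synthetic", "/data/coco", "COCO-Instances")
    # Print the command in a shell-safe form for inspection before running it.
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to execute the conversion; passing a list avoids shell-quoting problems with directory names that contain spaces.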
Docker
You can use the pre-built Docker image unitytechnologies/datasetinsights to interact with datasets.
Documentation
You can find the API documentation on readthedocs.
Contributing
Please let us know if you encounter a bug by filing an issue. To learn more about making a contribution to Dataset Insights, please see our Contribution page.
License
Dataset Insights is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.
Citation
If you find this package useful, consider citing it using:
@misc{datasetinsights2020,
title={Unity {D}ataset {I}nsights Package},
author={{Unity Technologies}},
howpublished={\url{https://github.com/Unity-Technologies/datasetinsights}},
year={2020}
}