DataGradients is an open-source, Python-based library for computer vision dataset analysis.
Extract valuable insights from your datasets and get comprehensive reports effortlessly.
Non-exhaustive list of supported features.
📘 Deep Dive into Data Profiling
Puzzled by some dataset challenges while using DataGradients? We've got you covered.
Enrich your understanding with this 🎓 free online course. Dive into dataset profiling, confront its complexities, and harness the full potential of DataGradients.
Check out the pre-computed dataset analysis for a deeper dive into reports.
You can install DataGradients from PyPI with pip.
pip install data-gradients
class_id -> class_name.
Please ensure all the points above are checked before you proceed with DataGradients.
Example
from torchvision.datasets import CocoDetection
train_data = CocoDetection(...)
val_data = CocoDetection(...)
class_names = ["person", "bicycle", "car", "motorcycle", ...]
# OR
# class_names = {0: "person", 1:"bicycle", 2:"car", 3: "motorcycle", ...}
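The list and dictionary forms above describe the same mapping. As a minimal sketch, assuming the position of a name in the list is its class ID (as the two forms above suggest), either form can be normalized to a class_id -> class_name dictionary. The helper name `normalize_class_names` is hypothetical and is not part of DataGradients.

```python
from typing import Dict, List, Union


def normalize_class_names(
    class_names: Union[List[str], Dict[int, str]]
) -> Dict[int, str]:
    """Return a class_id -> class_name dictionary for either form.

    Hypothetical helper for illustration only; not part of DataGradients.
    """
    if isinstance(class_names, dict):
        # Already a mapping: return a shallow copy.
        return dict(class_names)
    # Assumption: in list form, the index of each name is its class ID.
    return dict(enumerate(class_names))


as_list = ["person", "bicycle", "car", "motorcycle"]
as_dict = {0: "person", 1: "bicycle", 2: "car", 3: "motorcycle"}

# Both forms normalize to the same mapping.
print(normalize_class_names(as_list) == normalize_class_names(as_dict))
```

The dictionary form is the more explicit of the two when class IDs are not contiguous or do not start at 0.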
Good to Know
- DataGradients will try to find out how the dataset returns images and labels.
- If something cannot be automatically determined, you will be asked to provide some extra information through a text input.
- In some extreme cases, the process will crash and invite you to implement a custom dataset extractor.
Heads up - DataGradients provides a few out-of-the-box dataset/dataloader implementations. You can find more dataset implementations in PyTorch or SuperGradients.
You are now ready to go: choose the relevant analyzer for your task and run it over your datasets!
Image Classification
from data_gradients.managers.classification_manager import ClassificationAnalysisManager
train_data = ... # Your dataset iterable (torch dataset/dataloader/...)
val_data = ... # Your dataset iterable (torch dataset/dataloader/...)
class_names = ... # [<class-1>, <class-2>, ...]
analyzer = ClassificationAnalysisManager(
    report_title="Testing Data-Gradients Classification",
    train_data=train_data,
    val_data=val_data,
    class_names=class_names,
)
analyzer.run()
Object Detection
from data_gradients.managers.detection_manager import DetectionAnalysisManager
train_data = ... # Your dataset iterable (torch dataset/dataloader/...)
val_data = ... # Your dataset iterable (torch dataset/dataloader/...)
class_names = ... # [<class-1>, <class-2>, ...]
analyzer = DetectionAnalysisManager(
    report_title="Testing Data-Gradients Object Detection",
    train_data=train_data,
    val_data=val_data,
    class_names=class_names,
)
analyzer.run()
Semantic Segmentation
from data_gradients.managers.segmentation_manager import SegmentationAnalysisManager
train_data = ... # Your dataset iterable (torch dataset/dataloader/...)
val_data = ... # Your dataset iterable (torch dataset/dataloader/...)
class_names = ... # [<class-1>, <class-2>, ...]
analyzer = SegmentationAnalysisManager(
    report_title="Testing Data-Gradients Segmentation",
    train_data=train_data,
    val_data=val_data,
    class_names=class_names,
)
analyzer.run()
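The three analyzers above are constructed with the same arguments, so the choice of task can be reduced to a lookup. The following is a sketch only: the module paths and class names are copied from the examples above, while the `ANALYZERS` table and the `load_analyzer_class` helper are hypothetical names introduced for illustration.

```python
import importlib

# Module paths and class names are taken from the three examples above.
ANALYZERS = {
    "classification": (
        "data_gradients.managers.classification_manager",
        "ClassificationAnalysisManager",
    ),
    "detection": (
        "data_gradients.managers.detection_manager",
        "DetectionAnalysisManager",
    ),
    "segmentation": (
        "data_gradients.managers.segmentation_manager",
        "SegmentationAnalysisManager",
    ),
}


def load_analyzer_class(task: str):
    """Import and return the analyzer class for `task`.

    Hypothetical helper for illustration only; not part of DataGradients.
    """
    if task not in ANALYZERS:
        raise ValueError(
            f"Unknown task {task!r}; expected one of {sorted(ANALYZERS)}"
        )
    module_path, class_name = ANALYZERS[task]
    # Imported lazily, so only the selected manager module is loaded.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

With DataGradients installed, the returned class would be called with the same `report_title`, `train_data`, `val_data`, and `class_names` arguments shown above, followed by `run()`.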
Example
You can test the segmentation analysis tool in the following example, which does not require you to download any additional data.
Once the analysis is done, the path to your PDF report will be printed. Examples of pre-computed dataset analysis reports are also available.
The feature configuration allows you to run the analysis on a subset of features or adjust the parameters of existing features. If you are interested in customizing this configuration, you can check out the documentation on that topic.
Ensuring Comprehensive Dataset Compatibility
DataGradients is adept at automatic dataset inference; however, certain specificities, such as nested annotation structures or unique annotation formats, may necessitate a tailored approach.
To address this, DataGradients offers extractors tailored for enhancing compatibility with diverse dataset formats.
For an in-depth understanding and implementation details, we encourage a thorough review of the Dataset Extractors Documentation.
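To make the idea of an extractor concrete, here is a rough sketch of a function that flattens a nested annotation record into plain rows. Everything in it is an assumption made for illustration: the sample layout, the `[class_id, x, y, w, h]` row format, and the function name `extract_boxes` are hypothetical. The signature DataGradients actually expects, and how an extractor is handed to an analyzer, are described in the Dataset Extractors Documentation.

```python
from typing import Any, Dict, List


def extract_boxes(sample: Dict[str, Any]) -> List[List[float]]:
    """Flatten a nested annotation record into [class_id, x, y, w, h] rows.

    Hypothetical sample layout, for illustration only:
    {"annotations": {"objects": [{"category": int, "bbox": [x, y, w, h]}]}}
    """
    # Missing keys are treated as "no objects" rather than an error.
    objects = sample.get("annotations", {}).get("objects", [])
    rows = []
    for obj in objects:
        x, y, w, h = obj["bbox"]
        rows.append(
            [float(obj["category"]), float(x), float(y), float(w), float(h)]
        )
    return rows


# A made-up sample with two annotated objects.
sample = {
    "annotations": {
        "objects": [
            {"category": 0, "bbox": [10, 20, 30, 40]},
            {"category": 2, "bbox": [5, 5, 15, 25]},
        ]
    }
}

print(extract_boxes(sample))
```

Keeping the extractor a pure function of one sample makes it easy to check against a handful of records before running a full analysis.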
Example notebook on Colab
Join our Discord Community
This project is released under the Apache 2.0 license.
FAQs
DataGradients
We found that data-gradients demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.