cleanvision

Find issues in image datasets

Source: PyPI (pip)
Version: 0.3.7
Maintainers: 7

CleanVision automatically detects potential issues in image datasets, such as images that are blurry, under- or over-exposed, or (near) duplicates. This data-centric AI package is a quick first step for any computer vision project: it finds problems in the dataset that you want to address before applying machine learning. CleanVision is super simple -- run the same couple of lines of Python code to audit any image dataset!

Installation

pip install cleanvision

Quickstart

  • Download an example dataset (optional), or just use any collection of image files you have.

wget -nc 'https://cleanlab-public.s3.amazonaws.com/CleanVision/image_files.zip'
  • Run CleanVision to audit the images.
from cleanvision import Imagelab

# Specify path to folder containing the image files in your dataset
imagelab = Imagelab(data_path="FOLDER_WITH_IMAGES/")

# Automatically check for a predefined list of issues within your dataset
imagelab.find_issues()

# Produce a neat report of the issues found in your dataset
imagelab.report()
  • CleanVision diagnoses many types of issues, but you can also check for only specific issues.
issue_types = {"dark": {}, "blurry": {}}

imagelab.find_issues(issue_types=issue_types)

# Produce a report with only the specified issue_types
imagelab.report(issue_types=issue_types)
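When the set of checks is chosen at runtime, it can help to validate the requested keys before handing them to find_issues. The following is a minimal sketch, not part of CleanVision: build_issue_types and KNOWN_ISSUE_KEYS are hypothetical names, the keys are copied from the issue-type table in this README, and each key maps to an empty dict as in the example above.

```python
# Issue keys copied from the issue-type table in this README.
KNOWN_ISSUE_KEYS = {
    "exact_duplicates", "near_duplicates", "blurry", "low_information",
    "dark", "light", "grayscale", "odd_aspect_ratio", "odd_size",
}


def build_issue_types(keys):
    """Hypothetical helper: map each requested issue key to an empty settings dict."""
    unknown = sorted(set(keys) - KNOWN_ISSUE_KEYS)
    if unknown:
        # Fail early on typos such as "blury" instead of passing them along.
        raise ValueError(f"Unknown issue keys: {unknown}")
    # dict.fromkeys keeps the requested order and drops repeated keys.
    return {key: {} for key in dict.fromkeys(keys)}


issue_types = build_issue_types(["dark", "blurry"])
print(issue_types)  # {'dark': {}, 'blurry': {}}
```

The resulting dict has the same shape as the issue_types argument shown above, so it could be passed to imagelab.find_issues(issue_types=issue_types) and imagelab.report(issue_types=issue_types).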

Clean your data for better Computer Vision

The quality of machine learning models hinges on the quality of the data used to train them, but it is hard to manually identify all of the low-quality data in a big dataset. CleanVision helps you automatically identify common types of data issues lurking in image datasets.

This package currently detects issues in the raw images themselves, making it a useful tool for any computer vision task such as: classification, segmentation, object detection, pose estimation, keypoint detection, generative modeling, etc. To detect issues in the labels of your image data, you can instead use the cleanlab package.

In any collection of image files (most formats supported), CleanVision can detect the following types of issues:

| #  | Issue Type       | Description                                                                   | Issue Key        |
|----|------------------|-------------------------------------------------------------------------------|------------------|
| 1  | Exact Duplicates | Images that are identical to each other                                       | exact_duplicates |
| 2  | Near Duplicates  | Images that are visually almost identical                                     | near_duplicates  |
| 3  | Blurry           | Images where details are fuzzy (out of focus)                                 | blurry           |
| 4  | Low Information  | Images lacking content (little entropy in pixel values)                       | low_information  |
| 5  | Dark             | Irregularly dark images (underexposed)                                        | dark             |
| 6  | Light            | Irregularly bright images (overexposed)                                       | light            |
| 7  | Grayscale        | Images lacking color                                                          | grayscale        |
| 8  | Odd Aspect Ratio | Images with an unusual aspect ratio (overly skinny/wide)                      | odd_aspect_ratio |
| 9  | Odd Size         | Images that are abnormally large or small compared to the rest of the dataset | odd_size         |
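To make two of the issue types above more concrete, here is a rough, standard-library-only sketch of the ideas behind exact duplicates and low information. It is an illustration under stated assumptions, not CleanVision's actual implementation: group_exact_duplicates and pixel_entropy are hypothetical helpers, duplicates are assumed to be byte-identical files, and each image is assumed to be given as a flat list of pixel intensities.

```python
import hashlib
import math
from collections import Counter, defaultdict


def group_exact_duplicates(files):
    """Group file names whose raw bytes hash to the same SHA-256 digest.

    `files` maps a file name to its bytes (assumption: duplicates are
    byte-identical, which is stricter than comparing decoded pixels).
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def pixel_entropy(pixels):
    """Shannon entropy (in bits) of a flat sequence of pixel intensities."""
    total = len(pixels)
    counts = Counter(pixels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Toy data standing in for real image files and decoded pixels.
files = {"a.png": b"\x00\x01\x02", "b.png": b"\x00\x01\x02", "c.png": b"\xff\xfe"}
print(group_exact_duplicates(files))   # [['a.png', 'b.png']]

flat_image = [128] * 16                # a single intensity: no information
varied_image = list(range(16))         # 16 equally likely intensities
print(pixel_entropy(flat_image) < pixel_entropy(varied_image))  # True
```

An image whose pixel values are nearly constant gets an entropy close to zero, which is the intuition behind the "little entropy in pixel values" description in the table. Any threshold for flagging such an image would be a further assumption.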

CleanVision supports Linux, macOS, and Windows and runs on Python 3.10+. Learn more from our blog.
