unibox
unibox provides a unified interface for common file operations.
Quick Start
import unibox as ub
Some common use cases of unibox include:
Loading various file types in the same way:
- supports json, txt, images, parquet, csv, feather, ...
- uses appropriate best practices (such as the orjson package for JSON) for speedups
some_dict = ub.loads("some_file.json")
some_list = ub.loads("some_file.txt")
some_img = ub.loads("some_image.jpg")
some_df = ub.loads("some_data.parquet")
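The idea behind ub.loads, choosing a reader from the file extension, can be sketched with the standard library alone. The helper load_by_extension below is hypothetical and is not unibox's implementation: it handles only .json and .txt, uses the stdlib json module rather than orjson, and reading a .txt file as a list of lines is an assumption based on the some_list name above.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def _load_json(path: Path) -> Any:
    # Stdlib JSON parsing (unibox reportedly uses orjson for speed instead)
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def _load_txt(path: Path) -> list:
    # Assumption: one list entry per line, trailing newlines stripped
    with path.open("r", encoding="utf-8") as f:
        return f.read().splitlines()


# Map lowercase file suffixes to reader functions
_LOADERS: Dict[str, Callable[[Path], Any]] = {
    ".json": _load_json,
    ".txt": _load_txt,
}


def load_by_extension(filepath: str) -> Any:
    """Hypothetical loader that dispatches on the file extension."""
    path = Path(filepath)
    loader = _LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"unsupported file type: {path.suffix!r}")
    return loader(path)
```

Adding a format in this sketch means registering one more reader in the _LOADERS table.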
Saving various Python data structures in the same way:
- similar to ub.loads, but for saving files
ub.saves(some_dict, "some_file.json")
ub.saves(some_df, "some_df.parquet")
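The saving side can be sketched the same way: pick a writer from the target extension. save_by_extension is a hypothetical helper, not unibox's code; it covers only .json and .txt, and creating missing parent directories is an assumption about convenient behavior rather than documented unibox behavior.

```python
import json
from pathlib import Path
from typing import Any


def save_by_extension(data: Any, filepath: str) -> None:
    """Hypothetical saver that picks a writer from the file extension."""
    path = Path(filepath)
    # Assumption: create parent directories so saving to a new location works
    path.parent.mkdir(parents=True, exist_ok=True)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open("w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
    elif suffix == ".txt":
        # Assumes an iterable of items, written one per line
        with path.open("w", encoding="utf-8") as f:
            f.write("\n".join(str(item) for item in data) + "\n")
    else:
        raise ValueError(f"unsupported file type: {suffix!r}")
```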
Listing S3 or local directories in the same way:
- default optional params: relative_unix=True, debug_print=True
- optimized S3 ls, faster than listing with boto3
files_under_dir = ub.traverses("/home/ubuntu/data")
files_under_s3 = ub.traverses("s3://dataset-pixiv/resized_1572864")
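For local directories, the include / exclude filtering and the relative_unix option can be approximated with os.walk. traverse_local is a hypothetical, local-only helper (no S3 support); it assumes extensions are given with a leading dot, matches them case-insensitively, and sorts the result. None of these choices are confirmed for ub.traverses.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional


def traverse_local(
    root: str,
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
    relative_unix: bool = True,
) -> List[str]:
    """Hypothetical local-only traversal with extension filters."""
    inc = {e.lower() for e in include} if include else None
    exc = {e.lower() for e in exclude} if exclude else set()
    root_path = Path(root)
    results = []
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if inc is not None and ext not in inc:
                continue  # not in the include list
            if ext in exc:
                continue  # explicitly excluded
            full = Path(dirpath) / name
            if relative_unix:
                # Path relative to root, always with forward slashes
                results.append(full.relative_to(root_path).as_posix())
            else:
                results.append(str(full))
    return sorted(results)
```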
A simplified logger class for easier debugging:
- a logger with its functionality pre-configured
- includes caller frame info, emoji warnings, datetime, and more
import unibox as ub
logger = ub.UniLogger()
def some_function():
logger.info("some info")
some_function()
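For comparison, a similar pre-configured logger can be approximated with Python's standard logging module. make_debug_logger is a hypothetical helper, not part of unibox, and it leaves out UniLogger's emoji warnings; it only shows how datetime and caller frame info (file, line, function name) can come from a format string.

```python
import logging


def make_debug_logger(name: str = "debug") -> logging.Logger:
    """Hypothetical stand-in: stdlib logger with time and caller info."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter(
                "%(asctime)s [%(levelname)s] %(filename)s:%(lineno)d "
                "%(funcName)s(): %(message)s",
                datefmt="%Y-%m-%d %H:%M:%S",
            )
        )
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger


logger = make_debug_logger()


def some_function():
    # funcName in the log line will be "some_function"
    logger.info("some info")


some_function()
```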
Resizing millions of images efficiently:
- other options are pre-configured and omitted here for simplicity; saves to 98% quality WebP by default
- can also resize by the minimum or maximum side length
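import unibox as ub
root_dir = "/path/to/source_images"  # placeholder: directory of images to resize
dst_dir = "/path/to/resized_images"  # placeholder: output directory
target_pixels = int(1024 * 1024 * 1.5)
resizer = ub.UniResizer(root_dir, dst_dir, target_pixels)
images_to_resize = resizer.get_resize_jobs()
resizer.execute_resize_jobs(images_to_resize)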
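The size arithmetic behind a pixel-count target can be worked out directly: because area scales with the square of the linear factor, the scale is the square root of target_pixels divided by the current area, which preserves the aspect ratio. The helpers below are hypothetical and not UniResizer's code; rounding to the nearest integer and allowing upscaling of small images are assumptions. The second helper sketches the "minimum or maximum side length" mode.

```python
import math
from typing import Tuple


def size_for_target_pixels(width: int, height: int, target_pixels: int) -> Tuple[int, int]:
    """Scale (width, height) so width * height is close to target_pixels."""
    # Area grows with scale**2, so take the square root of the area ratio
    scale = math.sqrt(target_pixels / (width * height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def size_for_side(width: int, height: int, side: int, use_max: bool = True) -> Tuple[int, int]:
    """Scale so the longer (use_max=True) or shorter side equals `side`."""
    ref = max(width, height) if use_max else min(width, height)
    scale = side / ref
    return max(1, round(width * scale)), max(1, round(height * scale))


target_pixels = int(1024 * 1024 * 1.5)  # 1,572,864 pixels, as in the example above
print(size_for_target_pixels(4000, 3000, target_pixels))
print(size_for_side(4000, 3000, 1024, use_max=True))  # → (1024, 768)
```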
Viewing and labeling images within a Jupyter notebook:
import unibox as ub
uris = ["https://cdn.donmai.us/180x180/8e/ea/8eea944690c0c0b27e303420cb1e65bd.jpg"] * 9
labels = ['Image 1', 'Image 2', 'Image 3'] * 3
ub.label_gallery(uris, labels)
Install
Install from PyPI:
pip install unibox
Build from source:
git clone https://github.com/trojblue/unibox
cd unibox
poetry install
poetry build
pip install dist/unibox-<version number>.whl
[OLD DOC] Features
The package is designed to run on Python 3.10, but targets 3.8+ for compatibility.
CLI:
unibox resize <dir>: resizes a directory of images using either Pillow or libvips
- customizable size / quality / encoding (png / webp / jpeg)
unibox copy <dir>: an awscli-like tool for copying files with certain suffixes to a new directory, keeping the same directory structure
- bypasses Windows Explorer, so it's much faster
unibox move <dir>: like copy, but moves the files instead
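The core of copy and move (select files by suffix, recreate their relative paths under the destination) can be sketched with pathlib and shutil. transfer_by_suffix is a hypothetical helper, not the unibox CLI's implementation, and it makes no attempt to reproduce the speed claim above; it only shows how the directory structure is preserved.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def transfer_by_suffix(src: str, dst: str, suffixes: Iterable[str], move: bool = False) -> List[Path]:
    """Hypothetical copy/move of matching files, mirroring the directory tree."""
    src_root, dst_root = Path(src), Path(dst)
    wanted = {s.lower() for s in suffixes}
    # Collect matches first so moving files does not disturb the iteration
    matches = [p for p in src_root.rglob("*") if p.is_file() and p.suffix.lower() in wanted]
    done = []
    for path in matches:
        # Same relative path, rooted at the destination
        target = dst_root / path.relative_to(src_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        if move:
            shutil.move(str(path), str(target))
        else:
            shutil.copy2(path, target)  # copy2 also preserves timestamps
        done.append(target)
    return done
```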
utils:
UniLogger: unified logger class (logger = unibox.UniLogger(), then use logger.info(...))
UniLoader: unified data loader class (unibox.loads(<filename>))
UniSaver: unified data saver class (unibox.saves(<data>, <filename>))
UniTraverser: unified directory traverser class, with callbacks in multiple stages
UniResizer: unified image resizer class, with callbacks in multiple stages
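"Callbacks in multiple stages" usually means hooks that run before, during, and after the main loop. The class below is a hypothetical illustration of that pattern, not UniTraverser's actual interface: the hook names on_start, on_file, and on_end are invented, and letting on_file drop a file by returning None is an assumed convention.

```python
import os
from typing import Callable, List, Optional


class StagedTraverser:
    """Hypothetical traverser exposing before / per-file / after hooks."""

    def __init__(
        self,
        root: str,
        on_start: Optional[Callable[[str], None]] = None,
        on_file: Optional[Callable[[str], Optional[str]]] = None,
        on_end: Optional[Callable[[List[str]], None]] = None,
    ) -> None:
        self.root = root
        self.on_start = on_start
        self.on_file = on_file
        self.on_end = on_end

    def run(self) -> List[str]:
        if self.on_start:
            self.on_start(self.root)  # stage 1: before walking
        results: List[str] = []
        for dirpath, _dirs, files in os.walk(self.root):
            for name in sorted(files):
                path = os.path.join(dirpath, name)
                # Stage 2: per file; returning None drops the file
                kept = self.on_file(path) if self.on_file else path
                if kept is not None:
                    results.append(kept)
        if self.on_end:
            self.on_end(results)  # stage 3: after walking
        return results
```

Keeping the hooks optional means the same class works as a plain file lister when no callbacks are passed.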
callables:
unibox.traverses(dir, include, exclude, relative_unix): traverses a directory using the specified include / exclude extensions and returns a list of files
unibox.loads(filepath): loads arbitrary data from a file into a suitable format, with automatic detection of file type
- supported formats: see the UniLoader class implementation
unibox.saves(data, filepath): saves arbitrary data to a file, with automatic detection of file type