split-folders
Split folders with files (e.g. images) into train, validation and test (dataset) folders.
The input folder should have the following format:
input/
    class1/
        img1.jpg
        img2.jpg
        ...
    class2/
        imgWhatever.jpg
        ...
    ...
In order to give you this:
output/
    train/
        class1/
            img1.jpg
            ...
        class2/
            imga.jpg
            ...
    val/
        class1/
            img2.jpg
            ...
        class2/
            imgb.jpg
            ...
    test/
        class1/
            img3.jpg
            ...
        class2/
            imgc.jpg
            ...
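To try the package without real data, you can create a toy input folder in the layout above and, after splitting, count the files per split and class. The helpers make_toy_dataset and count_split below are hypothetical and not part of split-folders; this is a minimal standard-library sketch that assumes the two-level split/class layout shown above.

```python
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def make_toy_dataset(root: Path, counts: Dict[str, int]) -> None:
    """Create root/<class>/img<i>.jpg as empty placeholder files (hypothetical helper)."""
    for class_name, n in counts.items():
        class_dir = root / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        for i in range(n):
            (class_dir / f"img{i}.jpg").touch()


def count_split(output: Path) -> Dict[Tuple[str, str], int]:
    """Map (split, class) to the number of files, e.g. ("train", "class1") -> 8."""
    counts: Dict[Tuple[str, str], int] = {}
    for split_dir in sorted(p for p in output.iterdir() if p.is_dir()):
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            counts[(split_dir.name, class_dir.name)] = sum(
                1 for f in class_dir.iterdir() if f.is_file()
            )
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_toy_dataset(Path(tmp) / "input", {"class1": 10, "class2": 10})
        # In practice you would split Path(tmp) / "input" with split-folders first
        # and then call count_split on the output folder; counting the toy input
        # here only demonstrates the helper.
        print(count_split(Path(tmp)))
```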
This should get you started with some serious deep learning on your data. Read here why it's a good idea to split your data into three different sets.
This package is pure Python and has no external dependencies.
pip install split-folders
Optionally, you can install tqdm to get a progress bar when moving files.
pip install split-folders[full]
You can use split-folders as a Python module or as a command line interface (CLI).
If your dataset is balanced (each class has the same number of samples), choose ratio; otherwise, choose fixed.
NB: oversampling is turned off by default.
Oversampling is only applied to the train folder since having duplicates in val or test would be considered cheating.
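Conceptually, oversampling repeats files of the smaller classes in the train split until every class has as many training files as the largest one. The sketch below assumes the duplicates are taken by cycling through each class's files in order; split-folders may pick duplicates differently (for example at random), so treat the hypothetical oversample_train function as an illustration, not the package's code.

```python
import itertools
from typing import Dict, List


def oversample_train(train: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Pad each class's training files with repeats up to the largest class size.

    Only the train split is passed in: val and test are never touched, so no
    duplicate ends up in them.
    """
    target = max((len(files) for files in train.values()), default=0)
    return {
        # cycle() repeats the class's files in order; islice() stops at target.
        class_name: list(itertools.islice(itertools.cycle(files), target))
        for class_name, files in train.items()
    }


if __name__ == "__main__":
    train = {"class1": ["a.jpg", "b.jpg", "c.jpg"], "class2": ["x.jpg"]}
    print(oversample_train(train))
    # {'class1': ['a.jpg', 'b.jpg', 'c.jpg'], 'class2': ['x.jpg', 'x.jpg', 'x.jpg']}
```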
import splitfolders

# Split with a ratio.
# To only split into training and validation set, pass a two-element tuple to `ratio`, e.g. `(.8, .2)`.
splitfolders.ratio("input_folder", output="output",
    seed=1337, ratio=(.8, .1, .1), group_prefix=None, move=False) # default values

# Split val/test with a fixed number of items, e.g. `(100, 100)`, for each set.
# To only split into training and validation set, pass a single number to `fixed`, e.g. `10`.
# Set 3 values, e.g. `(300, 100, 100)`, to limit the number of training items.
splitfolders.fixed("input_folder", output="output",
    seed=1337, fixed=(100, 100), oversample=False, group_prefix=None, move=False) # default values
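For intuition about what ratio and fixed do inside each class folder, here is a sketch that works on one class's list of file names: shuffle with a seeded random generator, then cut. The function names split_by_ratio and split_fixed and the rounding rule (int() truncation, with the remainder going to the last split) are assumptions for illustration; split-folders' exact rounding and shuffling may differ.

```python
import random
from typing import List, Sequence, Tuple, Union


def _shuffled(files: Sequence[str], seed: int) -> List[str]:
    items = sorted(files)  # sort first so the result does not depend on listing order
    random.Random(seed).shuffle(items)
    return items


def split_by_ratio(files: Sequence[str], ratio: Tuple[float, ...],
                   seed: int = 1337) -> List[List[str]]:
    """Cut one class into len(ratio) parts (train, val[, test]) by fraction."""
    if abs(sum(ratio) - 1.0) > 1e-6:
        raise ValueError("ratio must sum to 1")
    items = _shuffled(files, seed)
    parts, start = [], 0
    for r in ratio[:-1]:
        size = int(r * len(items))  # assumed rounding: truncate
        parts.append(items[start:start + size])
        start += size
    parts.append(items[start:])  # the last split takes the remainder
    return parts


def split_fixed(files: Sequence[str], fixed: Union[int, Tuple[int, ...]],
                seed: int = 1337) -> List[List[str]]:
    """val/test get exact counts; train gets the rest, capped if 3 values are given."""
    fixed = (fixed,) if isinstance(fixed, int) else tuple(fixed)
    n_train, held_out = (fixed[0], fixed[1:]) if len(fixed) == 3 else (None, fixed)
    items = _shuffled(files, seed)
    if sum(held_out) + (n_train or 0) > len(items):
        raise ValueError("not enough files in this class")
    parts, start = [], 0
    for n in held_out:
        parts.append(items[start:start + n])
        start += n
    rest = items[start:]
    train = rest if n_train is None else rest[:n_train]  # leftovers stay unused
    return [train] + parts


if __name__ == "__main__":
    files = [f"img{i}.jpg" for i in range(10)]
    print([len(p) for p in split_by_ratio(files, (.8, .1, .1))])  # [8, 1, 1]
    print([len(p) for p in split_fixed(files, (3, 2, 2))])  # [3, 2, 2]
```

With three fixed values, the files beyond the training cap (here 3 of the 10) are simply not copied to any split.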
Occasionally, a single sample consists of more than one file (e.g. a picture (.png) plus an annotation (.txt)). splitfolders lets you split files into equally-sized groups based on their prefix. Set group_prefix to the length of the group (e.g. 2). Note that all files must then be part of a group.
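A sketch of the grouping idea: files are grouped by their name without extension (PurePath.stem), and every group must contain exactly group_size files, so that an image and its annotation always land in the same split. Matching on the stem and the helper name group_by_stem are assumptions made for illustration; split-folders matches on a file-name prefix, and its exact rule may differ.

```python
from collections import defaultdict
from pathlib import PurePath
from typing import Dict, Iterable, List


def group_by_stem(filenames: Iterable[str], group_size: int) -> List[List[str]]:
    """Return groups of files sharing a stem, e.g. ["img1.png", "img1.txt"]."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        groups[PurePath(name).stem].append(name)
    incomplete = sorted(stem for stem, names in groups.items() if len(names) != group_size)
    if incomplete:
        # Every file has to belong to a complete group.
        raise ValueError(f"incomplete groups: {incomplete}")
    return [sorted(groups[stem]) for stem in sorted(groups)]


if __name__ == "__main__":
    files = ["img2.txt", "img1.png", "img1.txt", "img2.png"]
    print(group_by_stem(files, 2))
    # [['img1.png', 'img1.txt'], ['img2.png', 'img2.txt']]
```

Each inner list would then be shuffled and assigned to a split as one unit.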
Set move=True if you want to move the files instead of copying them.
Usage:
    splitfolders [--output] [--ratio] [--fixed] [--seed] [--oversample] [--group_prefix] [--move] folder_with_images
Options:
    --output        path to the output folder. Defaults to `output`. Created if it does not exist.
    --ratio         the ratio to split, e.g. for train/val/test `.8 .1 .1 --` or for train/val `.8 .2 --`.
    --fixed         set the absolute number of items per validation/test set. The remaining items constitute
                    the training set, e.g. for train/val/test `100 100` or for train/val `100`.
                    Set 3 values, e.g. `300 100 100`, to limit the number of training items.
    --seed          set the seed value for shuffling the items. Defaults to 1337.
    --oversample    enable oversampling of imbalanced datasets; works only with --fixed.
    --group_prefix  split files into equally-sized groups based on their prefix
    --move          move the files instead of copying them
Example:
    splitfolders --ratio .8 .1 .1 -- folder_with_images
Because of some Python quirks, you have to add -- after the --ratio values (that is, before the input folder).
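Why the -- is needed: an option that accepts a variable number of values (argparse's nargs="+") keeps consuming plain arguments, so without -- the folder name is swallowed as one more ratio value and parsing fails. The parser below only mimics the shape of the splitfolders CLI; its program name and the assumption that the real CLI is built on argparse this way are for illustration, not taken from the package's source.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser shaped like the splitfolders CLI, for illustration only.
    parser = argparse.ArgumentParser(prog="splitfolders-sketch")
    parser.add_argument("--ratio", nargs="+", type=float)
    parser.add_argument("folder")
    return parser


def try_parse(argv: List[str]) -> Optional[argparse.Namespace]:
    """Return the parsed arguments, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:  # argparse prints an error and exits on invalid input
        return None


if __name__ == "__main__":
    # Without "--", the folder name is consumed by --ratio, so parsing fails.
    print(try_parse(["--ratio", ".8", ".1", ".1", "folder_with_images"]))  # None
    # With "--", --ratio stops there and the folder becomes the positional argument.
    print(try_parse(["--ratio", ".8", ".1", ".1", "--", "folder_with_images"]))
```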
Instead of the command splitfolders, you can also use split_folders or split-folders.
For development, install and use poetry.
If you have a question, found a bug or want to propose a new feature, have a look at the issues page.
Pull requests are especially welcome when they fix bugs or improve code quality.
License: MIT