soundata
Python library for downloading, loading & working with sound datasets. Check the API documentation and the contributing instructions.
For Music Information Retrieval (MIR) datasets please check mirdata.
This library provides tools for working with common sound datasets, including tools for:
- Downloading datasets to a common location and format
- Validating that the files for a dataset are all present
- Loading annotation files to a common format
- Parsing clip-level metadata for detailed evaluations
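To make the validation step above concrete, here is a minimal sketch of checking that a dataset's files are present and unmodified. It is not soundata's implementation. The helper names `md5_of` and `check_files`, and the index layout (relative path mapped to an MD5 checksum), are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_files(data_home, index):
    """Compare files under data_home against a {relative_path: md5} index.

    Returns (missing, invalid): paths that do not exist on disk, and
    paths whose checksum differs from the one recorded in the index.
    """
    missing, invalid = [], []
    for relative_path, expected_md5 in index.items():
        path = Path(data_home) / relative_path
        if not path.is_file():
            missing.append(relative_path)
        elif md5_of(path) != expected_md5:
            invalid.append(relative_path)
    return missing, invalid
```

Reporting missing and invalid files separately makes it easier to tell an incomplete download from a corrupted one.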
Here's soundata's list of currently supported datasets.
Installation
To install, simply run:
pip install soundata
Quick example
import soundata
dataset = soundata.initialize('urbansound8k')
dataset.download()
dataset.validate()
example_clip = dataset.choice_clip()
print(example_clip)
See the documentation for more examples and the API reference.
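Beyond picking one random clip, it is often useful to look at the first few clips of a dataset. The sketch below assumes that the dataset object exposes a `clip_ids` list and a `clip(clip_id)` method; please confirm both against the API reference. The helper name `first_clips` is hypothetical.

```python
from itertools import islice


def first_clips(dataset, limit=3):
    """Return {clip_id: clip} for the first `limit` clip ids of a dataset.

    Assumes `dataset.clip_ids` is an iterable of ids and that
    `dataset.clip(clip_id)` returns the corresponding clip object.
    """
    return {
        clip_id: dataset.clip(clip_id)
        for clip_id in islice(dataset.clip_ids, limit)
    }
```

With soundata installed and the data downloaded, this could be called as `first_clips(soundata.initialize('urbansound8k'))`.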
Contributing a new dataset loader
We welcome and encourage contributions to this library, especially new dataset loaders. Please see contributing for guidelines. Feel free to open an issue if you have any doubts or run into problems when working on the library.
Citing
TBA
When working with datasets, please cite the version of soundata that you are using and include the reference for the dataset itself, which can be found by calling the cite() method of the respective dataset loader.
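To collect a dataset reference as text, for example to paste into a bibliography, the following sketch may help. It assumes that `cite()` prints the reference to standard output rather than returning it; if it returns a string instead, the helper is unnecessary. The name `citation_text` is hypothetical.

```python
import contextlib
import io


def citation_text(dataset):
    """Return whatever dataset.cite() prints, as a stripped string."""
    buffer = io.StringIO()
    # Capture stdout for the duration of the call (assumes cite() prints).
    with contextlib.redirect_stdout(buffer):
        dataset.cite()
    return buffer.getvalue().strip()
```

Typical use would be `citation_text(soundata.initialize('urbansound8k'))`, with soundata installed.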