Part of the Fatiando a Terra project
About
Just want to download a file without messing with requests and urllib?
Trying to add sample datasets to your Python package?
Pooch is here to help!
Pooch is a Python library that can manage data by downloading files
from a server (only when needed) and storing them locally in a data cache
(a folder on your computer).
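The core idea of downloading a file only when it is missing from the local cache can be sketched with the standard library alone. This is a minimal illustration under simplifying assumptions (no hashing, no retries), not Pooch's implementation, and the function name `fetch_if_missing` is hypothetical.

```python
import os
import shutil
import tempfile
import urllib.request


def fetch_if_missing(url, cache_dir, fname):
    """Return the local path to fname, downloading it from url only if needed.

    Hypothetical helper illustrating the caching idea; not part of Pooch.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, fname)
    if os.path.exists(path):
        # Cache hit: skip the network entirely.
        return path
    # Download to a temporary file in the cache folder, then move it into
    # place, so an interrupted download never leaves a partial file behind.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    os.close(fd)
    try:
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
        os.replace(tmp, path)
    finally:
        # Clean up the temporary file if the download failed before the move.
        if os.path.exists(tmp):
            os.remove(tmp)
    return path
```

A second call with the same arguments returns the cached path without touching the network, which is the behavior Pooch provides (along with hash checks) when you call `pooch.retrieve` repeatedly.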
- Pure Python and minimal dependencies.
- Download files over HTTP, FTP, and from data repositories like Zenodo and figshare.
- Built-in post-processors to unzip/decompress the data after download.
- Designed to be extended: create custom downloaders and post-processors.
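As a sketch of the extension point, a Pooch post-processor is a callable that receives the path of the downloaded file, the action that was taken (`"download"`, `"update"` or `"fetch"`) and the calling Pooch instance, and returns the path the caller should use. The class below assumes that interface; `GunzipProcessor` is a hypothetical name, and Pooch already ships `pooch.Decompress` for this particular job.

```python
import gzip
import os
import shutil


class GunzipProcessor:
    """Hypothetical post-processor that decompresses a .gz file next to it."""

    def __call__(self, fname, action, pooch):
        # Strip the ".gz" suffix to name the decompressed copy.
        if fname.endswith(".gz"):
            output = fname[: -len(".gz")]
        else:
            output = fname + ".decomp"
        # Only redo the work when the file is new or changed, or when the
        # decompressed copy has gone missing since the last run.
        if action in ("download", "update") or not os.path.exists(output):
            with gzip.open(fname, "rb") as src, open(output, "wb") as dst:
                shutil.copyfileobj(src, dst)
        # The returned path is what the caller receives instead of fname.
        return output
```

It would be passed as `processor=GunzipProcessor()` to `pooch.retrieve` or to the `fetch` method of a Pooch instance.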
Are you a scientist or researcher? Pooch can help you too!
- Host your data on a repository and download using the DOI.
- Automatically download data using code instead of telling colleagues to do it themselves.
- Make sure everyone running the code has the same version of the data files.
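Keeping everyone on the same version of the data comes down to checking each file against a known hash, like the `known_hash` values in the Example section. A minimal sketch of that check with `hashlib`, assuming the `"algorithm:hexdigest"` form used there and treating a bare digest as SHA-256 (as Pooch does by default); `verify_hash` is a hypothetical helper, not Pooch's API.

```python
import hashlib


def verify_hash(path, known_hash, chunk_size=65536):
    """Return True if the file at path matches known_hash.

    Hypothetical helper. known_hash looks like "md5:a733..." or is a bare
    hex digest, which is assumed to be SHA-256.
    """
    if ":" in known_hash:
        alg, expected = known_hash.split(":", 1)
    else:
        alg, expected = "sha256", known_hash
    hasher = hashlib.new(alg)
    # Read in chunks so large data files do not need to fit in memory.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected.lower()
```

If the check fails, the local file is not the data everyone else is using, and the safe response is to download it again or stop with an error.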
Projects using Pooch
SciPy, scikit-image, xarray, Ensaio, GemPy, MetPy, napari, Satpy, yt, PyVista,
icepack, histolab, seaborn-image, Open AR-Sandbox, climlab, mne-python, GemGIS,
SHTOOLS, MOABB, GeoViews, ScopeSim, Brainrender, pyxem, cellfinder, PVGeo,
geosnap, BioCypher, cf-xarray, Scirpy, rembg, DASCore, scikit-mobility, Py-ART,
HyperSpy, RosettaSciIO, eXSpy
If you're using Pooch, send us a pull request adding your project to the list.
Example
For a scientist downloading a data file for analysis:
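```python
import pooch
import pandas as pd

# Download the file (only if it isn't already in the local cache) and check
# it against the known hash. retrieve returns the path to the local copy.
fname_bathymetry = pooch.retrieve(
    url="https://github.com/fatiando-data/caribbean-bathymetry/releases/download/v1/caribbean-bathymetry.csv.xz",
    known_hash="md5:a7332aa6e69c77d49d7fb54b764caa82",
)
# Files hosted on a data repository like Zenodo can be fetched by DOI.
fname_gravity = pooch.retrieve(
    url="doi:10.5281/zenodo.5882430/southern-africa-gravity.csv.xz",
    known_hash="md5:1dee324a14e647855366d6eb01a1ef35",
)
# pandas infers the xz compression from the file extension.
data_bathymetry = pd.read_csv(fname_bathymetry)
data_gravity = pd.read_csv(fname_gravity)
```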
For package developers including sample data in their projects:
"""
Module mypackage/datasets.py
"""
import pkg_resources
import pandas
import pooch
from . import version
GOODBOY = pooch.create(
path=pooch.os_cache("mypackage"),
base_url="https://github.com/myproject/mypackage/raw/{version}/data/",
version=version,
version_dev="main",
env="MYPACKAGE_DATA_DIR",
registry={"gravity-data.csv": "89y10phsdwhs09whljwc09whcowsdhcwodcydw"}
)
GOODBOY.load_registry(
pkg_resources.resource_stream("mypackage", "registry.txt")
)
def fetch_gravity_data():
"""
Load some sample gravity data to use in your docs.
"""
fname = GOODBOY.fetch("gravity-data.csv")
data = pandas.read_csv(fname)
return data
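The `registry.txt` loaded with `load_registry` in the example above is assumed here to be a plain-text file with one `filename hash` pair per line. A minimal standard-library sketch that writes such a file for a local data folder using SHA-256 digests; `write_registry` is a hypothetical helper, and Pooch itself provides `pooch.make_registry` for this task.

```python
import hashlib
import os


def write_registry(data_dir, output):
    """Write "relative/path sha256" lines for every file under data_dir.

    Hypothetical helper; keep `output` outside data_dir so it doesn't list itself.
    """
    entries = []
    for root, _, files in os.walk(data_dir):
        for name in files:
            full = os.path.join(root, name)
            # Registry keys use forward slashes, matching the URL layout.
            rel = os.path.relpath(full, data_dir).replace(os.sep, "/")
            with open(full, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            entries.append(f"{rel} {digest}")
    # Sort so the file is stable across runs and easy to diff in version control.
    with open(output, "w", encoding="utf-8") as out:
        out.write("\n".join(sorted(entries)) + "\n")
```

Regenerating the file whenever the data changes keeps the hashes in the package in sync with what is hosted under `base_url`.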
Getting involved
🗨️ Contact us:
Find out more about how to reach us at fatiando.org/contact.
👩🏾‍💻 Contributing to project development:
Please read our Contributing Guide to see how you can help and give feedback.
🧑🏾‍🤝‍🧑🏼 Code of conduct:
This project is released with a Code of Conduct. By participating in this
project you agree to abide by its terms.
Imposter syndrome disclaimer:
We want your help. No, really. There may be a little voice inside your
head that is telling you that you're not ready, that you aren't skilled
enough to contribute. We assure you that the little voice in your head is
wrong. Most importantly, there are many valuable ways to contribute besides
writing code.
This disclaimer was adapted from the MetPy project.
License
This is free software: you can redistribute it and/or modify it under the terms
of the BSD 3-clause License. A copy of this license is provided in LICENSE.txt.