Datahugger is a tool to download scientific datasets, software, and code from a large number of repositories based on their DOI or URL. With Datahugger, you can automate the downloading of data and improve the reproducibility of your research. Datahugger provides a straightforward Python interface as well as an intuitive Command Line Interface (CLI).
Datahugger offers support for more than 377 generic and specific (scientific) repositories (and more to come!).
We are still expanding Datahugger with support for more repositories. You can help by requesting support for a repository in the issue tracker. Pull Requests are very welcome as well.
Datahugger requires Python 3.6 or later.
pip install datahugger
Load a dataset (or any digital asset) from a repository with the datahugger.get() function. The first argument is the DOI or URL, and the second is the name of the folder in which to store the dataset (it will be created if it does not exist).
The following code loads dataset 10.5061/dryad.mj8m0 into the folder data.
import datahugger
# download the dataset to the folder "data"
datahugger.get("10.5061/dryad.mj8m0", "data")
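When a script or notebook is re-run, it can be useful to skip the download if the data is already on disk. The following is a minimal sketch of that idea, not part of Datahugger itself: fetch_once is a hypothetical helper name, and it assumes that a non-empty target folder means the dataset was downloaded completely, which may not hold after an interrupted download.

```python
from pathlib import Path


def fetch_once(doi_or_url, folder, downloader=None):
    """Download a dataset unless `folder` already contains files.

    `fetch_once` is a hypothetical helper, not part of Datahugger.
    `downloader` defaults to datahugger.get and can be replaced in tests.
    Returns True if a download was triggered, False if it was skipped.
    """
    target = Path(folder)
    # Assumption: any existing content means the dataset is already present.
    if target.is_dir() and any(target.iterdir()):
        return False
    if downloader is None:
        # Imported lazily so this helper can be loaded without Datahugger.
        import datahugger

        downloader = datahugger.get
    downloader(doi_or_url, str(target))
    return True


# Typical use (requires network access and an installed datahugger):
# fetch_once("10.5061/dryad.mj8m0", "data")
```

The optional downloader argument exists only to make the helper easy to test without network access; in normal use it can be left out.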
For an example of how this can integrate with your work, see the example workflow notebook.
The command line function datahugger provides an easy interface to download data. The first argument is the DOI or URL, and the second argument is the name of the folder to store the dataset (it will be created if it does not exist).
datahugger 10.5061/dryad.mj8m0 data
% datahugger 10.5061/dryad.mj8m0 data
Collecting...
NestTemperatureData.csv : 100%|████████████████████████████████████████| 607k/607k
README_for_NestTemperatureData.txt : 100%|██████████████████████████████████████| 2.82k/2.82k
ExternalTemps.csv : 100%|██████████████████████████████████████| 1.06k/1.06k
README_for_ExternalTemps.txt : 100%|██████████████████████████████████████| 2.82k/2.82k
InternalEggTempData.csv : 100%|██████████████████████████████████████████| 664/664
README_for_InternalEggTempData.txt : 100%|██████████████████████████████████████| 2.82k/2.82k
SoilSimulation_Output.csv : 100%|████████████████████████████████████████| 229M/229M
README_for_SoilSimulation_[...].txt: 100%|██████████████████████████████████████| 2.82k/2.82k
Dataset successfully downloaded.
Tip: On some systems, you have to quote the DOI or URL. For example: datahugger "10.5061/dryad.mj8m0" data.
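If the CLI is called from a Python script, passing the arguments as a list sidesteps shell quoting altogether. The sketch below assumes the datahugger executable is on the PATH and uses only the two positional arguments described above. The names build_command, run_datahugger, and shell_preview are hypothetical helpers, and the example.org URL is a made-up placeholder chosen because it contains characters a shell would interpret.

```python
import shlex
import subprocess


def build_command(doi_or_url, folder):
    """Return the argument list for `datahugger <DOI or URL> <folder>`."""
    return ["datahugger", doi_or_url, folder]


def run_datahugger(doi_or_url, folder):
    """Run the CLI without a shell, so the arguments need no quoting."""
    # check=True raises CalledProcessError if the command exits with an error.
    subprocess.run(build_command(doi_or_url, folder), check=True)


def shell_preview(args):
    """Show how the same command would be quoted for a POSIX shell."""
    return " ".join(shlex.quote(arg) for arg in args)


# A plain DOI needs no quoting; a URL with "?" or "&" does.
print(shell_preview(build_command("10.5061/dryad.mj8m0", "data")))
print(shell_preview(build_command("https://example.org/record?id=1&v=2", "data")))
```

shell_preview is only for display. run_datahugger never goes through a shell, so it works the same whether or not the identifier contains special characters.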
10.5061/dryad.x3ffbg7m8, doi:10.5061/dryad.x3ffbg7m8, https://doi.org/10.5061/dryad.x3ffbg7m8, and https://datadryad.org/stash/dataset/doi:10.5061/dryad.x3ffbg7m8 all point to the same dataset.
Please feel free to reach out with questions, comments, and suggestions. The issue tracker is a good starting point. You can also email me at jonathandebruinos@gmail.com.
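Because several identifier forms can point to the same dataset, as with the Dryad example above, a script that tracks which datasets it has already handled may want to reduce them to one canonical form. The following sketch is an illustration only and is not how Datahugger resolves identifiers: extract_doi is a hypothetical helper, and it assumes the DOI is the last part of the string, so a URL with extra path segments or a query string after the DOI would give a wrong result.

```python
import re

# Loose DOI pattern: "10.", a 4-9 digit registrant code, "/", then a
# non-whitespace suffix. This is an approximation, not a full DOI validator.
_DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_doi(identifier):
    """Return the bare DOI found in `identifier`, or None if there is none."""
    match = _DOI_PATTERN.search(identifier.strip())
    return match.group(0) if match else None


variants = [
    "10.5061/dryad.x3ffbg7m8",
    "doi:10.5061/dryad.x3ffbg7m8",
    "https://doi.org/10.5061/dryad.x3ffbg7m8",
    "https://datadryad.org/stash/dataset/doi:10.5061/dryad.x3ffbg7m8",
]
# All four variants reduce to the same bare DOI.
print({extract_doi(v) for v in variants})
```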