Finding data is one thing. Getting it ready for analysis is another. Acquiring, cleaning, standardizing, and importing publicly available data is time-consuming because many datasets lack machine-readable metadata and do not conform to established data structures and formats. The Data Retriever automates the first steps in the data analysis pipeline by downloading, cleaning, and standardizing datasets, and importing them into relational databases, flat files, or programming languages. Automating this process reduces the time it takes a user to get most large datasets up and running by hours, and in some cases days.
If you have Python installed, you can install the current release using either pip:
pip install retriever
or conda, after adding the conda-forge channel (conda config --add channels conda-forge):
conda install retriever
Depending on your system configuration, this may require sudo for pip:
sudo pip install retriever
Precompiled binary installers are also available for Windows, OS X, and Ubuntu/Debian on the releases page. These do not require a Python installation.
To install the Data Retriever from source, you'll need Python 3.6.8+ with the following packages installed:
The following packages are optionally needed to interact with associated database management systems:
Either use pip to install directly from GitHub:
pip install git+https://git@github.com/weecology/retriever.git
or, from the root of a local copy of the source:
pip install .
You may need to include sudo at the beginning of the command depending on your system (i.e., sudo pip install .).
More extensive documentation for those who are interested in developing can be found in the Data Retriever documentation.
After installing, run retriever update to download all of the available dataset scripts.
To see the full list of command line options and datasets, run retriever --help. The output will look like this:
usage: retriever [-h] [-v] [-q]
                 {download,install,defaults,update,new,new_json,edit_json,delete_json,ls,citation,reset,help}
                 ...

positional arguments:
  {download,install,defaults,update,new,new_json,edit_json,delete_json,ls,citation,reset,help}
                        sub-command help
    download            download raw data files for a dataset
    install             download and install dataset
    defaults            displays default options
    update              download updated versions of scripts
    new                 create a new sample retriever script
    new_json            CLI to create retriever datapackage.json script
    edit_json           CLI to edit retriever datapackage.json script
    delete_json         CLI to remove retriever datapackage.json script
    ls                  display a list all available dataset scripts
    citation            view citation
    reset               reset retriever: removes configuration settings,
                        scripts, and cached data
    help

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -q, --quiet           suppress command-line output
To install datasets, use retriever install:
usage: retriever install [-h] [--compile] [--debug]
                         {mysql,postgres,sqlite,msaccess,csv,json,xml} ...

positional arguments:
  {mysql,postgres,sqlite,msaccess,csv,json,xml}
                        engine-specific help
    mysql               MySQL
    postgres            PostgreSQL
    sqlite              SQLite
    msaccess            Microsoft Access
    csv                 CSV
    json                JSON
    xml                 XML

optional arguments:
  -h, --help            show this help message and exit
  --compile             force re-compile of script before downloading
  --debug               run in debug mode
These examples use the Iris flower dataset. More examples can be found in the Data Retriever documentation.
Using Install
retriever install -h (gives install options)
Using a specific database engine, retriever install {Engine}
retriever install mysql -h (gives install mysql options)
retriever install mysql --user myuser --password ******** --host localhost --port 8888 --database_name testdbase iris
To install data into a SQLite database named iris.db, use:
retriever install sqlite iris -f iris.db
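Once the install finishes, the resulting file can be inspected with Python's standard sqlite3 module. The following is a minimal sketch, assuming iris.db was created in the current directory by the command above. The helper name list_tables is hypothetical and not part of the Data Retriever, and no assumption is made about which table names the Data Retriever creates.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in a SQLite file, sorted by name."""
    # closing() makes sure the connection is released even if the query fails.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Assumption: `retriever install sqlite iris -f iris.db` has already run here.
    for table in list_tables("iris.db"):
        print(table)
```

Note that sqlite3.connect creates an empty file when the path does not exist, so an empty result may simply mean the install was run in a different directory.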
Using download
retriever download -h (gives you help options)
retriever download iris
retriever download iris --path C:\Users\Documents
Using citation
retriever citation (citation of the retriever engine)
retriever citation iris (citation for the iris data)
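The command-line calls in these examples can also be scripted. The sketch below shows one possible way to assemble a retriever install call from Python and run it with subprocess, assuming the retriever executable is on the PATH. The helper build_install_command and the environment variable RETRIEVER_DB_PASSWORD are hypothetical names chosen for illustration. The helper passes option names through unchanged and does not check that the CLI accepts them.

```python
import os
import subprocess


def build_install_command(engine, dataset, **options):
    """Build the argument list for `retriever install <engine> [options] <dataset>`.

    Each keyword argument becomes `--name value`, in the order given.
    """
    command = ["retriever", "install", engine]
    for name, value in options.items():
        command += ["--" + name, str(value)]
    command.append(dataset)
    return command


if __name__ == "__main__":
    command = build_install_command(
        "mysql",
        "iris",
        user="myuser",
        # Hypothetical variable name; read the password from the environment
        # rather than hard-coding it in the script.
        password=os.environ["RETRIEVER_DB_PASSWORD"],
        host="localhost",
        port=8888,
        database_name="testdbase",
    )
    # check=True raises CalledProcessError if retriever exits with an error.
    subprocess.run(command, check=True)
```

Passing the command as a list avoids shell quoting problems with values such as passwords or Windows paths.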
Set up Spatial support
To set up spatial support for PostgreSQL using PostGIS, please refer to the spatial set-up docs.
retriever install postgres harvard-forest # Vector data
retriever install postgres bioclim # Raster data
# Install only the data of USGS elevation in the given extent
retriever install postgres usgs-elevation -b -94.98704597353938 39.027001800158615 -94.3599408119917 40.69577051867074
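When the extent is computed in a script, it can help to validate it before calling the CLI. The sketch below assumes the four values after -b are ordered xmin, ymin, xmax, ymax. That order is inferred from the example above and should be confirmed against the spatial set-up docs. The helper bbox_args is a hypothetical name, not part of the Data Retriever.

```python
import subprocess


def bbox_args(xmin, ymin, xmax, ymax):
    """Return the `-b` arguments for a bounding box, rejecting inverted extents."""
    if not (xmin < xmax and ymin < ymax):
        raise ValueError("expected xmin < xmax and ymin < ymax")
    # Assumed order: xmin ymin xmax ymax (inferred from the example above).
    return ["-b"] + [str(float(v)) for v in (xmin, ymin, xmax, ymax)]


if __name__ == "__main__":
    command = ["retriever", "install", "postgres", "usgs-elevation"]
    command += bbox_args(
        -94.98704597353938, 39.027001800158615,
        -94.3599408119917, 40.69577051867074,
    )
    # Requires the retriever executable and a PostGIS-enabled database.
    subprocess.run(command, check=True)
```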
For more information see the Data Retriever website.
Development of this software was funded by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through Grant GBMF4563 to Ethan White and the National Science Foundation as part of a CAREER award to Ethan White.