
A curated collection of datasets for data analysis & machine learning, downloadable with a single Python command
opendatasets is a Python library for downloading datasets from online sources like Kaggle and Google Drive using a simple Python command.
Install the library using pip:
pip install opendatasets --upgrade
Datasets can be downloaded within a Jupyter notebook or Python script using the opendatasets.download helper function. Here's some sample code for downloading the US Elections Dataset:
import opendatasets as od
dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
od.download(dataset_url)
dataset_url can also point to a public Google Drive link or a raw file URL.
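To make the three kinds of source concrete, here is a minimal sketch that labels a URL as a Kaggle link, a Google Drive link, or a raw file URL. The helper classify_source is hypothetical and written only for illustration; it is not part of opendatasets and may not match how the library inspects URLs internally.

```python
from urllib.parse import urlparse


def classify_source(url):
    """Return a rough label for the kind of source a URL points to.

    Hypothetical helper for illustration only; not part of opendatasets.
    """
    host = (urlparse(url).hostname or "").lower()
    if host == "kaggle.com" or host.endswith(".kaggle.com"):
        return "kaggle"
    if host == "drive.google.com":
        return "google-drive"
    # Anything else is treated as a direct link to a file.
    return "raw-url"


print(classify_source("https://www.kaggle.com/tunguz/us-elections-dataset"))
```

Matching on the hostname rather than on a substring of the whole URL avoids mislabeling a raw file URL that merely contains the word "kaggle" in its path.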
opendatasets uses the official Kaggle API for downloading datasets from Kaggle. Follow these steps to find your API credentials:
1. Sign in to https://kaggle.com/, then click on your profile picture on the top right and select "My Account" from the menu.
2. Scroll down to the "API" section and click "Create New API Token". This will download a file kaggle.json with the following contents:
{"username":"YOUR_KAGGLE_USERNAME","key":"YOUR_KAGGLE_KEY"}
3. When you run opendatasets.download, you will be asked to enter your username and Kaggle API key, which you can get from the file downloaded in step 2.
Note that you need to download the kaggle.json file only once. You can also place the kaggle.json file in the same directory as the Jupyter notebook, and the credentials will be read automatically.
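As a rough illustration of what reading the credentials automatically could look like, the sketch below loads the username and key from a kaggle.json file with the layout shown in step 2. The function read_kaggle_credentials is a hypothetical helper, not the library's own code, and the library may handle a missing file differently.

```python
import json
import os


def read_kaggle_credentials(path="kaggle.json"):
    """Return (username, key) from a kaggle.json file, or None if it is missing.

    Hypothetical helper for illustration only; not part of opendatasets.
    """
    if not os.path.exists(path):
        return None
    with open(path) as f:
        creds = json.load(f)
    return creds["username"], creds["key"]


creds = read_kaggle_credentials()
if creds is None:
    print("kaggle.json not found; credentials would have to be entered by hand")
else:
    print("Found credentials for", creds[0])
```

Since kaggle.json holds a secret key, it is worth keeping the file out of version control if the notebook directory is a git repository.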
You can find interesting datasets on Kaggle: https://www.kaggle.com/datasets
You can also create a new dataset on Kaggle by uploading a CSV file here: https://www.kaggle.com/datasets?new=true (make sure to keep your dataset public, otherwise it will not be downloadable)
Other sources to look for datasets:
If you find data on a source other than Kaggle, you can create a new dataset on Kaggle by uploading a CSV file here: https://www.kaggle.com/datasets?new=true (make sure to keep your dataset public, otherwise it will not be downloadable using opendatasets).
opendatasets also provides some curated datasets that you can download by passing the Dataset ID to opendatasets.download. Here's an example:
import opendatasets
opendatasets.download('stackoverflow-developer-survey-2020')
The following datasets are available for download.
Dataset ID | Description | Source |
---|---|---|
stackoverflow-developer-survey-2020 | Stack Overflow Developer Survey 2020 | Stack Overflow |
owid-covid-19-latest | Covid-19 Stats by Our World in Data | Our World in Data |
state-of-javascript-2016 | State of JavaScript Annual Survey 2016 | StateOfJS |
state-of-javascript-2017 | State of JavaScript Annual Survey 2017 | StateOfJS |
state-of-javascript-2018 | State of JavaScript Annual Survey 2018 | StateOfJS |
state-of-javascript-2019 | State of JavaScript Annual Survey 2019 | StateOfJS |
countries-languages-spoken | Languages Spoken in Different Countries | Infoplease |
More datasets will be added soon.
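Because the same download call accepts either a curated Dataset ID or a URL, it can be handy to check which one a string is before passing it along. The sketch below hard-codes the IDs from the table above; CURATED_IDS and is_curated_id are hypothetical names used for illustration, and the set would need updating as the list of curated datasets grows.

```python
# Dataset IDs copied from the table above (assumed current only for this page).
CURATED_IDS = {
    "stackoverflow-developer-survey-2020",
    "owid-covid-19-latest",
    "state-of-javascript-2016",
    "state-of-javascript-2017",
    "state-of-javascript-2018",
    "state-of-javascript-2019",
    "countries-languages-spoken",
}


def is_curated_id(dataset_id_or_url):
    """Return True if the string is one of the curated Dataset IDs listed above.

    Hypothetical helper for illustration only; not part of opendatasets.
    """
    return dataset_id_or_url in CURATED_IDS


print(is_curated_id("owid-covid-19-latest"))
print(is_curated_id("https://www.kaggle.com/tunguz/us-elections-dataset"))
```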
This is an open source project and we welcome contributions.
git clone https://github.com/JovianML/opendatasets.git
conda create -n opendatasets python=3.5
conda activate opendatasets
pip install -r requirements.txt
Make your changes with the opendatasets conda environment activated.
This package is developed and maintained by the Jovian team.