dataget

A framework-agnostic datasets library for Machine Learning research and education.

  • 0.4.15
  • PyPI

Dataget

Dataget is an easy-to-use, framework-agnostic dataset library that gives you quick access to a collection of Machine Learning datasets through a simple API.

Main features:

  • Minimal: Downloads entire datasets with just 1 line of code.
  • Framework Agnostic: Loads data as numpy arrays or pandas dataframes which can be easily used with the majority of Machine Learning frameworks.
  • Transparent: By default stores the data in your current project so you can easily inspect it.
  • Memory Efficient: When a dataset doesn't fit in memory, it returns metadata instead so you can load the data iteratively.
  • Integrates with Kaggle: Supports loading datasets directly from Kaggle in a variety of formats.
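As a rough illustration of the "Memory Efficient" point, the sketch below loads files in small batches from a metadata table instead of reading everything at once. The metadata layout (a list of rows with a `path` key) and the `iter_batches` helper are assumptions made for this example; they are not part of dataget's API, and the metadata a given dataset actually returns may look different.

```python
from pathlib import Path
from typing import Dict, Iterator, List


def iter_batches(metadata: List[Dict[str, str]], batch_size: int) -> Iterator[List[bytes]]:
    """Yield lists of raw file contents, reading at most batch_size files at a time.

    `metadata` is a hypothetical stand-in for the metadata a large dataset
    might return: one row per sample, with a "path" key pointing at a file on disk.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(metadata), batch_size):
        rows = metadata[start:start + batch_size]
        # Only the files of the current batch are read into memory.
        yield [Path(row["path"]).read_bytes() for row in rows]
```

Because `iter_batches` is a generator, each batch can be processed and discarded before the next one is read.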

Check out the documentation for the list of available datasets.

Getting Started

In dataget you just have to do two things:

  • Instantiate a Dataset from our collection.
  • Call the get method to download the data to disk and load it into memory.

Both are usually done in one line:

import dataget


X_train, y_train, X_test, y_test = dataget.image.mnist().get()

This example downloads the MNIST dataset to ./data/image_mnist and loads it as numpy arrays.
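Since the data is stored inside the project directory, it can be inspected with ordinary file tools. A minimal sketch, assuming the download above has already run and left its files under ./data/image_mnist; the `list_dataset_files` helper is hypothetical and uses only the standard library, not dataget itself:

```python
from pathlib import Path
from typing import List


def list_dataset_files(root: str) -> List[str]:
    """Return the relative paths of all files under root, sorted alphabetically."""
    base = Path(root)
    if not base.is_dir():
        return []  # nothing has been downloaded to this location yet
    return sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    # Path taken from the MNIST example above.
    for name in list_dataset_files("./data/image_mnist"):
        print(name)
```

The exact files listed depend on how the dataset stores its data on disk.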

Kaggle Support

Kaggle promotes the use of CSV files, and dataget loves it! With dataget you can quickly download any dataset from the platform and have immediate access to the data:

import dataget

df_train, df_test = dataget.kaggle(dataset="cristiangarcia/pointcloudmnist2d").get(
    files=["train.csv", "test.csv"]
)

To start using Kaggle datasets, just make sure you have properly installed and configured the Kaggle API. In the future we want to expand Kaggle support in the following ways:

  • Be able to load any file that numpy or pandas can read.
  • Have generic support for other types of datasets like images, audio, video, etc.
    • e.g. dataget.data.kaggle(..., type="image").get(...)
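On the Kaggle API setup mentioned above: a quick way to check the configuration before calling dataget is to look for credentials in the places the official kaggle package usually reads them, namely the KAGGLE_USERNAME and KAGGLE_KEY environment variables or a kaggle.json file in ~/.kaggle (or in the directory named by KAGGLE_CONFIG_DIR). The `has_kaggle_credentials` helper below is a hypothetical sketch under those assumptions, not part of dataget or the Kaggle API, and it only checks that credentials exist, not that they are valid.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def has_kaggle_credentials(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> bool:
    """Return True if Kaggle credentials appear to be configured."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    # Option 1: credentials supplied through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True

    # Option 2: a kaggle.json file with "username" and "key" entries.
    config_dir = Path(env.get("KAGGLE_CONFIG_DIR") or home / ".kaggle")
    try:
        data = json.loads((config_dir / "kaggle.json").read_text())
    except (OSError, ValueError):
        return False  # file missing, unreadable, or not valid JSON
    return isinstance(data, dict) and bool(data.get("username")) and bool(data.get("key"))
```

Calling `has_kaggle_credentials()` with no arguments checks the current environment and home directory; the parameters exist mainly to make the check easy to test.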

Installation

pip install dataget

Contributing

Adding a new dataset is easy! Read our guide on Creating a Dataset if you are interested in contributing a dataset.

License

MIT License
