You're Invited:Meet the Socket Team at BlackHat and DEF CON in Las Vegas, Aug 4-6.RSVP →

Book a Demo Install Sign in

ucimlrepo

Package Overview

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

ucimlrepo

Package to easily import datasets from the UC Irvine Machine Learning Repository into scripts and notebooks.

0.0.7

PyPI

Maintainers: 1

`ucimlrepo` package

Package to easily import datasets from the UC Irvine Machine Learning Repository into scripts and notebooks.
Current Version: 0.0.7

Installation

In a Jupyter notebook, install with the command

!pip3 install -U ucimlrepo

Restart the kernel and import the module ucimlrepo.

Example Usage

from ucimlrepo import fetch_ucirepo, list_available_datasets

# check which datasets can be imported
list_available_datasets()

# import dataset
heart_disease = fetch_ucirepo(id=45)
# alternatively: fetch_ucirepo(name='Heart Disease')

# access data
X = heart_disease.data.features
y = heart_disease.data.targets
# train model e.g. sklearn.linear_model.LinearRegression().fit(X, y)

# access metadata
print(heart_disease.metadata.uci_id)
print(heart_disease.metadata.num_instances)
print(heart_disease.metadata.additional_info.summary)

# access variable info in tabular format
print(heart_disease.variables)

`fetch_ucirepo`

Loads a dataset from the UCI ML Repository, including the dataframes and metadata information.

Parameters

Provide either a dataset ID or name as keyword (named) arguments. Cannot accept both.

id: Dataset ID for UCI ML Repository
name: Dataset name, or substring of name

Returns

dataset
- data: Contains dataset matrices as pandas dataframes
  - ids: Dataframe of ID columns
  - features: Dataframe of feature columns
  - targets: Dataframe of target columns
  - original: Dataframe consisting of all IDs, features, and targets
  - headers: List of all variable names/headers
- metadata: Contains metadata information about the dataset
  - See Metadata section below for details
- variables: Contains variable details presented in a tabular/dataframe format
  - name: Variable name
  - role: Whether the variable is an ID, feature, or target
  - type: Data type e.g. categorical, integer, continuous
  - demographic: Indicates whether the variable represents demographic data
  - description: Short description of variable
  - units: variable units for non-categorical data
  - missing_values: Whether there are missing values in the variable's column

`list_available_datasets`

Prints a list of datasets that can be imported via fetch_ucirepo

Parameters

filter: Optional keyword argument to filter available datasets based on a category
- Valid filters: aim-ahead
search: Optional keyword argument to search datasets whose name contains the search query

Returns

none

Metadata

uci_id: Unique dataset identifier for UCI repository
name
abstract: Short description of dataset
area: Subject area e.g. life science, business
task: Associated machine learning tasks e.g. classification, regression
characteristics: Dataset types e.g. multivariate, sequential
num_instances: Number of rows or samples
num_features: Number of feature columns
feature_types: Data types of features
target_col: Name of target column(s)
index_col: Name of index column(s)
has_missing_values: Whether the dataset contains missing values
missing_values_symbol: Indicates what symbol represents the missing entries (if the dataset has missing values)
year_of_dataset_creation
dataset_doi: DOI registered for dataset that links to UCI repo dataset page
creators: List of dataset creator names
intro_paper: Information about dataset's published introductory paper
repository_url: Link to dataset webpage on the UCI repository
data_url: Link to raw data file
additional_info: Descriptive free text about dataset
- summary: General summary
- purpose: For what purpose was the dataset created?
- funding: Who funded the creation of the dataset?
- instances_represent: What do the instances in this dataset represent?
- recommended_data_splits: Are there recommended data splits?
- sensitive_data: Does the dataset contain data that might be considered sensitive in any way?
- preprocessing_description: Was there any data preprocessing performed?
- variable_info: Additional free text description for variables
- citation: Citation Requests/Acknowledgements
external_url: URL to external dataset page. This field will only exist for linked datasets i.e. not hosted by UCI

Links

FAQs

What is ucimlrepo?

Is ucimlrepo well maintained?

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

ucimlrepo

Installation

Example Usage

fetch_ucirepo

Parameters

Returns

list_available_datasets

Parameters

Returns

Metadata

Links

Related posts

AI + a16z Podcast: Vibe Coding, Security Risks, and the Path to Progress

Toptal’s GitHub Organization Hijacked: 10 Malicious Packages Published

`fetch_ucirepo`

`list_available_datasets`