instancelib

A generic interface for datasets and Machine Learning models

0.5.0
PyPI

instancelib provides a generic architecture for datasets and machine learning algorithms, such as classifiers.

Quick tour

Load dataset: Load the dataset into an environment

import instancelib as il
text_env = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                  data_cols=["fulltext"],
                                  label_cols=["label"])

ds = text_env.dataset # A `dict-like` interface for instances
labels = text_env.labels # An object that stores all labels
labelset = labels.labelset # All labels that can be given to instances

ins = ds[20] # Get the instance with identifier key `20`
ins_data = ins.data # Get the raw data for instance 20
ins_vector = ins.vector # Get the vector representation for 20 if any

ins_labels = labels.get_labels(ins)
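
The split between a dict-like dataset and a separate label store can be pictured with a toy stand-in in plain Python. This is only a sketch of the idea: `ToyInstance`, `ToyLabels` and their methods are hypothetical names made up for illustration and are not instancelib's own classes.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Sequence, Set


@dataclass(frozen=True)
class ToyInstance:
    """Hypothetical stand-in for an instance: identifier, raw data, optional vector."""
    identifier: int
    data: str
    vector: Optional[Sequence[float]] = None


class ToyLabels:
    """Hypothetical stand-in for a label store that lives apart from the data."""

    def __init__(self, labelset: Set[str]) -> None:
        # All labels that may be assigned to instances
        self.labelset: FrozenSet[str] = frozenset(labelset)
        self._by_key: Dict[int, Set[str]] = {}

    def set_labels(self, ins: ToyInstance, *labels: str) -> None:
        # Reject labels that are not part of the labelset
        unknown = set(labels) - self.labelset
        if unknown:
            raise ValueError(f"Labels not in labelset: {sorted(unknown)}")
        self._by_key.setdefault(ins.identifier, set()).update(labels)

    def get_labels(self, ins: ToyInstance) -> FrozenSet[str]:
        # Unlabeled instances yield an empty set
        return frozenset(self._by_key.get(ins.identifier, set()))


# A plain dict plays the role of the dict-like dataset
toy_ds = {
    20: ToyInstance(20, "An example document"),
    21: ToyInstance(21, "Another example document"),
}
toy_labels = ToyLabels({"relevant", "irrelevant"})
toy_labels.set_labels(toy_ds[20], "relevant")

toy_ins = toy_ds[20]
print(toy_ins.data)
print(toy_labels.get_labels(toy_ins))
print(toy_labels.get_labels(toy_ds[21]))
```

Keeping labels outside the instances means the same data can be paired with different label stores, which is one plausible reason for the design shown in the quick tour.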

Dataset manipulation: Divide the dataset into a training set and a test set

train, test = text_env.train_test_split(ds, train_size=0.70)

print(20 in train) # May be true or false, because of random sampling
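
Why membership of a given key is not predictable can be seen in a small, library-independent sketch of a random split over a dict-like dataset. The function `toy_train_test_split` and its `seed` parameter are assumptions made for this illustration; they do not describe how instancelib implements its split.

```python
import random
from typing import Dict, Hashable, Optional, Tuple, TypeVar

V = TypeVar("V")


def toy_train_test_split(
    dataset: Dict[Hashable, V],
    train_size: float,
    seed: Optional[int] = None,
) -> Tuple[Dict[Hashable, V], Dict[Hashable, V]]:
    """Randomly partition a dict into a train part and a test part."""
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be a fraction between 0 and 1")
    keys = list(dataset)
    # Shuffle the identifiers; a fixed seed makes the split reproducible
    random.Random(seed).shuffle(keys)
    cut = round(len(keys) * train_size)
    train = {key: dataset[key] for key in keys[:cut]}
    test = {key: dataset[key] for key in keys[cut:]}
    return train, test


toy_ds = {i: f"document {i}" for i in range(100)}
toy_train, toy_test = toy_train_test_split(toy_ds, train_size=0.70, seed=42)

print(len(toy_train), len(toy_test))  # 70 30
print(20 in toy_train)  # Depends on the shuffle, so on the seed
```

Every key ends up in exactly one of the two parts, but which part depends on the shuffle.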

Train a model:

from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])

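# Wrap the scikit-learn pipeline so that it works on the instances of the environment
model = il.SkLearnDataClassifier.build(pipeline, text_env)
# Fit the model on the training instances and their labels
model.fit_provider(train, labels)
# Predict labels for the instances in the test set
predictions = model.predict(test)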
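
To see what the scikit-learn pipeline does on its own, without instancelib, the same three steps can be run directly on raw strings. This is a minimal sketch using only scikit-learn; the toy corpus and the `pos`/`neg` labels are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy corpus, invented for illustration only
toy_texts = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "awful acting boring plot",
]
toy_targets = ["pos", "pos", "neg", "neg"]

toy_pipeline = Pipeline([
    ("vect", CountVectorizer()),     # raw text -> token counts
    ("tfidf", TfidfTransformer()),   # token counts -> TF-IDF weights
    ("clf", MultinomialNB()),        # TF-IDF weights -> class prediction
])

# The pipeline accepts raw strings, because vectorization is its first step
toy_pipeline.fit(toy_texts, toy_targets)
toy_predictions = toy_pipeline.predict(
    ["loved the wonderful plot", "boring and terrible"]
)
print(list(toy_predictions))
```

Because the vectorizer is part of the pipeline, the classifier can be fed raw text, which matches how the quick tour uses the `fulltext` column as instance data.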

Installation

See installation.md for an extended installation guide.

| Method | Instructions |
|--------|--------------|
| `pip` | Install from PyPI via `pip install instancelib`. |
| Local | Clone this repository and install via `pip install -e .`, or run `python setup.py install` locally. |
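
After installing, one way to confirm that the package is visible to the current Python environment is to query its metadata with the standard library. The helper name `installed_version` is made up for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("instancelib")
print(f"instancelib {found}" if found else "instancelib is not installed")
```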

Documentation

Full documentation of the latest version is provided at https://instancelib.readthedocs.org.

Example usage

See usage.py for an example of how the package can be used.

Releases

instancelib is officially released through PyPI.

See CHANGELOG.md for a full overview of the changes for each version.

Citation

@misc{instancelib,
  title = {Python package instancelib},
  author = {Michiel Bron},
  howpublished = {\url{https://github.com/mpbron/instancelib}},
  year = {2021}
}

Library usage

This library is used in the following projects:

python-allib. A typed Active Learning framework for Python for both Classification and Technology-Assisted Review systems.
text_explainability. A generic explainability architecture for explaining text machine learning models.
text_sensitivity. Sensitivity testing (fairness & robustness) for text machine learning models.

Maintenance

Contributors

Michiel Bron (@mpbron)
