A generic interface for datasets and Machine Learning models
instancelib provides a generic architecture for datasets and machine learning algorithms, such as classification algorithms.
© Michiel Bron, 2021
Quick tour
Load dataset: Load the dataset into an environment
import instancelib as il
text_env = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                 data_cols=["fulltext"],
                                 label_cols=["label"])
ds = text_env.dataset
labels = text_env.labels
labelset = labels.labelset
ins = ds[20]
ins_data = ins.data
ins_vector = ins.vector
ins_labels = labels.get_labels(ins)
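The access pattern above treats the dataset as a keyed collection of instances and keeps the labels in a separate provider. The sketch below mimics that pattern with plain Python containers to make the two roles explicit. `ToyInstance`, `ToyLabelProvider`, and the sample data are hypothetical stand-ins for illustration, not instancelib's own classes.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Sequence, Set


@dataclass
class ToyInstance:
    """Hypothetical stand-in for an instance: a key, raw data, and an optional vector."""
    identifier: int
    data: str
    vector: Optional[Sequence[float]] = None


@dataclass
class ToyLabelProvider:
    """Hypothetical stand-in for a label provider: labels live apart from the data."""
    labelset: FrozenSet[str]
    _labels: Dict[int, Set[str]] = field(default_factory=dict)

    def set_labels(self, ins: ToyInstance, *labels: str) -> None:
        # Reject labels that are not part of the declared label set.
        unknown = set(labels) - self.labelset
        if unknown:
            raise ValueError(f"Unknown labels: {sorted(unknown)}")
        self._labels.setdefault(ins.identifier, set()).update(labels)

    def get_labels(self, ins: ToyInstance) -> FrozenSet[str]:
        # Unlabeled instances yield an empty set rather than an error.
        return frozenset(self._labels.get(ins.identifier, set()))


# The dataset is modelled here as a mapping from key to instance.
toy_ds = {
    20: ToyInstance(20, "an example document"),
    21: ToyInstance(21, "another example document"),
}
toy_labels = ToyLabelProvider(labelset=frozenset({"relevant", "irrelevant"}))
toy_labels.set_labels(toy_ds[20], "relevant")

toy_ins = toy_ds[20]
print(toy_ins.data)
print(toy_ins.vector)
print(toy_labels.get_labels(toy_ins))
```

Keeping labels outside the instances means the same data can be paired with different label sources, such as ground truth and model predictions.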
Dataset manipulation: Divide the dataset into a training set and a test set
train, test = text_env.train_test_split(ds, train_size=0.70)
print(20 in train)
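The call `print(20 in train)` checks whether the instance with key 20 ended up in the training set. As a rough illustration of what a 70/30 split over instance keys involves, here is a minimal sketch in plain Python. The function `split_keys` and its `seed` parameter are hypothetical and not part of instancelib, whose own splitting logic may differ.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def split_keys(keys: Sequence[Hashable], train_size: float,
               seed: int = 42) -> Tuple[List[Hashable], List[Hashable]]:
    """Shuffle the keys and cut them into a train part and a test part."""
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be strictly between 0 and 1")
    shuffled = list(keys)
    # A local random generator keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]


all_keys = list(range(100))
train_keys, test_keys = split_keys(all_keys, train_size=0.70)

print(len(train_keys), len(test_keys))       # 70 30
print(20 in train_keys or 20 in test_keys)   # True: every key lands in exactly one part
```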
Train a model:
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])
model = il.SkLearnDataClassifier.build(pipeline, text_env)
model.fit_provider(train, labels)
predictions = model.predict(test)
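Once predictions are available, a natural next step is to compare them with the known labels. The sketch below assumes, for illustration only, that predictions can be viewed as `(key, set of labels)` pairs and that the true labels can be looked up per key; consult the documentation for the exact return type of `model.predict`. The helper `exact_match_accuracy` and the sample data are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple


def exact_match_accuracy(
    predictions: Iterable[Tuple[Hashable, FrozenSet[str]]],
    truth: Dict[Hashable, FrozenSet[str]],
) -> float:
    """Fraction of instances whose predicted label set equals the true label set."""
    pairs = list(predictions)
    if not pairs:
        raise ValueError("No predictions to evaluate")
    correct = sum(1 for key, predicted in pairs if truth.get(key) == predicted)
    return correct / len(pairs)


# Hypothetical predictions and ground truth for four instances.
toy_predictions = [
    (1, frozenset({"relevant"})),
    (2, frozenset({"irrelevant"})),
    (3, frozenset({"relevant"})),
    (4, frozenset({"irrelevant"})),
]
toy_truth = {
    1: frozenset({"relevant"}),
    2: frozenset({"irrelevant"}),
    3: frozenset({"irrelevant"}),
    4: frozenset({"irrelevant"}),
}

print(exact_match_accuracy(toy_predictions, toy_truth))  # 0.75
```

Comparing whole label sets rather than single labels keeps the same helper usable for multi-label tasks.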
Installation
See installation.md for an extended installation guide.
| Method | Instructions |
|---|---|
| pip | Install from PyPI via `pip install instancelib`. |
| Local | Clone this repository and install via `pip install -e .`, or run `python setup.py install` locally. |
Documentation
Full documentation of the latest version is provided at https://instancelib.readthedocs.org.
Example usage
See usage.py for an example of how the package can be used.
Releases
instancelib is officially released through PyPI.
See CHANGELOG.md for a full overview of the changes for each version.
Citation
@misc{instancelib,
title = {Python package instancelib},
author = {Michiel Bron},
howpublished = {\url{https://github.com/mpbron/instancelib}},
year = {2021}
}
Library usage
This library is used in the following projects:
- python-allib. A typed Active Learning framework for Python for both Classification and Technology-Assisted Review systems.
- text_explainability. A generic explainability architecture for explaining text machine learning models.
- text_sensitivity. Sensitivity testing (fairness & robustness) for text machine learning models.
Maintenance
Contributors