Base functions, generation functions and generic wrappers.
© Marcel Robeer, 2021
Module | Description |
---|---|
genbase | Readable data representations and meta information class. |
genbase.data | Wrapper functions for working with data. |
genbase.decorator | Base support for decorators. |
genbase.internationalization | i18n internationalization. |
genbase.mixin | Mixins for seeding (reproducibility) and state machines. |
genbase.model | Wrapper functions for working with machine learning models. |
genbase.ui | Extensible user interfaces (UIs) for genbase dependencies. |
Method | Instructions |
---|---|
pip | Install from PyPI via pip3 install genbase |
Local | Clone this repository and install via pip3 install -e . or run python3 setup.py install locally |
genbase is officially released through PyPI.
See CHANGELOG.md for a full overview of the changes for each version.
genbase
The explabox aims to support data scientists and machine learning (ML) engineers in explaining, testing and documenting AI/ML models, developed in-house or acquired externally. The explabox turns your ingestibles (AI/ML model and/or dataset) into digestibles (statistics, explanations or sensitivity insights)! The explabox package is available through PyPI and fully documented at https://explabox.readthedocs.io/.
text_explainability provides a generic architecture from which well-known state-of-the-art explainability approaches for text can be composed. This modular architecture allows components to be swapped out and combined, to quickly develop new types of explainability approaches for (natural language) text, or to improve a plethora of approaches by improving a single module. The text_explainability package is available through PyPI and fully documented at https://text-explainability.readthedocs.io/.
text_explainability can be extended to also perform sensitivity testing, checking for machine learning model safety, robustness and fairness. The text_sensitivity package is available through PyPI and fully documented at https://text-sensitivity.readthedocs.io/.
genbase
Readable data representations and meta information class.
Class | Description |
---|---|
Readable | Ensure that a class has a readable representation. |
Configurable | Adds working with configs (.from_config(), from_json(), from_yaml(), ..., read_json(), ..., to_yaml()) to a class. |
MetaInfo | Adds type, subtype, callargs and other meta descriptors to a class (subclass of Configurable). |
silence_tqdm | Silence output of tqdm in a module. |
Examples:
>>> from genbase import MetaInfo
>>> class ReturnCls(MetaInfo):
...     def __init__(self, value, **kwargs):
...         super().__init__(type='special_test',
...                          subtype='special',
...                          **kwargs)
...         self.value = value
...
...     @property
...     def content(self):
...         return {'value': self.value}
>>> obj = ReturnCls(value=5)
>>> obj.to_config()
{'META': {'type': 'special_test',
          'subtype': 'special'},
 'CONTENT': {'value': 5}}
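The layout of the `to_config()` output above can be reproduced in a few lines of plain Python. The sketch below is not genbase's implementation; `MetaInfoSketch` and `ReturnClsSketch` are hypothetical names that only mirror the 'META'/'CONTENT' structure shown in the example.

```python
class MetaInfoSketch:
    """Hypothetical stand-in illustrating the META/CONTENT layout (not genbase's MetaInfo)."""

    def __init__(self, type, subtype=None):
        self.type = type
        self.subtype = subtype

    @property
    def content(self):
        # Subclasses override this property to expose their payload.
        return {}

    def to_config(self):
        # Meta descriptors go under 'META', the payload under 'CONTENT'.
        meta = {'type': self.type}
        if self.subtype is not None:
            meta['subtype'] = self.subtype
        return {'META': meta, 'CONTENT': self.content}


class ReturnClsSketch(MetaInfoSketch):
    def __init__(self, value):
        super().__init__(type='special_test', subtype='special')
        self.value = value

    @property
    def content(self):
        return {'value': self.value}


config = ReturnClsSketch(value=5).to_config()
print(config)
```

Keeping the payload behind a `content` property means a subclass only has to describe its own data; the surrounding meta information is assembled in one place.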
Silence the output of tqdm in a with statement.
>>> import instancelib
>>> from genbase import silence_tqdm
>>> with silence_tqdm(instancelib):
...     model.predict(instances)  # model and instances are assumed to be defined elsewhere
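For comparison, a generic way to quiet progress output without genbase is to redirect standard error for the duration of a block, since tqdm writes its bars to standard error by default. This is a different technique from `silence_tqdm`, which targets a module; `quiet_stderr` is a hypothetical helper name used only for this sketch.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def quiet_stderr():
    """Hypothetical helper: capture everything written to sys.stderr inside the block."""
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        yield buffer


with quiet_stderr() as captured:
    # Stand-in for code that reports progress on standard error.
    print('progress: 50%', file=sys.stderr)

captured_text = captured.getvalue()
```

Note that this hides all standard error output in the block, including warnings, so it is a blunter tool than silencing a single module.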
genbase.data
Wrapper functions for working with data.
Function | Description |
---|---|
import_data() | Import dataset into an instancelib.Environment (containing instances and ground-truth labels). |
train_test_split() | Split a dataset into training and test data. |
Examples: Import from an online .csv file for the BBC News dataset with data in the 'text' column and labels in 'category':
>>> from genbase import import_data
>>> import_data('https://storage.googleapis.com/dataset-uploader/bbc/bbc-text.csv',
...             data_cols='text', label_cols='category')
TextEnvironment()
Convert a pandas DataFrame to instancelib Environment:
>>> from genbase import import_data
>>> import pandas as pd
>>> df = pd.read_csv('./Downloads/bbc-text.csv')
>>> import_data(df, data_cols=['text'], label_cols=['category'])
TextEnvironment()
Download a .zip file of the Drugs.com review dataset and convert each file in the ZIP to an instancelib Environment:
>>> from genbase import import_data
>>> import_data('https://archive.ics.uci.edu/ml/machine-learning-databases/00462/drugsCom_raw.zip',
...             data_cols='review', label_cols='rating')
TextEnvironment(named_providers=['drugsComTest_raw.tsv', 'drugsComTrain_raw.tsv'])
Convert a huggingface Dataset (SST2 in Glue) to an instancelib Environment:
>>> from genbase import import_data
>>> from datasets import load_dataset
>>> import_data(load_dataset('glue', 'sst2'), data_cols='sentence', label_cols='label')
TextEnvironment(named_providers=['test', 'train', 'validation'])
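The table above also lists `train_test_split()`, for which no example is given. The underlying idea, a reproducible shuffle followed by a cut, can be sketched with the standard library alone. `split_keys` and its parameters are hypothetical and do not reflect the signature of genbase's function.

```python
import random


def split_keys(keys, train_size=0.7, seed=0):
    """Hypothetical helper: shuffle keys reproducibly, then cut into train and test parts."""
    keys = list(keys)
    # A dedicated Random instance keeps the split independent of global random state.
    random.Random(seed).shuffle(keys)
    cut = int(round(train_size * len(keys)))
    return keys[:cut], keys[cut:]


train_keys, test_keys = split_keys(range(10), train_size=0.7, seed=0)
print(len(train_keys), len(test_keys))
```

Splitting on instance keys rather than on the data itself keeps the split cheap and lets the same partition be reused for data and labels.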
genbase.decorator
Base support for decorators.
Decorator | Description |
---|---|
@add_callargs | Decorator that passes __callargs__ to a function if available. Useful in conjunction with MetaInfo. |
Example:
>>> from genbase import MetaInfo, add_callargs
>>> class ReturnCls(MetaInfo):
...     def __init__(self, value, callargs=None, **kwargs):
...         super().__init__(type='special_test',
...                          subtype='special',
...                          callargs=callargs,
...                          **kwargs)
...         self.value = value
...
...     @property
...     def content(self):
...         return {'value': self.value}
>>> @add_callargs
... def example_fn(x: int, y: int, z: int = 5, t='str', **kwargs):
...     callargs = kwargs.pop('__callargs__', None)
...     return ReturnCls(value=x + y + z, callargs=callargs)
>>> example_fn(x=1, y=2).callargs
{'x': 1, 'y': 2, 'z': 5, 't': 'str'}
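A decorator with this behavior can be approximated with `inspect.signature`. The sketch below is an assumption about how such a decorator might work, not genbase's actual code; `add_callargs_sketch` is a hypothetical name.

```python
import functools
import inspect


def add_callargs_sketch(fn):
    """Hypothetical decorator: pass the resolved call arguments as `__callargs__`."""
    sig = inspect.signature(fn)
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    accepts_kwargs = any(p.kind is inspect.Parameter.VAR_KEYWORD
                         for p in sig.parameters.values())

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Bind the call to the signature and fill in defaults for omitted arguments.
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        callargs = {name: value for name, value in bound.arguments.items()
                    if sig.parameters[name].kind not in variadic}
        # Only functions that accept **kwargs can receive the extra argument.
        if accepts_kwargs:
            kwargs['__callargs__'] = callargs
        return fn(*args, **kwargs)

    return wrapper


@add_callargs_sketch
def example_fn(x, y, z=5, t='str', **kwargs):
    return kwargs.pop('__callargs__', None)


result = example_fn(x=1, y=2)
print(result)
```

Checking for `**kwargs` up front mirrors the "if available" wording in the table: functions without a keyword catch-all are called unchanged.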
genbase.internationalization
i18n internationalization.
Function | Description |
---|---|
get_locale() | Get current locale. |
set_locale() | Set current locale. |
translate_list() | Get a list based on locale, as defined in the './locale' folder. |
translate_string() | Get a string based on locale, as defined in the './locale' folder. |
Example:
>>> from genbase.internationalization import set_locale, translate_list
>>> set_locale('en')
>>> translate_list('stopwords')
['a', 'an', 'the']
>>> set_locale('nl')
>>> translate_list('stopwords')
['de', 'het', 'een']
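The lookup pattern behind the example above, a current locale plus per-locale tables, can be sketched as follows. genbase reads its translations from the './locale' folder; this sketch uses an in-memory dictionary instead, and `TranslatorSketch` is a hypothetical class whose table entries are copied from the example output.

```python
class TranslatorSketch:
    """Hypothetical in-memory translator; genbase loads its tables from './locale'."""

    def __init__(self, tables, locale='en'):
        self._tables = tables
        self.set_locale(locale)

    def set_locale(self, locale):
        if locale not in self._tables:
            raise ValueError(f'Unknown locale: {locale!r}')
        self._locale = locale

    def get_locale(self):
        return self._locale

    def translate_list(self, key):
        # Return a copy so callers cannot mutate the shared table.
        return list(self._tables[self._locale][key])


translator = TranslatorSketch({
    'en': {'stopwords': ['a', 'an', 'the']},
    'nl': {'stopwords': ['de', 'het', 'een']},
})
english = translator.translate_list('stopwords')
translator.set_locale('nl')
dutch = translator.translate_list('stopwords')
```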
genbase.mixin
Mixins for seeding (reproducibility) and state machines.
Class | Description |
---|---|
SeedMixin | Adds working with ._seed and ._original_seed for reproducibility. |
CaseMixin | Adds working with title-, sentence-, upper- and lowercase for random data generation. |
Example:
>>> from genbase.mixin import SeedMixin
>>> class RandomCls(SeedMixin):
...     def __init__(self, seed: int = 0):
...         self._seed = self._original_seed = seed
>>> rc = RandomCls(seed=10)
>>> rc.seed
10
>>> rc._seed += 20
>>> rc.seed
30
>>> rc._original_seed
10
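Keeping both a current and an original seed makes it possible to rewind a random process. The sketch below illustrates that idea with the standard library; `SeedSketchMixin`, `reset_seed()` and `sample()` are hypothetical names and are not taken from genbase.

```python
import random


class SeedSketchMixin:
    """Hypothetical mixin mirroring the ._seed / ._original_seed pattern."""

    @property
    def seed(self):
        return self._seed

    def reset_seed(self):
        # Rewind to the seed the object was created with.
        self._seed = self._original_seed
        return self


class RandomSketch(SeedSketchMixin):
    def __init__(self, seed=0):
        self._seed = self._original_seed = seed

    def sample(self):
        # Draw one value from the current seed, then advance the seed.
        value = random.Random(self._seed).random()
        self._seed += 1
        return value


rc = RandomSketch(seed=10)
first = rc.sample()
rc.reset_seed()
again = rc.sample()
```

After the reset, the same seed produces the same draw, which is what makes an experiment repeatable.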
genbase.model
Wrapper functions for working with machine learning models.
Function | Description |
---|---|
import_model() | Import a model with instancelib or instancelib-onnx. |
Examples: Make a scikit-learn text classifier and train it on SST2:
>>> from genbase import import_data, import_model
>>> from datasets import load_dataset
>>> ds = import_data(load_dataset('glue', 'sst2'), data_cols='sentence', label_cols='label')
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.naive_bayes import MultinomialNB
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> pipeline = Pipeline([('tfidf', TfidfVectorizer()),
...                      ('clf', MultinomialNB())])
>>> import_model(pipeline, ds, train='train')
SklearnDataClassifier()
Load a pretrained ONNX model with labels 'Bedrijfsnieuws', 'Games' and 'Smartphones':
>>> from genbase import import_model
>>> import_model('data-model.onnx', label_map={0: 'Bedrijfsnieuws', 1: 'Games', 2: 'Smartphones'})
SklearnDataClassifier()
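The `label_map` argument relates the model's output indices to readable label names. A minimal sketch of that mapping is shown below, assuming the model emits one score per class in index order; that assumption, and the helper name `to_label`, are not taken from genbase or instancelib.

```python
def to_label(scores, label_map):
    """Hypothetical helper: map the highest-scoring output index to its label name."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return label_map[best_index]


label_map = {0: 'Bedrijfsnieuws', 1: 'Games', 2: 'Smartphones'}
# Made-up scores for a single instance, for illustration only.
predicted = to_label([0.1, 0.7, 0.2], label_map)
print(predicted)
```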
genbase.ui
Extensible user interfaces (UIs) for genbase dependencies.
Function | Description |
---|---|
get_color() | Get color from a matplotlib colorscale. |
plot.matplotlib_available() | Check if matplotlib is installed. |
plot.plotly_available() | Check if plotly is installed. |
notebook.format_label() | Format label as title. |
notebook.format_instances() | Format multiple instancelib instances. |
notebook.is_colab() | Check if environment is Google Colab. |
notebook.is_interactive() | Check if the environment is interactive (Jupyter Notebook). |
Class | Description |
---|---|
plot.ExpressPlot | Plotter for plotly.express. |
notebook.Render | Base class for rendering configs (configuration dictionaries). |
Example:
>>> from genbase.ui.notebook import Render
>>> class CustomRender(Render):
...     def __init__(self, *configs):
...         super().__init__(*configs)
...         self.default_title = 'My Custom Explanation'
...         self.main_color = '#ff0000'
...         self.package_link = 'https://git.io/text_explainability'
...
...     def format_title(self, title: str, h: str = 'h1', **renderargs) -> str:
...         return f'<{h} style="color: red;">{title}</{h}>'
...
...     def render_content(self, meta: dict, content: dict, **renderargs):
...         type = meta['type'] if 'type' in meta else ''
...         return type.replace('_', ' ').title() if 'explanation' in type else type
>>> from genbase import MetaInfo
>>> class NiceCls(MetaInfo):
...     def __init__(self, **kwargs):
...         super().__init__(renderer=CustomRender, **kwargs)