MLVisualizationTools
MLVisualizationTools is a Python library that makes
machine learning more understandable through the
use of effective visualizations.
We support graphing with matplotlib and plotly.
We implicitly support all major ML libraries, such as
TensorFlow and scikit-learn.
You can use the built-in apps to quickly analyze your
existing models, or build custom projects using the modular
sets of functions.
Installation
pip install MLVisualizationTools
Depending on your use case, you might also need to install tensorflow, plotly,
and matplotlib:
pip install tensorflow
pip install plotly
pip install matplotlib
To use the interactive web apps, install with one of the following extras:
pip install MLVisualizationTools[dash]
pip install MLVisualizationTools[dash-notebook]
If you are running in a notebook environment that doesn't have Dash support (like Kaggle), you might also need:
pip install MLVisualizationTools[ngrok-tunneling]
Express
To get started using MLVisualizationTools, run one of the prebuilt apps.
import MLVisualizationTools.express.DashModelVisualizer as App
model = ...
data = ...
App.visualize(model, data)
Functions
MLVisualizationTools connects a variety of smaller functions.
Steps:
- Start with an ML model and a DataFrame with features
- Analyzer
- Interface / Interface Raw (if you don't have a DataFrame)
- Colorizers (optional)
- Apply training data points (optional)
- Colorize data points (optional)
- Graphs
Analyzers take an ML model and return information about the inputs,
such as which ones have high variance.
Interfaces take parameters and construct a multidimensional grid
of values based on plugging these numbers into the model.
(Raw interfaces allow you to use interfaces by specifying column
data instead of a pandas DataFrame. Column data is a list containing one dict
per feature column, each with name, min, max, and mean values.)
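As an illustration of the column data format described above, the following is a minimal sketch that derives it from a DataFrame using only pandas. The helper name column_data_from_dataframe and the exact dict keys ("name", "min", "max", "mean") are assumptions made for this example; check the library source for the keys it actually expects.

```python
import pandas as pd


def column_data_from_dataframe(df: pd.DataFrame) -> list:
    """Summarize each feature column as a dict of name, min, max, and mean.

    Hypothetical helper: the key names are assumed from the description
    above, not taken from the library's code.
    """
    coldata = []
    for name in df.columns:
        col = df[name]
        coldata.append({
            "name": name,
            "min": float(col.min()),
            "max": float(col.max()),
            "mean": float(col.mean()),
        })
    return coldata


# Small made-up dataset for demonstration
df = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Fare": [10.0, 20.0, 60.0]})
coldata = column_data_from_dataframe(df)
print(coldata)
```

A list like this could also be written by hand when only summary statistics of the features are available.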
Colorizers assign colors to points, typically based on whether the value is
above or below 0.5.
Data Interfaces render training data points on top of the
graph to make it easier to tell if the model trained properly.
Graphs turn these output grids into a visual representation.
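To make the interface and colorizer steps above more concrete, here is a rough sketch of the general idea using only numpy and pandas. It is not the library's implementation: ToyModel, sketch_prediction_grid, sketch_binary_colorizer, the "Output" and "Color" column names, and the color values are all hypothetical. The sketch assumes two features are varied across their observed range while the remaining features are held at their mean.

```python
import numpy as np
import pandas as pd


class ToyModel:
    """Stand-in model (hypothetical) exposing the predict(X) interface."""

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        z = 0.1 * (X["Age"].to_numpy() - 30.0) - 0.05 * (X["Fare"].to_numpy() - 30.0)
        return 1.0 / (1.0 + np.exp(-z))  # sigmoid, so outputs lie in (0, 1)


def sketch_prediction_grid(model, x_col, y_col, df, steps=5):
    """Vary two features over their range, hold the others at their mean."""
    xs = np.linspace(df[x_col].min(), df[x_col].max(), steps)
    ys = np.linspace(df[y_col].min(), df[y_col].max(), steps)
    xx, yy = np.meshgrid(xs, ys)
    # Start every column at its mean, then overwrite the two varied columns
    grid = pd.DataFrame({c: np.full(xx.size, df[c].mean()) for c in df.columns})
    grid[x_col] = xx.ravel()
    grid[y_col] = yy.ravel()
    # Plug every grid point into the model
    grid["Output"] = np.asarray(model.predict(grid[list(df.columns)])).ravel()
    return grid


def sketch_binary_colorizer(grid, threshold=0.5):
    """Mark each point with one of two colors depending on the threshold."""
    out = grid.copy()
    out["Color"] = np.where(out["Output"] > threshold, "orange", "blue")
    return out


df = pd.DataFrame({"Age": [20.0, 30.0, 40.0],
                   "Fare": [10.0, 20.0, 60.0],
                   "Class": [1, 2, 3]})
grid = sketch_prediction_grid(ToyModel(), "Age", "Fare", df)
colored = sketch_binary_colorizer(grid)
print(colored.head())
```

The resulting table has one row per grid point, which is the kind of structure a graphing step can turn into a surface or scatter plot.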
Sample
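from MLVisualizationTools import Analytics, Interfaces, Graphs, Colorizers, DataInterfaces
model = ...  # your trained ML model
df = ...  # pandas DataFrame with features
# Analyze the model and pick the inputs with the highest variance
AR = Analytics.analyzeModel(model, df)
maxvar = AR.maxVariance()
# Build the prediction grid and colorize it
grid = Interfaces.predictionGrid(model, maxvar[0], maxvar[1], df)
grid = Colorizers.binary(grid)
# Overlay training data points on the grid
grid = DataInterfaces.addPercentageData(grid, df, 'OutputKey')
fig = Graphs.plotlyGraph(grid)
fig.show()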
Prebuilt Examples
Prebuilt examples run off of the pretrained model and dataset
packaged with this library. They include:
- Demo: a basic demo of library functionality that renders 2 plots
- MatplotlibDemo: the same demo, using matplotlib instead of plotly
- DashDemo: an interactive Dash website demo for use outside Jupyter
notebooks
- DashNotebookDemo: notebook version of the interactive website demo
- DashKaggleDemo: notebook version of the Dash demo that works in Kaggle
notebooks
- DataOverlayDemo: Demonstrates data overlay features
See MLVisualizationTools/Examples for more examples.
Use example.main() to run the examples and set parameters such as themes.
Tensorflow Compatibility
MLVisualizationTools is distributed with a pretrained tensorflow model
to make running examples quick and easy. It is not needed for main library functions.
For TensorFlow versions 2.0 through 2.4, we load a v2.0 model.
For TensorFlow versions 2.5+, we load a v2.5 model.
If this causes compatibility issues, you can still use the main library on your own models.
If you need an example model, retrain it with
TrainTitanicModel.py
scikit-learn Compatibility
See SklearnDemo.py
Sklearn can be used exactly like TF because it has the same .predict(X) -> Y
interface.
Support for more ML Libraries
We support any ML library that has a predict() call that takes
a pandas DataFrame with features. If this doesn't work, use a wrapper class like
in this example:
import pandas as pd
class ModelWrapper:
    def __init__(self, model):
        self.model = model
    def predict(self, dataframe: pd.DataFrame):
        ...
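As a more complete illustration of the wrapper pattern, here is a minimal sketch for a model that only accepts NumPy arrays through a differently named method. ArrayOnlyModel and its run() method are hypothetical stand-ins invented for this example; the body of predict() will depend on the library you are actually wrapping.

```python
import numpy as np
import pandas as pd


class ArrayOnlyModel:
    """Hypothetical model whose API is run(array) rather than predict(df)."""

    def run(self, array: np.ndarray) -> np.ndarray:
        # Toy behavior: sum the features of each row
        return array.sum(axis=1)


class ArrayModelWrapper:
    """Adapts the model above to the predict(DataFrame) interface."""

    def __init__(self, model):
        self.model = model

    def predict(self, dataframe: pd.DataFrame) -> np.ndarray:
        # Convert the DataFrame to the input format the wrapped model expects
        return self.model.run(dataframe.to_numpy(dtype=float))


wrapped = ArrayModelWrapper(ArrayOnlyModel())
preds = wrapped.predict(pd.DataFrame({"a": [1, 2], "b": [3, 4]}))
print(preds)
```

The wrapped object can then be passed wherever a model with a predict() call is expected.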
Remove Feature Testing
See RemoveFeatureDemo.py
Tests whether features can be removed from the dataset without significantly affecting accuracy.
Each dataset column is replaced with its mean, and the resulting accuracy is compared to the baseline.
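The idea can be sketched independently of the library as follows. This is an illustration under stated assumptions, not the code in RemoveFeatureDemo.py: ThresholdModel, accuracy, and mean_replacement_test are hypothetical names, and predictions are assumed to be binary with a 0.5 cutoff.

```python
import numpy as np
import pandas as pd


class ThresholdModel:
    """Toy model (hypothetical): predicts 1 when Age > 30 and ignores Fare."""

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        return (X["Age"].to_numpy() > 30).astype(int)


def accuracy(model, X, y):
    """Fraction of rows where the thresholded prediction matches the label."""
    preds = np.asarray(model.predict(X)).ravel()
    return float(np.mean((preds > 0.5).astype(int) == np.asarray(y)))


def mean_replacement_test(model, X, y):
    """Replace each column with its mean and record the accuracy drop."""
    baseline = accuracy(model, X, y)
    drops = {}
    for col in X.columns:
        X_mod = X.copy()
        X_mod[col] = X[col].mean()  # the feature no longer carries information
        drops[col] = baseline - accuracy(model, X_mod, y)
    return baseline, drops


X = pd.DataFrame({"Age": [10.0, 20.0, 40.0, 50.0],
                  "Fare": [5.0, 80.0, 15.0, 60.0]})
y = np.array([0, 0, 1, 1])
baseline, drops = mean_replacement_test(ThresholdModel(), X, y)
print(baseline, drops)
```

A feature with a drop near zero, such as Fare in this toy setup, is a candidate for removal, while a large drop suggests the model relies on that feature.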