
dalex

Responsible Machine Learning in Python

1.7.2
PyPI

Maintainers: 2

dalex: Responsible Machine Learning in Python

Overview

Unverified black box model is the path to the failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection.

The dalex package x-rays any model, helping to explore and explain its behaviour and to understand how complex models work. The main Explainer object creates a wrapper around a predictive model. Wrapped models may then be explored and compared with a collection of model-level and predict-level explanations. Fairness methods and interactive exploration dashboards are also available.

The philosophy behind dalex explanations is described in the Explanatory Model Analysis book.

Installation

The dalex package is available on PyPI and conda-forge.

pip install dalex -U

conda install -c conda-forge dalex

One can install optional dependencies for all additional features using pip install dalex[full].

Resources: https://dalex.drwhy.ai/python

API reference: https://dalex.drwhy.ai/python/api

Authors

The authors of the dalex package are:

Hubert Baniecki
Wojciech Kretowicz
Piotr Piatyszek maintains the arena module
Jakub Wisniewski maintains the fairness module
Mateusz Krzyzinski maintains the aspect module
Artur Zolkowski maintains the aspect module
Przemyslaw Biecek

We welcome contributions: start by opening an issue on GitHub.

Citation

If you use dalex, please cite our JMLR paper:

@article{JMLR:v22:20-1473,
  author  = {Hubert Baniecki and
             Wojciech Kretowicz and
             Piotr Piatyszek and 
             Jakub Wisniewski and 
             Przemyslaw Biecek},
  title   = {dalex: Responsible Machine Learning 
             with Interactive Explainability and Fairness in Python},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {214},
  pages   = {1-7},
  url     = {http://jmlr.org/papers/v22/20-1473.html}
}

Changelog

v1.7.2 (2025-02-12)

temporarily restrict the plotly dependency to <6.0.0, fixing compatibility issues with the new version, e.g. titlefont is now title_font (#573 contributed by @lionelkusch)

v1.7.1 (2024-10-02)

numpy>=2.0.0 compatibility: replace instances of x.ptp() with np.ptp(x) and np.Inf with np.inf (#571)
added a way to pass sample_weight to loss functions in model_parts() (variable importance) using weights from dx.Explainer (#563)
fixed the visualization of shap_wrapper for shap==0.45.0
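
The numpy>=2.0.0 replacements named in the first entry above can be illustrated on a small array. This is a sketch of the general migration pattern with made-up values, not code taken from dalex.

```python
import numpy as np

x = np.array([3.0, 7.5, 1.0])

# np.ptp(x) works on both NumPy 1.x and 2.x; the x.ptp() method was removed in 2.0.
value_range = np.ptp(x)

# np.inf is the spelling that survives in NumPy 2.x; the np.Inf alias was removed.
upper_bound = np.inf

print(value_range, upper_bound)
```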

v1.7.0 (2024-02-28)

increase the dependencies to python>=3.8, pandas>=1.5.0, numpy>=1.23.3 and add python==3.11 to CI
added keras.src.models.sequential.Sequential to classes with a known predict_function; it should fix changes in keras==3.0.0 and tensorflow==2.16.0
turn off verbose in the predict method of tensorflow/keras models that changed in tensorflow>=2.9.0
update the warning occurring when specifying variable_splits (#558)
fix an error occurring in predict_profile() when a DataFrame has MultiIndex in pandas>=1.3.0 (#550)
fix gaussian norm() calculation in model_profile() from pi*sqrt(2) to sqrt(2*pi)
fix a warning (future error) between prepare_numerical_categorical() and prepare_x() with pandas==2.1.0
fix a warning (future error) concerning the default value of numeric_only in pandas.DataFrame.corr() in dalex.aspect.calculate_assoc_matrix()
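
The normalization fix in the list above (pi*sqrt(2) replaced by sqrt(2*pi)) can be checked numerically. The sketch below assumes that norm() refers to the standard normal density; the helper names gaussian_density and integrate are hypothetical and are not part of dalex.

```python
import math

def gaussian_density(x: float) -> float:
    """Standard normal density with the 1/sqrt(2*pi) normalizing constant."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def wrong_density(x: float) -> float:
    """Same kernel divided by pi*sqrt(2), the constant that was replaced."""
    return math.exp(-0.5 * x * x) / (math.pi * math.sqrt(2))

def integrate(f, lo: float, hi: float, steps: int = 20_000) -> float:
    """Trapezoidal rule on an evenly spaced grid."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

area_correct = integrate(gaussian_density, -10.0, 10.0)
area_wrong = integrate(wrong_density, -10.0, 10.0)
print(area_correct, area_wrong)
```

Only the sqrt(2*pi) constant makes the density integrate to 1; with pi*sqrt(2) the area is 1/sqrt(pi), roughly 0.564.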

v1.6.0 (2023-02-16)

add ZeroDivisionError to precision and recall functions (#532)
add a warning to calculate_depend_matrix() when there is a variable with only one value (#537)
fix missing EDA plots in (Python) Arena (#544)
fix baseline positions in the subplots of the predict parts explanations: BreakDown, Shap (#545)

v1.5.0 (2022-09-07)

This release consists of mostly maintenance updates and, after a year, marks the Beta -> Stable release.

increase the dependency from python>=3.6 to python>=3.7 (at this moment, both numpy and pandas depend on python>=3.8), and add python==3.10 to CI
increase the dependencies to pandas>=1.2.5, numpy>=1.20.3 (#526), scipy>=1.6.3, plotly>=5.1.0, and tqdm>=4.61.2 due to errors with pandas (see tqdm/#1199)
remove the use of pd.Series.append() (#489)
remove the use of np.isnan causing error in dalex.fairness (#491)
fix iBreakDown plot y-axis labels (#493)
stop the Arena's werkzeug server using a cleaner and still supported API (#518)
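
For readers making the same change as the pd.Series.append() removal listed above, pd.concat is the replacement that pandas recommends. Whether dalex uses exactly this call is an assumption; the series below are made up for illustration.

```python
import pandas as pd

first = pd.Series([1, 2])
second = pd.Series([3])

# Series.append() was deprecated in pandas 1.4 and removed in 2.0;
# pd.concat covers the same use case on old and new versions alike.
combined = pd.concat([first, second], ignore_index=True)

print(combined.tolist())
```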

v1.4.1 (2021-11-08)

features

added fairness plot for regression models to Arena (dalex/#408)
added new facet_scales parameter to AP.plot and CP.plot, which allows freeing the y-axis with facet_scales="free" (dalex/#469); consistent with R (DALEX/#468, ingredients/#140)

fixes

fixed AP and CP progress bars

v1.4.0 (2021-09-09)

added new aspect module, which will focus on groups of dependent variables @krzyzinskim & @arturzolkowski
added new scipy>=1.5.4 dependency

breaking changes

improved the calculation of AUC, ROC plot (#459)

fixes

wrong yaxis labels in VariableImportance.plot(split="variable") (#451)
_repr_html_() didn't work for explanation objects before using the fit method (#449)

features

added new Aspect object with the predict_triplot, model_triplot, predict_parts, model_parts, get_aspects methods
added new PredictTriplot, ModelTriplot, PredictAspectImportance, ModelAspectImportance objects with the plot method

v1.3.0 (2021-07-17)

features

added bias mitigation techniques (resample, reweight, roc_pivot) into the fairness module (#432)

v1.2.0 (2021-05-31)

breaking changes

method set_options in Arena now takes option_category instead of plot_type (SHAPValues => ShapleyValues, FeatureImportance => VariableImportance) (#420)
methods using the N parameter now properly sample rows from data

fixes

fixed wrong error value when no predict_function is found in Explainer (77ca90d)
set multiprocessing context to 'spawn' (#412)
fixed bug in metric_scores plot that made only one subgroup appear on y-axis (#416)
added support for older keras models (#415)

features

added a resource mechanism to Arena (#419)
added ShapleyValuesImportance and ShapleyValuesDependence plots to Arena (#420)
return error instead of NaN when AUC is calculated on observations from one class only (#415)

v1.1.0 (2021-04-18)

breaking changes

fixed concurrent random seeds when processes > 1 (#392), which means that the results of parallel computation will vary between v1.1.0 and previous versions
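
The changelog does not say how the seeds were fixed. One common way to give each parallel worker its own reproducible random stream is NumPy's SeedSequence.spawn, sketched below; the worker_draws helper is hypothetical, and treating this as dalex's approach would be an assumption.

```python
import numpy as np

def worker_draws(seed_seq: np.random.SeedSequence, n: int = 3) -> np.ndarray:
    """Simulate one worker drawing n random numbers from its own stream."""
    rng = np.random.default_rng(seed_seq)
    return rng.random(n)

# One root seed, four statistically independent child seeds (one per process).
root = np.random.SeedSequence(42)
child_seeds = root.spawn(4)

draws = [worker_draws(seed) for seed in child_seeds]
for i, d in enumerate(draws):
    print(i, d)
```

Reusing one seed in every process would make all workers produce identical draws, which is the kind of problem such a fix avoids.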

fixes

GroupFairnessX.plot(type='fairness_check') generates ticks according to the x-axis range (#409)
GroupFairnessRegression.plot(type='density') has a more readable hover - only for outliers (#409)
BreakDown.plot() wrongly displayed the "+all factors" bar when max_vars < p (#401)
GroupFairnessClassification.plot(type='metric_scores') did not handle NaN's (#399)

features

Experimental support for regression models in the fairness module. Added GroupFairnessRegression object, with the plot method having two types: fairness_check and density. Explainer.model_fairness method now depends on the model_type attribute. (#391)
added N parameter to the predict_parts method which is None by default (#402)
epsilon is now an argument of the GroupFairnessClassification object (#397)

v1.0.1 (2021-02-19)

fixes

fixed broken range on yaxis in fairness_check plot (#376)
fixed warnings caused by np.float being deprecated since numpy v1.20 (#384)

other

added ipython to test dependencies

v1.0.0 (2020-12-29)

breaking changes

These are summed up in (#368):

rename modules: dataset_level into model_explanations, instance_level into predict_explanations, _arena module into arena
use __dir__ method to define autocompletion in IPython environment - show only ['Explainer', 'Arena', 'fairness', 'datasets']
add plot method and result attribute to LimeExplanation (use lime.explanation.Explanation.as_pyplot_figure() and lime.explanation.Explanation.as_list())
CeterisParibus.plot(variable_type='categorical') now has horizontal barplots - horizontal_spacing=None by default (varies on variable_type). Also, once again added the "dot" for observation value.
predict_fn in predict_surrogate now uses predict_function (trying to make it work for more frameworks)

fixes

fixed wrong verbose output when any value in y_hat/residuals was an int not float
added proper "-" sign to negative dropout losses in VariableImportance.plot

features

added geom='bars' to AggregatedProfiles.plot to force the categorical plot
added geom='roc' and geom='lift' to ModelPerformance.plot
added Fairness plot to Arena

other

remove colorize from Explainer
updated the documentation, refactored code (import modules not functions, unify variable names in object.py, move utils functions from checks.py to utils.py, etc.)
added license notice next to data

v0.4.1 (2020-12-03)

added support for h2o.estimators.* (#332)
added tensorflow.python.keras.engine.functional.Functional to the tensorflow list
updated the plotly dependency to >=4.12.0
code maintenance: yhat, check_data

fixes

fixed check_if_empty_fields() used in loading the Explainer from a pickle file, since several checks were changed
fixed plot() method in GroupFairnessClassification as it omitted plotting a metric when NaN was present in metric ratios (result)
fixed the dragons and HR datasets having the , delimiter instead of ., which transformed numerical columns into categorical.
fixed representation of the ShapWrapper class (removed _repr_html_ method)

features

allow for y to be a pandas.DataFrame (converted)
allow for data, y to be a H2OFrame (converted)
added label parameter to all the relevant dx.Explainer methods, which overrides the default label in explanation's result
now using GradientExplainer for tf.keras.engine.sequential.Sequential, added proper warning when shap_explainer_type is None (#366)

defaults

unify verbose output of Explainer

v0.4.0 (2020-11-17)

added new arena module, which adds the backend for Arena dashboard @piotrpiatyszek

features

added new aliases to dx.Explainer methods (#350) in model_parts it is {'permutational': 'variable_importance', 'feature_importance': 'variable_importance'}, in model_profile it is {'pdp': 'partial', 'ale': 'accumulated'}
added Arena object for dashboard backend. See https://github.com/ModelOriented/Arena
new fairness plot types: stacked, radar, performance_and_fairness, heatmap, ceteris_paribus_cutoff
upgraded fairness_check()
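
The alias mappings quoted in the first entry of this list behave like a simple lookup. The sketch below reuses the two dictionaries from the changelog; the resolve_type helper is hypothetical and only illustrates the idea, it is not dalex's internal code.

```python
# Alias tables copied from the changelog entry for #350.
MODEL_PARTS_ALIASES = {
    'permutational': 'variable_importance',
    'feature_importance': 'variable_importance',
}
MODEL_PROFILE_ALIASES = {'pdp': 'partial', 'ale': 'accumulated'}

def resolve_type(requested: str, aliases: dict) -> str:
    """Return the canonical type name, leaving non-alias names unchanged."""
    return aliases.get(requested, requested)

print(resolve_type('pdp', MODEL_PROFILE_ALIASES))
print(resolve_type('feature_importance', MODEL_PARTS_ALIASES))
print(resolve_type('accumulated', MODEL_PROFILE_ALIASES))
```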

v0.3.0 (2020-10-26)

added new fairness module, which will focus on bias detection, visualization and mitigation @jakwisn

fixes

removed unnecessary warning when precalculate=False and verbose=False (#340)

features

added model_fairness method to the Explainer, which performs fairness explanation
added GroupFairnessClassification object, with the plot method having two types: fairness_check and metric_scores

defaults

added the N=50000 argument to ResidualDiagnostics.plot, which samples observations from the result parameter to avoid performance issues when smooth=True (#341)

v0.2.2 (2020-09-21)

added support for tensorflow.python.keras.engine.sequential.Sequential and tensorflow.python.keras.engine.training.Model (#326)
updated the tqdm dependency to >=4.48.2, pandas dependency to >=1.1.2 and numpy dependency to >=1.18.4

fixes

fixed the wrong order of Explainer verbose messages
fixed a bug that caused model_info parameter to be overwritten by the default values
fixed a bug occurring when the variable from groups was not of str type (#327)
fixed model_profile: variable_type='categorical' not working when user passed variables parameter (#329) + the reverse order of bars in 'categorical' plots + (again) added variable_splits_type parameter to model_profile to specify how grid points shall be calculated (#266) + allow for both 'quantile' and 'quantiles' types (alias)

features

added informative error messages when importing optional dependencies (#316)
allow for data and y to be None - added checks in Explainer methods

defaults

wrong parameter name title_x changed to y_title in CeterisParibus.plot and AggregatedProfiles.plot (#317)
now warning the user in Explainer when predict_function returns an error or doesn't return numpy.ndarray (1d) (#325)

v0.2.1 (2020-08-31)

updated the pandas dependency to >=1.1.0

fixes

ModelPerformance.plot now uses a drwhy color palette
use unique method instead of np.unique in variable_splits (#293)
v0.2.0 didn't export new datasets
fixed a bug where predict_parts(type='shap') calculated wrong contributions (#300)
model_profile uses observation mean instead of profile mean in _yhat_ centering
fixed barplot baseline in categorical model_profile and predict_profile plots (#297)
fixed model_profile(type='accumulated') giving wrong results (#302)
vertical/horizontal lines in plots now end on the plot edges

features

added new type='shap_wrapper' to predict_parts and model_parts methods, which returns a new ShapWrapper object. It contains the main result attribute (shapley_values) and the plot method (force_plot and summary_plot respectively). These come from the shap package
Explainer.predict method now accepts numpy.ndarray
added the ResidualDiagnostics object with a plot method
added model_diagnostics method to the Explainer, which performs residual diagnostics
added predict_surrogate method to the Explainer, which is a wrapper for the lime tabular explanation from the lime package
added model_surrogate method to the Explainer, which creates a basic surrogate decision tree or linear model from the black-box model using the scikit-learn package
added a _repr_html_ method to all of the explanation objects (it prints the result attribute)
added dalex.__version__
added informative error messages in Explainer methods when y is of wrong type (#294)
CeterisParibus.plot(variable_type='categorical') now allows for multiple observations
new verbose checks for model_type
add type to model_info in dump and dumps for R compatibility (#303)
ModelPerformance.result now has label as index

defaults

removed _grid_ column in AggregatedProfiles.result and center only works with type=accumulated
use Pipeline._final_estimator to extract model_class of the actual model
use model._estimator_type to extract model_type if possible

v0.2.0 (2020-08-07)

major documentation update (#270)
unified the order of function parameters

fixes

v0.1.9 had wrong _original_ column in predict_profile
vertical_spacing acts as intended in VariableImportance.plot when split='variable'
loss_function='auc' now uses loss_one_minus_auc as this should be a descending measure
plots are now saved with the original height and width
model_profile now properly passes the variables parameter to CeterisParibus
variables parameter in predict_profile now can also be a string

features

use plotly.express instead of core plotly to make model_profile and predict_profile plots, which improves performance and scalability
added verbose parameter where tqdm is used to verbose progress bar
added loss_one_minus_auc function that can be used with loss_function='1-auc' in model_parts
added new example data sets: apartments, dragons and hr
added color, opacity, title_x parameters to model_profile and predict_profile plots (#236), changed tooltips and legends (#262)
added geom='profiles' parameter to model_profile plot and raw_profiles attribute to AggregatedProfiles
added variable_splits_type parameter to predict_profile to specify how grid points shall be calculated (#266)
added variable_splits_with_obs parameter to predict_profile function to extend split points with observation variable values (#269)
added variable_splits parameter to model_profile
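
The 1-auc loss mentioned in this release turns AUC, where higher is better, into a measure where lower is better, which is what a loss function needs. A pure-Python sketch of the idea follows; auc_sketch and one_minus_auc_sketch are hypothetical names, and the pairwise formula is an assumption about the definition, not dalex's implementation.

```python
def auc_sketch(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when only one class is present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def one_minus_auc_sketch(y_true, y_score):
    """Loss version of AUC: 0 for a perfect ranking, 1 for a fully reversed one."""
    return 1.0 - auc_sketch(y_true, y_score)

loss = one_minus_auc_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(loss)
```

In this example three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75 and the loss is 0.25.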

defaults

use different loss_function for classification and regression (#248)
models that use proba yhats now get model_type='classification' if it's not specified
use uniform way of grid points calculation in predict_profile and model_profile (see variable_splits_type parameter)
add the variable values of new_observation to variable_splits in predict_profile (see variable_splits_with_obs parameter)
use N=1000 in model_parts and N=300 in model_profile to comply with the R version
keep_raw_permutation is now set to False instead of None in model_parts
intercept parameter in model_profile is now named center

v0.1.9 (2020-07-01)

feature: added random_state parameter for predict_parts(type='shap') and model_profile for reproducible calculations
fix: fixed random_state parameter in model_parts
feature: multiprocessing added for: model_profile, model_parts, predict_profile and predict_parts(type='shap'), through the processes parameter
fix: significantly improved the speed of accumulated and conditional types in model_profile
bugfix: use pd.api.types.is_numeric_dtype() instead of np.issubdtype() to cover more types; e.g. it caused errors with string type
defaults: use pd.convert_dtypes() on the result of CeterisParibus to fix variable dtypes and later allow for a concatenation without the dtype conversion
fix: variables parameter now can be a single str value
fix: number rounding in predict_parts, model_parts (#245)
fix: CP calculations for models that take only variables as an input
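
The dtype check named in the bugfix above can be tried on a small frame. The example shows how pd.api.types.is_numeric_dtype() handles pandas extension dtypes such as "string" and nullable "Int64"; the column names and values are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    "age": [31, 45],                                  # plain int64
    "name": pd.array(["a", "b"], dtype="string"),     # pandas string extension dtype
    "score": pd.array([1, None], dtype="Int64"),      # nullable integer extension dtype
})

# is_numeric_dtype accepts both NumPy and pandas extension dtypes.
numeric_columns = [col for col in df.columns if is_numeric_dtype(df[col])]
print(numeric_columns)
```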

v0.1.8 (2020-05-28)

bugfix: variable_splits parameter now works correctly in predict_profile
bugfix: fix baseline for 3+ models in AggregatedProfiles.plot (#234)
printing: now rounding numbers in Explainer messages
fix: minor checks fixes in instance_level
bugfix: AggregatedProfiles.plot now works with groups

v0.1.7 (2020-05-10)

feature: parameter N in model_profile can be set to None, to select all observations
input: groups and variable parameters in model_profile can be: str, list, numpy.ndarray, pandas.Series
fix: check_label returned only a first letter
bugfix: removed the conversion of all_variables to str in prepare_all_variables, which caused an error in model_profile (#214)
defaults: change numpy data variable names from numbers to strings

v0.1.6 (2020-04-30)

fix: change short_name encoding in fifa dataset (utf8->ascii)
fix: remove scipy dependency
defaults: default loss_root_mean_square in model parts changed to rmse
bugfix: checks related to new_observation in BreakDown, Shap, CeterisParibus now work for multiple inputs (#207)
bugfix: CeterisParibus.fit and CeterisParibus.plot now work for more types of new_observation.index, but won't work for a boolean type (#211)

v0.1.5 (2020-04-21)

feature: add xgboost package compatibility (#188)
feature: added model_class parameter to Explainer to handle wrapped models
feature: Explainer attribute model_info remembers if parameters are default
bugfix: variable_groups parameter now works correctly in model_parts
fix: changed parameter order in Explainer: model_type, model_info, colorize
documentation: model_parts documentation is updated
feature: new show parameter in plot methods that (if False) returns plotly Figure (#190)
feature: load_fifa() function which loads the preprocessed players_20 dataset
fix: CeterisParibus.plot tooltip

v0.1.4 (2020-04-14)

feature: new Explainer.residual method which uses residual_function to calculate residuals
feature: new dump and dumps methods for saving Explainer in a binary form; load and loads methods for loading Explainer from binary form
fix: Explainer constructor verbose text
bugfix: B:=B+1 - Shap now stores average results as B=0 and path results as B=1,2,...
bugfix: Explainer.model_performance method uses self.model_type when model_type is None
bugfix: values in BreakDown and Shap are now rounded to 4 significant places (#180)
bugfix: Shap by default uses path='average', sign column is properly updated and bars in plot are sorted by abs(contribution)

v0.1.3 (2020-04-10)

release of the dalex package
Explainer object with predict, predict_parts, predict_profile, model_performance, model_parts and model_profile methods
BreakDown, Shap, CeterisParibus, ModelPerformance, VariableImportance and AggregatedProfiles objects with a plot method
load_titanic() function which loads the titanic_imputed dataset
