
ACV is a Python library that provides robust and accurate explanations for any machine learning model or data.
The explanations fall into two groups: Agnostic Explanations and Tree-based Explanations.
See the papers here.
Python 3.6+
OSX: ACV uses Cython extensions that must be compiled with multi-threading support enabled. The default Apple Clang compiler does not support OpenMP. To solve this, install the latest gcc version with Homebrew, which has multi-threading enabled: see, for example, the pysteps installation instructions for OSX.
Windows: Install MinGW (a Windows distribution of gcc) or Microsoft Visual C++.
Install the acv package:
$ pip install acv-exp
The Agnostic approaches explain any data (X, Y) or model (X, f(X)) using the following explanation methods:
See the paper Consistent Sufficient Explanations and Minimal Local Rules for explaining regression and classification models for more details.
I. First, we fit our explainer (ACXplainer) to the input-output pairs: (X, Y) of the data, or (X, f(X)) of the model, depending on whether we want to explain the data or the model.
from acv_explainers import ACXplainer
from sklearn.metrics import roc_auc_score

# ACXplainer has the same hyperparameters as a Random Forest, and they should be tuned to maximize performance.
acv_xplainer = ACXplainer(classifier=True, n_estimators=50, max_depth=5)
acv_xplainer.fit(X_train, y_train)

roc = roc_auc_score(y_test, acv_xplainer.predict(X_test))
II. Then, we can load all the explanations in a webApp as follows:
import acv_app
import os
# compile the ACXplainer
acv_app.compile_ACXplainers(acv_xplainer, X_train, y_train, X_test, y_test, path=os.getcwd())
# Launch the webApp
acv_app.run_webapp(pickle_path=os.getcwd())
III. Or we can compute each explanation separately as follows:
The main tool of our explanations is the Same Decision Probability (SDP). Given an instance x and a subset of variables S, the Same Decision Probability SDP_S(x) = P(f(X) = f(x) | X_S = x_S) is the probability that the prediction remains the same when the variables in S are fixed to their observed values x_S while the remaining variables are missing.
sdp = acv_xplainer.compute_sdp_rf(X, S, data_bground) # data_bground is the background dataset that is used for the estimation. It should be the training samples.
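To make the definition concrete, here is a minimal Monte Carlo sketch of what an SDP estimator computes: clamp the features in S to x's values, fill in the rest from the background data, and measure how often the prediction is unchanged. The helper `sdp_monte_carlo` and the toy model are hypothetical; this is a simplification, not ACV's forest-based estimator.

```python
import numpy as np

def sdp_monte_carlo(model, x, S, data_bground, n_samples=1000, seed=0):
    """Estimate the Same Decision Probability of subset S for instance x:
    the share of background-completed samples whose prediction matches
    model(x). A simplified sketch of what compute_sdp_rf estimates."""
    rng = np.random.default_rng(seed)
    # Draw rows from the background data, then clamp the features in S to x's values
    Z = data_bground[rng.integers(0, len(data_bground), size=n_samples)].copy()
    Z[:, S] = x[S]
    return float(np.mean(model(Z) == model(x[None, :])[0]))

# Toy classifier: predicts 1 when feature 0 is positive
model = lambda X: (X[:, 0] > 0).astype(int)
data_bground = np.random.default_rng(1).normal(size=(500, 3))
x = np.array([2.0, -1.0, 0.5])
print(sdp_monte_carlo(model, x, [0], data_bground))  # fixing feature 0 alone preserves the prediction: 1.0
```

Fixing only feature 0 (the one the toy model uses) yields an SDP of 1.0, while fixing only the irrelevant features leaves the prediction at the mercy of the background distribution.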
A Sufficient Explanation is a minimal subset S such that fixing the values x_S maintains the prediction with high probability (SDP above a chosen level π).
See the paper here for more details.
How to compute the Minimal Sufficient Explanation?
The following code returns the Sufficient Explanation with minimal cardinality.
sdp_importance, min_sufficient_expl, size, sdp = acv_xplainer.importance_sdp_rf(X, y, X_train, y_train, pi_level=0.9)
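Conceptually, the search looks for the smallest subset whose SDP clears the pi_level threshold. The brute-force sketch below makes that explicit on a toy model; the helper names (`sdp_estimate`, `minimal_sufficient_explanation`) are hypothetical, and the exhaustive enumeration is exponential in the number of features, which is why importance_sdp_rf searches the fitted forest instead.

```python
import numpy as np
from itertools import combinations

def sdp_estimate(model, x, S, data, n=2000, seed=0):
    """Monte Carlo SDP: clamp the features in S to x's values and
    resample the remaining features from the background data."""
    rng = np.random.default_rng(seed)
    Z = data[rng.integers(0, len(data), size=n)].copy()
    if S:
        Z[:, list(S)] = x[list(S)]
    return float(np.mean(model(Z) == model(x[None, :])[0]))

def minimal_sufficient_explanation(model, x, data, pi_level=0.9):
    """Smallest subset S with SDP >= pi_level, found by exhaustive search
    over subsets of increasing size."""
    d = data.shape[1]
    for size in range(d + 1):
        for S in combinations(range(d), size):
            s = sdp_estimate(model, x, S, data)
            if s >= pi_level:
                return list(S), s
    return list(range(d)), 1.0

# Toy model depending on features 0 and 1 only; x has a large feature 0
model = lambda X: ((X[:, 0] + X[:, 1]) > 1).astype(int)
data = np.random.default_rng(2).normal(size=(500, 4))
x = np.array([3.0, 2.0, 0.0, 0.0])
min_expl, s = minimal_sufficient_explanation(model, x, data)
print(min_expl, s)  # feature 0 alone keeps the prediction with probability >= 0.9
```

Collecting every subset of the minimal size that passes the threshold, instead of returning the first, would give all the Minimal Sufficient Explanations discussed next.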
How to compute all the Sufficient Explanations?
Since the Minimal Sufficient Explanation may not be unique for a given instance, we can compute all of them.
sufficient_expl, sdp_expl, sdp_global = acv_xplainer.sufficient_expl_rf(X, y, X_train, y_train, pi_level=0.9)
For a given instance, the local explanatory importance of each variable corresponds to the frequency of apparition of the given variable in the Sufficient Explanations. See the paper here for more details.
lximp = acv_xplainer.compute_local_sdp(X_train.shape[1], sufficient_expl)
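The frequency computation itself is simple; the sketch below (with a hypothetical helper `local_explanatory_importance`) illustrates what such a local importance amounts to: the fraction of an instance's Sufficient Explanations in which each variable appears.

```python
import numpy as np

def local_explanatory_importance(d, sufficient_expl):
    """Frequency with which each of the d variables appears across the
    Sufficient Explanations of one instance."""
    counts = np.zeros(d)
    for S in sufficient_expl:
        counts[list(S)] += 1
    return counts / max(len(sufficient_expl), 1)

# Three Sufficient Explanations for one instance over d = 4 variables
expls = [[0, 1], [0, 2], [0]]
print(local_explanatory_importance(4, expls))  # variable 0 appears in all of them
```

Variable 0 gets importance 1.0 because it belongs to every Sufficient Explanation, variables 1 and 2 get 1/3, and variable 3 gets 0.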
For a given instance (x, y) and its Sufficient Explanation S (i.e., with SDP above the level π), we compute a local minimal rule containing x such that every observation z satisfying the rule also has SDP above π. See the paper here for more details.
sdp, rules, _, _, _ = acv_xplainer.compute_sdp_maxrules(X, y, data_bground, y_bground, S) # data_bground is the background dataset that is used for the estimation. It should be the training samples.
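Such a local rule is an axis-aligned region: per-feature intervals that contain x. The sketch below (the helper `satisfies_rule` is hypothetical) shows how to test which observations fall inside a rule of that shape.

```python
import numpy as np

def satisfies_rule(Z, rule):
    """A local rule as an axis-aligned hyperrectangle given by per-feature
    (low, high) bounds; returns which rows of Z fall inside it."""
    low = np.array([lo for lo, hi in rule])
    high = np.array([hi for lo, hi in rule])
    return np.all((Z >= low) & (Z <= high), axis=1)

# A rule covering x = [2.0, -1.0]: feature 0 >= 1, feature 1 unconstrained
rule = [(1.0, np.inf), (-np.inf, np.inf)]
Z = np.array([[2.0, -1.0], [1.5, 3.0], [0.0, 0.0]])
print(satisfies_rule(Z, rule))  # → [ True  True False]
```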
ACV gives Shapley Values explanations for XGBoost, LightGBM, CatBoostClassifier, scikit-learn and pyspark tree models. It provides the following Shapley Values:
In addition, we use the coalitional version of SV to properly handle categorical variables in the computation of SV.
See the papers here
To explain the tree-based models above, we need to transform our model into an ACVTree.
from acv_explainers import ACVTree
forest = XGBClassifier() # or any tree-based model
# ... train the model
acvtree = ACVTree(forest, data_bground) # data_bground is the background dataset that is used for the estimation. It should be the training samples.
sv = acvtree.shap_values(X)
Note that it provides a better estimate than the tree-path-dependent approach of TreeSHAP when the variables are dependent.
Suppose we have a categorical variable Y with k modalities that we encoded with dummy variables. As shown in the paper, we must take the coalition of the dummy variables to compute the Shapley values correctly.
# cat_index := list[list[int]] that contains the column indices of the dummies or one-hot variables grouped
# together for each variable. For example, if we have only 2 categorical variables Y, Z
# transformed into [Y_0, Y_1, Y_2] and [Z_0, Z_1, Z_2]
cat_index = [[0, 1, 2], [3, 4, 5]]
forest_sv = acvtree.shap_values(X, C=cat_index)
In addition, we can compute the SV given any coalition. For example, suppose we have 10 variables and want the following coalition:
coalition = [[0, 1, 2], [3, 4], [5, 6]]
forest_sv = acvtree.shap_values(X, C=coalition)
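To see why grouping matters, here is a toy exact Shapley computation in which each group acts as a single player, so all the dummies of one categorical variable share a single payout. This is an illustrative game-theoretic sketch with a made-up value function, not ACV's tree-based estimator.

```python
import numpy as np
from itertools import permutations

def grouped_shapley(value_fn, groups):
    """Exact Shapley values where each group of feature indices is one
    player: average each player's marginal contribution over all orderings."""
    k = len(groups)
    phi = np.zeros(k)
    perms = list(permutations(range(k)))
    for order in perms:
        present = []
        for player in order:
            before = value_fn(set(present))
            present.extend(groups[player])
            phi[player] += value_fn(set(present)) - before
    return phi / len(perms)

# Toy cooperative game: the payoff is the number of features present.
# One categorical variable encoded as 3 dummies, plus one numeric feature.
value = lambda S: float(len(S))
groups = [[0, 1, 2], [3]]
print(grouped_shapley(value, groups))  # → [3. 1.]
```

Treating the three dummies as one player gives that player its full joint contribution, instead of splitting the credit three ways as an ungrouped computation would.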
Recall that the SDP is the probability that the prediction remains the same when the variables in the subset S are fixed to their observed values.
sdp = acvtree.compute_sdp_clf(X, S, data_bground) # data_bground is the background dataset that is used for the estimation. It should be the training samples.
Recall that the Minimal Sufficient Explanation is the minimal subset S such that fixing the values x_S maintains the prediction with high probability.
sdp_importance, sdp_index, size, sdp = acvtree.importance_sdp_clf(X, data_bground) # data_bground is the background dataset that is used for the estimation. It should be the training samples.
The Active Shapley values are SV based on a new game defined in the paper (Accurate and robust Shapley Values for explaining predictions and focusing on local important variables), such that null (non-important) variables have zero SV and the "payout" is fairly distributed among the active variables.
import acv_explainers
# First, we need to compute the Active and Null coalition
sdp_importance, sdp_index, size, sdp = acvtree.importance_sdp_clf(X, data_bground)
S_star, N_star = acv_explainers.utils.get_active_null_coalition_list(sdp_index, size)
# Then, we use the active coalition found to compute the Active Shapley values.
forest_asv_adap = acvtree.shap_values_acv_adap(X, C, S_star, N_star, size)
If you don't want to use multi-threading (due to scaling or memory problems), add "_nopa" to the name of each function (e.g. compute_sdp_clf ==> compute_sdp_clf_nopa). You can also cache the values needed by setting cache=True when initializing ACVTree, e.g. ACVTree(model, data_bground, cache=True).
A tutorial on the usage of ACV is available in demo_acv, and the notebooks below demonstrate different use cases. Look inside the notebook directory of the repository if you want to play with the original notebooks yourself.