tree-influence is a Python library that implements influence estimation for gradient-boosted decision trees (GBDTs), adapting popular techniques such as TracIn and influence functions to GBDTs. The library is compatible with all major GBDT frameworks, including LightGBM, XGBoost, CatBoost, and scikit-learn.

Install from PyPI:

```bash
pip install tree-influence
```
A simple example using BoostIn to identify the training instances most influential on a given test instance:
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier
from tree_influence.explainers import BoostIn

# load iris data
data = load_iris()
X, y = data['data'], data['target']

# use two classes, then split into train and test
idxs = np.where(y != 2)[0]
X, y = X[idxs], y[idxs]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=1)

# train GBDT model
model = LGBMClassifier().fit(X_train, y_train)

# fit influence estimator
explainer = BoostIn().fit(model, X_train, y_train)

# estimate training influences on each test instance
influence = explainer.get_local_influence(X_test, y_test)  # shape=(no. train, no. test)

# extract influence values for the first test instance
values = influence[:, 0]  # shape=(no. train,)

# sort training examples from:
# - most positively influential (decreases loss of the test instance the most), to
# - most negatively influential (increases loss of the test instance the most)
training_idxs = np.argsort(values)[::-1]
```
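As a follow-up, the sorted indices can be used to inspect the most helpful training examples. The snippet below simply continues the example above; the choice of k=5 and the printed fields are illustrative and not part of the library's API.

```python
# continue from the example above: inspect the top-5 most positively
# influential training examples for the first test instance
# (k=5 is an arbitrary illustrative choice)
top_k = 5
for rank, train_idx in enumerate(training_idxs[:top_k], start=1):
    print(f'#{rank}: train index={train_idx}, '
          f'label={y_train[train_idx]}, influence={values[train_idx]:.4f}')
```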
tree-influence supports the following influence-estimation techniques for GBDTs (a sketch of switching between explainers follows the table):
| Method | Description |
|---|---|
| BoostIn | Traces the influence of a training instance throughout the training process (adaptation of TracIn). |
| TREX | Trains a surrogate kernel model that approximates the original model and decomposes any prediction into a weighted sum of the training examples (adaptation of representer-point methods). |
| LeafInfluence | Estimates the impact of a training example on the final GBDT model (adaptation of influence functions). |
| TreeSim | Computes influence via similarity in tree-kernel space. |
| LOO | Leave-one-out retraining: measures the influence of a training instance by removing it and retraining the model without that instance. |
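A minimal sketch of switching explainers, assuming the other methods in the table (LeafInfluence is used here) are also importable from tree_influence.explainers and expose the same fit / get_local_influence interface shown for BoostIn; consult the project documentation for the exact class names.

```python
# sketch: reusing the pipeline from the BoostIn example with a different
# explainer. Assumes LeafInfluence is exported from tree_influence.explainers
# and follows the same fit/get_local_influence interface as BoostIn.
from tree_influence.explainers import LeafInfluence

explainer = LeafInfluence().fit(model, X_train, y_train)   # same model and data as above
influence = explainer.get_local_influence(X_test, y_test)  # shape=(no. train, no. test)
values = influence[:, 0]                                   # influences for the first test instance
```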
Brophy, Hammoudeh, and Lowd. Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees. Journal of Machine Learning Research (JMLR), 2023.
```bibtex
@article{brophy2023treeinfluence,
  author  = {Jonathan Brophy and Zayd Hammoudeh and Daniel Lowd},
  title   = {Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees},
  journal = {Journal of Machine Learning Research},
  year    = {2023},
  volume  = {24},
  number  = {154},
  pages   = {1--48},
  url     = {http://jmlr.org/papers/v24/22-0449.html},
}
```