
NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers
This repository contains the code for the paper:
NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers
(Greco et al., 2024)
NLPGuard is a mitigation framework that aims to reduce the use of protected attributes in the predictions of NLP classifiers without sacrificing predictive performance.
It currently supports NLP classifiers trained with the 🤗 HuggingFace library.
NOTE: NLPGuard will be available as a Python package soon
NLPGuard takes as input an existing NLP classifier, its training dataset, and an unlabeled corpus used for predictions (e.g., a test set or new data), and produces a mitigated training corpus that significantly reduces the classifier's learning of protected attributes without sacrificing classification accuracy.
NLPGuard comprises three components:
The Explainer leverages XAI techniques to extract the list of the most important words the model uses for its predictions on the unlabeled corpus. It first computes each word's importance within each prediction (local explanation), and then aggregates these importances across the entire corpus (global explanation).
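For intuition, here is a minimal, hypothetical sketch of how per-text (local) word attributions could be aggregated into a corpus-level (global) ranking; the aggregation rule below is illustrative and not necessarily the exact procedure used by the Explainer:
from collections import defaultdict

def aggregate_local_explanations(local_explanations):
    # local_explanations: one dict per text, mapping each word to its local importance score
    global_importance = defaultdict(float)
    for word_scores in local_explanations:
        for word, score in word_scores.items():
            global_importance[word] += abs(score)  # accumulate absolute importance across the corpus
    # Rank words by aggregated importance (global explanation)
    return sorted(global_importance.items(), key=lambda kv: kv[1], reverse=True)

# Example: two local explanations collapse into one global ranking
print(aggregate_local_explanations([{"nurse": 0.8, "kind": 0.1}, {"nurse": 0.6, "patient": 0.3}]))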
NLPGuard currently supports the following XAI techniques:
Others will be added soon:
The Identifier determines which of the most important words are related to protected attributes by annotating each word with one of the following labels:
NLPGuard currently supports the following techniques to annotate protected attributes:
Other techniques will be available soon:
Note:
Warning: The effectiveness of the LLaMA-based annotation has not been evaluated.
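Conceptually, the Identifier's job can be pictured as labeling each important word with a protected-attribute category, or marking it as unrelated. The dictionary-based sketch below is purely illustrative; the category lists are placeholders and this is not one of NLPGuard's actual annotation techniques:
# Illustrative only: placeholder lexicons standing in for a real annotation technique
PROTECTED_CATEGORIES = {
    "gender": {"woman", "man", "female", "male"},
    "religion": {"muslim", "christian", "jewish"},
}

def annotate_word(word):
    # Return the protected-attribute category of a word, or None if it is unrelated
    for category, vocabulary in PROTECTED_CATEGORIES.items():
        if word.lower() in vocabulary:
            return category
    return None

most_important_words = ["woman", "kind", "muslim", "patient"]
print({w: annotate_word(w) for w in most_important_words})
# {'woman': 'gender', 'kind': None, 'muslim': 'religion', 'patient': None}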
The Moderator mitigates the protected attributes by modifying the original training dataset, which can then be used to train a new, mitigated classifier.
NLPGuard currently supports two mitigation strategies: words_removal and sentences_removal (sketched conceptually below).
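As a rough conceptual sketch (hypothetical code, not the Moderator's implementation), assuming words_removal drops the protected-attribute words from each text while sentences_removal drops the training examples that contain them:
protected_attributes = {"woman", "muslim"}  # example output of the Identifier

def words_removal(texts, protected):
    # Remove protected-attribute words from each training text
    return [" ".join(w for w in t.split() if w.lower() not in protected) for t in texts]

def sentences_removal(texts, protected):
    # Drop every training text that contains a protected-attribute word
    return [t for t in texts if not any(w.lower() in protected for w in t.split())]

corpus = ["the woman was very kind", "the weather is nice"]
print(words_removal(corpus, protected_attributes))      # ['the was very kind', 'the weather is nice']
print(sentences_removal(corpus, protected_attributes))  # ['the weather is nice']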
conda create -n nlpguard-env python=3.8 anaconda
conda activate nlpguard-env
pip install -r requirements.txt
from framework.mitigation_framework import MitigationFramework
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import pandas as pd
id2label = {0: "non-toxic", 1: "toxic"}
model_name_or_path = "your_trained_model_path"
# Load the unlabeled corpus from disk (e.g., the test set). This is the corpus of texts to explain and extract the most important words from
df_unlabeled_corpus = pd.read_csv("saved_datasets/unlabeled_corpus.csv")
texts_unlabeled_corpus = df_unlabeled_corpus["text"].tolist()
# Load your model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# Instantiate the Mitigation Framework
mf = MitigationFramework().initialize_mitigation_framework(id2label=id2label,
use_case_name="toxicity-classification")
# Labels to perform the explanations on (e.g., 0: non-toxic, 1: toxic)
label_ids_to_explain = [0, 1]
# Run the explainer. It returns a dictionary with, for each label id, the list of most important words
output_dict = mf.run_explainer(model, # Explained model
tokenizer, # Model Tokenizer
texts_unlabeled_corpus, # Unlabeled corpus of texts to explain and extract most important words
label_ids_to_explain, # Labels to explain
device="cuda:0" # Device to run the explainer on
)
# Identify protected attributes from the 400 most important words extracted by the explainer for each label
number_most_important_words = 400
# Run the identifier to identify the protected attributes from the most important words extracted by the explainer
df_annotated, protected_attributes, protected_attributes_dict = mf.run_identifier(output_dict, # Output of the explainer
number_most_important_words=number_most_important_words # Number of most important words to consider
)
# Load the training dataset to mitigate
df_train = pd.read_csv("saved_datasets/test.csv")
# If True, the protected attributes are mitigated separately for each label; otherwise, independently of the label
# For instance, if True, the protected attributes identified for the "nurse" label are used to mitigate only the examples whose original label is "nurse", and likewise for "non-nurse"
# If False, the protected attributes identified for all labels (e.g., "non-nurse" and "nurse") are used to mitigate all examples, regardless of their original label
mitigate_each_label_separately = False
# Select the mitigation strategy to use
mitigation_strategy = "words_removal" # Mitigation strategy to use. It can be "words_removal" or "sentences_removal"
# Run the moderator to mitigate the protected attributes identified by the identifier in the training dataset
df_train_mitigated = mf.run_moderator(df_train, # Training dataset to mitigate
tokenizer, # Model tokenizer
protected_attributes_dict, # Protected attributes identified by the identifier
mitigation_strategy=mitigation_strategy, # Mitigation strategy to use
text_column_name="cleaned_text", # Name of the column containing the texts
label_column_name="label", # Name of the column containing the labels
mitigate_each_label_separately=mitigate_each_label_separately, # Mitigate the protected attributes for each label separately or not
batch_size=128 # Batch size to use for the mitigation
)
# Save the mitigated dataset to disk
df_train_mitigated.to_csv("saved_datasets/test_mitigated.csv", index=False)
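The mitigated dataset can then be used to train a new, mitigated classifier. Below is a minimal sketch using the standard 🤗 Transformers Trainer; this retraining step is outside NLPGuard's API, and the checkpoint, hyperparameters, and output path are placeholders:
from datasets import Dataset
from transformers import Trainer, TrainingArguments

# Build a HuggingFace dataset from the mitigated training corpus
train_dataset = Dataset.from_pandas(df_train_mitigated[["cleaned_text", "label"]])
train_dataset = train_dataset.map(
    lambda batch: tokenizer(batch["cleaned_text"], truncation=True, padding="max_length"),
    batched=True,
)

# Re-initialize the classifier (here from the same checkpoint path used above) and fine-tune it on the mitigated data
mitigated_model = AutoModelForSequenceClassification.from_pretrained(model_name_or_path, num_labels=len(id2label))
training_args = TrainingArguments(output_dir="mitigated_model", num_train_epochs=3, per_device_train_batch_size=16)
trainer = Trainer(model=mitigated_model, args=training_args, train_dataset=train_dataset)
trainer.train()
trainer.save_model("mitigated_model")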
Full examples will be available soon.
If you use NLPGuard, please cite the following paper:
@article{10.1145/3686924,
author = {Greco, Salvatore and Zhou, Ke and Capra, Licia and Cerquitelli, Tania and Quercia, Daniele},
title = {NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers},
year = {2024},
issue_date = {November 2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {8},
number = {CSCW2},
url = {https://doi.org/10.1145/3686924},
doi = {10.1145/3686924},
journal = {Proc. ACM Hum.-Comput. Interact.},
month = nov,
articleno = {385},
numpages = {25},
keywords = {bias, crowdsourcing, fairness, large language models, natural language processing, protected attributes, toxic language}
}