bencheval

Tools for measuring sensitivity and diversity of multi-task benchmarks.

BenchEval is a Python package that provides a suite of tools for evaluating multi-task benchmarks. It focuses on their diversity and on their sensitivity to irrelevant variations, such as injected label noise and the addition of irrelevant candidate models. The package supports a comprehensive analysis of multi-task benchmarks through a social choice lens, exposing the fundamental trade-off between diversity and stability in both cardinal and ordinal benchmarks.
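To make "sensitivity to irrelevant candidate models" concrete, here is a small illustration using the Borda count, a standard social-choice rule for aggregating per-task rankings. The choice of rule, the model names and the task rankings are illustrative assumptions. This is not necessarily how bencheval aggregates rankings or measures sensitivity.

```python
from collections import defaultdict


def borda_scores(rankings):
    """Borda count: in a ranking of n models, position i (0 = best) earns n - 1 - i points."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, model in enumerate(ranking):
            scores[model] += n - 1 - position
    return dict(scores)


def aggregate_ranking(rankings):
    """Order models by total Borda points (ties broken by name for determinism)."""
    scores = borda_scores(rankings)
    return sorted(scores, key=lambda m: (-scores[m], m))


# Five hypothetical tasks, each ranking two models (best first).
before = [["A", "B"]] * 3 + [["B", "A"]] * 2
# The same tasks after adding an irrelevant model "C".
# The relative order of A and B is unchanged in every task.
after = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2

print(aggregate_ranking(before))  # ['A', 'B']
print(aggregate_ranking(after))   # ['B', 'A', 'C']
```

Adding C never changes whether any task prefers A or B, yet the aggregate winner flips from A to B. This kind of instability is what "sensitivity to irrelevant candidate models" refers to.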

For more information, including the motivations behind the measures and our empirical findings, please see our paper.

Quick Start

To install the package, simply run:

pip install bencheval

Example Usage

To evaluate a cardinal benchmark, you can use the following code:

from bencheval.data import load_cardinal_benchmark
from bencheval.measures.cardinal import get_diversity, get_sensitivity

data, cols = load_cardinal_benchmark('GLUE')
diversity = get_diversity(data, cols)
sensitivity = get_sensitivity(data, cols)
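This README does not define the diversity measure itself. As an illustration only, one standard way to quantify how much the tasks of a benchmark disagree is Kendall's coefficient of concordance W over the per-task rankings of the models, W = 12S / (m^2 (n^3 - n)), for m tasks and n models. Here S is the sum of squared deviations of each model's rank sum from its mean. The sketch below uses 1 - W as a diversity score. The function names and the 1 - W choice are assumptions, not bencheval's `get_diversity`.

```python
def ranks_from_scores(scores):
    """Rank models on one task: the highest score gets rank 1 (assumes no ties)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def kendalls_w(score_table):
    """Kendall's W for a table with one row per model and one column per task.

    Each task acts as a 'rater' that ranks the models.
    """
    n = len(score_table)      # number of models
    m = len(score_table[0])   # number of tasks
    task_ranks = [ranks_from_scores([row[j] for row in score_table]) for j in range(m)]
    rank_sums = [sum(ranks[i] for ranks in task_ranks) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def diversity(score_table):
    """Illustrative diversity score: 0 when all tasks agree, larger when they disagree."""
    return 1 - kendalls_w(score_table)


# Hypothetical scores: rows are models, columns are tasks.
agree = [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4], [0.3, 0.2, 0.1]]
disagree = [[0.9, 0.5, 0.1], [0.5, 0.1, 0.9], [0.1, 0.9, 0.5]]

print(diversity(agree))     # 0.0
print(diversity(disagree))  # 1.0
```

In `agree` every task ranks the models identically, so W = 1. In `disagree` each model is first, second and third exactly once, so the rank sums are equal and W = 0.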

To evaluate an ordinal benchmark, you can use the following code:

from bencheval.data import load_ordinal_benchmark
from bencheval.measures.ordinal import get_diversity, get_sensitivity

data, cols = load_ordinal_benchmark('HELM-accuracy')
diversity = get_diversity(data, cols)
sensitivity = get_sensitivity(data, cols)
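One natural way to quantify how much a perturbation, such as adding an irrelevant model, changes an aggregate ranking is the normalized Kendall tau distance. This is the fraction of model pairs that the two rankings order differently. The helper below is a hypothetical illustration of that idea, not bencheval's `get_sensitivity`. It compares only the models present in both rankings.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Fraction of shared model pairs ordered differently by two rankings (best first)."""
    common = [m for m in ranking_a if m in ranking_b]
    pos_a = {m: i for i, m in enumerate(ranking_a)}
    pos_b = {m: i for i, m in enumerate(ranking_b)}
    pairs = list(combinations(common, 2))
    if not pairs:
        return 0.0
    # A pair is discordant when the two rankings put it in opposite order.
    discordant = sum(
        (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0 for x, y in pairs
    )
    return discordant / len(pairs)


before = ["A", "B", "D"]
after = ["B", "A", "C", "D"]  # hypothetical ranking after adding model "C"
print(kendall_tau_distance(before, after))  # one of the three shared pairs, (A, B), flips
```

A distance of 0 means the perturbation left the relative order of the original models intact. A distance of 1 means it reversed every pair.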

To use your own benchmark, provide a pandas DataFrame and a list of the columns that correspond to tasks. Check the documentation for more details.
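A minimal sketch of preparing your own data, assuming one row per model and one numeric column per task. The model names, task names and scores are hypothetical, and the exact layout bencheval expects should be confirmed in its documentation. For example, an ordinal benchmark may expect ranks rather than raw scores. The bencheval calls are left commented out because they mirror the examples above rather than a verified signature.

```python
import pandas as pd

# Hypothetical benchmark: one row per model, one numeric column per task.
data = pd.DataFrame(
    {
        "task_1": [0.81, 0.74, 0.69],
        "task_2": [0.55, 0.62, 0.58],
        "task_3": [0.90, 0.88, 0.93],
    },
    index=["model-a", "model-b", "model-c"],
)
cols = ["task_1", "task_2", "task_3"]  # the columns that hold per-task results

# Basic sanity checks before handing the frame to bencheval.
assert set(cols) <= set(data.columns)
assert all(pd.api.types.is_numeric_dtype(data[c]) for c in cols)

# Assumed call pattern, mirroring the cardinal example above:
# from bencheval.measures.cardinal import get_diversity, get_sensitivity
# diversity = get_diversity(data, cols)
# sensitivity = get_sensitivity(data, cols)
```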

Reproduce the Paper

To reproduce our results, see the notebooks cardinal.ipynb, ordinal.ipynb and banner.ipynb.