Benchmark suite for Autoregressive Neural Emulators of PDEs in JAX.
OpenMMLab Image and Video Editing Toolbox and Benchmark
A framework for benchmarking causal inference.
Benchmarking library for generative algorithms
A Flexible Framework for Accelerating LLM Benchmarking
A library for benchmarking AI models
SyntheRela - Synthetic Relational Data Generation Benchmark
mcsm-benchs: A benchmarking toolbox for Multi-Component Signal Methods.
Rotation Detection Toolbox and Benchmark
TabularBench: Adversarial robustness benchmark for tabular data
Fuzzy Data Benchmark
GitLab Security Compliance CLI
Code to benchmark and evaluate MLLMs on industrial data (the data itself is not part of the publication).
Benchmark PDF extraction tools for use with RAG applications
The companion code for the FanOutQA dataset + benchmark for LLMs.
MESS – Multi-domain Evaluation of Semantic Segmentation
GuardBench: A Large-Scale Benchmark for Guardrail Models
Table Retrieval for Generative Tasks Benchmark
ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction
Advanced Disk Benchmark Tool
A simple benchmarking library
I/O profiler for deep learning Python apps, specifically for dlio_benchmark.
MedSegBench: A Comprehensive Benchmark for Medical Image Segmentation in Diverse Data Modalities
Interactive Benchmarking for Machine Learning.
The benchmark package allows you to test AI Brain
This repository is designed to simplify the evaluation process of vision-language models. It provides a comprehensive set of tools and scripts for evaluating VLM models and benchmarks.
A Python package to benchmark algorithms against various datasets
A simple ANN benchmark tool
Experiment management and benchmark tools for mathematical optimization
Use LLMs to get classification risk scores on tabular tasks.
Python package for building, simulating, and benchmarking hybrid quantum-classical algorithms.
A Comprehensive Benchmark of Deep Model Fusion
Superseded by: gbd-tools
LLM semantic testing and benchmarking framework
UNIQUE is a Python package for benchmarking uncertainty estimation and quantification methods for Machine Learning model predictions.
SpuCo: Spurious Correlations Datasets and Benchmarks
Easily generate simple continual learning benchmarks.
Pluralistic alignment evaluation benchmark for LLMs
Benchmarks for Bayesian optimization
BrowserGym integration for the WebLINX benchmark
Electrical Power System Benchmark Models.
Benchmark Tool for CaddoBenchmark Project
A Python package for automatic training and benchmarking of Language Models.
CLI extension for AEA framework benchmarking.
Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.
Benchmarking tool for vector databases
AI Maintainer Agent Harness for our benchmarking and Marketplace API and platform
A benchmarking sandbox for mode choice models
Bayesian optimization benchmark system
Simple probabilistic time series benchmark models