MEALPY: An Open-source Library for Latest Meta-heuristic Algorithms in Python
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis.
Evaluating single-cell data integration methods
A high-performance C++ implementation of benchmark functions for mathematical optimization algorithms.
Opfunu: An Open-source Python Library for Optimization Benchmark Functions
Library and client for managing, benchmarking, and interacting with JupyterHub
Benchmarking imputation methods for microdata
Macrobenchmarking framework for OpenSearch
resp-benchmark is a benchmark tool for testing databases that support the RESP protocol, such as Redis, Valkey, and Tair.
OpenMMLab Image and Video Editing Toolbox and Benchmark
Beautiful and Pythonic benchmark engine.
Multi-vendor GPU health monitoring supporting old GPUs for e-waste reduction
Diverse Genomic Embedding Benchmark
ManiSkill3: A Unified Benchmark for Generalizable Manipulation Skills
OpenMMLab Model Compression Toolbox and Benchmark
A package for benchmarking the performance of arbitrary functions
A Comprehensive Benchmark for Large Language Model Efficiency
CLI extension for AEA framework benchmarking.
Benchmark chemistry performance of LLMs
A benchmark designed to advance foundation models for Earth monitoring, tailored for remote sensing. It encompasses six classification and six segmentation tasks, curated for precision and model evaluation. The package also features a comprehensive evaluation methodology and showcases results from 20 established baseline models.
A library to benchmark code snippets.
Benchmarking framework for all types of black-box optimization algorithms.
WILDS distribution shift benchmark
Bench-AF: Alignment Faking Benchmark
A small yet powerful LM Judge
Benchmark of ssrJSON
Library to systematically track and evaluate LLM based applications.
Continuous Benchmarking (CB) Framework
Library for working with compliance benchmarks and data.
RelBench: Relational Deep Learning Benchmark
Tests4Py: a benchmark for testing bugs
An automated tool that assesses the GitLab CIS benchmarks against a project.
Natural intelligence benchmarks for scikit-learn
Fuzzy Data Benchmark
A Unified Change Representation Learning Benchmark Library
A tool for Behavior benchmARKing
GBD Tools: Maintenance and Distribution of Benchmark Instances and their Attributes
OpenMMLab Video Understanding Toolbox and Benchmark
A comprehensive toolkit for large model evaluation
A tool for automated scientific benchmarking
Benchmark toolkit for DrVD
Fix Inventory Compliance Benchmarks and Checks
Create, Run and Benchmark DVC Pipelines in Python
Tiny Python benchmarking library
Benchmark for language models
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
The seismological machine learning benchmark collection
Python benchmark suite