Simple benchmark framework (in active development)
A ``pytest`` fixture for benchmarking code. It will group the tests into rounds that are calibrated to the chosen timer.
Reversible Data Transforms
Pytest plugin to create CodSpeed benchmarks
Massive Text Embedding Benchmark
OpenMMLab Detection Toolbox and Benchmark
Provides a common interface to many IR ad-hoc ranking benchmarks, training datasets, etc.
Metrics for multiple object tracker benchmarking.
Store data created during your pytest test run, and retrieve it at the end of the session, e.g. for application benchmarking purposes.
Airspeed Velocity: A simple Python history benchmarking tool
Python module to run and analyze benchmarks
A Python Toolbox for Benchmarking Machine Learning on Partially-Observed Time Series
Benchmarking QRC measures the ability to store information of
Modern benchmarking library for Python with pytest integration.
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
OpenMMLab Semantic Segmentation Toolbox and Benchmark
VisualWebArena benchmark for BrowserGym
WebArena benchmark for BrowserGym
MiniWoB++ benchmark for BrowserGym
A Python wrapper for the Penn Machine Learning Benchmark data repository.
AssistantBench benchmark for BrowserGym
This is an unofficial, use-at-your-own-risk port of the VisualWebArena benchmark, for use as a standalone library package.
This is an unofficial, use-at-your-own-risk port of the WebArena benchmark, for use as a standalone library package.
Benchmark Runner Tool
WorkArena benchmark for BrowserGym
A public and reproducible collection of reference implementations and benchmark suite for distributed machine learning systems.
Merlion: A Machine Learning Framework for Time Series Intelligence
BrowserGym integration for the WebLINX benchmark
Tools to benchmark, deploy and monitor prediction market agents.
Macrobenchmarking framework for OpenSearch
OpenMMLab Image Classification Toolbox and Benchmark
Benchmark your code
OpenMMLab Pose Estimation Toolbox and Benchmark.
OpenMMLab Model Pretraining Toolbox and Benchmark
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis.
Official Implementation of "COLLIE: Systematic Construction of Constrained Text Generation Tasks"
CLIP-like models benchmarks on various datasets
Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks.
A Heterogeneous Benchmark for Information Retrieval
A high-performance C++ implementation of benchmark functions for mathematical optimization algorithms.
QCVV and Benchmarking
Scikit-learn-compatible datasets
Library and Client for managing, benchmarking, and interacting with JupyterHub
GBD Tools: Maintenance and Distribution of Benchmark Instances and their Attributes
Fuzzy Data Benchmark
robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
A library to benchmark code snippets.