Macrobenchmarking framework for OpenSearch
Modern benchmarking library for Python with pytest integration.
A tool for Behavior benchmARKing
resp-benchmark is a benchmark tool for testing databases that support the RESP protocol, such as Redis, Valkey, and Tair.
Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks.
A public and reproducible collection of reference implementations and benchmark suite for distributed machine learning systems.
An automated tool that assesses the GitLab CIS benchmarks against a project.
Official Implementation of "COLLIE: Systematic Construction of Constrained Text Generation Tasks"
Library and client for managing, benchmarking, and interacting with JupyterHub
New machine learning benchmarks from tabular datasets.
ManiSkill3: A Unified Benchmark for Generalizable Manipulation Skills
Set of robot URDFs for benchmarking and developed examples.
A package for benchmarking the performance of arbitrary functions
Library for working with compliance benchmarks and data.
evalsync is a library used to synchronize applications under benchmark with an external manager
WILDS distribution shift benchmark
A package for benchmarking the speed of different PyTorch conversion options
Benchmark performance of **any Foundation Model (FM)** deployed on **any AWS Generative AI service**, be it **Amazon SageMaker**, **Amazon Bedrock**, **Amazon EKS**, or **Amazon EC2**. The FMs can be deployed on these platforms directly through `FMBench`, or, if they are already deployed, benchmarked through the **Bring your own endpoint** mode supported by `FMBench`.
PyAnaDroid: A replicable, fully customizable execution pipeline for analyzing and benchmarking Android applications
A framework for evaluating web automation agents and LAM systems.
Tiny Python benchmarking library
Adapters for Running and Tracking Benchmarks
Loop Kernel Analysis and Performance Modeling Toolkit
BrowserGym integration for the WebLINX benchmark
Continuous Benchmarking (CB) Framework
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
OpenMMLab Video Understanding Toolbox and Benchmark
Benchmark SageMaker serverless endpoints for cost and performance
Quick and easy Python benchmarking.
A package for implementation of Quantum Characterization, Verification and Validation (QCVV) techniques on IQM's hardware at gate level abstraction
The Feel++ Benchmarking Project
A lightweight toolkit for evaluating LLMs based on OpenCompass.
Benchmarking the performance of agents far and wide, regardless of how they are set up and how they work
Python package of ADBench
The Redis benchmarks specification describes the cross-language and cross-tool requirements and expectations that foster performance and observability standards around Redis-related technologies. Members of both industry and academia, including organizations and individuals, are encouraged to contribute.
CatBench: Benchmark of Machine Learning Potentials for Adsorption Energy Predictions in Heterogeneous Catalysis
A package to download, load, and process multiple benchmark multi-omic drug response datasets
Superseded by: gbd-tools
Library to systematically track and evaluate LLM-based applications.
Repository of Protein Benchmarking and Modeling
Advanced benchmarking for machine learning models.
peek - debugging and benchmarking made easy
LLM Benchmark
Benchmark QCD physics
Benchmark functions that return total space, memory, and CPU usage, given the input size and parameters, for CWL workflows
Causal AI Benchmarking Framework
Benchmarking framework for machine learning with fNIRS