Benchmark QCD physics
Spatio-Temporal Causal Benchmarking Platform
RelBench: Relational Deep Learning Benchmark
I/O profiler for deep learning Python apps, specifically for dlio_benchmark.
Benchmark toolkit for optimization
Generating Event Data with Intentional Features for Benchmarking Process Mining
Official Implementation of "COLLIE: Systematic Construction of Constrained Text Generation Tasks"
Test ZenGuard AI against different datasets and benchmarks.
A fork of the InterCode benchmark used to evaluate natural language to Bash command translation.
Preprocessing scripts for the DRAGON benchmark
Richbench, a little benchmarking tool
This Python tool helps manage DBMS benchmarking experiments in a Kubernetes-based HPC cluster environment. It lets users configure hardware/software setups so that tests can easily be repeated over varying configurations.
🦜💪 Flex those feathers!
Python package of ADBench
MassSpecGym: A benchmark for the discovery and identification of molecules
An LLM inferencing benchmark tool focusing on device-specific latency and memory usage
LLM Inference Benchmarking Tool
Generate heatmap-like visualisations for benchmark data frames.
OGBench: Benchmarking Offline Goal-Conditioned RL
An awesome tool/library to benchmark LLM performance on all kinds of hardware!
EBES: Easy Benchmarking for Event Sequences.
The seismological machine learning benchmark collection
A simple benchmarking tool for RL algorithms on Atari games
Create, Run and Benchmark DVC Pipelines in Python
Dataset and code for 'BLADE: Benchmarking Language Model Agents for Data-Driven Science' (https://arxiv.org/abs/2408.09667)
Multi Comparison Matrix: A long-term approach to benchmark evaluations
Benchmarking framework for all types of black-box optimization algorithms (postprocessing).
A library for benchmarking poses of 3D SBDD models
HPC Plotter and profiler for benchmarking data made for JAX
Benchmark SageMaker serverless endpoints for cost and performance
A Django management command to measure and benchmark database query performance.
Python package for calculating simple benchmarks from hydroclimatic timeseries
A small yet powerful LM Judge
WaterBenchmarkHub
PyAnaDroid: A replicable, fully customizable execution pipeline for analyzing and benchmarking Android applications
Benchmarking suite for evaluating autonomous agents in real-world domains.
A benchmarking library for quantum and classical machine learning, with specialized support for evaluating kernel methods.
A Python implementation of the Benchmark Simulation Model 2 (BSM2) plant layout according to the IWA standard.
Benchmarking framework for noisy optimization and experiment planning
A benchmark for deep learning-based low dose CT image denoising
New machine learning benchmarks from tabular datasets.
Continuous Benchmarking (CB) Framework