Minimal Python library to connect to LLMs (OpenAI, Anthropic, Google, Mistral, OpenRouter, Reka, Groq, Together, Ollama, AI21, Cohere, Aleph-Alpha, HuggingfaceHub), with a built-in model performance benchmark.

llm

llms

large language model

AI

NLP

natural language processing

spectrumlab

A pioneering unified platform designed to systematize and accelerate deep learning research in spectroscopy.

pygmtools

pygmtools provides graph matching solvers through a Python API and supports NumPy and PyTorch backends. It also provides a dataset API for standard graph matching benchmarks.

robotframework-timer

Benchmark tools for Robot Framework

counted-float

Count floating-point operations in Python code & benchmark relative flop costs.

oscar-benchmarking

A package for submitting benchmarking scripts on OSCAR.

feelpp-benchmarking

The Feel++ Benchmarking Project

reprobench

Reproducible Benchmark for Everyone

fmbench

Benchmark the performance of **any Foundation Model (FM)** deployed on **any AWS Generative AI service**, be it **Amazon SageMaker**, **Amazon Bedrock**, **Amazon EKS**, or **Amazon EC2**. The FMs can be deployed on these platforms directly through `FMBench`, or, if they are already deployed, benchmarked through the **Bring your own endpoint** mode that `FMBench` supports.

benchmarking

sagemaker

bedrock

bring your own endpoint

generative-ai

foundation-models

zindex-py

Indexer for GZIP specially built for DLIO Profiler.

llm-benchmark

LLM Benchmark

conbench

Continuous Benchmarking (CB) Framework

qtip

Platform Performance Benchmarking

malayalam-asr-benchmarking

A study to benchmark Whisper-based ASR models in Malayalam

nbdev jupyter notebook python

sm-serverless-benchmarking

Benchmark SageMaker serverless endpoints for cost and performance

sagemaker

inference

hosting

ogbench

OGBench: Benchmarking Offline Goal-Conditioned RL

bark-simulator

A tool for Behavior benchmARKing

simulator autonomous driving machine learning

agbenchmark

Benchmarking the performance of agents far and wide, regardless of how they are set up and how they work

smallbench

Small Benchmarks for LM Agents

perun

Measure the energy used by your MPI+Python applications.

pyaf

Python Automatic Forecasting

arx automatic-forecasting autoregressive benchmark cycle decomposition exogenous forecasting heroku hierarchical-forecasting horizon jupyter pandas python scikit-learn seasonal time-series transformation trend web-service

mmrotate

Rotation Detection Toolbox and Benchmark

computer vision

object detection

rotation detection

benchloop

A Python library for managing, processing, and benchmarking datasets in SQLite databases for AI pipelines and LLM prompt engineering.

genai-bench

A powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.

assay-inspector

AssayInspector: A Python package for diagnostic assessment of data consistency in molecular datasets.

accurate-benchmark

A Python package for accurate benchmarking and speed comparisons

nodespecs

Utilities that summarize the specs of a computer instance

cpu

gpu

benchmark

appworld

AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

vectordb-bench

VectorDBBench is not just an offering of benchmark results for mainstream vector databases and cloud services; it's your go-to tool for the ultimate performance and cost-effectiveness comparison. Designed with ease of use in mind, VectorDBBench helps users, even non-professionals, reproduce results or test new systems, making the hunt for the optimal choice among a plethora of cloud services and open-source vector databases a breeze.