Generate eval datasets from arbitrary sources
Perceval backend for Pontoon
Your go-to, no-fuss tool for zipping through RAG and LLM evaluations! Built for the busy person too swamped to wrestle with the behemoths of professional setups
LangEvals lingua evaluator for language detection.
A powerful quantum photonic framework
Perceval backend for Topicbox
A module for evaluating predictions of models trained on MEDS datasets.
A generative AI-powered framework for testing virtual agents.
LangEvals Ragas evaluator
A package for evaluating the performance of language models with Prometheus
Bucketed Scene Flow Evaluation
An evaluation abstraction for Keras models.
The evaluation for Notte
A simple, safe single expression evaluator library.
CLI tool to evaluate ChatGPT factuality on MMLU benchmark.
Toolkit for summarization evaluation
A package for benchmarking time series machine learning tools.
ranx: A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion
To the moon!
Eval-it evaluates models for training
PICAI Evaluation
Open-source library of metrics
Evolve your AI application with evals!
Python Evaluation, Level II
A tiny package containing a COCO evaluator that works in distributed environments.
A tool for the evaluation of molecular SMILES strings
Evaluate multiple models using only one command line
An Extendable Evaluation Pipeline for Named Entity Drill-Down Analysis
eval async code from sync
Evaluates module-level Ellipsis (`...`) placeholders into useful code.
Eval LLM
An information retrieval evaluation script based on the C/W/L framework that is TREC Compatible and provides a replacement for INST_EVAL, RBP_EVAL, TBG_EVAL, UMeasure and TREC_EVAL scripts. All measurements are reported in the same units making all metrics directly comparable.
A package for evaluating coding curators
rank_eval: A Blazing Fast Python Library for Ranking Evaluation and Comparison
A package for evaluating speech super-resolution algorithms.
Evaluation method for the DRAGON benchmark
A package for model explainability and explainability comparison for tabular data
A Python package for RAG Evaluation
Evaluation and benchmark for Generative AI
xaif package
TrainLoop Evaluations SDK for data collection and evaluation
A package for COIR evaluations
object schema evaluation
Evaluation framework for DataBench
An evaluation package for LLM inputs and outputs
eval expression
Reco evaluation tool