User-friendly evaluation framework with an eval suite and benchmarks: UHGEval, HaluEval, HalluQA, etc.
PMML evaluator library for Python
Python package for evaluating neuron segmentations in terms of the number of splits and merges
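For context, split/merge counting of this kind can be sketched with a label-overlap table in plain NumPy; this is a minimal illustration of the idea, not this package's API, and the overlap-table approach is an assumption:

    import numpy as np

    def count_splits_merges(gt, pred):
        """gt, pred: integer label arrays of the same shape; 0 = background."""
        mask = (gt > 0) & (pred > 0)
        pairs = np.stack([gt[mask], pred[mask]], axis=1)
        pairs = np.unique(pairs, axis=0)       # one row per overlapping (gt, pred) pair
        # A ground-truth segment overlapped by >1 prediction has been split;
        # a predicted segment overlapping >1 ground-truth segment merges them.
        splits = sum(n - 1 for n in np.unique(pairs[:, 0], return_counts=True)[1])
        merges = sum(n - 1 for n in np.unique(pairs[:, 1], return_counts=True)[1])
        return splits, merges

    gt   = np.array([[1, 1, 2, 2]])
    pred = np.array([[1, 3, 3, 3]])            # segment 1 is split; 1 and 2 are merged
    print(count_splits_merges(gt, pred))       # -> (1, 1)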
Easily computing clip embeddings and building a clip retrieval system with them
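Computing CLIP embeddings generally looks like the sketch below, here using the separate open_clip package rather than this tool's own interface; the model name, pretrained tag, and file path are illustrative:

    import torch
    import open_clip
    from PIL import Image

    model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
    tokenizer = open_clip.get_tokenizer("ViT-B-32")

    with torch.no_grad():
        image = preprocess(Image.open("photo.jpg")).unsqueeze(0)   # illustrative path
        image_emb = model.encode_image(image)
        text_emb = model.encode_text(tokenizer(["a photo of a dog"]))
        # Normalize so the dot product is a cosine similarity, the usual
        # ranking score in a CLIP retrieval index.
        image_emb /= image_emb.norm(dim=-1, keepdim=True)
        text_emb /= text_emb.norm(dim=-1, keepdim=True)
        print((image_emb @ text_emb.T).item())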
Bundle of Perceval backends for the Mozilla ecosystem.
Discover and retrieve water data from U.S. federal hydrologic web services.
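A hedged sketch of what such a retrieval can look like when hitting the USGS NWIS daily-values service directly over HTTP; the site and parameter codes are illustrative, and this bypasses the package's own helpers:

    import requests

    resp = requests.get(
        "https://waterservices.usgs.gov/nwis/dv/",
        params={"format": "json", "sites": "01646500",   # illustrative gauge site
                "parameterCd": "00060",                  # discharge, cubic feet/sec
                "period": "P7D"},
        timeout=30,
    )
    series = resp.json()["value"]["timeSeries"]
    print(len(series), "time series returned")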
Generate eval datasets from arbitrary sources
In-loop evaluation tasks for language modeling
Bundle of Perceval backends for the OPNFV ecosystem.
Bundle of Perceval backends for Weblate.
A lightweight and configurable evaluation package
Python code evaluation system and submissions server capable of unit tests, tracing, and AST inspection. The server can run on Python 2.7, but evaluation requires Python 3.7+.
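AST inspection of a submission can be sketched with the standard library alone; the banned-name check below is a hypothetical policy, not this system's actual rule set:

    import ast

    source = "result = eval(input())"
    tree = ast.parse(source)
    # Collect every bare-name call to eval() or exec() in the submission.
    banned = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in {"eval", "exec"}
    }
    print(banned)  # {'eval'}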
Perceval backend for Topicbox
Perceval backend for public-inbox.
Bundle of Perceval backends for the Puppet, Inc. ecosystem.
A flexible, generalized tree-based data structure.
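A generalized tree usually reduces to a node holding a value and an arbitrary list of children; a minimal sketch where the class and method names are illustrative, not this package's API:

    class Node:
        def __init__(self, value, children=None):
            self.value = value
            self.children = list(children or [])

        def walk(self):
            """Yield values in depth-first preorder."""
            yield self.value
            for child in self.children:
                yield from child.walk()

    tree = Node("root", [Node("a", [Node("a1")]), Node("b")])
    print(list(tree.walk()))  # ['root', 'a', 'a1', 'b']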
Open-Source Evaluation for GenAI Applications.
A pytest plugin for running and analyzing LLM evaluation tests
Windows-compatible fork of OpenAI's human-eval
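Usage presumably follows upstream human-eval; a sketch under that assumption, with a trivial stand-in completion string:

    from human_eval.data import read_problems, write_jsonl

    problems = read_problems()
    samples = [
        {"task_id": task_id, "completion": "    return 1\n"}  # stand-in completion
        for task_id in list(problems)[:5]
    ]
    write_jsonl("samples.jsonl", samples)
    # Then score with the provided CLI: evaluate_functional_correctness samples.jsonl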
A module for evaluating language models during training
Perceval backend for Pontoon
Well-tested evaluation framework for text summarization
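Summarization evaluation typically centers on metrics like ROUGE; a sketch using the separate rouge_score package, not this framework's own API:

    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    scores = scorer.score(
        "the cat sat on the mat",        # reference
        "a cat was sitting on the mat",  # candidate summary
    )
    print(scores["rougeL"].fmeasure)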
A framework for evaluating large multi-modality language models
scikit-learn model evaluation made easy: plots, tables and markdown reports.
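The underlying numbers come from scikit-learn itself; a minimal sketch of the kind of table such a package wraps into plots and reports:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        *load_iris(return_X_y=True), random_state=0
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # Per-class precision/recall/F1 as a plain-text table.
    print(classification_report(y_test, model.predict(X_test)))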
DataEval provides a simple interface to characterize image data and its impact on model performance across classification and object-detection tasks
clusteval is a Python package for unsupervised cluster validation.
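Unsupervised cluster validation typically scores candidate cluster counts with an internal index; a sketch using scikit-learn's silhouette score rather than clusteval's own API:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score

    X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
    # Higher silhouette = better-separated clusters; k=4 should win here.
    for k in range(2, 7):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        print(k, round(silhouette_score(X, labels), 3))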
A Python module to aid automatic evaluation
Your go-to, no-fuss tool for zapping through RAG and LLM evaluations, built for the busy person too swamped to wrestle with heavyweight professional setups
SimulEval: A Flexible Toolkit for Automatic Evaluation of Simultaneous Translation
Model evaluation for Machine Learning pipelines.
Bucketed Scene Flow Evaluation
Evaluation framework for DataBench
Wrapper around ast.literal_eval with new {foo='bar', key=None} dict syntax.
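One way such a wrapper can work is to rewrite the keyword-style braces into a dict(...) call before parsing; a hypothetical sketch where literal_eval_kw and its rewriting rule are assumptions, not this package's implementation:

    import ast

    def literal_eval_kw(text):
        """Accept {foo='bar', key=None} by rewriting it to a dict(...) call
        and evaluating each keyword value as a plain literal."""
        text = text.strip()
        if text.startswith("{") and "=" in text:
            call = ast.parse("dict(" + text[1:-1] + ")", mode="eval").body
            return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        return ast.literal_eval(text)  # ordinary literals pass through unchanged

    print(literal_eval_kw("{foo='bar', key=None}"))  # {'foo': 'bar', 'key': None}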
A package to evaluate how close a synthetic data set is to real data.
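A common building block for such comparisons is a per-column distance between marginal distributions; a sketch using SciPy's two-sample Kolmogorov-Smirnov test, one possible metric rather than necessarily this package's:

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    real = rng.normal(0, 1, 1000)
    synthetic = rng.normal(0.1, 1.1, 1000)   # slightly off-distribution
    stat, p = ks_2samp(real, synthetic)
    print(f"KS statistic {stat:.3f} (0 = identical marginals)")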
The University of Saskatchewan Retrieval Framework
Query metadata from sdists / bdists / installed packages. A safer fork of pkginfo that avoids arbitrary imports and eval()
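For installed packages, the standard library can already do this safely; a sketch with importlib.metadata, which reads the distribution's metadata files without importing the package or calling eval():

    from importlib import metadata

    dist = metadata.distribution("pip")  # any installed distribution name works here
    print(dist.metadata["Name"], dist.version)
    print(dist.metadata["Summary"])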
OpenCompass VLM Evaluation Kit for Eval-Scope
Evaluation module for Ragbits components
classeval: a Python package for evaluating classification model performance
Core ARL data model library
A framework for the evaluation and development of temporally aware models.
Document search and indexing based on summaries and embeddings, using pgvector
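A nearest-neighbor lookup over stored embeddings with pgvector can be sketched as below; the table layout, connection string, and 384-dimension embedding are assumptions:

    import numpy as np
    import psycopg
    from pgvector.psycopg import register_vector

    # Hypothetical table: documents(id, summary, embedding vector(384)).
    query_emb = np.random.rand(384).astype(np.float32)  # stand-in for a real embedding
    with psycopg.connect("dbname=docs") as conn:
        register_vector(conn)  # lets psycopg send numpy arrays as pgvector values
        rows = conn.execute(
            "SELECT id, summary FROM documents ORDER BY embedding <-> %s LIMIT 5",
            (query_emb,),
        ).fetchall()           # five closest documents by L2 distance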
A simple, safe single expression evaluator library.
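The usual approach is to parse the expression with ast and interpret only a whitelist of node types; a minimal arithmetic-only sketch, noting this library's supported grammar may differ:

    import ast
    import operator

    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def safe_eval(expr):
        """Evaluate a single arithmetic expression without eval()."""
        def walk(node):
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](walk(node.left), walk(node.right))
            raise ValueError(f"disallowed syntax: {ast.dump(node)}")
        return walk(ast.parse(expr, mode="eval").body)

    print(safe_eval("2 * (3 + 4.5)"))  # 15.0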
HydroEval: An Evaluator for Streamflow Time Series in Python
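A representative streamflow metric is the Nash-Sutcliffe efficiency; a plain NumPy sketch of the formula, without showing hydroeval's own call signature:

    import numpy as np

    def nse(simulated, observed):
        """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean flow."""
        observed = np.asarray(observed, dtype=float)
        simulated = np.asarray(simulated, dtype=float)
        return 1 - np.sum((observed - simulated) ** 2) / np.sum(
            (observed - observed.mean()) ** 2
        )

    print(nse([4.9, 5.2, 6.1], [5.0, 5.0, 6.0]))  # 0.91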
PICAI Evaluation