Discover and retrieve water data from U.S. federal hydrologic web services.
A library providing a simple interface for creating new metrics and an easy-to-use toolkit for metric computation and checkpointing.
Model evaluation for Machine Learning pipelines.
PMML evaluator library for Python
A lightweight and configurable evaluation package
A framework for evaluating large multimodal language models
Send Sir Perceval on a quest to fetch and gather data from software repositories.
eval-mm is a tool for evaluating Multi-Modal Large Language Models.
Open-source evaluators for LLM agents
Windows-compatible fork of OpenAI's human-eval
Easily compute CLIP embeddings and build a CLIP retrieval system with them
A comprehensive evaluation toolkit for assessing Retrieval-Augmented Generation (RAG) outputs using linguistic, semantic, and fairness metrics
DataEval provides a simple interface to characterize image data and its impact on model performance across classification and object-detection tasks
Prompt flow evals
A pytest plugin for running and analyzing LLM evaluation tests
Python code evaluation system and submissions server capable of unit tests, tracing, and AST inspection. Server can run on Python 2.7 but evaluation requires 3.7+.
Python package for evaluating neuron segmentations in terms of the number of splits and merges
Bundle of Perceval backends for Mozilla ecosystem.
Bundle of Perceval backends for OPNFV ecosystem.
Bundle of Perceval backends for Puppet, Inc. ecosystem.
A flexible, generalized tree-based data structure.
Bundle of Perceval backends for Weblate.
OpenCompass VLM Evaluation Kit for Eval-Scope
A short description of the package
HydroEval: An Evaluator for Streamflow Time Series in Python
Python bindings for evalexpr Rust crate for safe expression evaluation
Judgeval Package
User-friendly evaluation framework: Eval Suite & Benchmarks (UHGEval, HaluEval, HalluQA, etc.)
Open-Source Evaluation for GenAI Applications.
scikit-learn model evaluation made easy: plots, tables, and markdown reports.
Evaluation module for Ragbits components
Query metadata from sdists / bdists / installed packages. Safer fork of pkginfo that avoids arbitrary imports and eval()
clusteval is a Python package for unsupervised cluster validation.
Core package for LLM evaluation platform, providing base classes and utilities.
Llama Stack Remote Eval Provider for TrustyAI LM-Eval
The University of Saskatchewan Retrieval Framework
Evaluation of cancer cell line drug response models in a fair setting
Evaluation and adaptation method for the UNICORN Challenge
SimulEval: A Flexible Toolkit for Automated Machine Translation Evaluation
A Framework for Automatic Evaluation of Flood Inundation Mapping Predictions
Perceval backend for public-inbox.
Interface to ndeval.c
Extension for accessing the LongEval test collections via ir_datasets.
Well-tested evaluation framework for text summarization
Wrapper around ast.literal_eval with new {foo='bar', key=None} dict syntax.
Python evaluator for jQuery-QueryBuilder rules
A package to evaluate how close a synthetic data set is to real data.