Bundle of Perceval backends for the Puppet, Inc. ecosystem.
A platform to evaluate LLM outputs using various evaluators.
Eval-it: evaluates models for training.
Evaluates global-level Ellipsis (`...`) into useful code.
LangEvals boilerplate example evaluator for LLMs.
Perceval backend for public-inbox.
LangEvals Azure Content Safety evaluator for LLM outputs.
Perceval backend for Pontoon
LangEvals OpenAI moderation evaluator for LLM outputs.
LangEvals integration for AWS API evaluators
Query metadata from sdists / bdists / installed packages. Safer fork of pkginfo to avoid doing arbitrary imports and eval()
Finetune_Eval_Harness
PICAI Evaluation
A framework for evaluating large multimodal language models
eval expression
A Redis semaphore implementation using eval scripts
ranx: A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion (see the usage sketch after this list)
xaif package
Visualize OpenAI evals with Zeno
A simple tool for automating assessment processes.
Python SDK for AI agent evals and observability
A package for evaluating the performance of language models with Prometheus
A RAG evaluation framework
rank_eval: A Blazing Fast Python Library for Ranking Evaluation and Comparison
A user-friendly feature evaluation and selection package.
(Threshold-Independent) Evaluation of Sound Event Detection Scores
ADC Evaluation Library
LLM Application Debug/Eval UI on top of AIConfig
Evaluate arbitrary JavaScript from Python using a NodeJS sidecar
Python SDK to configure and run evaluations for your LLM-based application
Interpretable Evaluation for Natural Language Processing
A package for evaluating audio generation models.
A tool to quantify the replicability and reproducibility of system-oriented IR experiments.
A module to calculate the Polyphonic Sound Detection Score (PSDS)
A package with utility functions for evaluating conformal predictors
Provides evaluation metrics for machine learning challenges
Dynamic import from files and other sources
Resolve specially formatted statements to Python objects.
Agent harness for the AI Maintainer benchmarking and Marketplace API and platform
Automatic lyrics transcription evaluation toolkit
A plug & play evaluator for self-supervised image classification.
An Extendable Evaluation Pipeline for Named Entity Drill-Down Analysis
Library for evaluating SafeGraph data
Evaluation method for the DRAGON benchmark
Common metrics and evaluation tools for coreference chains (JSON Lines format)
Quickly evaluate multi-label classifiers with various metrics
The Evaluation SDK for LLM apps
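For context, here is a minimal usage sketch for ranx, the ranking-evaluation library listed above. The query/document IDs, scores, and metric names are illustrative assumptions, not taken from any package's documentation.

# Minimal ranking-evaluation sketch with ranx (illustrative IDs and scores).
from ranx import Qrels, Run, evaluate

# Relevance judgments: query id -> {doc id: graded relevance (int)}
qrels = Qrels({
    "q_1": {"doc_a": 1, "doc_b": 2},
    "q_2": {"doc_c": 1},
})

# System output to evaluate: query id -> {doc id: retrieval score (float)}
run = Run({
    "q_1": {"doc_a": 0.9, "doc_b": 0.7, "doc_x": 0.3},
    "q_2": {"doc_c": 0.8, "doc_y": 0.6},
})

# Averages each metric over queries and returns {metric: score}
print(evaluate(qrels, run, ["ndcg@10", "map@100", "mrr"]))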