
athina-evals

Package Overview
Dependencies
8
Maintainers
1

Python SDK to configure and run evaluations for your LLM-based application



Overview

athina-evals is a framework that helps you quickly set up evaluations and monitoring for your LLM-powered application.

It's difficult to know if your LLM response is good or bad. Most developers start out by simply eyeballing the responses. This is fine when you're building a prototype and testing on 5-10 examples.

But once you optimize for reliability in production, this method breaks down.

Evals can help you:

  • Detect regressions
  • Measure the performance of your model (as defined by your goals)
  • A/B test different models and prompts rapidly
  • Monitor production data with confidence
  • Run quantifiable experiments against ambiguous conversations

Think of evals like unit tests for your LLM app.
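
To make the unit-test analogy concrete, here is a minimal sketch in plain Python. It does not use the athina API: the names EvalResult, contains_expected_answer, and run_eval_suite are hypothetical, and the substring check is only a stand-in for a real evaluator.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    query: str
    passed: bool
    reason: str


def contains_expected_answer(query: str, response: str, expected: str) -> EvalResult:
    """Toy evaluator (hypothetical): pass when the expected phrase appears in the response."""
    passed = expected.lower() in response.lower()
    reason = "expected phrase found" if passed else f"missing expected phrase: {expected!r}"
    return EvalResult(query=query, passed=passed, reason=reason)


def run_eval_suite(cases: list[dict]) -> list[EvalResult]:
    """Run the evaluator over every case, much like a test runner over test functions."""
    return [
        contains_expected_answer(c["query"], c["response"], c["expected"])
        for c in cases
    ]


# Made-up example data: each case pairs a query and response with an expected phrase.
cases = [
    {"query": "What is the capital of France?", "response": "The capital is Paris.", "expected": "Paris"},
    {"query": "What is 2 + 2?", "response": "I am not sure.", "expected": "4"},
]

results = run_eval_suite(cases)
failures = [r for r in results if not r.passed]
for r in failures:
    print(f"FAILED: {r.query} ({r.reason})")
```

As with unit tests, rerunning the same cases after a prompt or model change surfaces regressions as new failures.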

Documentation

See https://docs.athina.ai for the complete documentation.

Quick Start

1. Install the package
pip install athina-evals
2. Get an Athina API key

Sign up at athina.ai to get an API key.

(free, and only takes about 30 seconds)

3. Set API keys
import os

from athina.keys import AthinaApiKey, OpenAiApiKey

# Read both keys from environment variables
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))
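
Note that os.getenv returns None when a variable is unset, so the calls above would pass None to set_key. One way to fail early with a clear message is a small guard like the sketch below. The helper require_env is hypothetical and not part of athina.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Possible usage with the key setters from step 3:
# OpenAiApiKey.set_key(require_env("OPENAI_API_KEY"))
# AthinaApiKey.set_key(require_env("ATHINA_API_KEY"))
```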
4. Run evals
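# Import paths are assumed from the package layout; check the docs if they differ
from athina.evals import DoesResponseAnswerQuery
from athina.loaders import RagLoader

# Load the data from CSV, JSON, Athina or Dictionary
# (json_file is the path to your own dataset file)
dataset = RagLoader().load_json(json_file)

# Run the DoesResponseAnswerQuery evaluator on the dataset
DoesResponseAnswerQuery().run_batch(data=dataset)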



Why should I use Athina's Evals instead of writing my own?

You could build your own eval system from scratch, but here's why Athina might be better for you:

  • Athina provides you with plug-and-play preset evals that have been well-tested
  • Athina evals can run on both development and production, giving you consistent metrics for evaluating model performance and drift.
  • Athina removes the need for your team to write boilerplate loaders, implement LLM calls, normalize data formats, etc.
  • Athina offers a modular, extensible framework for writing and running evals
  • Athina calculates analytics like pass rate and flakiness, and lets you batch run evals against live production data or dev datasets
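
For a rough sense of what metrics like pass rate and flakiness measure, here is a sketch in plain Python. Athina's exact definitions are not given here, so this assumes pass rate is the fraction of eval runs that passed, and flakiness is the fraction of inputs whose repeated runs disagree with each other. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def pass_rate(outcomes: list[bool]) -> float:
    """Fraction of eval runs that passed (0.0 for an empty list)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def flakiness(runs: list[tuple[str, bool]]) -> float:
    """Fraction of inputs with inconsistent outcomes across repeated runs (assumed definition)."""
    by_input = defaultdict(set)
    for input_id, passed in runs:
        by_input[input_id].add(passed)
    if not by_input:
        return 0.0
    flaky = sum(1 for outcomes in by_input.values() if len(outcomes) > 1)
    return flaky / len(by_input)


# Made-up data: each input is evaluated twice as (input id, passed).
runs = [
    ("q1", True), ("q1", True),
    ("q2", True), ("q2", False),   # inconsistent, so q2 counts as flaky
    ("q3", False), ("q3", False),
    ("q4", True), ("q4", True),
]

overall_pass_rate = pass_rate([passed for _, passed in runs])
overall_flakiness = flakiness(runs)
print(f"pass rate: {overall_pass_rate}, flakiness: {overall_flakiness}")
```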

Athina Evals Platform

Need Production Monitoring and Evals? We've got you covered...

  • Athina eval runs are written to the Athina Dashboard automatically, so you can view results and analytics in a beautiful UI.
  • Athina tracks your experiments automatically, so you can view a historical record of previous eval runs.
  • Athina calculates analytics segmented at every level possible, so you can view and compare your model performance at very granular levels.

Athina Observe Platform

About Athina

Athina is building an end-to-end LLM monitoring and evaluation platform.

Website | Demo Video

Contact us at hello@athina.ai for any questions about the eval library.

FAQs

