# EvalLite 🚀
An efficient, zero-cost LLM evaluation framework combining the simplicity of DeepEval with the power of free Hugging Face models through AILite.
## 🌟 Key Features

- Zero-Cost Evaluation: Leverage free Hugging Face models for LLM evaluation
- Simple Integration: Drop-in replacement for DeepEval's evaluation capabilities
- Extensive Model Support: Access to leading open-source models, including:
  - Meta Llama 3.1 70B Instruct
  - Qwen 2.5 72B Instruct
  - Mistral Nemo Instruct
  - Phi-3.5 Mini Instruct
  - And more!
- Comprehensive Metrics: Full compatibility with DeepEval's evaluation metrics
- Async Support: Built-in asynchronous evaluation capabilities
## 📥 Installation

```bash
pip install evallite
```
## 🚀 Quick Start
Here's a simple example to get you started with EvalLite:
```python
from evallite import (
    assert_test,
    EvalLiteModel,
    LLMTestCase,
    evaluate,
    AnswerRelevancyMetric
)

# Use a free Hugging Face model as the evaluation (judge) model
answer_relevancy_metric = AnswerRelevancyMetric(
    threshold=0.7,
    model=EvalLiteModel(model="microsoft/Phi-3.5-mini-instruct")
)

# A test case pairs the input and actual output with the retrieved context
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund at no extra costs.",
    retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."]
)

evaluate([test_case], [answer_relevancy_metric])
```
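
Because `assert_test` is also exported, the same check can be written as a pytest-style test. A minimal sketch (file and test names are illustrative):

```python
# test_refund_policy.py — run with `pytest` (file/test names are illustrative)
from evallite import assert_test, EvalLiteModel, LLMTestCase, AnswerRelevancyMetric

def test_answer_relevancy():
    metric = AnswerRelevancyMetric(
        threshold=0.7,
        model=EvalLiteModel(model="microsoft/Phi-3.5-mini-instruct")
    )
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        actual_output="We offer a 30-day full refund at no extra costs.",
        retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."]
    )
    # Fails the test if the metric score falls below the threshold
    assert_test(test_case, [metric])
```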
## 🔧 Available Models
EvalLite supports several powerful open-source models:
```python
from evallite import EvalLiteModel

# Models available through AILite's free Hugging Face access
models = [
    'meta-llama/Meta-Llama-3.1-70B-Instruct',
    'CohereForAI/c4ai-command-r-plus-08-2024',
    'Qwen/Qwen2.5-72B-Instruct',
    'nvidia/Llama-3.1-Nemotron-70B-Instruct-HF',
    'meta-llama/Llama-3.2-11B-Vision-Instruct',
    'NousResearch/Hermes-3-Llama-3.1-8B',
    'mistralai/Mistral-Nemo-Instruct-2407',
    'microsoft/Phi-3.5-mini-instruct'
]

# Pick any model ID from the list above
evaluator = EvalLiteModel(model='microsoft/Phi-3.5-mini-instruct')
```
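
Since `EvalLiteModel` is a drop-in DeepEval model, the standard metric interface should work unchanged, which makes it straightforward to compare judge models. A rough sketch, assuming DeepEval's `measure()` and `.score` pass through as-is and reusing `test_case` from the Quick Start:

```python
# Sketch: score the same test case with different judge models (assumes the
# standard DeepEval metric methods `measure()` and `.score` work unchanged)
for model_id in ["microsoft/Phi-3.5-mini-instruct", "Qwen/Qwen2.5-72B-Instruct"]:
    metric = AnswerRelevancyMetric(threshold=0.7, model=EvalLiteModel(model=model_id))
    metric.measure(test_case)  # `test_case` from the Quick Start example
    print(f"{model_id}: relevancy score = {metric.score}")
```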
## 📊 Advanced Usage

### Custom Schema Support
EvalLite supports custom response schemas using Pydantic models:
```python
from pydantic import BaseModel
from typing import List

# Pydantic model describing the expected shape of the response
class Statements(BaseModel):
    statements: List[str]

# `evaluator` is the EvalLiteModel instance created above
result = evaluator.generate(
    prompt="List three facts about climate change",
    schema=Statements
)
```
### Async Evaluation
```python
import asyncio

async def evaluate_async():
    # a_generate is the asynchronous counterpart of generate
    response = await evaluator.a_generate(
        prompt="What is the capital of France?",
        schema=Statements
    )
    return response

response = asyncio.run(evaluate_async())
```
### Batch Evaluation
```python
from evallite import EvaluationDataset

test_cases = [
    LLMTestCase(
        input="Question 1",
        actual_output="Answer 1",
        retrieval_context=["Context 1"]
    ),
    LLMTestCase(
        input="Question 2",
        actual_output="Answer 2",
        retrieval_context=["Context 2"]
    )
]

# Group test cases into a dataset and evaluate them in one call
dataset = EvaluationDataset(test_cases=test_cases)
evaluate(dataset, [answer_relevancy_metric])
```
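
You can also score a dataset against several metrics at once; `evaluate` applies every metric in the list to each test case. A sketch, assuming `FaithfulnessMetric` is imported from DeepEval directly (it is not shown among EvalLite's re-exports above):

```python
# Sketch: multiple metrics over one dataset. FaithfulnessMetric is imported
# from DeepEval itself here, since this README only shows AnswerRelevancyMetric
# being re-exported by evallite.
from deepeval.metrics import FaithfulnessMetric

faithfulness_metric = FaithfulnessMetric(
    threshold=0.7,
    model=EvalLiteModel(model="microsoft/Phi-3.5-mini-instruct")
)

# Each test case in `dataset` is scored by every metric in the list
evaluate(dataset, [answer_relevancy_metric, faithfulness_metric])
```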
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## 📄 License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
## 🙏 Acknowledgments
- DeepEval for the evaluation framework
- AILite for providing free model access
- The open-source community for making powerful language models accessible