
@caleblawson/evals
A comprehensive evaluation framework for assessing AI model outputs across multiple dimensions.
npm install @mastra/evals
@mastra/evals provides a suite of evaluation metrics for assessing AI model outputs. The package includes both LLM-based and NLP-based metrics, enabling both automated and model-assisted evaluation of AI responses.
Answer Relevancy
Bias Detection
Context Precision & Relevancy
Faithfulness
Prompt Alignment
Toxicity
Completeness
Content Similarity
Keyword Coverage
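To illustrate what an NLP-based metric is (one that scores text deterministically, without calling a model), here is a minimal sketch of a naive keyword-coverage score. The names `extractKeywords`, `keywordCoverage`, and the stop-word list are hypothetical and are not taken from the package; the real metrics may work quite differently.

```typescript
// Hypothetical illustration only: NOT the package's implementation.
// A tiny stop-word list so that filler words do not count as keywords.
const STOP_WORDS = new Set(['a', 'an', 'the', 'is', 'of', 'in', 'to', 'and', 'what', 'with']);

// Lowercase the text, split it into alphanumeric tokens, drop stop words.
function extractKeywords(text: string): Set<string> {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return new Set(words.filter((word) => !STOP_WORDS.has(word)));
}

// Fraction of the reference keywords that also appear in the response (0 to 1).
function keywordCoverage(reference: string, response: string): number {
  const expected = Array.from(extractKeywords(reference));
  if (expected.length === 0) return 1; // nothing to cover
  const actual = extractKeywords(response);
  const matched = expected.filter((keyword) => actual.has(keyword)).length;
  return matched / expected.length;
}

const coverage = keywordCoverage(
  'What is the capital of France?',
  'Paris is the capital of France.',
);
console.log('Keyword coverage:', coverage);
```

Because the score depends only on the two strings, the same inputs always give the same result, which is the main practical difference from a model-judged metric.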
import { ContentSimilarityMetric, ToxicityMetric } from '@mastra/evals';
// Assumption: `openai` comes from the Vercel AI SDK's OpenAI provider package
import { openai } from '@ai-sdk/openai';

// Initialize metrics
const similarityMetric = new ContentSimilarityMetric({
  ignoreCase: true,
  ignoreWhitespace: true,
});
const toxicityMetric = new ToxicityMetric({
  model: openai('gpt-4'),
  scale: 1, // Optional: adjust scoring scale
});

// Evaluate outputs
const input = 'What is the capital of France?';
const output = 'Paris is the capital of France.';
const similarityResult = await similarityMetric.measure(input, output);
const toxicityResult = await toxicityMetric.measure(input, output);
console.log('Similarity Score:', similarityResult.score);
console.log('Toxicity Score:', toxicityResult.score);
import { FaithfulnessMetric } from '@mastra/evals';
// Assumption: `openai` comes from the Vercel AI SDK's OpenAI provider package
import { openai } from '@ai-sdk/openai';

// Initialize with context
const faithfulnessMetric = new FaithfulnessMetric({
  model: openai('gpt-4'),
  context: ['Paris is the capital of France', 'Paris has a population of 2.2 million'],
  scale: 1,
});

// Evaluate response against context
const result = await faithfulnessMetric.measure(
  'Tell me about Paris',
  'Paris is the capital of France with 2.2 million residents',
);
console.log('Faithfulness Score:', result.score);
console.log('Reasoning:', result.reason);
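The examples above read `score` and `reason` from the value returned by `measure`. As a rough sketch, the result fields this README describes (`score`, `info`, `reason`, `verdicts`) could be typed and summarized as follows. The `EvalResult` interface, the `summarize` helper, and the 0.5 threshold are assumptions made for illustration; they are inferred from the README text, not copied from the package's type definitions.

```typescript
// Assumed shape, inferred from the fields named in this README.
interface EvalResult {
  score: number; // normalized score, typically 0-1
  info?: Record<string, unknown>; // detailed information about the evaluation
  reason?: string; // provided by some metrics only
  verdicts?: unknown[]; // provided by some metrics only
}

// Hypothetical helper: format one result as a single log line.
// The threshold is an arbitrary illustrative default.
function summarize(name: string, result: EvalResult, threshold = 0.5): string {
  const status = result.score >= threshold ? 'PASS' : 'FAIL';
  const reason = result.reason ? ` (${result.reason})` : '';
  return `${name}: ${result.score.toFixed(2)} ${status}${reason}`;
}

const sample: EvalResult = {
  score: 0.9,
  reason: 'All claims are supported by the context',
};
console.log(summarize('Faithfulness', sample));
```

For a metric where a lower score is the better outcome, the comparison in `summarize` would need to be flipped.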
Each metric returns a standardized result object containing:
score: Normalized score (typically 0-1)
info: Detailed information about the evaluation
Some metrics also provide:
reason: Detailed explanation of the score
verdicts: Individual judgments that contributed to the final score
The package includes built-in telemetry and logging capabilities:
import { attachListeners } from '@mastra/evals';
// Enable basic evaluation tracking
await attachListeners();
// Store evals in Mastra Storage (if storage is enabled).
// `mastra` is your already-configured Mastra instance.
await attachListeners(mastra);
// Note: When using in-memory storage, evaluations are isolated to the test process.
// When using file storage, evaluations are persisted and can be queried later.
Required for LLM-based metrics:
OPENAI_API_KEY: For OpenAI model access
// Main package exports
import { evaluate } from '@mastra/evals';
// NLP-specific metrics
import { ContentSimilarityMetric } from '@mastra/evals/nlp';
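Since the LLM-based metrics need `OPENAI_API_KEY`, it can help to fail fast with a clear message before any metric is constructed. The sketch below is one way to do that; `requireEnv` and the `Env` type are hypothetical helpers, not part of the package, and the environment is passed in as a plain object so the function is easy to test.

```typescript
// Hypothetical helper: NOT part of @mastra/evals.
type Env = Record<string, string | undefined>;

// Return the variable's value, or throw a descriptive error if it is unset or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In a Node.js script this would typically be called as
// requireEnv(process.env, 'OPENAI_API_KEY'). A placeholder object is used here.
const apiKey = requireEnv({ OPENAI_API_KEY: 'sk-example' }, 'OPENAI_API_KEY');
console.log('API key present:', apiKey.length > 0);
```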
@mastra/core: Core framework functionality
@mastra/engine: LLM execution engine
@mastra/mcp: Model Context Protocol integration
FAQs
The npm package @caleblawson/evals receives a total of 0 weekly downloads. As such, the popularity of @caleblawson/evals was classified as not popular.
We found that @caleblawson/evals demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.