
GitHub Action for evaluating MCP server tool calls using LLM-based scoring
A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.
Install the package:
npm install mcp-evals
Add the following to your workflow file:
name: Run MCP Evaluations

on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm install

      - name: Run MCP Evaluations
        uses: mclenhard/mcp-evals@v1.0.9
        with:
          evals_path: 'src/evals/evals.ts'
          server_path: 'src/index.ts'
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
          model: 'gpt-4' # Optional, defaults to gpt-4
Create a file (e.g., evals.ts) that exports your evaluation configuration:
import { openai } from "@ai-sdk/openai";
import { grade, EvalConfig, EvalFunction } from "mcp-evals";

const weatherEval: EvalFunction = {
  name: 'Weather Tool Evaluation',
  description: 'Evaluates the accuracy and completeness of weather information retrieval',
  run: async () => {
    const result = await grade(openai("gpt-4"), "What is the weather in New York?");
    return JSON.parse(result);
  }
};

const config: EvalConfig = {
  model: openai("gpt-4"),
  evals: [weatherEval]
};

export default config;

export const evals = [
  weatherEval,
  // add other evals here
];
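If your server exposes more than one tool, you can repeat the same pattern for each one. The sketch below is purely illustrative: it assumes a hypothetical forecast tool on your server and reuses the grade helper exactly as above.

// Hypothetical second eval; "forecastEval" and its prompt are illustrative, not from the package docs.
const forecastEval: EvalFunction = {
  name: 'Forecast Tool Evaluation',
  description: 'Evaluates whether multi-day forecast questions are answered accurately and completely',
  run: async () => {
    const result = await grade(openai("gpt-4"), "What will the weather be in New York over the next three days?");
    return JSON.parse(result);
  }
};

// Register it alongside the existing eval:
// const config: EvalConfig = { model: openai("gpt-4"), evals: [weatherEval, forecastEval] };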
You can run the evaluations using the CLI:
npx mcp-eval path/to/your/evals.ts path/to/your/server.ts
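To make local runs repeatable, you could wire the same command into your package.json scripts; the file paths below are illustrative and should point at your own evals file and server entry point.

{
  "scripts": {
    "evals": "mcp-eval src/evals/evals.ts src/index.ts"
  }
}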
When triggered by a pull request, the action runs these evaluations automatically and reports the results back on the PR (which is why the workflow above requests the pull-requests: write permission).
Each evaluation returns an object with the following structure:
interface EvalResult {
  accuracy: number;         // Score from 1-5
  completeness: number;     // Score from 1-5
  relevance: number;        // Score from 1-5
  clarity: number;          // Score from 1-5
  reasoning: number;        // Score from 1-5
  overall_comments: string; // Summary of strengths and weaknesses
}
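If you want to post-process these scores yourself (for example, to fail a local run below a threshold), a small helper like the one below would work. It is a sketch, not part of the mcp-evals API, and relies only on the field names shown in the interface above.

// Hypothetical helper (not part of mcp-evals): averages the five 1-5 scores of a result
// shaped like the EvalResult interface above.
function averageScore(r: {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
  reasoning: number;
}): number {
  return (r.accuracy + r.completeness + r.relevance + r.clarity + r.reasoning) / 5;
}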
Environment variables:
OPENAI_API_KEY: Your OpenAI API key (required)

The EvalConfig interface requires:
model: The language model to use for evaluation (e.g., GPT-4)
evals: Array of evaluation functions to run

Each evaluation function must implement:
name: Name of the evaluation
description: Description of what the evaluation tests
run: Async function that takes a model and returns an EvalResult
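Taken together, these requirements correspond roughly to the declarations below. This is a sketch inferred from the list above, not copied from the package's own type definitions; the model type in particular is left loose, since the example config simply passes the object returned by openai("gpt-4").

// Sketch of the shapes described above; consult mcp-evals' exported types for the authoritative definitions.
interface EvalFunction {
  name: string;                                   // Name of the evaluation
  description: string;                            // What the evaluation tests
  run: (model?: unknown) => Promise<EvalResult>;  // Produces an EvalResult (the example above ignores the argument)
}

interface EvalConfig {
  model: unknown;        // The language model used for grading, e.g. openai("gpt-4")
  evals: EvalFunction[]; // Evaluation functions to run
}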
License: MIT
FAQs
The npm package mcp-evals receives a total of 49,296 weekly downloads, which classifies it as a popular package. It shows a healthy version release cadence and project activity, with the latest version released less than a year ago, and it has 1 open source maintainer.