MCP Evals

A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.

Installation

As a Node.js Package

npm install mcp-evals

As a GitHub Action

Add the following to your workflow file:

name: Run MCP Evaluations
on:
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          
      - name: Install dependencies
        run: npm install
        
      - name: Run MCP Evaluations
        uses: mclenhard/mcp-evals@v1.0.9
        with:
          evals_path: 'src/evals/evals.ts'
          server_path: 'src/index.ts'
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
          model: 'gpt-4'  # Optional, defaults to gpt-4

Usage

1. Create Your Evaluation File

Create a file (e.g., evals.ts) that exports your evaluation configuration:

import { openai } from "@ai-sdk/openai";
import { grade, EvalConfig, EvalFunction } from "mcp-evals";

// Each eval calls `grade` with a model and a natural-language prompt that
// exercises one of your MCP server's tools, then parses the returned scores.
const weatherEval: EvalFunction = {
  name: "Weather Tool Evaluation",
  description: "Evaluates the accuracy and completeness of weather information retrieval",
  run: async () => {
    const result = await grade(openai("gpt-4"), "What is the weather in New York?");
    return JSON.parse(result);
  }
};

const config: EvalConfig = {
  model: openai("gpt-4"),
  evals: [weatherEval]
};

export default config;

export const evals = [
  weatherEval,
  // add other evals here
];

2. Run the Evaluations

As a Node.js Package

You can run the evaluations using the CLI:

npx mcp-eval path/to/your/evals.ts path/to/your/server.ts

As a GitHub Action

The action will automatically:

  • Run your evaluations
  • Post the results as a comment on the PR
  • Update the comment if the PR is updated

Evaluation Results

Each evaluation returns an object with the following structure:

interface EvalResult {
  accuracy: number;        // Score from 1-5
  completeness: number;    // Score from 1-5
  relevance: number;       // Score from 1-5
  clarity: number;         // Score from 1-5
  reasoning: number;       // Score from 1-5
  overall_comments: string; // Summary of strengths and weaknesses
}
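
If you consume these results programmatically, you can gate on the scores. The helper below is an illustrative sketch, not part of the package's API: it assumes only the EvalResult shape documented above, and the threshold value and function name are hypothetical.

// Illustrative only: fail fast if any graded dimension falls below a minimum score.
// Assumption: EvalResult is exported by mcp-evals; if not, copy the interface above locally.
import type { EvalResult } from "mcp-evals";

function assertMinimumScores(result: EvalResult, minScore = 3): void {
  const scores: Array<[string, number]> = [
    ["accuracy", result.accuracy],
    ["completeness", result.completeness],
    ["relevance", result.relevance],
    ["clarity", result.clarity],
    ["reasoning", result.reasoning],
  ];
  const failing = scores.filter(([, score]) => score < minScore);
  if (failing.length > 0) {
    throw new Error(
      `Evals below threshold (${minScore}): ` +
      failing.map(([name, score]) => `${name}=${score}`).join(", ") +
      `. ${result.overall_comments}`
    );
  }
}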

Configuration

Environment Variables

  • OPENAI_API_KEY: Your OpenAI API key (required)

Evaluation Configuration

The EvalConfig interface requires:

  • model: The language model to use for evaluation (e.g., GPT-4)
  • evals: Array of evaluation functions to run

Each evaluation function must implement:

  • name: Name of the evaluation
  • description: Description of what the evaluation tests
  • run: Async function that takes a model and returns an EvalResult (see the sketch below)
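
As a minimal sketch of that contract (reusing the grade helper and EvalFunction type imported earlier; the tool, prompt, and eval name are hypothetical, and the model parameter follows the description above):

import { grade, EvalFunction } from "mcp-evals";

// Hypothetical eval: `run` receives the model configured in EvalConfig and
// returns the parsed scores produced by `grade`.
const forecastEval: EvalFunction = {
  name: "Forecast Tool Evaluation",
  description: "Checks that the forecast tool returns a usable multi-day forecast",
  run: async (model) => {
    const result = await grade(model, "What is the three-day forecast for Seattle?");
    return JSON.parse(result);
  },
};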

License

MIT
