mcp-evals

GitHub Action for evaluating MCP server tool calls using LLM-based scoring

Source: npm
Version: 1.0.13
Weekly downloads: 16K
Maintainers: 1

MCP Evals

A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.

Installation

As a Node.js Package

npm install mcp-evals

As a GitHub Action

Add the following to your workflow file:

name: Run MCP Evaluations

on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          
      - name: Install dependencies
        run: npm ci
        
      - name: Run MCP Evaluations
        uses: matthewlenhard/mcp-evals@v1
        with:
          evals_path: 'path/to/your/evals.ts'
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
          model: 'gpt-4'  # Optional, defaults to gpt-4

Usage

1. Create Your Evaluation File

Create a file (e.g., evals.ts) that exports your evaluation configuration:

import { openai } from "@ai-sdk/openai";
import { grade, EvalConfig, EvalFunction } from "mcp-evals";

const weatherEval: EvalFunction = {
  name: "Weather Tool Evaluation",
  description: "Evaluates the accuracy and completeness of weather information retrieval",
  run: async () => {
    const result = await grade(openai("gpt-4"), "What is the weather in New York?");
    return JSON.parse(result);
  },
};

const config: EvalConfig = {
  model: openai("gpt-4"),
  evals: [weatherEval],
};

export default config;

export const evals = [
  weatherEval,
  // add other evals here
];

2. Run the Evaluations

As a Node.js Package

You can run the evaluations using the CLI:

npx mcp-eval path/to/your/evals.ts path/to/your/server.ts
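Because grading calls the OpenAI API, the CLI needs OPENAI_API_KEY in its environment (see Configuration below). One way to provide it for a single run, with placeholder paths and key:

OPENAI_API_KEY=sk-your-key npx mcp-eval path/to/your/evals.ts path/to/your/server.ts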

As a GitHub Action

The action will automatically:

  • Run your evaluations
  • Post the results as a comment on the PR
  • Update the comment if the PR is updated

Evaluation Results

Each evaluation returns an object with the following structure:

interface EvalResult {
  accuracy: number;        // Score from 1-5
  completeness: number;    // Score from 1-5
  relevance: number;       // Score from 1-5
  clarity: number;         // Score from 1-5
  reasoning: number;       // Score from 1-5
  overall_comments: string; // Summary of strengths and weaknesses
}
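
In a custom script you might parse the output of grade into this shape and flag low scores. The sketch below is illustrative only: the interface mirrors the documented shape above (the package may or may not export it directly), and the checkScores helper and its threshold of 3 are assumptions, not part of the library.

// Mirrors the documented EvalResult shape; your installed version may export
// this type itself, in which case import it instead of redeclaring it.
interface EvalResult {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
  reasoning: number;
  overall_comments: string;
}

// Illustrative helper: returns the names of any dimensions scoring below the floor.
function checkScores(result: EvalResult, floor = 3): string[] {
  return (["accuracy", "completeness", "relevance", "clarity", "reasoning"] as const)
    .filter((key) => result[key] < floor);
}

Applied to JSON.parse(await grade(...)), a helper like this makes it easy to fail a local run whenever any dimension dips below the floor.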

Configuration

Environment Variables

  • OPENAI_API_KEY: Your OpenAI API key (required)

Evaluation Configuration

The EvalConfig interface requires:

  • model: The language model to use for evaluation (e.g., GPT-4)
  • evals: Array of evaluation functions to run

Each evaluation function must implement:

  • name: Name of the evaluation
  • description: Description of what the evaluation tests
  • run: Async function that takes a model and returns an EvalResult
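
Putting those requirements together, a minimal config might look like the sketch below. Unlike the earlier example, the run function here uses the model argument described above; the eval name, description, and prompt are placeholders, and the exact run signature may differ in the version you have installed.

import { openai } from "@ai-sdk/openai";
import { grade, EvalConfig, EvalFunction } from "mcp-evals";

const exampleEval: EvalFunction = {
  name: "Example Tool Evaluation",
  description: "Illustrative placeholder eval",
  run: async (model) => {
    // model is the language model passed in by the runner, per the contract above
    const result = await grade(model, "What is the weather in New York?");
    return JSON.parse(result);
  },
};

const config: EvalConfig = {
  model: openai("gpt-4"),
  evals: [exampleEval],
};

export default config;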

License

MIT

Keywords

mcp
