@fre4x/benchmark

@fre4x/benchmark - npm Package Compare versions

Comparing version 1.1.0-beta.4 to 1.1.0-beta.6
package.json (+1 -1)
  {
  "name": "@fre4x/benchmark",
- "version": "1.1.0-beta.4",
+ "version": "1.1.0-beta.6",
  "description": "A deterministic benchmark MCP server for agent evaluation workflows.",

@@ -5,0 +5,0 @@ "type": "module",

+13
-13

@@ -11,18 +11,18 @@ # benchmark — Deterministic Agent Evaluation

  |------|---------|
- | `benchmark_list_challenges` | List deterministic benchmark suites with family, runner, and checker metadata |
- | `benchmark_get_catalog_status` | Inspect catalog source configuration, cache state, and availability |
- | `benchmark_sync_catalog` | Fetch and cache the remote benchmark catalog when a URL source is configured |
- | `benchmark_start_challenge` | Start an attempt and return the first task |
- | `benchmark_submit_solution` | Grade one task and return checker evidence plus the next task or final score |
- | `benchmark_get_asset` | Read an attached benchmark asset by `asset_id` |
- | `benchmark_get_attempt` | Inspect attempt status, current task, and paginated evaluation history |
- | `benchmark_cancel_attempt` | Cancel an active attempt |
+ | `list_challenges` | List deterministic benchmark suites with family, runner, and checker metadata |
+ | `get_catalog_status` | Inspect catalog source configuration, cache state, and availability |
+ | `sync_catalog` | Fetch and cache the remote benchmark catalog when a URL source is configured |
+ | `start_challenge` | Start an attempt and return the first task |
+ | `submit_solution` | Grade one task and return checker evidence plus the next task or final score |
+ | `get_asset` | Read an attached benchmark asset by `asset_id` |
+ | `get_attempt` | Inspect attempt status, current task, and paginated evaluation history |
+ | `cancel_attempt` | Cancel an active attempt |
  ## Workflow
- 1. Call `benchmark_list_challenges`
+ 1. Call `list_challenges`
  2. Pick a `challenge_id`
- 3. Call `benchmark_start_challenge`
- 4. If the task has assets, call `benchmark_get_asset`
- 5. Call `benchmark_submit_solution`
+ 3. Call `start_challenge`
+ 4. If the task has assets, call `get_asset`
+ 5. Call `submit_solution`
  6. Repeat until `done: true`
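
Note: the six workflow steps in the new README text could be driven by a loop like the sketch below. It uses the beta.6 tool names. Only `challenge_id`, `asset_id`, and `done` appear in the diff; the `callTool` transport, the `attempt_id`, `task`, `assets`, `challenges`, `solution`, and `score` fields, and the `runAttempt` helper itself are assumptions made for illustration, not the package's documented API.

```typescript
// Hypothetical transport: any function that invokes an MCP tool by name.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<any>;

// Assumed shapes. Only `asset_id` and `done` are named in the README diff.
interface Task {
  task_id: string;
  assets?: { asset_id: string }[];
}

interface StepResult {
  done: boolean;
  task?: Task;    // assumed: next task while the attempt is still running
  score?: number; // assumed: final score once done is true
}

type Solver = (task: Task, assets: unknown[]) => Promise<unknown> | unknown;

async function runAttempt(
  callTool: CallTool,
  pickChallenge: (challenges: any[]) => string,
  solve: Solver,
): Promise<StepResult> {
  // Steps 1-2: list the suites and pick a challenge_id.
  const listing = await callTool("list_challenges", {});
  const challengeId = pickChallenge(listing.challenges);

  // Step 3: start an attempt; the response is assumed to carry the first task.
  const started = await callTool("start_challenge", { challenge_id: challengeId });
  const attemptId: string = started.attempt_id;
  let task: Task | undefined = started.task;
  let result: StepResult = { done: false, task };

  // Steps 4-6: fetch assets if any, submit, repeat until done is true.
  while (!result.done && task) {
    const assets: unknown[] = [];
    for (const ref of task.assets ?? []) {
      assets.push(await callTool("get_asset", { asset_id: ref.asset_id }));
    }
    const solution = await solve(task, assets);
    result = await callTool("submit_solution", { attempt_id: attemptId, solution });
    task = result.task;
  }
  return result;
}
```

Passing the transport in as a parameter keeps the loop independent of any particular MCP client library, so the same sketch can be exercised against a stub.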

@@ -81,3 +81,3 @@

- When `BENCHMARK_CATALOG_URL` is set, the package will reuse a fresh cached copy when available and can be explicitly refreshed with `benchmark_sync_catalog`.
+ When `BENCHMARK_CATALOG_URL` is set, the package will reuse a fresh cached copy when available and can be explicitly refreshed with `sync_catalog`.
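
Note: every README change in this comparison drops the `benchmark_` prefix from a tool name. The diff does not say whether the old names are still accepted in 1.1.0-beta.6, so a caller written against beta.4 may need to translate them. The following is a minimal client-side sketch of that translation; `migrateToolName` and `RENAMED_TOOLS` are hypothetical helpers and not part of the package.

```typescript
// Prefix removed from the tool names between 1.1.0-beta.4 and 1.1.0-beta.6.
const LEGACY_PREFIX = "benchmark_";

// The eight tool names listed in the beta.6 README table.
const RENAMED_TOOLS = new Set<string>([
  "list_challenges",
  "get_catalog_status",
  "sync_catalog",
  "start_challenge",
  "submit_solution",
  "get_asset",
  "get_attempt",
  "cancel_attempt",
]);

// Hypothetical helper: map a beta.4 tool name to its beta.6 equivalent.
// Names that are not known legacy names are returned unchanged.
function migrateToolName(name: string): string {
  if (!name.startsWith(LEGACY_PREFIX)) return name;
  const stripped = name.slice(LEGACY_PREFIX.length);
  return RENAMED_TOOLS.has(stripped) ? stripped : name;
}
```

Checking against an explicit set, rather than stripping the prefix blindly, avoids rewriting an unrelated tool whose name happens to start with `benchmark_`.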

@@ -84,0 +84,0 @@ ## Catalog shape

Sorry, the diff of this file is too big to display