
trpc.group/trpc-go/trpc-agent-go/examples/evaluation
This example runs the evaluation pipeline with a local file-backed manager. Evaluation sets, metric definitions, and run results all live on disk so you can inspect or version them alongside source code.
The example supports the following environment variables:
| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | API key for the model service (required) | None |
| `OPENAI_BASE_URL` | Base URL for the model API endpoint | `https://api.openai.com/v1` |
Note: `OPENAI_API_KEY` must be set. Without it, the example cannot call the model service.
The example also accepts the following command-line flags:

| Flag | Description | Default |
|---|---|---|
| `-model` | Model identifier used by the calculator agent | `deepseek-chat` |
| `-streaming` | Enable streaming responses from the LLM | `false` |
| `-data-dir` | Directory containing `.evalset.json` and `.metrics.json` files | `./data` |
| `-output-dir` | Directory where evaluation results are written | `./output` |
| `-eval-set` | Evaluation set ID to execute | `math-basic` |
| `-runs` | Number of repetitions per evaluation case | `1` |
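These flags map directly onto Go's standard `flag` package. The sketch below registers them with the documented defaults on a dedicated `flag.FlagSet`, which lets the parsing be exercised without touching `os.Args`. The `options` struct, `parseOptions`, and the check that `-runs` is at least 1 are assumptions for illustration. They are not taken from the example's source.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options mirrors the documented flags (hypothetical type).
type options struct {
	model     string
	streaming bool
	dataDir   string
	outputDir string
	evalSet   string
	runs      int
}

// parseOptions registers the documented flags with their defaults and parses args.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("evaluation", flag.ContinueOnError)
	fs.StringVar(&o.model, "model", "deepseek-chat", "Model identifier used by the calculator agent")
	fs.BoolVar(&o.streaming, "streaming", false, "Enable streaming responses from the LLM")
	fs.StringVar(&o.dataDir, "data-dir", "./data", "Directory containing .evalset.json and .metrics.json files")
	fs.StringVar(&o.outputDir, "output-dir", "./output", "Directory where evaluation results are written")
	fs.StringVar(&o.evalSet, "eval-set", "math-basic", "Evaluation set ID to execute")
	fs.IntVar(&o.runs, "runs", 1, "Number of repetitions per evaluation case")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	// Assumption: a non-positive repetition count is rejected here.
	if o.runs < 1 {
		return options{}, fmt.Errorf("-runs must be at least 1, got %d", o.runs)
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("%+v\n", o)
}
```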
Run the example from the `local` directory:

```shell
cd trpc-agent-go/examples/evaluation/local
go run . \
  -model "deepseek-chat" \
  -data-dir "./data" \
  -output-dir "./output" \
  -eval-set "math-basic" \
  -runs 1
```
The run prints a case-by-case summary and writes detailed JSON artifacts to `./output/math-eval-app`.
The input data for the default evaluation set is laid out as follows:

```text
data/
└── math-eval-app/
    ├── math-basic.evalset.json   # EvalSet file for math-basic.
    └── math-basic.metrics.json   # Metric file for math-basic EvalSet.
```
You can add new cases or metrics by editing these JSON files or by creating additional evaluation set IDs under the same directory.
Each run writes its result file under the output directory:

```text
output/
└── math-eval-app/
    └── math-eval-app_math-basic_538cdf6e-925d-41cf-943b-2849982b195e.evalset_result.json   # EvalResult file for math-basic EvalSet.
```
A successful run prints a summary like the following:

```text
✅ Evaluation completed
App: math-eval-app
Eval Set: math-basic
Overall Status: passed
Runs: 1
Case calc_add -> passed
Metric tool_trajectory_avg_score: score 1.00 (threshold 1.00) => passed
Case calc_multiply -> passed
Metric tool_trajectory_avg_score: score 1.00 (threshold 1.00) => passed
```