@plasius/ai-evals

Golden datasets, scorecards, and cost-quality evaluation contracts for Plasius AI workloads.

Version 0.1.4 (latest), source: npm

Golden datasets, scorecards, threshold, and regression comparison contracts for Plasius AI workloads.

Scope

This package is part of the layered @plasius/ai-* package family. It provides evaluator contracts, fixture definitions, and scorecard utilities used for AI quality and safety governance.

  • Feature flag key: ai.evals-scorecards.enabled
  • Package flag constant: AI_EVALS_SCORECARDS_ENABLED
  • Runtime namespace: AI_EVALS

The package supports:

  • Golden fixture datasets for moderation, player-action validation, NPC speech, routing, and RAG.
  • Metric contracts for quality, cost, latency, confidence, cache savings, and safety regressions.
  • Deterministic scorecard evaluation over fixtures and fake adapter output.
  • Cross-tier scorecard comparison for development/standard/premium workflows.
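
As a rough illustration of what a metric contract with min/max thresholds could mean in practice, the sketch below checks reported metric values against expectations. All types and function names here (Threshold, checkMetrics, meetsThreshold) are hypothetical local stand-ins and are not imported from @plasius/ai-evals; the package's real evaluation logic may differ.

```typescript
// Hypothetical local types. They mirror the shapes used in this README's
// usage example but are NOT the package's exported types.
interface Threshold { min?: number; max?: number }
interface MetricExpectation { metricId: string; threshold: Threshold }
interface MetricResult { metricId: string; value: number }

type CheckStatus = "pass" | "fail" | "missing";
interface MetricCheck { metricId: string; status: CheckStatus; value?: number }

// Compare one observed value against optional lower and upper bounds.
// NaN never satisfies a threshold.
function meetsThreshold(value: number, threshold: Threshold): boolean {
  if (Number.isNaN(value)) return false;
  if (threshold.min !== undefined && value < threshold.min) return false;
  if (threshold.max !== undefined && value > threshold.max) return false;
  return true;
}

// Check every expectation against the reported metrics. A metric that was
// expected but never reported is marked "missing" instead of passing silently.
function checkMetrics(
  expectations: readonly MetricExpectation[],
  results: readonly MetricResult[],
): MetricCheck[] {
  const byId = new Map<string, number>();
  for (const r of results) byId.set(r.metricId, r.value);

  return expectations.map((e): MetricCheck => {
    const value = byId.get(e.metricId);
    if (value === undefined) return { metricId: e.metricId, status: "missing" };
    const status: CheckStatus = meetsThreshold(value, e.threshold) ? "pass" : "fail";
    return { metricId: e.metricId, status, value };
  });
}

const checks = checkMetrics(
  [
    { metricId: "quality", threshold: { min: 0.8 } },
    { metricId: "latency", threshold: { max: 500 } },
    { metricId: "cost", threshold: { max: 0.01 } },
  ],
  [
    { metricId: "quality", value: 0.93 },
    { metricId: "latency", value: 620 },
  ],
);
console.log(checks);
```

In this sketch, quality passes, latency fails because 620 exceeds the maximum of 500, and cost is reported as missing. Because the check depends only on its inputs, the same fixtures and adapter output always give the same result.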

Install

npm install @plasius/ai-evals

Usage

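// Names used only in type positions are imported with the `type` modifier,
// so they are erased at compile time.
import {
  type AiEvalFixtureAdapter,
  AI_EVALS_FEATURE_FLAG_ID,
  type AiEvalMetricExpectation,
  defineAiEvalGoldenDataset,
  evaluateAiEvalScorecard,
  isAiEvalsScorecardsEnabled,
} from "@plasius/ai-evals";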

const expectations: readonly AiEvalMetricExpectation[] = [
  { metricId: "quality", threshold: { min: 0.8 } },
  { metricId: "latency", threshold: { max: 500 } },
];

const dataset = defineAiEvalGoldenDataset({
  datasetId: "example-1",
  version: "1.0.0",
  name: "Example moderation fixtures",
  taskType: "moderation",
  baselineExpectations: expectations,
  fixtureCases: [
    { fixtureId: "case-1", input: { prompt: "flag-check" } },
    { fixtureId: "case-2", input: { prompt: "safe-response" } },
  ],
});

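// Fake adapter: returns the same fixed metrics for every fixture, which keeps the run deterministic.
const adapter: AiEvalFixtureAdapter<{ prompt: string }> = {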
  adapterId: "fake-golden-adapter",
  tier: "development",
  async runFixture(fixture) {
    return {
      fixtureId: fixture.fixtureId,
      metrics: [
        { metricId: "quality", value: 0.93 },
        { metricId: "latency", value: 320 },
      ],
    };
  },
};

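// The flag is passed as an env-style record and forced on for this smoke run.
// The top-level `await` below requires an ES module context.
if (isAiEvalsScorecardsEnabled({ AI_EVALS_SCORECARDS_ENABLED: "true" })) {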
  const scorecard = await evaluateAiEvalScorecard({
    runId: "manual-smoke",
    dataset,
    adapter,
    featureEnabled: true,
  });

  console.log(AI_EVALS_FEATURE_FLAG_ID, scorecard.status);
}
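
The adapter above declares tier: "development". To illustrate what comparing results across the development/standard/premium tiers might involve, here is a minimal sketch that computes per-metric deltas between two tiers. TierScorecard, TierDelta, and compareTiers are hypothetical names defined locally for this example and are not part of the package's documented API.

```typescript
// Hypothetical local shapes; not exported by @plasius/ai-evals.
type Tier = "development" | "standard" | "premium";
interface TierScorecard { tier: Tier; metrics: Record<string, number> }
interface TierDelta { metricId: string; from: Tier; to: Tier; delta: number }

// Compute `other - base` for every metric that both tiers reported.
// Metric ids are sorted so the output order is stable across runs.
function compareTiers(base: TierScorecard, other: TierScorecard): TierDelta[] {
  const deltas: TierDelta[] = [];
  for (const metricId of Object.keys(base.metrics).sort()) {
    if (!(metricId in other.metrics)) continue; // skip metrics missing on one side
    const delta = other.metrics[metricId] - base.metrics[metricId];
    deltas.push({ metricId, from: base.tier, to: other.tier, delta });
  }
  return deltas;
}

// Illustrative numbers only; they are not measurements from any real run.
const development: TierScorecard = {
  tier: "development",
  metrics: { quality: 0.81, latency: 320 },
};
const premium: TierScorecard = {
  tier: "premium",
  metrics: { quality: 0.93, latency: 540, cost: 0.004 },
};

const tierDeltas = compareTiers(development, premium);
for (const d of tierDeltas) {
  console.log(`${d.metricId}: ${d.from} -> ${d.to} changed by ${d.delta.toFixed(3)}`);
}
```

A delta on its own does not say whether a change is good or bad: higher quality is an improvement, while higher latency is a cost. Any real comparison would need to know each metric's direction.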

Development

npm install
npm run build
npm test
npm run test:coverage
npm run pack:check

Governance

  • Security policy: SECURITY.md
  • Code of conduct: CODE_OF_CONDUCT.md
  • ADRs: docs/adrs
  • CLA and legal docs: legal
  • Rollback guidance: disable ai.evals-scorecards.enabled to stop automatic production-grade evaluation runs, then rerun against known-good baseline scorecards.
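
One way to picture a rerun against a known-good baseline is a regression check that flags any metric that got worse by more than an allowed tolerance. The sketch below assumes each metric has a declared direction and tolerance. MetricSpec, findRegressions, and the sample numbers are hypothetical and only illustrate the idea; they do not describe how the package itself compares scorecards.

```typescript
// Hypothetical local types; not exported by @plasius/ai-evals.
type Direction = "higher-is-better" | "lower-is-better";
interface MetricSpec { metricId: string; direction: Direction; tolerance: number }

// Return the ids of metrics that worsened by more than their tolerance.
// A metric missing from either scorecard is treated as a regression here,
// a conservative choice made for this sketch.
function findRegressions(
  specs: readonly MetricSpec[],
  baseline: Readonly<Record<string, number>>,
  candidate: Readonly<Record<string, number>>,
): string[] {
  const regressed: string[] = [];
  for (const spec of specs) {
    if (!(spec.metricId in baseline) || !(spec.metricId in candidate)) {
      regressed.push(spec.metricId);
      continue;
    }
    const before = baseline[spec.metricId];
    const after = candidate[spec.metricId];
    // Positive `worsening` means the candidate is worse than the baseline.
    const worsening =
      spec.direction === "higher-is-better" ? before - after : after - before;
    if (worsening > spec.tolerance) regressed.push(spec.metricId);
  }
  return regressed;
}

// Illustrative values only.
const specs: MetricSpec[] = [
  { metricId: "quality", direction: "higher-is-better", tolerance: 0.02 },
  { metricId: "latency", direction: "lower-is-better", tolerance: 50 },
];
const baselineMetrics = { quality: 0.93, latency: 320 };
const candidateMetrics = { quality: 0.85, latency: 330 };

const regressions = findRegressions(specs, baselineMetrics, candidateMetrics);
console.log(regressions.length === 0 ? "no regressions" : `regressed: ${regressions.join(", ")}`);
```

With these sample values, quality drops by more than its tolerance and is flagged, while the small latency increase stays within bounds.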

License

Apache-2.0

Keywords

ai

Package last updated on 01 Jun 2026