visual-ai-assertions

AI-powered visual assertions for E2E tests. Send screenshots to Claude, GPT, or Gemini and get structured, typed results.

Installation

# Install the library (includes OpenAI SDK by default)
npm install visual-ai-assertions

# Optional: install additional provider SDKs
npm install @anthropic-ai/sdk    # for Claude
npm install @google/genai        # for Gemini

# Zod is a peer dependency
npm install zod

System Requirements

This library uses sharp for image processing. Sharp downloads native binaries automatically for most supported platforms.

If installation fails in CI, Docker, or a minimal Linux image:

  • See the sharp installation guide
  • On Alpine Linux, install vips-dev with apk add --no-cache vips-dev
  • On minimal Docker images, use --platform=linux/amd64 or install the required build tools
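For Alpine-based CI images, the prerequisites can be scripted like this (a sketch; package names are the standard Alpine ones — adjust to your base image, and note that build-base and python3 are only needed if sharp has to compile from source):

```shell
# Sketch for an Alpine-based CI image; assumes the standard Alpine
# package repositories are available.
apk add --no-cache vips-dev build-base python3
npm install visual-ai-assertions
```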

Quick Start

Playwright + Anthropic

import { test, expect } from "@playwright/test";
import { visualAI } from "visual-ai-assertions";

const ai = visualAI();
// Provider auto-inferred from ANTHROPIC_API_KEY env var

test("login page looks correct", async ({ page }) => {
  await page.goto("https://myapp.com/login");
  const screenshot = await page.screenshot();

  const result = await ai.check(screenshot, [
    "A login form is visible with email and password fields",
    "A 'Sign In' button is present and visually enabled",
    "The company logo appears in the header",
    "No error messages are displayed",
  ]);

  // Simple pass/fail
  expect(result.pass).toBe(true);

  // Or inspect individual statements
  for (const stmt of result.statements) {
    expect(stmt.pass, `Failed: ${stmt.statement} (${stmt.reasoning})`).toBe(true);
  }
});

WebDriverIO + OpenAI

import { visualAI } from "visual-ai-assertions";

const ai = visualAI({ model: "gpt-5-mini" });
// Provider inferred from model prefix

describe("Product Page", () => {
  it("should display all required elements", async () => {
    await browser.url("https://myapp.com/products/1");
    const screenshot = await browser.saveScreenshot("./screenshot.png");

    const result = await ai.elementsVisible(screenshot, [
      "Product title",
      "Price tag",
      "Add to Cart button",
      "Product image",
    ]);

    expect(result.pass).toBe(true);
  });
});

API Reference

visualAI(config?)

Create an AI visual analysis instance. Provider is auto-inferred from the model name or API key environment variable.

import { visualAI, Provider, Model } from "visual-ai-assertions";

// Minimal — provider inferred from ANTHROPIC_API_KEY env var
const ai = visualAI();

// Explicit configuration
const ai = visualAI({
  model: "claude-sonnet-4-6", // optional, sensible defaults per provider
  apiKey: "sk-...", // optional, defaults to provider env var
  debug: true, // optional, logs prompts/responses to stderr
  maxTokens: 4096, // optional, default 4096
  reasoningEffort: "high", // optional, "low" | "medium" | "high" | "xhigh"
  trackUsage: false, // optional, defaults to false — usage stats to stderr
});

// Use constants for IDE autocomplete
const ai = visualAI({
  model: Model.Anthropic.SONNET_4_6,
});

ai.check(image, statements, options?)

Visual assertion. Returns pass: true only if ALL statements are true.

// Single statement
const result = await ai.check(screenshot, "The login button is visible");

// Multiple statements
const result = await ai.check(screenshot, [
  "The login button is visible",
  "No error messages are displayed",
]);

// With instructions
const result = await ai.check(screenshot, ["The form is submitted"], {
  instructions: ["Ignore loading spinners that appear briefly"],
});

Returns: CheckResult

{
  pass: boolean;             // true only if ALL statements pass
  reasoning: string;         // overall summary
  issues: Issue[];           // structured findings
  statements: StatementResult[]; // per-statement breakdown
  usage?: {
    inputTokens: number;
    outputTokens: number;
    estimatedCost?: number;    // USD
    durationSeconds?: number;  // API call duration
  };
}

ai.ask(image, prompt, options?)

Free-form analysis. Returns structured issues with priority and category.

const result = await ai.ask(screenshot, "Analyze this page for UI issues");

// Filter by priority
const critical = result.issues.filter((i) => i.priority === "critical");

// With instructions
const result = await ai.ask(screenshot, "Check for accessibility issues", {
  instructions: ["Ignore contrast on decorative elements"],
});

Returns: AskResult

{
  summary: string;           // high-level analysis
  issues: Issue[];           // categorized findings
  usage?: {
    inputTokens: number;
    outputTokens: number;
    estimatedCost?: number;
    durationSeconds?: number;
  };
}

ai.compare(imageA, imageB, options?)

Compare two images and get structured differences.

import { writeFileSync } from "node:fs";

// Basic comparison
const result = await ai.compare(before, after);

// gemini-3-flash-preview includes an annotated diff by default.
// Pass { diffImage: false } to opt out.

// With custom prompt and instructions
const result = await ai.compare(before, after, {
  prompt: "Focus on header layout changes",
  instructions: ["Ignore date/time differences"],
});

// With AI-generated diff image (supported only by gemini-3-flash-preview)
const result = await ai.compare(before, after, {
  diffImage: true,
});
if (result.diffImage) {
  writeFileSync("diff.png", result.diffImage.data);
}

Returns: CompareResult

{
  pass: boolean;               // true if no critical/major changes
  reasoning: string;           // overall summary
  changes: ChangeEntry[];      // list of visual differences
  diffImage?: {                // present when diffing is enabled explicitly or by Gemini 3 preview defaults
    data: Buffer;              // PNG image data
    width: number;
    height: number;
    mimeType: "image/png";
  };
  usage?: UsageInfo;
}

Where ChangeEntry is:

{
  description: string; // what changed
  severity: "critical" | "major" | "minor";
}

Template Methods

Type-safe methods for common visual QA checks. All return CheckResult. Use Accessibility, Layout, and Content constants for IDE autocomplete.

import { Accessibility, Layout, Content } from "visual-ai-assertions";

// Check that UI elements are visible
await ai.elementsVisible(screenshot, ["Submit button", "Nav bar", "Footer"]);

// Check that UI elements are hidden
await ai.elementsHidden(screenshot, ["Loading spinner", "Error modal"]);

// Accessibility checks (contrast, readability, interactive visibility)
await ai.accessibility(screenshot);
await ai.accessibility(screenshot, {
  checks: [Accessibility.CONTRAST, Accessibility.READABILITY],
});

// Layout checks (overlap, overflow, alignment)
await ai.layout(screenshot);
await ai.layout(screenshot, {
  checks: [Layout.OVERLAP, Layout.OVERFLOW],
  instructions: ["Sticky headers may overlap content — ignore if < 10px"],
});

// Page load verification
await ai.pageLoad(screenshot);
await ai.pageLoad(screenshot, { expectLoaded: false }); // expect loading state

// Content checks (placeholder text, errors, broken images)
await ai.content(screenshot);
await ai.content(screenshot, {
  checks: [Content.PLACEHOLDER_TEXT, Content.ERROR_MESSAGES],
});

Issue Structure

Every issue includes:

{
  priority: "critical" | "major" | "minor";
  category: "accessibility" |
    "missing-element" |
    "layout" |
    "content" |
    "styling" |
    "functionality" |
    "performance" |
    "other";
  description: string; // what the issue is
  suggestion: string; // how to fix it
}
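The priority field makes it easy to gate a test on severity, e.g. failing on critical findings while only logging minor ones. A small grouping helper (illustrative only — not part of the library's API):

```typescript
// Group issues by priority so a test can treat each severity differently.
type Priority = "critical" | "major" | "minor";

interface Issue {
  priority: Priority;
  category: string;
  description: string;
  suggestion: string;
}

function groupByPriority(issues: Issue[]): Record<Priority, Issue[]> {
  const groups: Record<Priority, Issue[]> = { critical: [], major: [], minor: [] };
  for (const issue of issues) groups[issue.priority].push(issue);
  return groups;
}
```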

Image Input

Accepts multiple formats:

// Buffer (from Playwright screenshot)
const screenshot = await page.screenshot();
await ai.check(screenshot, "...");

// File path
await ai.check("./screenshots/page.png", "...");

// Base64 string
await ai.check(base64String, "...");

// URL
await ai.check("https://example.com/screenshot.png", "...");
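Since strings can be a path, base64 payload, or URL, the input kind has to be disambiguated. A sketch of how such detection could work (the library's actual logic may differ):

```typescript
// Classify a string image input. Heuristic sketch only: URLs and data URLs
// are recognized by prefix; long strings containing only base64 characters
// are treated as raw base64; everything else is assumed to be a file path.
type StringImageKind = "url" | "path" | "base64";

function classifyStringInput(input: string): StringImageKind {
  if (/^https?:\/\//.test(input)) return "url";
  if (/^data:image\//.test(input)) return "base64";
  if (/^[A-Za-z0-9+/=\r\n]+$/.test(input) && input.length > 100) return "base64";
  return "path";
}
```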

Oversized images are automatically resized to provider limits.
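Resizing to a limit preserves aspect ratio. The arithmetic can be expressed as a pure function (a sketch of the idea, not the library's internals; the 1568-pixel limit below is purely an example value, not a documented provider limit):

```typescript
// Compute dimensions that fit within a maximum edge length while
// preserving aspect ratio. Images already within the limit are unchanged.
function fitWithin(width: number, height: number, maxEdge: number) {
  const longest = Math.max(width, height);
  if (longest <= maxEdge) return { width, height };
  const scale = maxEdge / longest;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```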

Formatting & Assertion Helpers

import {
  formatCheckResult,
  formatCompareResult,
  assertVisualResult,
  assertVisualCompareResult,
} from "visual-ai-assertions";

// Pretty-print results to console
const result = await ai.check(screenshot, ["Login form is visible"]);
console.log(formatCheckResult(result, "login-page"));

// Throw VisualAIAssertionError on failure (includes full result on error)
assertVisualResult(result, "login-page");

// Same for compare results
const diff = await ai.compare(before, after);
console.log(formatCompareResult(diff));
assertVisualCompareResult(diff, "regression-check");

Error Handling

All errors extend VisualAIError, and every concrete error includes an error.code string for programmatic handling:

import { isVisualAIKnownError } from "visual-ai-assertions";

try {
  const result = await ai.check(screenshot, "Page is loaded");
} catch (error) {
  if (isVisualAIKnownError(error)) {
    switch (error.code) {
      case "AUTH_FAILED":
        // Invalid or missing API key
        break;
      case "RATE_LIMITED":
        // Rate limited — error.retryAfter has seconds to wait
        break;
      case "IMAGE_INVALID":
        // Invalid image: corrupt, unsupported format, etc.
        break;
      case "RESPONSE_PARSE_FAILED":
        // AI returned unparseable response — error.rawResponse has raw text
        break;
      case "CONFIG_INVALID":
        // Provider SDK not installed or invalid config
        break;
      case "ASSERTION_FAILED":
        // assertVisualResult threw — error.result has the full failed result
        break;
      case "PROVIDER_ERROR":
      case "VISUAL_AI_ERROR":
        break;
    }
  }
}

The VisualAIKnownError union and isVisualAIKnownError() helper are useful when you want switch (error.code) to narrow to subclass-specific fields such as retryAfter, statusCode, or rawResponse. Class-based instanceof checks continue to work too.
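Because RATE_LIMITED errors carry retryAfter, they lend themselves to a simple retry loop. A generic wrapper sketch (the { code, retryAfter } shape is assumed from the docs above; the injectable sleep is for testability and is not a library feature):

```typescript
// Retry an async call when it fails with a RATE_LIMITED error code,
// waiting retryAfter seconds between attempts. Rethrows everything else.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error?.code !== "RATE_LIMITED" || attempt >= maxAttempts) throw error;
      await sleep((error.retryAfter ?? 1) * 1000);
    }
  }
}
```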

Environment Variables

API Keys

Provider     Environment Variable
Anthropic    ANTHROPIC_API_KEY
OpenAI       OPENAI_API_KEY
Google       GOOGLE_API_KEY

Optional Configuration

Variable                   Description
VISUAL_AI_MODEL            Default model when model is not set in config. Overrides the provider's default model.
VISUAL_AI_DEBUG            Enable error diagnostic logging to stderr. Does not enable prompt/response logging. Use "true" or "1".
VISUAL_AI_DEBUG_PROMPT     Enable prompt-only debug logging to stderr. Use "true" or "1".
VISUAL_AI_DEBUG_RESPONSE   Enable response-only debug logging to stderr. Use "true" or "1".
VISUAL_AI_TRACK_USAGE      Enable usage tracking (token counts and cost) to stderr. Use "true" or "1".
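All of the boolean variables accept "true" or "1". A tiny parser matching that convention (illustrative; the library's own parsing may accept other spellings):

```typescript
// Interpret an environment variable under the "true"/"1" convention
// used by the VISUAL_AI_* flags. Anything else, including undefined,
// is treated as disabled.
function envFlag(value: string | undefined): boolean {
  return value === "true" || value === "1";
}
```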

Configuration

Option           Type     Default           Description
apiKey           string   env var           API key for the provider
model            string   provider default  Model to use
debug            boolean  false             Enable error diagnostic logging to stderr
debugPrompt      boolean  false             Log prompts to stderr
debugResponse    boolean  false             Log responses to stderr
maxTokens        number   4096              Max tokens for AI response
reasoningEffort  string   undefined         "low" | "medium" | "high" | "xhigh"; controls how deeply the model reasons
trackUsage       boolean  false             Log token usage and estimated cost to stderr

Exported Types

import type {
  AskResult,
  CheckResult,
  CompareResult,
  SupportedMimeType,
  VisualAIConfig,
  VisualAIErrorCode,
} from "visual-ai-assertions";

SupportedMimeType is the exported image MIME union:

type SupportedMimeType = "image/jpeg" | "image/png" | "image/webp" | "image/gif";

Default models:

Provider     Default Model
Anthropic    claude-sonnet-4-6
OpenAI       gpt-5-mini
Google       gemini-3-flash-preview

Reasoning Effort

Control how deeply the model reasons before responding. Higher effort produces more thorough analysis but uses more tokens and takes longer.

const ai = visualAI({
  reasoningEffort: "high", // "low" | "medium" | "high" | "xhigh"
});

When omitted, each provider uses its default behavior. The "xhigh" level enables maximum reasoning depth (maps to Anthropic's "max" effort and OpenAI's "xhigh" via the Responses API).

Provider     Native Parameter                                      "xhigh" maps to
Anthropic    thinking.type: "adaptive" + output_config.effort      effort: "max"
OpenAI       reasoning.effort (Responses API)                      effort: "xhigh"
Google       thinkingConfig.thinkingBudget (1024 / 8192 / 24576)   24576 (max budget)
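For Google, the mapping in the table can be written out directly. A sketch assuming the three listed budgets correspond to "low", "medium", and "high" in order (the function itself is illustrative, not the library's implementation):

```typescript
// Map cross-provider reasoningEffort levels to Gemini thinking budgets,
// following the table above; "xhigh" uses the maximum budget.
type ReasoningEffort = "low" | "medium" | "high" | "xhigh";

function geminiThinkingBudget(effort: ReasoningEffort): number {
  const budgets: Record<ReasoningEffort, number> = {
    low: 1024,
    medium: 8192,
    high: 24576,
    xhigh: 24576,
  };
  return budgets[effort];
}
```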

Supported Models

All listed models support image/vision input. Pass any model ID to the model config option.

Anthropic

Model               Model ID            Input $/MTok   Output $/MTok   Notes
Claude Opus 4.6     claude-opus-4-6     $5             $25             Most capable, 128K max output
Claude Sonnet 4.6   claude-sonnet-4-6   $3             $15             Default; best value
Claude Haiku 4.5    claude-haiku-4-5    $1             $5              Fastest, budget-friendly

OpenAI

Model          Model ID       Input $/MTok   Output $/MTok   Notes
GPT-5.4 Pro    gpt-5.4-pro    $30            $180            Most capable, extended context
GPT-5.4        gpt-5.4        $2.50          $15             Best vision quality
GPT-5.2        gpt-5.2        $1.75          $14             Balanced quality and cost
GPT-5.4 mini   gpt-5.4-mini   $0.75          $4.50           Fast and affordable
GPT-5.4 nano   gpt-5.4-nano   $0.20          $1.25           Cheapest OpenAI option
GPT-5 mini     gpt-5-mini     $0.25          $2              Default; fast and cheap

Google

Model                   Model ID                        Input $/MTok   Output $/MTok   Notes
Gemini 3.1 Pro          gemini-3.1-pro-preview          $2             $12             Preview; most advanced reasoning
Gemini 3.1 Flash Lite   gemini-3.1-flash-lite-preview   $0.25          $1.50           Preview; lightweight and cheap
Gemini 3 Flash          gemini-3-flash-preview          $0.50          $3              Default; fast and capable
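The usage.estimatedCost field follows from these per-million-token rates. A worked sketch of the arithmetic (illustrative, not the library's implementation), using the claude-sonnet-4-6 rates from the table ($3 in / $15 out):

```typescript
// Estimate USD cost from token counts and per-million-token rates.
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  inputPerMTok: number,
  outputPerMTok: number,
): number {
  return (inputTokens / 1e6) * inputPerMTok + (outputTokens / 1e6) * outputPerMTok;
}

// 2000 input + 500 output tokens at $3/$15 per MTok:
// 2000/1e6 * 3 + 500/1e6 * 15 = 0.006 + 0.0075 = 0.0135 USD
```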

License

MIT

Keywords

visual-testing

Package last updated on 19 Mar 2026
