@sanity/agent-context-explorer
Advanced tools
+241
-216
@@ -138,19 +138,25 @@ #!/usr/bin/env node | ||
| var FindingsSchema = z3.object({ | ||
| answer: z3.string().nullable().describe("The answer to the question, or null if the data was not found"), | ||
| success: z3.boolean().describe( | ||
| "True if you found information answering the question (even via alternative paths), false only if data is genuinely unavailable" | ||
| ), | ||
| confidence: z3.enum(["high", "medium", "low"]).describe( | ||
| "high = multiple approaches confirmed the answer; medium = found data but not extensively validated; low = uncertain or partial information" | ||
| ), | ||
| learnings: z3.array(z3.string()).describe( | ||
| "Query patterns that worked, document types or fields that were useful, syntax that was effective" | ||
| ), | ||
| caveats: z3.array(z3.string()).describe( | ||
| "Issues that would trip up a naive agent \u2014 data gaps, external dependencies, field naming gotchas" | ||
| ), | ||
| surprises: z3.array(z3.string()).describe( | ||
| "Unexpected findings \u2014 fields that are always null, inconsistent data formats, locale duplicates, etc." | ||
| ) | ||
| answer: z3.string().nullable().describe("The answer to the question, or null if not found"), | ||
| success: z3.boolean().describe("True if you found information answering the question"), | ||
| confidence: z3.enum(["high", "medium", "low"]).describe("high = confirmed; medium = found but not validated; low = uncertain"), | ||
| notes: z3.array(z3.string()).describe("Learnings, caveats, and surprises prefixed with category") | ||
| }); | ||
/**
 * Split flat agent notes into learnings / caveats / surprises buckets.
 *
 * Each note may carry a case-insensitive category prefix ("learning:",
 * "caveat:", or "surprise:"); the prefix is stripped and the remainder
 * trimmed. Unprefixed notes default to the learnings bucket.
 *
 * @param {string[]} notes - Raw notes from the findings schema.
 * @returns {{learnings: string[], caveats: string[], surprises: string[]}}
 */
function parseNotes(notes) {
  const buckets = { learnings: [], caveats: [], surprises: [] };
  // Prefix → bucket dispatch table; slice length derives from the prefix
  // itself so the two stay in sync.
  const categories = [
    ["learning:", "learnings"],
    ["caveat:", "caveats"],
    ["surprise:", "surprises"]
  ];
  for (const note of notes) {
    const lowered = note.toLowerCase();
    const hit = categories.find(([prefix]) => lowered.startsWith(prefix));
    if (hit) {
      buckets[hit[1]].push(note.slice(hit[0].length).trim());
    } else {
      // No recognized prefix — treat as a general learning.
      buckets.learnings.push(note);
    }
  }
  return buckets;
}
| async function exploreQuestion(mcpClient, context, config) { | ||
@@ -189,2 +195,3 @@ const { question, tools, priorFindings } = context; | ||
| } | ||
| const { learnings, caveats, surprises } = parseNotes(findings.notes); | ||
| const log = { | ||
@@ -197,5 +204,5 @@ questionId: question.id ?? "unknown", | ||
| success: findings.success, | ||
| learnings: findings.learnings, | ||
| caveats: findings.caveats, | ||
| surprises: findings.surprises, | ||
| learnings, | ||
| caveats, | ||
| surprises, | ||
| confidence: findings.confidence, | ||
@@ -254,2 +261,45 @@ rawResponse: result.text | ||
| ## GROQ Query Cheat Sheet | ||
| Use this reference for valid GROQ syntax. Do NOT use functions that aren't listed here. | ||
| ### Basic Filters | ||
| - All documents of type: \`*[_type=="typeName"]\` | ||
| - With conditions: \`*[_type=="typeName" && field == value]\` | ||
| - Multiple conditions: \`*[_type=="movie" && popularity > 15 && releaseDate > "2024-01-01"]\` | ||
| - Exclude nulls: \`*[_type=="post" && defined(publishedAt)]\` | ||
| - Text matching (case-insensitive): \`*[_type=="post" && title match "searchTerm"]\` | ||
| - Substring search: \`*[_type=="post" && lower(name) match lower("*magnus*")]\` | ||
| - Array contains: \`*["tag-name" in tags]\` | ||
| ### Ordering & Limiting | ||
| - Sort descending: \`| order(publishedAt desc)\` | ||
| - Sort ascending: \`| order(publishedAt asc)\` | ||
| - Multiple sort: \`| order(priority desc, _updatedAt desc)\` | ||
| - First N results: \`[0...5]\` \u2014 returns items 0-4 | ||
| - Single result: \`[0]\` | ||
| - Pagination: \`[10...20]\` \u2014 returns items 10-19 | ||
| ### Projections | ||
| - Select fields: \`{_id, _type, title, publishedAt}\` | ||
| - Rename fields: \`{"displayTitle": title}\` | ||
| - Count arrays: \`{"commentCount": count(comments)}\` | ||
| ### References & Joins | ||
| - Dereference single: \`author->name\` | ||
| - Dereference multiple: \`authors[]->name\` | ||
| - Full reference: \`author->{name, bio}\` | ||
| - Incoming references: \`*[_type=="post" && references(^._id)]\` | ||
| ### Aggregations | ||
| - Count with filter: \`count(*[_type == "feedback" && rating < 0])\` | ||
| - Count array items: \`{"tagCount": count(tags)}\` | ||
| ### NOT Supported in GROQ | ||
| - \`unique()\` \u2014 does not exist | ||
| - \`group()\` \u2014 does not exist | ||
| - \`select()\` as a pipe function \u2014 use filter conditions in \`*[...]\` instead | ||
| - \`order()\` without arguments \u2014 must specify field and direction | ||
| - \`array::unique()\` \u2014 limited support, avoid | ||
| ## How to Explore | ||
@@ -309,13 +359,25 @@ | ||
| ## What to Capture in Your Findings | ||
| ## Submitting Your Findings | ||
| When you're done exploring, the system will capture your structured findings. Make sure you have: | ||
| ALWAYS return JSON in exactly this format with your findings: | ||
| 1. **answer**: The actual answer to the question, or null if not found | ||
| 2. **success**: Did you find information that answers the question? | ||
| 3. **confidence**: How certain are you? (high/medium/low) | ||
| 4. **learnings**: Query patterns that worked, useful fields/types | ||
| 5. **caveats**: What would trip up a naive agent? (data gaps, external deps, naming gotchas) | ||
| 6. **surprises**: Anything unexpected \u2014 fields always null, inconsistent formats, duplicates | ||
| \`\`\`json | ||
| { | ||
| "answer": "The Trailblazer jacket is insulated with 600-fill down...", | ||
| "success": true, | ||
| "confidence": "high", | ||
| "notes": [ | ||
| "learning: The product specs are in the productFeature documents, not the product itself", | ||
| "caveat: Color fields are always null \u2014 data lives in external SFCC system", | ||
| "surprise: 254 locale variants per product causes duplicate results without lang filter" | ||
| ] | ||
| } | ||
| \`\`\` | ||
| Prefix each note with its category: \`learning:\`, \`caveat:\`, or \`surprise:\` | ||
| - **learning**: Query patterns that worked, useful fields/types | ||
| - **caveat**: What would trip up a naive agent? (data gaps, external deps, naming gotchas) | ||
| - **surprise**: Anything unexpected \u2014 fields always null, inconsistent formats, duplicates | ||
| Caveats and surprises are the most valuable output \u2014 they're what prevent future agents from making mistakes.`; | ||
@@ -387,2 +449,4 @@ } | ||
| url: new URL(config.url), | ||
| connectTimeout: 1e4, | ||
| // 10s for initial connection (default is 3s, too aggressive for cold starts) | ||
| requestInit: { | ||
@@ -392,3 +456,5 @@ headers: config.auth.type === "bearer" ? { Authorization: `Bearer ${config.auth.token}` } : { "X-API-Key": config.auth.token } | ||
| } : { | ||
| url: new URL(config.url) | ||
| url: new URL(config.url), | ||
| connectTimeout: 1e4 | ||
| // 10s for initial connection (default is 3s, too aggressive for cold starts) | ||
| }; | ||
@@ -474,6 +540,7 @@ const client = new MCPClient({ | ||
| var REQUIRED_SECTIONS = [ | ||
| "## 1. What This Dataset Contains", | ||
| "## 2. How to Query", | ||
| "## 3. Traps", | ||
| "## 4. Trust Boundaries" | ||
| "## Schema Reference", | ||
| "## Query Patterns", | ||
| "## Critical Rules", | ||
| "## Known Limitations", | ||
| "## Exploration Coverage" | ||
| ]; | ||
@@ -488,3 +555,3 @@ function validateSynthesisStructure(text) { | ||
| const userPrompt = buildExplorationSummary(logs, metrics, tools); | ||
| console.log("\nSynthesizing knowledge from exploration logs..."); | ||
| console.log("\nSynthesizing instructions from exploration logs..."); | ||
| console.log(` Questions: ${logs.length}`); | ||
@@ -511,3 +578,3 @@ console.log(` Successful: ${metrics.successfulQuestions}`); | ||
| **IMPORTANT: Your output MUST include ALL four required sections:** | ||
| **IMPORTANT: Your output MUST include ALL five required sections:** | ||
| ${REQUIRED_SECTIONS.map((s) => `- ${s}`).join("\n")} | ||
@@ -529,3 +596,3 @@ | ||
| function buildSynthesisPrompt() { | ||
| return `You are a knowledge synthesis agent. Your job is to take raw exploration logs from an MCP (Model Context Protocol) server and produce a unified knowledge document that future AI agents can use to query this dataset effectively. | ||
| return `You are a knowledge synthesis agent. Your job is to take raw exploration logs from an MCP (Model Context Protocol) server and produce **LLM-ready instructions** that can be pasted directly into a production agent's system prompt. | ||
@@ -538,100 +605,171 @@ ## Dataset Schema | ||
| Produce a single markdown document with these four sections: | ||
| Produce a single markdown document. Do NOT wrap it in a code fence. Use these exact five section headings (## level): | ||
| ### 1. What This Dataset Contains | ||
| \`\`\` | ||
| ## Schema Reference | ||
| ## Query Patterns | ||
| ## Critical Rules | ||
| ## Known Limitations | ||
| ## Exploration Coverage | ||
| \`\`\` | ||
| A concise inventory of the dataset \u2014 what's here, what's reliable, and what's NOT here. | ||
| You may add additional ## sections when findings naturally cluster, but these five must always be present with these exact headings. | ||
| **Document types**: Flat list of each document type and its purpose (one line each). | ||
| Here is a structural skeleton showing the expected format: | ||
| **Reliable fields per type**: For each document type, list the fields an agent can confidently use. Include a confidence indicator: | ||
| - **[High]** \u2014 Confirmed across multiple explorations or validated against expected answers | ||
| - **[Medium]** \u2014 Observed in exploration but not extensively validated | ||
| - **[Low]** \u2014 Single observation or inference, may not generalize | ||
| \`\`\` | ||
| ## Schema Reference | ||
| **What is NOT in this dataset**: A single merged list covering both data gaps and external system dependencies. For each item, explain where the data actually lives (if known) and what happens if an agent tries to query it here. | ||
| | Document Type | Use For | Key Fields | | ||
| |---------------|---------|------------| | ||
| | [type] | [purpose] | [fields] | | ||
| Format external dependencies as: | ||
| | Field(s) | Where It Actually Lives | What Happens If You Query Here | | ||
| |----------|------------------------|-------------------------------| | ||
| ## Query Patterns | ||
| This section answers: "Before I query, is the answer even in this dataset?" | ||
| ### [Use Case Name] | ||
| **When to use:** [description] | ||
| ### 2. How to Query (by Question Category) | ||
| [query code block] | ||
| The core reference section. Organized by question category (e.g., product-specs, compatibility, comparison, pricing, etc.). For each category: | ||
| **Important:** [notes] | ||
| **Category: [name]** | ||
| ## Critical Rules | ||
| - **When to use**: What kind of user question maps to this category | ||
| - **Recommended query pattern**: The effective approach with a concrete example | ||
| - **Expected result shape**: What the response looks like | ||
| - **What to avoid**: Common mistakes or anti-patterns for THIS category, with explanation \u2014 inline, not in a separate table | ||
| - **Known failures**: Specific queries or scenarios that fail for THIS category, with explanation \u2014 inline, not in a separate section | ||
| - Always [do X] when [condition] \u2014 [why] | ||
| - Never [do Y] \u2014 [what goes wrong] | ||
| Each category should be self-contained: an agent looking up how to answer a product-specs question should find the pattern, the pitfalls, AND the known failures all in one place. | ||
| ## Known Limitations | ||
| If a category had no successful queries during exploration, still include it but note that no working pattern was found and describe what was attempted. | ||
| - [What's missing] \u2014 lives in [system]. Querying here returns [result]. [confidence] | ||
| ### 3. Traps (Global Hazards) | ||
| ## Exploration Coverage | ||
| ONLY cross-cutting issues that affect multiple categories and don't belong to any single one. This section should be short. If a hazard is specific to one query category, it belongs in Section 2 instead. | ||
| **Validated areas:** [list] | ||
| **Confidence:** [level] \u2014 [stats] | ||
| **Not explored:** [blind spots] | ||
| \`\`\` | ||
| Examples of what belongs here: | ||
| - Field naming conventions (e.g., "always use \`name\` not \`title\` \u2014 \`title\` is null across all types") | ||
| - Query engine quirks (e.g., "results capped at N", "no fuzzy matching support") | ||
| - Locale/language filtering requirements that apply globally | ||
| - Naming ambiguities where user terms map to multiple document types | ||
| Below are detailed instructions for each section. | ||
| Format: | ||
| | Trap | Impact | Resolution | | ||
| |------|--------|------------| | ||
| | [issue] | [what goes wrong] | [how to handle it] | | ||
| --- | ||
| ### 4. Trust Boundaries | ||
| **Section: ## Schema Reference** | ||
| What was explored, what wasn't, and how much to trust this document. | ||
| A quick-reference table of document types discovered during exploration. | ||
| **Explored and validated**: List each category with its confidence level ([High]/[Medium]/[Low]) based on exploration success rate and answer validation results. | ||
| | Document Type | Use For | Key Fields | | ||
| |---------------|---------|------------| | ||
| | [type] | [what questions this answers] | [fields that work reliably] | | ||
| **NOT explored**: Explicit list of categories, document types, or query patterns that were never tested. These are blind spots \u2014 an agent operating here has no guidance. | ||
| Keep it flat and scannable. Only include types that were actually encountered during exploration. | ||
| **Overall reliability assessment**: A brief statement on how much an agent should trust this document, factoring in exploration breadth and validation results. | ||
| --- | ||
| \`\`\` | ||
| Questions explored: N | ||
| Categories validated: [list with confidence] | ||
| Blind spots: [list] | ||
| Overall confidence: [High/Medium/Low] | ||
| \`\`\` | ||
| **Section: ## Query Patterns** | ||
| ## Confidence System | ||
| The core reference section. Organize by **use case** (what the user is trying to do), not by abstract category names. | ||
| Use a single confidence scale everywhere in the document: | ||
| - **[High]** \u2014 Confirmed across multiple explorations or validated against expected answers | ||
| - **[Medium]** \u2014 Observed in exploration but not extensively validated | ||
| - **[Low]** \u2014 Single observation or inference, may not generalize | ||
| For each pattern, use a ### sub-heading: | ||
| Do NOT use a separate severity system (CRITICAL/HIGH/MEDIUM/LOW). Confidence is the only rating. | ||
| ### [Use Case Name] | ||
| **When to use:** [What kind of user question maps here] | ||
| ## Key Principles | ||
| Then a code block with the actual query. Use the appropriate code fence language identifier if the query language is clear from the exploration (e.g., groq, sql, graphql). If unclear, use a plain code block. | ||
| 1. **Co-locate hazards with context.** Category-specific traps belong in Section 2 next to the query pattern they affect. An agent should never have to cross-reference sections to handle a single query type. | ||
| **Important:** [Any critical notes \u2014 required filters, fields to dereference, common mistakes to avoid] | ||
| 2. **Failures are more valuable than successes.** A naive agent can figure out what works through trial-and-error. What they CAN'T figure out is why something that looks right doesn't work, or why data that should exist is missing. | ||
| Include one pattern per distinct use case discovered during exploration. The number will vary based on exploration breadth \u2014 let the data drive the count. Each pattern should be immediately usable \u2014 no placeholders, no "replace X with your value" instructions. | ||
| 3. **Be specific over generic.** "Era 100 color data is in media alt text" beats "some fields may be stored non-intuitively." | ||
| If a query category had no successful pattern, include it with a note: "No working pattern found. Attempted: [what was tried]. Consider: [fallback or alternative]." | ||
| 4. **Deduplicate findings.** Multiple explorations may discover the same caveats \u2014 consolidate them into single entries with appropriate confidence levels. | ||
| --- | ||
| 5. **Preserve surprising discoveries.** If an exploration noted something unexpected (in the "surprises" field), it's likely important. | ||
| **Section: ## Critical Rules** | ||
| 6. **Confidence from validation.** If an expected answer was provided and matched, that's [High] confidence. Partial matches are [Medium]. No matches or failed explorations are [Low]. | ||
| Imperative statements an agent must follow. Use "Always..." and "Never..." language. | ||
| ## What NOT to include | ||
| Format as a bullet list: | ||
| - Always [do X] when [condition] \u2014 [why] | ||
| - Never [do Y] \u2014 [what goes wrong] | ||
| - Raw tool call logs or JSON dumps | ||
| - Generic GROQ syntax tutorials (assume the reader knows GROQ) | ||
| - Findings that weren't actually validated through exploration | ||
| - Speculation beyond what the data shows`; | ||
| These are cross-cutting rules that apply regardless of query type. Category-specific rules belong inline in Query Patterns. | ||
| Derive rules from: | ||
| - Repeated failures across explorations | ||
| - Surprising discoveries | ||
| - Fields that look correct but don't work | ||
| - Required filters (locale, status, etc.) | ||
| --- | ||
| **Section: ## Known Limitations** | ||
| What data is NOT available through this MCP. This section prevents agents from wasting time querying for data that doesn't exist. | ||
| For each limitation, always state all three parts: | ||
| 1. **What's missing** \u2014 the field, data type, or query that doesn't work | ||
| 2. **Where it lives instead** \u2014 the external system or alternative source, if discovered during exploration. If unknown, say "source unknown" | ||
| 3. **What happens if you query it** \u2014 the actual result (null, empty array, error, etc.) | ||
| Format as a bullet list: | ||
| - [What's missing] \u2014 lives in [external system or "source unknown"]. Querying here returns [actual result]. [High/Medium/Low confidence] | ||
| Include: | ||
| - Fields that are always null or empty | ||
| - Data that lives in external systems | ||
| - Query patterns that consistently fail | ||
| - Document types that exist but contain no useful data | ||
| --- | ||
| **Section: ## Exploration Coverage** | ||
| Brief summary of what was tested and overall confidence. | ||
| **Validated areas:** [comma-separated list of what was explored] | ||
| **Confidence:** [High/Medium/Low] \u2014 [N] questions explored with [X]% success rate across [Y] use cases | ||
| **Not explored:** [what wasn't tested \u2014 these are blind spots] | ||
| --- | ||
| ## Writing Style | ||
| **Imperative, not descriptive.** Write instructions an agent can follow, not documentation a human reads. | ||
| | Instead of... | Write... | | ||
| |---------------|----------| | ||
| | "The title field contains the document title" | "Use \`title\` for document titles" | | ||
| | "Queries may return multiple locale variants" | "Always filter by locale to avoid duplicates" | | ||
| | "The field was not found to contain data" | "Never query \`fieldName\` \u2014 always null" | | ||
| **Specific over generic.** Use actual field names, document types, and query patterns from the exploration. Don't generalize into abstract advice. | ||
| **Concise.** Each rule or pattern should be 1-2 lines. Agents don't need explanations \u2014 they need instructions. | ||
| ## Confidence Indicators | ||
| Attach confidence to statements in Known Limitations: | ||
| - **[High]** \u2014 Confirmed across multiple explorations or validated against expected answers | ||
| - **[Medium]** \u2014 Observed but not extensively validated | ||
| - **[Low]** \u2014 Single observation, may not generalize | ||
| ## Key Principles | ||
| 1. **Failures are more valuable than successes.** A naive agent can figure out what works through trial-and-error. Document what DOESN'T work and why. | ||
| 2. **Inline the hazards.** Put warnings next to the patterns they affect. An agent shouldn't have to cross-reference sections. | ||
| 3. **Deduplicate findings.** Multiple explorations may discover the same issue \u2014 consolidate into single statements. | ||
| 4. **Preserve surprises.** If an exploration noted something unexpected, it's probably important. | ||
| ## What NOT to Include | ||
| - Generic query syntax tutorials (assume the reader knows the query language) | ||
| - Placeholder examples with [YOUR_VALUE_HERE] | ||
| - Raw exploration logs or JSON dumps | ||
| - Speculation beyond what the data shows | ||
| - Explanatory prose \u2014 just give the rules`; | ||
| } | ||
@@ -818,3 +956,3 @@ function buildExplorationSummary(logs, metrics, tools) { | ||
| await mcpClient.disconnect(); | ||
| console.log("\n[Synthesis] Generating knowledge document..."); | ||
| console.log("\n[Synthesis] Generating exploration results..."); | ||
| let datasetKnowledge; | ||
@@ -916,116 +1054,5 @@ try { | ||
| await writeFile(metricsPath, JSON.stringify(result.metrics, null, 2)); | ||
| const knowledgePath = join(outputDir, "dataset-knowledge.md"); | ||
| await writeFile(knowledgePath, result.datasetKnowledge); | ||
| const summaryPath = join(outputDir, "exploration-summary.md"); | ||
| await writeFile(summaryPath, generateExplorationSummary(result.explorationLogs, result.metrics)); | ||
| const resultsPath = join(outputDir, "exploration-results.md"); | ||
| await writeFile(resultsPath, result.datasetKnowledge); | ||
| } | ||
/**
 * Build a human-readable markdown report of the exploration run.
 *
 * @param {Array<object>} logs - Per-question logs (questionId, question,
 *   success, confidence, attempts, learnings, caveats, surprises, plus
 *   optional category, finalAnswer, and validation).
 * @param {object} metrics - Aggregated metrics: totals, duration,
 *   confidenceDistribution, categoryCoverage, validationMetrics.
 * @returns {string} Markdown document.
 */
function generateExplorationSummary(logs, metrics) {
  // Guard all percentage math: the original formula produced "NaN%" when a
  // denominator was 0 (e.g. an empty run or a category with no questions).
  const pct = (part, whole) => whole > 0 ? Math.round(part / whole * 100) : 0;
  const lines = [
    "# Exploration Summary",
    "",
    `**Generated:** ${new Date().toISOString()}`,
    "",
    "## Overview",
    "",
    `| Metric | Value |`,
    `|--------|-------|`,
    `| Total Questions | ${metrics.totalQuestions} |`,
    `| Successful | ${metrics.successfulQuestions} (${pct(metrics.successfulQuestions, metrics.totalQuestions)}%) |`,
    `| Failed | ${metrics.failedQuestions} |`,
    `| Total Tool Calls | ${metrics.totalToolCalls} |`,
    `| Duration | ${(metrics.totalDuration / 1e3).toFixed(1)}s |`,
    "",
    "## Confidence Distribution",
    "",
    `| Level | Count |`,
    `|-------|-------|`,
    `| High | ${metrics.confidenceDistribution.high} |`,
    `| Medium | ${metrics.confidenceDistribution.medium} |`,
    `| Low | ${metrics.confidenceDistribution.low} |`,
    "",
    "## Category Coverage",
    "",
    `| Category | Total | Successful | Rate |`,
    `|----------|-------|------------|------|`
  ];
  for (const [category, stats] of Object.entries(metrics.categoryCoverage)) {
    lines.push(`| ${category} | ${stats.total} | ${stats.successful} | ${pct(stats.successful, stats.total)}% |`);
  }
  // Validation table is only rendered when expected answers were supplied.
  if (metrics.validationMetrics.questionsWithExpected > 0) {
    const vm = metrics.validationMetrics;
    lines.push("## Answer Validation");
    lines.push("");
    lines.push("| Match Type | Count | Rate |");
    lines.push("|------------|-------|------|");
    lines.push(
      `| Full Match | ${vm.fullMatches} | ${pct(vm.fullMatches, vm.questionsWithExpected)}% |`
    );
    lines.push(
      `| Partial Match | ${vm.partialMatches} | ${pct(vm.partialMatches, vm.questionsWithExpected)}% |`
    );
    lines.push(
      `| No Match | ${vm.noMatches} | ${pct(vm.noMatches, vm.questionsWithExpected)}% |`
    );
    lines.push(
      `| Gap Identified | ${vm.gapIdentified} | ${pct(vm.gapIdentified, vm.questionsWithExpected)}% |`
    );
    lines.push("");
    lines.push(`*Based on ${vm.questionsWithExpected} questions with expected answers*`);
    lines.push("");
  }
  lines.push("## Question Results", "");
  for (const log of logs) {
    const icon = log.success ? "\u2705" : "\u274C";
    lines.push(`### ${icon} ${log.questionId}: ${log.question}`);
    lines.push("");
    if (log.category) {
      lines.push(`- **Category:** ${log.category}`);
    }
    lines.push(`- **Success:** ${log.success}`);
    lines.push(`- **Confidence:** ${log.confidence}`);
    lines.push(`- **Attempts:** ${log.attempts.length}`);
    lines.push("");
    if (log.finalAnswer) {
      lines.push(`**Answer:** ${log.finalAnswer}`);
      lines.push("");
    }
    if (log.validation) {
      // Map validation outcomes to status icons for quick scanning.
      const matchIcons = {
        full: "\u2705",
        partial: "\u26A0\uFE0F",
        gap_identified: "\u{1F50D}",
        none: "\u274C",
        skipped: "\u23ED\uFE0F"
      };
      const matchIcon = matchIcons[log.validation.match] ?? "\u2753";
      lines.push(
        `**Validation:** ${matchIcon} ${log.validation.match} (expected: "${log.validation.expectedAnswer}")`
      );
      lines.push("");
    }
    // The three note buckets render identically; one helper replaces the
    // previously triplicated loop.
    _appendNoteSection(lines, "Learnings", log.learnings);
    _appendNoteSection(lines, "Caveats", log.caveats);
    _appendNoteSection(lines, "Surprises", log.surprises);
  }
  return lines.join("\n");
}
// Append a "**Label:**" bullet list plus trailing blank line when items exist.
function _appendNoteSection(lines, label, items) {
  if (items.length > 0) {
    lines.push(`**${label}:**`);
    for (const item of items) {
      lines.push(`- ${item}`);
    }
    lines.push("");
  }
}
@@ -1255,6 +1282,4 @@ // src/question-loader.ts | ||
| Next steps:`); | ||
| console.log(` 1. Review ${config.outputDir}/dataset-knowledge.md`); | ||
| console.log( | ||
| ` 2. Use the optimize-agent-prompt skill to integrate findings into your agent's system prompt` | ||
| ); | ||
| console.log(` 1. Review ${config.outputDir}/exploration-results.md`); | ||
| console.log(` 2. Copy the contents into your Agent Context Document's instructions field`); | ||
| } | ||
@@ -1261,0 +1286,0 @@ main().catch((error) => { |
+1
-1
| { | ||
| "name": "@sanity/agent-context-explorer", | ||
| "version": "0.0.2", | ||
| "version": "0.0.3", | ||
| "description": "Exploration tool for Sanity Agent Context — produces knowledge docs for production AI agents", | ||
@@ -5,0 +5,0 @@ "author": "Sanity.io <hello@sanity.io>", |
+82
-40
| # Agent Context Explorer | ||
| Companion tool for [Sanity Agent Context](https://github.com/sanity-io/agent-context). Explores your Agent Context server, documents what works and what doesn't, and produces knowledge docs (`dataset-knowledge.md`) that help production agents operate effectively from day one. | ||
| Companion tool for [Sanity Agent Context](https://github.com/sanity-io/agent-context). Explores your Agent Context server, documents what works and what doesn't, and produces `exploration-results.md` — ready to copy directly into your Agent Context Document. | ||
@@ -55,3 +55,3 @@ ## Why This Tool? | ||
| 3. Review the generated `dataset-knowledge.md` in your timestamped output directory (e.g., `./explorer-output-2026-02-11T09-22-30/`). | ||
| 3. Copy the contents of `exploration-results.md` into your Agent Context Document's `instructions` field. The output directory will be timestamped (e.g., `./explorer-output-2026-02-11T09-22-30/`). | ||
@@ -89,19 +89,16 @@ ## Question File Format | ||
| ### `dataset-knowledge.md` (Primary Output) | ||
| ### `exploration-results.md` (Primary Output) | ||
| A structured knowledge document with four sections: | ||
| **This is the file you copy into your Agent Context Document.** It contains LLM-ready instructions with five sections: | ||
| 1. **What This Dataset Contains** — Document types, reliable fields per type, and what is NOT in the dataset (data gaps, external dependencies) | ||
| 2. **How to Query (by Category)** — Self-contained query patterns per question category, with inline hazards and known failures | ||
| 3. **Traps (Global Hazards)** — Cross-cutting issues that affect multiple categories (field naming, query engine quirks, locale filtering) | ||
| 4. **Trust Boundaries** — What was explored vs. unknown territory, with confidence levels | ||
| 1. **Schema Reference** — Document types, what they're used for, and key fields | ||
| 2. **Query Patterns** — Working query examples organized by use case, with inline warnings | ||
| 3. **Critical Rules** — "Always do X" and "Never do Y" statements derived from exploration | ||
| 4. **Known Limitations** — What data is NOT available (null fields, external dependencies) | ||
| 5. **Exploration Coverage** — What was tested and confidence levels | ||
| **Failures are the most valuable output** — they document everything that would trip up a naive agent, preventing wrong answers and wasted queries. | ||
| The synthesis agent may add additional sections when findings naturally cluster (e.g., "Locale Handling" if locale issues were prominent). | ||
| This is the file you give to production agents. | ||
| **Failures are the most valuable output** — they document what would trip up a naive agent, preventing wrong answers and wasted queries. | ||
| ### `exploration-summary.md` | ||
| Human-readable summary of exploration results, useful for debugging. | ||
| ### `logs/*.json` | ||
@@ -116,4 +113,24 @@ | ||
| Aggregated statistics: success rates, confidence distribution, category coverage. | ||
| Aggregated statistics: success rates, confidence distribution, category coverage, and validation results. | ||
| ## Answer Validation | ||
| When you provide `expected_answer` in your questions, the explorer validates the agent's answer against yours using an LLM comparison. This appears in the CLI output as: | ||
| ``` | ||
| [1/12] ✓ Success (high confidence) | ||
| Validation: full | ||
| ``` | ||
| **Match levels:** | ||
| | Match | Meaning | | ||
| |-------|---------| | ||
| | `full` | Agent's answer conveys the same information as expected (even if worded differently) | | ||
| | `partial` | Answer contains some expected information but is missing parts | | ||
| | `none` | Answer is different or contradictory | | ||
| | `gap_identified` | Agent correctly determined the data doesn't exist in this dataset | | ||
| Validation results flow into the final `exploration-results.md` — full matches produce [High] confidence patterns, partial matches produce [Medium], and failures are documented in the Known Limitations section. | ||
| ## CLI Options | ||
@@ -143,39 +160,64 @@ | ||
| Here's a snippet from a generated knowledge document: | ||
| Here's a snippet from a generated `exploration-results.md`: | ||
| ```markdown | ||
| ## 1. What This Dataset Contains | ||
| ## Schema Reference | ||
| ### What is NOT in this dataset | ||
| | Field(s) | Where It Actually Lives | What Happens If You Query Here | | ||
| |----------|------------------------|-------------------------------| | ||
| | inventory, stockLevel | Shopify | Returns null — stock managed externally | | ||
| | pricing | Commerce API | Returns null — prices not in this dataset | | ||
| | Document Type | Use For | Key Fields | | ||
| |---------------|---------|------------| | ||
| | product | Product info, specs | name, description, specs, variants | | ||
| | category | Product categorization | title, slug, products[] | | ||
| | support-article | Help content | title, body, relatedProducts[] | | ||
| ## 2. How to Query (by Category) | ||
| ## Query Patterns | ||
| ### Category: product-specs | ||
| - **When to use**: User asks about product features, dimensions, or specifications | ||
| - **Recommended pattern**: `*[_type == "product" && name match "Trailblazer"][0]{name, specs, variants}` | ||
| - **What to avoid**: Do NOT use `title` — products use `name` (`title` is null). Do NOT query `inventory` — it is always null; stock is managed in Shopify. | ||
| ### Product Details | ||
| **When to use:** User asks about a specific product's features or specifications | ||
| ## 3. Traps (Global Hazards) | ||
| | Trap | Impact | Resolution | | ||
| |------|--------|------------| | ||
| | `title` is null on all products | Queries return empty results | Use `name` field instead | | ||
| | Locale duplicates | Results contain de, fr, etc. variants | Filter by `lang == "en-us"` | | ||
| ``` | ||
| *[_type == "product" && name match $productName][0]{ | ||
| name, description, specs, variants | ||
| } | ||
| ## Agent Skills | ||
| **Important:** Always use `name` not `title` — the `title` field is null on products. | ||
| After running the explorer, use the **optimize-agent-prompt** skill to integrate `dataset-knowledge.md` into your production agent's system prompt. The skill lives in the [sanity-io/agent-context](https://github.com/sanity-io/agent-context) repository alongside the build skill: | ||
| ### Product Comparison | ||
| **When to use:** User wants to compare two or more products | ||
| ```bash | ||
| npx skills add sanity-io/agent-context | ||
| *[_type == "product" && name in $productNames]{ | ||
| name, specs, variants | ||
| } | ||
| **Important:** Filter results by locale if your dataset has multiple language variants. | ||
| ## Critical Rules | ||
| - Always use `name` for product lookups — `title` is null on all product documents | ||
| - Always filter by locale when querying products to avoid duplicate results | ||
| - Never query `inventory` or `stockLevel` — these fields are always null (managed in external system) | ||
| ## Known Limitations | ||
| - Inventory and stock data lives in Shopify, not this dataset [High confidence] | ||
| - Pricing data lives in Commerce API [High confidence] | ||
| - The `title` field on products is always null — use `name` instead [High confidence] | ||
| ## Exploration Coverage | ||
| **Validated areas:** product specs, product comparison, category browsing, support content | ||
| **Confidence:** High — 12 questions explored with 92% success rate | ||
| **Not explored:** user reviews, order history, real-time inventory | ||
| ``` | ||
| This installs two [Vercel Agent Skills](https://vercel.com/docs/ai/agent-skills): | ||
| - **create-agent-with-sanity-context** — Build an agent from scratch (Studio setup, agent implementation, conversation classification) | ||
| - **optimize-agent-prompt** — Integrate `dataset-knowledge.md` findings into a production-quality system prompt | ||
| ## Using the Output | ||
| After running the explorer: | ||
| 1. Open `exploration-results.md` in your output directory | ||
| 2. Review the generated instructions — adjust if needed for your specific use case | ||
| 3. Copy the entire contents into your **Agent Context Document's `instructions` field** | ||
| This gives your agent dataset-specific knowledge from day one. | ||
| ## Requirements | ||
@@ -182,0 +224,0 @@ |
Sorry, the diff of this file is too big to display
Environment variable access
Supply chain risk: Package accesses environment variables, which may be a sign of credential stuffing or data theft.
Found 1 instance in 1 package
Filesystem access
Supply chain risk: Accesses the file system and could potentially read sensitive data.
Found 1 instance in 1 package
Long strings
Supply chain risk: Contains long string literals, which may be a sign of obfuscated or packed code.
Found 1 instance in 1 package
URL strings
Supply chain risk: Package contains fragments of external URLs or IP addresses, which the package may be accessing at runtime.
Found 1 instance in 1 package
Environment variable access
Supply chain risk: Package accesses environment variables, which may be a sign of credential stuffing or data theft.
Found 1 instance in 1 package
Filesystem access
Supply chain risk: Accesses the file system and could potentially read sensitive data.
Found 1 instance in 1 package
Long strings
Supply chain risk: Contains long string literals, which may be a sign of obfuscated or packed code.
Found 1 instance in 1 package
URL strings
Supply chain risk: Package contains fragments of external URLs or IP addresses, which the package may be accessing at runtime.
Found 1 instance in 1 package
228
22.58%144451
-1.56%1164
-1.1%12
9.09%