anymodel - npm Package Compare versions

Comparing version

1.15.0

1.16.0

+29

-8

LOCAL_SETUP.md

		@@ -225,2 +225,5 @@ # Running Claude Code locally through AnyModel → LMStudio
		| `LOCAL_SKILL_DESC_CHARS` | `140` | Max chars per skill description in the index |
		| `LOCAL_PROJECT_DIR` | cwd | Where the proxy reads `.claude/skills/` for project scope |
		| `LOCAL_SKILL_SCOPE` | derived | `project` \| `all` — override scope independent of tier |
		| `LOCAL_SKILL_ALWAYS` | sw:* essentials | Comma list of skills always kept in project scope |

		@@ -240,13 +243,31 @@ ## Skill auto-trigger on local models (`--local-fidelity`)

		| Tier | What it re-injects | Cold turn-1 TTFT | When to use |
		| Tier | Skill scope | Index size | When to use |
		|---|---|---|---|
		| `lean` | nothing (current pre-0010 behavior) | no change | latency purists; you don't use skills locally |
		| `balanced` (default) | curated behavioral core (~700 tok) + clipped skill index (≤~1000 tok, `whenToUse` dropped) | +~0.7-1.3 s, then ~0 ms (KV reuse) | the daily driver — skills auto-trigger |
		| `full` | richer index (keeps `whenToUse`, higher clamp) + fuller rules | +~1.7-3.3 s | 131 K ctx or the 80B model |
		| `lean` | — (nothing) | 0 | latency purists; you don't use skills locally |
		| `balanced` (default) | **project `.claude/skills` + sw:* workflow-core** | ~150-500 tok | the daily driver — relevant skills, small + cacheable |
		| `full` | whole harvested catalog | ~1-3 K tok | when you want every global/plugin skill available |

		Measured on M4 Max / qwen3-coder-30b MLX: `lean` triggers skills 0/12 of the time
		(catalog stripped); `balanced` triggers 9/12 (75%) with valid skill names. Run the
		harness yourself: `node test/skill-trigger-eval.mjs` (needs a running proxy). MCP
		suppression is unaffected by this flag — use `--full-mcp` for that.
		Scope matters more than budget (0016). Measured on a realistic local request (100
		skills, 90 tools, 6.7 KB system) on qwen3-coder-30b MLX, the prefix breaks down as
		tool schemas 7,757 tok (79%), system 917 tok (9%), skill index 1,147 tok (12%). So
		the skill index is the small cost — and `full` injects ~30 mostly-irrelevant global
		skills. `balanced` scopes the index to **your project's own skills + the SpecWeave
		workflow** (read from `LOCAL_PROJECT_DIR`, default cwd), dropping the rest: in the bench,
		2,812 tok (full) → 147 tok (project scope). It stays query-independent on purpose,
		so it lives in the cacheable prefix instead of busting the KV cache every turn.

		To get your project's skills, start the proxy from the project dir or pass the dir:
		```bash
		LOCAL_PROJECT_DIR=~/Projects/wc26 anymodel proxy lmstudio # balanced = wc26 skills + sw:*
		anymodel proxy lmstudio --local-fidelity full # or: every skill, bigger prefix
		```

		The real lever is tools, not skills. If local is still slow, the 90 tool schemas
		(7.7 K tok even after 68 % compression) dominate — cut them with `LOCAL_MAX_TOOLS` or a
		lower `LOCAL_TOOL_BUDGET_PCT` (default `0.30`). MCP suppression is separate — `--full-mcp`.

		Skill triggering on qwen3-coder-30b is real but variable (~50-75 % per run at default
		temperature). Run the harness yourself: `node test/skill-trigger-eval.mjs` (needs a
		running proxy; `full` tier to exercise the whole catalog).

		## The full three-command reference
		@@ -253,0 +274,0 @@
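
The prefix breakdown quoted in the LOCAL_SETUP.md hunk above can be re-derived from its own token counts. The sketch below is only a sanity check of that arithmetic; `prefixShares` is a hypothetical helper, not part of anymodel.

```javascript
// Token counts quoted in the LOCAL_SETUP.md hunk (100 skills, 90 tools, 6.7 KB system).
const prefix = { toolSchemas: 7757, system: 917, skillIndex: 1147 };

// Hypothetical helper: each component's share of the total prefix, rounded to whole percent.
function prefixShares(parts) {
  const total = Object.values(parts).reduce((a, b) => a + b, 0);
  const shares = {};
  for (const [name, tok] of Object.entries(parts)) {
    shares[name] = Math.round((tok / total) * 100);
  }
  return { total, shares };
}

const { total, shares } = prefixShares(prefix);
console.log(total, shares); // 9821 { toolSchemas: 79, system: 9, skillIndex: 12 }

// Index reduction reported for the bench: 2,812 tok (full) -> 147 tok (project scope).
const indexSavedPct = Math.round((1 - 147 / 2812) * 100);
console.log(indexSavedPct); // 95
```

Even the scoped-down index only removes a slice of the 12% skill share, which is consistent with the README's point that tool schemas are the larger lever.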

+1

-1

package.json

		{
		"name": "anymodel",
		"version": "1.15.0",
		"version": "1.16.0",
		"description": "Universal AI model proxy — route any coding tool through OpenRouter, Ollama, LMStudio, llama.cpp, or any LLM provider",
		@@ -5,0 +5,0 @@ "type": "module",

+5

-2

providers/openai-local.mjs

		@@ -14,3 +14,3 @@ // Factory for "local OpenAI-compatible" providers (LMStudio, llama-server, any
		import https from 'https';
		import { translateRequest, translateResponse, createStreamTranslator } from './openai.mjs';
		import { translateRequest, translateResponse, createStreamTranslator, isVisionModel } from './openai.mjs';

		@@ -53,3 +53,6 @@ export function makeOpenAILocalProvider({

		transformRequest: translateRequest,
		// US-003: gate image forwarding on the model's vision capability so a screenshot
		// sent to a non-vision local coding model degrades to a descriptive marker instead
		// of an image_url the model silently ignores (LOCAL_VISION overrides the heuristic).
		transformRequest: (body) => translateRequest(body, { visionCapable: isVisionModel(body?.model) }),
		// P0.2: mark responses as coming from a LOCAL provider so text-channel
		@@ -56,0 +59,0 @@ // tool-call recovery engages under ANYMODEL_PARSE_TEXT_TOOLCALLS=auto.
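
To see what the new `visionCapable` gate decides for a given model name, the name heuristic from `providers/openai.mjs` in this diff can be run in isolation. This is a sketch under assumptions: the regex is copied from the diff, but the mode is passed as an argument instead of being read from `process.env.LOCAL_VISION`, and the model names are illustrative only.

```javascript
// Same regex as isVisionModel in this diff. The `mode` parameter is an assumption
// made for easy experimentation; the real function reads process.env.LOCAL_VISION.
function isVisionModelSketch(model, mode = 'auto') {
  if (mode === 'on') return true;
  if (mode === 'off') return false;
  const m = String(model || '').toLowerCase();
  return /(?:^|[-_/])(?:vl|vision|llava|pixtral|moondream|internvl)|minicpm-?v|gemma-?[34]|llama-?3\.2-vision|qwen[\w.-]*-vl/.test(m);
}

// Illustrative model names, not a list taken from the package.
const samples = ['qwen3-coder-30b', 'qwen2.5-vl-7b', 'llava-1.6', 'deepseek-coder-v2', 'gemma-3-27b'];
for (const name of samples) {
  console.log(name, isVisionModelSketch(name));
}

// An explicit mode bypasses the heuristic entirely.
console.log(isVisionModelSketch('qwen3-coder-30b', 'on'));
```

One thing to keep in mind when reading the regex: the first alternative has no trailing boundary, so any name containing a separator followed by `vl` (for example a hypothetical `foo-vlarge`) is treated as vision-capable. Setting `LOCAL_VISION=off` is the override for such false positives.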

+41

-11

providers/openai.mjs

		@@ -21,2 +21,32 @@ // OpenAI provider for anymodel

		// US-003: a descriptive placeholder for an image we are NOT forwarding (non-vision
		// backend or unresolvable source). Includes the decoded byte size + mime when known
		// so a blind model still knows a screenshot was produced, instead of a bare marker
		// or a silent ''.
		export function imageMarker(b) {
		const src = b?.source;
		const mime = (src && typeof src === 'object') ? src.media_type : undefined;
		let bytes;
		if (src && src.type === 'base64' && typeof src.data === 'string') {
		bytes = Math.floor((src.data.length * 3) / 4); // base64 → approx decoded bytes
		} else if (src && src.type === 'url' && typeof src.url === 'string') {
		return mime ? `[image omitted: ${mime}, ${src.url}]` : `[image omitted: ${src.url}]`;
		}
		if (bytes != null && mime) return `[image omitted: ${bytes} bytes, ${mime}]`;
		if (mime) return `[image omitted: ${mime}]`;
		return '[image omitted]';
		}

		// US-003: decide whether the target model can consume image parts. `LOCAL_VISION`
		// (on|off|auto, default auto) overrides; auto matches known multimodal model-name
		// fragments. Coding models (qwen3-coder, deepseek-coder, …) are non-vision → images
		// become a descriptive marker rather than an image_url the model silently ignores.
		export function isVisionModel(model) {
		const mode = (process.env.LOCAL_VISION || 'auto').toLowerCase();
		if (mode === 'on') return true;
		if (mode === 'off') return false;
		const m = String(model || '').toLowerCase();
		return /(?:^|[-_/])(?:vl|vision|llava|pixtral|moondream|internvl)|minicpm-?v|gemma-?[34]|llama-?3\.2-vision|qwen[\w.-]*-vl/.test(m);
		}

		// P1.2: translate an Anthropic content-block array into OpenAI message content.
		@@ -28,3 +58,3 @@ // Returns a plain STRING when every block is text (keeps text-only turns
		// drop.
		export function blocksToOpenAIContent(blocks) {
		export function blocksToOpenAIContent(blocks, { visionCapable = true } = {}) {
		const parts = [];
		@@ -37,5 +67,5 @@ let hasImage = false;
		if (b.type === 'image') {
		const url = imageBlockToUrl(b);
		const url = visionCapable ? imageBlockToUrl(b) : null;
		if (url) { parts.push({ type: 'image_url', image_url: { url } }); hasImage = true; }
		else parts.push({ type: 'text', text: '[image omitted]' });
		else parts.push({ type: 'text', text: imageMarker(b) });
		continue;
		@@ -55,3 +85,3 @@ }
		// the model (OpenAI's tool role has no structured error field).
		export function extractToolResultParts(block) {
		export function extractToolResultParts(block, { visionCapable = true } = {}) {
		const imageUrls = [];
		@@ -66,5 +96,5 @@ let text;
		else if (b?.type === 'image') {
		const url = imageBlockToUrl(b);
		const url = visionCapable ? imageBlockToUrl(b) : null;
		if (url) imageUrls.push(url);
		else pieces.push('[image omitted]');
		else pieces.push(imageMarker(b));
		} else if (b?.type === 'document') pieces.push('[document omitted]');
		@@ -81,3 +111,3 @@ else if (typeof b?.text === 'string') pieces.push(b.text);

		export function translateRequest(anthropicBody) {
		export function translateRequest(anthropicBody, { visionCapable = true } = {}) {
		const openaiBody = {
		@@ -123,3 +153,3 @@ model: anthropicBody.model,
		// P1.3: preserve is_error marker; P1.2: hoist images (tool role is text-only)
		const { text, imageUrls } = extractToolResultParts(block);
		const { text, imageUrls } = extractToolResultParts(block, { visionCapable });
		openaiBody.messages.push({
		@@ -140,6 +170,6 @@ role: 'tool',
		// P1.2: a bare image alongside tool_result blocks — emit as its own user turn
		const url = imageBlockToUrl(block);
		const url = visionCapable ? imageBlockToUrl(block) : null;
		openaiBody.messages.push({
		role: 'user',
		content: url ? [{ type: 'image_url', image_url: { url } }] : '[image omitted]',
		content: url ? [{ type: 'image_url', image_url: { url } }] : imageMarker(block),
		});
		@@ -150,3 +180,3 @@ }
		// Regular user message with content blocks (P1.2: images → vision parts)
		openaiBody.messages.push({ role: 'user', content: blocksToOpenAIContent(msg.content) });
		openaiBody.messages.push({ role: 'user', content: blocksToOpenAIContent(msg.content, { visionCapable }) });
		}
		@@ -153,0 +183,0 @@ } else {
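
The marker strings that replace a non-forwarded image are easiest to understand by example. The sketch below copies `imageMarker` from the hunk above so it runs standalone; the sample blocks and the example.com URL are made up for illustration.

```javascript
// Copy of imageMarker from this diff, reproduced so the sketch is self-contained.
function imageMarker(b) {
  const src = b?.source;
  const mime = (src && typeof src === 'object') ? src.media_type : undefined;
  let bytes;
  if (src && src.type === 'base64' && typeof src.data === 'string') {
    bytes = Math.floor((src.data.length * 3) / 4); // base64 length -> approximate decoded bytes
  } else if (src && src.type === 'url' && typeof src.url === 'string') {
    return mime ? `[image omitted: ${mime}, ${src.url}]` : `[image omitted: ${src.url}]`;
  }
  if (bytes != null && mime) return `[image omitted: ${bytes} bytes, ${mime}]`;
  if (mime) return `[image omitted: ${mime}]`;
  return '[image omitted]';
}

// Hypothetical Anthropic-style image blocks.
const inlinePng = { type: 'image', source: { type: 'base64', media_type: 'image/png', data: 'AAAAAAAA' } };
const remote = { type: 'image', source: { type: 'url', url: 'https://example.com/shot.png' } };

console.log(imageMarker(inlinePng)); // [image omitted: 6 bytes, image/png]
console.log(imageMarker(remote));    // [image omitted: https://example.com/shot.png]
console.log(imageMarker({}));        // [image omitted]
```

The byte count is an estimate: it does not subtract base64 padding, so a payload ending in `=` is reported up to two bytes larger than its decoded size. A base64 source without a `media_type` falls through to the bare `[image omitted]` marker, because the size is only reported together with a mime type.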

+157

-13

providers/skill-catalog.mjs

		@@ -10,3 +10,10 @@ // providers/skill-catalog.mjs — increment 0010 (local skill-fidelity).
		// byte-stable for prefix-cache (KV) reuse.
		//
		// Increment 0016 adds project-SCOPING: on local providers the default index is
		// restricted to the project's own .claude/skills + a workflow-core, keeping it small
		// and query-independent (cacheable) instead of injecting ~30 irrelevant global skills.

		import { existsSync, readdirSync, statSync } from 'node:fs';
		import { join } from 'node:path';

		const CATALOG_HEADER = 'The following skills are available for use with the Skill tool:';
		@@ -21,2 +28,105 @@

		// Curated SpecWeave workflow essentials always kept in project scope (small + stable).
		// Overridable via the caller's alwaysInclude (LOCAL_SKILL_ALWAYS). Keeps the local index
		// relevant + cacheable instead of injecting ~30 irrelevant global skills every turn (0016).
		export const WORKFLOW_CORE = [
		'sw:increment', 'sw:do', 'sw:done', 'sw:pm', 'sw:architect',
		'sw:grill', 'sw:validate', 'sw:progress', 'sw:brainstorm', 'sw:code-reviewer',
		];

		const _projectSkillMemo = new Map(); // dir -> string[]

		/**
		* Names of project-local skills under `<dir>/.claude/skills/<name>/SKILL.md`. Memoized
		* per dir (one fs read per project dir, not per request). Best-effort: returns [] on a
		* missing/unreadable dir, never throws. (0016)
		*/
		export function readProjectSkillNames(dir) {
		if (!dir) return [];
		if (_projectSkillMemo.has(dir)) return _projectSkillMemo.get(dir);
		let names = [];
		try {
		const skillsDir = join(dir, '.claude', 'skills');
		if (existsSync(skillsDir)) {
		names = readdirSync(skillsDir).filter(n => {
		try { return statSync(join(skillsDir, n)).isDirectory() && existsSync(join(skillsDir, n, 'SKILL.md')); }
		catch { return false; }
		});
		}
		} catch { names = []; }
		_projectSkillMemo.set(dir, names);
		return names;
		}

		/** Test hook: clear the project-skill memo. */
		export function _resetProjectSkillMemo() { _projectSkillMemo.clear(); }

		// ── Per-session skill-catalog cache (0013 / US-001) ──
		// Claude Code injects the catalog <system-reminder> on the FIRST turn; later turns can
		// arrive without it. We cache the harvested catalog keyed by a stable session signature
		// (opening user prompt with reminder-tags stripped + the tool-name set) and re-inject on
		// turn 2+ so skills keep auto-triggering. Bounded by size + TTL.
		const SESSION_CACHE = new Map(); // key -> { skills, ts }
		const CACHE_MAX = 200;
		const CACHE_TTL_MS = 30 * 60 * 1000;

		function djb2(str) {
		let h = 5381;
		for (let i = 0; i < str.length; i++) h = ((h << 5) + h + str.charCodeAt(i)) | 0;
		return (h >>> 0).toString(36);
		}

		// Earliest user prompt with catalog/xml reminder blocks stripped, so the key is stable
		// whether or not THIS turn still carries the catalog.
		function firstUserNormalized(messages) {
		if (!Array.isArray(messages)) return '';
		for (const m of messages) {
		if (!m || m.role !== 'user') continue;
		let t = typeof m.content === 'string' ? m.content
		: Array.isArray(m.content) ? m.content.filter(b => b && b.type === 'text').map(b => b.text).join(' ') : '';
		if (!t) continue;
		t = t.replace(/<(?:system-reminder|functions|function)>[\s\S]*?<\/(?:system-reminder|functions|function)>/gi, '').trim();
		if (t) return t;
		}
		return '';
		}

		function sessionKey(messages, tools) {
		const first = firstUserNormalized(messages).slice(0, 2000);
		if (!first) return '';
		const toolNames = Array.isArray(tools) ? tools.map(t => t && t.name).filter(Boolean).sort().join(',') : '';
		return djb2(first + '|' + toolNames);
		}

		function cacheSet(key, skills) {
		if (!key || !skills || !skills.length) return;
		SESSION_CACHE.set(key, { skills, ts: Date.now() });
		while (SESSION_CACHE.size > CACHE_MAX) {
		SESSION_CACHE.delete(SESSION_CACHE.keys().next().value); // FIFO evict oldest
		}
		}

		function cacheGet(key) {
		const e = key && SESSION_CACHE.get(key);
		if (!e) return null;
		if (Date.now() - e.ts > CACHE_TTL_MS) { SESSION_CACHE.delete(key); return null; }
		return e.skills;
		}

		/** Test hooks (not used by production paths). */
		export function _resetSkillCatalogCache() { SESSION_CACHE.clear(); }
		export function _skillCatalogCacheSize() { return SESSION_CACHE.size; }
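
The eviction and expiry behavior of the session cache above can be exercised with a small standalone sketch. `makeSessionCache` is a hypothetical factory, not part of the package: it applies the same FIFO and TTL rules as `cacheSet`/`cacheGet`, but takes the size limit, TTL, and clock as parameters so the behavior shows up without waiting 30 minutes.

```javascript
// djb2 as in the hunk: h = h * 33 + charCode, wrapped to 32 bits, printed in base 36.
function djb2(str) {
  let h = 5381;
  for (let i = 0; i < str.length; i++) h = ((h << 5) + h + str.charCodeAt(i)) | 0;
  return (h >>> 0).toString(36);
}

// Hypothetical factory with injectable limits and clock (the source uses module-level
// constants CACHE_MAX = 200, CACHE_TTL_MS = 30 min, and Date.now()).
function makeSessionCache({ max = 200, ttlMs = 30 * 60 * 1000, now = Date.now } = {}) {
  const store = new Map(); // Map iterates in insertion order, so the first key is the oldest
  return {
    set(key, skills) {
      if (!key || !skills || !skills.length) return;
      store.set(key, { skills, ts: now() });
      while (store.size > max) store.delete(store.keys().next().value);
    },
    get(key) {
      const e = key && store.get(key);
      if (!e) return null;
      if (now() - e.ts > ttlMs) { store.delete(key); return null; }
      return e.skills;
    },
    size: () => store.size,
  };
}

// A session key is the hash of "<first user prompt>|<sorted tool names>".
console.log(djb2('fix the login bug' + '|' + 'Bash,Read,Skill'));

let clock = 0;
const cache = makeSessionCache({ max: 2, ttlMs: 1000, now: () => clock });
cache.set('a', [{ name: 'sw:do' }]);
cache.set('b', [{ name: 'sw:pm' }]);
cache.set('c', [{ name: 'sw:done' }]); // over capacity: 'a' (oldest insert) is evicted
console.log(cache.get('a'));           // null
clock = 2000;                          // advance the fake clock past the TTL
console.log(cache.get('b'));           // null (expired entry is also deleted)
```

Because a JavaScript `Map` keeps the original insertion position when an existing key is set again, refreshing a session's entry does not protect it from eviction: the policy is first-in-first-out by first insertion, not least-recently-used.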

		// Cheap presence check: does the request still carry the Claude Code skill catalog?
		export function hasSkillCatalog(messages) {
		return flattenText(messages).includes(CATALOG_HEADER);
		}

		// US-001 self-check: re-injection is OFF (LOCAL_FIDELITY=lean) yet the request carries a
		// Skill tool + catalog → the proxy is about to strip skills without restoring them (the
		// 1.14.1 trim-without-restore failure mode). Pure; the proxy adds the once-guard + log.
		export function shouldWarnTrimWithoutRestore({ fidelity, hasSkillTool, catalogPresent } = {}) {
		return fidelity === 'lean' && Boolean(hasSkillTool) && Boolean(catalogPresent);
		}

		function flattenText(messages) {
		@@ -156,9 +266,12 @@ if (!Array.isArray(messages)) return '';
		systemPct = 0.08,
		scope = null, // 'project' | 'all'; null → derive (full→all, else→project) (0016)
		projectDir = null, // where to read .claude/skills for project scope
		alwaysInclude = null, // names always kept in project scope; null → WORKFLOW_CORE
		tools = null, // tool defs — part of the session-cache key (0013/US-001)
		} = {}) {
		if (fidelity === 'lean') return { addition: '', injected: 0, rawCount: 0 };
		const effScope = scope || (fidelity === 'full' ? 'all' : 'project');
		const always = alwaysInclude || WORKFLOW_CORE;

		const parts = [];
		const core = buildBehavioralCore(fidelity);
		if (core) parts.push(core);

		let block = '';
		let injected = 0;
		@@ -170,15 +283,43 @@ let rawCount = 0;
		rawCount = harvested.rawCount;
		if (harvested.skills.length) {

		// 0013/US-001: cache the RAW catalog on turn 1 (Claude Code sends it once); restore on
		// turn 2+ when this turn arrives without it, so skills survive the whole session. The
		// 0016 project-scoping filter below then applies identically to harvested or cached skills.
		const key = sessionKey(messages, tools);
		let skills = harvested.skills;
		if (skills.length) cacheSet(key, skills);
		else { const cached = cacheGet(key); if (cached) skills = cached; }

		const projectSkills = effScope === 'project' ? readProjectSkillNames(projectDir) : [];
		if (effScope === 'project') {
		// Restrict to project skills + workflow-core → small AND query-independent (cacheable),
		// instead of injecting the whole global catalog every turn.
		const allow = new Set([...projectSkills, ...always].map(s => s.toLowerCase()));
		skills = skills.filter(s => allow.has(s.name.toLowerCase()));
		}

		if (skills.length) {
		const ctxBudgetChars = Math.floor(numCtx * systemPct) * 4;
		const budgetChars = fidelity === 'full'
		? Math.min(Math.max(ctxBudgetChars, 4000), 16000)
		: Math.min(4000, Math.max(ctxBudgetChars, 2000));
		const { block, kept } = selectSkills(harvested.skills, {
		const budgetChars = effScope === 'project'
		? 1500 // tight, stable
		: (fidelity === 'full'
		? Math.min(Math.max(ctxBudgetChars, 4000), 16000)
		: Math.min(4000, Math.max(ctxBudgetChars, 2000)));
		const sel = selectSkills(skills, {
		budgetChars,
		query: latestUserText(messages),
		// project scope is query-INDEPENDENT so the block stays in the cacheable prefix.
		query: effScope === 'project' ? '' : latestUserText(messages),
		fidelity,
		projectSkills,
		});
		if (block) { parts.push(block); injected = kept; }
		if (sel.block) { block = sel.block; injected = sel.kept; }
		}
		}

		// Build the behavioral core AFTER we know whether a skill block exists, so US-001's
		// "available skills listed below" reference is not dangled when none is injected.
		const parts = [];
		const core = buildBehavioralCore(fidelity, { hasSkills: Boolean(block) });
		if (core) parts.push(core);
		if (block) parts.push(block);
		return { addition: parts.join('\n\n'), injected, rawCount };
		@@ -190,3 +331,3 @@ }
		*/
		export function buildBehavioralCore(fidelity = 'balanced') {
		export function buildBehavioralCore(fidelity = 'balanced', { hasSkills = true } = {}) {
		if (fidelity === 'lean') return '';
		@@ -197,4 +338,7 @@ const core = [
		'Plan before acting on multi-step work; satisfy dependencies before dependent steps; verify changes before claiming success.',
		'SKILLS: When a user request matches one of the available skills listed below, calling the Skill tool with that skill name is a BLOCKING REQUIREMENT — call Skill FIRST, before any other response or tool use. "simple", "quick", and "basic" are NOT opt-out phrases.',
		];
		// US-001: only reference the skill index when one will actually follow.
		if (hasSkills) {
		core.push('SKILLS: When a user request matches one of the available skills listed below, calling the Skill tool with that skill name is a BLOCKING REQUIREMENT — call Skill FIRST, before any other response or tool use. "simple", "quick", and "basic" are NOT opt-out phrases.');
		}
		if (fidelity === 'full') {
		@@ -201,0 +345,0 @@ core.push('Prefer reusing existing functions and patterns over writing new code. Match the surrounding code style. Never invent file paths or APIs — verify they exist before referencing them.');
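
The scoping rules added in increment 0016 (scope derivation, the allow-list filter, and the character budget) can be summarized in a standalone sketch. The three helper functions below are hypothetical names wrapping expressions copied from the hunks above, and the skill names in the sample catalog are invented for illustration.

```javascript
const WORKFLOW_CORE = [
  'sw:increment', 'sw:do', 'sw:done', 'sw:pm', 'sw:architect',
  'sw:grill', 'sw:validate', 'sw:progress', 'sw:brainstorm', 'sw:code-reviewer',
];

// Explicit scope wins; otherwise `full` means the whole catalog and anything else means project scope.
function effectiveScope(scope, fidelity) {
  return scope || (fidelity === 'full' ? 'all' : 'project');
}

// Project scope keeps only project skills plus the always-list, compared case-insensitively.
function scopeSkills(skills, { scope, projectSkills = [], always = WORKFLOW_CORE }) {
  if (scope !== 'project') return skills;
  const allow = new Set([...projectSkills, ...always].map(s => s.toLowerCase()));
  return skills.filter(s => allow.has(s.name.toLowerCase()));
}

// Fixed 1500 chars in project scope; otherwise derived from the context size and clamped.
function budgetChars({ scope, fidelity, numCtx, systemPct = 0.08 }) {
  const ctxBudgetChars = Math.floor(numCtx * systemPct) * 4;
  if (scope === 'project') return 1500;
  return fidelity === 'full'
    ? Math.min(Math.max(ctxBudgetChars, 4000), 16000)
    : Math.min(4000, Math.max(ctxBudgetChars, 2000));
}

// Invented catalog: one workflow skill, one project skill, two global skills.
const harvested = [{ name: 'sw:do' }, { name: 'Deploy-Preview' }, { name: 'pdf' }, { name: 'frontend-design' }];
const scope = effectiveScope(null, 'balanced');
const kept = scopeSkills(harvested, { scope, projectSkills: ['deploy-preview'] });

console.log(scope);                  // project
console.log(kept.map(s => s.name));  // [ 'sw:do', 'Deploy-Preview' ]
console.log(budgetChars({ scope, fidelity: 'balanced', numCtx: 32768 }));        // 1500
console.log(budgetChars({ scope: 'all', fidelity: 'full', numCtx: 32768 }));     // 10484
console.log(budgetChars({ scope: 'all', fidelity: 'full', numCtx: 131072 }));    // 16000
```

In project scope neither the filter nor the budget depends on the latest user message, which is what lets the injected block stay byte-stable across turns and remain in the cacheable prefix.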

cli.js

Sorry, the diff of this file is too big to display

proxy.mjs

Sorry, the diff of this file is too big to display
