anymodel - npm Package Compare versions

Comparing version 1.15.0 to 1.16.0

LOCAL_SETUP.md (+29 -8)

@@ -225,2 +225,5 @@ # Running Claude Code locally through AnyModel → LMStudio

| `LOCAL_SKILL_DESC_CHARS` | `140` | Max chars per skill description in the index |
| `LOCAL_PROJECT_DIR` | cwd | Where the proxy reads `.claude/skills/` for project scope |
| `LOCAL_SKILL_SCOPE` | derived | `project` \| `all` — override scope independent of tier |
| `LOCAL_SKILL_ALWAYS` | sw:* essentials | Comma list of skills always kept in project scope |

@@ -240,13 +243,31 @@ ## Skill auto-trigger on local models (`--local-fidelity`)

| Tier | What it re-injects | Cold turn-1 TTFT | When to use |
| Tier | Skill scope | Index size | When to use |
|---|---|---|---|
| `lean` | nothing (current pre-0010 behavior) | no change | latency purists; you don't use skills locally |
| `balanced` *(default)* | curated behavioral core (~700 tok) + clipped skill index (≤~1000 tok, `whenToUse` dropped) | +~0.7-1.3 s, then ~0 ms (KV reuse) | the daily driver — skills auto-trigger |
| `full` | richer index (keeps `whenToUse`, higher clamp) + fuller rules | +~1.7-3.3 s | 131 K ctx or the 80B model |
| `lean` | — (nothing) | 0 | latency purists; you don't use skills locally |
| `balanced` *(default)* | **project `.claude/skills` + sw:* workflow-core** | ~150-500 tok | the daily driver — relevant skills, small + cacheable |
| `full` | whole harvested catalog | ~1-3 K tok | when you want every global/plugin skill available |
Measured on M4 Max / qwen3-coder-30b MLX: `lean` triggers skills **0/12** of the time
(catalog stripped); `balanced` triggers **9/12 (75%)** with valid skill names. Run the
harness yourself: `node test/skill-trigger-eval.mjs` (needs a running proxy). MCP
suppression is unaffected by this flag — use `--full-mcp` for that.
**Scope matters more than budget (0016).** Measured on a realistic local request (100
skills, 90 tools, 6.7 KB system) on qwen3-coder-30b MLX, the prefix breaks down as
**tool schemas 7,757 tok (79%)**, system 917 tok (9%), skill index 1,147 tok (12%). So
the skill index is the *small* cost — and `full` injects ~30 mostly-irrelevant global
skills. `balanced` scopes the index to **your project's own skills + the SpecWeave
workflow** (read from `LOCAL_PROJECT_DIR`, default cwd), dropping the rest: in the bench,
**2,812 tok (full) → 147 tok (project scope)**. It stays *query-independent* on purpose,
so it lives in the cacheable prefix instead of busting the KV cache every turn.
To get your project's skills, start the proxy from the project dir or pass the dir:
```bash
LOCAL_PROJECT_DIR=~/Projects/wc26 anymodel proxy lmstudio # balanced = wc26 skills + sw:*
anymodel proxy lmstudio --local-fidelity full # or: every skill, bigger prefix
```
**The real lever is tools, not skills.** If local is still slow, the 90 tool schemas
(7.7 K tok even after 68 % compression) dominate — cut them with `LOCAL_MAX_TOOLS` or a
lower `LOCAL_TOOL_BUDGET_PCT` (default `0.30`). MCP suppression is separate — `--full-mcp`.
Skill triggering on qwen3-coder-30b is real but variable (~50-75 % per run at default
temperature). Run the harness yourself: `node test/skill-trigger-eval.mjs` (needs a
running proxy; `full` tier to exercise the whole catalog).
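The percentages quoted in this section follow from the three token counts it reports. A minimal sketch of that arithmetic, using only the figures stated above; the variable names are illustrative and not part of anymodel:

```javascript
// Token counts quoted in the section above (qwen3-coder-30b MLX bench).
const prefix = { toolSchemas: 7757, system: 917, skillIndex: 1147 };
const total = Object.values(prefix).reduce((sum, n) => sum + n, 0);

// Share of the prompt prefix per component, rounded to a whole percent.
const share = Object.fromEntries(
  Object.entries(prefix).map(([name, tok]) => [name, Math.round((tok / total) * 100)])
);

// Relative shrink of the skill index going from the full catalog (2,812 tok)
// to project scope (147 tok), as reported in the bench.
const indexReductionPct = Math.round((1 - 147 / 2812) * 100);

console.log({ total, share, indexReductionPct });
```

Even removing the skill index entirely would leave roughly 88% of the prefix in place, which is why the section points at `LOCAL_MAX_TOOLS` and `LOCAL_TOOL_BUDGET_PCT` as the larger lever.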
## The full three-command reference

@@ -253,0 +274,0 @@

{
"name": "anymodel",
"version": "1.15.0",
"version": "1.16.0",
"description": "Universal AI model proxy — route any coding tool through OpenRouter, Ollama, LMStudio, llama.cpp, or any LLM provider",

@@ -5,0 +5,0 @@ "type": "module",

@@ -14,3 +14,3 @@ // Factory for "local OpenAI-compatible" providers (LMStudio, llama-server, any

import https from 'https';
import { translateRequest, translateResponse, createStreamTranslator } from './openai.mjs';
import { translateRequest, translateResponse, createStreamTranslator, isVisionModel } from './openai.mjs';

@@ -53,3 +53,6 @@ export function makeOpenAILocalProvider({

transformRequest: translateRequest,
// US-003: gate image forwarding on the model's vision capability so a screenshot
// sent to a non-vision local coding model degrades to a descriptive marker instead
// of an image_url the model silently ignores (LOCAL_VISION overrides the heuristic).
transformRequest: (body) => translateRequest(body, { visionCapable: isVisionModel(body?.model) }),
// P0.2: mark responses as coming from a LOCAL provider so text-channel

@@ -56,0 +59,0 @@ // tool-call recovery engages under ANYMODEL_PARSE_TEXT_TOOLCALLS=auto.

@@ -21,2 +21,32 @@ // OpenAI provider for anymodel

// US-003: a descriptive placeholder for an image we are NOT forwarding (non-vision
// backend or unresolvable source). Includes the decoded byte size + mime when known
// so a blind model still knows a screenshot was produced, instead of a bare marker
// or a silent ''.
export function imageMarker(b) {
const src = b?.source;
const mime = (src && typeof src === 'object') ? src.media_type : undefined;
let bytes;
if (src && src.type === 'base64' && typeof src.data === 'string') {
bytes = Math.floor((src.data.length * 3) / 4); // base64 → approx decoded bytes
} else if (src && src.type === 'url' && typeof src.url === 'string') {
return mime ? `[image omitted: ${mime}, ${src.url}]` : `[image omitted: ${src.url}]`;
}
if (bytes != null && mime) return `[image omitted: ${bytes} bytes, ${mime}]`;
if (mime) return `[image omitted: ${mime}]`;
return '[image omitted]';
}
// US-003: decide whether the target model can consume image parts. `LOCAL_VISION`
// (on|off|auto, default auto) overrides; auto matches known multimodal model-name
// fragments. Coding models (qwen3-coder, deepseek-coder, …) are non-vision → images
// become a descriptive marker rather than an image_url the model silently ignores.
export function isVisionModel(model) {
const mode = (process.env.LOCAL_VISION || 'auto').toLowerCase();
if (mode === 'on') return true;
if (mode === 'off') return false;
const m = String(model || '').toLowerCase();
return /(?:^|[-_/])(?:vl|vision|llava|pixtral|moondream|internvl)|minicpm-?v|gemma-?[34]|llama-?3\.2-vision|qwen[\w.-]*-vl/.test(m);
}
// P1.2: translate an Anthropic content-block array into OpenAI message content.
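To see what the `auto` heuristic in this hunk accepts, the sketch below copies the pattern from `isVisionModel` and runs it against a few model names. The helper `looksLikeVisionModel`, its explicit `mode` argument (standing in for `LOCAL_VISION`), and the sample names are assumptions made for illustration. It also shows that the byte count in `imageMarker` is an approximation: the `length * 3 / 4` estimate does not subtract base64 padding.

```javascript
// Pattern copied from isVisionModel's auto mode in the hunk above.
const VISION_RE = /(?:^|[-_/])(?:vl|vision|llava|pixtral|moondream|internvl)|minicpm-?v|gemma-?[34]|llama-?3\.2-vision|qwen[\w.-]*-vl/;

// Hypothetical helper: takes the mode as an argument instead of reading process.env.
function looksLikeVisionModel(model, mode = 'auto') {
  if (mode === 'on') return true;
  if (mode === 'off') return false;
  return VISION_RE.test(String(model || '').toLowerCase());
}

// Illustrative model names only; not an authoritative capability list.
const samples = ['qwen3-coder-30b', 'qwen2.5-vl-7b', 'llava-1.6', 'gemma-3-27b', 'deepseek-coder-v2'];
const verdicts = Object.fromEntries(samples.map((name) => [name, looksLikeVisionModel(name)]));
console.log(verdicts);

// Same size estimate as imageMarker: base64 length * 3 / 4, padding not subtracted.
const approxBytes = (b64) => Math.floor((b64.length * 3) / 4);
const b64 = Buffer.from('hello').toString('base64'); // 5 real bytes, 8 base64 chars
console.log(approxBytes(b64), Buffer.from(b64, 'base64').length);
```

Because the match is purely name-based, a multimodal model published under an unlisted name would be treated as text-only in `auto` mode; `LOCAL_VISION=on` is the override the diff provides for that case.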

@@ -28,3 +58,3 @@ // Returns a plain STRING when every block is text (keeps text-only turns

// drop.
export function blocksToOpenAIContent(blocks) {
export function blocksToOpenAIContent(blocks, { visionCapable = true } = {}) {
const parts = [];

@@ -37,5 +67,5 @@ let hasImage = false;

if (b.type === 'image') {
const url = imageBlockToUrl(b);
const url = visionCapable ? imageBlockToUrl(b) : null;
if (url) { parts.push({ type: 'image_url', image_url: { url } }); hasImage = true; }
else parts.push({ type: 'text', text: '[image omitted]' });
else parts.push({ type: 'text', text: imageMarker(b) });
continue;

@@ -55,3 +85,3 @@ }

// the model (OpenAI's tool role has no structured error field).
export function extractToolResultParts(block) {
export function extractToolResultParts(block, { visionCapable = true } = {}) {
const imageUrls = [];

@@ -66,5 +96,5 @@ let text;

else if (b?.type === 'image') {
const url = imageBlockToUrl(b);
const url = visionCapable ? imageBlockToUrl(b) : null;
if (url) imageUrls.push(url);
else pieces.push('[image omitted]');
else pieces.push(imageMarker(b));
} else if (b?.type === 'document') pieces.push('[document omitted]');

@@ -81,3 +111,3 @@ else if (typeof b?.text === 'string') pieces.push(b.text);

export function translateRequest(anthropicBody) {
export function translateRequest(anthropicBody, { visionCapable = true } = {}) {
const openaiBody = {

@@ -123,3 +153,3 @@ model: anthropicBody.model,

// P1.3: preserve is_error marker; P1.2: hoist images (tool role is text-only)
const { text, imageUrls } = extractToolResultParts(block);
const { text, imageUrls } = extractToolResultParts(block, { visionCapable });
openaiBody.messages.push({

@@ -140,6 +170,6 @@ role: 'tool',

// P1.2: a bare image alongside tool_result blocks — emit as its own user turn
const url = imageBlockToUrl(block);
const url = visionCapable ? imageBlockToUrl(block) : null;
openaiBody.messages.push({
role: 'user',
content: url ? [{ type: 'image_url', image_url: { url } }] : '[image omitted]',
content: url ? [{ type: 'image_url', image_url: { url } }] : imageMarker(block),
});

@@ -150,3 +180,3 @@ }

// Regular user message with content blocks (P1.2: images → vision parts)
openaiBody.messages.push({ role: 'user', content: blocksToOpenAIContent(msg.content) });
openaiBody.messages.push({ role: 'user', content: blocksToOpenAIContent(msg.content, { visionCapable }) });
}

@@ -153,0 +183,0 @@ } else {

@@ -10,3 +10,10 @@ // providers/skill-catalog.mjs — increment 0010 (local skill-fidelity).

// byte-stable for prefix-cache (KV) reuse.
//
// Increment 0016 adds project-SCOPING: on local providers the default index is
// restricted to the project's own .claude/skills + a workflow-core, keeping it small
// and query-independent (cacheable) instead of injecting ~30 irrelevant global skills.
import { existsSync, readdirSync, statSync } from 'node:fs';
import { join } from 'node:path';
const CATALOG_HEADER = 'The following skills are available for use with the Skill tool:';

@@ -21,2 +28,105 @@

// Curated SpecWeave workflow essentials always kept in project scope (small + stable).
// Overridable via the caller's alwaysInclude (LOCAL_SKILL_ALWAYS). Keeps the local index
// relevant + cacheable instead of injecting ~30 irrelevant global skills every turn (0016).
export const WORKFLOW_CORE = [
'sw:increment', 'sw:do', 'sw:done', 'sw:pm', 'sw:architect',
'sw:grill', 'sw:validate', 'sw:progress', 'sw:brainstorm', 'sw:code-reviewer',
];
const _projectSkillMemo = new Map(); // dir -> string[]
/**
* Names of project-local skills under `<dir>/.claude/skills/<name>/SKILL.md`. Memoized
* per dir (one fs read per project dir, not per request). Best-effort: returns [] on a
* missing/unreadable dir, never throws. (0016)
*/
export function readProjectSkillNames(dir) {
if (!dir) return [];
if (_projectSkillMemo.has(dir)) return _projectSkillMemo.get(dir);
let names = [];
try {
const skillsDir = join(dir, '.claude', 'skills');
if (existsSync(skillsDir)) {
names = readdirSync(skillsDir).filter(n => {
try { return statSync(join(skillsDir, n)).isDirectory() && existsSync(join(skillsDir, n, 'SKILL.md')); }
catch { return false; }
});
}
} catch { names = []; }
_projectSkillMemo.set(dir, names);
return names;
}
/** Test hook: clear the project-skill memo. */
export function _resetProjectSkillMemo() { _projectSkillMemo.clear(); }
// ── Per-session skill-catalog cache (0013 / US-001) ──
// Claude Code injects the catalog <system-reminder> on the FIRST turn; later turns can
// arrive without it. We cache the harvested catalog keyed by a stable session signature
// (opening user prompt with reminder-tags stripped + the tool-name set) and re-inject on
// turn 2+ so skills keep auto-triggering. Bounded by size + TTL.
const SESSION_CACHE = new Map(); // key -> { skills, ts }
const CACHE_MAX = 200;
const CACHE_TTL_MS = 30 * 60 * 1000;
function djb2(str) {
let h = 5381;
for (let i = 0; i < str.length; i++) h = ((h << 5) + h + str.charCodeAt(i)) | 0;
return (h >>> 0).toString(36);
}
// Earliest user prompt with catalog/xml reminder blocks stripped, so the key is stable
// whether or not THIS turn still carries the catalog.
function firstUserNormalized(messages) {
if (!Array.isArray(messages)) return '';
for (const m of messages) {
if (!m || m.role !== 'user') continue;
let t = typeof m.content === 'string' ? m.content
: Array.isArray(m.content) ? m.content.filter(b => b && b.type === 'text').map(b => b.text).join(' ') : '';
if (!t) continue;
t = t.replace(/<(?:system-reminder|functions|function)>[\s\S]*?<\/(?:system-reminder|functions|function)>/gi, '').trim();
if (t) return t;
}
return '';
}
function sessionKey(messages, tools) {
const first = firstUserNormalized(messages).slice(0, 2000);
if (!first) return '';
const toolNames = Array.isArray(tools) ? tools.map(t => t && t.name).filter(Boolean).sort().join(',') : '';
return djb2(first + '|' + toolNames);
}
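The point of stripping reminder blocks before hashing is that a turn carrying the catalog and a later turn without it map to the same key. A reduced sketch of that property follows; `demoKey` is a hypothetical simplification that only handles string prompts, and the sample prompts are made up for illustration:

```javascript
// Pattern copied from firstUserNormalized in the hunk above.
const REMINDER_RE = /<(?:system-reminder|functions|function)>[\s\S]*?<\/(?:system-reminder|functions|function)>/gi;

// djb2 as defined in the hunk above.
function djb2(str) {
  let h = 5381;
  for (let i = 0; i < str.length; i++) h = ((h << 5) + h + str.charCodeAt(i)) | 0;
  return (h >>> 0).toString(36);
}

// Hypothetical reduced key: strip reminders, trim, cap at 2000 chars, add sorted tool names.
function demoKey(prompt, toolNames) {
  const first = prompt.replace(REMINDER_RE, '').trim().slice(0, 2000);
  if (!first) return '';
  return djb2(first + '|' + [...toolNames].sort().join(','));
}

const withCatalog = '<system-reminder>The following skills are available...</system-reminder>\nfix the failing test';
const withoutCatalog = 'fix the failing test';

// Tool order differs on purpose: sorting makes the key order-independent.
const sameKey = demoKey(withCatalog, ['Skill', 'Bash']) === demoKey(withoutCatalog, ['Bash', 'Skill']);
console.log(sameKey);
```

A prompt that consists only of reminder blocks normalizes to an empty string and yields an empty key, which `cacheSet` and `cacheGet` in the diff treat as "do not cache".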
function cacheSet(key, skills) {
if (!key || !skills || !skills.length) return;
SESSION_CACHE.set(key, { skills, ts: Date.now() });
while (SESSION_CACHE.size > CACHE_MAX) {
SESSION_CACHE.delete(SESSION_CACHE.keys().next().value); // FIFO evict oldest
}
}
function cacheGet(key) {
const e = key && SESSION_CACHE.get(key);
if (!e) return null;
if (Date.now() - e.ts > CACHE_TTL_MS) { SESSION_CACHE.delete(key); return null; }
return e.skills;
}
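The eviction in `cacheSet` relies on `Map` iterating in insertion order, so `keys().next().value` is the oldest inserted key. One consequence worth noting is that this is FIFO rather than LRU: writing to an existing key does not move it to the back. A miniature version, with an illustrative limit of 3 instead of `CACHE_MAX`, shows both behaviours:

```javascript
// Hypothetical miniature of the bounded cache above; MAX = 3 is illustrative.
const cache = new Map();
const MAX = 3;

function boundedSet(key, value) {
  cache.set(key, value);
  // Evict from the front (oldest insertion) until the size limit holds.
  while (cache.size > MAX) cache.delete(cache.keys().next().value);
}

for (const k of ['a', 'b', 'c', 'd']) boundedSet(k, k.toUpperCase());
const afterOverflow = [...cache.keys()]; // 'a' was inserted first, so it is gone

boundedSet('b', 'refreshed'); // existing key: value changes, position does not
boundedSet('e', 'E');         // overflow again: 'b' is still the oldest entry
const afterRefresh = [...cache.keys()];

console.log(afterOverflow, afterRefresh);
```

For a session cache with a 30-minute TTL this is probably an acceptable trade-off; an LRU variant would need a `delete` before each `set` to move the key to the end.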
/** Test hooks (not used by production paths). */
export function _resetSkillCatalogCache() { SESSION_CACHE.clear(); }
export function _skillCatalogCacheSize() { return SESSION_CACHE.size; }
// Cheap presence check: does the request still carry the Claude Code skill catalog?
export function hasSkillCatalog(messages) {
return flattenText(messages).includes(CATALOG_HEADER);
}
// US-001 self-check: re-injection is OFF (LOCAL_FIDELITY=lean) yet the request carries a
// Skill tool + catalog → the proxy is about to strip skills without restoring them (the
// 1.14.1 trim-without-restore failure mode). Pure; the proxy adds the once-guard + log.
export function shouldWarnTrimWithoutRestore({ fidelity, hasSkillTool, catalogPresent } = {}) {
return fidelity === 'lean' && Boolean(hasSkillTool) && Boolean(catalogPresent);
}
function flattenText(messages) {

@@ -156,9 +266,12 @@ if (!Array.isArray(messages)) return '';

systemPct = 0.08,
scope = null, // 'project' | 'all'; null → derive (full→all, else→project) (0016)
projectDir = null, // where to read .claude/skills for project scope
alwaysInclude = null, // names always kept in project scope; null → WORKFLOW_CORE
tools = null, // tool defs — part of the session-cache key (0013/US-001)
} = {}) {
if (fidelity === 'lean') return { addition: '', injected: 0, rawCount: 0 };
const effScope = scope || (fidelity === 'full' ? 'all' : 'project');
const always = alwaysInclude || WORKFLOW_CORE;
const parts = [];
const core = buildBehavioralCore(fidelity);
if (core) parts.push(core);
let block = '';
let injected = 0;

@@ -170,15 +283,43 @@ let rawCount = 0;

rawCount = harvested.rawCount;
if (harvested.skills.length) {
// 0013/US-001: cache the RAW catalog on turn 1 (Claude Code sends it once); restore on
// turn 2+ when this turn arrives without it, so skills survive the whole session. The
// 0016 project-scoping filter below then applies identically to harvested or cached skills.
const key = sessionKey(messages, tools);
let skills = harvested.skills;
if (skills.length) cacheSet(key, skills);
else { const cached = cacheGet(key); if (cached) skills = cached; }
const projectSkills = effScope === 'project' ? readProjectSkillNames(projectDir) : [];
if (effScope === 'project') {
// Restrict to project skills + workflow-core → small AND query-independent (cacheable),
// instead of injecting the whole global catalog every turn.
const allow = new Set([...projectSkills, ...always].map(s => s.toLowerCase()));
skills = skills.filter(s => allow.has(s.name.toLowerCase()));
}
if (skills.length) {
const ctxBudgetChars = Math.floor(numCtx * systemPct) * 4;
const budgetChars = fidelity === 'full'
? Math.min(Math.max(ctxBudgetChars, 4000), 16000)
: Math.min(4000, Math.max(ctxBudgetChars, 2000));
const { block, kept } = selectSkills(harvested.skills, {
const budgetChars = effScope === 'project'
? 1500 // tight, stable
: (fidelity === 'full'
? Math.min(Math.max(ctxBudgetChars, 4000), 16000)
: Math.min(4000, Math.max(ctxBudgetChars, 2000)));
const sel = selectSkills(skills, {
budgetChars,
query: latestUserText(messages),
// project scope is query-INDEPENDENT so the block stays in the cacheable prefix.
query: effScope === 'project' ? '' : latestUserText(messages),
fidelity,
projectSkills,
});
if (block) { parts.push(block); injected = kept; }
if (sel.block) { block = sel.block; injected = sel.kept; }
}
}
// Build the behavioral core AFTER we know whether a skill block exists, so US-001's
// "available skills listed below" reference is not dangled when none is injected.
const parts = [];
const core = buildBehavioralCore(fidelity, { hasSkills: Boolean(block) });
if (core) parts.push(core);
if (block) parts.push(block);
return { addition: parts.join('\n\n'), injected, rawCount };
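The scope derivation and character budget in this hunk can be checked in isolation. The sketch below restates that logic as a standalone function; `skillBudget` and its return shape are hypothetical, and reading the `* 4` factor as roughly four characters per token is an assumption, since the diff does not say so.

```javascript
// Hypothetical standalone restatement of the scope + budget logic in the hunk above.
function skillBudget({ fidelity = 'balanced', scope = null, numCtx, systemPct = 0.08 }) {
  // Explicit scope wins; otherwise full -> all, anything else -> project.
  const effScope = scope || (fidelity === 'full' ? 'all' : 'project');
  // Assumption: the factor 4 approximates characters per token.
  const ctxBudgetChars = Math.floor(numCtx * systemPct) * 4;
  let budgetChars;
  if (effScope === 'project') {
    budgetChars = 1500; // fixed, so the block stays byte-stable across turns
  } else if (fidelity === 'full') {
    budgetChars = Math.min(Math.max(ctxBudgetChars, 4000), 16000); // clamp to [4000, 16000]
  } else {
    budgetChars = Math.min(4000, Math.max(ctxBudgetChars, 2000)); // clamp to [2000, 4000]
  }
  return { scope: effScope, budgetChars };
}

console.log(skillBudget({ fidelity: 'balanced', numCtx: 32768 }));
console.log(skillBudget({ fidelity: 'full', numCtx: 32768 }));
console.log(skillBudget({ fidelity: 'full', numCtx: 131072 }));
console.log(skillBudget({ fidelity: 'balanced', scope: 'all', numCtx: 8192 }));
```

In project scope the budget no longer depends on `numCtx`, which matches the stated goal of keeping the injected block identical from turn to turn so it can stay in the cached prefix.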

@@ -190,3 +331,3 @@ }

*/
export function buildBehavioralCore(fidelity = 'balanced') {
export function buildBehavioralCore(fidelity = 'balanced', { hasSkills = true } = {}) {
if (fidelity === 'lean') return '';

@@ -197,4 +338,7 @@ const core = [

'Plan before acting on multi-step work; satisfy dependencies before dependent steps; verify changes before claiming success.',
'SKILLS: When a user request matches one of the available skills listed below, calling the Skill tool with that skill name is a BLOCKING REQUIREMENT — call Skill FIRST, before any other response or tool use. "simple", "quick", and "basic" are NOT opt-out phrases.',
];
// US-001: only reference the skill index when one will actually follow.
if (hasSkills) {
core.push('SKILLS: When a user request matches one of the available skills listed below, calling the Skill tool with that skill name is a BLOCKING REQUIREMENT — call Skill FIRST, before any other response or tool use. "simple", "quick", and "basic" are NOT opt-out phrases.');
}
if (fidelity === 'full') {

@@ -201,0 +345,0 @@ core.push('Prefer reusing existing functions and patterns over writing new code. Match the surrounding code style. Never invent file paths or APIs — verify they exist before referencing them.');

Sorry, the diff of this file is too big to display
