anymodel

Universal AI model proxy — route any coding tool through OpenRouter, Ollama, LMStudio, llama.cpp, or any LLM provider

Version: 1.12.0 (latest)
Source: npm
Weekly downloads: 265 (-20.18%)
Maintainers: 1

AnyModel

Universal AI coding tool — use GPT-5.4, Gemini 3.1, DeepSeek R1, Codex, Llama, and 300+ models through one interface.

AnyModel is an AI coding assistant that works with any model. It includes a proxy that routes requests to OpenRouter (300+ cloud models), local backends (Ollama, LMStudio, llama.cpp), or any OpenAI-compatible API — with smart retries, format translation, and zero dependencies.

anymodel.dev — full docs, presets, and FAQ.

Quick Start

# Terminal 1 — start AnyModel proxy with a model:
OPENROUTER_API_KEY=sk-or-v1-your-key npx anymodel proxy deepseek

# Terminal 2 — launch AnyModel:
npx anymodel

The model is set on the proxy via preset or --model. Connecting is always just npx anymodel.

Get your free OpenRouter key at openrouter.ai/keys — no credit card for free models.

Presets

# Paid models:
npx anymodel proxy gpt        # → openai/gpt-5.4                       (paid)
npx anymodel proxy codex      # → openai/gpt-5.3-codex                 (paid, coding)
npx anymodel proxy gemini     # → google/gemini-3.1-flash-lite-preview  (paid)
npx anymodel proxy deepseek   # → deepseek/deepseek-r1-0528            (paid)
npx anymodel proxy mistral    # → mistralai/devstral-2512               (paid, coding)
npx anymodel proxy gemma      # → google/gemma-4-31b-it                (paid, coding)

# Free models:
npx anymodel proxy qwen       # → qwen/qwen3-coder:free                (free)
npx anymodel proxy nemotron   # → nvidia/nemotron-3-super-120b-a12b:free (free)
npx anymodel proxy llama      # → meta-llama/llama-3.3-70b-instruct:free (free)

Or any of 300+ models: npx anymodel proxy --model mistralai/codestral-2508

How It Works

AnyModel client → anymodel proxy (:9090) → OpenRouter / Ollama / LMStudio / llama.cpp

The proxy intercepts requests, strips provider-specific fields, handles retries with exponential backoff, and streams responses back.
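
The retry behaviour described above can be illustrated with a minimal sketch. This is not AnyModel's actual implementation: the function names, the 500 ms base delay, the 8 s cap, and the retry count are all assumptions chosen for illustration.

```javascript
// Hypothetical helper: delay before retry number `attempt` (0-based).
// The delay doubles on each attempt and is capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Run an async operation, waiting longer after each failure.
// Rethrows the last error once all retries are used up.
async function withRetries(operation, { retries = 3, baseMs = 500, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
  throw lastError;
}
```

With these assumed defaults, the waits between attempts are 500 ms, 1000 ms, and 2000 ms. A real proxy would likely also add jitter and retry only on transient errors such as HTTP 429 or 5xx.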

Multiple Models at Once

Run separate instances on different ports:

npx anymodel proxy --port 9090 --model openai/gpt-5.4
npx anymodel proxy --port 9091 --model deepseek/deepseek-r1-0528
npx anymodel proxy --port 9092 --model google/gemini-3.1-flash-lite-preview

Local Backends

No internet, no API key — run everything on your machine. AnyModel treats Ollama, LMStudio, and llama.cpp as first-class backends, each with its own preset:

npx anymodel proxy ollama --model gemma3n            # Ollama    (:11434)
npx anymodel proxy lmstudio --model qwen3-coder      # LMStudio  (:1234/v1)
npx anymodel proxy llamacpp --model my-model         # llama.cpp (:8080/v1)

| Backend | Port | API | Best for |
|---|---|---|---|
| Ollama | 11434 | Native (think:false suppresses reasoning-token waste on qwen3/deepseek) | One-line model pulls, managed model library |
| LMStudio | 1234/v1 | OpenAI-compatible | GUI model browser, easy swapping between loaded models |
| llama.cpp | 8080/v1 | OpenAI-compatible | Rawest/smallest footprint, max control (context, GPU layers, batch, quantization) |

GGUF portability: The same GGUF model file runs across all three — only the wrapper UX differs. Download once, use anywhere. llama.cpp is the inference engine under Ollama and LMStudio.

Override endpoints via env:

LMSTUDIO_BASE_URL=http://192.168.1.50:1234/v1 npx anymodel proxy lmstudio
LLAMACPP_BASE_URL=http://localhost:9000/v1    npx anymodel proxy llamacpp

Auto-detection priority when no preset is given: OpenRouter key → OpenAI key → Ollama → LMStudio → llama.cpp.
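
The priority order above amounts to a first-match search. The sketch below assumes that key-based providers are detected from environment variables and that local backends are detected by some reachability probe whose result is passed in. The function name and the `reachableLocal` parameter are hypothetical, not part of AnyModel's API.

```javascript
// Order taken from the priority list above.
const PROVIDER_PRIORITY = ["openrouter", "openai", "ollama", "lmstudio", "llamacpp"];

// env: object of environment variables (for example process.env).
// reachableLocal: Set of local backend names that answered a probe (assumed mechanism).
function detectProvider(env, reachableLocal = new Set()) {
  const available = {
    openrouter: Boolean(env.OPENROUTER_API_KEY),
    openai: Boolean(env.OPENAI_API_KEY),
    ollama: reachableLocal.has("ollama"),
    lmstudio: reachableLocal.has("lmstudio"),
    llamacpp: reachableLocal.has("llamacpp"),
  };
  // First available provider wins; null means nothing was detected.
  return PROVIDER_PRIORITY.find((name) => available[name]) ?? null;
}
```

For example, with only `OPENAI_API_KEY` set and Ollama running, this sketch picks `openai`, because cloud keys rank above local backends in the list.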

Local-provider smart defaults (1.11.0+)

When you connect to a local provider, AnyModel automatically suppresses your globally configured MCP servers. These are usually the single biggest cause of slow first responses, because they add 50-60K tokens of tool schemas that local models can't handle.

  • npx anymodel on a local provider → loads project ./.claude/.mcp.json if present, else no MCP
  • Keeps project skills, agents, CLAUDE.md
  • Remote providers (openrouter, openai) unchanged
  • Opt out: --full-mcp flag or ANYMODEL_FULL_MCP=1

See LOCAL_SETUP.md for the full guide, including 32 K context setup and full isolation.
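
The rules in this section can be summarised as a small decision function. This is a sketch of the documented behaviour only. The function name, its parameters, and the three return labels (`"global"`, `"project"`, `"none"`) are hypothetical.

```javascript
const LOCAL_PROVIDERS = new Set(["ollama", "lmstudio", "llamacpp"]);

// Decide which MCP configuration to load for a given provider.
// fullMcpFlag: true when --full-mcp was passed.
// hasProjectConfig: true when the project-level MCP file exists.
function resolveMcpMode(provider, { fullMcpFlag = false, env = {}, hasProjectConfig = false } = {}) {
  const optedOut = fullMcpFlag || env.ANYMODEL_FULL_MCP === "1";
  // Remote providers, and local ones that opted out, keep the global MCP servers.
  if (!LOCAL_PROVIDERS.has(provider) || optedOut) return "global";
  // Local providers load only the project config, or nothing at all.
  return hasProjectConfig ? "project" : "none";
}
```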

OpenAI-Compatible APIs

Works with OpenAI, Azure, Together, Groq, vLLM, and any OpenAI-compatible endpoint:

OPENAI_API_KEY=sk-your-key npx anymodel proxy openai --model gpt-4o

# Terminal 2:
npx anymodel

Bidirectional translation: Anthropic Messages API ↔ OpenAI Chat Completions.
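
To make the translation concrete, here is a text-only sketch of both directions. It is an illustration rather than AnyModel's code: the function names are hypothetical, and tool calls, images, and streaming chunks are deliberately left out.

```javascript
// Flatten Anthropic content (a string or an array of blocks) to plain text.
// Non-text blocks are ignored in this sketch.
function contentToText(content) {
  if (typeof content === "string") return content;
  return content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("");
}

// Anthropic Messages request -> OpenAI Chat Completions request.
function anthropicToOpenAI(request) {
  const messages = [];
  // Anthropic keeps the system prompt in a top-level field;
  // Chat Completions expects it as the first message.
  if (request.system) {
    messages.push({ role: "system", content: contentToText(request.system) });
  }
  for (const message of request.messages) {
    messages.push({ role: message.role, content: contentToText(message.content) });
  }
  return { model: request.model, max_tokens: request.max_tokens, messages };
}

// OpenAI finish_reason -> Anthropic stop_reason.
const STOP_REASONS = { stop: "end_turn", length: "max_tokens", tool_calls: "tool_use" };

// OpenAI Chat Completions response -> Anthropic Messages response.
function openAIToAnthropic(response) {
  const choice = response.choices[0];
  return {
    id: response.id,
    type: "message",
    role: "assistant",
    model: response.model,
    content: [{ type: "text", text: choice.message.content ?? "" }],
    stop_reason: STOP_REASONS[choice.finish_reason] ?? "end_turn",
    usage: {
      input_tokens: response.usage?.prompt_tokens ?? 0,
      output_tokens: response.usage?.completion_tokens ?? 0,
    },
  };
}
```

The main structural differences handled here are the location of the system prompt, the content-block arrays on the Anthropic side, and the naming of stop reasons and token counts.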

CLI Reference

anymodel                              # launch AnyModel (connect to proxy)
anymodel proxy <preset>               # start proxy with preset
anymodel proxy --model <id>           # start proxy with any model
anymodel proxy ollama --model <name>  # proxy with local Ollama    (:11434)
anymodel proxy lmstudio --model <id>  # proxy with LMStudio        (:1234/v1)
anymodel proxy llamacpp --model <id>  # proxy with llama.cpp       (:8080/v1)
anymodel claude                       # run with native Claude (no proxy)

Options:
  --model, -m     Model ID
  --port, -p      Port (default: 9090)
  --free-only     Block paid models
  --token, -t     Require auth token for requests
  --rpm           Rate limit requests/min (default: 60)
  --help, -h      Help
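
As an illustration of what a requests-per-minute limit like `--rpm` involves, here is a sliding-window sketch. How AnyModel actually enforces the limit is not documented here, so the algorithm, the function name, and the injectable clock are all assumptions.

```javascript
// Hypothetical sliding-window limiter: allows at most `rpm` requests
// in any 60-second window. `now` is injectable so it can be tested.
function createRateLimiter(rpm = 60, now = () => Date.now()) {
  const timestamps = [];
  return function allow() {
    const current = now();
    const cutoff = current - 60_000;
    // Forget requests that have left the window.
    while (timestamps.length > 0 && timestamps[0] <= cutoff) timestamps.shift();
    if (timestamps.length >= rpm) return false;
    timestamps.push(current);
    return true;
  };
}
```

A proxy using this approach would call `allow()` once per incoming request and answer with HTTP 429 when it returns `false`.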

Ollama Performance Optimizations

When proxying to Ollama, AnyModel automatically applies several optimizations to make local models work well with coding tools:

  • System prompt condensing — AI tool prompts are 50-100KB; AnyModel condenses them to fit Ollama's context window (OLLAMA_MAX_SYSTEM_CHARS)
  • Tool description trimming — truncates verbose tool descriptions to save context (OLLAMA_MAX_TOOL_DESC, default 100 chars)
  • Tool count limiting — limits tools sent to the model, always keeping core tools (Bash/Read/Write/Edit/Grep/Glob) (OLLAMA_MAX_TOOLS)
  • Prefix-aware caching — stabilizes system prompt + tool ordering for Ollama KV cache reuse across requests, with date normalization and description-independent hashing
  • HTTP keep-alive — reuses TCP connections to Ollama
  • count_tokens mock — responds to /v1/messages/count_tokens locally, preventing cascading 500 errors
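
The tool description trimming and tool count limiting steps can be sketched as follows. This is an assumption-laden illustration, not the package's code: the function name and the tool object shape (`name`, `description`) are hypothetical, while the core tool names and the defaults come from this README.

```javascript
// Core tools that are always kept, as listed above.
const CORE_TOOLS = new Set(["Bash", "Read", "Write", "Edit", "Grep", "Glob"]);

// maxDesc mirrors OLLAMA_MAX_TOOL_DESC; maxTools mirrors OLLAMA_MAX_TOOLS (0 = unlimited).
function trimTools(tools, { maxDesc = 100, maxTools = 0 } = {}) {
  // Step 1: truncate verbose descriptions to save context.
  const trimmed = tools.map((tool) => ({
    ...tool,
    description: (tool.description ?? "").slice(0, maxDesc),
  }));
  if (maxTools <= 0 || trimmed.length <= maxTools) return trimmed;

  // Step 2: keep every core tool, then fill the remaining slots in order.
  const core = trimmed.filter((tool) => CORE_TOOLS.has(tool.name));
  const others = trimmed.filter((tool) => !CORE_TOOLS.has(tool.name));
  const room = Math.max(0, maxTools - core.length);
  return [...core, ...others.slice(0, room)];
}
```

Placing core tools first also gives a stable ordering, which is the kind of property the prefix-aware caching bullet relies on. Whether AnyModel orders tools this way is not stated in the source.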

Environment Variables

| Variable | Default | Description |
|---|---|---|
| OPENROUTER_API_KEY | | Your OpenRouter key (get one free) |
| OPENROUTER_MODEL | | Default model override |
| OPENAI_API_KEY | | Key for OpenAI-compatible APIs |
| OPENAI_BASE_URL | https://api.openai.com/v1 | Custom endpoint for the openai provider |
| LMSTUDIO_BASE_URL | http://localhost:1234/v1 | LMStudio endpoint override |
| LLAMACPP_BASE_URL | http://localhost:8080/v1 | llama.cpp (llama-server) endpoint override |
| PROXY_PORT | 9090 | Proxy port |
| ANYMODEL_CLIENT | | Path to custom client cli.js |
| ANYMODEL_TOKEN | | Auth token for remote mode |
| OLLAMA_NUM_CTX | 8192 | Ollama context window size |
| OLLAMA_KEEP_ALIVE | 30m | How long Ollama keeps model in GPU memory |
| OLLAMA_MAX_SYSTEM_CHARS | 4000 | System prompt condensing threshold |
| OLLAMA_MAX_MSG_CHARS | max(4000, num_ctx*3) | Message history threshold |
| OLLAMA_TOOLS | auto | Tool capability: auto/on/off |
| OLLAMA_MAX_TOOLS | 0 (unlimited) | Max tools to send (core tools always kept) |
| OLLAMA_MAX_TOOL_DESC | 100 | Max tool description length in chars |

OPENROUTER_API_KEY is only needed when starting the proxy. OLLAMA_* variables only apply to the Ollama provider.
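
The OLLAMA_* defaults in the table, including the derived default for OLLAMA_MAX_MSG_CHARS, can be expressed as a small resolver. This is a sketch under stated assumptions: the function names and the returned property names are hypothetical, and only the variable names and default values come from the table.

```javascript
// Parse an integer from an env value, falling back when it is unset or invalid.
function toInt(value, fallback) {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Resolve Ollama-related settings from an env object (for example process.env).
function resolveOllamaConfig(env = {}) {
  const numCtx = toInt(env.OLLAMA_NUM_CTX, 8192);
  return {
    numCtx,
    keepAlive: env.OLLAMA_KEEP_ALIVE ?? "30m",
    maxSystemChars: toInt(env.OLLAMA_MAX_SYSTEM_CHARS, 4000),
    // Default depends on the context size: max(4000, num_ctx * 3).
    maxMsgChars: toInt(env.OLLAMA_MAX_MSG_CHARS, Math.max(4000, numCtx * 3)),
    tools: env.OLLAMA_TOOLS ?? "auto",
    maxTools: toInt(env.OLLAMA_MAX_TOOLS, 0), // 0 means unlimited
    maxToolDesc: toInt(env.OLLAMA_MAX_TOOL_DESC, 100),
  };
}
```

With nothing set, the message history threshold works out to max(4000, 8192 * 3) = 24576 characters. Lowering OLLAMA_NUM_CTX to 1024 gives max(4000, 3072) = 4000.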

License

MIT — Anton Abyzov

Keywords

openrouter

Package last updated on 21 Apr 2026
