apple-local-llm

Version: 1.0.0

Version published: 5 months ago

Maintainers: 1

Created: 5 months ago

Call Apple's on-device Foundation Models from JavaScript — no servers, no setup.

Works with Node.js, Electron, and VS Code extensions.

Requirements

macOS 26+ (Tahoe)
Apple Silicon (M Series)
Apple Intelligence enabled in System Settings

Installation

npm install apple-local-llm

Quick Start

Simple API

import { createClient } from "apple-local-llm";

const client = createClient();

// Check compatibility first
const compat = await client.compatibility.check();
if (!compat.compatible) {
  console.log("Not available:", compat.reasonCode);
  // Handle fallback to cloud API
} else {
  // Generate a response only when the local model is available
  const result = await client.responses.create({
    input: "What is the capital of France?",
  });

  if (result.ok) {
    console.log(result.text); // e.g. "The capital of France is Paris."
  } else {
    console.error(result.error.code, result.error.detail);
  }
}
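
The compatibility check and the "fall back to cloud" comment above can be folded into one helper. The following is a minimal sketch under stated assumptions: `LocalClient` is a locally declared structural type covering only the two calls used here, and `cloudFallback` is a hypothetical function standing in for whatever remote API you use. Neither name is an export of apple-local-llm.

```typescript
// Structural subset of the client, declared locally so the sketch is
// self-contained. It mirrors the calls and result shapes shown in Quick Start.
type LocalResult =
  | { ok: true; text: string }
  | { ok: false; error: { code: string; detail: string } };

interface LocalClient {
  compatibility: {
    check(): Promise<{ compatible: boolean; reasonCode?: string }>;
  };
  responses: {
    create(params: { input: string }): Promise<LocalResult>;
  };
}

// Hypothetical fallback: any function that answers a prompt remotely.
type CloudFallback = (input: string) => Promise<string>;

async function generateWithFallback(
  client: LocalClient,
  cloudFallback: CloudFallback,
  input: string,
): Promise<{ source: "local" | "cloud"; text: string }> {
  const compat = await client.compatibility.check();
  if (compat.compatible) {
    const result = await client.responses.create({ input });
    if (result.ok) {
      return { source: "local", text: result.text };
    }
    // Any local error falls through to the cloud path in this sketch.
  }
  return { source: "cloud", text: await cloudFallback(input) };
}
```

In a real application you may prefer not to fall back for every error code, for example when the request was cancelled on purpose.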

Streaming

for await (const chunk of client.stream({ input: "Count from 1 to 5." })) {
  if ("delta" in chunk) {
    process.stdout.write(chunk.delta);
  }
}

API Reference

`createClient(options?)`

Creates a new client instance.

const client = createClient({
  model: "default",               // Optional: model identifier (currently only "default")
  onLog: (msg) => console.log(msg), // Optional: debug logging
  idleTimeoutMs: 5 * 60 * 1000,     // Optional: helper idle timeout (default: 5 min)
});

Defaults:

Helper auto-shuts down after 5 minutes of inactivity
Helper auto-restarts up to 3 times on crash (with exponential backoff)
Request timeout: 60 seconds (configurable via timeoutMs)
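
To make "exponential backoff" concrete, here is a small sketch of what a restart schedule of that kind looks like. Only the limit of 3 restarts comes from the defaults above. The base delay of 500 ms, the doubling factor, and the function name `restartDelays` are illustrative assumptions, not the package's actual values.

```typescript
// Delay before each restart attempt, doubling every time.
// baseDelayMs and the factor of 2 are assumptions made for illustration.
function restartDelays(maxRestarts = 3, baseDelayMs = 500): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRestarts; attempt++) {
    delays.push(baseDelayMs * 2 ** attempt);
  }
  return delays;
}

console.log(restartDelays()); // [ 500, 1000, 2000 ]
```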

You can also import and instantiate the class directly:

import { AppleLocalLLMClient } from "apple-local-llm";
const client = new AppleLocalLLMClient(options);

`client.compatibility.check()`

Check if the local model is available. Always call this before making requests.

const result = await client.compatibility.check();
// { compatible: true }
// or { compatible: false, reasonCode: "AI_DISABLED" }

Reason codes:

| Code | Description |
| --- | --- |
| `NOT_DARWIN` | Not running on macOS |
| `UNSUPPORTED_HARDWARE` | Not Apple Silicon |
| `AI_DISABLED` | Apple Intelligence not enabled |
| `MODEL_NOT_READY` | Model still downloading |
| `SPAWN_FAILED` | Helper binary failed to start |
| `HELPER_NOT_FOUND` | Helper binary not found |
| `HELPER_UNHEALTHY` | Helper process not responding correctly |
| `PROTOCOL_MISMATCH` | Helper version incompatible with client |
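
One way to use these codes is to turn them into messages for the user. The sketch below redeclares the codes locally from the table so it stands alone (the package exports its own `ReasonCode` type). The `USER_FIXABLE` grouping and the `explainReason` helper are assumptions made for illustration, not part of the package.

```typescript
// Reason codes copied from the table above.
type ReasonCode =
  | "NOT_DARWIN"
  | "UNSUPPORTED_HARDWARE"
  | "AI_DISABLED"
  | "MODEL_NOT_READY"
  | "SPAWN_FAILED"
  | "HELPER_NOT_FOUND"
  | "HELPER_UNHEALTHY"
  | "PROTOCOL_MISMATCH";

const REASON_DESCRIPTIONS: Record<ReasonCode, string> = {
  NOT_DARWIN: "Not running on macOS",
  UNSUPPORTED_HARDWARE: "Not Apple Silicon",
  AI_DISABLED: "Apple Intelligence not enabled",
  MODEL_NOT_READY: "Model still downloading",
  SPAWN_FAILED: "Helper binary failed to start",
  HELPER_NOT_FOUND: "Helper binary not found",
  HELPER_UNHEALTHY: "Helper process not responding correctly",
  PROTOCOL_MISMATCH: "Helper version incompatible with client",
};

// Assumption: these are the codes a user can resolve from System Settings
// or by waiting, without changing hardware or reinstalling anything.
const USER_FIXABLE: ReadonlySet<ReasonCode> = new Set<ReasonCode>([
  "AI_DISABLED",
  "MODEL_NOT_READY",
]);

function explainReason(code: ReasonCode): { message: string; userFixable: boolean } {
  return {
    message: REASON_DESCRIPTIONS[code],
    userFixable: USER_FIXABLE.has(code),
  };
}
```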

`client.capabilities.get()`

Get detailed model capabilities (calls the helper).

const caps = await client.capabilities.get();
// { available: true, model: "apple-on-device" }
// or { available: false, reasonCode: "AI_DISABLED" }

`client.responses.create(params)`

Generate a response.

const result = await client.responses.create({
  input: "Your prompt here",
  model: "default",         // Optional: model identifier
  max_output_tokens: 500,   // Optional: limit response tokens
  stream: false,            // Optional
  signal: abortController.signal, // Optional: AbortSignal
  timeoutMs: 60000,         // Optional: request timeout (ms)
  response_format: {        // Optional: structured JSON output
    type: "json_schema",
    json_schema: {
      name: "Result",
      schema: { type: "object", properties: { ... } }
    }
  }
});

Structured Output Example:

const result = await client.responses.create({
  input: "List 3 colors",
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "Colors",
      schema: {
        type: "object",
        properties: {
          colors: { type: "array", items: { type: "string" } }
        }
      }
    }
  }
});
if (result.ok) {
  const data = JSON.parse(result.text); // e.g. { colors: ["red", "blue", "green"] }
}

response_format is not supported with streaming.
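
Because the structured output arrives as a JSON string in `text`, it can help to validate the parsed value before using it. The following is a minimal sketch for the Colors example. The `CreateResult` type is redeclared locally from the result shapes documented in this section, and `isColors` and `parseColors` are hypothetical helper names.

```typescript
// Result shapes as documented for client.responses.create(), redeclared here
// so the sketch is self-contained.
type CreateResult =
  | { ok: true; text: string; request_id?: string }
  | { ok: false; error: { code: string; detail: string } };

interface Colors {
  colors: string[];
}

// Runtime check that a parsed value matches the Colors schema.
function isColors(value: unknown): value is Colors {
  if (typeof value !== "object" || value === null) return false;
  const colors = (value as { colors?: unknown }).colors;
  return Array.isArray(colors) && colors.every((c) => typeof c === "string");
}

// Returns the parsed object, or null if the request failed, the text was not
// valid JSON, or the JSON did not have the expected shape.
function parseColors(result: CreateResult): Colors | null {
  if (!result.ok) return null;
  try {
    const parsed: unknown = JSON.parse(result.text);
    return isColors(parsed) ? parsed : null;
  } catch {
    return null;
  }
}
```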

Returns ResponseResult on success, or an error object:

// Success:
{ ok: true, text: "...", request_id: "..." }
// Error:
{ ok: false, error: { code: "...", detail: "..." } }

Note: The return type is a discriminated union on ok, not the exported ResponseResult interface, so check result.ok before reading result.text or result.error.

Error codes:

| Code | Description |
| --- | --- |
| `UNAVAILABLE` | Model not available (see reason codes above) |
| `TIMEOUT` | Request timed out (default: 60s) |
| `CANCELLED` | Request was cancelled via AbortSignal |
| `RATE_LIMITED` | System rate limit exceeded |
| `GUARDRAIL` | Content violated Apple's safety guidelines |
| `INTERNAL` | Unexpected error |
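
An application usually wants a different reaction for each error code. The sketch below shows one possible policy. The codes come from the table above, while the `Action` values and the mapping itself are assumptions chosen for illustration, not behavior the package prescribes.

```typescript
// Error codes copied from the table above.
type ErrorCode =
  | "UNAVAILABLE"
  | "TIMEOUT"
  | "CANCELLED"
  | "RATE_LIMITED"
  | "GUARDRAIL"
  | "INTERNAL";

// Hypothetical reactions an application might take.
type Action = "retry" | "fallback" | "stop" | "report";

function actionFor(code: ErrorCode): Action {
  switch (code) {
    case "TIMEOUT":
    case "RATE_LIMITED":
      return "retry"; // transient: try again later
    case "UNAVAILABLE":
      return "fallback"; // no local model: use another provider
    case "CANCELLED":
    case "GUARDRAIL":
      return "stop"; // retrying the same request would not help
    case "INTERNAL":
      return "report"; // unexpected: log it for investigation
  }
}
```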

`client.stream(params)`

Async generator for streaming responses.

for await (const chunk of client.stream({ input: "..." })) {
  if ("delta" in chunk) {
    // Partial content
    console.log(chunk.delta);
  } else if ("done" in chunk) {
    // Final complete text
    console.log(chunk.text);
  }
}
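
If you want both incremental output and the final string, the loop above can be wrapped in a helper. This is a sketch under assumptions: the `StreamChunk` type is inferred from the two branches shown above (a `delta` chunk and a final `done` chunk carrying `text`) and may not match the package's exact types, and `collectStream` is a hypothetical name.

```typescript
// Chunk shapes inferred from the example above; treat them as an assumption.
type StreamChunk = { delta: string } | { done: true; text: string };

async function collectStream(
  chunks: AsyncIterable<StreamChunk>,
  onDelta?: (delta: string) => void,
): Promise<string> {
  let assembled = "";
  for await (const chunk of chunks) {
    if ("delta" in chunk) {
      assembled += chunk.delta;
      onDelta?.(chunk.delta);
    } else if ("done" in chunk) {
      return chunk.text; // final complete text reported by the stream
    }
  }
  return assembled; // the stream ended without a final chunk
}
```

With a real client this could be called as `await collectStream(client.stream({ input: "..." }), (d) => process.stdout.write(d))`.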

`client.responses.cancel(requestId)`

Cancel an in-progress request.

const result = await client.responses.cancel("req_123");
// { ok: true } or { ok: false, error: { code: "NOT_RUNNING", detail: "..." } }

`client.shutdown()`

Gracefully shut down the helper process.

await client.shutdown();

TypeScript Types

All types are exported:

import type {
  ClientOptions,
  ReasonCode,
  CompatibilityResult,
  CapabilitiesResult,
  ResponsesCreateParams,
  ResponseResult,
  JSONSchema,
  ResponseFormat,
} from "apple-local-llm";

CLI Usage

The fm-proxy binary can also be used directly from the command line:

# Simple prompt
fm-proxy "What is the capital of France?"

# Streaming output
fm-proxy --stream "Tell me a story"
fm-proxy -s "Tell me a story"

# Limit output tokens
fm-proxy --max-tokens=50 "Count to 100"

# Start HTTP server
fm-proxy --serve
fm-proxy --serve --port=3000

# Other options
fm-proxy --help      # Show usage (or -h)
fm-proxy --version   # Show version (or -v)
fm-proxy --stdio     # stdio mode (used internally by npm package)

HTTP Server Mode

Run fm-proxy --serve to start a local HTTP server:

fm-proxy --serve --port=8080

Endpoints:

| Endpoint | Method | Description |
| --- | --- | --- |
| `/health` | GET | Health check and availability status |
| `/generate` | POST | Text generation (supports streaming) |

Options:

| Option | Description |
| --- | --- |
| `--port=<PORT>` | Set server port (default: 8080) |
| `--auth-token=<TOKEN>` | Require Bearer token for `/generate` |

You can also set the AUTH_TOKEN environment variable instead of passing --auth-token.

CORS: All endpoints support CORS with Access-Control-Allow-Origin: *.

Examples:

# Health check
curl http://127.0.0.1:8080/health
# Response: {"status":"ok","model":"apple-on-device","available":true}

# Simple generation
curl -X POST http://127.0.0.1:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"input": "What is 2+2?"}'
# Response: {"text":"2+2 equals 4."}

# With max_output_tokens
curl -X POST http://127.0.0.1:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"input": "Count to 100", "max_output_tokens": 50}'

# With structured output (response_format)
curl -X POST http://127.0.0.1:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"input": "List 3 colors", "response_format": {"type": "json_schema", "json_schema": {"name": "Colors", "schema": {"type": "object", "properties": {"colors": {"type": "array", "items": {"type": "string"}}}}}}}'

# With authentication
curl -X POST http://127.0.0.1:8080/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{"input": "Hello"}'
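
The same requests can be made from TypeScript. The sketch below only builds the request, mirroring the curl examples above, so it can be handed to whatever HTTP client you use. `buildGenerateRequest`, `GenerateBody`, and `HttpRequest` are hypothetical names introduced for illustration, and the body fields are limited to those shown in this section.

```typescript
// Request body fields shown in the curl examples above.
interface GenerateBody {
  input: string;
  max_output_tokens?: number;
  stream?: boolean;
}

// Plain description of an HTTP request, independent of any HTTP library.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildGenerateRequest(
  body: GenerateBody,
  options: { port?: number; authToken?: string } = {},
): HttpRequest {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
  };
  if (options.authToken) {
    // Only needed when the server was started with --auth-token or AUTH_TOKEN.
    headers["Authorization"] = `Bearer ${options.authToken}`;
  }
  return {
    url: `http://127.0.0.1:${options.port ?? 8080}/generate`,
    method: "POST",
    headers,
    body: JSON.stringify(body),
  };
}
```

With the standard `fetch` API the result can be sent as `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.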

Streaming (SSE)

Add "stream": true to get Server-Sent Events with OpenAI-compatible chunks:

curl -N -X POST http://127.0.0.1:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"input": "Write a haiku", "stream": true}'

Response:

data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"}}]}
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"..."}}]}
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]
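
To show how a client might consume this output, here is a minimal sketch that pulls the generated text out of a complete SSE body with the shape shown above. It assumes the whole response is already available as one string. A real streaming reader receives the body in pieces and has to buffer partial lines. `extractSseContent` is a hypothetical helper name.

```typescript
// Concatenates the delta.content fields of an SSE body in the format above.
function extractSseContent(raw: string): string {
  let text = "";
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank and non-data lines
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream marker
    const chunk = JSON.parse(payload) as {
      choices?: Array<{ delta?: { content?: string } }>;
    };
    // Role and finish chunks carry no content, so they add nothing.
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}
```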

How It Works

This package bundles a small native helper (fm-proxy) that calls Apple's Foundation Models framework. The npm package talks to the helper over stdio; the helper is spawned on the first request and stays alive to keep the model warm.

No localhost server: the npm package uses stdio, not HTTP
No user setup: just npm install
Fails gracefully: call compatibility.check() and fall back to cloud when the model is unavailable

Runtime Support

JS API (createClient()):

| Environment | Supported |
| --- | --- |
| Node.js | ✅ |
| Electron (main process) | ✅ |
| VS Code extensions | ✅ |
| Electron (renderer) | ❌ No `child_process` |
| Browser | ❌ |

HTTP Server (fm-proxy --serve):

| Environment | Supported |
| --- | --- |
| Any HTTP client | ✅ |
| Browser (fetch) | ✅ |
| Electron (renderer) | ✅ |
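
Read together, the two tables suggest a simple rule for choosing an integration path. The sketch below encodes that rule. The `Runtime` and `Transport` names and string values are hypothetical labels made up for this example, and detecting which runtime you are in is left to the application.

```typescript
// Hypothetical labels for the environments listed in the tables above.
type Runtime =
  | "node"
  | "electron-main"
  | "vscode-extension"
  | "electron-renderer"
  | "browser";

type Transport = "js-api" | "http-server";

function pickTransport(runtime: Runtime): Transport {
  switch (runtime) {
    case "node":
    case "electron-main":
    case "vscode-extension":
      return "js-api"; // createClient() is supported here
    case "electron-renderer":
    case "browser":
      return "http-server"; // createClient() is unsupported; call fm-proxy --serve over HTTP
  }
}
```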

License

MIT

Package last updated on 04 Jan 2026
