use-local-llm

React hooks for streaming responses from local LLMs β€” Ollama, LM Studio, llama.cpp, and any OpenAI-compatible endpoint. Zero server required. Browser β†’ localhost, directly.

Why use-local-llm?

The problem: Vercel AI SDK is the standard for AI in React β€” but it requires server routes. Its React hooks (useChat, useCompletion) POST to your API routes, which then call the LLM. This architecture makes it impossible to call http://localhost:11434 directly from the browser.

If you're prototyping with Ollama, LM Studio, or llama.cpp, you don't need a server in between. You need one hook that talks directly to your local model.

use-local-llm gives you:

  • Direct browser β†’ localhost streaming β€” no server, no API routes
  • Multi-backend support β€” Ollama, LM Studio, llama.cpp, any OpenAI-compatible endpoint
  • Full chat state management β€” message history, abort, clear, error handling
  • Token-by-token streaming β€” real-time text rendering with onToken callbacks
  • Zero runtime dependencies β€” only a peer dependency on React
  • 2.8 KB gzipped β€” smaller than most icons

Quick Start

import { useOllama } from "use-local-llm";

function Chat() {
  const { messages, send, isStreaming } = useOllama("gemma3:1b");

  return (
    <div>
      {messages.map((m, i) => (
        <p key={i}>
          <strong>{m.role}:</strong> {m.content}
        </p>
      ))}
      <button
        onClick={() => send("Explain React hooks in one sentence")}
        disabled={isStreaming}
      >
        {isStreaming ? "Generating..." : "Ask"}
      </button>
    </div>
  );
}

That's it. Streaming, message history, abort β€” all handled in one hook call.

Installation

npm install use-local-llm
yarn add use-local-llm
pnpm add use-local-llm

Requirements:

  • React >= 17.0.0 (peer dependency)
  • A local LLM runtime running (Ollama, LM Studio, or llama.cpp)

Supported Backends

| Backend | Default Port | Auto-detected | Chat API | Completion API | Model List |
|---|---|---|---|---|---|
| Ollama | 11434 | ✅ | /api/chat | /api/generate | /api/tags |
| LM Studio | 1234 | ✅ | /v1/chat/completions | /v1/completions | /v1/models |
| llama.cpp | 8080 | ✅ | /v1/chat/completions | /v1/completions | /v1/models |
| Any OpenAI-compatible | custom | via backend option | /v1/chat/completions | /v1/completions | /v1/models |

The backend is auto-detected from the port number. You can also set it explicitly with the backend option.
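
For servers on non-standard ports, auto-detection falls back to "openai-compatible" (see Backend Auto-Detection below), so you may want to pin the backend yourself. A minimal sketch, assuming Ollama serving on a placeholder port:

import { useLocalLLM } from "use-local-llm";

function CustomPortChat() {
  // Hypothetical setup: Ollama on a non-default port, so the backend is
  // pinned explicitly — port 9999 would otherwise detect as "openai-compatible".
  const { send, isStreaming } = useLocalLLM({
    endpoint: "http://localhost:9999", // placeholder port
    backend: "ollama",
    model: "gemma3:1b",
  });

  return <button onClick={() => send("Hi")} disabled={isStreaming}>Ask</button>;
}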

API Reference

useOllama(model, options?)

Zero-config chat hook for Ollama. The simplest way to start.

const result = useOllama("gemma3:1b");
const result = useOllama("llama3.1:8b", { system: "Be concise.", temperature: 0.7 });

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | ✅ | Ollama model name (e.g. "gemma3:1b", "llama3.1:8b", "qwen2.5:latest") |
| options | OllamaOptions | — | Configuration options (see below) |

OllamaOptions:

| Option | Type | Default | Description |
|---|---|---|---|
| system | string | — | System prompt to set model behavior |
| temperature | number | model default | Sampling temperature (0 = deterministic, 1 = creative) |
| endpoint | string | "http://localhost:11434" | Custom Ollama endpoint URL |
| onToken | (token: string) => void | — | Callback fired on each streamed token |
| onResponse | (msg: ChatMessage) => void | — | Callback fired when a complete response is received |
| onError | (err: Error) => void | — | Callback fired on error |

Returns: LocalLLMResult
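
A sketch wiring the three callbacks together — the console logging here is purely illustrative:

import { useOllama } from "use-local-llm";

function LoggedChat() {
  const { send, isStreaming } = useOllama("gemma3:1b", {
    system: "Be concise.",
    onToken: (token) => console.debug("token:", token), // fires per streamed token
    onResponse: (msg) => console.info("assistant said:", msg.content),
    onError: (err) => console.error("generation failed:", err.message),
  });

  return (
    <button onClick={() => send("Hi!")} disabled={isStreaming}>
      Ask
    </button>
  );
}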

useLocalLLM(options)

Full-featured chat hook supporting any local backend.

const result = useLocalLLM({
  endpoint: "http://localhost:1234",
  model: "mistral-7b",
  system: "Answer concisely.",
});

LocalLLMOptions:

| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| endpoint | string | ✅ | — | Base URL of the LLM server |
| model | string | ✅ | — | Model name |
| backend | Backend | — | auto-detected | "ollama" \| "lmstudio" \| "llamacpp" \| "openai-compatible" |
| system | string | — | — | System prompt |
| temperature | number | — | model default | Sampling temperature |
| onToken | (token: string) => void | — | — | Called on each streamed token |
| onResponse | (msg: ChatMessage) => void | — | — | Called on complete response |
| onError | (err: Error) => void | — | — | Called on error |

Returns: LocalLLMResult

| Property | Type | Description |
|---|---|---|
| messages | ChatMessage[] | Full conversation history (user + assistant messages) |
| send | (content: string) => void | Send a user message and trigger streaming response |
| isStreaming | boolean | true while tokens are being generated |
| isLoading | boolean | true while the request is in-flight (before first token) |
| abort | () => void | Cancel the current generation immediately |
| error | Error \| null | The last error that occurred, or null |
| clear | () => void | Reset the entire conversation history |
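
The isLoading / isStreaming split distinguishes "waiting for the first token" from "tokens arriving". A small status-label sketch built on those documented semantics:

import { useOllama } from "use-local-llm";

function StatusLabel() {
  const { isLoading, isStreaming, error } = useOllama("gemma3:1b");

  // isLoading: request sent, no token yet; isStreaming: tokens arriving
  const status = error
    ? `Error: ${error.message}`
    : isLoading
    ? "Thinking…"
    : isStreaming
    ? "Typing…"
    : "Idle";

  return <small>{status}</small>;
}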

useStreamCompletion(options)

Low-level hook for text completions (non-chat) with manual start/stop control.

const result = useStreamCompletion({
  endpoint: "http://localhost:11434",
  model: "gemma3:1b",
  prompt: "Write a haiku about TypeScript",
});

StreamCompletionOptions:

| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| endpoint | string | ✅ | — | Base URL of the LLM server |
| model | string | ✅ | — | Model name |
| prompt | string | ✅ | — | The text prompt to send |
| backend | Backend | — | auto-detected | Backend type |
| autoFetch | boolean | — | false | Auto-start streaming when prompt changes |
| temperature | number | — | model default | Sampling temperature |
| onToken | (token: string) => void | — | — | Called on each token |
| onComplete | (text: string) => void | — | — | Called with full text when done |
| onError | (err: Error) => void | — | — | Called on error |
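
With autoFetch: true the hook streams whenever prompt changes, so you can drive it from state instead of calling start(). A sketch (debouncing the prompt is advisable in practice but omitted here):

import { useState } from "react";
import { useStreamCompletion } from "use-local-llm";

function LivePreview() {
  const [prompt, setPrompt] = useState("Summarize: React hooks");
  const { text, isStreaming } = useStreamCompletion({
    endpoint: "http://localhost:11434",
    model: "gemma3:1b",
    prompt,
    autoFetch: true, // re-streams on every prompt change
  });

  return (
    <div>
      <input value={prompt} onChange={(e) => setPrompt(e.target.value)} />
      <pre>{isStreaming ? text + "▌" : text}</pre>
    </div>
  );
}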

Returns: StreamCompletionResult

| Property | Type | Description |
|---|---|---|
| text | string | Accumulated full text so far |
| tokens | string[] | Array of individual tokens received |
| isStreaming | boolean | Whether the stream is currently active |
| start | () => void | Start (or restart) the stream |
| abort | () => void | Abort the current stream |
| error | Error \| null | Last error |

useModelList(options?)

Discover available models on a local LLM runtime. Fetches automatically on mount.

const result = useModelList(); // defaults to Ollama
const result = useModelList({ endpoint: "http://localhost:1234", backend: "lmstudio" });

ModelListOptions:

| Option | Type | Default | Description |
|---|---|---|---|
| endpoint | string | "http://localhost:11434" | Base URL of the LLM server |
| backend | Backend | auto-detected | Backend type |

Returns: ModelListResult

| Property | Type | Description |
|---|---|---|
| models | LocalModel[] | Array of available models |
| isLoading | boolean | Whether the model list is loading |
| error | Error \| null | Last error |
| refresh | () => void | Re-fetch the model list |

LocalModel shape:

interface LocalModel {
  name: string;       // e.g. "gemma3:1b", "llama3.1:8b"
  size?: number;      // size in bytes
  modifiedAt?: string; // last modified timestamp
  digest?: string;     // model digest hash
}

Examples

Chat Interface

A complete chat UI with streaming, abort, and conversation management:

import { useState } from "react";
import { useOllama } from "use-local-llm";

function ChatApp() {
  const [input, setInput] = useState("");
  const { messages, send, isStreaming, abort, clear, error } = useOllama(
    "gemma3:1b",
    {
      system: "You are a friendly assistant. Keep responses concise.",
      temperature: 0.7,
    }
  );

  const handleSubmit = (e: React.FormEvent) => {
    e.preventDefault();
    if (!input.trim() || isStreaming) return;
    send(input);
    setInput("");
  };

  return (
    <div>
      <div>
        {messages.map((msg, i) => (
          <div key={i} style={{ margin: "8px 0" }}>
            <strong>{msg.role === "user" ? "You" : "AI"}:</strong>
            <p>{msg.content}</p>
          </div>
        ))}
      </div>

      {error && <p style={{ color: "red" }}>Error: {error.message}</p>}

      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Type a message..."
          disabled={isStreaming}
        />
        <button type="submit" disabled={isStreaming || !input.trim()}>
          Send
        </button>
        {isStreaming && (
          <button type="button" onClick={abort}>
            Stop
          </button>
        )}
        <button type="button" onClick={clear}>
          Clear
        </button>
      </form>
    </div>
  );
}

Streaming Text Completion

Generate text with manual start/stop control:

import { useState } from "react";
import { useStreamCompletion } from "use-local-llm";

function TextGenerator() {
  const [prompt, setPrompt] = useState("Write a short poem about coding");
  const { text, isStreaming, start, abort, tokens } = useStreamCompletion({
    endpoint: "http://localhost:11434",
    model: "gemma3:1b",
    prompt,
  });

  return (
    <div>
      <textarea
        value={prompt}
        onChange={(e) => setPrompt(e.target.value)}
        rows={3}
      />
      <div>
        <button onClick={start} disabled={isStreaming}>
          Generate
        </button>
        <button onClick={abort} disabled={!isStreaming}>
          Stop
        </button>
      </div>
      <pre>{text}</pre>
      <small>{tokens.length} tokens generated</small>
    </div>
  );
}

Model Selector

Let users pick from available models before chatting:

import { useState } from "react";
import { useModelList, useOllama } from "use-local-llm";

function ModelSelector() {
  const { models, isLoading, refresh } = useModelList();
  const [selectedModel, setSelectedModel] = useState("gemma3:1b");
  const { messages, send, isStreaming } = useOllama(selectedModel);

  if (isLoading) return <p>Loading models...</p>;

  return (
    <div>
      <select
        value={selectedModel}
        onChange={(e) => setSelectedModel(e.target.value)}
      >
        {models.map((m) => (
          <option key={m.name} value={m.name}>
            {m.name} {m.size ? `(${(m.size / 1e9).toFixed(1)} GB)` : ""}
          </option>
        ))}
      </select>
      <button onClick={refresh}>Refresh Models</button>

      {/* Chat UI */}
      {messages.map((msg, i) => (
        <p key={i}>
          <b>{msg.role}:</b> {msg.content}
        </p>
      ))}
      <button onClick={() => send("Hello!")} disabled={isStreaming}>
        Send
      </button>
    </div>
  );
}

Multi-turn Conversation with System Prompt

Build a specialized assistant with persistent system instructions:

import { useOllama } from "use-local-llm";

function CodeReviewer() {
  const { messages, send, isStreaming, clear } = useOllama("qwen2.5-coder:32b", {
    system: `You are an expert code reviewer. When given code:
1. Identify bugs and security issues
2. Suggest improvements
3. Rate code quality (1-10)
Keep responses structured and concise.`,
    temperature: 0.3,
  });

  const reviewCode = () => {
    send(`Review this code:
\`\`\`js
app.get('/user/:id', (req, res) => {
  const query = "SELECT * FROM users WHERE id = " + req.params.id;
  db.query(query, (err, result) => res.json(result));
});
\`\`\``);
  };

  return (
    <div>
      <button onClick={reviewCode} disabled={isStreaming}>Review Code</button>
      <button onClick={clear}>Clear</button>
      {messages.map((m, i) => (
        <div key={i}>
          <h4>{m.role}</h4>
          <pre>{m.content}</pre>
        </div>
      ))}
    </div>
  );
}

Token-by-Token Rendering

Use the onToken callback for real-time effects:

import { useRef, useState } from "react";
import { useOllama } from "use-local-llm";

function TypewriterChat() {
  const [tokenCount, setTokenCount] = useState(0);
  const [tokensPerSec, setTokensPerSec] = useState(0);
  // Refs avoid stale-closure reads inside the streaming callbacks
  const startTime = useRef(0);
  const count = useRef(0);

  const { messages, send, isStreaming } = useOllama("gemma3:1b", {
    onToken: () => {
      if (startTime.current === 0) startTime.current = Date.now();
      count.current += 1;
      setTokenCount(count.current);
      const elapsed = (Date.now() - startTime.current) / 1000;
      if (elapsed > 0) setTokensPerSec(Math.round(count.current / elapsed));
    },
    onResponse: () => {
      startTime.current = 0;
      count.current = 0;
      setTokenCount(0);
    },
  });

  return (
    <div>
      {isStreaming && (
        <small>
          {tokenCount} tokens | {tokensPerSec} tok/s
        </small>
      )}
      {messages.map((m, i) => (
        <p key={i}>{m.content}</p>
      ))}
      <button onClick={() => send("Tell me a joke")} disabled={isStreaming}>
        Ask
      </button>
    </div>
  );
}

Using with LM Studio

LM Studio runs an OpenAI-compatible server on port 1234:

import { useLocalLLM } from "use-local-llm";

function LMStudioChat() {
  const { messages, send, isStreaming } = useLocalLLM({
    endpoint: "http://localhost:1234",
    // backend auto-detected as "lmstudio" from port
    model: "local-model", // Use the model name shown in LM Studio
    system: "You are a helpful assistant.",
  });

  return (
    <div>
      {messages.map((m, i) => (
        <p key={i}><b>{m.role}:</b> {m.content}</p>
      ))}
      <button onClick={() => send("Hello!")} disabled={isStreaming}>
        Send
      </button>
    </div>
  );
}

Using with llama.cpp

llama.cpp's built-in server runs on port 8080:

import { useLocalLLM } from "use-local-llm";

function LlamaCppChat() {
  const { messages, send, isStreaming } = useLocalLLM({
    endpoint: "http://localhost:8080",
    // backend auto-detected as "llamacpp" from port
    model: "default", // llama.cpp typically has one loaded model
    temperature: 0.8,
  });

  return (
    <div>
      {messages.map((m, i) => (
        <p key={i}><b>{m.role}:</b> {m.content}</p>
      ))}
      <button onClick={() => send("What is the meaning of life?")} disabled={isStreaming}>
        Ask
      </button>
    </div>
  );
}

Advanced Usage

Direct Stream Access (Non-React)

For Node.js scripts, CLI tools, or custom integrations, the streaming utilities are exported directly:

import { streamChat, streamGenerate } from "use-local-llm";

// Chat with message history
async function chat() {
  for await (const chunk of streamChat({
    endpoint: "http://localhost:11434",
    backend: "ollama",
    model: "gemma3:1b",
    messages: [
      { role: "system", content: "Be brief." },
      { role: "user", content: "What are React hooks?" },
    ],
  })) {
    process.stdout.write(chunk.content);
  }
}

// Simple text generation
async function generate() {
  for await (const chunk of streamGenerate({
    endpoint: "http://localhost:11434",
    backend: "ollama",
    model: "gemma3:1b",
    prompt: "Explain TypeScript in one sentence.",
  })) {
    process.stdout.write(chunk.content);
  }
}

Custom Abort Handling

Both streamChat and streamGenerate accept an AbortSignal for cancellation:

import { streamGenerate } from "use-local-llm";

const controller = new AbortController();

// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);

try {
  for await (const chunk of streamGenerate({
    endpoint: "http://localhost:11434",
    model: "gemma3:1b",
    prompt: "Write a very long essay...",
    signal: controller.signal,
  })) {
    process.stdout.write(chunk.content);
  }
} catch (err) {
  if (err instanceof Error && err.name === "AbortError") {
    console.log("\nGeneration cancelled.");
  }
}

Backend Auto-Detection

The library auto-detects the backend from the port number:

import { detectBackend } from "use-local-llm";

detectBackend("http://localhost:11434"); // β†’ "ollama"
detectBackend("http://localhost:1234");  // β†’ "lmstudio"
detectBackend("http://localhost:8080");  // β†’ "llamacpp"
detectBackend("http://myserver:9000");   // β†’ "openai-compatible"

You can override auto-detection with the backend option on any hook.

Endpoint Presets

import { ENDPOINTS, CHAT_PATHS, GENERATE_PATHS, MODEL_LIST_PATHS } from "use-local-llm";

ENDPOINTS.ollama;   // { url: "http://localhost:11434", backend: "ollama" }
ENDPOINTS.lmstudio; // { url: "http://localhost:1234", backend: "lmstudio" }
ENDPOINTS.llamacpp; // { url: "http://localhost:8080", backend: "llamacpp" }

CHAT_PATHS.ollama;            // "/api/chat"
CHAT_PATHS["openai-compatible"]; // "/v1/chat/completions"

CORS Configuration

When calling local LLM servers from a browser, CORS must be enabled on the server:

Ollama

Set the OLLAMA_ORIGINS environment variable before starting:

# macOS
OLLAMA_ORIGINS="*" ollama serve

# Or set persistently
launchctl setenv OLLAMA_ORIGINS "*"

LM Studio

CORS is enabled by default. No configuration needed.

llama.cpp

Start the server with the --host flag:

./server -m model.gguf --host 0.0.0.0 --port 8080

TypeScript Reference

All types are exported for use in your application:

import type {
  // Core types
  Backend,            // "ollama" | "lmstudio" | "llamacpp" | "openai-compatible"
  ChatMessage,        // { role: "system" | "user" | "assistant", content: string }
  StreamChunk,        // { content: string, done: boolean, model?: string }
  EndpointConfig,     // { url: string, backend: Backend }
  LocalModel,         // { name, size?, modifiedAt?, digest? }

  // Hook options
  LocalLLMOptions,
  OllamaOptions,
  StreamCompletionOptions,
  ModelListOptions,

  // Hook return types
  LocalLLMResult,
  StreamCompletionResult,
  ModelListResult,
} from "use-local-llm";

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   Your React App                β”‚
β”‚                                                 β”‚
β”‚  useOllama("gemma3:1b")                         β”‚
β”‚       β”‚                                         β”‚
β”‚       β–Ό                                         β”‚
β”‚  useLocalLLM({ endpoint, model, ... })          β”‚
β”‚       β”‚                                         β”‚
β”‚       β–Ό                                         β”‚
β”‚  streamChat() / streamGenerate()                β”‚
β”‚       β”‚         async generators                β”‚
β”‚       β–Ό                                         β”‚
β”‚  parseStreamChunk()                             β”‚
β”‚       β”‚         NDJSON + SSE parser              β”‚
β”‚       β–Ό                                         β”‚
β”‚  fetch() + ReadableStream                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚  HTTP (no server in between)
          β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Ollama :11434      β”‚
β”‚  LM Studio :1234    β”‚
β”‚  llama.cpp :8080    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key design decisions:

  • No server required β€” hooks call localhost directly via fetch()
  • Async generators β€” streamChat() and streamGenerate() yield StreamChunk objects, making them composable and testable outside React
  • AbortController β€” every stream can be cancelled immediately; user-initiated aborts don't trigger error states
  • Zero dependencies β€” only React as a peer dependency; the entire package is 2.8 KB gzipped
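
To illustrate the composability point above: because streamChat() and streamGenerate() are plain async generators, they compose with ordinary generator helpers. A sketch — take is our own hypothetical helper, not part of the library:

import { streamGenerate } from "use-local-llm";

// Hypothetical helper: truncate any async iterable after `limit` items.
async function* take<T>(source: AsyncIterable<T>, limit: number): AsyncGenerator<T> {
  let seen = 0;
  for await (const item of source) {
    yield item;
    if (++seen >= limit) return;
  }
}

async function main() {
  // Stop after the first 20 chunks, no matter how long the model rambles
  for await (const chunk of take(
    streamGenerate({
      endpoint: "http://localhost:11434",
      model: "gemma3:1b",
      prompt: "Count to one hundred.",
    }),
    20
  )) {
    process.stdout.write(chunk.content);
  }
}

main();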

Comparison with Vercel AI SDK

| Feature | use-local-llm | Vercel AI SDK (ai) |
|---|---|---|
| Browser → localhost | ✅ Direct | ❌ Requires API route |
| Server required | ❌ None | ✅ Node.js server |
| Ollama support | ✅ Built-in | ⚠️ Via server provider |
| LM Studio support | ✅ Built-in | ⚠️ Via server provider |
| llama.cpp support | ✅ Built-in | ❌ Not officially supported |
| Multi-backend | ✅ Auto-detected | ⚠️ Manual server setup per provider |
| Bundle size | 2.8 KB gzip | ~50 KB+ |
| Cloud LLMs (OpenAI, etc.) | ❌ Local only | ✅ Full support |
| Production server features | ❌ Not the goal | ✅ Rate limiting, auth, etc. |

When to use this library: Prototyping, local development, privacy-sensitive apps, air-gapped environments, hackathons, dev tools.

When to use Vercel AI SDK: Production apps, cloud LLMs, apps requiring server-side auth, rate limiting, or logging.

Tested Models

This library has been live-tested against these models on Ollama:

| Model | Status | Notes |
|---|---|---|
| gemma3:1b | ✅ Verified | Fast responses, great for prototyping |
| llama3.1:8b | ✅ Available | Good general-purpose model |
| qwen2.5:latest | ✅ Available | Strong multilingual support |
| qwen2.5-coder:32b | ✅ Available | Best for code generation |
| deepseek-r1:latest | ✅ Available | Reasoning model |
| deepseek-coder-v2:latest | ✅ Available | Code-focused |

Any model available via ollama list will work. The library is model-agnostic.

Contributing

Contributions are welcome! Please:

  • Fork the repository
  • Create a feature branch (git checkout -b feature/my-feature)
  • Write tests for new functionality
  • Ensure all tests pass (npm test)
  • Ensure TypeScript compiles (npm run typecheck)
  • Submit a pull request

Development Setup

git clone https://github.com/pooyagolchian/use-local-llm.git
cd use-local-llm
npm install
npm run dev        # Watch mode
npm test           # Run tests
npm run typecheck  # Type check
npm run build      # Production build

Live Testing

To run integration tests against a running Ollama instance:

npx tsx scripts/test-live.ts

License

MIT Β© Pooya Golchian
