
@web-ai-sdk/prompt
Building block for the Web's Built-in Prompt API (LanguageModel). One-shot ask() for embeds and widgets, plus a thin createSession() primitive (and React useSession) for chat-shaped apps that need independent per-conversation sessions and delta-shaped streaming. The wrapper smooths cross-browser quirks (delta-vs-cumulative chunks, output sanitization, abort wiring); UI state and conversation history are the consumer's concern.
Docs: https://web-ai-sdk.dev/docs/guides/prompt/ · React: usePrompt · useSession
Prompt API ships stable in Chrome 148+ — no flag required. Chrome 138–147 still works with chrome://flags/#prompt-api-for-gemini-nano enabled. On Edge it remains a developer preview in Canary/Dev 138+ behind edge://flags/#prompt-api-for-phi-mini, with Phi-4-mini's stricter safety pipeline often refusing output (see Browser support). On any other browser this library is a no-op for the React hook (it stays in "unavailable"). The vanilla ask() throws PromptUnavailableError so callers can branch explicitly.
pnpm add @web-ai-sdk/prompt
# or: npm i @web-ai-sdk/prompt / bun add @web-ai-sdk/prompt
The React adapter ships as a subpath export, with no extra install. react is a peer dependency only when you import the /react entry.
ask()
import { ask } from "@web-ai-sdk/prompt";
const result = await ask({
input: "Summarize this in one sentence: WebMCP lets web pages expose tools to agents.",
systemPrompt: "You are concise. Reply with a single sentence.",
temperature: 0.2,
onUpdate: (text) => console.log("partial", text), // cumulative buffer
});
console.log(result.response, result.cached);
ask() shares a warm LanguageModel instance across same-shape callers so the cold start is paid once per persona. That's right for embeds, widgets, and ask-and-display flows. It's the wrong shape for chat: two callers with the same options would share one instance, so conversation history cross-bleeds and abort() on one caller kills the other.
createSession()
import { createSession } from "@web-ai-sdk/prompt";
const session = createSession({
systemPrompt: "You are a helpful assistant.",
temperature: 0.7,
});
// Streaming, yields DELTA chunks (not cumulative buffers):
for await (const delta of session.sendStreaming("Tell me about WebMCP.")) {
console.log(delta); // browser-safe; process.stdout exists only in Node
}
// Or one-shot per turn:
const text = await session.send("And what about the Prompt API?");
// Tear down explicitly when the conversation ends.
session.destroy();
Every createSession() call returns an independent LanguageModelInstance with its own history, system prompt, sampling, and lifecycle — abort() / destroy() on one session never touch another. Concurrent send / sendStreaming calls on the same session are NOT queued — the underlying LanguageModel is sequential per instance and will reject the overlapping call with InvalidStateError. Either await the previous send or call session.abort() before issuing a new turn. Multi-turn conversation context is tracked by the native instance itself; UI message lists are your data model.
Concurrency note. Each session is an independent LanguageModel instance: independent history, system prompt, sampling, and lifecycle. The underlying on-device model is single-instance, so the browser currently schedules sendStreaming calls across sessions FIFO. Overlapping sends do not interleave token-by-token in Chrome 148 / Edge 138 — the second send waits for the first to drain. This is a constraint of the runtime, not of the API; code written against createSession() becomes faster automatically if a future release exposes parallel inference.
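Since overlapping sends on one session are rejected rather than queued, a consumer that cannot guarantee "await the previous send" may want to serialize turns itself. The following is a minimal sketch under that assumption; `SendOnly` and `createSerialSender` are hypothetical names for illustration and are not part of @web-ai-sdk/prompt.

```typescript
// Minimal structural type: only the part of Session this sketch needs.
interface SendOnly {
  send(input: string): Promise<string | null>;
}

// Hypothetical helper (not part of @web-ai-sdk/prompt): chains each send
// behind the previous one so calls on a single session never overlap.
function createSerialSender(
  session: SendOnly,
): (input: string) => Promise<string | null> {
  let tail: Promise<unknown> = Promise.resolve();
  return (input) => {
    // Start this turn only after the previous one settles, success or failure.
    const run = tail.then(() => session.send(input));
    // Swallow the rejection on the chain only, so later turns still run.
    // The caller still sees the rejection through the returned promise.
    tail = run.catch(() => undefined);
    return run;
  };
}
```

With a real session this could be used as `const send = createSerialSender(session)`, after which `send("a")` and `send("b")` can be issued back to back without awaiting in between. Calling `session.abort()` remains the way to cut a turn short.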
usePrompt
import { usePrompt } from "@web-ai-sdk/prompt/react";
export function AskBox() {
const { status, response, error, ask, abort } = usePrompt({
systemPrompt: "You are a helpful assistant. Be concise.",
temperature: 0.7,
});
if (status === "unavailable") return null;
return (
<form
onSubmit={(e) => {
e.preventDefault();
const input = new FormData(e.currentTarget).get("q") as string;
if (input) ask(input);
}}
>
<input name="q" placeholder="Ask me anything" />
<button type="submit" disabled={status === "loading" || status === "streaming"}>
{status === "streaming" ? "Streaming…" : "Ask"}
</button>
{response && <p>{response}</p>}
{error && <small>{error.message}</small>}
</form>
);
}
State machine: idle | loading | streaming | done | unavailable. ask(input) triggers a request, cancels any in-flight one, and updates response as chunks stream.
useSession
import { useSession } from "@web-ai-sdk/prompt/react";
import { useState } from "react";
export function Chat({ persona }: { persona: string }) {
const { status, session } = useSession({ systemPrompt: persona });
const [response, setResponse] = useState("");
if (status === "unavailable" || !session) return null;
const send = async (text: string) => {
setResponse("");
let buffer = "";
for await (const delta of session.sendStreaming(text)) {
buffer += delta;
setResponse(buffer);
}
};
return (
<form onSubmit={(e) => { e.preventDefault(); send("Hello"); }}>
<button type="submit">Send</button>
<button type="button" onClick={() => session.abort()}>Stop</button>
<p>{response}</p>
</form>
);
}
useSession is lifecycle-only: it creates the session on mount, destroys it on unmount, and recreates it when any primitive option changes. It deliberately does not track response / history / streaming status — that's your UI state, you own it. Each useSession() call owns its own underlying LanguageModelInstance, so component state and abort() / destroy() stay scoped to the owning component. Token-level interleaving across sessions is browser-defined (see the Concurrency note above) — N mounted components in Chrome 148 / Edge 138 still drain through one underlying model FIFO.
ask(options): Promise<AskResult>
interface AskOptions {
input: string;
systemPrompt?: string;
temperature?: number;
topK?: number;
language?: string; // BCP-47 hint, folded into expectedInputs/Outputs
supportedLanguages?: readonly string[]; // default ["en"]
expectedInputs?: LanguageModelExpectedInput[]; // advanced passthrough
expectedOutputs?: LanguageModelExpectedOutput[]; // advanced passthrough
tools?: LanguageModelTool[]; // experimental: native function-calling passthrough
createOptions?: Partial<LanguageModelCreateOptions>;
responseConstraint?: object; // JSON Schema for structured output
cache?: ResponseCache;
cacheKey?: string;
onUpdate?: (text: string) => void; // CUMULATIVE buffer
signal?: AbortSignal;
}
interface AskResult {
response: string | null;
cached: boolean;
}
onUpdate receives the cumulative text so far, not deltas. For delta-shaped streaming use createSession().sendStreaming().
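If a one-shot ask() caller wants deltas without switching to a session, the cumulative buffer can be converted by remembering how much text was already forwarded. This is a small sketch, assuming each buffer extends the previous one; `toDeltaHandler` is a hypothetical helper name, not an export of the package.

```typescript
// Hypothetical adapter: wraps a delta consumer into a cumulative-buffer
// callback shaped like AskOptions.onUpdate.
function toDeltaHandler(
  onDelta: (delta: string) => void,
): (text: string) => void {
  let seen = 0; // number of characters already forwarded
  return (text) => {
    // If a buffer is ever shorter than what was seen, slice() returns ""
    // and the counter simply resyncs to the new length.
    const delta = text.slice(seen);
    seen = text.length;
    if (delta) onDelta(delta);
  };
}
```

It would be passed as `ask({ input, onUpdate: toDeltaHandler(render) })`, where `render` stands for whatever appends text to the UI.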
If systemPrompt is passed alongside createOptions.initialPrompts, the SDK emits a one-shot console.warn because initialPrompts overrides the synthesized system prompt and the persona is silently lost.
createSession(options?): Session
interface CreateSessionOptions {
systemPrompt?: string;
temperature?: number;
topK?: number;
language?: string;
supportedLanguages?: readonly string[];
expectedInputs?: LanguageModelExpectedInput[];
expectedOutputs?: LanguageModelExpectedOutput[];
tools?: LanguageModelTool[]; // experimental: native function-calling passthrough
// Pass `initialPrompts` here to seed multi-turn context.
createOptions?: Partial<LanguageModelCreateOptions>;
}
interface SessionSendOptions {
signal?: AbortSignal;
responseConstraint?: object; // JSON Schema for structured output
omitResponseConstraintInput?: boolean; // drop the inlined schema to save tokens
}
interface Session {
readonly destroyed: boolean;
readonly contextWindow?: number; // context window in tokens; undefined pre-creation
readonly contextUsage?: number; // tokens used so far; undefined pre-creation
send(input: string, options?: SessionSendOptions): Promise<string | null>;
sendStreaming(input: string, options?: SessionSendOptions): AsyncIterable<string>;
abort(): void;
clone(options?: { signal?: AbortSignal }): Promise<Session>;
onContextOverflow(listener: () => void): () => void; // returns an idempotent cleanup
destroy(): void;
}
Session.sendStreaming() yields deltas (each chunk is the new text since the last yield, never cumulative). The wrapper does no extra bookkeeping: no history tracking, no concurrent-send queue, no usage telemetry. Always destroy sessions you no longer need.
omitResponseConstraintInput is only forwarded when responseConstraint is also set; the native API throws a TypeError otherwise. When you omit the schema, include format guidance in the prompt text itself (the model no longer sees the schema).
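A structured-output turn typically pairs responseConstraint with a guarded JSON.parse. The sketch below assumes the reply is JSON text when a schema is supplied, and treats anything unparseable as a miss. `JsonCapableSession` is a local structural type and `sendJson` is a hypothetical helper; neither is exported by the package.

```typescript
// Local structural types: just the slice of Session / SessionSendOptions used here.
interface JsonSendOptions {
  responseConstraint?: object;
  omitResponseConstraintInput?: boolean;
}
interface JsonCapableSession {
  send(input: string, options?: JsonSendOptions): Promise<string | null>;
}

// Hypothetical helper: sends one turn constrained by a JSON Schema and
// parses the reply, returning null when there is no usable JSON.
async function sendJson<T>(
  session: JsonCapableSession,
  input: string,
  schema: object,
): Promise<T | null> {
  const text = await session.send(input, { responseConstraint: schema });
  if (text === null) return null;
  try {
    return JSON.parse(text) as T;
  } catch {
    return null; // the reply was not valid JSON
  }
}
```

The cast to `T` is not a runtime check; validating the parsed value against the schema is left to the caller.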
The Prompt API spec defines native function calling: register tools on the session and the runtime invokes their execute on the model's behalf, feeding results back. ask() and createSession() forward a tools array straight through to LanguageModel.create():
import { createSession, type LanguageModelTool } from "@web-ai-sdk/prompt";
const tools: LanguageModelTool[] = [
{
name: "fetch_url",
description: "Fetch a URL and return its text.",
inputSchema: {
type: "object",
properties: { url: { type: "string" } },
required: ["url"],
},
async execute(args) {
const { url } = args as { url: string };
return await (await fetch(url)).text();
},
},
];
const session = createSession({ systemPrompt, tools });
This is pass-through only: the SDK forwards tools and never calls execute itself. Whether the model actually invokes a tool depends on the browser. Native execution is not wired on current stable Chrome — the option is accepted but is a silent no-op, and the model may surface its tool call as plain text (a tool_code block) that your code must parse. The passthrough begins working automatically on browsers that ship native execution; until then, responseConstraint remains the robust default. The heuristic tool_code parser and the tool-execution loop are deliberately left in the consumer layer.
tools works on ask() too (ask({ input, tools })), with one caveat: ask() shares warm sessions through an LRU keyed by JSON.stringify(createOptions), and JSON.stringify drops functions — so a tool's execute doesn't contribute to the key, only its name / description / inputSchema do. Two ask() calls with identical tool metadata but different execute closures would share one cached session. It's harmless today (the SDK never runs execute), but it matters once native execution lands, so prefer createSession() for tool-bearing sessions — it bypasses the cache and matches the base-session + per-run-clone() pattern.
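The key collision is easy to reproduce in isolation. The sketch below only mimics the "JSON.stringify(createOptions)" keying described above; `ToolLike` and `keyFor` are illustrative names, and the SDK's real key derivation may include more than this.

```typescript
// Illustrative stand-in for a tool definition.
interface ToolLike {
  name: string;
  description: string;
  inputSchema: object;
  execute: (args: unknown) => Promise<string>;
}

// Mimics a cache key built by stringifying the create-options.
function keyFor(createOptions: { tools: ToolLike[] }): string {
  return JSON.stringify(createOptions);
}

const sharedMetadata = {
  name: "lookup",
  description: "Look up a value.",
  inputSchema: { type: "object", properties: { id: { type: "string" } } },
};

// Same metadata, different behavior.
const toolA: ToolLike = { ...sharedMetadata, execute: async () => "from A" };
const toolB: ToolLike = { ...sharedMetadata, execute: async () => "from B" };

// JSON.stringify omits function-valued properties, so both keys match.
const sameKey = keyFor({ tools: [toolA] }) === keyFor({ tools: [toolB] });
console.log(sameKey); // true
```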
To declare the native tool modalities, pass them through the advanced expectedInputs / expectedOutputs fields ({ type: "tool-response" } / { type: "tool-call" }).
clone()
For agents and multi-task flows, reusing one long-lived session lets history accumulate (later runs "echo" earlier ones, and you eventually hit QuotaExceededError), while recreating a session per task pays the cold start and can hit Chrome's single-instance degradation. The spec's recommended pattern is to keep one warm base session (system prompt only) and clone() it per task: the clone inherits the system prompt and history without re-parsing or another create(), then gets independent history and lifecycle.
const base = createSession({ systemPrompt }); // once; keep warm
// per task / run:
const turn = await base.clone(); // fresh history, no re-parse
try {
for await (const delta of turn.sendStreaming(input)) render(delta);
} finally {
turn.destroy(); // free the clone, keep base
}
clone() throws SessionDestroyedError if the base is destroyed and PromptUnavailableError if the browser instance doesn't support cloning. Destroying a clone never affects the base, and vice versa.
Session surfaces the live token budget the native instance reports, so consumers can size work to the actual context window instead of hardcoding a char cap. Both are undefined until the underlying instance exists — the instance is created lazily on the first send / sendStreaming, so read them after a send or (cleaner) on a session from clone(), whose instance is live the moment clone() resolves.
session.contextWindow: max input tokens for the session (the context window).
session.contextUsage: input tokens used so far. On a fresh base-clone this reflects the inherited history (≈ the system prompt), the right baseline to budget a turn against.
These mirror the Prompt API's contextWindow / contextUsage (the renamed successors of inputQuota / inputUsage); the wrapper reads the new names and falls back to the deprecated ones on older Chrome builds.
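const ANSWER_RESERVE_TOKENS = 512; // illustrative value (assumption): tokens kept free for the reply
const base = createSession({ systemPrompt }); // keep warm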
const turn = await base.clone(); // instance is live here
const quota = turn.contextWindow; // e.g. 4096 / 6144 tokens
const used = turn.contextUsage ?? 0; // ≈ system prompt
if (quota) {
const available = quota - used - ANSWER_RESERVE_TOKENS;
const budgetChars = Math.max(0, available) * 4; // ~4 chars/token
// truncate fetched content to budgetChars so it fits in one turn
}
// Fall back to a fixed char cap when contextWindow is undefined
// (older browsers / pre-creation).
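The same arithmetic can be wrapped in a pure function so the fixed-cap fallback lives in one place. This is a sketch only: `charBudget` is a hypothetical helper, and the 4 characters per token ratio is the rough heuristic used above, not a tokenizer.

```typescript
// Hypothetical helper: converts the live token budget into a character cap,
// falling back to a fixed cap when the window is not known yet.
function charBudget(
  contextWindow: number | undefined,
  contextUsage: number | undefined,
  reserveTokens: number,
  fallbackChars: number,
  charsPerToken = 4, // rough heuristic, not a tokenizer
): number {
  if (!contextWindow) return fallbackChars; // pre-creation or older browser
  const available = contextWindow - (contextUsage ?? 0) - reserveTokens;
  return Math.max(0, available) * charsPerToken;
}
```

For example, a 4096-token window with 96 tokens used and 1000 reserved leaves 3000 tokens, or about 12000 characters, for fetched content.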
session.onContextOverflow(listener) subscribes to the native contextoverflow event, which fires when a turn pushes usage past the window and the oldest history is dropped. Use it to compact or fork a fresh clone() before hitting QuotaExceededError. It returns an idempotent cleanup function, and is a no-op (returns a no-op cleanup) when the instance doesn't expose the event.
const stop = session.onContextOverflow(() => {
// compact, summarize, or start a fresh clone before QuotaExceededError
});
// later
stop();
useSession(options?): UseSessionReturn
interface UseSessionReturn {
status: "loading" | "ready" | "unavailable";
error: Error | null;
session: Session | null; // null until status === "ready"
}
Lifecycle-only: feature detection + create + destroy on unmount + recreate when any primitive option (systemPrompt, temperature, topK, language) changes. Object options (expectedInputs, createOptions) participate by reference; memoize them or accept the recreate cost. UI state is your concern — iterate session.sendStreaming() and accumulate text into your own component state.
isPromptAvailable(): boolean
Feature-detect helper.
checkAvailability(opts?): Promise<LanguageModelAvailability | null>
Forwards to LanguageModel.availability(). Returns null if the global is missing or the call throws.
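One way to consume the result of checkAvailability() is to collapse it into a small UI decision. The sketch below assumes LanguageModelAvailability carries the Prompt API's four availability strings; the local `Availability` type, `Readiness`, and `toReadiness` are hypothetical names used only for illustration.

```typescript
// Assumed to match the Prompt API's availability values.
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

type Readiness =
  | { kind: "ready" }
  | { kind: "needs-download"; inProgress: boolean }
  | { kind: "unsupported" };

// Hypothetical mapper from checkAvailability()'s result to a UI decision.
function toReadiness(value: Availability | null): Readiness {
  switch (value) {
    case "available":
      return { kind: "ready" };
    case "downloadable":
      return { kind: "needs-download", inProgress: false };
    case "downloading":
      return { kind: "needs-download", inProgress: true };
    default:
      // null (global missing or call threw) and "unavailable" both land here.
      return { kind: "unsupported" };
  }
}
```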
createSessionStorageCache({ storage?, prefix? }): ResponseCache
Optional cache backend. Pass it to ask({ cache }) to enable response caching, with an optional custom storage (e.g. localStorage, an in-memory polyfill).
import {
clearSessions, // drop every cached one-shot session
clearSession, // drop one cached session by create-options
configurePromptCache, // change the LRU cap (default 8)
} from "@web-ai-sdk/prompt";
The internal session cache is LRU-bounded (default 8) and only memoizes sessions created by ask(); createSession() is never cached.
getLanguageModelApi, getOrCreateLanguageModel, and defaultCacheKey are exported so you can compose your own pipeline.
Two layers, same as @web-ai-sdk/summarizer:
Session cache (ask() only): a bounded LRU of LanguageModel instances keyed by stringified create-options. Cold start takes roughly 1-3 s; warm calls are sub-second. createSession() bypasses this cache entirely.
Response cache (opt-in): pass a cache (anything matching { get, set }) to memoize final responses by (input, systemPrompt, temperature, topK). Omit it for a fresh model call every time.
// Off by default; every call hits the model.
ask({ input: "hi" });
// Opt in for sessionStorage-backed caching.
ask({ input: "hi", cache: createSessionStorageCache() });
// Or roll your own.
ask({ input: "hi", cache: myMap, cacheKey: "greeting" });
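For the "roll your own" case, a cache with expiry is a common variation on a plain Map. The sketch below is an assumption-heavy illustration: the exact ResponseCache signature is not spelled out here, so `TtlCache` is a hypothetical class that exposes synchronous, Map-like get and set with a generic value type.

```typescript
// Hypothetical TTL cache exposing a Map-like { get, set } shape. The value
// type is generic because the exact ResponseCache contract is an assumption.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be tested without real waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

If the real contract matches this shape, an instance could be passed in the same position as myMap, for example `ask({ input: "hi", cache: new TtlCache(60_000) })`.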
The vanilla ask() throws PromptUnavailableError when the API is missing or reports availability: "unavailable". The React hook absorbs this and returns status: "unavailable" instead.
createSession() returns a Session synchronously even if the underlying create() rejects; the error surfaces on the first send / sendStreaming.
AbortSignal is supported on every surface. Aborting mid-stream resolves cleanly; the result cache is not written for aborted runs. Aborts reject with PromptAbortError (exported; instanceof PromptAbortError works, and its name is "AbortError"), thrown by both ask() and sessions.
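Because aborts surface as an error named "AbortError", a caller can separate "the user or a timeout cancelled this" from real failures by checking the name. The sketch below relies only on that documented name; `isAbortError` and `withTimeout` are hypothetical helpers, not exports of the package.

```typescript
// Hypothetical guard: true for aborts, based on the documented name "AbortError".
function isAbortError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "AbortError"
  );
}

// Hypothetical wrapper: runs a signal-aware task with a deadline and maps an
// abort to null instead of a rejection. Other errors propagate unchanged.
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T | null> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await run(controller.signal);
  } catch (err) {
    if (isAbortError(err)) return null;
    throw err;
  } finally {
    clearTimeout(timer); // do not leave the timer pending after completion
  }
}
```

A call might look like `withTimeout((signal) => ask({ input, signal }), 10_000)`, which yields null when the deadline fires first.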
MIT © Beto Muniz