
@classytic/realtime-agents
Provider-agnostic realtime voice agent orchestration for React (OpenAI Realtime, Gemini Live)
Build voice-powered AI agents that work with OpenAI Realtime API and Google Gemini Live API using a single, unified interface.
Note: native multi-agent handoffs are currently an OpenAI Agents SDK capability. Gemini Live shares the same tool and session abstractions in this package, but OpenAI remains the only provider here with built-in agent-to-agent transfer semantics.
```bash
npm install @classytic/realtime-agents
```
Peer dependencies (install the ones you need):
```bash
# For OpenAI Realtime
npm install @openai/agents

# For Gemini Live
npm install @google/genai

# Required
npm install react zod
```
```tsx
import { useMemo } from "react";
import { useRealtimeSession, tool } from "@classytic/realtime-agents";
import {
  OpenAIAdapter,
  OPENAI_LOW_LATENCY_SESSION_OPTIONS,
  OPENAI_INTERVIEW_TRANSCRIPTION_CONFIG,
  OPENAI_SPEECH_STYLE_HINTS,
  openAIAgentOptions,
  openAISessionOptions,
} from "@classytic/realtime-agents/openai";
import { z } from "zod";

const adapter = useMemo(
  () =>
    new OpenAIAdapter({
      sessionOptions: OPENAI_LOW_LATENCY_SESSION_OPTIONS,
      transcriptionLanguage: "en",
      transcriptionModel: OPENAI_INTERVIEW_TRANSCRIPTION_CONFIG.model,
      transcriptionPrompt: OPENAI_INTERVIEW_TRANSCRIPTION_CONFIG.prompt,
    }),
  [],
);

// Defaults: WebRTC, gpt-realtime, retentionRatio 0.8, plus low-latency session preset
const session = useRealtimeSession(adapter, {
  onTranscriptComplete: (entry) => console.log(entry.role, entry.text),
  onError: (err) => console.error(err),
});
```
```tsx
// Connect
await session.connect({
  getCredentials: async () => {
    const res = await fetch("/api/session");
    const data = await res.json();
    return data.client_secret.value;
  },
  agent: {
    name: "assistant",
    instructions: `You are a helpful assistant. ${OPENAI_SPEECH_STYLE_HINTS["en-IN"]}`,
    tools: [weatherTool],
    voice: "coral",
    providerOptions: openAIAgentOptions({
      prompt: {
        promptId: "pmpt_123",
        version: "3",
        variables: { locale: "en-IN" },
      },
    }),
  },
  providerOptions: openAISessionOptions({
    ...OPENAI_LOW_LATENCY_SESSION_OPTIONS,
    workflowName: "interview-session",
  }),
});

// Controls
session.sendMessage("Hello!");
session.sendImage(canvasDataUrl); // Send image to the model
session.mute(true); // User mute — blocks auto-unmute
session.mute(true, { source: "system" }); // System mute — auto-unmute still works
session.interrupt();
session.disconnect();
```
```tsx
import { useMemo } from "react";
import { useRealtimeSession } from "@classytic/realtime-agents";
import {
  GeminiAdapter,
  GEMINI_EN_IN_SPEECH_CONFIG,
  GEMINI_LOW_LATENCY_SESSION_OPTIONS,
  geminiSessionOptions,
} from "@classytic/realtime-agents/gemini";

const adapter = useMemo(
  () => new GeminiAdapter({ sessionOptions: GEMINI_LOW_LATENCY_SESSION_OPTIONS }),
  [],
);

// Defaults: gemini-2.5-flash-native-audio-preview-12-2025, transcription on, sliding window compression
const session = useRealtimeSession(adapter, {
  /* same callbacks */
});

await session.connect({
  getCredentials: async () => {
    const res = await fetch("/api/gemini-session");
    return (await res.json()).apiKey;
  },
  agent: {
    name: "assistant",
    instructions: "You are a helpful assistant.",
    tools: [weatherTool],
    voice: "Kore",
  },
  providerOptions: geminiSessionOptions({
    speechConfig: GEMINI_EN_IN_SPEECH_CONFIG,
    explicitVadSignal: true,
    // Escape hatch for newer Gemini Live config without waiting for a wrapper release
    config: {},
  }),
});
```
```ts
type SessionStatus = "disconnected" | "connecting" | "connected";
type AgentStatus = "idle" | "listening" | "speaking" | "thinking";
```
`agentStatus` reflects what the voice agent is currently doing and is useful for driving UI visualizations (e.g. animated orbs, status indicators).
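For example, a small helper that maps `agentStatus` to an indicator label and color (a hypothetical UI mapping; the labels and colors here are illustrative, not part of the package):

```typescript
// Hypothetical mapping from agentStatus to UI indicator props.
type AgentStatus = "idle" | "listening" | "speaking" | "thinking";

function statusIndicator(status: AgentStatus): { label: string; color: string } {
  switch (status) {
    case "listening":
      return { label: "Listening...", color: "green" };
    case "speaking":
      return { label: "Speaking", color: "blue" };
    case "thinking":
      return { label: "Thinking...", color: "amber" };
    default:
      return { label: "Idle", color: "gray" };
  }
}
```

Wire this into the `onAgentStatusChange` callback, or read `session.agentStatus` directly during render.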
Tools use Zod schemas and work identically across providers:
```ts
import { tool } from "@classytic/realtime-agents";
import { z } from "zod";

const weatherTool = tool({
  name: "get_weather",
  description: "Get current weather for a city",
  parameters: z.object({ city: z.string() }),
  execute: async ({ city }) => {
    const res = await fetch(`/api/weather?city=${encodeURIComponent(city)}`);
    return res.json();
  },
});
```
Tools that need access to the active session (e.g. to send images or read session context) can use the global tool context:
```ts
import { getToolContext } from "@classytic/realtime-agents";

const captureImageTool = tool({
  name: "capture_image",
  description: "Capture and analyze the current camera frame",
  parameters: z.object({}),
  execute: async () => {
    const ctx = getToolContext();

    // Access session context passed during connect
    const interviewId = ctx.sessionContext.interviewId;

    // Send an image through the active session
    ctx.sendImage(dataUrl, { triggerResponse: true });

    // Access the raw provider session (escape hatch)
    const rawSession = ctx.providerSession;

    return { captured: true };
  },
});
```
The tool context is automatically set during `connect()` and cleared on `disconnect()`. The `sessionContext` object comes from the `context` option passed to `connect()`.
```ts
await session.connect({
  // Required: returns API key or client secret
  getCredentials: async () => "...",

  // Required: agent configuration
  agent: {
    name: "assistant",
    instructions: "You are a helpful assistant.",
    tools: [weatherTool],
    voice: "coral",
    providerOptions: openAIAgentOptions({
      prompt: {
        promptId: "pmpt_123",
        variables: { locale: "en-IN" },
      },
    }),
  },

  // Optional: HTML audio element for playback (WebRTC)
  audioElement: document.getElementById("audio"),

  // Optional: session context accessible via getToolContext()
  context: { interviewId: "123", candidateName: "Alice" },

  // Optional: pass an existing MediaStream (e.g. with video tracks)
  mediaStream: cameraStream,

  // Optional: pre-seed conversation history
  history: [
    { role: "user", text: "My name is Alice." },
    { role: "assistant", text: "Nice to meet you, Alice!" },
  ],

  // Optional: provider-native session options
  providerOptions: openAISessionOptions({
    workflowName: "interview-session",
    sessionConfig: OPENAI_LOW_LATENCY_SESSION_OPTIONS.sessionConfig,
  }),
});
```
The `useRealtimeSession` hook returns these controls:

| Method | Description |
|---|---|
| `connect(options)` | Connect to the voice session |
| `disconnect()` | Disconnect and clean up |
| `sendMessage(text)` | Send a text message to the agent |
| `sendImage(dataUrl, options?)` | Send an image to the model; `triggerResponse` (default `false`) makes the model respond to the image |
| `sendSimulatedUserMessage(text)` | Inject a synthetic user message into the conversation (appears as if the user said it) |
| `mute(muted, options?)` | Mute/unmute the microphone; pass `{ source: 'system' }` for programmatic mutes that shouldn't block auto-unmute in non-interruptible mode (default: `'user'`) |
| `interrupt()` | Interrupt the agent's current response |
| `pushToTalkStart()` | Begin push-to-talk recording |
| `pushToTalkStop()` | End push-to-talk recording |
| `sendEvent(event)` | Send a raw transport event (provider-specific) |
| `getUsage()` | Get a snapshot of token usage |
Plus reactive state:
| Property | Type | Description |
|---|---|---|
| `status` | `SessionStatus` | `'disconnected' \| 'connecting' \| 'connected'` |
| `agentStatus` | `AgentStatus` | `'idle' \| 'listening' \| 'speaking' \| 'thinking'` |
| `usage` | `UsageInfo \| null` | Real-time token consumption (updates as tokens are consumed) |
```ts
const session = useRealtimeSession(adapter, {
  onStatusChange: (status) => {}, // Session connected/disconnected
  onAgentStatusChange: (status) => {}, // Agent idle/listening/speaking/thinking
  onError: (error) => {}, // Connection or runtime errors
  onTranscriptComplete: (entry) => {}, // Finalized transcript (user or assistant)
  onAgentHandoff: (agentName) => {}, // Agent-to-agent handoff
  onUserSpeechStart: () => {}, // User started speaking
  onUserSpeechStop: () => {}, // User stopped speaking
  onToolStart: (toolName, args) => {}, // Tool invocation started
  onToolEnd: (toolName, result) => {}, // Tool invocation finished
  onUsageUpdate: (usage) => {}, // Token usage changed
  onTransportEvent: (event) => {}, // Raw provider transport event
  onGuardrailTripped: (result) => {}, // Guardrail result (OpenAI)
  onToolApprovalRequest: async (name, args) => true, // Approve/reject tool calls
});
```
`@classytic/realtime-agents/openai` exports reusable presets and typed helpers:
```ts
import {
  OPENAI_DEFAULT_CONTEXT_MANAGEMENT,
  OPENAI_LOW_LATENCY_CONTEXT_MANAGEMENT,
  OPENAI_DEFAULT_SESSION_OPTIONS,
  OPENAI_LOW_LATENCY_SESSION_OPTIONS,
  OPENAI_DEFAULT_TURN_DETECTION,
  OPENAI_INTERVIEW_TRANSCRIPTION_CONFIG,
  OPENAI_SPEECH_STYLE_HINTS,
  openAIAgentOptions,
  openAISessionOptions,
} from "@classytic/realtime-agents/openai";
```
`@classytic/realtime-agents/gemini` exports equivalent session helpers:
```ts
import {
  GEMINI_DEFAULT_CONTEXT_MANAGEMENT,
  GEMINI_LOW_LATENCY_CONTEXT_MANAGEMENT,
  GEMINI_DEFAULT_SESSION_OPTIONS,
  GEMINI_LOW_LATENCY_SESSION_OPTIONS,
  geminiSessionOptions,
} from "@classytic/realtime-agents/gemini";
```
Use `useAutoReconnect` as a drop-in replacement for `useRealtimeSession` to automatically recover from unexpected disconnects:
```tsx
import { useMemo } from "react";
import { useAutoReconnect } from "@classytic/realtime-agents";
import { OpenAIAdapter } from "@classytic/realtime-agents/openai";

const adapter = useMemo(() => new OpenAIAdapter(), []);

const session = useAutoReconnect(
  adapter,
  {
    // All standard SessionCallbacks, plus:
    onReconnecting: (attempt, max) =>
      toast.info(`Reconnecting ${attempt}/${max}...`),
    onReconnected: () => toast.success("Reconnected!"),
    onReconnectFailed: () => toast.error("Connection lost. Please refresh."),
  },
  {
    maxAttempts: 3, // default: 3
    baseDelay: 1000, // default: 1000ms
    maxDelay: 8000, // default: 8000ms
    injectHistory: true, // default: true — re-inject transcript on reconnect
  },
);

// Same API as useRealtimeSession, plus:
session.isReconnecting; // boolean
session.reconnectAttempt; // 0 when not reconnecting
```
How it works:

- Distinguishes an intentional `disconnect()` call from a connection that dropped unexpectedly
- Calls `adapter.prepareReconnect()`, which sets the session resumption handle so the reconnected session picks up where it left off server-side
- Re-injects `history` on reconnect so the agent keeps conversational context (audio context is lost, but transcript-based context is preserved)
- Exposes `isReconnecting` and `reconnectAttempt` for UI feedback (e.g. loading spinners, toast notifications)

| Config | Default | Description |
|---|---|---|
| `maxAttempts` | `3` | Number of retry attempts before giving up |
| `baseDelay` | `1000` | Initial delay in ms before the first retry |
| `maxDelay` | `8000` | Maximum delay with exponential backoff |
| `injectHistory` | `true` | Re-inject transcript history on reconnect |
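The retry delay grows exponentially from `baseDelay` up to the `maxDelay` cap. A sketch of the schedule, assuming the delay doubles per attempt (my reading of the defaults, not a guaranteed implementation detail of the hook):

```typescript
// Hypothetical backoff schedule: baseDelay * 2^(attempt - 1), capped at maxDelay.
function backoffDelay(attempt: number, baseDelay = 1000, maxDelay = 8000): number {
  return Math.min(baseDelay * 2 ** (attempt - 1), maxDelay);
}

// With the defaults (maxAttempts: 3), the retries wait 1000ms, 2000ms, 4000ms.
const delays = [1, 2, 3].map((a) => backoffDelay(a));
```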
Wrap your app with `EventProvider` and `TranscriptProvider` for shared transcript/event state:

```tsx
import { EventProvider, TranscriptProvider } from "@classytic/realtime-agents";

function App() {
  return (
    <EventProvider>
      <TranscriptProvider>
        <VoiceAgent />
      </TranscriptProvider>
    </EventProvider>
  );
}
```
Both adapters default to `retentionRatio: 0.8` for long-running sessions. Override per adapter:

```ts
// OpenAI: keep 60% of context on truncation
new OpenAIAdapter({ contextManagement: { retentionRatio: 0.6 } });

// Gemini: custom trigger threshold
new GeminiAdapter({
  contextManagement: { triggerTokens: 80000, retentionRatio: 0.5 },
});

// Disable context management (not recommended for long sessions)
new OpenAIAdapter({ contextManagement: { mode: "disabled" } });
```
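Conceptually, a `retentionRatio` of 0.8 means that when the context window is truncated, roughly the most recent 80% of conversation items survive. An illustrative sketch of that idea (not the adapters' actual algorithm, which operates on provider-side context):

```typescript
// Illustrative truncation: keep the newest `retentionRatio` share of entries.
function truncateHistory<T>(entries: T[], retentionRatio: number): T[] {
  const keep = Math.ceil(entries.length * retentionRatio);
  return entries.slice(entries.length - keep);
}

// With 10 entries and retentionRatio 0.8, the oldest 2 are dropped.
const trimmed = truncateHistory([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 0.8);
```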
| Option | Default | Description |
|---|---|---|
| `transport` | `'webrtc'` | `'webrtc'` or `'websocket'` |
| `codec` | `'opus'` | Audio codec for WebRTC |
| `model` | `'gpt-realtime'` | OpenAI model identifier |
| `transcriptionModel` | `'gpt-4o-mini-transcribe'` | Transcription model |
| `transcriptionLanguage` | `undefined` | Optional language hint for speech recognition |
| `transcriptionPrompt` | `undefined` | Optional jargon/name hint for speech recognition |
| `vadEagerness` | `'medium'` | Voice activity detection: `'low'`, `'medium'`, `'high'`, or `'auto'` |
| `contextManagement` | `{ mode: 'auto', retentionRatio: 0.8 }` | Context window management |
- `gpt-4o-mini-transcribe`: fastest and cheapest. Good default for most voice UX.
- `gpt-4o-transcribe`: slower than mini but more accurate. Better for interviews, accents, names, and technical jargon.
- `gpt-4o-transcribe-latest`: use when you want OpenAI's latest transcribe alias instead of a pinned choice.

Example:
```ts
import {
  OpenAIAdapter,
  OPENAI_HIGH_ACCURACY_SESSION_OPTIONS,
  OPENAI_INTERVIEW_TRANSCRIPTION_CONFIG,
} from "@classytic/realtime-agents/openai";

const adapter = new OpenAIAdapter({
  sessionOptions: OPENAI_HIGH_ACCURACY_SESSION_OPTIONS,
  transcriptionModel: OPENAI_INTERVIEW_TRANSCRIPTION_CONFIG.model,
  transcriptionPrompt: OPENAI_INTERVIEW_TRANSCRIPTION_CONFIG.prompt,
});
```
| Option | Default | Description |
|---|---|---|
| `model` | `'gemini-live-2.5-flash-preview'` | Gemini Live model |
| `inputSampleRate` | `16000` | Input audio sample rate (Hz) |
| `outputSampleRate` | `24000` | Output audio sample rate (Hz) |
| `inputTranscription` | `true` | Transcribe user speech |
| `outputTranscription` | `true` | Transcribe model speech |
| `enableVideo` | `false` | Request camera in `getUserMedia` |
| `videoFrameInterval` | `5000` | Ms between video frame captures (`0` to disable) |
| `sessionResumption` | -- | `{ handle?, transparent? }` — resume a previous session |
| `contextManagement` | `{ mode: 'auto', retentionRatio: 0.8 }` | Sliding window compression |
Both adapters expose `getOutputAnalyser()`, which returns an `AnalyserNode` for the AI's audio output. Use it to drive audio visualizations:
```ts
const adapter = useMemo(() => new OpenAIAdapter(), []);

// After connect, get the analyser node
const analyser = adapter.getOutputAnalyser(); // AnalyserNode | null
```
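In a render loop you would typically read the analyser's frequency data (standard Web Audio API) and reduce it to a single level. A small helper for that reduction (the render-loop wiring in the comments is an assumed usage pattern, not package API):

```typescript
// Reduce analyser frequency data (bytes 0-255) to a normalized 0-1 volume level.
function volumeLevel(data: Uint8Array): number {
  if (data.length === 0) return 0;
  const sum = data.reduce((acc, v) => acc + v, 0);
  return sum / (data.length * 255);
}

// In a browser requestAnimationFrame loop (sketch):
// const data = new Uint8Array(analyser.frequencyBinCount);
// analyser.getByteFrequencyData(data);
// const level = volumeLevel(data); // drive orb scale, bar heights, etc.
```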
```ts
// Reactive state (updates in real-time)
const { usage } = session;
console.log(usage?.inputTokens, usage?.outputTokens, usage?.totalTokens);

// Granular breakdown: usage?.inputTokensDetails, usage?.outputTokensDetails
// OpenAI also exposes usage?.requests when available
// Advanced analytics: usage?.rawUsage contains the provider-native payload

// Snapshot (e.g., before disconnect)
const snapshot = session.getUsage();
```
`UsageInfo` is the unified token accounting shape across providers:

- `inputTokens`
- `outputTokens`
- `totalTokens`
- `requests?`
- `inputTokensDetails?`
- `outputTokensDetails?`
- `rawUsage?`

OpenAI currently provides the richest usage detail. Gemini Live currently provides reliable top-level token counts and exposes the provider payload through `rawUsage`.
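Based on the fields listed above, the shape can be pictured roughly as follows. This is a sketch inferred from the field names; consult the package's exported `UsageInfo` type for the authoritative definition:

```typescript
// Sketch of the unified usage shape, inferred from the documented fields.
interface UsageInfoSketch {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  requests?: number; // OpenAI, when available
  inputTokensDetails?: Record<string, number>;
  outputTokensDetails?: Record<string, number>;
  rawUsage?: unknown; // provider-native payload
}

const usage: UsageInfoSketch = { inputTokens: 120, outputTokens: 80, totalTokens: 200 };
```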
Core (`@classytic/realtime-agents`)

| Export | Type | Description |
|---|---|---|
| `useRealtimeSession` | Hook | Main React hook for voice sessions |
| `useAutoReconnect` | Hook | Drop-in replacement with auto-reconnection on unexpected disconnects |
| `useSessionHistory` | Hook | Session history management |
| `tool` | Function | Create provider-agnostic tool definitions |
| `buildInstructions` | Function | Template `{{variable}}` replacement for dynamic instructions |
| `getToolContext` | Function | Get the current session's tool context (for tools that need session access) |
| `setToolContext` | Function | Set the tool context (called automatically during connect) |
| `clearToolContext` | Function | Clear the tool context (called automatically during disconnect) |
| `EventProvider` | Component | Event context provider |
| `TranscriptProvider` | Component | Transcript context provider |
| `useEvent` | Hook | Access event context |
| `useTranscript` | Hook | Access transcript context |
Exported types: `SessionStatus`, `AgentStatus`, `TranscriptEntry`, `AgentTool`, `AgentConfig`, `HistoryEntry`, `ConnectOptions`, `TransportEventHandlers`, `RealtimeAdapter`, `SessionCallbacks`, `UseRealtimeSessionReturn`, `UsageInfo`, `ContextManagement`, `ReconnectConfig`, `AutoReconnectCallbacks`, `UseAutoReconnectReturn`, `ToolContext`, `ModerationCategory`, `GuardrailResultType`, `TranscriptItem`, `LoggedEvent`
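The `buildInstructions` helper performs `{{variable}}` replacement for dynamic instructions. A rough sketch of the idea (a hypothetical reimplementation; the package's actual helper may differ in edge cases such as missing variables):

```typescript
// Hypothetical {{variable}} templating, mirroring what buildInstructions describes.
function renderTemplate(template: string, vars: Record<string, string>): string {
  // Leave unknown placeholders untouched rather than replacing with "undefined".
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) => vars[key] ?? match);
}

const out = renderTemplate("You are interviewing {{candidate}} for {{role}}.", {
  candidate: "Alice",
  role: "Staff Engineer",
});
// "You are interviewing Alice for Staff Engineer."
```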
```ts
// OpenAI
import {
  OpenAIAdapter,
  OPENAI_VOICES,
  OPENAI_DEFAULT_VOICE,
  OPENAI_REALTIME_MODELS,
  OPENAI_DEFAULT_MODEL,
  OPENAI_TRANSCRIPTION_MODELS,
  OPENAI_DEFAULT_TRANSCRIPTION_MODEL,
  OPENAI_DEFAULT_TRANSCRIPTION_CONFIG,
  OPENAI_HIGH_ACCURACY_TRANSCRIPTION_CONFIG,
  OPENAI_INTERVIEW_TRANSCRIPTION_CONFIG,
  OPENAI_DEFAULT_SESSION_OPTIONS,
  OPENAI_HIGH_ACCURACY_SESSION_OPTIONS,
  OPENAI_LOW_LATENCY_SESSION_OPTIONS,
  OPENAI_TRANSPORTS,
  OPENAI_DEFAULT_TRANSPORT,
} from "@classytic/realtime-agents/openai";

import type {
  OpenAIAdapterOptions,
  OpenAIVoiceOption,
  OpenAIVoiceId,
  OpenAIRealtimeModel,
  OpenAIRealtimeModelId,
  OpenAITransport,
} from "@classytic/realtime-agents/openai";

// Gemini
import {
  GeminiAdapter,
  GEMINI_VOICES,
  GEMINI_DEFAULT_VOICE,
  GEMINI_LIVE_MODELS,
  GEMINI_DEFAULT_MODEL,
  GEMINI_TRANSPORTS,
  GEMINI_DEFAULT_TRANSPORT,
} from "@classytic/realtime-agents/gemini";

import type {
  GeminiAdapterOptions,
  GeminiVoiceOption,
  GeminiVoiceId,
  GeminiLiveModel,
  GeminiLiveModelId,
  GeminiTransport,
} from "@classytic/realtime-agents/gemini";

// Gemini also re-exports audio utilities
import {
  base64ToUint8Array,
  uint8ArrayToBase64,
  createPcmBlob,
  decodeAudioData,
  getAudioWorkletUrl,
  RECORDER_WORKLET_CODE,
} from "@classytic/realtime-agents/gemini";
```
License: MIT