chatdio

Latest version: 1.2.4 (npm)
A modern Web Audio library for building conversational AI interfaces. Handles microphone capture, audio playback, device management, WebSocket streaming, and real-time visualization — all with cross-browser support (Chrome, Firefox, Safari).

Features

  • 🎙️ Microphone Capture with echo cancellation, noise suppression, and auto gain control
  • 🔊 Audio Playback with buffering, volume control, and seamless queuing
  • 📱 Device Management with hot-plug detection and automatic fallback
  • 🌐 WebSocket Streaming with auto-reconnection and binary/JSON modes
  • 📊 Real-time Visualization data for level meters and waveforms
  • 🎚️ Sample Rate & Bit Depth conversion (8/16/24/32-bit, 8kHz-48kHz)
  • 🔇 Barge-in Support for interrupting AI responses

Installation

npm install chatdio

Quick Start

import { Chatdio } from 'chatdio';

// Create instance with configuration
const audio = new Chatdio({
  microphone: {
    sampleRate: 16000,
    echoCancellation: true,
    noiseSuppression: true,
  },
  playback: {
    sampleRate: 24000,
    bitDepth: 16,
  },
  websocket: {
    url: 'wss://your-ai-server.com/audio',
    autoReconnect: true,
  },
});

// Initialize (must be called from a user gesture)
document.querySelector('#startBtn')?.addEventListener('click', async () => {
  await audio.initialize();
  
  // Start full-duplex conversation
  await audio.startConversation();
});

// Handle events
audio.on('mic:activity', (data) => {
  console.log('Mic level:', data.volume, 'Speaking:', data.isSpeaking);
});

audio.on('playback:activity', (data) => {
  console.log('Playback level:', data.volume);
});

audio.on('ws:connected', () => {
  console.log('Connected to AI server');
});

audio.on('ws:message', (message) => {
  console.log('Received message:', message);
});

Core Components

Chatdio

The main orchestrator that ties everything together.

const audio = new Chatdio({
  microphone: { /* MicrophoneConfig */ },
  playback: { /* PlaybackConfig */ },
  websocket: { /* WebSocketConfig */ },
  deviceManager: { /* DeviceManagerConfig */ },
  activityAnalyzer: { /* ActivityAnalyzerConfig */ },
});

// Lifecycle
await audio.initialize();      // Initialize (from user gesture)
await audio.startConversation(); // Start mic + websocket
audio.stopConversation();      // Stop mic + playback
audio.dispose();               // Cleanup resources

// Turn management (barge-in / interruption)
const turnId = audio.startTurn();           // Start new turn, interrupt any playing audio
audio.interruptTurn();                       // Interrupt current turn, start new one
audio.interruptTurn(false);                  // Interrupt without starting new turn
audio.getCurrentTurnId();                    // Get current turn ID
audio.clearTurnBuffer(turnId);               // Clear buffered audio for a turn
await audio.playAudioForTurn(data, turnId);  // Play only if turn is current

// Device selection
audio.getInputDevices();       // List microphones
audio.getOutputDevices();      // List speakers
await audio.setInputDevice(deviceId);
await audio.setOutputDevice(deviceId);

// Volume control
audio.setVolume(0.8);
audio.getVolume();

// Mute
audio.setMicrophoneMuted(true);
audio.isMicrophoneMuted();

MicrophoneCapture

Standalone microphone capture with resampling and format conversion.

import { MicrophoneCapture } from 'chatdio';

const mic = new MicrophoneCapture({
  sampleRate: 16000,          // Output sample rate
  echoCancellation: true,
  noiseSuppression: true,
  autoGainControl: true,
  bufferSize: 2048,           // Processing buffer size
});

mic.on('data', (pcmData: ArrayBuffer) => {
  // 16-bit PCM audio data ready to send
  websocket.send(pcmData);
});

mic.on('level', (level: number) => {
  updateMeter(level);
});

await mic.start();
// ...
mic.stop();
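The `data` events above carry 16-bit PCM. For illustration only (this is not chatdio's actual implementation), the conversion from the Float32 samples the Web Audio API produces to that 16-bit format looks roughly like this:

```typescript
// Illustrative sketch: map Float32 Web Audio samples ([-1, 1]) to the
// 16-bit little-endian PCM that MicrophoneCapture emits.
function float32ToPcm16(input: Float32Array): ArrayBuffer {
  const out = new DataView(new ArrayBuffer(input.length * 2));
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])); // clamp to [-1, 1]
    // Negative range has one extra step (-32768), hence the asymmetric scale.
    out.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return out.buffer;
}
```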

AudioPlayback

Buffered audio playback with queue management.

import { AudioPlayback } from 'chatdio';

const playback = new AudioPlayback({
  sampleRate: 24000,
  bitDepth: 16,
  channels: 1,
  bufferAhead: 0.1,  // Buffer ahead time in seconds
});

await playback.initialize();

// Queue audio chunks as they arrive
playback.on('buffer-low', () => {
  console.log('Buffer running low');
});

playback.on('ended', () => {
  console.log('Finished playing all audio');
});

// Queue PCM data
await playback.queueAudio(pcmArrayBuffer);

// Control playback
playback.pause();
await playback.resume();
playback.stop();
playback.setVolume(0.8);
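When tuning `bufferAhead`, it helps to know how long each queued chunk plays for. A small helper sketch (not part of chatdio) using the format parameters above:

```typescript
// Helper sketch: playback duration in seconds of a raw PCM chunk, given
// the AudioPlayback format (sampleRate, bitDepth, channels).
function chunkDurationSeconds(
  byteLength: number,
  sampleRate: number,
  bitDepth: number,
  channels: number,
): number {
  const bytesPerSample = bitDepth / 8;
  return byteLength / (sampleRate * bytesPerSample * channels);
}
```

For the config above (24 kHz, 16-bit, mono), a 4800-byte chunk is 100 ms of audio, so a `bufferAhead` of 0.1 covers roughly one such chunk.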

AudioDeviceManager

Device enumeration with change detection.

import { AudioDeviceManager } from 'chatdio';

const deviceManager = new AudioDeviceManager({
  autoFallback: true,    // Auto-switch on device disconnect
  pollInterval: 1000,    // Fallback polling interval
});

await deviceManager.initialize();

// List devices
deviceManager.getInputDevices();
deviceManager.getOutputDevices();

// Select devices
await deviceManager.setInputDevice(deviceId);
await deviceManager.setOutputDevice(deviceId);

// Listen for changes
deviceManager.on('devices-changed', (devices) => {
  updateDeviceList(devices);
});

deviceManager.on('device-disconnected', (device) => {
  console.log('Device disconnected:', device.label);
});

// Check Safari compatibility
if (!deviceManager.isOutputSelectionSupported()) {
  console.log('Output selection not supported (Safari)');
}

WebSocketBridge

WebSocket connection with auto-reconnection.

import { WebSocketBridge } from 'chatdio';

const ws = new WebSocketBridge({
  url: 'wss://ai-server.com/audio',
  autoReconnect: true,
  maxReconnectAttempts: 5,
  reconnectDelay: 1000,
  binaryMode: true,
  
  // Custom message wrapping
  wrapOutgoingAudio: (data) => {
    return JSON.stringify({
      type: 'audio',
      data: btoa(String.fromCharCode(...new Uint8Array(data))),
    });
  },
  
  // Custom message parsing
  parseIncomingAudio: (event) => {
    const msg = JSON.parse(event.data);
    if (msg.type === 'audio') {
      return base64ToArrayBuffer(msg.data);
    }
    return null;
  },
});

ws.on('connected', () => console.log('Connected'));
ws.on('disconnected', (code, reason) => console.log('Disconnected:', reason));
ws.on('reconnecting', (attempt) => console.log('Reconnecting...', attempt));
ws.on('audio', (data) => playback.queueAudio(data));
ws.on('message', (msg) => console.log('Message:', msg));

await ws.connect();
ws.sendAudio(pcmData);
ws.sendMessage({ type: 'transcript', text: 'Hello' });
ws.disconnect();
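The parser example above calls a `base64ToArrayBuffer` helper that chatdio does not export; you supply your own. A minimal sketch of that helper and its inverse (the per-byte loop avoids the call-stack limits that spreading a large `Uint8Array` into `String.fromCharCode` can hit):

```typescript
// Assumed helper (not provided by chatdio): decode a base64 string into
// an ArrayBuffer of raw audio bytes.
function base64ToArrayBuffer(base64: string): ArrayBuffer {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes.buffer;
}

// Inverse helper for wrapOutgoingAudio: encode raw bytes as base64,
// byte by byte, so large buffers don't overflow the call stack.
function arrayBufferToBase64(buffer: ArrayBuffer): string {
  const bytes = new Uint8Array(buffer);
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}
```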

ActivityAnalyzer

Real-time audio analysis for visualizations.

import { ActivityAnalyzer, VisualizationUtils } from 'chatdio';

const analyzer = new ActivityAnalyzer({
  fftSize: 256,
  smoothingTimeConstant: 0.8,
  updateInterval: 50,  // ms
});

// Connect to an audio node
analyzer.connect(micCapture.getAnalyzerNode());
analyzer.start();

// Listen for activity updates
analyzer.on('activity', (data) => {
  // data.volume - RMS volume (0-1)
  // data.peak - Peak level with decay (0-1)
  // data.frequencyData - Uint8Array for spectrum
  // data.timeDomainData - Uint8Array for waveform
  // data.isSpeaking - Voice activity detection
  
  drawWaveform(data.timeDomainData);
  drawSpectrum(data.frequencyData);
});

analyzer.on('speaking-start', () => console.log('Started speaking'));
analyzer.on('speaking-stop', () => console.log('Stopped speaking'));

// Utility functions for visualization
const bands = analyzer.getFrequencyBands(8);  // Get 8 frequency bands
const waveformPath = VisualizationUtils.createWaveformPath(data.timeDomainData, 200, 50);
const barHeights = VisualizationUtils.createBarHeights(data.frequencyData, 16, 100);
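For context (this is an illustrative sketch, not chatdio's exact code), an RMS volume figure like `data.volume` can be derived from the analyser's time-domain bytes, which are centered on 128:

```typescript
// Illustrative sketch: RMS volume (0 = silence, ~1 = full-scale signal)
// from AnalyserNode-style time-domain bytes (unsigned, centered on 128).
function rmsVolume(timeDomainData: Uint8Array): number {
  let sumSquares = 0;
  for (let i = 0; i < timeDomainData.length; i++) {
    const sample = (timeDomainData[i] - 128) / 128; // recenter to [-1, 1)
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / timeDomainData.length);
}
```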

Events

Chatdio Events

| Event | Payload | Description |
|-------|---------|-------------|
| mic:start | - | Microphone started |
| mic:stop | - | Microphone stopped |
| mic:data | ArrayBuffer | PCM audio data |
| mic:activity | AudioActivityData | Mic visualization data |
| mic:error | Error | Microphone error |
| playback:start | - | Playback started |
| playback:stop | - | Playback stopped |
| playback:ended | - | All queued audio finished |
| playback:activity | AudioActivityData | Playback visualization data |
| playback:error | Error | Playback error |
| ws:connected | - | WebSocket connected |
| ws:disconnected | code, reason | WebSocket disconnected |
| ws:reconnecting | attempt | Reconnection attempt |
| ws:audio | ArrayBuffer | Audio received from server |
| ws:message | unknown | Non-audio message received |
| ws:error | Error | WebSocket error |
| device:changed | AudioDevice[] | Device list changed |
| device:input-changed | AudioDevice \| null | Input device changed |
| device:output-changed | AudioDevice \| null | Output device changed |
| device:disconnected | AudioDevice | Device disconnected |
| turn:started | turnId, previousTurnId | New turn started |
| turn:interrupted | turnId | Turn was interrupted (barge-in) |
| turn:ended | turnId | Turn ended normally |

Turn Management (Barge-in)

Turn management allows you to handle conversation interruptions cleanly. When the user speaks while the AI is responding (barge-in), you can:

  • Stop current playback immediately
  • Clear any buffered audio
  • Ignore any late-arriving audio from the interrupted turn

// Start a conversation turn when AI begins responding
const turnId = audio.startTurn();
console.log('Started turn:', turnId);

// When user interrupts (detected via voice activity or button)
audio.on('mic:activity', (data) => {
  if (data.isSpeaking && audio.isPlaybackActive()) {
    // User is speaking while AI is talking - barge-in!
    const { interruptedTurnId, newTurnId } = audio.interruptTurn();
    console.log('Interrupted turn:', interruptedTurnId);
    console.log('New turn:', newTurnId);
  }
});

// Server sends audio with turn ID
audio.on('ws:message', async (message) => {
  if (message.type === 'audio') {
    // Only play if turn matches - old audio is automatically ignored
    const played = await audio.playAudioForTurn(message.data, message.turnId);
    if (!played) {
      console.log('Ignored audio from old turn:', message.turnId);
    }
  }
});

// Listen for turn events
audio.on('turn:started', (turnId, previousTurnId) => {
  console.log('Turn started:', turnId, 'Previous:', previousTurnId);
});

audio.on('turn:interrupted', (turnId) => {
  console.log('Turn interrupted:', turnId);
  // Notify server to stop generating audio for this turn
  audio.sendMessage({ type: 'interrupt', turnId });
});

audio.on('turn:ended', (turnId) => {
  console.log('Turn ended naturally:', turnId);
});
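The gating idea behind playAudioForTurn can be sketched in a few lines. This is a minimal illustration of the concept, not chatdio's internals: each turn gets a fresh ID, interrupting bumps the ID, and audio tagged with a stale ID is simply dropped instead of played.

```typescript
// Minimal sketch of turn gating (illustrative, not chatdio's code).
class TurnGate {
  private counter = 0;

  // Start a new turn; any previous turn becomes stale.
  startTurn(): string {
    return `turn_${++this.counter}`;
  }

  // Barge-in: same effect as starting a new turn.
  interrupt(): string {
    return this.startTurn();
  }

  // Audio is only played if it belongs to the current turn.
  isCurrent(turnId: string): boolean {
    return turnId === `turn_${this.counter}`;
  }
}
```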

Server-Side Turn ID Support

When your server sends audio, include a turnId in JSON messages:

{
  "type": "audio",
  "data": "base64_encoded_audio...",
  "turnId": "turn_123456789_1"
}

Or use a custom parser to extract the turn ID:

const audio = new Chatdio({
  websocket: {
    url: 'wss://your-server.com/audio',
    parseIncomingAudio: (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === 'audio') {
        return {
          data: base64ToArrayBuffer(msg.audio),
          turnId: msg.turn_id,  // Your server's turn ID field
        };
      }
      return null;
    },
  },
});
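On the server side, a small helper can produce the JSON envelope shown above. This is a hypothetical sketch (the field names match the default envelope; adjust them to your own protocol):

```typescript
// Hypothetical server-side helper: wrap a raw PCM chunk in the
// { type, data, turnId } JSON envelope that the client parses.
function makeAudioMessage(pcm: ArrayBuffer, turnId: string): string {
  const bytes = new Uint8Array(pcm);
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]); // byte-by-byte, stack-safe
  }
  return JSON.stringify({ type: 'audio', data: btoa(binary), turnId });
}
```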

Type Definitions

interface AudioFormat {
  sampleRate: 8000 | 16000 | 22050 | 24000 | 44100 | 48000;
  bitDepth: 8 | 16 | 24 | 32;
  channels: 1 | 2;
}

interface AudioDevice {
  deviceId: string;
  label: string;
  kind: 'audioinput' | 'audiooutput';
  isDefault: boolean;
}

interface AudioActivityData {
  volume: number;
  peak: number;
  frequencyData: Uint8Array;
  timeDomainData: Uint8Array;
  isSpeaking: boolean;
}

type ConnectionState = 'disconnected' | 'connecting' | 'connected' | 'reconnecting' | 'error';

Browser Compatibility

| Feature | Chrome | Firefox | Safari |
|---------|--------|---------|--------|
| Mic Capture | ✓ | ✓ | ✓ |
| Echo Cancellation | ✓ | ✓ | ✓ |
| Audio Playback | ✓ | ✓ | ✓ |
| Output Device Selection | ✓ | ✓ | ✗ |
| Device Change Detection | ✓ | ✓ | Via polling |

Notes

  • User Gesture Required: initialize() and startMicrophone() must be called from a user interaction (click, touch) in Safari and Firefox
  • Safari Output: Output device selection (setSinkId) is not supported in Safari; audio plays through the default device
  • Echo Cancellation: Browser implementations vary; Chrome generally has the best echo cancellation
  • Sample Rates: Native sample rate depends on the audio device; resampling is done in JavaScript when needed
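The JavaScript resampling mentioned in the last note can be as simple as linear interpolation. A self-contained sketch of that technique (illustrative only, not chatdio's actual resampler):

```typescript
// Illustrative linear-interpolation resampler: convert Float32 samples
// from one sample rate to another by interpolating between neighbors.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number,
): Float32Array {
  const outLength = Math.round((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // Fractional position of this output sample in the input signal.
    const pos = (i * (input.length - 1)) / Math.max(1, outLength - 1);
    const lo = Math.floor(pos);
    const hi = Math.min(lo + 1, input.length - 1);
    out[i] = input[lo] + (input[hi] - input[lo]) * (pos - lo);
  }
  return out;
}
```

Linear interpolation is cheap but aliases when downsampling; production resamplers typically low-pass filter first.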

iOS Compatibility

iOS Safari has strict requirements for audio playback. To ensure audio works on iPhone/iPad:

  • Call unlockAudio() from a user gesture (click/touch handler):

// IMPORTANT: Call this directly from a button click or touch event
startButton.addEventListener('click', async () => {
  await audio.initialize();
  await audio.unlockAudio();  // Unlocks iOS audio
  await audio.startConversation();
});

  • Why this is needed: iOS Safari requires audio to be "unlocked" by playing audio directly in response to a user gesture. The unlockAudio() method plays a tiny silent buffer, which enables subsequent programmatic audio playback.

  • Common pitfall: If you initialize audio on page load or from a non-user-gesture context (like a setTimeout or Promise resolution), audio playback will fail silently on iOS.

  • The unlockAudio() method:

    • Resumes the AudioContext if suspended
    • Plays a silent buffer to unlock iOS audio
    • Starts the audio element if using output device selection
    • Should be called once per session, from a user gesture

License

MIT

Keywords

web-audio

Package last updated on 15 Dec 2025
