@omarimai/agents-plugin-google

Versions: 55
Support for Gemini, Gemini Live, Cloud Speech-to-Text, and Cloud Text-to-Speech.

Version: 1.1.13 (latest, npm)
Weekly downloads: 2.9K (1848.32%)
Maintainers: 1

Google AI plugin for LiveKit Agents

Installation

npm install @omarimai/agents-plugin-google

Usage

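import { llm, multimodal } from '@livekit/agents';
import * as google from '@omarimai/agents-plugin-google';

// Realtime model backed by Gemini Live; the API key is read from the environment.
const model = new google.realtime.RealtimeModel({
  apiKey: process.env.GOOGLE_API_KEY,
  voice: 'Puck',
});

// Functions the model may call; empty here, add entries as needed.
const fncCtx: llm.FunctionContext = {};

const agent = new multimodal.MultimodalAgent({
  model,
  fncCtx,
});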

Configuration

Set your Google API key:

  • GOOGLE_API_KEY environment variable, or
  • Pass apiKey parameter to the constructor

For VertexAI, also set:

  • GOOGLE_CLOUD_PROJECT environment variable
  • GOOGLE_APPLICATION_CREDENTIALS pointing to your service account key
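
The lookup described above can be sketched as a small helper. This is an illustration only: `resolveGoogleCredentials`, its option names, and its return shape are hypothetical and not part of the plugin, and the rule that an explicit parameter wins over the environment variable is an assumption. Only the environment variable names come from the text above.

```typescript
// Hypothetical helper, not part of the plugin's API.
interface GoogleCredentialOptions {
  apiKey?: string;
  vertexai?: boolean;
  project?: string;
}

interface ResolvedCredentials {
  mode: 'api_key' | 'vertexai';
  apiKey?: string;
  project?: string;
}

type Env = Record<string, string | undefined>;

function resolveGoogleCredentials(opts: GoogleCredentialOptions, env: Env): ResolvedCredentials {
  if (opts.vertexai) {
    // VertexAI needs a project and a service account key file.
    const project = opts.project ?? env['GOOGLE_CLOUD_PROJECT'];
    if (!project) {
      throw new Error('Set GOOGLE_CLOUD_PROJECT or pass a project for VertexAI');
    }
    if (!env['GOOGLE_APPLICATION_CREDENTIALS']) {
      throw new Error('Set GOOGLE_APPLICATION_CREDENTIALS to your service account key path');
    }
    return { mode: 'vertexai', project };
  }

  // Assumption: an explicit apiKey takes precedence over the environment.
  const apiKey = opts.apiKey ?? env['GOOGLE_API_KEY'];
  if (!apiKey) {
    throw new Error('Set GOOGLE_API_KEY or pass apiKey to the constructor');
  }
  return { mode: 'api_key', apiKey };
}
```

In a Node.js process the second argument would typically be `process.env`. Failing early with a message that names the missing variable tends to be easier to debug than a rejected connection later on.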

Step 7: Build and Test

7.1 Build the Project

pnpm build

7.2 Test the Integration

Create a simple test file to verify it works with MultimodalAgent:

// test.ts
import { multimodal, llm } from '@livekit/agents';
import * as google from './src/index.js';

const model = new google.realtime.RealtimeModel({
  apiKey: 'your-api-key',
  voice: 'Puck',
});

// FunctionContext is a plain object type, so start from an empty object.
const fncCtx: llm.FunctionContext = {};

const agent = new multimodal.MultimodalAgent({
  model,
  fncCtx,
});

console.log('Google plugin integrated successfully!');

Next Steps

  • Implement Google Live API Connection: Research Google's Live API documentation and implement the actual WebSocket connection
  • Add Authentication: Implement proper Google Cloud authentication
  • Complete Audio Processing: Finish the audio streaming implementation
  • Add Function Calling: Implement function calling support in the realtime session
  • Add Error Handling: Implement robust error handling and reconnection logic
  • Add Tests: Create comprehensive tests
  • Add LLM/STT/TTS: Complete the standard service implementations

Once the items above are complete, the plugin structure is ready to be used with the existing MultimodalAgent.

Google Gemini Live API TypeScript Plugin

A TypeScript implementation of the Google Gemini Live API for real-time audio conversations with advanced features including function calling, conversation management, and turn detection.

Features

  • Real-time audio streaming with Gemini Live API
  • Function calling and tool integration
  • Advanced conversation management with session.conversation.item.create()
  • Response generation control with session.response.create()
  • Server-side Voice Activity Detection (VAD) with adaptive thresholds
  • Multi-feature speech detection (audio level, energy, zero crossing rate)
  • Event-driven architecture with comprehensive event emission
  • Session management with recovery and error handling
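
The three speech-detection features named in the list (audio level, energy, zero crossing rate) can be computed per frame roughly as follows. This is a minimal sketch under stated assumptions, not the plugin's implementation: `frameFeatures` is a hypothetical name, the input is assumed to be 16-bit PCM, and "audio level" is taken to mean the normalized peak amplitude.

```typescript
// Hypothetical helper; the plugin's own feature extraction may differ.
interface FrameFeatures {
  audioLevel: number;       // peak absolute amplitude, 0..1
  energy: number;           // mean squared amplitude, 0..1
  zeroCrossingRate: number; // fraction of adjacent sample pairs that change sign
}

function frameFeatures(samples: Int16Array): FrameFeatures {
  if (samples.length === 0) {
    return { audioLevel: 0, energy: 0, zeroCrossingRate: 0 };
  }

  let peak = 0;
  let sumSquares = 0;
  let crossings = 0;
  let previous = samples[0] / 32768;

  for (let i = 0; i < samples.length; i++) {
    const current = samples[i] / 32768; // normalize to roughly [-1, 1)
    peak = Math.max(peak, Math.abs(current));
    sumSquares += current * current;
    if (i > 0 && (previous >= 0) !== (current >= 0)) {
      crossings++;
    }
    previous = current;
  }

  const pairs = samples.length - 1;
  return {
    audioLevel: peak,
    energy: sumSquares / samples.length,
    zeroCrossingRate: pairs > 0 ? crossings / pairs : 0,
  };
}
```

Voiced speech usually combines noticeable energy with a fairly low zero crossing rate, while hiss-like noise crosses zero far more often, which is why detectors often look at both rather than at level alone.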

Installation

npm install

Environment Setup

Set your Google API key:

export GOOGLE_API_KEY="your-api-key-here"

Basic Usage

import { llm } from '@livekit/agents';
import { RealtimeModel } from './src/realtime/realtime_model.js';

// Create a realtime model with server-side VAD turn detection
const model = new RealtimeModel({
  model: 'gemini-2.0-flash-live-001',
  voice: 'Puck',
  instructions: 'You are a helpful AI assistant.',
  turnDetection: {
    type: 'server_vad',
    threshold: 0.1,
    silence_duration_ms: 1000
  }
});

// Create a session with no registered functions and an empty chat history
const session = model.session({
  fncCtx: {},
  chatCtx: new llm.ChatContext()
});

// Advanced conversation management
session.conversation.item.create({
  role: 'user',
  text: 'Hello, how are you?'
});

// Start response generation
session.response.create();

// Enhanced conversation management
const items = session.conversation.item.list();
console.log('Conversation items:', items);

// Update a conversation item
session.conversation.item.update('msg_1', {
  content: 'Updated message content'
});

// Delete a conversation item
session.conversation.item.delete('msg_1');

// Clear all conversation items
session.conversation.item.clear();

Advanced Turn Detection

Turn detection uses server-side VAD and is configured through the turnDetection option. The session emits an event when speech starts and another when a turn ends:

const model = new RealtimeModel({
  turnDetection: {
    type: 'server_vad',
    threshold: 0.1,           // Audio level threshold
    silence_duration_ms: 1000, // Silence duration before turn end
    prefix_padding_ms: 200     // Padding before speech start
  }
});

// Listen for turn detection events
session.on('turn_detected', (event) => {
  console.log('Turn detected:', event);
  // event.type: 'silence_threshold'
  // event.duration: silence duration in ms
  // event.timestamp: when the turn was detected
});

session.on('input_speech_started', (event) => {
  console.log('Speech started:', event);
  // event.audioLevel: current audio level
  // event.energyLevel: current energy level
  // event.threshold: adaptive threshold used
});
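
How `threshold` and `silence_duration_ms` interact can be illustrated with a small state machine. This is a sketch under stated assumptions, not the plugin's detector: `SilenceTurnDetector` and the event shape are hypothetical, the threshold is fixed rather than adaptive, and `prefix_padding_ms` is ignored for brevity.

```typescript
// Hypothetical detector; illustrates the silence-threshold rule only.
interface TurnDetectionConfig {
  threshold: number;           // level at or above which a frame counts as speech
  silence_duration_ms: number; // silence needed after speech to end the turn
}

interface DetectorEvent {
  name: 'input_speech_started' | 'turn_detected';
  payload: Record<string, number | string>;
}

class SilenceTurnDetector {
  private readonly config: TurnDetectionConfig;
  private speaking = false;
  private silenceMs = 0;
  private clockMs = 0;

  constructor(config: TurnDetectionConfig) {
    this.config = config;
  }

  // Feed one frame's audio level (0..1) and duration; returns any events it triggers.
  process(audioLevel: number, frameMs: number): DetectorEvent[] {
    this.clockMs += frameMs;
    const events: DetectorEvent[] = [];

    if (audioLevel >= this.config.threshold) {
      this.silenceMs = 0; // speech resets the silence timer
      if (!this.speaking) {
        this.speaking = true;
        events.push({
          name: 'input_speech_started',
          payload: { audioLevel, threshold: this.config.threshold, timestamp: this.clockMs },
        });
      }
    } else if (this.speaking) {
      this.silenceMs += frameMs;
      if (this.silenceMs >= this.config.silence_duration_ms) {
        events.push({
          name: 'turn_detected',
          payload: { type: 'silence_threshold', duration: this.silenceMs, timestamp: this.clockMs },
        });
        this.speaking = false;
        this.silenceMs = 0;
      }
    }
    return events;
  }
}
```

With `threshold: 0.1` and `silence_duration_ms: 1000`, a turn ends only after a full second of consecutive sub-threshold frames, so short pauses inside a sentence do not end the turn.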

Function Calling

Register and use tools with the session:

// Register a tool
session.updateTools([
  {
    name: 'get_weather',
    description: 'Get current weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: { type: 'string' }
      }
    },
    handler: async (args) => {
      // Placeholder result; a real handler would look up the weather for args.location
      return { temperature: '72°F', condition: 'sunny' };
    }
  }
]);

// Listen for tool calls
session.on('toolCall', (toolCall) => {
  console.log('Tool called:', toolCall);
});
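
The step between a tool call arriving and its handler running can be sketched as a lookup by name. This is an illustration, not the plugin's internals: `ToolRegistry`, `ToolCall`, and the exact tool shape are hypothetical names modelled on the `updateTools` example above.

```typescript
// Hypothetical registry showing name-based dispatch of tool calls.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: { type: 'object'; properties: Record<string, { type: string }> };
  handler: (args: Record<string, unknown>) => Promise<unknown>;
}

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

class ToolRegistry {
  private readonly tools = new Map<string, ToolDefinition>();

  // Replace the registered tools, mirroring an updateTools-style call.
  update(tools: ToolDefinition[]): void {
    this.tools.clear();
    for (const tool of tools) {
      this.tools.set(tool.name, tool);
    }
  }

  has(name: string): boolean {
    return this.tools.has(name);
  }

  // Find the handler by name and run it with the model-supplied arguments.
  async dispatch(call: ToolCall): Promise<unknown> {
    const tool = this.tools.get(call.name);
    if (!tool) {
      throw new Error(`Unknown tool: ${call.name}`);
    }
    return tool.handler(call.args);
  }
}

const registry = new ToolRegistry();
registry.update([
  {
    name: 'get_weather',
    description: 'Get current weather for a location',
    parameters: { type: 'object', properties: { location: { type: 'string' } } },
    handler: async (args) => ({ location: String(args['location']), condition: 'sunny' }),
  },
]);
```

Rejecting unknown tool names explicitly matters because the model chooses the name, and a silent no-op would leave it waiting for a result that never arrives.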

Event System

The session emits events for transcripts, generations, errors, and usage metrics:

// Transcript events
session.on('transcript', (event) => {
  console.log('Transcript:', event.transcript, 'Final:', event.isFinal);
});

// Generation events
session.on('generation_created', (event) => {
  console.log('Generation started:', event.messageId);
});

// Error handling
session.on('error', (error) => {
  console.error('Session error:', error);
});

// Metrics
session.on('metrics_collected', (metrics) => {
  console.log('Usage metrics:', metrics);
});
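
If you want the compiler to check event names and payloads, the listeners above can be wrapped in a typed emitter. The sketch below is an assumption-laden illustration: `TypedEmitter` and `SessionEvents` are hypothetical, and the payload fields are inferred from the examples above rather than taken from the plugin's type definitions.

```typescript
// Hypothetical typed emitter; payload shapes are inferred from the examples above.
interface SessionEvents {
  transcript: { transcript: string; isFinal: boolean };
  generation_created: { messageId: string };
  error: Error;
}

type Listener<T> = (payload: T) => void;

class TypedEmitter<Events> {
  private readonly listeners = new Map<keyof Events, Set<Listener<any>>>();

  on<K extends keyof Events>(event: K, listener: Listener<Events[K]>): void {
    let set = this.listeners.get(event);
    if (!set) {
      set = new Set();
      this.listeners.set(event, set);
    }
    set.add(listener);
  }

  off<K extends keyof Events>(event: K, listener: Listener<Events[K]>): void {
    this.listeners.get(event)?.delete(listener);
  }

  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    const set = this.listeners.get(event);
    if (!set) return;
    // Copy first so listeners may unsubscribe while being called.
    for (const listener of Array.from(set)) {
      listener(payload);
    }
  }
}

const events = new TypedEmitter<SessionEvents>();
const finals: string[] = [];
const onTranscript = (e: SessionEvents['transcript']): void => {
  if (e.isFinal) finals.push(e.transcript);
};
events.on('transcript', onTranscript);
```

With this shape, a misspelled event name or a listener that reads a field the payload lacks fails at compile time rather than at runtime.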

Session Management

The session exposes methods to interrupt, truncate, and reconfigure a conversation in progress:

// Interrupt current generation
session.interrupt();

// Start user activity
session.startUserActivity();

// Truncate conversation at specific message
session.truncate('msg_5', 5000); // Truncate at message 'msg_5', with audio ending at 5000 ms

// Update session options
session.updateOptions({
  temperature: 0.7,
  maxOutputTokens: 1000
});

// Update instructions
session.updateInstructions('You are now a coding assistant.');

// Clear audio buffer
session.clearAudio();

// Commit audio for processing
session.commitAudio();

Audio Processing

Handle audio frames with automatic resampling:

// Push audio frames (automatically resampled)
session.pushAudio(audioFrame);

// Push video frames
session.pushVideo(videoFrame);

// Get current audio buffer
const audioBuffer = session.inputAudioBuffer;
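
To give a feel for what resampling an audio frame involves, here is a minimal linear-interpolation resampler for 16-bit mono PCM. It is a sketch, not the plugin's resampler: `resampleLinear` is a hypothetical name, the sample rates are whatever the caller passes in, and no low-pass filter is applied, so downsampling with it can alias.

```typescript
// Hypothetical helper: linear-interpolation resampling of 16-bit mono PCM.
function resampleLinear(input: Int16Array, inputRate: number, outputRate: number): Int16Array {
  if (inputRate <= 0 || outputRate <= 0) {
    throw new Error('Sample rates must be positive');
  }
  if (inputRate === outputRate || input.length === 0) {
    return input.slice();
  }

  const outputLength = Math.round((input.length * outputRate) / inputRate);
  const output = new Int16Array(outputLength);
  const step = inputRate / outputRate; // input samples advanced per output sample

  for (let i = 0; i < outputLength; i++) {
    const position = i * step;
    const left = Math.min(Math.floor(position), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const fraction = position - left;
    // Blend the two neighbouring input samples.
    output[i] = Math.round(input[left] * (1 - fraction) + input[right] * fraction);
  }
  return output;
}
```

A production resampler would normally filter before decimating; linear interpolation is shown here only because it makes the index arithmetic easy to follow.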

Error Recovery

The session retries failed connections automatically and exposes a manual recovery call:

// Recover from text response
session.recoverFromTextResponse('item_123');

// Session automatically retries on connection failures
// Exponential backoff with configurable max retries
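
Exponential backoff with a retry cap, as mentioned in the comments above, generally looks like the sketch below. It is not the plugin's retry code: `withRetries`, `backoffDelayMs`, and the option names are hypothetical, and the doubling factor and delay cap are assumptions.

```typescript
// Hypothetical retry helper illustrating exponential backoff with a retry cap.
interface RetryOptions {
  maxRetries: number;  // retries after the first attempt
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay doubles with each attempt (0-based) and is capped at maxDelayMs.
function backoffDelayMs(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
}

async function withRetries<T>(
  connect: () => Promise<T>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      return await connect();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one.
      if (attempt < opts.maxRetries) {
        await sleep(backoffDelayMs(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

Passing `sleep` as a parameter keeps the helper testable without real timers. Adding random jitter to each delay is a common refinement when many clients may reconnect at once.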

Configuration Options

const model = new RealtimeModel({
  // Model configuration
  model: 'gemini-2.0-flash-live-001',
  voice: 'Puck',
  instructions: 'Custom instructions',
  
  // Generation parameters
  temperature: 0.8,
  maxOutputTokens: 1000,
  topP: 0.9,
  topK: 40,
  
  // Turn detection
  turnDetection: {
    type: 'server_vad',
    threshold: 0.1,
    silence_duration_ms: 1000
  },
  
  // Language and location
  language: 'en-US',
  location: 'us-central1',
  
  // VertexAI (optional)
  vertexai: false,
  project: process.env.GOOGLE_CLOUD_PROJECT
});

API Reference

RealtimeModel

  • session(options): Create a new session
  • close(): Close all sessions

RealtimeSession

Conversation Management

  • conversation.item.create(message): Create conversation item
  • conversation.item.update(id, updates): Update conversation item
  • conversation.item.delete(id): Delete conversation item
  • conversation.item.list(): List all conversation items
  • conversation.item.get(id): Get specific conversation item
  • conversation.item.clear(): Clear all conversation items

Response Management

  • response.create(): Start response generation

Audio Processing

  • pushAudio(frame): Push audio frame
  • pushVideo(frame): Push video frame
  • commitAudio(): Commit audio for processing
  • clearAudio(): Clear audio buffer

Session Control

  • interrupt(): Interrupt current generation
  • startUserActivity(): Start user activity
  • truncate(messageId, audioEndMs): Truncate conversation
  • updateOptions(options): Update session options
  • updateInstructions(instructions): Update instructions
  • updateTools(tools): Update available tools

Events

  • on(event, listener): Listen for events
  • off(event, listener): Remove event listener
  • emit(event, ...args): Emit event

Available events:

  • transcript: Text transcript updates
  • error: Error events
  • toolCall: Tool call events
  • generation_created: New generation started
  • input_audio_transcription_completed: Audio transcription completed
  • input_speech_started: Speech started
  • metrics_collected: Usage metrics
  • turn_detected: Turn detection events

License

Apache-2.0

Package last updated on 23 Jun 2025
