gemini-realtime-stream

Google Gemini AI real-time streaming with audio processing capabilities

Version: 1.0.0 (latest)
Registry: npm
Maintainers: 1

Gemini Real-time Stream MCP Server

A Model Context Protocol (MCP) server that provides real-time streaming capabilities with Google's Gemini AI models, including live audio/video processing, function calling, and bidirectional WebSocket communication.

Features

Core Capabilities

  • Real-time Streaming: Bidirectional WebSocket communication with Gemini models
  • Live Audio Processing: Real-time audio input/output with voice activity detection
  • Live Video Processing: Screen capture and video stream processing
  • Function Calling: Dynamic tool discovery and execution with JSON schema validation
  • Multimodal Support: Text, image, audio, and video input/output processing
  • Session Management: Persistent conversation contexts and state management

Available Tools

start_realtime_session

Initialize a real-time streaming session with the Gemini Live API.

Parameters:

  • model (string, optional): Gemini model to use (default: "gemini-2.0-flash-exp")
  • voice (string, optional): Voice configuration for audio output
  • system_instruction (string, optional): System instructions for the model
  • tools (array, optional): Available tools for function calling
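
The parameter list above can be sketched as a TypeScript type with the documented default applied. The names `StartRealtimeSessionParams`, `DEFAULT_MODEL`, and `withSessionDefaults` are hypothetical and are not exports of this package; only the field names and the default model come from the list above.

```typescript
// Hypothetical type mirroring the documented parameters of start_realtime_session.
interface StartRealtimeSessionParams {
  model?: string;
  voice?: string;
  system_instruction?: string;
  tools?: unknown[];
}

// Default taken from the parameter list above.
const DEFAULT_MODEL = "gemini-2.0-flash-exp";

// Fill in the default model when the caller does not specify one.
function withSessionDefaults(
  params: StartRealtimeSessionParams = {}
): StartRealtimeSessionParams & { model: string } {
  return { ...params, model: params.model ?? DEFAULT_MODEL };
}
```

Calling `withSessionDefaults({ voice: "Aoede" })` keeps the voice and fills in the default model.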

send_realtime_message

Send a message to an active real-time session.

Parameters:

  • session_id (string): Active session identifier
  • content (string): Message content to send
  • content_type (string, optional): Content type (default: "text")

stream_audio_input

Stream audio input to the real-time session.

Parameters:

  • session_id (string): Active session identifier
  • audio_data (string): Base64-encoded audio data
  • format (string, optional): Audio format (default: "pcm16")
  • sample_rate (number, optional): Sample rate in Hz (default: 16000)
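
As a rough illustration of how `audio_data` might be prepared, the sketch below converts floating-point samples in the range [-1, 1] to 16-bit PCM and Base64-encodes the result. The helper name `encodePcm16Base64` is hypothetical, and little-endian byte order is an assumption; the package does not state which byte order it expects.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper: Float32 samples in [-1, 1] -> Base64-encoded PCM16.
// Assumption: little-endian byte order.
function encodePcm16Base64(samples: Float32Array): string {
  const pcm = Buffer.alloc(samples.length * 2); // 2 bytes per sample
  samples.forEach((sample, i) => {
    // Clamp to the valid range before scaling to avoid integer overflow.
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative values scale to -32768, positive values to 32767.
    const value = Math.round(clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff);
    pcm.writeInt16LE(value, i * 2);
  });
  return pcm.toString("base64");
}
```

The returned string could then be passed as `audio_data` together with `format: "pcm16"` and the sample rate at which the samples were recorded.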

capture_screen_stream

Capture and stream screen content to the session.

Parameters:

  • session_id (string): Active session identifier
  • region (object, optional): Screen region to capture
  • quality (string, optional): Capture quality ("high", "medium", "low")

get_session_status

Retrieve the current status of a real-time session.

Parameters:

  • session_id (string): Session identifier to check

end_realtime_session

Terminate an active real-time streaming session.

Parameters:

  • session_id (string): Session identifier to terminate

list_active_sessions

List all currently active real-time sessions.

Parameters: None

Installation

  • Install dependencies:
npm install
  • Build the TypeScript code:
npm run build
  • Configure your Gemini API key:
export GEMINI_API_KEY="your-api-key-here"

Configuration

Add the server to your MCP client configuration:

{
  "mcpServers": {
    "gemini-realtime-stream": {
      "command": "node",
      "args": ["/path/to/gemini-realtime-stream/dist/gemini-realtime-stream.js"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Usage Examples

Basic Real-time Chat

// Start a new session
const session = await startRealtimeSession({
  model: "gemini-2.0-flash-exp",
  system_instruction: "You are a helpful AI assistant."
});

// Send a message
await sendRealtimeMessage({
  session_id: session.id,
  content: "Hello, how are you today?"
});

Audio Streaming

// Start session with voice capabilities
const session = await startRealtimeSession({
  model: "gemini-2.0-flash-exp",
  voice: "Aoede"
});

// Stream audio input; base64AudioData is Base64-encoded PCM16 audio prepared by the caller
await streamAudioInput({
  session_id: session.id,
  audio_data: base64AudioData,
  format: "pcm16",
  sample_rate: 16000
});

Screen Sharing

// Capture and stream screen content.
// Assumes `session` was returned by startRealtimeSession, as in the examples above.
await captureScreenStream({
  session_id: session.id,
  region: { x: 0, y: 0, width: 1920, height: 1080 },
  quality: "high"
});

API Reference

Session Management

  • Sessions are automatically managed with unique identifiers
  • Each session maintains its own conversation context
  • Sessions can be terminated manually or will time out after a period of inactivity
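
The behaviour listed above (unique identifiers, per-session context, inactivity timeout) could look roughly like the following. This is a minimal sketch, not the package's implementation: `SessionRegistry`, its methods, and the injectable clock are all hypothetical, and the actual timeout value is not documented.

```typescript
import { randomUUID } from "node:crypto";

interface SessionRecord {
  id: string;
  lastActivity: number; // epoch milliseconds
  history: string[];    // per-session conversation context
}

// Hypothetical in-memory registry with an inactivity timeout.
class SessionRegistry {
  private readonly sessions = new Map<string, SessionRecord>();
  private readonly timeoutMs: number;
  private readonly now: () => number;

  // `now` is injectable so the timeout logic can be exercised without waiting.
  constructor(timeoutMs: number, now: () => number = () => Date.now()) {
    this.timeoutMs = timeoutMs;
    this.now = now;
  }

  create(): SessionRecord {
    const record: SessionRecord = { id: randomUUID(), lastActivity: this.now(), history: [] };
    this.sessions.set(record.id, record);
    return record;
  }

  // Append a message to the session's context and refresh its activity timestamp.
  record(id: string, message: string): boolean {
    const session = this.sessions.get(id);
    if (!session) return false;
    session.history.push(message);
    session.lastActivity = this.now();
    return true;
  }

  // Manual termination; returns false if the id is unknown.
  end(id: string): boolean {
    return this.sessions.delete(id);
  }

  // Remove sessions idle for longer than timeoutMs and return their ids.
  sweep(): string[] {
    const expired: string[] = [];
    const current = this.now();
    this.sessions.forEach((session, id) => {
      if (current - session.lastActivity > this.timeoutMs) expired.push(id);
    });
    expired.forEach((id) => this.sessions.delete(id));
    return expired;
  }

  activeIds(): string[] {
    return Array.from(this.sessions.keys());
  }
}
```

A server built this way would typically call `sweep()` on a timer, for example with `setInterval`.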

Audio Processing

  • Supports the PCM16 audio format with a configurable sample rate (default: 16000 Hz)
  • Real-time voice activity detection
  • Bidirectional audio streaming (input and output)
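
The package does not document which voice activity detection method it uses. One common, simple approach is an energy threshold on each audio frame, sketched below. The function name `isSpeechFrame` and the default threshold of 0.02 are illustrative assumptions, not values taken from the package.

```typescript
// Hypothetical energy-based voice activity check on one PCM16 frame.
// Returns true when the frame's RMS level (normalized to 0..1) reaches the threshold.
function isSpeechFrame(frame: Int16Array, threshold = 0.02): boolean {
  if (frame.length === 0) return false;
  let sumSquares = 0;
  frame.forEach((sample) => {
    const normalized = sample / 32768; // map 16-bit range to [-1, 1)
    sumSquares += normalized * normalized;
  });
  const rms = Math.sqrt(sumSquares / frame.length);
  return rms >= threshold;
}
```

A fixed threshold is easy to reason about but sensitive to background noise; production detectors usually adapt the threshold or use a trained model.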

Video Processing

  • Screen capture with configurable regions and quality
  • Real-time video stream processing
  • Support for multiple video formats

Function Calling

  • Dynamic tool discovery and registration
  • JSON schema validation for tool parameters
  • Parallel function execution support
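
To make the three points above concrete, here is a minimal sketch of a tool registry that checks arguments against a very small subset of JSON Schema and then runs several calls concurrently. All names (`Tool`, `validateArgs`, `executeParallel`) are hypothetical; a real server would more likely delegate validation to a full JSON Schema validator.

```typescript
type ParamType = "string" | "number" | "boolean";

// Tiny subset of JSON Schema: flat properties with primitive types.
interface ToolSchema {
  properties: Record<string, { type: ParamType }>;
  required?: string[];
}

interface Tool {
  name: string;
  schema: ToolSchema;
  run: (args: Record<string, unknown>) => Promise<unknown>;
}

// Return a list of validation errors; an empty list means the arguments are valid.
function validateArgs(schema: ToolSchema, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in args)) errors.push(`missing required parameter: ${key}`);
  }
  for (const [key, value] of Object.entries(args)) {
    const spec = schema.properties[key];
    if (!spec) {
      errors.push(`unknown parameter: ${key}`);
    } else if (typeof value !== spec.type) {
      errors.push(`parameter ${key} must be of type ${spec.type}`);
    }
  }
  return errors;
}

// Validate and run all calls concurrently; one failure does not cancel the others.
async function executeParallel(
  tools: Map<string, Tool>,
  calls: { name: string; args: Record<string, unknown> }[]
): Promise<PromiseSettledResult<unknown>[]> {
  return Promise.allSettled(
    calls.map(async (call) => {
      const tool = tools.get(call.name);
      if (!tool) throw new Error(`unknown tool: ${call.name}`);
      const errors = validateArgs(tool.schema, call.args);
      if (errors.length > 0) throw new Error(errors.join("; "));
      return tool.run(call.args);
    })
  );
}
```

`Promise.allSettled` is used here rather than `Promise.all` so that each call reports its own success or failure.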

Error Handling

The server handles the following error cases:

  • Invalid session IDs return appropriate error messages
  • Network connectivity issues are handled gracefully
  • Audio/video processing errors are logged and reported
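
As an illustration of the first case, an MCP tool can report a failure by returning a result with `isError` set to true and a text description. The sketch below shows what a status lookup might return for an unknown session ID; the handler name, the session map, and the message wording are assumptions, not the package's actual code.

```typescript
// Shape of an MCP tool result carrying text content; `isError` marks a failed call.
interface ToolResult {
  isError?: boolean;
  content: { type: "text"; text: string }[];
}

function errorResult(message: string): ToolResult {
  return { isError: true, content: [{ type: "text", text: message }] };
}

// Hypothetical handler for get_session_status.
function getSessionStatus(
  sessions: Map<string, { status: string }>,
  sessionId: string
): ToolResult {
  const session = sessions.get(sessionId);
  if (!session) return errorResult(`Unknown session_id: ${sessionId}`);
  const payload = JSON.stringify({ session_id: sessionId, status: session.status });
  return { content: [{ type: "text", text: payload }] };
}
```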

Security Considerations

  • API keys should be stored securely as environment variables
  • Screen capture requires appropriate system permissions
  • Audio input requires microphone access permissions
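
For the API key point, a server can fail fast at startup when the key is missing rather than failing on the first request. The helper below is a hypothetical sketch; only the variable name `GEMINI_API_KEY` comes from this document.

```typescript
// Hypothetical startup check: read the key from an environment-like object.
// Taking the environment as a parameter keeps the function easy to test.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["GEMINI_API_KEY"]?.trim();
  if (!key) {
    throw new Error("GEMINI_API_KEY is not set");
  }
  return key;
}
```

In a Node.js entry point this would be called as `requireApiKey(process.env)`.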

Dependencies

  • @modelcontextprotocol/sdk: MCP SDK for server implementation
  • @google/generative-ai: Google Generative AI SDK
  • ws: WebSocket library for real-time communication
  • Additional dependencies for audio/video processing

License

This project is licensed under the MIT License.

Contributing

Contributions are welcome! Please read the contributing guidelines before submitting pull requests.

Support

For issues and questions, please use the GitHub issue tracker.

Keywords

gemini

Package last updated on 16 Aug 2025
