You're Invited:Meet the Socket Team at BlackHat and DEF CON in Las Vegas, Aug 4-6.RSVP →

Book a Demo Install Sign in

fastrtc

Package Overview

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

fastrtc

The realtime communication library for Python

0.0.29

PyPI

Maintainers: 1

FastRTC

The Real-Time Communication Library for Python.

Turn any python function into a real-time audio and video stream over WebRTC or WebSockets.

Installation

pip install fastrtc

to use built-in pause detection (see ReplyOnPause), and text to speech (see Text To Speech), install the vad and tts extras:

pip install "fastrtc[vad, tts]"

Key Features

🗣️ Automatic Voice Detection and Turn Taking built-in, only worry about the logic for responding to the user.
💻 Automatic UI - Use the .ui.launch() method to launch the webRTC-enabled built-in Gradio UI.
🔌 Automatic WebRTC Support - Use the .mount(app) method to mount the stream on a FastAPI app and get a webRTC endpoint for your own frontend!
⚡️ Websocket Support - Use the .mount(app) method to mount the stream on a FastAPI app and get a websocket endpoint for your own frontend!
📞 Automatic Telephone Support - Use the fastphone() method of the stream to launch the application and get a free temporary phone number!
🤖 Completely customizable backend - A Stream can easily be mounted on a FastAPI app so you can easily extend it to fit your production application. See the Talk To Claude demo for an example of how to serve a custom JS frontend.

Docs

https://fastrtc.org

Examples

See the Cookbook for examples of how to use the library.

🗣️👀 Gemini Audio Video Chat

Stream BOTH your webcam video and audio feeds to Google Gemini. You can also upload images to augment your conversation!

Demo | Code

🗣️ Google Gemini Real Time Voice API

Talk to Gemini in real time using Google's voice API.

Demo | Code

🗣️ OpenAI Real Time Voice API

Talk to ChatGPT in real time using OpenAI's voice API.

Demo | Code

🤖 Hello Computer

Say computer before asking your question!

Demo | Code

🤖 Llama Code Editor

Create and edit HTML pages with just your voice! Powered by SambaNova systems.

Demo | Code

🗣️ Talk to Claude

Use the Anthropic and Play.Ht APIs to have an audio conversation with Claude.

Demo | Code

🎵 Whisper Transcription

Have whisper transcribe your speech in real time!

Demo | Code

📷 Yolov10 Object Detection

Run the Yolov10 model on a user webcam stream in real time!

Demo | Code

🗣️ Kyutai Moshi

Kyutai's moshi is a novel speech-to-speech model for modeling human conversations.

Demo | Code

🗣️ Hello Llama: Stop Word Detection

A code editor built with Llama 3.3 70b that is triggered by the phrase "Hello Llama". Build a Siri-like coding assistant in 100 lines of code!

Demo | Code

Usage

This is a shortened version of the official usage guide.

.ui.launch(): Launch a built-in UI for easily testing and sharing your stream. Built with Gradio.
.fastphone(): Get a free temporary phone number to call into your stream. Hugging Face token required.
.mount(app): Mount the stream on a FastAPI app. Perfect for integrating with your already existing production system.

Quickstart

Echo Audio

from fastrtc import Stream, ReplyOnPause
import numpy as np

def echo(audio: tuple[int, np.ndarray]):
    # The function will be passed the audio until the user pauses
    # Implement any iterator that yields audio
    # See "LLM Voice Chat" for a more complete example
    yield audio

stream = Stream(
    handler=ReplyOnPause(echo),
    modality="audio", 
    mode="send-receive",
)

LLM Voice Chat

from fastrtc import (
    ReplyOnPause, AdditionalOutputs, Stream,
    audio_to_bytes, aggregate_bytes_to_16bit
)
import gradio as gr
from groq import Groq
import anthropic
from elevenlabs import ElevenLabs

groq_client = Groq()
claude_client = anthropic.Anthropic()
tts_client = ElevenLabs()


# See "Talk to Claude" in Cookbook for an example of how to keep 
# track of the chat history.
def response(
    audio: tuple[int, np.ndarray],
):
    prompt = groq_client.audio.transcriptions.create(
        file=("audio-file.mp3", audio_to_bytes(audio)),
        model="whisper-large-v3-turbo",
        response_format="verbose_json",
    ).text
    response = claude_client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    response_text = " ".join(
        block.text
        for block in response.content
        if getattr(block, "type", None) == "text"
    )
    iterator = tts_client.text_to_speech.convert_as_stream(
        text=response_text,
        voice_id="JBFqnCBsd6RMkjVDRZzb",
        model_id="eleven_multilingual_v2",
        output_format="pcm_24000"
        
    )
    for chunk in aggregate_bytes_to_16bit(iterator):
        audio_array = np.frombuffer(chunk, dtype=np.int16).reshape(1, -1)
        yield (24000, audio_array)

stream = Stream(
    modality="audio",
    mode="send-receive",
    handler=ReplyOnPause(response),
)

Webcam Stream

from fastrtc import Stream
import numpy as np


def flip_vertically(image):
    return np.flip(image, axis=0)


stream = Stream(
    handler=flip_vertically,
    modality="video",
    mode="send-receive",
)

Object Detection

from fastrtc import Stream
import gradio as gr
import cv2
from huggingface_hub import hf_hub_download
from .inference import YOLOv10

model_file = hf_hub_download(
    repo_id="onnx-community/yolov10n", filename="onnx/model.onnx"
)

# git clone https://huggingface.co/spaces/fastrtc/object-detection
# for YOLOv10 implementation
model = YOLOv10(model_file)

def detection(image, conf_threshold=0.3):
    image = cv2.resize(image, (model.input_width, model.input_height))
    new_image = model.detect_objects(image, conf_threshold)
    return cv2.resize(new_image, (500, 500))

stream = Stream(
    handler=detection,
    modality="video", 
    mode="send-receive",
    additional_inputs=[
        gr.Slider(minimum=0, maximum=1, step=0.01, value=0.3)
    ]
)

Running the Stream

Run:

Gradio

stream.ui.launch()

Telephone (Audio Only)

```py
stream.fastphone()
```

FastAPI

app = FastAPI()
stream.mount(app)

# Optional: Add routes
@app.get("/")
async def _():
    return HTMLResponse(content=open("index.html").read())

# uvicorn app:app --host 0.0.0.0 --port 8000

Keywords

audio

audio processing

computer vision

gradio-custom-component

image

machine learning

FAQs

What is fastrtc?

Is fastrtc well maintained?

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

fastrtc

FastRTC

The Real-Time Communication Library for Python.

Installation

Key Features

Docs

Examples

🗣️👀 Gemini Audio Video Chat

🗣️ Google Gemini Real Time Voice API

🗣️ OpenAI Real Time Voice API

🤖 Hello Computer

🤖 Llama Code Editor

🗣️ Talk to Claude

🎵 Whisper Transcription

📷 Yolov10 Object Detection

🗣️ Kyutai Moshi

🗣️ Hello Llama: Stop Word Detection

Usage

Quickstart

Echo Audio

LLM Voice Chat

Webcam Stream

Object Detection

Running the Stream

Gradio

Telephone (Audio Only)

FastAPI

Keywords

Related posts

Introducing License Overlays: Smarter License Management for Real-World Code

Introducing Rust Support in Socket