Research
Security News
Malicious npm Packages Inject SSH Backdoors via Typosquatted Libraries
Socket’s threat research team has detected six malicious npm packages typosquatting popular libraries to insert SSH backdoors.
Pipecat is an open source Python framework for building voice and multimodal conversational agents. It handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions, letting you focus on creating engaging experiences.
You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you’re ready. You can also add a 📞 telephone number, 🖼️ image output, 📺 video input, use different LLMs, and more.
# Install the module
pip install pipecat-ai
# Set up your environment
cp dot-env.template .env
To keep things lightweight, only the core framework is included by default. If you need support for third-party AI services, you can add the necessary dependencies with:
pip install "pipecat-ai[option,...]"
Available options include:
Category | Services | Install Command Example |
---|---|---|
Speech-to-Text | AssemblyAI, Azure, Deepgram, Gladia, Whisper | pip install "pipecat-ai[deepgram]" |
LLMs | Anthropic, Azure, Fireworks AI, Gemini, Ollama, OpenAI, Together AI | pip install "pipecat-ai[openai]" |
Text-to-Speech | AWS, Azure, Cartesia, Deepgram, ElevenLabs, Google, LMNT, OpenAI, PlayHT, Rime, XTTS | pip install "pipecat-ai[cartesia]" |
Speech-to-Speech | OpenAI Realtime | pip install "pipecat-ai[openai]" |
Transport | Daily (WebRTC), WebSocket, Local | pip install "pipecat-ai[daily]" |
Video | Tavus | pip install "pipecat-ai[tavus]" |
Vision & Image | Moondream, fal | pip install "pipecat-ai[moondream]" |
Audio Processing | Silero VAD, Krisp, Noisereduce | pip install "pipecat-ai[silero]" |
Analytics & Metrics | Canonical AI, Sentry | pip install "pipecat-ai[canonical]" |
📚 View full services documentation →
Here is a very basic Pipecat bot that greets a user when they join a real-time session. We'll use Daily for real-time media transport, and Cartesia for text-to-speech.
import asyncio
from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
async def main():
# Use Daily as a real-time media transport (WebRTC)
transport = DailyTransport(
room_url=...,
token="", # leave empty. Note: token is _not_ your api key
bot_name="Bot Name",
params=DailyParams(audio_out_enabled=True))
# Use Cartesia for Text-to-Speech
tts = CartesiaTTSService(
api_key=...,
voice_id=...
)
# Simple pipeline that will process text to speech and output the result
pipeline = Pipeline([tts, transport.output()])
# Create Pipecat processor that can run one or more pipelines tasks
runner = PipelineRunner()
# Assign the task callable to run the pipeline
task = PipelineTask(pipeline)
# Register an event handler to play audio when a
# participant joins the transport WebRTC session
@transport.event_handler("on_first_participant_joined")
async def on_first_participant_joined(transport, participant):
participant_name = participant.get("info", {}).get("userName", "")
# Queue a TextFrame that will get spoken by the TTS service (Cartesia)
await task.queue_frame(TextFrame(f"Hello there, {participant_name}!"))
# Register an event handler to exit the application when the user leaves.
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
await task.queue_frame(EndFrame())
# Run the pipeline task
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())
Run it with:
python app.py
Daily provides a prebuilt WebRTC user interface. While the app is running, you can visit at https://<yourdomain>.daily.co/<room_url>
and listen to the bot say hello!
WebSockets are fine for server-to-server communication or for initial development. But for production use, you’ll need client-server audio to use a protocol designed for real-time media transport. (For an explanation of the difference between WebSockets and WebRTC, see this post.)
One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month.
Sign up here and create a room in the developer Dashboard.
Note that you may need to set up a virtual environment before following the instructions below. For instance, you might need to run the following from the root of the repo:
python3 -m venv venv
source venv/bin/activate
From the root of this repo, run the following:
pip install -r dev-requirements.txt
python -m build
This builds the package. To use the package locally (e.g. to run sample files), run
pip install --editable ".[option,...]"
If you want to use this package from another directory, you can run:
pip install "path_to_this_repo[option,...]"
From the root directory, run:
pytest --doctest-modules --ignore-glob="*to_be_updated*" --ignore-glob=*pipeline_source* src tests
This project uses strict PEP 8 formatting via Ruff.
You can use use-package to install emacs-lazy-ruff package and configure ruff
arguments:
(use-package lazy-ruff
:ensure t
:hook ((python-mode . lazy-ruff-mode))
:config
(setq lazy-ruff-format-command "ruff format")
(setq lazy-ruff-only-format-block t)
(setq lazy-ruff-only-format-region t)
(setq lazy-ruff-only-format-buffer t))
ruff
was installed in the venv
environment described before, so you should be able to use pyvenv-auto to automatically load that environment inside Emacs.
(use-package pyvenv-auto
:ensure t
:defer t
:hook ((python-mode . pyvenv-auto-run)))
Install the
Ruff extension. Then edit the user settings (Ctrl-Shift-P Open User Settings (JSON)
) and set it as the default Python formatter, and enable formatting on save:
"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.formatOnSave": true
}
We welcome contributions from the community! Whether you're fixing bugs, improving documentation, or adding new features, here's how you can help:
Before submitting a pull request, please check existing issues and PRs to avoid duplicates.
We aim to review all contributions promptly and provide constructive feedback to help get your changes merged.
FAQs
An open source framework for voice (and multimodal) assistants
We found that pipecat-ai demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
Security News
Socket’s threat research team has detected six malicious npm packages typosquatting popular libraries to insert SSH backdoors.
Security News
MITRE's 2024 CWE Top 25 highlights critical software vulnerabilities like XSS, SQL Injection, and CSRF, reflecting shifts due to a refined ranking methodology.
Security News
In this segment of the Risky Business podcast, Feross Aboukhadijeh and Patrick Gray discuss the challenges of tracking malware discovered in open source softare.