@codexstar/pi-listen

Voice in + voice out for Pi CLI — hold-to-talk STT (Deepgram or 19 offline models) plus TTS (Kitten Nano, Piper, Kokoro, or Deepgram Aura)

Version 7.2.2 (latest) · Source: npm · Maintainers: 1

pi-listen — Voice input for the Pi coding agent

Hold-to-talk voice input for Pi. Cloud streaming via Deepgram or fully offline with local models.

v7.0.0: TTS UX overhaul. Pick models from the /voice-settings Speak tab (no more JSON editing), with auto-download on selection and a progress display. Every backend has a voice picker, and first-run onboarding recommends a default based on your system locale. ttsAutoSpeak: true now works: it auto-speaks the agent's responses with code-block stripping and rate limiting. The /voice-speak-info diagnostic command shows the full TTS state, and interrupted downloads resume. All v6 features remain (14 local models from the 25 MB Kitten Nano up, Deepgram Aura cloud, region-strict language matching, sentence-aware chunking). See the full changelog.

Setup (2 minutes)

1. Install the extension

# In a regular terminal (not inside Pi)
pi install npm:@codexstar/pi-listen

2. Choose your backend

pi-listen supports two transcription backends:

| | Deepgram (cloud) | Local models (offline) |
|---|---|---|
| How it works | Live streaming: text appears as you speak | Batch mode: transcribes after you finish recording |
| Setup | API key required | No API key, models auto-download on first use |
| Internet | Required | Not required after model download |
| Latency | Real-time interim results | 2–10 seconds after recording stops |
| Languages | 56+ with live streaming | Depends on model (1–57 languages) |
| Cost | $200 free credit (lasts 6–12 months for most developers) | Free forever |

Run /voice-settings inside Pi to choose your backend and configure everything from one panel.

Option A: Deepgram (cloud)

Sign up at dpgr.am/pi-voice for $200 in free credit, no card needed.

export DEEPGRAM_API_KEY="your-key-here"    # add to ~/.zshrc or ~/.bashrc

Option B: Local models (fully offline)

No setup needed — run /voice-settings, switch backend to Local, and select a model. It downloads automatically.

Note: Local models use batch mode — they transcribe after you finish recording, not while you speak. For live streaming as you speak, use Deepgram.

3. Open Pi

On first launch, pi-listen checks your setup and tells you what's ready:

  • Backend configured (Deepgram key or local model)
  • Audio capture tool detected (sox, ffmpeg, or arecord)
  • If everything checks out, voice activates immediately

Audio capture

pi-listen auto-detects your audio tool. No manual install needed if you already have sox or ffmpeg.

| Priority | Tool | Platforms | Install |
|---|---|---|---|
| 1 | SoX (rec) | macOS, Linux, Windows | brew install sox / apt install sox / choco install sox |
| 2 | ffmpeg | macOS, Linux, Windows | brew install ffmpeg / apt install ffmpeg |
| 3 | arecord | Linux only | Pre-installed (ALSA) |
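
The priority order in the table can be read as a simple first-match search. The sketch below is illustrative only, not pi-listen's actual detection code: `pickAudioTool` and the `isInstalled` callback are hypothetical names, and how a tool is probed on a real system is left to the caller.

```typescript
// Hypothetical sketch: only the tool order (sox, ffmpeg, arecord) comes from the README.
type AudioTool = "sox" | "ffmpeg" | "arecord";

const TOOL_PRIORITY: readonly AudioTool[] = ["sox", "ffmpeg", "arecord"];

/** Returns the highest-priority tool reported as installed, or null if none is. */
function pickAudioTool(isInstalled: (tool: AudioTool) => boolean): AudioTool | null {
  for (const tool of TOOL_PRIORITY) {
    if (isInstalled(tool)) return tool;
  }
  return null;
}

// Example: a Linux machine with ffmpeg and arecord but no SoX.
const installedTools = new Set<AudioTool>(["ffmpeg", "arecord"]);
const chosenTool = pickAudioTool((tool) => installedTools.has(tool));
```

Here `chosenTool` is `"ffmpeg"`, because SoX is missing and ffmpeg outranks arecord.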

Settings Panel

All configuration lives in one place: /voice-settings. Four tabs cover everything you need.

General — backend, language, scope

Toggle between Deepgram (cloud, live streaming) and Local (offline, batch mode). Change language, scope, and enable/disable voice — all with keyboard shortcuts.

Models — browse, search, install

Browse 19 models from Parakeet, Whisper, Moonshine, SenseVoice, and GigaAM. Each model shows accuracy and speed ratings (●●●●○/●●●●○), fitness badges, and download status. Fuzzy search to find models fast. Press Enter to activate and download.

Downloaded — manage installed models

See what's installed, total disk usage, and which model is active. Press Enter to activate, x to delete. Models from Handy are auto-detected and can be imported without re-downloading.

Device — hardware profile and dependencies

See your hardware profile (RAM, CPU, GPU), dependency status (sherpa-onnx runtime), available disk space, and total downloaded models. Model recommendations are based on this profile.

Usage

Keybindings

| Action | Key | Notes |
|---|---|---|
| Record to editor | Hold SPACE (≥1.2s) | Release to finalize. Pre-records during warmup so you don't miss words. |
| Toggle recording | Ctrl+Shift+V | Works in all terminals: press to start, press again to stop. |
| Clear editor | Escape × 2 | Double-tap within 500ms to clear all text. |
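
The double-tap rule for clearing the editor can be sketched as a small timestamp comparison. This is a minimal illustration, not the extension's code: `createDoubleTapDetector` is a hypothetical helper, and treating exactly 500ms as still "within" the window is an assumption.

```typescript
const DOUBLE_TAP_WINDOW_MS = 500; // from the keybindings table

/**
 * Hypothetical helper: returns a handler that reports true on the second
 * Escape press of a double-tap. Timestamps are passed in so the logic is testable.
 */
function createDoubleTapDetector(windowMs: number = DOUBLE_TAP_WINDOW_MS) {
  let lastTapAt: number | null = null;
  return (nowMs: number): boolean => {
    const isDoubleTap = lastTapAt !== null && nowMs - lastTapAt <= windowMs;
    // After a double-tap, reset so a third press starts a new sequence.
    lastTapAt = isDoubleTap ? null : nowMs;
    return isDoubleTap;
  };
}

const onEscape = createDoubleTapDetector();
// Presses at 0ms, 300ms, 400ms, and 1000ms.
const escapeResults = [onEscape(0), onEscape(300), onEscape(400), onEscape(1000)];
```

Only the second press (300ms after the first) counts as a double-tap. The third press starts a new sequence, and the fourth arrives 600ms later, outside the window.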

How recording works

  • Hold SPACE — warmup countdown appears, audio capture starts immediately (pre-recording)
  • Keep holding — live transcription streams into the editor (Deepgram) or audio buffers (local)
  • Release SPACE — recording continues for 1.5s (tail recording) to catch your last word, then finalizes
  • Text appears in the editor, ready to send
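
The timing in the steps above can be sketched as a pure function over key-down and key-up timestamps. This is a sketch under stated assumptions, not the extension's state machine: `planRecording` and `HoldOutcome` are hypothetical, and it assumes the warmup lasts the full 1.2s hold threshold and that a shorter press is simply not treated as a recording.

```typescript
const HOLD_THRESHOLD_MS = 1200; // "Hold SPACE (≥1.2s)"
const TAIL_RECORDING_MS = 1500; // recording continues 1.5s after release

type HoldOutcome =
  | { kind: "ignored" } // released before the threshold (assumed: no recording)
  | { kind: "recorded"; captureStartMs: number; finalizeAtMs: number };

/** Hypothetical sketch of the hold-to-talk timeline described in the list above. */
function planRecording(pressedAtMs: number, releasedAtMs: number): HoldOutcome {
  const heldMs = releasedAtMs - pressedAtMs;
  if (heldMs < HOLD_THRESHOLD_MS) return { kind: "ignored" };
  return {
    kind: "recorded",
    captureStartMs: pressedAtMs, // pre-recording: capture begins at key-down
    finalizeAtMs: releasedAtMs + TAIL_RECORDING_MS, // tail recording
  };
}

const quickTap = planRecording(0, 500);
const longHold = planRecording(1000, 4000);
```

For `longHold`, capture starts at 1000ms (key-down) and the transcript finalizes at 5500ms, 1.5s after release.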

Commands

| Command | Description |
|---|---|
| /voice-settings | Settings panel: backend, models, language, scope, device |
| /voice-models | Settings panel (Models tab) |
| /voice-speak <text> | Speak text out loud (TTS) |
| /voice-speak-test | Speak a sample sentence |
| /voice-speak-toggle | Enable / disable TTS |
| /voice-autosubmit on / off | Turn auto-submit on or off |
| /voice-speak-models | Browse / install TTS voice models |
| /voice-speak-info | Diagnose TTS state |
| /voice-help | Keyboard + command reference (or press F1) |
| /voice test | Full diagnostics: audio tool, mic, API key |
| /voice on / off | Enable or disable voice |
| /voice dictate | Continuous dictation (no key hold) |
| /voice stop | Stop active recording or dictation |
| /voice history | Recent transcriptions |
| /voice | Toggle on/off |

v7.1 keyboard

While in the settings panel:

| Key | Action |
|---|---|
| ← → | switch tab |
| ↑ ↓ | navigate row (skips group headings) |
| Enter | select / activate |
| esc | back to main / close panel |
| type | filter (search) |
| bksp | clear last search char |

While an install widget or playback indicator is mounted (no overlay in front):

| Key | Action |
|---|---|
| esc | cancel active install (most-recent first), then stop playback |
| F1 | open help overlay (always available) |

Local Models

19 models across 5 families. Sorted by quality — best models first.

Top picks

| Model | Accuracy | Speed | Size | Languages | Notes |
|---|---|---|---|---|---|
| Parakeet TDT v3 | ●●●●○ | ●●●●○ | 671 MB | 25 (auto-detect) | Best overall. WER 6.3%. |
| Parakeet TDT v2 | ●●●●● | ●●●●○ | 661 MB | English | Best English. WER 6.0%. |
| Whisper Turbo | ●●●●○ | ●●○○○ | 1.0 GB | 57 | Broadest language support. |

Fast and lightweight

| Model | Accuracy | Speed | Size | Languages | Notes |
|---|---|---|---|---|---|
| Moonshine v2 Tiny | ●●○○○ | ●●●●● | 43 MB | English | 34ms latency. Raspberry Pi friendly. |
| Moonshine Base | ●●●○○ | ●●●●● | 287 MB | English | Handles accents well. |
| SenseVoice Small | ●●●○○ | ●●●●● | 228 MB | zh/en/ja/ko/yue | Best for CJK languages. |

Specialist

| Model | Accuracy | Speed | Size | Languages | Notes |
|---|---|---|---|---|---|
| GigaAM v3 | ●●●●○ | ●●●●○ | 225 MB | Russian | 50% lower WER than Whisper on Russian. |
| Whisper Medium | ●●●●○ | ●●●○○ | 946 MB | 57 | Good accuracy, medium speed. |
| Whisper Large v3 | ●●●●○ | ●○○○○ | 1.8 GB | 57 | Highest Whisper accuracy. Slow on CPU. |

Plus 8 language-specialized Moonshine v2 variants for Japanese, Korean, Arabic, Chinese, Ukrainian, Vietnamese, and Spanish.
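
To make the trade-offs in these tables concrete, here is a rough sketch of choosing a model by language and size budget. It is not pi-listen's recommendation logic: `pickModel` and the `LocalModel` shape are hypothetical, the ratings and sizes are transcribed from the tables (GB values rounded to MB), and models whose language list is given only as a count are assumed to cover the requested language, which you should verify for your own language.

```typescript
interface LocalModel {
  name: string;
  accuracy: number; // filled dots out of 5
  speed: number; // filled dots out of 5
  sizeMb: number;
  languages: string[] | null; // null: the table gives only a count (25 or 57)
}

const MODEL_TABLE: LocalModel[] = [
  { name: "Parakeet TDT v3", accuracy: 4, speed: 4, sizeMb: 671, languages: null },
  { name: "Parakeet TDT v2", accuracy: 5, speed: 4, sizeMb: 661, languages: ["en"] },
  { name: "Whisper Turbo", accuracy: 4, speed: 2, sizeMb: 1000, languages: null },
  { name: "Moonshine v2 Tiny", accuracy: 2, speed: 5, sizeMb: 43, languages: ["en"] },
  { name: "Moonshine Base", accuracy: 3, speed: 5, sizeMb: 287, languages: ["en"] },
  { name: "SenseVoice Small", accuracy: 3, speed: 5, sizeMb: 228, languages: ["zh", "en", "ja", "ko", "yue"] },
  { name: "GigaAM v3", accuracy: 4, speed: 4, sizeMb: 225, languages: ["ru"] },
  { name: "Whisper Medium", accuracy: 4, speed: 3, sizeMb: 946, languages: null },
  { name: "Whisper Large v3", accuracy: 4, speed: 1, sizeMb: 1800, languages: null },
];

/** Best model for a language within a size budget: accuracy first, then speed, then smaller size. */
function pickModel(language: string, maxSizeMb: number): LocalModel | null {
  const candidates = MODEL_TABLE.filter(
    (m) => m.sizeMb <= maxSizeMb && (m.languages === null || m.languages.includes(language)),
  );
  candidates.sort(
    (a, b) => b.accuracy - a.accuracy || b.speed - a.speed || a.sizeMb - b.sizeMb,
  );
  return candidates.length > 0 ? candidates[0] : null;
}

const englishUnder700 = pickModel("en", 700);
const russianUnder300 = pickModel("ru", 300);
```

With these inputs, `englishUnder700` is Parakeet TDT v2 and `russianUnder300` is GigaAM v3, which matches the "Best English" and Russian-specialist notes in the tables.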

How local models work

Hold SPACE → audio captured to memory buffer
                ↓
Release SPACE → buffer sent to sherpa-onnx (in-process)
                ↓
         ONNX inference on CPU (2–10 seconds)
                ↓
         Final transcript inserted into editor

Models download automatically on first use. Downloads are resumable, verified after completion, and deduplicated (no double-downloads). The settings panel shows real-time download progress with speed and ETA.
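
One common way to get the "no double-downloads" behavior is to share a single in-flight promise per model. The sketch below shows that general technique and is not taken from pi-listen's download manager: `createDownloadDeduper`, the `fetchModel` callback, and the `/models/...` path are all hypothetical stand-ins.

```typescript
/**
 * Hypothetical sketch: concurrent requests for the same model ID share one
 * in-flight download instead of starting a second one.
 */
function createDownloadDeduper(fetchModel: (modelId: string) => Promise<string>) {
  const inFlight = new Map<string, Promise<string>>();
  return (modelId: string): Promise<string> => {
    const existing = inFlight.get(modelId);
    if (existing) return existing;
    const started = fetchModel(modelId).then(
      (path) => {
        inFlight.delete(modelId); // finished: a later call may start a fresh download
        return path;
      },
      (err) => {
        inFlight.delete(modelId); // failed: allow a retry
        throw err;
      },
    );
    inFlight.set(modelId, started);
    return started;
  };
}

let fetchCount = 0;
const downloadModel = createDownloadDeduper((modelId) => {
  fetchCount += 1;
  return Promise.resolve(`/models/${modelId}`); // stand-in for a real download
});

// Two requests for the same model while the first is still pending.
const firstRequest = downloadModel("parakeet-v3");
const secondRequest = downloadModel("parakeet-v3");
```

Both calls return the same promise and the underlying fetch runs once. Resumption and post-download verification are separate concerns that this sketch does not cover.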

Models from Handy (~/Library/Application Support/com.pais.handy/models/) are auto-detected and can be imported via symlink (zero disk duplication).

Features

| Feature | Description |
|---|---|
| Dual backend | Deepgram (cloud, live streaming) or local models (offline, batch), switchable in settings |
| 19 local models | Parakeet, Whisper, Moonshine, SenseVoice, GigaAM, with accuracy/speed ratings |
| Unified settings panel | One overlay panel for all configuration: /voice-settings |
| Device-aware recommendations | Scores models against your hardware. Only best-in-class models get [recommended]. |
| Enterprise download pipeline | Pre-checks (disk, network, permissions), live progress with speed/ETA, post-verification |
| Handy integration | Auto-detects models from Handy app, imports via symlink |
| Audio fallback chain | Tries sox, ffmpeg, arecord in order |
| Pre-recording | Audio capture starts during warmup, so you never miss the first word |
| Tail recording | Keeps recording 1.5s after release so your last word isn't clipped |
| Live streaming | Deepgram Nova 3 WebSocket: interim transcripts as you speak |
| 56+ languages | Deepgram: 56+ with live streaming. Local: up to 57 depending on model. |
| Continuous dictation | /voice dictate for long-form input without holding keys |
| Typing cooldown | Space holds within 400ms of typing are ignored |
| Sound feedback | macOS system sounds for start, stop, and error events |
| Cross-platform | macOS, Windows, Linux, with Kitty protocol + non-Kitty fallback |

Architecture

extensions/voice.ts                Main extension — state machine, recording, UI, settings panel
extensions/voice/config.ts         Config loading, saving, migration
extensions/voice/onboarding.ts     First-run wizard, language picker
extensions/voice/deepgram.ts       Deepgram URL builder, API key resolver
extensions/voice/local.ts          Model catalog (19 models), in-process transcription
extensions/voice/device.ts         Device profiling — RAM, GPU, CPU, container detection
extensions/voice/model-download.ts Download manager — resume, progress, verification, Handy import
extensions/voice/sherpa-engine.ts   sherpa-onnx bindings — recognizer lifecycle, inference
extensions/voice/settings-panel.ts  Settings panel — Component interface, overlay, 4 tabs
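
As an illustration of what a Deepgram URL builder might do, the sketch below assembles a live-streaming WebSocket URL. It is not the contents of extensions/voice/deepgram.ts: `buildDeepgramLiveUrl` and `LiveOptions` are hypothetical, and the choice of query parameters is an assumption based on Deepgram's public streaming API (`model`, `language`, `interim_results`).

```typescript
interface LiveOptions {
  model: string; // e.g. "nova-3"
  language: string; // e.g. "en"
  interimResults: boolean; // request partial transcripts while speaking
}

/** Hypothetical sketch of building a Deepgram live-streaming URL. */
function buildDeepgramLiveUrl(opts: LiveOptions): string {
  const params: Array<[string, string]> = [
    ["model", opts.model],
    ["language", opts.language],
    ["interim_results", String(opts.interimResults)],
  ];
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `wss://api.deepgram.com/v1/listen?${query}`;
}

const liveUrl = buildDeepgramLiveUrl({ model: "nova-3", language: "en", interimResults: true });
```

The API key is deliberately not part of the URL here. It would normally be supplied as an authorization header when the WebSocket is opened.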

Configuration

Settings stored in Pi's settings files under the voice key:

| Scope | Path |
|---|---|
| Global | ~/.pi/agent/settings.json |
| Project | <project>/.pi/settings.json |

{
  "voice": {
    "version": 2,
    "enabled": true,
    "language": "en",
    "backend": "local",
    "localModel": "parakeet-v3",
    "scope": "global",
    "onboarding": { "completed": true, "schemaVersion": 2 }
  }
}

DEEPGRAM_API_KEY from your shell is used at runtime and is not copied back into ~/.pi/agent/settings.json. If you paste a key during onboarding, that is an explicit save and it still goes to ~/.env.secrets or ~/.zshrc.
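
For readers scripting against these files, the layout above can be modeled roughly as follows. This is a sketch, not the extension's config loader: `VoiceConfig`, `settingsPath`, and `readVoiceConfig` are hypothetical names, the `"deepgram"` backend value and `"project"` scope value are assumptions (the sample only shows `"local"` and `"global"`), and paths use forward slashes for simplicity.

```typescript
type Scope = "global" | "project"; // "project" is assumed from the scope table

interface VoiceConfig {
  version: number;
  enabled: boolean;
  language: string;
  backend: "local" | "deepgram"; // "deepgram" is an assumed value
  localModel?: string;
  scope: Scope;
  onboarding: { completed: boolean; schemaVersion: number };
}

/** Maps a scope to the settings file path listed in the table above. */
function settingsPath(scope: Scope, homeDir: string, projectDir: string): string {
  return scope === "global"
    ? `${homeDir}/.pi/agent/settings.json`
    : `${projectDir}/.pi/settings.json`;
}

/** Extracts the "voice" key from a settings file's JSON text, or null if absent. */
function readVoiceConfig(settingsJson: string): VoiceConfig | null {
  const parsed: unknown = JSON.parse(settingsJson);
  if (typeof parsed !== "object" || parsed === null) return null;
  const voice = (parsed as { voice?: unknown }).voice;
  if (typeof voice !== "object" || voice === null) return null;
  return voice as VoiceConfig; // a real loader would validate each field
}

const globalPath = settingsPath("global", "/home/ada", "/work/app");
const sampleConfig = readVoiceConfig(
  '{"voice":{"version":2,"enabled":true,"language":"en","backend":"local","localModel":"parakeet-v3","scope":"global","onboarding":{"completed":true,"schemaVersion":2}}}',
);
```

Note that `readVoiceConfig` only checks that a `voice` object exists. The cast is a shortcut for the example, so treat the result as untrusted until each field is checked.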

Troubleshooting

Run /voice test inside Pi for full diagnostics.

| Problem | Solution |
|---|---|
| "DEEPGRAM_API_KEY not set" | Get a key, then add export DEEPGRAM_API_KEY="..." to ~/.zshrc |
| "No audio capture tool found" | brew install sox or brew install ffmpeg |
| Space doesn't activate voice | Run /voice-settings; voice may be disabled |
| Local model not transcribing | Check /voice-settings → Device tab for sherpa-onnx status |
| Download failed | Partial downloads auto-resume on retry. Check disk space in Device tab. |
| dyld: Library not loaded: libsimdjson on macOS | Homebrew Node ABI mismatch: run brew reinstall node or switch to version-managed Node (mise, fnm, nvm) |

Security

  • Cloud STT — audio is sent to Deepgram for transcription (Deepgram backend only)
  • Local STT — audio never leaves your machine (local backend)
  • No telemetry — pi-listen does not collect or transmit usage data
  • API key — stored in env var or Pi settings, never logged

See SECURITY.md for vulnerability reporting.

License

MIT © 2026 @baanditeagle

Made by @baanditeagle

Package last updated on 01 May 2026
