pi-voice
Hold-to-talk voice input for Pi.

What It Does
pi-voice adds hands-free voice input to the Pi coding agent CLI. Hold SPACE to record, release to transcribe — text appears in the editor in real time via Deepgram streaming STT.
Features
| Hold-to-talk | Hold SPACE to record, release to stop — text streams into the editor live |
| Streaming transcription | Deepgram Nova 3 WebSocket — interim results appear as you speak |
| Voice commands | "hey pi, run tests", "undo", "submit", "new line", "period" |
| Continuous dictation | /voice dictate for long-form speaking without holding keys |
| Double-escape clear | Press Escape twice to clear the editor |
| Cross-platform | macOS, Windows, Linux — Kitty protocol + non-Kitty fallback |
Install
pi install npm:@codexstar/pi-voice
Prerequisites
- SoX — microphone recording (
brew install sox / apt install sox / choco install sox)
- Deepgram API key — set
DEEPGRAM_API_KEY env var (get $200 free credit)
Quick Setup
brew install sox
export DEEPGRAM_API_KEY="your-key-here"
Then open Pi — the onboarding wizard handles the rest.
Usage
Voice Input
| Record to editor | Hold SPACE | Release to finalize transcription |
| Toggle recording | Ctrl+Shift+V | Works in all terminals |
| Clear editor | Escape × 2 | Double-tap within 500ms |
Commands
/voice
/voice on
/voice off
/voice setup
/voice test
/voice info
/voice dictate
/voice stop
/voice history
Voice Commands
Say these during recording — they're detected and executed automatically:
| "hey pi, run tests" | Inserts bun run test |
| "undo" / "undo that" | Removes last word |
| "clear" / "clear all" | Clears editor |
| "submit" / "send it" | Submits editor content |
| "new line" | Inserts \n |
| "period" / "comma" / "question mark" | Inserts punctuation |
How It Works
User holds SPACE
↓
SoX captures PCM audio from microphone
↓
Audio streams to Deepgram Nova 3 via WebSocket
↓
Interim transcripts update editor in real time
↓
User releases SPACE → CloseStream → final transcript
Hold Detection
Two terminal modes are supported:
Kitty protocol (Ghostty, Kitty, WezTerm, Windows Terminal 1.22+):
True key-down/repeat/release events. First press enters warmup immediately.
Non-Kitty (macOS Terminal, older terminals):
Gap-based detection. Counts rapid key-repeat events to distinguish hold from tap.
Both modes require holding for ≥800ms before recording activates. Quick taps type a normal space.
Architecture
extensions/voice.ts Main extension — recording, UI, state machine
extensions/voice/config.ts Config loading, saving, migration
extensions/voice/onboarding.ts First-run setup wizard
Configuration
Settings stored in Pi's settings files under the voice key:
| Global | ~/.pi/agent/settings.json |
| Project | <project>/.pi/settings.json |
{
"voice": {
"version": 2,
"enabled": true,
"language": "en",
"scope": "global",
"onboarding": {
"completed": true,
"schemaVersion": 2
}
}
}
Troubleshooting
/voice test
| "DEEPGRAM_API_KEY not set" | export DEEPGRAM_API_KEY="your-key" in ~/.zshrc |
| "SoX error" | brew install sox (macOS) or apt install sox (Linux) |
| Space doesn't activate | Check /voice info — voice may be disabled |
| Double space in editor | Increase typing cooldown or use Ctrl+Shift+V |
See docs/troubleshooting.md for more.
Security
- Cloud STT: Audio is sent to Deepgram for transcription. No local fallback.
- No telemetry: pi-voice does not collect or transmit usage data.
- API key: Stored in env var or Pi settings file — never logged or exposed in errors.
See SECURITY.md for vulnerability reporting.
License
MIT © 2026 codexstar69
Links