
@juspay/vokal
Production voice bot framework with TTS, STT, and AI evaluation using Neurolink
A production-ready voice bot testing and interaction framework with streaming Speech-to-Text, Text-to-Speech, and AI-powered evaluation
Vokal is a TypeScript framework for building, testing, and evaluating voice-based applications. It provides a provider-agnostic architecture for Speech-to-Text, Text-to-Speech, and AI-powered evaluation services. It currently supports Google Cloud providers (Speech-to-Text, Text-to-Speech via the Neurolink SDK, and Gemini AI), and its extensible design allows additional provider integrations.
Perfect for:
node -v # Should be 20.x or higher
pnpm -v # Should be 9.x or higher (npm or yarn also work)
pnpm add @juspay/vokal
Or clone and build from source:
git clone https://github.com/juspay/vokal.git
cd vokal
pnpm install
pnpm run build
Create a .env file in your project root:
# Option 1: Service Account (Recommended - Full Features)
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Option 2: API Key (Limited Features)
GOOGLE_AI_API_KEY=your_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
💡 Tip: Service account authentication provides access to advanced features like configurable VAD timeouts and enhanced STT capabilities.
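To make the two options above concrete, here is a minimal sketch of how an application might check which credentials are present before starting. The helper `detectAuthMode` is hypothetical and not part of the Vokal API. The priority order (service account first) and the treatment of either API key variable as sufficient are assumptions based only on the "Recommended" label and the variable names above.

```typescript
// Hypothetical helper for illustration; Vokal does not export this function.
type AuthMode = "service-account" | "api-key" | "none";

function detectAuthMode(env: Record<string, string | undefined>): AuthMode {
  // Option 1: a service account file path (assumed to take priority).
  if (env.GOOGLE_APPLICATION_CREDENTIALS?.trim()) {
    return "service-account";
  }
  // Option 2: an API key (assumption: either variable is enough).
  if (env.GOOGLE_AI_API_KEY?.trim() || env.GEMINI_API_KEY?.trim()) {
    return "api-key";
  }
  // Nothing usable was found; blank values count as missing.
  return "none";
}

// Example with a plain object standing in for the loaded .env values.
const authMode = detectAuthMode({ GEMINI_API_KEY: "your_gemini_api_key_here" });
```

In a real project the argument would typically be `process.env` after the `.env` file has been loaded.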
import { createVoiceTest } from '@juspay/vokal';
const voiceTest = createVoiceTest();
// Generate and save speech
const audioPath = await voiceTest.generateSpeech({
  text: "Welcome to Vokal! Your voice testing framework.",
  languageCode: 'en-US',
  voiceName: 'en-US-Neural2-F'
});
console.log('Audio saved to:', audioPath);
import { VoiceInteractionService } from '@juspay/vokal';
const voiceBot = new VoiceInteractionService();
// Run complete voice interaction
const result = await voiceBot.runVoiceInteraction(
  "What is your name?",
  {
    language: 'en-US',
    voice: 'en-US-Neural2-D',
    backgroundSound: 'office',
    backgroundVolume: 0.15
  }
);
console.log('User said:', result.transcript);
console.log('Confidence:', result.confidence);
import { VoiceBotTestService } from '@juspay/vokal';
// Run test suite from configuration
const testService = VoiceBotTestService.create('./test-config.json');
const results = await testService.runTestSuite();
console.log(`✅ Pass Rate: ${results.summary.passRate}%`);
console.log(`Average Score: ${results.summary.averageScore}`);
console.log(`Results: ${results.summary.resultsFile}`);
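The summary above reports a pass rate and an average score. The following sketch shows one plausible way such figures could be derived from per-question scores. It is an illustration only: the function `summarizeScores`, the `ScoreSummary` type, and the rule "a question passes when its score meets the threshold" are assumptions, not Vokal's documented behavior.

```typescript
// Hypothetical types and function; not part of the Vokal API.
interface ScoreSummary {
  passRate: number;     // percentage of questions at or above the threshold
  averageScore: number; // mean score across all questions
}

function summarizeScores(scores: number[], passingScore: number): ScoreSummary {
  // An empty run has nothing to average, so report zeros.
  if (scores.length === 0) {
    return { passRate: 0, averageScore: 0 };
  }
  const passed = scores.filter((score) => score >= passingScore).length;
  const total = scores.reduce((sum, score) => sum + score, 0);
  return {
    passRate: (passed / scores.length) * 100,
    averageScore: total / scores.length,
  };
}

// Four illustrative scores, three of which reach a 0.7 threshold.
const scoreSummary = summarizeScores([0.9, 0.6, 0.8, 0.75], 0.7);
```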
| Service | Description | Use Case |
|---|---|---|
| VoiceTestService | Text-to-Speech with background audio via Neurolink | Generate test audio with realistic environments |
| VoiceInteractionService | Complete TTS + Listen + STT pipeline | Full conversation simulation |
| VoiceBotTestService | Automated test suite execution | Test multiple scenarios with AI evaluation |
| AIComparisonService | AI-powered response evaluation | Semantic answer validation using Gemini |
| AudioMixerService | Background audio mixing | Add realistic noise to test scenarios |
| AudioRecordingService | Microphone recording via naudiodon | Capture user responses |
| STTHandlerManager | Provider-agnostic STT management | Unified interface for multiple STT providers |
Vokal includes a comprehensive command-line interface:
# Basic TTS generation
vokal voice generate "Hello, world!" --voice en-US-Neural2-F --lang en-US
# With background audio
vokal voice generate "Welcome" --voice en-US-Neural2-D --lang en-US --bg cafe --bgvol 0.2 --play
# Advanced settings
vokal voice generate "Fast speech" --voice en-US-Neural2-A --rate 1.5 --pitch 5.0 --output speech.mp3
# List all voices
vokal voices
# Filter by language
vokal voices en-US
# JSON output
vokal voices en-IN --format json
# List available background sounds
vokal backgrounds
# Test system audio capability
vokal test-audio
# Play an audio file
vokal play ./output.wav
# Create sample configuration
vokal test --save-sample
# Run test suite
vokal test ./config.json
# Run with specific provider and debug mode
vokal test --provider google-ai --debug --verbose
# Display comprehensive usage examples
vokal example
Run vokal --help for complete CLI documentation.
Create a JSON file to define your test scenarios:
{
  "metadata": {
    "name": "My Voice Bot Tests",
    "version": "1.0.0",
    "description": "Voice bot test suite"
  },
  "settings": {
    "defaultLanguage": "en-US",
    "defaultVoice": "en-US-Neural2-D",
    "recordingDuration": 10000,
    "passingScore": 0.7,
    "sttProvider": "google-ai",
    "ttsProvider": "google-ai",
    "aiProvider": "google-ai",
    "vadSettings": {
      "silenceThreshold": 0.02,
      "silenceDuration": 2000,
      "speechTimeout": 10000
    }
  },
  "questions": [
    {
      "id": "greeting",
      "question": "Hello! How can I help you?",
      "intent": "User greets and asks for help",
      "expectedElements": ["Greeting", "Request for assistance"],
      "sampleResponse": "Hi, I need help with my account"
    }
  ]
}
See examples/sample-config.json for a complete example.
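Before running a suite, it can help to sanity-check the configuration file. The sketch below models a subset of the fields shown in the JSON above and reports obvious problems. The interfaces and `validateTestConfig` are hypothetical and are not Vokal's own types. The specific rules (a `passingScore` between 0 and 1, a positive `recordingDuration`, unique question ids) are assumptions inferred from the sample values.

```typescript
// Hypothetical shapes mirroring part of the sample configuration.
interface TestQuestion {
  id: string;
  question: string;
  intent: string;
  expectedElements: string[];
  sampleResponse: string;
}

interface TestConfig {
  settings: { passingScore: number; recordingDuration: number };
  questions: TestQuestion[];
}

// Returns a list of human-readable problems; an empty list means no issues found.
function validateTestConfig(config: TestConfig): string[] {
  const problems: string[] = [];
  const { passingScore, recordingDuration } = config.settings;
  if (passingScore < 0 || passingScore > 1) {
    problems.push("passingScore should be between 0 and 1 (assumed range)");
  }
  if (recordingDuration <= 0) {
    problems.push("recordingDuration should be a positive number");
  }
  if (config.questions.length === 0) {
    problems.push("at least one question is required");
  }
  const seenIds = new Set<string>();
  for (const question of config.questions) {
    if (seenIds.has(question.id)) {
      problems.push(`duplicate question id: ${question.id}`);
    }
    seenIds.add(question.id);
  }
  return problems;
}

const sampleProblems = validateTestConfig({
  settings: { passingScore: 0.7, recordingDuration: 10000 },
  questions: [
    {
      id: "greeting",
      question: "Hello! How can I help you?",
      intent: "User greets and asks for help",
      expectedElements: ["Greeting", "Request for assistance"],
      sampleResponse: "Hi, I need help with my account",
    },
  ],
});
```

Returning a list of problems rather than throwing on the first one lets a caller report every issue in a single pass.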
Vokal is built with a provider-agnostic architecture using the Handler pattern for extensibility.
Google Cloud (Default)
GoogleAISTTHandler, AIComparisonService

vokal/
├── src/
│   ├── services/                      # Core voice services
│   │   ├── voice-test.ts              # TTS service with Neurolink
│   │   ├── voice-interaction.ts       # Complete pipeline orchestration
│   │   ├── voice-bot-test.ts          # Test suite execution
│   │   ├── ai-comparison.ts           # AI-powered evaluation
│   │   ├── audio-mixer.ts             # Background audio processing
│   │   └── audio-recording.ts         # Microphone capture
│   ├── providers/                     # Provider implementations
│   │   ├── google-ai-stt.handler.ts   # Google Cloud STT
│   │   ├── stt-handler-manager.ts     # Provider manager
│   │   └── stt-registry.ts            # Provider registry
│   ├── types/                         # TypeScript type definitions
│   ├── utils/                         # Utilities (logging, retry, validation, security)
│   ├── constants/                     # Audio configuration constants
│   ├── errors/                        # Custom error classes
│   └── cli/                           # Command-line interface
├── examples/                          # Example configurations
│   ├── sample-config.json             # Test suite example
│   ├── basic-example.js               # Basic usage template
│   └── stt-handler-example.ts         # STT provider example
├── assets/                            # Background audio files
│   ├── office-ambience.wav
│   ├── cafe-ambience.wav
│   ├── nature-sounds.wav
│   ├── rain-light.wav
│   ├── phone-static.wav
│   └── crowd-distant.wav
├── memory-bank/                       # AI assistant context
└── docs/                              # Documentation (coming soon)
// Handler pattern for provider abstraction
interface STTHandler {
  startStreaming(config, onResult, onSpeechStart, onSpeechEnd, onError);
  stopStreaming();
}
// Register providers
STTHandlerManager.registerHandler('google-ai', GoogleAISTTHandler);
// Get provider instance
const handler = STTHandlerManager.getHandler('google-ai');
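The register-then-look-up flow above can be illustrated with a small self-contained sketch. Everything in it is hypothetical: `SketchSTTHandler`, `EchoHandler`, and `SketchRegistry` are stand-ins written for this example, and the simplified single-callback signature is an assumption rather than Vokal's actual `STTHandler` interface.

```typescript
// Simplified, hypothetical stand-ins for the handler pattern described above.
type ResultCallback = (transcript: string) => void;

interface SketchSTTHandler {
  startStreaming(onResult: ResultCallback): void;
  stopStreaming(): void;
}

// A fake provider that immediately reports a fixed transcript.
class EchoHandler implements SketchSTTHandler {
  private active = false;

  startStreaming(onResult: ResultCallback): void {
    this.active = true;
    onResult("hello from the echo handler");
  }

  stopStreaming(): void {
    this.active = false;
  }

  isActive(): boolean {
    return this.active;
  }
}

type HandlerConstructor = new () => SketchSTTHandler;

// Maps provider names to handler classes and creates instances on request.
class SketchRegistry {
  private static handlers = new Map<string, HandlerConstructor>();

  static registerHandler(name: string, ctor: HandlerConstructor): void {
    SketchRegistry.handlers.set(name, ctor);
  }

  static getHandler(name: string): SketchSTTHandler {
    const ctor = SketchRegistry.handlers.get(name);
    if (!ctor) {
      throw new Error(`No STT handler registered for "${name}"`);
    }
    return new ctor();
  }
}

// Register a provider, fetch an instance by name, and collect its output.
SketchRegistry.registerHandler("echo", EchoHandler);
const transcripts: string[] = [];
const echoHandler = SketchRegistry.getHandler("echo");
echoHandler.startStreaming((transcript) => transcripts.push(transcript));
echoHandler.stopStreaming();
```

Because callers only depend on the interface and a provider name, a new provider can be added by registering another class without touching the calling code.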
Available background sound presets for realistic test environments:
| Sound | Description | Recommended Volume | Use Case |
|---|---|---|---|
| office | Office ambience with typing and quiet chatter | 0.15 | Business applications, productivity bots |
| cafe | Coffee shop atmosphere with ambient noise | 0.20 | Customer service, casual conversations |
| nature | Outdoor setting with birds and gentle wind | 0.18 | Wellness apps, meditation guides |
| rain | Gentle rainfall ambience | 0.12 | Calming applications, sleep aids |
| phone | Phone line static and connection noise | 0.08 | IVR testing, call center simulations |
| crowd | Distant crowd noise and murmurs | 0.10 | Public space simulations, event apps |
All audio files are located in the assets/ directory as WAV files.
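As a rough illustration of how the presets and recommended volumes above might be used, the sketch below pairs each preset with a volume from the table and mixes a scaled background signal into speech samples. The names `BACKGROUND_PRESETS` and `mixSamples` are hypothetical and not part of the Vokal API. The preset-to-file mapping is an assumption inferred from the file names in the assets/ directory, and the simple add-and-clamp mix is a generic technique, not necessarily what AudioMixerService does.

```typescript
// Hypothetical lookup table built from the preset table above.
type PresetName = "office" | "cafe" | "nature" | "rain" | "phone" | "crowd";

interface BackgroundPreset {
  file: string;              // assumed mapping to a file in assets/
  recommendedVolume: number; // value from the "Recommended Volume" column
}

const BACKGROUND_PRESETS: Record<PresetName, BackgroundPreset> = {
  office: { file: "assets/office-ambience.wav", recommendedVolume: 0.15 },
  cafe: { file: "assets/cafe-ambience.wav", recommendedVolume: 0.2 },
  nature: { file: "assets/nature-sounds.wav", recommendedVolume: 0.18 },
  rain: { file: "assets/rain-light.wav", recommendedVolume: 0.12 },
  phone: { file: "assets/phone-static.wav", recommendedVolume: 0.08 },
  crowd: { file: "assets/crowd-distant.wav", recommendedVolume: 0.1 },
};

// Mixes normalized samples (range -1..1): the background is looped to cover
// the speech, scaled by `volume`, added, and clamped to avoid overflow.
function mixSamples(speech: number[], background: number[], volume: number): number[] {
  return speech.map((sample, index) => {
    const bg = background.length > 0 ? background[index % background.length] ?? 0 : 0;
    const mixed = sample + bg * volume;
    return Math.max(-1, Math.min(1, mixed));
  });
}

// Mix a short speech buffer with a one-sample background at the cafe volume.
const mixedSamples = mixSamples(
  [0.5, -0.5, 0.95],
  [1],
  BACKGROUND_PRESETS.cafe.recommendedVolume,
);
```

Lower volumes such as the 0.08 suggested for `phone` keep the background well below the speech level, which matters when the mixed audio is later passed to speech recognition.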
Vokal follows security best practices:
validation.ts, secure-exec.ts
# Build the project
pnpm run build
# Run linting
pnpm run lint
# Format code
pnpm run format
# Type checking
pnpm run typecheck
| Script | Description |
|---|---|
| pnpm run build | Build TypeScript to JavaScript (dist/) |
| pnpm run dev | Build in watch mode |
| pnpm run clean | Clean build directory |
| pnpm run lint | Lint code with ESLint |
| pnpm run format | Format code with Prettier |
| pnpm run typecheck | Run TypeScript type checking |
| pnpm run prebuild | Format and lint before build |
Contributions are welcome! Please read our Contributing Guide for details.
git checkout -b feature/amazing-feature
git push origin feature/amazing-feature
See CODE_OF_CONDUCT.md for community guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
/docs
Made with ❤️ by the Breeze Team
We found that @juspay/vokal demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 7 open source maintainers collaborating on the project.