
multi-voice-sdk
A universal Text-to-Speech (TTS) and Speech-to-Text (STT) SDK supporting multiple providers (OpenAI, Google Gemini, Deepgram, Groq PlayAI, Cartesia, AssemblyAI) with audio merging capabilities
A universal Text-to-Speech (TTS) and Speech-to-Text (STT) SDK that supports multiple providers including Google Gemini, Deepgram, OpenAI, Groq PlayAI, Cartesia, and AssemblyAI. Easily generate audio content, transcribe speech, and manage audio files with a unified API.
npm install multi-voice-sdk
import { tts, stt, merge } from "multi-voice-sdk";
// Generate speech with OpenAI
tts({
provider: "openai",
apiKey: "your-api-key",
text: "Hello, world!",
voice: "nova",
outputFile: "output.mp3",
});
// Transcribe audio with Deepgram
stt({
provider: "deepgram",
apiKey: "your-deepgram-key",
audioFile: "https://example.com/audio.wav", // Can be URL or local file
});
// Merge multiple audio files
merge({
inputFiles: ["file1.mp3", "file2.mp3"],
outputFile: "combined.mp3",
});
tts(options)
Generate speech from text using various TTS providers.
| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | ✅ | TTS provider: "gemini", "deepgram", "openai", "groq", or "cartesia" |
| apiKey | string | ✅ | API key for the chosen provider |
| text | string | ✅ | Text to convert to speech |
| voice | string | ✅ | Voice identifier (provider-specific; for Cartesia, use a voice ID) |
| outputFile | string | optional | Output file path (default: "output.mp3") |
| model | string | optional | Model to use (provider-specific) |
| prompt | string | optional | Additional instructions for speech generation |
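The required fields in the table above can be checked before a request is made. The following is a minimal sketch of such a check; `validateTtsOptions`, `TTS_PROVIDERS`, and `TTS_REQUIRED` are hypothetical names introduced here for illustration and are not part of multi-voice-sdk.

```javascript
// Providers and required fields as listed in the tts(options) table.
const TTS_PROVIDERS = ["gemini", "deepgram", "openai", "groq", "cartesia"];
const TTS_REQUIRED = ["provider", "apiKey", "text", "voice"];

// Hypothetical helper (not part of multi-voice-sdk): returns a list of
// problems found in a tts options object. An empty list means none found.
function validateTtsOptions(options) {
  const problems = [];
  for (const field of TTS_REQUIRED) {
    const value = options[field];
    if (typeof value !== "string" || value.length === 0) {
      problems.push(`"${field}" is required and must be a non-empty string`);
    }
  }
  const provider = options.provider;
  if (typeof provider === "string" && provider.length > 0 &&
      !TTS_PROVIDERS.includes(provider)) {
    problems.push(`unknown provider "${provider}"`);
  }
  return problems;
}
```

One way to use it is to call `validateTtsOptions(options)` first and pass the options to `tts` only when the returned list is empty.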
OpenAI TTS
tts({
provider: "openai",
apiKey: process.env.OPENAI_API_KEY,
model: "gpt-4o-mini-tts",
text: "Hello from OpenAI!",
voice: "nova",
prompt: "Speak in a cheerful tone",
outputFile: "openai_output.mp3",
});
Google Gemini TTS
tts({
provider: "gemini",
apiKey: process.env.GEMINI_API_KEY,
text: "Hello from Gemini!",
voice: "iapetus",
prompt: "In a pleasant and calm tone",
outputFile: "gemini_output.mp3",
});
Deepgram TTS
tts({
provider: "deepgram",
apiKey: process.env.DEEPGRAM_API_KEY,
text: "Hello from Deepgram!",
voice: "aura-2-luna-en",
outputFile: "deepgram_output.mp3",
});
Groq PlayAI TTS
tts({
provider: "groq",
apiKey: process.env.GROQ_API_KEY,
text: "Hello from Groq PlayAI!",
voice: "Arista-PlayAI",
outputFile: "groq_output.wav",
});
Cartesia TTS
tts({
provider: "cartesia",
apiKey: process.env.CARTESIA_API_KEY,
text: "Hello from Cartesia!",
voice: "694f9389-aac1-45b6-b726-9d9369183238", // Voice ID
outputFile: "cartesia_output.mp3",
});
stt(options)
Transcribe audio to text using Speech-to-Text providers.
| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | ✅ | STT provider: "deepgram" or "assemblyai" |
| apiKey | string | ✅ | API key for the chosen provider |
| audioFile | string | ✅ | Path to a local audio file, or URL of a remote audio file, to transcribe |
| outputFile | string | optional | Output file path for results (default: "transcription.json") |
| model | string | optional | Model to use (default: "nova-3") |
| smartFormat | boolean | optional | Enable smart formatting (default: true) |
| detect_language | boolean | optional | Automatic language detection (default: true) |
| punctuate | boolean | optional | Enable punctuation (default: true) |
| diarize | boolean | optional | Enable speaker diarization (default: false) |
| channels | number | optional | Number of audio channels (default: 1) |
| fullResponse | boolean | optional | Return the full response object instead of just the transcript (default: false) |
fullResponse: true returns an object with transcript, confidence, words, and metadata.
Deepgram: Basic Transcription (Remote URL)
stt({
provider: "deepgram",
apiKey: process.env.DEEPGRAM_API_KEY,
audioFile: "https://example.com/audio.wav", // Remote URL
});
Deepgram: Local File Transcription
stt({
provider: "deepgram",
apiKey: process.env.DEEPGRAM_API_KEY,
audioFile: "./my-audio.mp3", // Local file path
outputFile: "transcription.json",
});
AssemblyAI: Basic Transcription (Remote URL)
stt({
provider: "assemblyai",
apiKey: process.env.ASSEMBLYAI_API_KEY,
audioFile: "https://example.com/audio.wav", // Remote URL
outputFile: "transcription.json",
});
AssemblyAI: Local File Transcription
stt({
provider: "assemblyai",
apiKey: process.env.ASSEMBLYAI_API_KEY,
audioFile: "./my-audio.mp3", // Local file path
outputFile: "transcription.json",
fullResponse: true, // Get detailed response
});
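Because `fullResponse` changes what a transcription yields (just the transcript, or an object with transcript, confidence, words, and metadata), downstream code may want a single way to read the text. The sketch below assumes the result is either a plain string or an object with a string `transcript` field; that shape is inferred from the parameter description, not confirmed by the package. `transcriptOf` is a hypothetical helper, not part of multi-voice-sdk.

```javascript
// Hypothetical helper (not part of multi-voice-sdk). Assumes a transcription
// result is either a plain transcript string (fullResponse: false) or an
// object carrying a string `transcript` field (fullResponse: true).
function transcriptOf(result) {
  if (typeof result === "string") {
    return result;
  }
  if (result !== null && typeof result === "object" &&
      typeof result.transcript === "string") {
    return result.transcript;
  }
  throw new TypeError("Unrecognized transcription result shape");
}
```

If the actual return value differs, for example if it is wrapped in a promise, the result would need to be awaited before being passed to `transcriptOf`.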
merge(options)
Merge multiple audio files into a single file.
| Parameter | Type | Required | Description |
|---|---|---|---|
| inputFiles | string[] | ✅ | Array of input file paths |
| outputFile | string | ✅ | Output file path |
merge({
inputFiles: ["intro.mp3", "main.mp3", "outro.mp3"],
outputFile: "complete_audio.mp3",
});
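A common use of `merge` is to stitch together several generated clips. The following sketch generates one file per text segment and then merges them in order. `narrate` and the `segment_N.mp3` file names are hypothetical; the SDK functions are passed in as an `sdk` object so the flow can be tried with stubs, and `await` is used on the assumption that the calls may return promises (it is harmless if they do not).

```javascript
// Hypothetical helper (not part of multi-voice-sdk). `sdk` is any object
// exposing tts(options) and merge(options) with the option names documented
// above. Segments are generated one at a time, then merged in order.
async function narrate(sdk, baseOptions, segments, outputFile) {
  const inputFiles = [];
  for (const [index, text] of segments.entries()) {
    const segmentFile = `segment_${index + 1}.mp3`;
    // Assumption: tts may be asynchronous, so wait for each file.
    await sdk.tts({ ...baseOptions, text, outputFile: segmentFile });
    inputFiles.push(segmentFile);
  }
  await sdk.merge({ inputFiles, outputFile });
  return inputFiles;
}
```

With the real package, `sdk` could be an object built from the imported functions, such as `{ tts, merge }`, and `baseOptions` would carry `provider`, `apiKey`, and `voice`.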
OpenAI voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse
Gemini voices: zephyr (Bright), puck (Upbeat), charon (Informative), kore (Firm), fenrir (Excitable), leda (Youthful), orus (Firm), aoede (Breezy), autonoe (Bright), enceladus (Breathy), iapetus (Clear)
For a complete list of available Gemini voices, see: Gemini Speech Generation Documentation
Deepgram voices: aura-2-luna-en, aura-2-stella-en, aura-2-arcas-en, and more
For a complete list of available Deepgram voices, see: Deepgram TTS Models Documentation
Groq PlayAI voices: Atlas-PlayAI, Arista-PlayAI, Basil-PlayAI, Briggs-PlayAI, and more
For a complete list of available Groq PlayAI voices, see: Groq TTS Documentation
Cartesia uses voice IDs instead of voice names. Example voice ID:
694f9389-aac1-45b6-b726-9d9369183238 (Default voice)
For more information about Cartesia voices, see: Cartesia Console
Create a .env file in your project root:
OPENAI_API_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
GROQ_API_KEY=your_groq_api_key
CARTESIA_API_KEY=your_cartesia_api_key
ASSEMBLYAI_API_KEY=your_assemblyai_api_key
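The examples in this README read keys from `process.env`. Node.js does not load a .env file on its own; recent Node.js versions accept `node --env-file=.env`, and the `dotenv` package is a common alternative. Once the variables are loaded, a small lookup keeps the provider-to-variable mapping in one place. `apiKeyFor` and `API_KEY_VARS` below are hypothetical names for illustration, not part of multi-voice-sdk; the variable names are the ones used in this README's examples.

```javascript
// Environment variable names as used in the examples in this README.
const API_KEY_VARS = {
  openai: "OPENAI_API_KEY",
  gemini: "GEMINI_API_KEY",
  deepgram: "DEEPGRAM_API_KEY",
  groq: "GROQ_API_KEY",
  cartesia: "CARTESIA_API_KEY",
  assemblyai: "ASSEMBLYAI_API_KEY",
};

// Hypothetical helper (not part of multi-voice-sdk): looks up the API key
// for a provider and fails early with a readable message if it is missing.
function apiKeyFor(provider, env = process.env) {
  if (!Object.hasOwn(API_KEY_VARS, provider)) {
    throw new Error(`No API key variable known for provider "${provider}"`);
  }
  const name = API_KEY_VARS[provider];
  const value = env[name];
  if (!value) {
    throw new Error(`${name} is not set`);
  }
  return value;
}
```

The result can then be passed as the `apiKey` option, for example `apiKey: apiKeyFor("deepgram")`.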
ISC