
multi-voice-sdk


A universal Text-to-Speech (TTS) and Speech-to-Text (STT) SDK supporting multiple providers (OpenAI, Google Gemini, Deepgram, Groq PlayAI, Cartesia, AssemblyAI) with audio merging capabilities

Version: 1.1.0 (latest, npm)
Version published: last week
Weekly downloads: 59
Maintainers: 1
Created: 3 weeks ago

Multi-Voice SDK

A universal Text-to-Speech (TTS) and Speech-to-Text (STT) SDK that supports multiple providers including Google Gemini, Deepgram, OpenAI, Groq PlayAI, Cartesia, and AssemblyAI. Easily generate audio content, transcribe speech, and manage audio files with a unified API.

Features

🎵 Multi-Provider TTS: Gemini, Deepgram, OpenAI, Groq PlayAI, and Cartesia TTS
🎙️ Speech-to-Text: Deepgram and AssemblyAI STT, with options such as smart formatting, language detection, punctuation, and speaker diarization
🔧 Audio Merging: Combine multiple audio files into a single file
🎯 Simple API: Easy-to-use functions with a consistent interface
📦 ESM Ready: Modern ES modules support

Installation

npm install multi-voice-sdk

Quick Start

import { tts, stt, merge } from "multi-voice-sdk";

// Generate speech with OpenAI
tts({
  provider: "openai",
  apiKey: "your-api-key",
  text: "Hello, world!",
  voice: "nova",
  outputFile: "output.mp3",
});

// Transcribe audio with Deepgram
stt({
  provider: "deepgram", // Required: "deepgram" or "assemblyai"
  apiKey: "your-deepgram-key",
  audioFile: "https://example.com/audio.wav", // Can be a URL or a local file path
});

// Merge multiple audio files
merge({
  inputFiles: ["file1.mp3", "file2.mp3"],
  outputFile: "combined.mp3",
});

API Reference

`tts(options)`

Generate speech from text using various TTS providers.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `provider` | `string` | Yes | TTS provider: `"gemini"`, `"deepgram"`, `"openai"`, `"groq"`, or `"cartesia"` |
| `apiKey` | `string` | Yes | API key for the chosen provider |
| `text` | `string` | Yes | Text to convert to speech |
| `voice` | `string` | Yes | Voice identifier (provider-specific; for Cartesia, use a voice ID) |
| `outputFile` | `string` | No | Output file path (default: `"output.mp3"`) |
| `model` | `string` | No | Model to use (provider-specific) |
| `prompt` | `string` | No | Additional instructions for speech generation |
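Because four of the parameters are required, it can help to check an options object before spending an API call on it. The following is a minimal sketch, not part of multi-voice-sdk: `validateTtsOptions` is a hypothetical helper that only mirrors the table above, and the SDK may perform its own validation with different messages.

```javascript
// Providers and required fields as listed in the parameter table above.
const TTS_PROVIDERS = ["gemini", "deepgram", "openai", "groq", "cartesia"];
const TTS_REQUIRED = ["provider", "apiKey", "text", "voice"];

// Hypothetical helper (not exported by multi-voice-sdk).
// Returns a list of problems; an empty list means the options look complete.
function validateTtsOptions(options) {
  const problems = [];
  for (const key of TTS_REQUIRED) {
    if (typeof options[key] !== "string" || options[key].trim() === "") {
      problems.push(`"${key}" is required and must be a non-empty string`);
    }
  }
  if (
    typeof options.provider === "string" &&
    options.provider.trim() !== "" &&
    !TTS_PROVIDERS.includes(options.provider)
  ) {
    problems.push(`unknown provider "${options.provider}"`);
  }
  return problems;
}

// Example: a missing voice is reported before any request is made.
console.log(validateTtsOptions({ provider: "openai", apiKey: "key", text: "Hi" }));
```

If the returned list is empty, the same object can be passed to `tts()` unchanged.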

Examples

OpenAI TTS

tts({
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY,
  model: "gpt-4o-mini-tts",
  text: "Hello from OpenAI!",
  voice: "nova",
  prompt: "Speak in a cheerful tone",
  outputFile: "openai_output.mp3",
});

Google Gemini TTS

tts({
  provider: "gemini",
  apiKey: process.env.GEMINI_API_KEY,
  text: "Hello from Gemini!",
  voice: "iapetus",
  prompt: "In a pleasant and calm tone",
  outputFile: "gemini_output.mp3",
});

Deepgram TTS

tts({
  provider: "deepgram",
  apiKey: process.env.DEEPGRAM_API_KEY,
  text: "Hello from Deepgram!",
  voice: "aura-2-luna-en",
  outputFile: "deepgram_output.mp3",
});

Groq PlayAI TTS

tts({
  provider: "groq",
  apiKey: process.env.GROQ_API_KEY,
  text: "Hello from Groq PlayAI!",
  voice: "Arista-PlayAI",
  outputFile: "groq_output.wav",
});

Cartesia TTS

tts({
  provider: "cartesia",
  apiKey: process.env.CARTESIA_API_KEY,
  text: "Hello from Cartesia!",
  voice: "694f9389-aac1-45b6-b726-9d9369183238", // Voice ID
  outputFile: "cartesia_output.mp3",
});

`stt(options)`

Transcribe audio to text using Speech-to-Text providers.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `provider` | `string` | Yes | STT provider: `"deepgram"` or `"assemblyai"` |
| `apiKey` | `string` | Yes | API key for the chosen provider |
| `audioFile` | `string` | Yes | Path to a local audio file or URL of a remote audio file to transcribe |
| `outputFile` | `string` | No | Output file path for results (default: `"transcription.json"`) |
| `model` | `string` | No | Model to use (default: `"nova-3"`) |
| `smartFormat` | `boolean` | No | Enable smart formatting (default: `true`) |
| `detect_language` | `boolean` | No | Automatic language detection (default: `true`) |
| `punctuate` | `boolean` | No | Enable punctuation (default: `true`) |
| `diarize` | `boolean` | No | Enable speaker diarization (default: `false`) |
| `channels` | `number` | No | Number of audio channels (default: `1`) |
| `fullResponse` | `boolean` | No | Return the full response object instead of just the transcript (default: `false`) |
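To see which settings a call would run with once the documented defaults are filled in, the defaults can be written out as a plain object. This is only an illustrative sketch: `STT_DEFAULTS` and `effectiveSttOptions` are hypothetical names, the values are copied from the table above, and how the SDK applies them internally (or whether every option applies to both providers) is not stated in this README.

```javascript
// Defaults copied from the parameter table above.
const STT_DEFAULTS = {
  outputFile: "transcription.json",
  model: "nova-3",
  smartFormat: true,
  detect_language: true,
  punctuate: true,
  diarize: false,
  channels: 1,
  fullResponse: false,
};

// Hypothetical helper: caller-supplied values override the defaults.
function effectiveSttOptions(options) {
  return { ...STT_DEFAULTS, ...options };
}

const effective = effectiveSttOptions({
  provider: "deepgram",
  apiKey: "your-deepgram-key",
  audioFile: "./my-audio.mp3",
  diarize: true, // override one default
});
console.log(effective);
```

Note that with object spread, a key explicitly set to `undefined` also overrides the default, so omit keys you do not want to change.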

Returns

Default: returns the transcript as a string
With `fullResponse: true`: returns an object with the transcript, confidence, words, and metadata
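Since the return value is either a string or an object depending on `fullResponse`, calling code may want one place that handles both. The sketch below assumes the full response exposes the text under a `transcript` property; that property name is inferred from the description above and is not confirmed by this README, and `transcriptText` is a hypothetical helper.

```javascript
// Hypothetical helper: accepts either documented return shape.
// Assumption: the full response object has a string `transcript` property.
function transcriptText(result) {
  if (typeof result === "string") {
    return result; // default mode: the result is already the transcript
  }
  if (result !== null && typeof result === "object" && typeof result.transcript === "string") {
    return result.transcript; // fullResponse mode
  }
  throw new TypeError("Unrecognized stt() result shape");
}

console.log(transcriptText("hello there"));
console.log(transcriptText({ transcript: "hello there", confidence: 0.98 }));
```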

Examples

Deepgram: Basic Transcription (Remote URL)

stt({
  provider: "deepgram",
  apiKey: process.env.DEEPGRAM_API_KEY,
  audioFile: "https://example.com/audio.wav", // Remote URL
});

Deepgram: Local File Transcription

stt({
  provider: "deepgram",
  apiKey: process.env.DEEPGRAM_API_KEY,
  audioFile: "./my-audio.mp3", // Local file path
  outputFile: "transcription.json",
});

AssemblyAI: Basic Transcription (Remote URL)

stt({
  provider: "assemblyai",
  apiKey: process.env.ASSEMBLYAI_API_KEY,
  audioFile: "https://example.com/audio.wav", // Remote URL
  outputFile: "transcription.json",
});

AssemblyAI: Local File Transcription

stt({
  provider: "assemblyai",
  apiKey: process.env.ASSEMBLYAI_API_KEY,
  audioFile: "./my-audio.mp3", // Local file path
  outputFile: "transcription.json",
  fullResponse: true, // Get detailed response
});

`merge(options)`

Merge multiple audio files into a single file.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `inputFiles` | `string[]` | Yes | Array of input file paths |
| `outputFile` | `string` | Yes | Output file path |

Example

merge({
  inputFiles: ["intro.mp3", "main.mp3", "outro.mp3"],
  outputFile: "complete_audio.mp3",
});
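`tts()` and `merge()` can be combined to produce a dialogue with a different voice per speaker. The sketch below only builds the option objects for such a pipeline and calls no SDK function. `planDialogue`, the speaker names, and the `segment_NNN.mp3` file naming are all hypothetical, and the `.mp3` extension is an assumption (the Groq example in this README writes a `.wav` file).

```javascript
// Hypothetical helper: turns a script into tts() option objects plus one
// merge() option object. It builds plain objects only.
function planDialogue(lines, voices, base) {
  const ttsJobs = lines.map((line, index) => {
    const voice = voices[line.speaker];
    if (!voice) {
      throw new Error(`No voice configured for speaker "${line.speaker}"`);
    }
    return {
      provider: base.provider,
      apiKey: base.apiKey,
      text: line.text,
      voice,
      // Zero-padded names keep the segments in script order.
      outputFile: `segment_${String(index + 1).padStart(3, "0")}.mp3`,
    };
  });
  const mergeJob = {
    inputFiles: ttsJobs.map((job) => job.outputFile),
    outputFile: base.outputFile,
  };
  return { ttsJobs, mergeJob };
}

const plan = planDialogue(
  [
    { speaker: "host", text: "Welcome to the show." },
    { speaker: "guest", text: "Thanks for having me." },
  ],
  { host: "nova", guest: "onyx" },
  { provider: "openai", apiKey: "your-api-key", outputFile: "dialogue.mp3" }
);
console.log(plan.mergeJob);
```

Each entry in `plan.ttsJobs` would then be passed to `tts()`, and `plan.mergeJob` to `merge()` once every segment file exists. This README does not state whether `tts()` returns a promise, so make sure generation has finished before merging.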

Supported Voices

OpenAI

alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse

Gemini

zephyr (Bright), puck (Upbeat), charon (Informative), kore (Firm), fenrir (Excitable), leda (Youthful), orus (Firm), aoede (Breezy), autonoe (Bright), enceladus (Breathy), iapetus (Clear)

For a complete list of available Gemini voices, see: Gemini Speech Generation Documentation

Deepgram

aura-2-luna-en, aura-2-stella-en, aura-2-arcas-en, and more

For a complete list of available Deepgram voices, see: Deepgram TTS Models Documentation

Groq PlayAI

Atlas-PlayAI, Arista-PlayAI, Basil-PlayAI, Briggs-PlayAI, and more

For a complete list of available Groq PlayAI voices, see: Groq TTS Documentation

Cartesia

Cartesia uses voice IDs instead of voice names. Example voice IDs:

694f9389-aac1-45b6-b726-9d9369183238 (Default voice)
Use the Cartesia console to find available voice IDs for your account

For more information about Cartesia voices, see: Cartesia Console
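A quick sanity check on the `voice` value can catch a common mix-up, such as passing a voice name to Cartesia. The sketch below is hypothetical and deliberately loose: the OpenAI list is copied from this section and may lag behind the provider, the UUID shape for Cartesia is inferred from the single example ID above, and the other providers are not checked because their lists here are partial.

```javascript
// Voice names copied from the OpenAI list above.
const OPENAI_VOICES = [
  "alloy", "ash", "ballad", "coral", "echo", "fable",
  "onyx", "nova", "sage", "shimmer", "verse",
];

// Assumption: Cartesia voice IDs are UUID-shaped, like the example ID above.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: returns false only when the voice clearly does not fit.
function voiceLooksValid(provider, voice) {
  if (provider === "openai") return OPENAI_VOICES.includes(voice);
  if (provider === "cartesia") return UUID_PATTERN.test(voice);
  return true; // lists for the other providers are incomplete, so do not reject
}

console.log(voiceLooksValid("cartesia", "694f9389-aac1-45b6-b726-9d9369183238"));
console.log(voiceLooksValid("cartesia", "nova"));
```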

Environment Variables

Create a .env file in your project root:

OPENAI_API_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
GROQ_API_KEY=your_groq_api_key
CARTESIA_API_KEY=your_cartesia_api_key
ASSEMBLYAI_API_KEY=your_assemblyai_api_key
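The examples in this README read keys from `process.env`, so a small lookup that fails early with a clear message can save a confusing provider error later. This is a sketch under assumptions: `apiKeyFor` is a hypothetical helper, and the variable names are the ones used in the examples in this README. The README also does not say how the `.env` file gets loaded; Node.js does not read it automatically, so something like the `dotenv` package or the `--env-file` flag (Node.js 20.6 and later) would be needed.

```javascript
// Environment variable names as used in the examples in this README.
const API_KEY_VARS = {
  openai: "OPENAI_API_KEY",
  gemini: "GEMINI_API_KEY",
  deepgram: "DEEPGRAM_API_KEY",
  groq: "GROQ_API_KEY",
  cartesia: "CARTESIA_API_KEY",
  assemblyai: "ASSEMBLYAI_API_KEY",
};

// Hypothetical helper: looks up the key for a provider or throws early.
// `env` defaults to process.env and can be replaced in tests.
function apiKeyFor(provider, env = process.env) {
  const name = API_KEY_VARS[provider];
  if (!name) {
    throw new Error(`Unknown provider "${provider}"`);
  }
  const value = env[name];
  if (!value) {
    throw new Error(`Missing environment variable ${name}`);
  }
  return value;
}

// Example with an explicit env object instead of process.env.
console.log(apiKeyFor("groq", { GROQ_API_KEY: "example-key" }));
```

The result can be passed as `apiKey` to `tts()` or `stt()` in place of `process.env.…`.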

Requirements

Node.js 16.x or higher
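The version requirement can be checked at startup so that an outdated runtime fails with a readable message. A minimal sketch, assuming only that the major version matters; `meetsNodeRequirement` is a hypothetical helper, and `process.versions.node` is the standard Node.js version string without the leading `v`.

```javascript
// Minimum major version stated under Requirements.
const MIN_NODE_MAJOR = 16;

// Hypothetical helper: compares the major version against the minimum.
function meetsNodeRequirement(version = process.versions.node) {
  const major = Number.parseInt(version.split(".")[0], 10);
  return major >= MIN_NODE_MAJOR;
}

if (!meetsNodeRequirement()) {
  console.error(`multi-voice-sdk requires Node.js ${MIN_NODE_MAJOR}.x or higher`);
}
```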

License

ISC


Package last updated on 21 Jun 2025
