Zyphra TypeScript Client

A TypeScript client library for interacting with Zyphra's text-to-speech API.

Installation

npm install @zyphra/client
# or
yarn add @zyphra/client

Quick Start

import { ZyphraClient } from '@zyphra/client';

// Initialize the client
const client = new ZyphraClient({ apiKey: 'your-api-key' });

// Generate speech
const audioBlob = await client.audio.speech.create({
  text: 'Hello, world!',
  speaking_rate: 15,
  model: 'zonos-v0.1-transformer' // Default model
});

// Save to file (browser)
const url = URL.createObjectURL(audioBlob);
const a = document.createElement('a');
a.href = url;
a.download = 'output.webm';
a.click();
URL.revokeObjectURL(url);

Features

  • Text-to-speech generation with customizable parameters
  • Support for multiple languages and audio formats
  • Voice cloning capabilities
  • Multiple TTS models with specialized capabilities
  • TypeScript types included
  • Browser and Node.js support
  • Returns audio as Blob for easy handling
  • Support for default and custom voice selection
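
The Quick Start above saves the audio in the browser; since the client also runs under Node.js, the returned Blob can be converted to a Buffer and written to disk instead. A minimal sketch (assumes Node 18+, where Blob is global; blobToBuffer is a hypothetical helper, not part of the library):

```typescript
import { writeFile } from 'node:fs/promises';

// Hypothetical helper: convert the Blob returned by the client into a
// Node Buffer that fs can write.
async function blobToBuffer(blob: Blob): Promise<Buffer> {
  return Buffer.from(await blob.arrayBuffer());
}

// With an audioBlob from client.audio.speech.create({ ... }):
// await writeFile('output.webm', await blobToBuffer(audioBlob));
```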

Parameters

The text-to-speech API accepts the following parameters:

interface TTSParams {
  text: string;                // The text to convert to speech (required)
  speaker_audio?: string;      // Base64 audio for voice cloning
  speaking_rate?: number;      // Speaking rate (5-35, default: 15.0)
  fmax?: number;               // Frequency max (0-24000, default: 22050)
  pitch_std?: number;          // Pitch standard deviation (0-500, default: 45.0; transformer model only)
  emotion?: EmotionWeights;    // Emotion weights (transformer model only)
  language_iso_code?: string;  // Language code (e.g., "en-us", "fr-fr")
  mime_type?: string;          // Output audio format (e.g., "audio/webm")
  model?: SupportedModel;      // TTS model (default: 'zonos-v0.1-transformer')
  speaker_noised?: boolean;    // Denoises speaker audio to improve stability (hybrid model only, default: true)
  default_voice_name?: string; // Name of a default voice to use
  voice_name?: string;         // Name of one of the user's voices to use
}

// Available models
type SupportedModel = 'zonos-v0.1-transformer' | 'zonos-v0.1-hybrid';

interface EmotionWeights {
  happiness: number;  // default: 0.6
  sadness: number;    // default: 0.05
  disgust: number;    // default: 0.05
  fear: number;       // default: 0.05
  surprise: number;   // default: 0.05
  anger: number;      // default: 0.05
  other: number;      // default: 0.5
  neutral: number;    // default: 0.6
}
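
Since every EmotionWeights field is required when you pass emotion, a small helper that starts from the documented defaults saves repetition. A sketch (the helper and EMOTION_DEFAULTS are illustrative, not part of the library; the interface mirrors the one above):

```typescript
interface EmotionWeights {
  happiness: number; sadness: number; disgust: number; fear: number;
  surprise: number; anger: number; other: number; neutral: number;
}

// The documented default weights from the interface above.
const EMOTION_DEFAULTS: EmotionWeights = {
  happiness: 0.6, sadness: 0.05, disgust: 0.05, fear: 0.05,
  surprise: 0.05, anger: 0.05, other: 0.5, neutral: 0.6,
};

// Hypothetical helper: override only the weights you care about.
function emotion(overrides: Partial<EmotionWeights>): EmotionWeights {
  return { ...EMOTION_DEFAULTS, ...overrides };
}
```

Usage: `emotion({ happiness: 0.8, neutral: 0.3 })` yields a complete weight set with every other field at its default.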

Detailed Usage

Supported TTS Models

The API supports the following TTS models:

  • zonos-v0.1-transformer (Default): A standard transformer-based TTS model suitable for most applications.
    • Supports the pitch_std and emotion parameters
  • zonos-v0.1-hybrid: An advanced model with:
    • Better support for certain languages (especially Japanese)
    • Supports speaker_noised denoising parameter
    • Improved voice quality in some scenarios

Supported Languages

The text-to-speech API supports the following languages:

  • English (US) - en-us
  • French - fr-fr
  • German - de
  • Japanese - ja (recommended to use with zonos-v0.1-hybrid model)
  • Korean - ko
  • Mandarin Chinese - cmn

Supported Audio Formats

The API supports multiple output formats through the mime_type parameter:

  • WebM (default) - audio/webm
  • Ogg - audio/ogg
  • WAV - audio/wav
  • MP3 - audio/mp3 or audio/mpeg
  • MP4/AAC - audio/mp4 or audio/aac
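
If you name output files by extension, a small lookup keeps mime_type consistent with the list above (an illustrative map, not part of the library):

```typescript
// Illustrative map from file extension to the documented mime_type values.
const MIME_BY_EXTENSION: Record<string, string> = {
  webm: 'audio/webm',
  ogg: 'audio/ogg',
  wav: 'audio/wav',
  mp3: 'audio/mp3', // audio/mpeg is also accepted
  mp4: 'audio/mp4', // audio/aac is also accepted
};
```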

Language and Format Examples

// Generate French speech in MP3 format
const frenchAudio = await client.audio.speech.create({
  text: 'Bonjour le monde!',
  language_iso_code: 'fr-fr',
  mime_type: 'audio/mp3',
  speaking_rate: 15
});

// Generate Japanese speech with hybrid model (recommended)
const japaneseAudio = await client.audio.speech.create({
  text: 'こんにちは世界!',
  language_iso_code: 'ja',
  mime_type: 'audio/wav',
  speaking_rate: 15,
  model: 'zonos-v0.1-hybrid' // Better for Japanese
});

Using Default and Custom Voices

You can use pre-defined default voices or your own custom voices:

// Using a default voice
const defaultVoiceAudio = await client.audio.speech.create({
  text: 'This uses a default voice.',
  default_voice_name: 'american_female',
  speaking_rate: 15
});

Available Default Voices

The following default voices are available:

  • american_female - Standard American English female voice
  • american_male - Standard American English male voice
  • anime_girl - Stylized anime girl character voice
  • british_female - British English female voice
  • british_male - British English male voice
  • energetic_boy - Energetic young male voice
  • energetic_girl - Energetic young female voice
  • japanese_female - Japanese female voice
  • japanese_male - Japanese male voice

Using Custom Voices

You can use your own custom voices that have been created and stored in your account:

// Using a custom voice you've created and stored
const customVoiceAudio = await client.audio.speech.create({
  text: 'This uses your custom voice.',
  voice_name: 'my_custom_voice',
  speaking_rate: 15
});

Note: When using custom voices, the voice_name parameter should exactly match the name as it appears in your voices list on playground.zyphra.com/audio. The name is case-sensitive.

Model-Specific Parameters

For the hybrid model (zonos-v0.1-hybrid), you can utilize additional parameters:

// Using the hybrid model with its specific parameters
const hybridModelAudio = await client.audio.speech.create({
  text: 'This uses the hybrid model with special parameters.',
  model: 'zonos-v0.1-hybrid',
  speaker_noised: true,   // Denoises to improve stability
  speaking_rate: 15
});

Emotion Control

You can adjust the emotional tone of the speech:

const emotionalSpeech = await client.audio.speech.create({
  text: 'This is a happy message!',
  emotion: {
    happiness: 0.8,  // Increase happiness
    neutral: 0.3,    // Decrease neutrality
    sadness: 0.05,   // Keep other emotions at default values
    disgust: 0.05,
    fear: 0.05,
    surprise: 0.05,
    anger: 0.05,
    other: 0.5
  }
});

Voice Cloning

You can clone voices by providing a reference audio file as a base64 string:

// Node.js environment
const fs = require('fs');
const audio_base64 = fs.readFileSync('reference_voice.wav').toString('base64');

const audioBlob = await client.audio.speech.create({
  text: 'This will use the cloned voice',
  speaker_audio: audio_base64,
  speaking_rate: 15
});

// Browser environment
const fileInput = document.querySelector('input[type="file"]') as HTMLInputElement;
const file = fileInput.files![0];
const reader = new FileReader();

reader.onload = async () => {
  // readAsDataURL yields a "data:<mime>;base64,<data>" string; keep the data part
  const base64 = (reader.result as string).split(',')[1];
  
  const audioBlob = await client.audio.speech.create({
    text: 'This will use the cloned voice',
    speaker_audio: base64,
    speaking_rate: 15
  });
};

reader.readAsDataURL(file);

Streaming Support

For streaming audio directly:

const { stream, mimeType } = await client.audio.speech.createStream({
  text: 'This will be streamed to the client',
  speaking_rate: 15,
  model: 'zonos-v0.1-transformer'
});

// Collect the chunks, then hand a single Blob to an <audio> element in the browser
const chunks: Uint8Array[] = [];
const reader = stream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}

const audioElement = document.createElement('audio');
audioElement.src = URL.createObjectURL(new Blob(chunks, { type: mimeType }));
audioElement.controls = true;
document.body.appendChild(audioElement);

Callback Options

You can also use callbacks to track progress during audio generation:

const audioBlob = await client.audio.speech.create(
  {
    text: 'Audio with progress tracking',
    speaking_rate: 15,
    model: 'zonos-v0.1-transformer'
  },
  {
    onChunk: (chunk) => {
      console.log('Received chunk:', chunk.length, 'bytes');
    },
    onProgress: (totalBytes) => {
      console.log('Total bytes received:', totalBytes);
    },
    onComplete: (blob) => {
      console.log('Audio generation complete!', blob.size, 'bytes');
    }
  }
);

Error Handling

import { ZyphraError } from '@zyphra/client';

try {
  const audioBlob = await client.audio.speech.create({
    text: 'Hello, world!',
    speaking_rate: 15,
    model: 'zonos-v0.1-transformer'
  });
} catch (error) {
  if (error instanceof ZyphraError) {
    console.error(`Error: ${error.statusCode} - ${error.response}`);
  }
}

Available Models

Speech Models

  • zonos-v0.1-transformer: Default transformer-based TTS model
  • zonos-v0.1-hybrid: Advanced hybrid TTS model with enhanced language support

License

MIT License

Package last updated on 07 Apr 2025