route-tts

A flexible routing library for GenAI text-to-speech (TTS).

0.3.1
PyPI

Maintainers: 1

RouteTTS

RouteTTS is a flexible routing library for multiple GenAI text-to-speech (TTS) providers. It provides a unified interface to generate audio from text blocks and makes it easy to combine multiple TTS providers into a single audio file.

Supported TTS Platforms:

OpenAI
ElevenLabs

Please open an issue to suggest more!

Features

Unified interface for multiple TTS providers
Easy configuration of multiple voices and speech generation
Audio normalization (prevents model output volumes from being noticeably different)
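
To illustrate what audio normalization means in this context, the sketch below scales the samples of two clips to a common peak level. It is not RouteTTS's implementation: `normalize_peak` and the sample values are hypothetical, and the library may normalize on a different measure (for example, perceived loudness).

```python
from typing import List


def normalize_peak(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale samples so the loudest one has magnitude target_peak.

    Hypothetical helper for illustration; not part of RouteTTS.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silence (or empty input) has nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


# Two clips with very different volumes, as might come from two providers.
quiet_clip = [0.05, -0.1, 0.08]
loud_clip = [0.6, -0.9, 0.75]

normalized_quiet = normalize_peak(quiet_clip)
normalized_loud = normalize_peak(loud_clip)
```

After scaling, both clips share the same peak magnitude, so neither voice sounds markedly louder when the clips are joined.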

Planned features:

Automatic chunking to overcome input character limits.
Speech generation optimizations.
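
Automatic chunking is not implemented yet, but the idea can be sketched as follows. This is only one possible approach, assuming a per-request character limit: `chunk_text` and `max_chars` are hypothetical names, and actual limits vary by provider.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Hypothetical helper; a single sentence longer than max_chars is hard-split.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any sentence that cannot fit in one chunk on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


chunks = chunk_text("First sentence. Second one is here. Third!", max_chars=20)
```

Each chunk could then be sent as its own request and the resulting audio joined in order.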

Installation

To work on RouteTTS from source, you need Poetry. If you don't have it, follow the installation instructions in the Poetry documentation.

Once Poetry is installed, clone this repository and install the dependencies:

poetry install

To use RouteTTS as a dependency in your own project, install it from PyPI with pip:

pip install route-tts

Usage

RouteTTS provides a simple wrapper over common TTS model providers such as OpenAI and ElevenLabs (others coming soon).

You first initialize a TTS client with a list of Voice objects. Each Voice object contains the voice's platform, its voice_model, and a unique voice identifier. Then, to generate audio, you create a SpeechBlock with a voice_id and the text to convert to audio. That's it.

To switch voices or providers, change the voice_id and RouteTTS handles the rest.
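
The routing idea can be sketched with stand-in classes. This is not the RouteTTS source: `Router`, `fake_backend`, and the field layout of `Voice` are assumptions made for illustration, and the real provider calls are replaced by a function that returns tagged bytes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Voice:
    id: str          # unique id chosen by the caller
    platform: str    # e.g. "openai" or "elevenlabs"
    voice: str       # provider-specific voice name or id
    voice_model: str


@dataclass(frozen=True)
class SpeechBlock:
    voice_id: str
    text: str


# A provider backend takes a voice and text and returns audio bytes.
Backend = Callable[[Voice, str], bytes]


class Router:
    """Illustrative stand-in for a routing TTS client (not the RouteTTS class)."""

    def __init__(self, voices: List[Voice], backends: Dict[str, Backend]) -> None:
        self._voices = {v.id: v for v in voices}
        self._backends = backends

    def generate_speech(self, block: SpeechBlock) -> bytes:
        voice = self._voices.get(block.voice_id)
        if voice is None:
            raise KeyError(f"unknown voice_id: {block.voice_id!r}")
        return self._backends[voice.platform](voice, block.text)


def fake_backend(voice: Voice, text: str) -> bytes:
    # Stand-in for a real API call; tags the output with the platform name.
    return f"{voice.platform}:{voice.voice}:{text}".encode()


router = Router(
    voices=[
        Voice(id="narrator", platform="openai", voice="alloy", voice_model="tts-1"),
        Voice(id="guest", platform="elevenlabs", voice="voice-123", voice_model="eleven_turbo_v2"),
    ],
    backends={"openai": fake_backend, "elevenlabs": fake_backend},
)

# Changing only voice_id sends the same text to a different provider.
openai_audio = router.generate_speech(SpeechBlock(voice_id="narrator", text="Hello"))
eleven_audio = router.generate_speech(SpeechBlock(voice_id="guest", text="Hello"))
```

Keeping the provider choice inside the voice definition means calling code only ever deals with a voice_id and text.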

API Keys for Speech Providers

To use RouteTTS in your project, you need an API key for each TTS provider you want to use.

Before running your application, set the following environment variables:

export OPENAI_API_KEY=your_openai_api_key_here
export ELEVEN_API_KEY=your_elevenlabs_api_key_here

You can set these environment variables in your shell or add them to a .env file in the root directory of the project. Alternatively, you can pass the API keys directly when initializing the TTS client.
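
If you want to fail early when a key is missing, a check along these lines can run at startup. It is a minimal sketch using only the standard library: `load_api_keys` is a hypothetical helper, not part of RouteTTS, and `REQUIRED_KEYS` should be trimmed to the providers you actually use.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable names from the section above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVEN_API_KEY")


def load_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the provider API keys, raising if any are missing or empty."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not source.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: source[name] for name in REQUIRED_KEYS}


# Example with an explicit mapping so the snippet runs without real keys.
keys = load_api_keys({"OPENAI_API_KEY": "sk-example", "ELEVEN_API_KEY": "el-example"})
```

Calling `load_api_keys()` with no argument reads from the process environment instead.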

Creating Voices

Create voices, each with a unique identifier. Here are examples for OpenAI and ElevenLabs voices:

OpenAI

As of August 30th, 2024, OpenAI has six voices: alloy, echo, fable, onyx, nova, and shimmer. It also offers two voice_model options: tts-1 and tts-1-hd.

openai_voice = OpenAIVoice(
    id="narrator",  # any unique id
    voice="alloy",  # one of: alloy, echo, fable, onyx, nova, shimmer
    voice_model="tts-1",  # one of: tts-1, tts-1-hd
)
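
A small guard like the one below can catch a mistyped voice or model name before any request is made. It is a hypothetical helper, not part of RouteTTS, and the allowed values are simply the ones listed above, so they may be out of date.

```python
# Values taken from the list above (as of August 30th, 2024); extend as needed.
OPENAI_VOICES = frozenset({"alloy", "echo", "fable", "onyx", "nova", "shimmer"})
OPENAI_VOICE_MODELS = frozenset({"tts-1", "tts-1-hd"})


def check_openai_voice(voice: str, voice_model: str) -> None:
    """Raise ValueError if the voice or model is not in the known lists.

    Hypothetical helper for illustration; not part of RouteTTS.
    """
    if voice not in OPENAI_VOICES:
        raise ValueError(
            f"unknown OpenAI voice {voice!r}; expected one of {sorted(OPENAI_VOICES)}"
        )
    if voice_model not in OPENAI_VOICE_MODELS:
        raise ValueError(
            f"unknown OpenAI voice_model {voice_model!r}; "
            f"expected one of {sorted(OPENAI_VOICE_MODELS)}"
        )


check_openai_voice("alloy", "tts-1")  # passes silently
```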

ElevenLabs

Refer to the ElevenLabs documentation to find your voice and associated voice_model and id.

elevenlabs_voice = ElevenLabsVoice(
    id="guest",  # any unique id
    voice="<eleven labs voice id>",  # voice id from your ElevenLabs account
    # e.g. eleven_multilingual_v1, eleven_turbo_v2, eleven_turbo_v2_5 (others may have been released)
    voice_model="eleven_turbo_v2",
)

TTS Instantiation

Initialize a TTS object with the voices you just created.

TTS(
    voices=[openai_voice, elevenlabs_voice],
)

Creating Audio

Now you can generate audio by creating a SpeechBlock object and calling TTS().generate_speech().

Single Audio SpeechBlock

# Create SpeechBlock object
speech_block = SpeechBlock(
    voice_id="narrator",  # id of one of the voices passed to TTS
    text="Some random text to convert to audio"
)

# Generate Audio
audio = TTS().generate_speech(speech_block)

# Save Audio file as .mp3
audio_file_path = "output_audio.mp3"
with open(audio_file_path, "wb") as audio_file:
    audio_file.write(audio)

Multiple SpeechBlock

RouteTTS will soon optimize the conversion of multiple SpeechBlocks passed as a list. Some providers (OpenAI) offer no way to maintain context and intonation across requests, so their blocks are independent and generation is embarrassingly parallel. Other platforms, such as ElevenLabs, do let a TTS request know how the previous one ended, which produces more natural-sounding speech.

# Create SpeechBlock objects
speech_block_one = SpeechBlock(
    voice_id="narrator",  # id of the first voice passed to TTS
    text="Some random text to convert to audio"
)

speech_block_two = SpeechBlock(
    voice_id="guest",  # id of the second voice passed to TTS
    text="Some more random text to convert to audio"
)

# Generate Audio
audio = TTS().generate_speech_list([speech_block_one, speech_block_two])

# Save Audio file as .mp3
audio_file_path = "output_audio.mp3"
with open(audio_file_path, "wb") as audio_file:
    audio_file.write(audio)
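
For providers with no cross-request context, independent blocks can be synthesized concurrently while keeping their order. The sketch below shows that pattern with the standard library. `generate_in_parallel` and `fake_synthesize` are hypothetical names, and the network call is replaced by a stand-in, so this is not how RouteTTS itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def generate_in_parallel(
    texts: Sequence[str],
    synthesize: Callable[[str], bytes],
    max_workers: int = 4,
) -> List[bytes]:
    """Synthesize each text independently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order even if calls finish out of order.
        return list(pool.map(synthesize, texts))


def fake_synthesize(text: str) -> bytes:
    # Stand-in for a network call to a TTS provider.
    return text.upper().encode()


segments = generate_in_parallel(["one", "two", "three"], fake_synthesize)
```

Threads suit this workload because each call mostly waits on the network. Providers that carry context from one request to the next would need their blocks generated sequentially instead.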

Tests

Run the test suite with:

poetry run pytest

Feature List

Add Deepgram audio provider
Add Play.ht audio provider
Add AWS Polly audio provider
Enable multi-speaker conversation by passing a List of SpeechBlocks
Generate all OpenAI SpeechBlocks in parallel because there's no context awareness from block to block

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

If you encounter any problems or have any questions, please open an issue on the GitHub repository.
