
TTS-Wrapper makes it easier to use text-to-speech APIs by providing a unified and easy-to-use interface.
Contributions are welcome! Check our contribution guide.
TTS-Wrapper simplifies using text-to-speech APIs by providing a unified interface across multiple services, allowing easy integration and manipulation of TTS capabilities.
ℹ️ Full documentation is available at https://willwade.github.io/tts-wrapper/
Engine | Platform | Online/Offline | SSML | Word Boundaries | Streaming | Playback Control | Callbacks |
---|---|---|---|---|---|---|---|
Polly | Linux/MacOS/Windows | Online | Yes | Yes | Yes | Yes | Full |
Google | Linux/MacOS/Windows | Online | Yes | Yes | Yes | Yes | Full |
GoogleTrans | Linux/MacOS/Windows | Online | No* | No** | Yes | Yes | Basic |
Microsoft | Linux/MacOS/Windows | Online | Yes | Yes | Yes | Yes | Full |
Watson | Linux/MacOS/Windows | Online | Yes | Yes | Yes | Yes | Full |
ElevenLabs | Linux/MacOS/Windows | Online | No* | Yes | Yes | Yes | Full |
Play.HT | Linux/MacOS/Windows | Online | No* | No** | Yes | Yes | Basic |
OpenAI | Linux/MacOS/Windows | Online | No | No | Yes | Yes | Basic |
Wit.Ai | Linux/MacOS/Windows | Online | No* | No** | Yes | Yes | Basic |
eSpeak | Linux/MacOS | Offline | Yes | No** | Yes | Yes | Basic |
AVSynth | MacOS | Offline | No | No** | Yes | Yes | Basic |
SAPI | Windows | Offline | Yes | Yes | Yes | Yes | Full |
UWP | Windows | Offline | Yes | Yes | Yes | Yes | Full |
Sherpa-ONNX | Linux/MacOS/Windows | Offline | No | No** | Yes | Yes | Basic |
Notes:
Method | Description | Availability |
---|---|---|
speak() | Direct speech playback | All engines |
speak_streamed() | Streamed speech playback | All engines |
synth_to_file() | Save speech to file | All engines |
pause(), resume() | Playback control | All engines |
stop() | Stop playback | All engines |
set_property() | Control rate/volume/pitch | All engines |
get_voices() | List available voices | All engines |
set_voice() | Select voice | All engines |
connect() | Register event callbacks | All engines |
check_credentials() | Verify API credentials | Online engines |
set_output_device() | Select audio output device | All engines |
This package is published on PyPI as py3-tts-wrapper but installs as tts-wrapper. This is because it's a fork of the original tts-wrapper project with Python 3 support and additional features.
This project requires the following system dependencies on Linux:
sudo apt-get install portaudio19-dev
or on macOS, using Homebrew:
brew install portaudio
For PicoTTS on Debian systems:
sudo apt-get install libttspico-utils
The espeak TTS functionality requires the espeak-ng C library to be installed on your system. On Debian/Ubuntu:
sudo apt install espeak-ng
On macOS (Homebrew):
brew install espeak-ng
Install from PyPI with selected engines:
pip install "py3-tts-wrapper[google,microsoft,sapi,sherpaonnx,googletrans]"
Install from GitHub:
pip install "py3-tts-wrapper[google,microsoft,sapi,sherpaonnx,googletrans]@git+https://github.com/willwade/tts-wrapper"
Note: On macOS/zsh, keep the quotes around the package specifier, because zsh otherwise treats the square brackets as a glob pattern:
pip install "py3-tts-wrapper[google,watson,polly,elevenlabs,microsoft,sherpaonnx]"
from tts_wrapper import PollyClient
# Initialize the client - it's also the TTS engine
client = PollyClient(credentials=('aws_region', 'aws_key_id', 'aws_secret_access_key'))
ssml_text = client.ssml.add('Hello, <break time="500ms"/> world!')
client.speak(ssml_text)
You can use SSML or plain text:
from tts_wrapper import PollyClient
# Initialize the client - it's also the TTS engine
client = PollyClient(credentials=('aws_region', 'aws_key_id', 'aws_secret_access_key'))
client.speak('Hello world')
For a full demo, see the examples folder. You'll need to fill in credentials.json (or credentials-private.json) and run the examples from inside the examples folder. Tips on obtaining keys are below.
Each service uses different methods for authentication:
from tts_wrapper import PollyClient
client = PollyClient(credentials=('aws_region','aws_key_id', 'aws_secret_access_key'))
from tts_wrapper import GoogleClient
client = GoogleClient(credentials=('path/to/creds.json'))
Or pass the contents of the auth file as a dict, so the credentials stay in memory:
from tts_wrapper import GoogleClient
import json
import os
# Load the service-account JSON file into a dict
with open(os.getenv("GOOGLE_SA_PATH"), "r") as file:
    credentials_dict = json.load(file)
# Pass the path to the file
client = GoogleClient(credentials=os.getenv('GOOGLE_SA_PATH'))
# Or use the dictionary directly
client = GoogleClient(credentials=credentials_dict)
from tts_wrapper import MicrosoftTTS
tts = MicrosoftTTS(credentials=('subscription_key', 'subscription_region'))
tts.set_voice('voice_id')
from tts_wrapper import WatsonClient
client = WatsonClient(credentials=('api_key', 'region', 'instance_id'))
Note: If you have issues with SSL certificate verification, try:
from tts_wrapper import WatsonClient
client = WatsonClient(credentials=('api_key', 'region', 'instance_id'), disableSSLVerification=True)
from tts_wrapper import ElevenLabsClient
client = ElevenLabsClient(credentials=('api_key'))
from tts_wrapper import WitAiClient
client = WitAiClient(credentials=('token'))
from tts_wrapper import PlayHTClient
client = PlayHTClient(credentials=('api_key', 'user_id'))
from tts_wrapper import UWPClient
client = UWPClient()
from tts_wrapper import eSpeakClient
client = eSpeakClient()
Note: Requires espeak-ng to be installed on your system.
from tts_wrapper import SAPIClient
client = SAPIClient()
Note: Only available on Windows systems.
from tts_wrapper import AVSynthClient
client = AVSynthClient()
Note: Only available on macOS. Provides high-quality speech synthesis with word timing support and voice property control.
Uses the gTTS library for free text-to-speech via Google Translate.
from tts_wrapper import GoogleTransClient
# Initialize with default voice (UK English)
tts = GoogleTransClient()
# Or specify a voice/language
tts = GoogleTransClient(voice_id="en-co.uk")
# Set voice after initialization
tts.set_voice("fr-fr") # French
You can leave the model path and tokens path blank (None), and a default location will be used:
from tts_wrapper import SherpaOnnxClient
client = SherpaOnnxClient(model_path=None, tokens_path=None)
Set a voice like this:
# Find available voices/languages
voices = client.get_voices()
print("Available voices:", voices)
# Set the voice using ISO code
iso_code = "eng" # Example ISO code for the voice - also ID in voice details
client.set_voice(iso_code)
Then use speak, speak_streamed, etc.
You can then use the following methods.
Even if you don't use SSML features much, it's wise to use the same syntax everywhere, so pass SSML rather than plain text to all engines:
ssml_text = client.ssml.add('Hello world!')
To keep things simple, you can also pass plain text: each engine will convert it to SSML if it isn't already.
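One way to picture the plain-text-to-SSML conversion is a helper that leaves input starting with `<speak` alone and wraps anything else in a `<speak>` element, escaping XML special characters first. This is a minimal sketch built around a hypothetical `ensure_ssml` function. It is not tts-wrapper's actual implementation, which may also add engine-specific attributes.

```python
from xml.sax.saxutils import escape


def ensure_ssml(text: str) -> str:
    """Hypothetical helper: return text as SSML, wrapping plain text in <speak>."""
    stripped = text.strip()
    # Input that already starts with <speak is assumed to be SSML and kept as-is
    if stripped.startswith("<speak"):
        return stripped
    # Escape &, < and > so plain text cannot be misread as markup
    return f"<speak>{escape(stripped)}</speak>"


print(ensure_ssml("Hello World!"))  # <speak>Hello World!</speak>
print(ensure_ssml("Fish & chips"))  # <speak>Fish &amp; chips</speak>
```

Escaping matters because a stray `&` or `<` in user text would otherwise produce invalid SSML that most engines reject.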
client.speak('Hello World!')
This plays the audio immediately through your device's default audio output:
client.speak(ssml_text)
This will check if the credentials are valid:
import os
from tts_wrapper import MicrosoftTTS
tts = MicrosoftTTS(
    credentials=(os.getenv("MICROSOFT_TOKEN"), os.getenv("MICROSOFT_REGION"))
)
if tts.check_credentials():
    print("Credentials are valid.")
else:
    print("Credentials are invalid.")
NB: Each engine has its own way of checking credentials. If an engine has no specific implementation, the parent class falls back to calling get_voices. If you want to save API calls, you can simply call get_voices directly.
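The fallback described above can be sketched as follows. `BaseEngine` and `FakeEngine` are hypothetical stand-ins, not tts-wrapper classes. This sketch treats any exception from `get_voices`, or an empty voice list, as a failed check. That is an assumption about how such a fallback might behave.

```python
from typing import List, Optional


class BaseEngine:
    """Hypothetical stand-in for a TTS base class (not tts-wrapper's real class)."""

    def get_voices(self) -> List[dict]:
        raise NotImplementedError

    def check_credentials(self) -> bool:
        # Default check: a successful, non-empty voice listing counts as valid
        try:
            voices = self.get_voices()
        except Exception:
            return False
        return bool(voices)


class FakeEngine(BaseEngine):
    """Hypothetical engine whose fake API rejects calls without a token."""

    def __init__(self, token: Optional[str]) -> None:
        self.token = token

    def get_voices(self) -> List[dict]:
        if not self.token:
            raise PermissionError("invalid or missing token")
        return [{"id": "voice-1", "language_codes": ["en-US"],
                 "name": "Test", "gender": None}]


print(FakeEngine("secret").check_credentials())  # True
print(FakeEngine(None).check_credentials())      # False
```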
pause_audio(), resume_audio(), stop_audio()
These methods pause, resume, or stop audio playback. NB: They are only for use with speak_streamed.
You need to make sure the controlaudio optional dependency is installed, along with the engines you use:
pip install "py3-tts-wrapper[controlaudio,google]"
Then:
import time
from tts_wrapper import GoogleClient
client = GoogleClient(credentials="path/to/credentials.json")
try:
    text = "This is a pause and resume test. The text will be longer, depending on where the pause and resume works"
    audio_bytes = client.synth_to_bytes(text)
    client.load_audio(audio_bytes)
    print("Play audio for 3 seconds")
    client.play(1)
    client.pause(8)
    client.resume()
    time.sleep(6)
finally:
    client.cleanup()
NB: Playback control uses PyAudio. If you have issues with it, you may need to install portaudio19-dev, particularly on Linux:
sudo apt-get install portaudio19-dev
client.synth_to_file(ssml_text, 'output.mp3', format='mp3')
There is also a legacy synth method. Saving as mp3, wav, or flac is supported.
client.synth('<speak>Hello, world!</speak>', 'hello.mp3', format='mp3')
You can also stream and save at the same time. Note that the file is only written once streaming has completely finished.
ssml_text = client.ssml.add('Hello world!')
filepath = 'output.wav'
client.speak_streamed(ssml_text, filepath, 'wav')
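The stream-and-save behaviour described above writes audio only after streaming ends. It can be illustrated with a hypothetical `stream_and_save` helper. The helper buffers 16-bit mono PCM chunks and, once the stream is exhausted, writes them to a WAV file with the standard-library `wave` module. The chunk source and the 24 kHz default rate are assumptions, and tts-wrapper's own implementation may differ.

```python
import os
import tempfile
import wave
from typing import Iterable


def stream_and_save(chunks: Iterable[bytes], filepath: str,
                    sample_rate: int = 24000) -> int:
    """Hypothetical helper: consume PCM chunks, then write one WAV file at the end.

    Assumes 16-bit mono PCM. Returns the number of audio bytes written.
    """
    buffered = bytearray()
    for chunk in chunks:
        # A real player would hand each chunk to the audio device here
        buffered.extend(chunk)
    # Nothing touches the disk until the stream has been fully consumed
    with wave.open(filepath, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(buffered))
    return len(buffered)


# Usage: three chunks of silence (0.1 s each at 24 kHz, 16-bit mono)
fake_stream = [bytes(4800)] * 3
path = os.path.join(tempfile.gettempdir(), "stream_demo.wav")
written = stream_and_save(fake_stream, path)
with wave.open(path, "rb") as wf:
    print(written, wf.getnframes())  # 14400 7200
```

Buffering until the end keeps the file write simple, at the cost of holding the whole utterance in memory.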
voices = client.get_voices()
print(voices)
NB: Every voice has an id, the language_codes it supports, a name, and a gender. Note that not all engines provide gender.
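Each voice is a dict with an id, language_codes, a name, and a gender, so selecting a voice is mostly a filtering exercise. The sketch below uses a hypothetical `find_voices` helper and made-up sample voices. It assumes `language_codes` is a list of strings such as `"en-US"`.

```python
from typing import Dict, List, Optional


def find_voices(voices: List[Dict], lang_code: str,
                gender: Optional[str] = None) -> List[Dict]:
    """Hypothetical helper: filter voice dicts by language (and optionally gender)."""
    wanted = lang_code.lower()
    matches = []
    for voice in voices:
        codes = [code.lower() for code in voice.get("language_codes", [])]
        # "en" matches "en-US" and "en-GB"; "en-US" matches only "en-US"
        if not any(code == wanted or code.startswith(wanted + "-") for code in codes):
            continue
        # When filtering by gender, voices with no reported gender are excluded
        if gender is not None and (voice.get("gender") or "").lower() != gender.lower():
            continue
        matches.append(voice)
    return matches


# Made-up sample data in the documented shape
sample_voices = [
    {"id": "voice-a", "language_codes": ["en-US"], "name": "Voice A", "gender": "Female"},
    {"id": "voice-b", "language_codes": ["en-GB"], "name": "Voice B", "gender": None},
    {"id": "voice-c", "language_codes": ["fr-FR"], "name": "Voice C", "gender": "Male"},
]
print([v["id"] for v in find_voices(sample_voices, "en")])  # ['voice-a', 'voice-b']
```

The id of the chosen voice is what you would then pass to client.set_voice().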
client.set_voice(voice_id, lang_code="en-US")
e.g.
client.set_voice('en-US-JessaNeural', 'en-US')
Use the voice id, not its name.
ssml_text = client.ssml.add('Hello, <break time="500ms"/> world!')
client.speak(ssml_text)
Set volume:
client.set_property("volume", "90")
text_read = "The current volume is 90"
text_with_prosody = client.construct_prosody_tag(text_read)
ssml_text = client.ssml.add(text_with_prosody)
Set rate:
client.set_property("rate", "slow")
text_read = "The current rate is SLOW"
text_with_prosody = client.construct_prosody_tag(text_read)
ssml_text = client.ssml.add(text_with_prosody)
Set pitch:
client.set_property("pitch", "high")
text_read = "The current pitch is HIGH"
text_with_prosody = client.construct_prosody_tag(text_read)
ssml_text = client.ssml.add(text_with_prosody)
Use the client.ssml.clear_ssml() method to clear all entries from the SSML list.
set_property()
This method sets properties such as rate, volume, and pitch.
client.set_property("rate", "fast")
client.set_property("volume", "80")
client.set_property("pitch", "high")
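construct_prosody_tag turns the properties set with set_property into SSML. Its exact output isn't documented here, so the following is only a sketch of the general idea. A hypothetical `build_prosody` function emits a `<prosody>` element with whichever of rate, volume, and pitch are supplied, escaping both the text and the attribute values.

```python
from xml.sax.saxutils import escape, quoteattr


def build_prosody(text: str, **props: str) -> str:
    """Hypothetical helper: wrap text in an SSML <prosody> element.

    Only rate, volume and pitch are emitted (in that order); other keyword
    arguments and None values are ignored.
    """
    attrs = "".join(
        f" {name}={quoteattr(str(props[name]))}"
        for name in ("rate", "volume", "pitch")
        if props.get(name) is not None
    )
    # Escape &, < and > so the spoken text cannot break the markup
    return f"<prosody{attrs}>{escape(text)}</prosody>"


print(build_prosody("The current rate is SLOW", rate="slow"))
# <prosody rate="slow">The current rate is SLOW</prosody>
```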
get_property()
This method retrieves the value of properties such as volume, rate, or pitch.
current_volume = client.get_property("volume")
print(f"Current volume: {current_volume}")
Note: only Polly, Microsoft, Google, ElevenLabs, UWP, SAPI and Watson provide precise word timings from the TTS engine itself. All other engines (GoogleTrans, Wit.Ai, Play.HT, OpenAI, eSpeak, AVSynth, Sherpa-ONNX) use estimated timing based on text length and average speaking rate.
def my_callback(word: str, start_time: float, end_time: float):
    duration = end_time - start_time
    print(f"Word: {word}, Duration: {duration:.3f}s")
def on_start():
    print('Speech started')
def on_end():
    print('Speech ended')
try:
    text = "Hello, This is a word timing test"
    ssml_text = client.ssml.add(text)
    client.connect('onStart', on_start)
    client.connect('onEnd', on_end)
    client.start_playback_with_callbacks(ssml_text, callback=my_callback)
except Exception as e:
    print(f"Error: {e}")
This will output:
Speech started
Word: Hello, Duration: 0.612s
Word: , Duration: 0.212s
Word: This, Duration: 0.364s
Word: is, Duration: 0.310s
Word: a, Duration: 0.304s
Word: word, Duration: 0.412s
Word: timing, Duration: 0.396s
Word: test, Duration: 0.424s
Speech ended
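For engines without native word timings, the note above says timings are estimated from text length and average speaking rate. The exact formula tts-wrapper uses isn't given here, so the sketch below is just one plausible scheme. A hypothetical `estimate_word_timings` function assumes 150 words per minute and splits the total time across words in proportion to their character counts. Its `(word, start, end)` tuples match the callback signature used above.

```python
import re
from typing import List, Tuple


def estimate_word_timings(text: str,
                          words_per_minute: float = 150.0) -> List[Tuple[str, float, float]]:
    """Hypothetical estimator: return (word, start, end) tuples in seconds."""
    words = re.findall(r"\S+", text)
    if not words:
        return []
    # Total duration from word count and an assumed average speaking rate
    total_seconds = len(words) * 60.0 / words_per_minute
    total_chars = sum(len(word) for word in words)
    timings = []
    start = 0.0
    for word in words:
        # Longer words get a proportionally larger share of the total time
        duration = total_seconds * len(word) / total_chars
        timings.append((word, start, start + duration))
        start += duration
    return timings


for word, start, end in estimate_word_timings("Hello, This is a word timing test"):
    print(f"Word: {word}, Duration: {end - start:.3f}s")
```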
connect()
This method registers callback functions for events such as onStart or onEnd.
def on_start():
    print("Speech started")
client.connect('onStart', on_start)
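Conceptually, connect() maintains a registry of callbacks keyed by event name. The sketch below shows that pattern with a hypothetical `EventEmitter` class. It is not tts-wrapper's internal code, and the `emit` method is an assumption about how events get fired.

```python
from collections import defaultdict
from typing import Any, Callable


class EventEmitter:
    """Hypothetical callback registry illustrating the connect() pattern."""

    def __init__(self) -> None:
        # Maps an event name such as "onStart" to its list of callbacks
        self._callbacks = defaultdict(list)

    def connect(self, event: str, callback: Callable[..., Any]) -> None:
        self._callbacks[event].append(callback)

    def emit(self, event: str, *args: Any) -> None:
        # Call every callback registered for this event, in registration order
        for callback in list(self._callbacks[event]):
            callback(*args)


log = []
emitter = EventEmitter()
emitter.connect("onStart", lambda: log.append("Speech started"))
emitter.connect("onEnd", lambda: log.append("Speech ended"))
emitter.emit("onStart")
emitter.emit("onEnd")
print(log)  # ['Speech started', 'Speech ended']
```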
The wrapper provides several methods for audio output, each suited for different use cases:
The simplest method - plays audio immediately:
client.speak("Hello world")
Recommended for longer texts - streams audio as it's being synthesized:
client.speak_streamed("This is a long text that will be streamed as it's synthesized")
Save synthesized speech to a file:
client.synth_to_file("Hello world", "output.wav")
For advanced use cases where you need the raw audio data:
# Get raw PCM audio data as bytes
audio_bytes = client.synth_to_bytes("Hello world")
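Because synth_to_bytes returns raw PCM, the audio's duration follows directly from its length: the byte count divided by sample rate, channel count, and bytes per sample. The small sketch below assumes 16-bit mono audio, as the playback examples in this README do. `pcm_duration_seconds` is a hypothetical helper.

```python
def pcm_duration_seconds(audio_bytes: bytes, sample_rate: int,
                         channels: int = 1, sample_width: int = 2) -> float:
    """Hypothetical helper: duration of raw PCM audio in seconds.

    sample_width is in bytes (2 for 16-bit audio).
    """
    bytes_per_second = sample_rate * channels * sample_width
    return len(audio_bytes) / bytes_per_second


# One second of 16-bit mono silence at 24 kHz is 48,000 bytes
print(pcm_duration_seconds(bytes(48000), 24000))  # 1.0
```

With a real client, this would be called as pcm_duration_seconds(audio_bytes, client.audio_rate).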
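To convert the raw audio to another format, you can use pydub:
from pydub import AudioSegment
# Get raw PCM data (16-bit mono, as in the playback examples below)
audio_bytes = client.synth_to_bytes("Hello world")
# Wrap the raw samples in an AudioSegment and export as MP3
audio = AudioSegment(
    data=audio_bytes,
    sample_width=2,
    frame_rate=client.audio_rate,
    channels=1,
)
audio.export("output.mp3", format="mp3")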
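If you only need a WAV file, the standard-library `wave` module can wrap the raw PCM bytes in a WAV container without pydub. The hypothetical `pcm_to_wav_bytes` helper below assumes 16-bit mono samples by default. With a real client you would pass client.audio_rate as the sample rate.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int, channels: int = 1,
                     sample_width: int = 2) -> bytes:
    """Hypothetical helper: wrap raw PCM samples in an in-memory WAV container."""
    buffer = io.BytesIO()
    # wave.open does not close a file object it was handed, so buffer stays usable
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # bytes per sample
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buffer.getvalue()


# 0.5 s of 16-bit mono silence at 24 kHz, standing in for client.synth_to_bytes(...)
wav_bytes = pcm_to_wav_bytes(bytes(24000), 24000)
print(wav_bytes[:4], len(wav_bytes))  # b'RIFF' 24044
```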
You can use the synth_to_bytestream
method to synthesize audio in any supported format and save it directly to a file.
# Synthesize text into a bytestream in MP3 format
bytestream = client.synth_to_bytestream("Hello, this is a test", format="mp3")
# Save the audio bytestream to a file
with open("output.mp3", "wb") as f:
    f.write(bytestream.read())
print("Audio saved to output.mp3")
Explanation: synth_to_bytestream returns the synthesized audio as a BytesIO object, which is then written to a file using the object's .read() method.
If you want to play the synthesized audio live without saving it to a file, you can use the sounddevice library to play the audio directly from the BytesIO bytestream.
import sounddevice as sd
import numpy as np
# Synthesize text into a bytestream in WAV format
bytestream = client.synth_to_bytestream("Hello, this is a live playback test", format="wav")
# Convert the bytestream back to raw PCM audio data for playback
audio_data = np.frombuffer(bytestream.read(), dtype=np.int16)
# Play the audio using sounddevice
sd.play(audio_data, samplerate=client.audio_rate)
sd.wait()
print("Live playback completed")
Explanation: The text is synthesized into a wav bytestream, whose contents are converted into a NumPy array of 16-bit samples with np.frombuffer(). That array is then fed into the sounddevice library for live playback: sd.play() plays the audio in real time, and sd.wait() blocks until playback finishes.
For advanced use cases where you need direct control over audio playback, you can use the raw audio data methods:
from tts_wrapper import AVSynthClient
import numpy as np
import sounddevice as sd
# Initialize TTS client
client = AVSynthClient()
# Method 1: Direct playback of entire audio
def play_audio_stream(client, text: str):
    """Play entire audio at once."""
    # Get raw audio data
    audio_data = client.synth_to_bytes(text)
    # Convert to numpy array for playback
    samples = np.frombuffer(audio_data, dtype=np.int16)
    # Play the audio
    sd.play(samples, samplerate=client.audio_rate)
    sd.wait()
# Method 2: Chunked playback for more control
def play_audio_chunked(client, text: str, chunk_size: int = 4096):
    """Process and play audio in chunks for more control."""
    # Get raw audio data
    audio_data = client.synth_to_bytes(text)
    # Create a continuous stream
    stream = sd.OutputStream(
        samplerate=client.audio_rate,
        channels=1,  # Mono audio
        dtype=np.int16
    )
    with stream:
        # Process in chunks
        for i in range(0, len(audio_data), chunk_size):
            chunk = audio_data[i:i + chunk_size]
            if len(chunk) % 2 != 0:  # Ensure even size for 16-bit audio
                chunk = chunk[:-1]
            samples = np.frombuffer(chunk, dtype=np.int16)
            stream.write(samples)
This manual control allows you to:
The chunked playback method is particularly useful for:
Note: Manual audio control requires the sounddevice and numpy packages:
pip install sounddevice numpy
Clone the repository:
git clone https://github.com/willwade/tts-wrapper.git
cd tts-wrapper
Install the package and system dependencies:
pip install .
To install optional dependencies, use:
pip install ".[google,watson,polly,elevenlabs,microsoft]"
This will install Python dependencies and system dependencies required for this project. Note that system dependencies will only be installed automatically on Linux.
Alternatively, to set up a development environment with uv, first install uv:
pip install uv
Clone the repository:
git clone https://github.com/willwade/tts-wrapper.git
cd tts-wrapper
Install Python dependencies:
uv sync --all-extras
Install system dependencies (Linux only):
uv run postinstall
NOTE: To generate a requirements.txt file for the project, use uv export --format requirements-txt --all-extras --no-hashes
Be warned that this will include all dependencies, including dev ones.
To tag and push a release:
git tag -a v0.1.0 -m "Release 0.1.0"
git push origin v0.1.0
This guide provides a step-by-step approach to adding a new engine to the existing Text-to-Speech (TTS) wrapper system.
Create a new folder for your engine within the engines directory. Name this folder after your engine, such as witai for Wit.ai.
Directory structure:
engines/witai/
Create the necessary files within this new folder:
- __init__.py: makes the directory a Python package.
- client.py: handles all interactions with the TTS API and implements the AbstractTTS interface.
- ssml.py: defines any SSML handling specific to this engine (optional).
Final directory setup:
engines/
└── witai/
    ├── __init__.py
    ├── client.py
    └── ssml.py
client.py
Implement authentication and necessary setup for API connection. This file should manage tasks such as sending synthesis requests and fetching available voices. The client class should inherit from AbstractTTS.
from tts_wrapper.tts import AbstractTTS
class WitAiClient(AbstractTTS):
    def __init__(self, credentials=None):
        super().__init__()
        self.token = credentials[0] if credentials else None
        self.audio_rate = 24000  # Default sample rate for this engine
        # Setup other necessary API connection details here
    def _get_voices(self):
        # Code to retrieve available voices from the TTS API
        # Return raw voice data that will be processed by the base class
        pass
    def synth_to_bytes(self, text, voice_id=None):
        # Code to send a synthesis request to the TTS API
        # Return raw audio bytes
        pass
    def synth(self, text, output_file, output_format="wav", voice_id=None):
        # Code to synthesize speech and save to a file
        pass
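Before wiring up a real API, it can help to exercise the new engine's interface with a toy implementation. The sketch below is self-contained. `AbstractTTSStub` is a hypothetical, much-reduced stand-in for tts_wrapper.tts.AbstractTTS, whose real interface has more methods. `ToneEngine` is also hypothetical: instead of speech, it returns a 16-bit mono sine beep per word.

```python
import math
import struct
from abc import ABC, abstractmethod
from typing import List, Optional


class AbstractTTSStub(ABC):
    """Hypothetical stand-in for tts_wrapper.tts.AbstractTTS (much reduced)."""

    @abstractmethod
    def _get_voices(self) -> List[dict]:
        ...

    @abstractmethod
    def synth_to_bytes(self, text: str, voice_id: Optional[str] = None) -> bytes:
        ...


class ToneEngine(AbstractTTSStub):
    """Toy engine: one short 16-bit mono sine beep per word instead of speech."""

    def __init__(self, seconds_per_word: float = 0.1, frequency: float = 440.0):
        self.audio_rate = 16000  # samples per second
        self.seconds_per_word = seconds_per_word
        self.frequency = frequency

    def _get_voices(self) -> List[dict]:
        return [{"id": "tone", "language_codes": ["en-US"],
                 "name": "Tone", "gender": None}]

    def synth_to_bytes(self, text: str, voice_id: Optional[str] = None) -> bytes:
        n_samples = round(self.audio_rate * self.seconds_per_word * len(text.split()))
        amplitude = 0.3 * 32767  # keep well below full scale
        samples = (
            int(amplitude * math.sin(2 * math.pi * self.frequency * i / self.audio_rate))
            for i in range(n_samples)
        )
        # Pack as little-endian signed 16-bit samples
        return struct.pack(f"<{n_samples}h", *samples)


engine = ToneEngine()
audio = engine.synth_to_bytes("hello world")
print(len(audio))  # 6400 (2 words x 0.1 s x 16,000 Hz x 2 bytes)
```

Because the output has the same shape as real raw PCM, it can be fed straight into playback or file-writing code while the real API integration is still being written.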
If the engine has specific SSML requirements or supports certain SSML tags differently, implement this logic in ssml.py.
from tts_wrapper.ssml import BaseSSMLRoot, SSMLNode
class WitAiSSML(BaseSSMLRoot):
    def add_break(self, time='500ms'):
        self.root.add(SSMLNode('break', attrs={'time': time}))
__init__.py
Make sure the __init__.py file properly imports and exposes the client class.
from .client import WitAiClient
You can store your credentials in either:
- credentials.json: for development
- credentials-private.json: for private credentials (should be git-ignored)
Example structure (do NOT commit actual credentials):
{
  "Polly": {
    "region": "your-region",
    "aws_key_id": "your-key-id",
    "aws_access_key": "your-access-key"
  },
  "Microsoft": {
    "token": "your-subscription-key",
    "region": "your-region"
  }
}
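The credentials files can be read with the standard json module. The sketch below uses two hypothetical helpers, `load_credentials` and `polly_credentials`. Preferring credentials-private.json when both files exist is an assumption. The (region, key id, access key) order follows the Polly authentication example earlier in this README.

```python
import json
import tempfile
from pathlib import Path
from typing import Tuple


def load_credentials(directory: str = ".") -> dict:
    """Hypothetical helper: prefer credentials-private.json, else credentials.json."""
    for name in ("credentials-private.json", "credentials.json"):
        candidate = Path(directory) / name
        if candidate.is_file():
            with candidate.open(encoding="utf-8") as f:
                return json.load(f)
    raise FileNotFoundError(f"No credentials file found in {directory!r}")


def polly_credentials(creds: dict) -> Tuple[str, str, str]:
    """Hypothetical helper: build a (region, key id, access key) tuple."""
    polly = creds["Polly"]
    return (polly["region"], polly["aws_key_id"], polly["aws_access_key"])


# Usage with a throwaway directory holding placeholder values only
with tempfile.TemporaryDirectory() as tmp:
    sample = {"Polly": {"region": "your-region", "aws_key_id": "your-key-id",
                        "aws_access_key": "your-access-key"}}
    (Path(tmp) / "credentials.json").write_text(json.dumps(sample), encoding="utf-8")
    demo_tuple = polly_credentials(load_credentials(tmp))
print(demo_tuple)  # ('your-region', 'your-key-id', 'your-access-key')
```

The resulting tuple could then be passed to PollyClient as its credentials argument.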
This project is licensed under the MIT License.
We found that py3-tts-wrapper demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.