KugelAudio Python SDK
Official Python SDK for the KugelAudio Text-to-Speech API.
Installation
pip install kugelaudio
Or with uv:
uv add kugelaudio
Quick Start
from kugelaudio import KugelAudio
client = KugelAudio(api_key="your_api_key")
audio = client.tts.generate(
    text="Hello, world!",
    model_id="kugel-1-turbo",
)
audio.save("output.wav")
Client Configuration
from kugelaudio import KugelAudio
# Simple setup
client = KugelAudio(api_key="your_api_key")
# With custom options: override the endpoint and the request timeout
client = KugelAudio(
    api_key="your_api_key",
    api_url="https://api.kugelaudio.com",
    timeout=60.0,
)
Region Selection
By default, KugelAudio uses the canonical geo-routed API endpoint. You can
select the direct EU endpoint when you need to pin traffic to Europe.
| Region | Endpoint |
|---|---|
| default | api.kugelaudio.com (geo-routed) |
| eu | api.eu.kugelaudio.com |
Option 1 — API key prefix (simplest, works with env vars):
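# "eu-" prefix: requests go to the EU endpoint
client = KugelAudio(api_key="eu-ka_your_api_key")
# No prefix: the default geo-routed endpoint is used
client = KugelAudio(api_key="ka_your_api_key")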
Option 2 — region parameter:
client = KugelAudio(api_key="ka_your_api_key", region="eu")
The prefix is always stripped before authentication. Priority: api_url > region > key prefix > default.
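The priority order can be illustrated with a small standalone function. This is only a sketch of the documented rule, not the SDK's internal code: `resolve_endpoint`, the `REGION_URLS` mapping, the `https://` scheme for the EU host, and the assumption that a prefix has the form `<region>-` are all hypothetical.

```python
from typing import Optional, Tuple

# Hosts come from the region table above; the https scheme for "eu" is an assumption.
REGION_URLS = {
    "default": "https://api.kugelaudio.com",
    "eu": "https://api.eu.kugelaudio.com",
}


def resolve_endpoint(
    api_key: str,
    region: Optional[str] = None,
    api_url: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (base_url, key_without_prefix) following the documented priority."""
    prefix_region = None
    bare_key = api_key
    # Assumption: a region prefix is "<region>-" in front of the key.
    for name in REGION_URLS:
        if name != "default" and api_key.startswith(f"{name}-"):
            prefix_region = name
            bare_key = api_key[len(name) + 1:]  # the prefix is always stripped
            break

    if api_url is not None:        # 1. an explicit api_url wins
        return api_url, bare_key
    if region is not None:         # 2. then the region parameter
        return REGION_URLS[region], bare_key
    if prefix_region is not None:  # 3. then the key prefix
        return REGION_URLS[prefix_region], bare_key
    return REGION_URLS["default"], bare_key  # 4. geo-routed default
```

For example, under these assumptions `resolve_endpoint("eu-ka_your_api_key", api_url="http://localhost:8000")` selects the local URL while still returning the key without its `eu-` prefix.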
Single URL Architecture
The SDK uses a single URL for both the REST API and WebSocket streaming. The TTS server itself exposes the REST endpoints (/v1/models, /v1/voices) and the WebSocket endpoint (/ws/tts), so no proxy is needed and latency stays minimal.
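To make the single-URL idea concrete, the sketch below derives a WebSocket address from the same base URL used for REST calls. The helper name `websocket_url` is hypothetical, and the `http` to `ws` / `https` to `wss` mapping is the usual convention rather than something confirmed for the SDK internals.

```python
from urllib.parse import urlsplit, urlunsplit


def websocket_url(base_url: str, path: str = "/ws/tts") -> str:
    """Map an HTTP(S) base URL to the matching WebSocket URL for `path`."""
    parts = urlsplit(base_url)
    # https pairs with wss (TLS); anything else falls back to plain ws.
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```

With this helper, `https://api.kugelaudio.com` and `http://localhost:8000` would map to `wss://api.kugelaudio.com/ws/tts` and `ws://localhost:8000/ws/tts` respectively.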
Local Development
For local development, point directly to your TTS server:
client = KugelAudio(
    api_key="your_api_key",
    api_url="http://localhost:8000",
)
Or if you have separate backend and TTS servers:
client = KugelAudio(
    api_key="your_api_key",
    api_url="http://localhost:8001",
    tts_url="http://localhost:8000",
)
Available Models
| Model ID | Name | Description |
|---|---|---|
| kugel-1-turbo | Kugel 1 Turbo | Fast, low-latency model for real-time applications |
| kugel-1 | Kugel 1 | Premium quality model for pre-recorded content |
List Available Models
models = client.models.list()
for model in models:
    print(f"{model.id}: {model.name}")
    print(f" Description: {model.description}")
    print(f" Max Input: {model.max_input_length} characters")
    print(f" Sample Rate: {model.sample_rate} Hz")
Voices
List Available Voices
# List the voices available to you
result = client.voices.list()
for voice in result.voices:
    print(f"{voice.id}: {voice.name}")
    print(f" Category: {voice.category}")
    print(f" Languages: {', '.join(voice.supported_languages)}")
print(f"Showing {len(result.voices)} of {result.total} voices")
# Filter by language
result = client.voices.list(language="de")
# Include public voices
result = client.voices.list(include_public=True)
# Paginate with limit and offset
page1 = client.voices.list(limit=10, offset=0)
page2 = client.voices.list(limit=10, offset=10)
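Walking through every page by hand gets tedious, so here is a minimal sketch of a generator that follows the `limit`/`offset` pattern shown above. `iter_all_voices` is a hypothetical helper, not part of the SDK; it only assumes that the list call accepts `limit` and `offset` and returns an object with `voices` and `total`, as in the examples.

```python
from typing import Any, Callable, Iterator


def iter_all_voices(
    list_voices: Callable[..., Any],
    page_size: int = 10,
    **filters: Any,
) -> Iterator[Any]:
    """Yield every voice by requesting successive pages until `total` is reached."""
    offset = 0
    while True:
        page = list_voices(limit=page_size, offset=offset, **filters)
        yield from page.voices
        offset += len(page.voices)
        # Stop on an empty page (defensive) or once all voices have been seen.
        if not page.voices or offset >= page.total:
            break
```

It could be used as `for voice in iter_all_voices(client.voices.list, language="de"): ...`, passing the bound method so the helper stays independent of the client object.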
Get a Specific Voice
voice = client.voices.get(voice_id=123)
print(f"Voice: {voice.name}")
print(f"Sample text: {voice.sample_text}")
Text-to-Speech Generation
Basic Generation (Non-Streaming)
Generate complete audio and receive it all at once:
audio = client.tts.generate(
    text="Hello, this is a test of the KugelAudio text-to-speech system.",
    model_id="kugel-1-turbo",
    voice_id=123,
    cfg_scale=2.0,
    max_new_tokens=2048,
    sample_rate=24000,  # output sample rate in Hz
    normalize=True,     # enable text normalization
    language="en",      # avoids language auto-detection
)
# Inspect the response metadata
print(f"Duration: {audio.duration_seconds:.2f}s")
print(f"Samples: {audio.samples}")
print(f"Sample rate: {audio.sample_rate} Hz")
print(f"Generation time: {audio.generation_ms:.0f}ms")
print(f"RTF: {audio.rtf:.2f}")
# Save to a WAV file
audio.save("output.wav")
# Or work with the bytes directly: raw audio data, or a complete WAV file in memory
pcm_data = audio.audio
wav_bytes = audio.to_wav_bytes()
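To show how raw audio bytes relate to a WAV file, the sketch below wraps PCM data in a WAV container using only the standard library. It assumes 16-bit mono PCM, which this document does not state, and `pcm16_to_wav_bytes` is a hypothetical helper rather than the SDK's own `to_wav_bytes()` implementation.

```python
import io
import wave


def pcm16_to_wav_bytes(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM samples in a WAV header and return the file as bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM (assumption)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

If the assumption holds, `pcm16_to_wav_bytes(audio.audio, audio.sample_rate)` should yield a playable file; in practice `audio.to_wav_bytes()` is the supported route.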
Streaming Audio Output
Receive audio chunks as they are generated for lower latency:
for item in client.tts.stream(
    text="Hello, this is streaming audio.",
    model_id="kugel-1-turbo",
):
    if hasattr(item, 'audio'):
        # Audio chunk: play it or append it to a buffer
        print(f"Chunk {item.index}: {len(item.audio)} bytes, {item.samples} samples")
    elif isinstance(item, dict) and item.get('final'):
        # Final message with generation statistics
        print(f"Total duration: {item.get('dur_ms', 0):.0f}ms")
        print(f"Generation time: {item.get('gen_ms', 0):.0f}ms")
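When the chunks should end up in one buffer rather than being played immediately, the loop above can be folded into a helper. This is a sketch under the same assumptions as that loop: chunk items expose an `audio` attribute and the closing item is a dict with a truthy `final` key. The name `collect_stream` is hypothetical.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def collect_stream(items: Iterable[Any]) -> Tuple[bytes, Optional[Dict[str, Any]]]:
    """Concatenate all audio chunks and return them with the final stats message."""
    buffer = bytearray()
    final: Optional[Dict[str, Any]] = None
    for item in items:
        if isinstance(item, dict):
            if item.get("final"):
                final = item  # keep the statistics message
        elif hasattr(item, "audio"):
            buffer.extend(item.audio)  # append this chunk's bytes in arrival order
    return bytes(buffer), final
```

A call might look like `audio_bytes, stats = collect_stream(client.tts.stream(text="...", model_id="kugel-1-turbo"))`. Note that this gives up the latency benefit of streaming, so it mainly suits logging or testing.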
Async Streaming
For async applications:
import asyncio
async def generate_speech():
    async for item in client.tts.stream_async(
        text="Async streaming example.",
        model_id="kugel-1-turbo",
    ):
        if hasattr(item, 'audio'):
            pass  # process the audio chunk here
asyncio.run(generate_speech())
Async Generation
import asyncio
async def main():
    audio = await client.tts.generate_async(
        text="Async generation example.",
        model_id="kugel-1-turbo",
    )
    audio.save("async_output.wav")
asyncio.run(main())
Text Normalization
Text normalization converts numbers, dates, times, and other non-verbal text into spoken words. For example:
- "I have 3 apples" → "I have three apples"
- "The meeting is at 2:30 PM" → "The meeting is at two thirty PM"
- "€50.99" → "fifty euros and ninety-nine cents"
Usage
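# With an explicit language (recommended: no auto-detection needed)
audio = client.tts.generate(
    text="I bought 3 items for €50.99 on 01/15/2024.",
    normalize=True,
    language="en",
)
# Without a language: it is auto-detected, which adds latency
audio = client.tts.generate(
    text="Ich habe 3 Artikel für 50,99€ gekauft.",
    normalize=True,
)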
Supported Languages
| Code | Language | Code | Language |
|---|---|---|---|
| de | German | nl | Dutch |
| en | English | pl | Polish |
| fr | French | sv | Swedish |
| es | Spanish | da | Danish |
| it | Italian | no | Norwegian |
| pt | Portuguese | fi | Finnish |
| cs | Czech | hu | Hungarian |
| ro | Romanian | el | Greek |
| uk | Ukrainian | bg | Bulgarian |
| tr | Turkish | vi | Vietnamese |
| ar | Arabic | hi | Hindi |
| zh | Chinese | ja | Japanese |
| ko | Korean | | |
Performance Warning
⚠️ Latency Warning: Using normalize=True without specifying language adds approximately 150ms latency for language auto-detection. For best performance in latency-sensitive applications, always specify the language parameter.
LLM Integration: Streaming Text Input
For real-time TTS when streaming text from an LLM (GPT-4, Claude, etc.),
use a StreamingSession. Forward LLM tokens directly to session.send()
without flush=True — the server accumulates them and starts
generation at natural sentence boundaries. Flush exactly once at the end
of the assistant turn.
⚠️ Do not call session.send(text, flush=True) between sentences or
words. Each explicit flush is a separate TTS request that pays the
full model time-to-first-audio (TTFA) again and produces an audible
gap. See Streaming best practices
for the full rationale and ElevenLabs migration notes.
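To build intuition for what accumulating tokens until a sentence boundary means, here is a toy, client-side illustration. It is not how the server is implemented: the boundary rule (sentence punctuation followed by whitespace) and the name `sentences_from_tokens` are assumptions made purely for the example.

```python
import re
from typing import Iterable, Iterator

# Toy rule: a sentence ends at ".", "!" or "?" followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?]\s")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer small tokens and emit text only when a full sentence is available."""
    pending = ""
    for token in tokens:
        pending += token
        while True:
            match = _SENTENCE_END.search(pending)
            if match is None:
                break
            yield pending[:match.end()].strip()
            pending = pending[match.end():]
    if pending.strip():
        # Equivalent of the single flush at the end of the turn.
        yield pending.strip()
```

Feeding it `["Hel", "lo. ", "How are", " you? ", "Fine"]` produces three units of text, with only the last one emitted by the final flush. That is why per-token or per-word flushing is unnecessary.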
Async Streaming Session
import asyncio
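# play_audio and my_llm_stream are placeholders for your own playback
# function and LLM token source.
async def speak_turn(llm_token_stream):
    async with client.tts.streaming_session(
        voice_id=123,
        model_id="kugel-1-turbo",
        language="en",
    ) as session:
        # Forward tokens as they arrive; do not flush in between
        async for token in llm_token_stream:
            async for chunk in session.send(token):
                play_audio(chunk.audio)
        # Flush exactly once at the end of the turn
        async for chunk in session.flush():
            play_audio(chunk.audio)
asyncio.run(speak_turn(my_llm_stream()))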
Synchronous Streaming Session
with client.tts.streaming_session_sync(
    voice_id=123,
    model_id="kugel-1-turbo",
    language="en",
) as session:
    # Forward tokens as they arrive; do not flush in between
    for token in llm_token_stream:
        for chunk in session.send(token):
            play_audio(chunk.audio)
    # Flush exactly once at the end of the turn
    for chunk in session.flush():
        play_audio(chunk.audio)
Error Handling
from kugelaudio import KugelAudio
from kugelaudio.exceptions import (
    KugelAudioError,
    AuthenticationError,
    RateLimitError,
    InsufficientCreditsError,
    ValidationError,
    NotFoundError,
)
client = KugelAudio(api_key="your_api_key")
try:
    audio = client.tts.generate(text="Hello!")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded, please wait")
except InsufficientCreditsError:
    print("Not enough credits, please top up")
except ValidationError as e:
    print(f"Invalid request: {e}")
except NotFoundError as e:
    print(f"Resource not found (e.g. unknown voice_id): {e}")
except KugelAudioError as e:
    # Catch-all for any other API error; keep this handler last
    print(f"API error: {e}")
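Rate-limit errors are often transient, so one common pattern is to retry with exponential backoff. The helper below is a generic sketch and not part of the SDK: `call_with_retry`, its parameters, and the delay schedule are assumptions, and the exception types to retry on are passed in by the caller.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # base_delay, then 2x, 4x, ...
    raise ValueError("attempts must be at least 1")
```

With the imports from the example above, usage could look like `call_with_retry(lambda: client.tts.generate(text="Hello!"), retry_on=(RateLimitError,))`. Errors such as `AuthenticationError` or `ValidationError` are left out on purpose, since retrying them would not help.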
Data Models
AudioChunk
Represents a single audio chunk from streaming:
class AudioChunk:
    audio: bytes
    encoding: str
    index: int
    sample_rate: int
    samples: int
    @property
    def duration_seconds(self) -> float:
        """Duration of this chunk in seconds."""
AudioResponse
Complete audio response from generation:
class AudioResponse:
    audio: bytes
    sample_rate: int
    samples: int
    duration_ms: float
    generation_ms: float
    rtf: float
    @property
    def duration_seconds(self) -> float:
        """Duration in seconds."""
    def save(self, path: str) -> None:
        """Save as WAV file."""
    def to_wav_bytes(self) -> bytes:
        """Get WAV file as bytes."""
Model
TTS model information:
class Model:
    id: str
    name: str
    description: str
    max_input_length: int
    sample_rate: int
Voice
Voice information:
class Voice:
    id: int
    name: str
    description: Optional[str]
    category: Optional[VoiceCategory]
    sex: Optional[VoiceSex]
    age: Optional[VoiceAge]
    supported_languages: List[str]
    sample_text: Optional[str]
    avatar_url: Optional[str]
    sample_url: Optional[str]
    is_public: bool
    verified: bool
Complete Example
from kugelaudio import KugelAudio
client = KugelAudio(api_key="your_api_key")
print("Available Models:")
for model in client.models.list():
    print(f" - {model.id}: {model.name}")
print("\nAvailable Voices:")
for voice in client.voices.list(limit=5).voices:
    print(f" - {voice.id}: {voice.name}")
print("\nGenerating audio...")
audio = client.tts.generate(
    text="Welcome to KugelAudio. This is an example of high-quality text-to-speech synthesis.",
    model_id="kugel-1-turbo",
)
print(f"Generated {audio.duration_seconds:.2f}s of audio in {audio.generation_ms:.0f}ms")
print(f"Real-time factor: {audio.rtf:.2f}x")
audio.save("example.wav")
print("Saved to example.wav")
client.close()
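The duration and real-time factor printed by the example can be related to the other response fields with a little arithmetic. The sketch below assumes the conventional definitions: duration is sample count divided by sample rate, and the real-time factor is generation time divided by audio duration. This document does not spell out the SDK's exact formula for `rtf`, so treat the second function as an assumption.

```python
def duration_seconds(samples: int, sample_rate: int) -> float:
    """Audio length in seconds from the sample count and sample rate."""
    return samples / sample_rate


def real_time_factor(generation_ms: float, duration_ms: float) -> float:
    """Assumed definition: time spent generating per unit of audio produced."""
    return generation_ms / duration_ms
```

Under this definition a value below 1.0 means audio is generated faster than it plays back. For instance, 2 seconds of audio (48000 samples at 24000 Hz) generated in 500 ms gives a factor of 0.25.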
License
MIT