You're Invited:Meet the Socket Team at BlackHat and DEF CON in Las Vegas, Aug 4-6.RSVP →

Book a Demo Install Sign in

kalakan-tts

Package Overview

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

kalakan-tts

A fully custom, offline Text-to-Speech system for the Twi language

1.0.1

PyPI

Maintainers: 1

Kalakan TTS

A fully custom, offline Text-to-Speech system for the Twi language.

Features

Specialized for Twi language with proper handling of special characters (ɛ, ɔ, Ɔ, tonal markers)
Complete text processing pipeline with Twi-specific tokenization and phoneme mapping
High-quality acoustic models:
- Tacotron2: Attention-based sequence-to-sequence model
- FastSpeech2: Non-autoregressive model with explicit duration modeling
- Transformer-TTS: Transformer-based sequence-to-sequence model
Multiple vocoders:
- Griffin-Lim: Simple phase reconstruction algorithm
- HiFi-GAN: High-fidelity GAN-based vocoder
- MelGAN: Fast GAN-based vocoder
- WaveGlow: Flow-based generative model
Production-ready API with REST and gRPC interfaces
Comprehensive training infrastructure with experiment tracking and data augmentation
Advanced dataset preparation with metadata generation, phoneme conversion, and quality control
Optimized for both cloud and edge deployment

Installation

Basic Installation

pip install kalakan-tts

Development Installation

git clone https://github.com/kalakan-ai/kalakan-tts.git
cd kalakan-tts
pip install -e ".[dev,api,training]"

Quick Start

Text-to-Speech Synthesis

from kalakan.synthesis.synthesizer import Synthesizer

# Initialize the synthesizer with specific models
synthesizer = Synthesizer(
    acoustic_model="/path/to/acoustic_model.pt",  # or use a model name like "tacotron2", "fastspeech2", "transformer_tts"
    vocoder="/path/to/vocoder.pt",  # or use a model name like "griffin_lim", "hifigan", "melgan", "waveglow"
    device="cuda:0",  # or "cpu"
)

# Generate audio from Twi text
audio = synthesizer.synthesize(
    text="Akwaaba! Wo ho te sɛn?",
    normalize=True,
    clean=True,
    speed=1.0,  # Control speech speed (for FastSpeech2)
    pitch=1.0,  # Control pitch (for FastSpeech2)
    energy=1.0,  # Control energy/volume (for FastSpeech2)
)

# Save the audio to a file
synthesizer.save_audio(audio, "output.wav")

Command Line Interface

# Using the demo script
python demo.py --text "Akwaaba! Wo ho te sɛn?" --output output.wav --acoustic_model /path/to/acoustic_model.pt --vocoder /path/to/vocoder.pt

# Start the REST API server
python -m kalakan.api.server --host 0.0.0.0 --port 8000 --acoustic_model /path/to/acoustic_model.pt --vocoder /path/to/vocoder.pt

# Start the gRPC API server
python -m kalakan.api.grpc_api --host 0.0.0.0 --port 50051 --acoustic_model /path/to/acoustic_model.pt --vocoder /path/to/vocoder.pt

API Clients

# REST API client
python rest_client.py --text "Akwaaba! Wo ho te sɛn?" --output output.wav --host localhost --port 8000

# gRPC API client
python grpc_client.py --text "Akwaaba! Wo ho te sɛn?" --output output.wav --host localhost --port 50051

Dataset Preparation

Kalakan TTS includes a comprehensive metadata generation tool for preparing TTS datasets:

Basic Metadata Generation

# Generate basic metadata for a dataset
kalakan gen-metadata --input-dir /path/to/dataset --verbose

Advanced Features

# Generate metadata with phonemes, dataset splitting, and quality control
kalakan gen-metadata \
    --input-dir /path/to/dataset \
    --generate-phonemes \
    --split-dataset \
    --val-ratio 0.2 \
    --test-ratio 0.1 \
    --output-format both \
    --include-stats \
    --check-duplicates \
    --remove-duplicates \
    --normalize-text \
    --clean-text \
    --speaker-id "speaker_001" \
    --verbose

Key Features

Phoneme Generation: Automatic G2P conversion for Twi text
Text Normalization: Cleaning and standardization of text content
Dataset Splitting: Automatic train/validation/test splits
Quality Control: Duration filtering, duplicate detection, and validation
Multiple Formats: CSV and JSON output support
Comprehensive Statistics: Detailed dataset analytics

For detailed documentation, see docs/metadata_generation_guide.md.

Text Normalization

Kalakan TTS includes a powerful text normalization tool for preparing Twi text:

Basic Text Normalization

# Normalize Twi text directly
kalakan norm --text "Dr. Kwame na Prof. Ama bɛba ha 25 mu."

File Processing

# Normalize text from files with various options
kalakan norm \
    --file input.txt \
    --output normalized.txt \
    --format json \
    --show-diff \
    --verbose

Key Features

Abbreviation Expansion: "Dr." → "doctor", "Prof." → "professor"
Number Conversion: "25" → text representation
Special Character Normalization: Proper handling of Twi characters (ɛ, ɔ)
Multiple Output Formats: Text, JSON, CSV
Batch Processing: Process multiple files at once
Interactive File Selection: GUI file picker
Difference Visualization: See what changed during normalization

Documentation

For detailed documentation, visit docs/.

License

Apache License 2.0

Keywords

FAQs

What is kalakan-tts?

Is kalakan-tts well maintained?

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

kalakan-tts

Kalakan TTS

Features

Installation

Basic Installation

Development Installation

Quick Start

Text-to-Speech Synthesis

Command Line Interface

API Clients

Dataset Preparation

Basic Metadata Generation

Advanced Features

Key Features

Text Normalization

Basic Text Normalization

File Processing

Key Features

Documentation

License

Keywords

Related posts

New CNA Scorecard Tool Ranks CVE Data Quality Across the Ecosystem

Malicious npm Packages Target WhatsApp Developers with Remote Kill Switch