π§ AI Audiobook Maker (AIABM) v5.1.0

Transform your PDFs and text files into high-quality audiobooks using OpenAI TTS (cloud) or Thorsten-Voice (native German). Choose between premium cloud voices or run everything locally at no cost!
π New in v5.1: Beautiful UI/UX overhaul, flexible output options, optimized Thorsten-Voice loading, and Downloads folder as default output.
β¨ Features
ποΈ Dual TTS Providers
- βοΈ OpenAI TTS: Premium cloud voices with 6 voice options (requires API key)
- π©πͺ Thorsten-Voice: Native German TTS with authentic pronunciation (local/free)
π Core Features
- π Zero Installation: Run directly with
npx aiabm
- π Smart File Handling: Supports PDF and TXT files with drag & drop
- π€ Voice Preview: Listen to voices before choosing (2 Thorsten + 6 OpenAI voices)
- π Enhanced Security: Input sanitization, API key validation, and secure storage
- π§ͺ Comprehensive Testing: 55+ unit tests with 12.6% coverage and growing
- βΈοΈ Resume & Pause: Continue interrupted conversions anytime
- π Secure API Key Management: Encrypted local storage
- π Progress Tracking: Real-time conversion progress with estimates
- ποΈ Advanced Controls: Adjust speed, quality, and output format
- π° Cost Transparency: See exact pricing (OpenAI) or run free (local providers)
- π§ Smart Installation: Automatic setup for local TTS providers
π Quick Start
Method 1: Direct Usage (Recommended)
npx aiabm mybook.pdf
npx aiabm
Method 2: Global Installation
npm install -g aiabm
aiabm mybook.pdf
π Prerequisites
Required
- Node.js 16+ (Download from nodejs.org)
- FFmpeg (for audio combining - auto-installed on most systems)
Optional (Choose One or Both)
For OpenAI TTS:
For Thorsten-Voice (German TTS):
- Python 3.9-3.11 (auto-installed)
- Coqui TTS (auto-installed)
- Completely FREE - runs locally
π― Usage Examples
CLI Mode
npx aiabm document.pdf
npx aiabm book.txt --voice nova --speed 1.2 --model tts-1-hd
npx aiabm --config
Interactive Mode
npx aiabm
Then follow the interactive prompts to:
- Select TTS Provider (OpenAI, Fish Speech, or Thorsten-Voice)
- Auto-install local providers if needed (one-time setup)
- Select your file (browse, drag & drop, or enter path)
- Preview and choose a voice
- Configure settings (speed, quality, output format)
- Monitor progress and resume if needed
π€ Available Voices
π€ OpenAI TTS (Cloud)
- Alloy: Neutral, versatile
- Echo: Clear, professional
- Fable: Warm, storytelling
- Onyx: Deep, authoritative
- Nova: Bright, engaging
- Shimmer: Gentle, soothing
π Fish Speech (Local/Multilingual)
- π©πͺ German Female (Natural): High-quality German synthesis
- π©πͺ German Male (Clear): Professional German voice
- π©πͺ German Female (Expressive): Emotional German narration
- πΊπΈ English Female (Warm): Natural English voice
- πΊπΈ English Male (Professional): Business-quality English
- πΊπΈ English Female (Energetic): Dynamic storytelling
- π«π· French Female (Elegant): Sophisticated French accent
- π«π· French Male (Sophisticated): Professional French voice
π©πͺ Thorsten-Voice (Native German)
- π©πͺ Thorsten (Authentic German Male): High-quality native German voice
- π©πͺ Thorsten Emotional (German Male): German voice with emotional expression
π° Pricing
OpenAI TTS
$0.015 per 1,000 characters
10,000 characters | ~$0.15 | Short article |
50,000 characters | ~$0.75 | Small e-book |
100,000 characters | ~$1.50 | Average novel |
250,000 characters | ~$3.75 | Large book |
Fish Speech & Thorsten-Voice
100% FREE - No API costs, runs entirely on your machine!
π§ Local TTS Setup
Both Fish Speech and Thorsten-Voice run entirely on your machine - no API costs! Now with fully automated installation!
π Smart Installation (Recommended)
npx aiabm
π Fish Speech Setup
What happens automatically:
- π¦ Repository Cloning - Downloads latest Fish Speech
- π Virtual Environment - Creates isolated Python environment
- β‘ PyTorch Installation - Installs optimized CPU version
- π€ Model Download - Downloads Fish Speech 1.2 models (~1GB)
- β
Dependency Check - Verifies installation works
System Requirements:
- Python 3.8+ recommended
- ~2GB disk space for models and dependencies
- 4GB+ RAM recommended
- CPU or GPU (GPU faster but optional)
π©πͺ Thorsten-Voice Setup
What happens automatically:
- π Compatible Python Detection - Finds Python 3.9-3.11
- π¦ Virtual Environment - Creates isolated environment
- π€ Coqui TTS Installation - Installs German TTS framework
- π€ Thorsten Model - Downloads German voice model (~500MB)
- β
Compatibility Check - Verifies everything works
System Requirements:
- Python 3.9-3.11 (NOT 3.12+, NOT 3.13+)
- ~1GB disk space for models and dependencies
- 2GB+ RAM recommended
Python Version Issues?
brew install python@3.11
sudo apt install python3.11 python3.11-venv
π§ Installation Status Tracking
- β
Smart Detection: Avoids re-installation if already installed
- π
Version Tracking: Shows installation date and version
- π Update Suggestions: Recommends updates after 30+ days
- π οΈ Installation Markers: Persistent installation state
π§ Advanced Features
Resume Interrupted Conversions
If conversion stops, simply run the tool again - it will automatically detect and offer to resume your previous session.
Multiple Output Formats
- Single File: One complete audiobook MP3
- Chapter Files: Separate MP3 per chunk
- Both: Get both formats
Voice Preview Caching
Voice previews are cached locally to save API costs and improve performance.
Smart Text Chunking
- Respects sentence boundaries
- Preserves chapter structure for PDFs
- Configurable chunk sizes (default: 4000 characters)
π File Support
PDF Files
- β
Up to 50MB
- β
Text extraction with structure preservation
- β
Automatic chapter detection
Text Files
- β
Up to 1M characters
- β
UTF-8 encoding
- β
Automatic formatting cleanup
π What's New in v5.0
π Enhanced Security
- Input Sanitization: Prevents code injection and malicious input
- API Key Validation: Comprehensive security checks for OpenAI keys
- Secure Storage: Encrypted API key storage with multiple layers
- Environment Assessment: Automatic security environment analysis
π§ͺ Comprehensive Testing
- 55+ Unit Tests: Extensive test coverage for core functionality
- 12.6% Code Coverage: Growing test suite with focus on critical paths
- Mocked Services: Fast, reliable tests without external dependencies
- CI/CD Pipeline: Automated testing on every commit
π‘οΈ Better Error Handling
- Type-Safe Validation: Zod schemas for all configuration and data
- Graceful Failures: Better error messages and recovery mechanisms
- Logging & Monitoring: Detailed error tracking and user feedback
π― Developer Experience
- GitHub Actions: Automated CI/CD with security auditing
- ESLint Clean: Zero linting errors with consistent code style
- Documentation: Comprehensive inline documentation and examples
βοΈ Configuration
API Key Storage
Your OpenAI API key is encrypted and stored locally at:
- macOS/Linux:
~/.config/ai-audiobook-maker/config.json
- Windows:
%APPDATA%\ai-audiobook-maker\config.json
Cache Location
Voice previews and temporary files:
- macOS/Linux:
~/.config/ai-audiobook-maker/cache/
- Windows:
%APPDATA%\ai-audiobook-maker\cache\
Local TTS Installations
Local TTS providers are installed to:
- Fish Speech:
~/.aiabm/fish-speech/
- Thorsten-Voice:
~/.aiabm/thorsten-voice/
π οΈ Troubleshooting
Common Issues
"FFmpeg not found"
brew install ffmpeg
sudo apt install ffmpeg
"API key invalid"
- Verify your key at OpenAI Platform
- Use
npx aiabm --config
to update your key
"File too large"
- PDFs: Maximum 50MB
- Text: Maximum 1M characters
- Split large files before conversion
"Fish Speech dependencies missing"
- Check Python version:
python3 --version
- Try restarting the app
- Virtual environment issues usually resolve on restart
"Thorsten-Voice requires Python 3.9-3.11"
- Install compatible Python:
brew install python@3.11
- App will automatically detect and use it
- Creates separate virtual environment
Voice preview not playing
- macOS: Uses built-in
afplay
- Windows: Uses PowerShell media player
- Linux: Requires
ffplay
, mpv
, vlc
, or mplayer
Performance Tips
- Use
tts-1
model for faster processing
- Use
tts-1-hd
for higher quality (slower)
- Local TTS providers are free but slower than cloud
- Cache clears automatically after 30 days
- Resume feature prevents re-processing completed chunks
π Privacy & Security
- API keys are encrypted locally using AES-192
- No data is sent to servers when using local TTS
- OpenAI TTS sends only text chunks to OpenAI servers
- Cache files are stored locally only
- Session data helps resume interrupted conversions
- Local TTS models run entirely offline
π Examples
Converting a PDF Book with German Voice
npx aiabm "Mein Roman.pdf"
Interactive Multilingual Setup
npx aiabm
Quick OpenAI Conversion
npx aiabm document.pdf --voice nova --speed 1.1
π€ Contributing
Issues and feature requests welcome at: GitHub Issues
π License
MIT License - see LICENSE file for details
π Acknowledgments
π Changelog
v4.0.7 (2025-08-03) - π Fish Speech Fully Fixed & Operational
- π Fish Speech 100% Working - Complete resolution of all Fish Speech TTS issues
- π§ Fixed tokenizer.tiktoken - Proper base64 encoding of 32,000 tokens from Fish Speech
- βοΈ Model Configuration Fixed - Created correct firefly_gan_vq.yaml matching model architecture
- π Dimension Mismatch Resolved - Fixed 512-dim vs 1024-dim PyTorch tensor issues
- β
Parameter Validation Fixed - Corrected ServeTTSRequest use_memory_cache format
- π― End-to-End Functionality - Text-to-semantic and decoder models load perfectly
- π Full Service Availability - Fish Speech now detected as available and operational
v4.0.6 (2025-08-03) - π§ͺ Comprehensive Test Coverage & TTS Fixes
- π§ͺ Major Test Coverage Improvement - 20% to 45.07% overall coverage (+125% improvement)
- π― AudiobookMaker.js Tests - 0% to 42.58% coverage with integration tests
- π ConfigManager.js Tests - 0% to 98.03% coverage with security tests
- π FileHandler.js Tests - 0% to 72.99% coverage with core functionality tests
- π₯οΈ cli.js Tests - 0% to 75.75% coverage with end-to-end tests
- π Fish Speech Fixed - Installation detection and availability checking
- π©πͺ Thorsten Voice Fixed - Python 3.13 compatibility and installation issues
- π 207 Total Tests - 195 passing with comprehensive edge case coverage
- π§ Integration Tests - Real-world testing with actual TTS services and PDF processing
- π‘οΈ Robust Error Handling - Enhanced service availability validation
v4.0.5 (2025-08-03) - π΅ Unified Preview System
- π΅ Unified Preview Texts - Consistent voice previews across all TTS providers
- π Language-Specific Previews - German, English, and French preview texts
- πΎ Smart Caching - Consistent cache filenames prevent preview regeneration
- π― Voice Language Detection - Automatic language detection from voice names
- π Cache Optimization - Separate preview cache directories for each provider
- βοΈ Better Performance - No more regenerating previews when switching providers
v4.0.4 (2025-08-03) - π οΈ Fish Speech Engine Fix
- π§ Fixed TTSInferenceEngine initialization - Use proper ModelManager pattern
- ποΈ Implemented correct model loading - Load LLaMA and DAC models separately
- π― Auto-device detection - Support for MPS (Apple Silicon), CUDA, and CPU
- π¦ Better model management - Use launch_thread_safe_queue for text-to-semantic
- π Improved generation flow - Proper model initialization before inference
v4.0.3 (2025-08-03) - π§ Fish Speech Import Fix
- π§ Fixed MODDED_DAC import - Changed to correct DAC import from inference_engine
- β
Added missing torch import - Fixed undefined torch reference in generation script
- π οΈ Simplified dependency check - Import DAC directly from inference_engine
- π¦ Better module verification - Check ServeTTSRequest schema availability
v4.0.2 (2025-08-03) - π Fish Speech API Update
- π§ Fixed Fish Speech dependency check - Updated to use current DAC-based architecture
- ποΈ Removed deprecated VQGAN imports - Fish Speech now uses DAC (Descript Audio Codec)
- β
Updated generation script - Uses modern TTSInferenceEngine API
- π Better installation handling - Auto-removes incomplete installations
- π¦ Improved pip install - Installs Fish Speech package in development mode
- π οΈ Enhanced error reporting - More detailed debugging information
v4.0.1 (2025-08-02) - π§ Installation & Compatibility Fixes
- π§ Fixed Fish Speech virtual environment usage - Proper dependency checking
- π Enhanced Python version detection - Blocks Thorsten-Voice on Python 3.13+
- β
Smart installation status tracking - Avoids unnecessary re-installations
- π
Installation markers - Persistent installation state with version info
- π Better error handling - More informative error messages and recovery
- π‘ Improved user guidance - Clear instructions for Python compatibility issues
v4.0.0 (2025-08-02) - π Major Refactoring
- ποΈ REMOVED: Kyutai TTS (replaced due to Python 3.13 compatibility issues)
- π NEW: Fish Speech integration - State-of-the-art multilingual TTS
- π©πͺ NEW: Thorsten-Voice integration - Native German TTS
- π€ Enhanced Voice Selection: 16 total voices across 3 providers
- ποΈ Automated Installation: One-click setup for local TTS providers
- π§ Improved Architecture: Better service abstraction and error handling
- π Enhanced Testing: 80%+ test coverage with Jest
- π οΈ Code Quality Tools: ESLint, Prettier, Snyk integration
- π Backward Compatibility: 100% compatibility with existing OpenAI workflows
v3.3.0 (2025-08-01) - π Kyutai Integration (Deprecated)
- π Kyutai TTS integration (now removed in v4.0.0)
- ποΈ Automated installation system
- π€ 15+ voice options
- π Provider selection system
Happy listening! π§ Turn any text into your personal audiobook library with the best TTS technology available.