aiabm

AI Audiobook Maker - Convert PDFs and text files to audiobooks using OpenAI TTS or Thorsten-Voice (native German)

Version: 5.1.1 (latest)
Source: npm
Weekly downloads: 490 (-45.74%)
Maintainers: 1

🎧 AI Audiobook Maker (AIABM) v5.1.0

Transform your PDFs and text files into high-quality audiobooks using OpenAI TTS (cloud) or Thorsten-Voice (native German). Choose between premium cloud voices or run everything locally at no cost!

🆕 New in v5.1: Beautiful UI/UX overhaul, flexible output options, optimized Thorsten-Voice loading, and Downloads folder as default output.

✨ Features

🎙️ Dual TTS Providers

  • ☁️ OpenAI TTS: Premium cloud voices with 6 voice options (requires API key)
  • 🇩🇪 Thorsten-Voice: Native German TTS with authentic pronunciation (local/free)

🚀 Core Features

  • 🚀 Zero Installation: Run directly with npx aiabm
  • 📁 Smart File Handling: Supports PDF and TXT files with drag & drop
  • 🎤 Voice Preview: Listen to voices before choosing (2 Thorsten + 6 OpenAI voices)
  • 🔒 Enhanced Security: Input sanitization, API key validation, and secure storage
  • 🧪 Comprehensive Testing: 55+ unit tests with 12.6% coverage and growing
  • ⏸️ Resume & Pause: Continue interrupted conversions anytime
  • 🔐 Secure API Key Management: Encrypted local storage
  • 📊 Progress Tracking: Real-time conversion progress with estimates
  • 🎛️ Advanced Controls: Adjust speed, quality, and output format
  • 💰 Cost Transparency: See exact pricing (OpenAI) or run free (local providers)
  • 🔧 Smart Installation: Automatic setup for local TTS providers

🚀 Quick Start

Method 1: Run with npx (no installation)

# Convert a specific file
npx aiabm mybook.pdf

# Interactive mode
npx aiabm

Method 2: Global Installation

npm install -g aiabm
aiabm mybook.pdf

📋 Prerequisites

Required

  • Node.js 16+ (Download from nodejs.org)
  • FFmpeg (for audio combining - auto-installed on most systems)

Optional (Choose One or Both)

For OpenAI TTS:

  • OpenAI API key (set or update it with npx aiabm --config)

For Thorsten-Voice (German TTS):

  • Python 3.9-3.11 (detected automatically)
  • Coqui TTS (auto-installed)
  • Completely FREE - runs locally

🎯 Usage Examples

CLI Mode

# Basic conversion
npx aiabm document.pdf

# With specific options (OpenAI)
npx aiabm book.txt --voice nova --speed 1.2 --model tts-1-hd

# Manage API key
npx aiabm --config

Interactive Mode

npx aiabm

Then follow the interactive prompts to:

  • Select TTS Provider (OpenAI, Fish Speech, or Thorsten-Voice)
  • Auto-install local providers if needed (one-time setup)
  • Select your file (browse, drag & drop, or enter path)
  • Preview and choose a voice
  • Configure settings (speed, quality, output format)
  • Monitor progress and resume if needed

🎤 Available Voices

🤖 OpenAI TTS (Cloud)

  • Alloy: Neutral, versatile
  • Echo: Clear, professional
  • Fable: Warm, storytelling
  • Onyx: Deep, authoritative
  • Nova: Bright, engaging
  • Shimmer: Gentle, soothing

🐟 Fish Speech (Local/Multilingual)

  • 🇩🇪 German Female (Natural): High-quality German synthesis
  • 🇩🇪 German Male (Clear): Professional German voice
  • 🇩🇪 German Female (Expressive): Emotional German narration
  • 🇺🇸 English Female (Warm): Natural English voice
  • 🇺🇸 English Male (Professional): Business-quality English
  • 🇺🇸 English Female (Energetic): Dynamic storytelling
  • 🇫🇷 French Female (Elegant): Sophisticated French accent
  • 🇫🇷 French Male (Sophisticated): Professional French voice

🇩🇪 Thorsten-Voice (Native German)

  • 🇩🇪 Thorsten (Authentic German Male): High-quality native German voice
  • 🇩🇪 Thorsten Emotional (German Male): German voice with emotional expression

💰 Pricing

OpenAI TTS

$0.015 per 1,000 characters

Content Length        Estimated Cost    Example
10,000 characters     ~$0.15            Short article
50,000 characters     ~$0.75            Small e-book
100,000 characters    ~$1.50            Average novel
250,000 characters    ~$3.75            Large book
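
The estimates in the table follow directly from the per-character rate. A minimal sketch of that arithmetic, assuming the flat $0.015 per 1,000 characters quoted above; the function name estimateOpenAICost is hypothetical and not part of aiabm:

```javascript
// Hypothetical helper, not part of the aiabm API.
const USD_PER_1K_CHARS = 0.015; // rate quoted in this README

function estimateOpenAICost(characterCount) {
  if (!Number.isInteger(characterCount) || characterCount < 0) {
    throw new RangeError('characterCount must be a non-negative integer');
  }
  const raw = (characterCount / 1000) * USD_PER_1K_CHARS;
  // Round to whole cents to hide floating-point noise.
  return Math.round(raw * 100) / 100;
}

for (const n of [10000, 50000, 100000, 250000]) {
  console.log(`${n} characters: ~$${estimateOpenAICost(n).toFixed(2)}`);
}
```

The tool's own cost display should be treated as authoritative, since the actual rate may depend on the model selected.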

Fish Speech & Thorsten-Voice

100% FREE - No API costs, runs entirely on your machine!

🔧 Local TTS Setup

Both Fish Speech and Thorsten-Voice run entirely on your machine - no API costs! Now with fully automated installation!

npx aiabm
# Select "Fish Speech" or "Thorsten-Voice"
# Choose "Auto Install (recommended)"
# → System automatically downloads and configures everything!

🐟 Fish Speech Setup

What happens automatically:

  • 📦 Repository Cloning - Downloads latest Fish Speech
  • 🐍 Virtual Environment - Creates isolated Python environment
  • ⚡ PyTorch Installation - Installs optimized CPU version
  • 🤖 Model Download - Downloads Fish Speech 1.2 models (~1GB)
  • ✅ Dependency Check - Verifies installation works

System Requirements:

  • Python 3.8+ recommended
  • ~2GB disk space for models and dependencies
  • 4GB+ RAM recommended
  • CPU or GPU (GPU faster but optional)

🇩🇪 Thorsten-Voice Setup

What happens automatically:

  • 🐍 Compatible Python Detection - Finds Python 3.9-3.11
  • 📦 Virtual Environment - Creates isolated environment
  • 🎤 Coqui TTS Installation - Installs German TTS framework
  • 🤖 Thorsten Model - Downloads German voice model (~500MB)
  • ✅ Compatibility Check - Verifies everything works

System Requirements:

  • Python 3.9-3.11 (3.12 and later, including 3.13, are not supported)
  • ~1GB disk space for models and dependencies
  • 2GB+ RAM recommended
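
To check by hand whether an interpreter falls inside the supported range, the comparison can be sketched as below. This is only an illustration of the 3.9-3.11 rule stated above; isThorstenCompatible is a hypothetical name, and how aiabm itself detects Python is not documented here.

```javascript
// Hypothetical helper: checks a `python3 --version` style string against
// the 3.9-3.11 range this README states for Thorsten-Voice.
function isThorstenCompatible(versionOutput) {
  const match = /(\d+)\.(\d+)(?:\.(\d+))?/.exec(versionOutput);
  if (!match) return false; // no version number found
  const major = Number(match[1]);
  const minor = Number(match[2]);
  return major === 3 && minor >= 9 && minor <= 11;
}

console.log(isThorstenCompatible('Python 3.11.9')); // true
console.log(isThorstenCompatible('Python 3.13.0')); // false
```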

Python Version Issues?

# Install compatible Python on macOS
brew install python@3.11

# On Ubuntu/Debian
sudo apt install python3.11 python3.11-venv

🔧 Installation Status Tracking

  • ✅ Smart Detection: Avoids re-installation if already installed
  • 📅 Version Tracking: Shows installation date and version
  • 🔄 Update Suggestions: Recommends updates after 30+ days
  • 🛠️ Installation Markers: Persistent installation state
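
The 30-day update suggestion can be pictured as a simple age check on a stored marker. The sketch below assumes a marker object with version and installedAt fields; those field names and the sample values are hypothetical, since the real marker format is not described in this README.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// `marker` mimics a persisted installation record; the field names
// (version, installedAt) are assumptions, not aiabm's real schema.
function shouldSuggestUpdate(marker, now = new Date(), maxAgeDays = 30) {
  const installedAt = new Date(marker.installedAt);
  if (Number.isNaN(installedAt.getTime())) return true; // unreadable marker
  const ageDays = (now.getTime() - installedAt.getTime()) / MS_PER_DAY;
  return ageDays >= maxAgeDays;
}

const marker = { version: '1.0.0', installedAt: '2025-07-01T00:00:00Z' };
console.log(shouldSuggestUpdate(marker, new Date('2025-08-07T00:00:00Z'))); // true
console.log(shouldSuggestUpdate(marker, new Date('2025-07-15T00:00:00Z'))); // false
```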

🔧 Advanced Features

Resume Interrupted Conversions

If conversion stops, simply run the tool again - it will automatically detect and offer to resume your previous session.

Multiple Output Formats

  • Single File: One complete audiobook MP3
  • Chapter Files: Separate MP3 per chunk
  • Both: Get both formats

Voice Preview Caching

Voice previews are cached locally to save API costs and improve performance.

Smart Text Chunking

  • Respects sentence boundaries
  • Preserves chapter structure for PDFs
  • Configurable chunk sizes (default: 4000 characters)
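
As an illustration of sentence-aware chunking, here is a minimal sketch that packs whole sentences into chunks of at most 4000 characters. It is not aiabm's implementation: chunkText is a hypothetical name, the sentence detection is a simple punctuation heuristic, and chapter handling is left out.

```javascript
// Illustrative sentence-aware chunker; not aiabm's actual implementation.
function chunkText(text, maxChars = 4000) {
  // Split after sentence-ending punctuation followed by whitespace.
  const sentences = text.trim().split(/(?<=[.!?])\s+/).filter(Boolean);
  const chunks = [];
  let current = '';

  for (const sentence of sentences) {
    // A single sentence longer than the limit is hard-split as a fallback.
    if (sentence.length > maxChars) {
      if (current) { chunks.push(current); current = ''; }
      for (let i = 0; i < sentence.length; i += maxChars) {
        chunks.push(sentence.slice(i, i + maxChars));
      }
      continue;
    }
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length > maxChars) {
      chunks.push(current); // current is non-empty here
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkText('One. Two! Three? Four.', 10));
// [ 'One. Two!', 'Three?', 'Four.' ]
```

Keeping sentences intact avoids audible cuts in the middle of a phrase when the per-chunk audio files are joined.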

📂 File Support

PDF Files

  • ✅ Up to 50MB
  • ✅ Text extraction with structure preservation
  • ✅ Automatic chapter detection

Text Files

  • ✅ Up to 1M characters
  • ✅ UTF-8 encoding
  • ✅ Automatic formatting cleanup
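
The limits above can be checked before starting a conversion. The following is a sketch under stated assumptions: checkInputLimits is a hypothetical helper, "50MB" is interpreted as 50 MiB, and the error strings are illustrative rather than aiabm's actual messages.

```javascript
const MAX_PDF_BYTES = 50 * 1024 * 1024; // assumes "50MB" means 50 MiB
const MAX_TEXT_CHARS = 1000000;         // "1M characters"

// Hypothetical pre-flight check: returns null when the input is within
// the documented limits, otherwise a short explanation.
function checkInputLimits(fileName, { sizeBytes = 0, charCount = 0 } = {}) {
  const lower = fileName.toLowerCase();
  if (lower.endsWith('.pdf')) {
    return sizeBytes > MAX_PDF_BYTES ? 'PDF exceeds the 50MB limit' : null;
  }
  if (lower.endsWith('.txt')) {
    return charCount > MAX_TEXT_CHARS ? 'Text exceeds the 1M character limit' : null;
  }
  return 'Unsupported file type (expected .pdf or .txt)';
}

console.log(checkInputLimits('book.pdf', { sizeBytes: 12 * 1024 * 1024 })); // null
console.log(checkInputLimits('notes.txt', { charCount: 1500000 }));
```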

🆕 What's New in v5.0

🔒 Enhanced Security

  • Input Sanitization: Prevents code injection and malicious input
  • API Key Validation: Comprehensive security checks for OpenAI keys
  • Secure Storage: Encrypted API key storage with multiple layers
  • Environment Assessment: Automatic security environment analysis

🧪 Comprehensive Testing

  • 55+ Unit Tests: Extensive test coverage for core functionality
  • 12.6% Code Coverage: Growing test suite with focus on critical paths
  • Mocked Services: Fast, reliable tests without external dependencies
  • CI/CD Pipeline: Automated testing on every commit

🛡️ Better Error Handling

  • Type-Safe Validation: Zod schemas for all configuration and data
  • Graceful Failures: Better error messages and recovery mechanisms
  • Logging & Monitoring: Detailed error tracking and user feedback

🎯 Developer Experience

  • GitHub Actions: Automated CI/CD with security auditing
  • ESLint Clean: Zero linting errors with consistent code style
  • Documentation: Comprehensive inline documentation and examples

⚙️ Configuration

API Key Storage

Your OpenAI API key is encrypted and stored locally at:

  • macOS/Linux: ~/.config/ai-audiobook-maker/config.json
  • Windows: %APPDATA%\ai-audiobook-maker\config.json
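
To locate the file programmatically, the two paths above can be resolved per platform. A minimal sketch, assuming the locations listed here; configFilePath is a hypothetical helper, and the fallback used when APPDATA is unset is an assumption:

```javascript
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper mirroring the locations listed above.
function configFilePath(platform = process.platform, env = process.env, home = os.homedir()) {
  if (platform === 'win32') {
    // %APPDATA% is normally set on Windows; the fallback is an assumption.
    const appData = env.APPDATA || path.win32.join(home, 'AppData', 'Roaming');
    return path.win32.join(appData, 'ai-audiobook-maker', 'config.json');
  }
  return path.posix.join(home, '.config', 'ai-audiobook-maker', 'config.json');
}

console.log(configFilePath());
```

The platform, environment, and home directory are parameters so the function can be exercised for any operating system from a single machine.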

Cache Location

Voice previews and temporary files:

  • macOS/Linux: ~/.config/ai-audiobook-maker/cache/
  • Windows: %APPDATA%\ai-audiobook-maker\cache\

Local TTS Installations

Local TTS providers are installed to:

  • Fish Speech: ~/.aiabm/fish-speech/
  • Thorsten-Voice: ~/.aiabm/thorsten-voice/

🛠️ Troubleshooting

Common Issues

"FFmpeg not found"

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

"API key invalid"

  • Verify your key at OpenAI Platform
  • Use npx aiabm --config to update your key

"File too large"

  • PDFs: Maximum 50MB
  • Text: Maximum 1M characters
  • Split large files before conversion

"Fish Speech dependencies missing"

  • Check Python version: python3 --version
  • Try restarting the app
  • Virtual environment issues usually resolve on restart

"Thorsten-Voice requires Python 3.9-3.11"

  • Install compatible Python: brew install python@3.11
  • App will automatically detect and use it
  • Creates separate virtual environment

Voice preview not playing

  • macOS: Uses built-in afplay
  • Windows: Uses PowerShell media player
  • Linux: Requires ffplay, mpv, vlc, or mplayer
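
On Linux, a quick way to see which of the listed players is installed is to scan PATH for them. The sketch below is a diagnostic aid only; findLinuxPlayer is a hypothetical name, and the order in which aiabm itself tries players is not documented here.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Players named in the list above, checked in that order (an assumption).
const LINUX_PLAYERS = ['ffplay', 'mpv', 'vlc', 'mplayer'];

// Returns the first player found on PATH, or null if none is installed.
function findLinuxPlayer(pathVar = process.env.PATH || '', candidates = LINUX_PLAYERS) {
  const dirs = pathVar.split(path.delimiter).filter(Boolean);
  for (const name of candidates) {
    for (const dir of dirs) {
      try {
        fs.accessSync(path.join(dir, name), fs.constants.X_OK);
        return name;
      } catch {
        // Not in this directory; keep looking.
      }
    }
  }
  return null;
}

console.log(findLinuxPlayer() ?? 'No supported player found: install ffplay, mpv, vlc, or mplayer');
```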

Performance Tips

  • Use tts-1 model for faster processing
  • Use tts-1-hd for higher quality (slower)
  • Local TTS providers are free but slower than cloud
  • Cache clears automatically after 30 days
  • Resume feature prevents re-processing completed chunks

🔒 Privacy & Security

  • API keys are encrypted locally using AES-192
  • No data is sent to servers when using local TTS
  • OpenAI TTS sends only text chunks to OpenAI servers
  • Cache files are stored locally only
  • Session data helps resume interrupted conversions
  • Local TTS models run entirely offline
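
For readers curious what local AES-192 encryption of a key can look like in Node.js, here is a minimal round-trip sketch. It is an assumption-laden illustration: the README does not state the cipher mode, key derivation, or file layout aiabm uses, so AES-192-CBC with an scrypt-derived key, and the names encryptSecret and decryptSecret, are chosen purely for the example.

```javascript
const crypto = require('node:crypto');

// Sketch only: aiabm's real key derivation, cipher mode, and file layout
// are not documented in this README. AES-192-CBC with scrypt is an assumption.
function encryptSecret(plaintext, passphrase) {
  const salt = crypto.randomBytes(16);
  const iv = crypto.randomBytes(16); // AES block size
  const key = crypto.scryptSync(passphrase, salt, 24); // 24 bytes = 192 bits
  const cipher = crypto.createCipheriv('aes-192-cbc', key, iv);
  const data = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { salt: salt.toString('hex'), iv: iv.toString('hex'), data: data.toString('hex') };
}

function decryptSecret({ salt, iv, data }, passphrase) {
  const key = crypto.scryptSync(passphrase, Buffer.from(salt, 'hex'), 24);
  const decipher = crypto.createDecipheriv('aes-192-cbc', key, Buffer.from(iv, 'hex'));
  const plain = Buffer.concat([decipher.update(Buffer.from(data, 'hex')), decipher.final()]);
  return plain.toString('utf8');
}

const box = encryptSecret('sk-example-not-a-real-key', 'local passphrase');
console.log(decryptSecret(box, 'local passphrase') === 'sk-example-not-a-real-key'); // true
```

CBC provides confidentiality but no integrity check; an authenticated mode such as aes-192-gcm would also detect tampering with the stored file.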

📖 Examples

Converting a PDF Book with German Voice

npx aiabm "Mein Roman.pdf"
# Select "Thorsten-Voice"
# Choose German voice
# Enjoy authentic German pronunciation!

Interactive Multilingual Setup

npx aiabm
# Select "Fish Speech"
# Auto-install if needed
# Preview German, English, and French voices
# Choose your favorite for the content language

Quick OpenAI Conversion

npx aiabm document.pdf --voice nova --speed 1.1

🤝 Contributing

Issues and feature requests welcome at: GitHub Issues

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

📝 Changelog

v4.0.7 (2025-08-03) - 🐟 Fish Speech Fully Fixed & Operational

  • 🐟 Fish Speech 100% Working - Complete resolution of all Fish Speech TTS issues
  • 🔧 Fixed tokenizer.tiktoken - Proper base64 encoding of 32,000 tokens from Fish Speech
  • ⚙️ Model Configuration Fixed - Created correct firefly_gan_vq.yaml matching model architecture
  • 📐 Dimension Mismatch Resolved - Fixed 512-dim vs 1024-dim PyTorch tensor issues
  • ✅ Parameter Validation Fixed - Corrected ServeTTSRequest use_memory_cache format
  • 🎯 End-to-End Functionality - Text-to-semantic and decoder models load perfectly
  • 🚀 Full Service Availability - Fish Speech now detected as available and operational

v4.0.6 (2025-08-03) - 🧪 Comprehensive Test Coverage & TTS Fixes

  • 🧪 Major Test Coverage Improvement - 20% to 45.07% overall coverage (+125% improvement)
  • 🎯 AudiobookMaker.js Tests - 0% to 42.58% coverage with integration tests
  • 🔐 ConfigManager.js Tests - 0% to 98.03% coverage with security tests
  • 📁 FileHandler.js Tests - 0% to 72.99% coverage with core functionality tests
  • 🖥️ cli.js Tests - 0% to 75.75% coverage with end-to-end tests
  • 🐟 Fish Speech Fixed - Installation detection and availability checking
  • 🇩🇪 Thorsten Voice Fixed - Python 3.13 compatibility and installation issues
  • 📊 207 Total Tests - 195 passing with comprehensive edge case coverage
  • 🔧 Integration Tests - Real-world testing with actual TTS services and PDF processing
  • 🛡️ Robust Error Handling - Enhanced service availability validation

v4.0.5 (2025-08-03) - 🎡 Unified Preview System

  • 🎡 Unified Preview Texts - Consistent voice previews across all TTS providers
  • 🌍 Language-Specific Previews - German, English, and French preview texts
  • 💾 Smart Caching - Consistent cache filenames prevent preview regeneration
  • 🎯 Voice Language Detection - Automatic language detection from voice names
  • 🔄 Cache Optimization - Separate preview cache directories for each provider
  • ⚙️ Better Performance - No more regenerating previews when switching providers

v4.0.4 (2025-08-03) - 🛠️ Fish Speech Engine Fix

  • 🔧 Fixed TTSInferenceEngine initialization - Use proper ModelManager pattern
  • 🏗️ Implemented correct model loading - Load LLaMA and DAC models separately
  • 🎯 Auto-device detection - Support for MPS (Apple Silicon), CUDA, and CPU
  • 📦 Better model management - Use launch_thread_safe_queue for text-to-semantic
  • 🔄 Improved generation flow - Proper model initialization before inference

v4.0.3 (2025-08-03) - 🔧 Fish Speech Import Fix

  • 🔧 Fixed MODDED_DAC import - Changed to correct DAC import from inference_engine
  • ✅ Added missing torch import - Fixed undefined torch reference in generation script
  • 🛠️ Simplified dependency check - Import DAC directly from inference_engine
  • 📦 Better module verification - Check ServeTTSRequest schema availability

v4.0.2 (2025-08-03) - 🐟 Fish Speech API Update

  • 🔧 Fixed Fish Speech dependency check - Updated to use current DAC-based architecture
  • 🗑️ Removed deprecated VQGAN imports - Fish Speech now uses DAC (Descript Audio Codec)
  • ✅ Updated generation script - Uses modern TTSInferenceEngine API
  • 🔄 Better installation handling - Auto-removes incomplete installations
  • 📦 Improved pip install - Installs Fish Speech package in development mode
  • 🛠️ Enhanced error reporting - More detailed debugging information

v4.0.1 (2025-08-02) - 🔧 Installation & Compatibility Fixes

  • 🔧 Fixed Fish Speech virtual environment usage - Proper dependency checking
  • 🐍 Enhanced Python version detection - Blocks Thorsten-Voice on Python 3.13+
  • ✅ Smart installation status tracking - Avoids unnecessary re-installations
  • 📅 Installation markers - Persistent installation state with version info
  • 🔄 Better error handling - More informative error messages and recovery
  • 💡 Improved user guidance - Clear instructions for Python compatibility issues

v4.0.0 (2025-08-02) - 🌟 Major Refactoring

  • 🗑️ REMOVED: Kyutai TTS (replaced due to Python 3.13 compatibility issues)
  • 🐟 NEW: Fish Speech integration - State-of-the-art multilingual TTS
  • 🇩🇪 NEW: Thorsten-Voice integration - Native German TTS
  • 🎤 Enhanced Voice Selection: 16 total voices across 3 providers
  • 🏗️ Automated Installation: One-click setup for local TTS providers
  • 🔧 Improved Architecture: Better service abstraction and error handling
  • 📊 Enhanced Testing: 80%+ test coverage with Jest
  • 🛠️ Code Quality Tools: ESLint, Prettier, Snyk integration
  • 🔄 Backward Compatibility: 100% compatibility with existing OpenAI workflows

v3.3.0 (2025-08-01) - 🚀 Kyutai Integration (Deprecated)

  • 🆓 Kyutai TTS integration (now removed in v4.0.0)
  • 🏗️ Automated installation system
  • 🎤 15+ voice options
  • 🔄 Provider selection system

Happy listening! 🎧 Turn any text into your personal audiobook library with the best TTS technology available.

Keywords: audiobook

Package last updated on 07 Aug 2025
