Whisper Parallel CPU Audio & Video Transcriber

A minimal, robust Python package for whisper.cpp with CPU-optimized threading and integrated model management. It transcribes both audio and video files with high performance, and targets distributed cloud deployments and transcription workflows.

πŸš€ Quick Start

Install from PyPI:

pip install whisper-parallel-cpu

Use in Python:

import whisper_parallel_cpu

# Transcribe audio files
text = whisper_parallel_cpu.transcribe("audio.mp3", model="base")

# Transcribe video files
text = whisper_parallel_cpu.transcribe("video.mp4", model="base")

# Or use specific functions
text = whisper_parallel_cpu.transcribe_audio("audio.wav", model="small")
text = whisper_parallel_cpu.transcribe_video("video.mkv", model="medium")

Or use the CLI:

# Transcribe audio
whisper_parallel_cpu transcribe audio.mp3 --model base

# Transcribe video
whisper_parallel_cpu transcribe video.mp4 --model base

✨ Features

  • Native C++/pybind11 speed (CPU & GPU acceleration)
  • Automatic model download/caching - no manual setup required
  • Simple Python & CLI interface - just pip install and go
  • Input: Audio (.mp3, .wav, .flac, .aac, .ogg, .m4a) and video (.mp4, .mkv, .avi, .mov) formats
  • Output: Transcribed text as a Python string
  • Benchmarking: Built-in performance testing and optimization tools
  • Cross-platform: Works on macOS, Linux, and Windows

πŸ“¦ Installation

pip install whisper-parallel-cpu

From Source (Development)

# Clone the repository
git clone https://github.com/krisfur/whisper-parallel-cpu.git
cd whisper-parallel-cpu

# Install in editable mode
pip install -e .

# Test the installation
python test_transcribe.py video.mp4

🧰 Requirements

System Tools

  • C++17 compiler (g++, clang++) - automatically handled by pip
  • cmake (>=3.15) - automatically handled by pip
  • ffmpeg (for audio extraction)

Install ffmpeg

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt update && sudo apt install ffmpeg

Windows: Download from ffmpeg.org or use Chocolatey:

choco install ffmpeg

πŸ§ͺ Usage

Python API

Basic Usage (Function-based)

import whisper_parallel_cpu

# Transcribe any audio or video file (auto-detects format)
text = whisper_parallel_cpu.transcribe("audio.mp3", model="base", threads=4)
text = whisper_parallel_cpu.transcribe("video.mp4", model="small")

# Use specific functions for audio or video
text = whisper_parallel_cpu.transcribe_audio("audio.wav", model="base", threads=4)
text = whisper_parallel_cpu.transcribe_video("video.mkv", model="medium", threads=8)

# CPU-only mode (no GPU)
text = whisper_parallel_cpu.transcribe("audio.flac", model="base", use_gpu=False)

Advanced Usage (Model Reuse)

For better performance when transcribing multiple files, use the WhisperModel class to load the model once and reuse it:

from whisper_parallel_cpu import WhisperModel

# Create a model instance (model is loaded on first use)
model = WhisperModel(model="base", use_gpu=False, threads=4)

# Transcribe multiple files using the same loaded model
files = ["audio1.mp3", "audio2.wav", "video1.mp4", "video2.mkv"]
for file_path in files:
    text = model.transcribe(file_path)
    print(f"Transcribed {file_path}: {text[:100]}...")

# Use as context manager
with WhisperModel(model="small", use_gpu=True) as model:
    text1 = model.transcribe("audio1.mp3")
    text2 = model.transcribe("audio2.wav")
    # Model is automatically managed

# Memory management
model.clear_contexts()  # Free memory
print(f"Active contexts: {model.get_context_count()}")

Supported File Formats

Audio Formats:

  • .mp3, .wav, .flac, .aac, .ogg, .m4a, .wma, .opus, .webm, .3gp, .amr, .au, .ra, .mid, .midi

Video Formats:

  • .mp4, .avi, .mov, .mkv, .wmv, .flv, .webm, .m4v, .3gp, .ogv, .ts, .mts, .m2ts
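
For batch jobs, these extension lists can be used to filter a directory before transcribing. A minimal sketch (the media/ directory is a placeholder; the extension sets mirror the lists above, and transcribe() auto-detects whether each file is audio or video):

import whisper_parallel_cpu
from pathlib import Path

# Extension sets copied from the lists above ("media/" is a placeholder directory)
AUDIO_EXTS = {".mp3", ".wav", ".flac", ".aac", ".ogg", ".m4a", ".wma", ".opus",
              ".webm", ".3gp", ".amr", ".au", ".ra", ".mid", ".midi"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv", ".wmv", ".flv", ".webm", ".m4v",
              ".3gp", ".ogv", ".ts", ".mts", ".m2ts"}
ALL_EXTS = AUDIO_EXTS | VIDEO_EXTS

for path in sorted(Path("media").iterdir()):
    if path.suffix.lower() in ALL_EXTS:
        # transcribe() auto-detects whether the file is audio or video
        text = whisper_parallel_cpu.transcribe(str(path), model="base")
        print(f"{path.name}: {text[:80]}...")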

Available Models

The following models are available and will be downloaded automatically:

Model  | Size  | Accuracy | Speed   | Use Case
tiny   | 74MB  | Good     | Fastest | Quick transcriptions
base   | 141MB | Better   | Fast    | General purpose
small  | 444MB | Better   | Medium  | High accuracy needed
medium | 1.4GB | Best     | Slow    | Maximum accuracy
large  | 2.9GB | Best     | Slowest | Professional use

Command Line Interface

# List available models
whisper_parallel_cpu list

# Download a specific model
whisper_parallel_cpu download base

# Transcribe audio files
whisper_parallel_cpu transcribe audio.mp3 --model base --threads 4
whisper_parallel_cpu transcribe audio.wav --model small

# Transcribe video files
whisper_parallel_cpu transcribe video.mp4 --model base --threads 4
whisper_parallel_cpu transcribe video.mkv --model medium

# Transcribe without GPU (CPU-only)
whisper_parallel_cpu transcribe audio.flac --model small --no-gpu

Model Management

import whisper_parallel_cpu

# List available models
whisper_parallel_cpu.list_models()

# Download a specific model
whisper_parallel_cpu.download_model("medium")

# Force re-download
whisper_parallel_cpu.download_model("base", force=True)

πŸ“Š Benchmarking & Performance

Run Performance Tests

# Test with 5 audio/video copies
python benchmark.py audio.mp3 5
python benchmark.py video.mp4 5

What the Benchmark Tests

  • Thread Scaling: Tests different thread counts (1, 2, 4, 8, 16, etc.) for a single audio/video transcription (see the sketch after this list)
  • Sequential Processing: Measures throughput when processing multiple audio/video files one after another
  • Parallel Processing: Tests concurrent processing with different numbers of workers
  • Optimal Configuration: Provides the best settings for your specific hardware
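
As a rough illustration of the thread-scaling test, here is a minimal sketch using the public API (sample.mp4 and the thread counts are placeholders; benchmark.py does this far more thoroughly):

import time
import whisper_parallel_cpu

# Time one transcription at each thread count ("sample.mp4" is a placeholder)
for threads in (1, 2, 4, 8, 16):
    start = time.time()
    whisper_parallel_cpu.transcribe("sample.mp4", model="base", threads=threads)
    print(f"{threads:>2} threads: {time.time() - start:.1f}s")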

Performance Optimization Tips

  • Model Reuse: Use WhisperModel class for multiple transcriptions to avoid reloading the model each time
  • GPU Acceleration: The system automatically uses Metal (macOS) or CUDA (Linux/Windows) when available
  • Thread Count: Use the benchmark to find optimal thread count for your CPU
  • Batch Processing: For multiple audio/video files, use parallel processing with ThreadPoolExecutor (see the sketch after this list)
  • Model Size: Smaller models (base, small) are faster but less accurate than larger ones (medium, large)
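
The Batch Processing tip can be sketched as follows. The file names, worker count, and per-call thread count below are placeholders to tune with the benchmark; this assumes concurrent transcribe() calls are safe, which is what the parallel benchmark exercises:

from concurrent.futures import ThreadPoolExecutor
import whisper_parallel_cpu

files = ["audio1.mp3", "audio2.wav", "video1.mp4", "video2.mkv"]  # placeholders

def worker(path):
    # Use fewer threads per call when several calls run at once,
    # so the workers don't oversubscribe the CPU
    return path, whisper_parallel_cpu.transcribe(path, model="base", threads=2)

with ThreadPoolExecutor(max_workers=4) as pool:
    for path, text in pool.map(worker, files):
        print(f"{path}: {text[:80]}...")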

Model Reuse Performance

When transcribing multiple files, using the WhisperModel class can provide significant performance improvements:

from whisper_parallel_cpu import WhisperModel
import whisper_parallel_cpu
import time

files = ["audio1.mp3", "audio2.wav", "video1.mp4"]  # placeholder file names

# Method 1: Using WhisperModel (model reuse) - FASTER
model = WhisperModel(model="base")
start = time.time()
for file in files:
    text = model.transcribe(file)
model_time = time.time() - start

# Method 2: Using transcribe function (no reuse) - SLOWER
start = time.time()
for file in files:
    text = whisper_parallel_cpu.transcribe(file, model="base")
function_time = time.time() - start

print(f"Speedup with model reuse: {function_time / model_time:.2f}x")

Typical speedups:

  • 2-5x faster for multiple files with the same model
  • Reduced memory usage through context sharing
  • Better for batch processing workflows

βš™οΈ API Reference

transcribe(file_path, model, threads, use_gpu)

Transcribes an audio or video file using Whisper. Automatically detects file type.

Parameters:

  • file_path (str): Path to the audio or video file
  • model (str): Model name (e.g. "base", "tiny", etc.) or path to Whisper model binary (.bin file)
  • threads (int): Number of CPU threads to use (default: 4)
  • use_gpu (bool): Whether to use GPU acceleration (default: True)

Returns:

  • str: Transcribed text
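
Example (file names are placeholders; the second call illustrates passing a path to a local Whisper .bin file, with models/ggml-base.bin as a hypothetical location):

import whisper_parallel_cpu

# Model by name (downloaded and cached automatically)
text = whisper_parallel_cpu.transcribe("audio.mp3", model="base", threads=4)

# Model by path to a local Whisper .bin file
# ("models/ggml-base.bin" is a placeholder path)
text = whisper_parallel_cpu.transcribe("audio.mp3", model="models/ggml-base.bin")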

transcribe_audio(audio_path, model, threads, use_gpu)

Transcribes an audio file using Whisper.

Parameters:

  • audio_path (str): Path to the audio file
  • model (str): Model name (e.g. "base", "tiny", etc.) or path to Whisper model binary (.bin file)
  • threads (int): Number of CPU threads to use (default: 4)
  • use_gpu (bool): Whether to use GPU acceleration (default: True)

Returns:

  • str: Transcribed text
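
Example (interview.wav is a placeholder file name):

import whisper_parallel_cpu

# Basic usage
text = whisper_parallel_cpu.transcribe_audio("interview.wav")

# Explicit settings
text = whisper_parallel_cpu.transcribe_audio(
    "interview.wav",
    model="small",
    threads=8,
    use_gpu=False
)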

transcribe_video(video_path, model, threads, use_gpu)

Transcribes a video file using Whisper.

Parameters:

  • video_path (str): Path to the video file
  • model (str): Model name (e.g. "base", "tiny", etc.) or path to Whisper model binary (.bin file)
  • threads (int): Number of CPU threads to use (default: 4)
  • use_gpu (bool): Whether to use GPU acceleration (default: True)

Returns:

  • str: Transcribed text

Example:

import whisper_parallel_cpu

# Basic usage
text = whisper_parallel_cpu.transcribe_video("sample.mp4")

# Advanced usage
text = whisper_parallel_cpu.transcribe_video(
    "sample.mp4", 
    model="medium", 
    threads=8, 
    use_gpu=False
)

WhisperModel(model, use_gpu, threads)

A class for efficient model reuse across multiple transcriptions.

Parameters:

  • model (str): Model name (e.g. "base", "tiny", etc.) or path to Whisper model binary (.bin file)
  • use_gpu (bool): Whether to use GPU acceleration (default: False)
  • threads (int): Number of CPU threads to use (default: 4)

Methods:

  • transcribe(file_path): Transcribe any audio or video file
  • transcribe_audio(audio_path): Transcribe an audio file
  • transcribe_video(video_path): Transcribe a video file
  • clear_contexts(): Clear all cached contexts to free memory
  • get_context_count(): Get number of cached contexts

Example:

from whisper_parallel_cpu import WhisperModel

# Create model instance
model = WhisperModel(model="base", use_gpu=False, threads=4)

# Transcribe multiple files efficiently
files = ["audio1.mp3", "audio2.wav", "video1.mp4"]
for file_path in files:
    text = model.transcribe(file_path)
    print(f"Transcribed: {text[:50]}...")

# Memory management
model.clear_contexts()

clear_contexts()

Clear all cached whisper contexts to free memory.

Example:

import whisper_parallel_cpu

# Clear all cached contexts
whisper_parallel_cpu.clear_contexts()

get_context_count()

Get the number of currently cached whisper contexts.

Returns:

  • int: Number of cached contexts

Example:

import whisper_parallel_cpu

# Check how many contexts are cached
count = whisper_parallel_cpu.get_context_count()
print(f"Active contexts: {count}")

🀝 Contributing

  • Fork the repository
  • Create a feature branch: git checkout -b feature-name
  • Make your changes and test thoroughly
  • Commit your changes: git commit -m 'Add feature'
  • Push to the branch: git push origin feature-name
  • Submit a pull request

πŸ“„ License

MIT License - see LICENSE file for details.

πŸ™ Acknowledgments

  • Built on whisper.cpp by Georgi Gerganov
  • Uses pybind11 for Python bindings
  • Model management inspired by the original OpenAI Whisper project
