
@qvac/transcription-whispercpp
This library simplifies running inference with the Whisper transcription model within QVAC runtime applications. It provides an easy interface to load, execute, and manage Whisper inference instances, supporting multiple data sources (data loaders).
Note: This library now uses whisper.cpp for improved performance and compatibility. The previous MLC-based implementation has been replaced.
| Platform | Architecture | Min Version | Status | GPU Support |
|---|---|---|---|---|
| macOS | arm64, x64 | 14.0+ | ✅ Tier 1 | Metal |
| iOS | arm64 | 17.0+ | ✅ Tier 1 | Metal |
| Linux | arm64, x64 | Ubuntu-22+ | ✅ Tier 1 | Vulkan |
| Android | arm64 | 12+ | ✅ Tier 1 | Vulkan |
| Windows | x64 | 10+ | ✅ Tier 1 | Vulkan |
Dependencies:
Make sure Bare Runtime is installed:
npm install -g bare bare-make
Note: Make sure the Bare version is >= 1.24.2. Check this using:
bare -v
Install the latest development version (adjust package name based on desired model/quantization):
npm install @qvac/transcription-whispercpp@latest
For local development, you'll need to build the native addon that interfaces with the Whisper model. Follow these steps:
First, make sure you have the prerequisites installed as described in the Installation section.
Supported Platforms:
All Platforms:
This project uses vcpkg for C++ dependency management. You need to install and configure vcpkg before building:
1. Install vcpkg:
# Clone vcpkg
git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
# Bootstrap vcpkg
# On Linux/macOS:
./bootstrap-vcpkg.sh
# On Windows:
.\bootstrap-vcpkg.bat
2. Set Environment Variable:
# Linux/macOS (add to your shell profile):
export VCPKG_ROOT=/path/to/vcpkg
# Windows (PowerShell):
$env:VCPKG_ROOT = "C:\path\to\vcpkg"
# Or set permanently via System Properties > Environment Variables
3. Integrate vcpkg (optional but recommended):
# This makes vcpkg available to all CMake projects
./vcpkg integrate install
Linux:
# Ubuntu/Debian
sudo apt update
sudo apt install build-essential cmake git pkg-config
# CentOS/RHEL/Fedora
sudo yum groupinstall "Development Tools"
sudo yum install cmake git pkgconfig
# or for newer versions:
sudo dnf groupinstall "Development Tools"
sudo dnf install cmake git pkgconfig
macOS:
# Install Xcode Command Line Tools
xcode-select --install
# Using Homebrew (recommended)
brew install cmake git
# Using MacPorts
sudo port install cmake git
Windows:
Metal (macOS):
Vulkan (Linux/Windows):
Linux Vulkan Setup:
# Ubuntu/Debian
sudo apt install vulkan-tools libvulkan-dev vulkan-utility-libraries-dev spirv-tools
# CentOS/RHEL/Fedora
sudo yum install vulkan-tools vulkan-devel vulkan-validation-layers-devel spirv-tools
Windows Vulkan Setup:
git clone https://github.com/tetherto/qvac.git
cd qvac/packages/qvac-lib-infer-whispercpp
# Initialize submodules (required for native dependencies)
git submodule update --init --recursive
# Install dependencies
npm install
# Build the native addon
npm run build
This command runs the complete build sequence:
- `bare-make generate` - generates the build configuration
- `bare-make build` - compiles the native C++ addon
- `bare-make install` - installs the built addon

After building, you can run the test suite:
# Run all tests (lint + unit + integration)
npm test
# Or run tests individually
npm run test:unit
npm run test:integration
Integration tests cover both the chunked reload flow (test/integration/audio-ctx-chunking.test.js) and the live streaming flow (test/integration/live-stream-simulation.test.js), so running them is the quickest way to verify those end-to-end scenarios after changes.
For ongoing development, the typical workflow is:
npm install && npm run build && npm run test:integration
The library provides a straightforward workflow for audio transcription:
Heads up: the package is intended to be used through `index.js`'s `TranscriptionWhispercpp` class. Advanced sections below document the native addon for completeness, but you rarely need them when integrating the published npm package.
Data loaders abstract the way model files are accessed, whether from the filesystem, a network drive, or any other storage mechanism. More info about model registry and model builds in resources.
First, select and instantiate a data loader that provides access to model files:
const FilesystemDL = require('@qvac/dl-filesystem')
const fsDL = new FilesystemDL({
dirPath: './path/to/model/files' // Directory containing model weights and settings
})
Most users interact with the addon exclusively through index.js. From that entrypoint we surface a small, safe subset of options; everything else keeps whisper.cpp defaults.
| Section | Key | Description |
|---|---|---|
| contextParams | model | Absolute or relative path to the .bin whisper model |
| contextParams | (all other context keys) | Kept at their defaults, because changing them forces a full reload (see below) |
| whisperConfig | (any whisper_full_params key) | Forwarded untouched. We surface convenience defaults in index.js, but every whisper.cpp flag is accepted (see Advanced configuration). |
| miscConfig | caption_enabled | Formats segments with `<\|start\|>..<\|end\|>` markers |
Internally, `WhisperModel::configContextIsChanged()` watches `model`, `use_gpu`, `flash_attn`, and `gpu_device`. If any of these change, we must:
1. `unload()` (destroys the current `whisper_context` and `whisper_state`).
2. Re-initialize via `whisper_init_from_file_with_params`.

Depending on model size this can take several seconds. Everything else in `whisperConfig` (language, temperatures, VAD settings, etc.) is applied in place and does not trigger a reload. If you are seeing unexpected pauses, double-check that you are not mutating these four context keys between jobs.
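To make the reload rule concrete, here is a small JavaScript sketch of that check. It is illustrative only; the real logic lives in the C++ `configContextIsChanged()` method, and this mirror exists just to show which keys are compared.

```javascript
// Illustrative JS mirror of WhisperModel::configContextIsChanged():
// only these four context keys force a full model reload.
const RELOAD_KEYS = ['model', 'use_gpu', 'flash_attn', 'gpu_device']

function needsFullReload (prevContextParams, nextContextParams) {
  // A change to any reload key destroys the whisper_context and
  // re-runs whisper_init_from_file_with_params (slow for big models).
  return RELOAD_KEYS.some(key => prevContextParams[key] !== nextContextParams[key])
}

// Changing only whisperConfig values (language, temperature, VAD, ...)
// never touches these keys, so no reload is triggered.
console.log(needsFullReload({ model: 'a.bin' }, { model: 'a.bin' })) // false
console.log(needsFullReload({ model: 'a.bin' }, { model: 'b.bin' })) // true
```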
Need more than the handful of options exposed in index.js? The upstream whisper.cpp documentation lists every flag available through whisper_full_params. Rather than duplicating that matrix here, refer to:
- `whisper_full_params` (every flag, documented upstream)
- `examples/example.audio-ctx-chunking.js` (shows `offset_ms`, `duration_ms`, `audio_ctx`, and reload loops)
- `examples/example.live-transcription.js` (shows streaming chunks into a single job)

Those scripts stay in sync with the codebase and are the best place to copy from when you need the raw addon surface.
Quick JS-level configuration (what you typically pass to new TranscriptionWhispercpp(...)):
const config = {
contextParams: {
model: './models/ggml-tiny.bin'
},
whisperConfig: {
language: 'en',
duration_ms: 0,
temperature: 0.0,
suppress_nst: true,
n_threads: 0,
vad_model_path: './models/ggml-silero-v5.1.2.bin',
vadParams: {
threshold: 0.6,
min_speech_duration_ms: 250,
min_silence_duration_ms: 200
}
},
miscConfig: {
caption_enabled: false
}
}
Between this minimal configuration and the example scripts you should have everything needed, whether you are wiring the addon by hand or just instantiating TranscriptionWhispercpp.
Available Whisper Models (from HuggingFace):
| Model | Size | Description |
|---|---|---|
| ggml-tiny.bin | 78 MB | Smallest, fastest |
| ggml-base.bin | 148 MB | Balanced size/accuracy |
| ggml-small.bin | 488 MB | Better accuracy |
| ggml-medium.bin | 1.5 GB | High accuracy |
| ggml-large-v3.bin | 3.1 GB | Best accuracy |
| ggml-large-v3-turbo.bin | 1.6 GB | Near large-v3 accuracy, faster |
Quantized variants (q8_0, q5_1, q5_0) are also available for all sizes.
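If you want to derive a quantized filename programmatically, a small helper like the one below works. It assumes the `ggml-<size>-<quant>.bin` naming convention used on the whisper.cpp HuggingFace repository; verify against the repo's file list before relying on it.

```javascript
// Derive a quantized model filename from a base one, assuming the
// ggml-<size>-<quant>.bin convention of the whisper.cpp HF repo.
function quantizedName (baseName, quant) {
  return baseName.replace(/\.bin$/, `-${quant}.bin`)
}

console.log(quantizedName('ggml-tiny.bin', 'q5_1')) // ggml-tiny-q5_1.bin
console.log(quantizedName('ggml-large-v3-turbo.bin', 'q8_0')) // ggml-large-v3-turbo-q8_0.bin
```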
VAD Model (from HuggingFace):
| Model | Size | Description |
|---|---|---|
ggml-silero-v5.1.2.bin | 885 KB | Silero VAD for voice activity detection |
Use the provided script to download models from HuggingFace:
npm run download-models
Or download manually with curl:
# Whisper model
curl -L -o models/ggml-tiny.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin
# VAD model
curl -L -o models/ggml-silero-v5.1.2.bin https://huggingface.co/ggml-org/whisper-vad/resolve/main/ggml-silero-v5.1.2.bin
Import the specific Whisper model class based on the installed package and instantiate it:
const TranscriptionWhispercpp = require('@qvac/transcription-whispercpp')
const model = new TranscriptionWhispercpp(args, config)
Note: The required import depends on which package is installed.
Load the model weights and initialize the inference engine. Optionally provide a callback for progress updates:
try {
// Basic usage
await model.load()
// Advanced usage with progress tracking
await model.load(
false, // Don't close loader after loading
(progress) => console.log(`Loading: ${progress.overallProgress}% complete`)
)
} catch (error) {
console.error('Failed to load model:', error)
}
Progress Callback Data
The progress callback receives an object with the following properties:
| Property | Type | Description |
|---|---|---|
| action | string | Current operation being performed |
| totalSize | number | Total bytes to be loaded |
| totalFiles | number | Total number of files to process |
| filesProcessed | number | Number of files completed so far |
| currentFile | string | Name of the file currently being processed |
| currentFileProgress | string | Percentage progress on the current file |
| overallProgress | string | Overall loading progress percentage |
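As an illustration, a callback could fold these fields into a single log line. The field names are taken from the table above; the formatting itself is just an example.

```javascript
// Render the progress-callback payload (fields from the table above)
// as a one-line status string. Purely illustrative formatting.
function formatProgress (p) {
  return `${p.action}: file ${p.filesProcessed + 1}/${p.totalFiles} ` +
    `(${p.currentFile} at ${p.currentFileProgress}%), overall ${p.overallProgress}%`
}

console.log(formatProgress({
  action: 'loading',
  totalSize: 78000000,
  totalFiles: 2,
  filesProcessed: 0,
  currentFile: 'ggml-tiny.bin',
  currentFileProgress: '42',
  overallProgress: '21'
}))
// loading: file 1/2 (ggml-tiny.bin at 42%), overall 21%
```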
Pass an audio stream (e.g., from bare-fs.createReadStream) to the run method. Process the transcription results asynchronously.
There are two ways to receive transcription results:
onUpdate()
The onUpdate() callback receives each transcription segment in real time as whisper.cpp generates it during processing. This is ideal for live transcription display or progressive updates.
try {
const audioStream = fs.createReadStream('path/to/your/audio.ogg', {
highWaterMark: 16000 // Adjust based on bitrate (e.g., 128000 / 8)
})
const response = await model.run(audioStream)
// Receive segments as they are transcribed (real-time streaming)
await response
.onUpdate(segment => {
console.log('New segment transcribed:', segment)
// Each segment arrives immediately after whisper.cpp processes it
})
.await() // Wait for transcription to complete
console.log('Transcription finished!')
} catch (error) {
console.error('Transcription failed:', error)
}
iterate()
The iterate() method returns all transcription segments after the entire transcription completes. This is useful when you need the full result before processing.
try {
const audioStream = fs.createReadStream('path/to/your/audio.ogg', {
highWaterMark: 16000
})
const response = await model.run(audioStream)
// Wait for complete transcription, then iterate over all segments
for await (const transcriptionChunk of response.iterate()) {
console.log('Transcription chunk:', transcriptionChunk)
}
console.log('Transcription finished!')
} catch (error) {
console.error('Transcription failed:', error)
}
Key Differences:
- `onUpdate()`: real-time streaming; segments arrive as they are generated by whisper.cpp's `new_segment_callback`
- `iterate()`: batch processing; all segments become available after the transcription completes

`examples/example.audio-ctx-chunking.js` shows the production pattern: reuse a model instance, call `reload()` with `{ offset_ms, duration_ms, audio_ctx }` per chunk (the first chunk uses `audio_ctx = 0`; subsequent ones clamp to ~1500), then run the full audio stream. The matching integration test (test/integration/audio-ctx-chunking.test.js) exercises exactly the same flow.
examples/example.live-transcription.js feeds tiny PCM buffers into a pushable Readable, keeps a single model.run(...) open, and relies on onUpdate() for incremental text. test/integration/live-stream-simulation.test.js covers both the streaming case and a segmented loop without any reload() calls.
Always unload the model when finished to free up memory and resources:
try {
await model.unload()
} catch (error) {
console.error('Failed to unload model:', error)
}
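A common way to make that cleanup unconditional is a small wrapper that unloads in a `finally` block. The `load()`/`unload()` method names match the calls shown earlier; the wrapper itself is an illustrative pattern, not part of the library API.

```javascript
// Guarantee unload() runs even when the transcription job throws,
// so native memory is always freed. Illustrative helper, not library API.
async function withModel (model, job) {
  await model.load()
  try {
    return await job(model)
  } finally {
    await model.unload() // always runs, success or failure
  }
}

// Usage sketch:
// await withModel(model, async (m) => {
//   const response = await m.run(audioStream)
//   await response.onUpdate(console.log).await()
// })
```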
git clone https://github.com/tetherto/qvac-lib-infer-whispercpp.git
cd qvac-lib-infer-whispercpp
npm install
More examples live in the examples folder. Run the quickstart with:
bare examples/quickstart.js
See examples/quickstart.js for the full workflow (TranscriptionWhispercpp + filesystem loader), including streaming audio and cleanup.
We conduct comprehensive benchmarking of our Whisper transcription models to evaluate their performance across different audio conditions and metrics. The evaluations are performed using the LibriSpeech dataset, a standard benchmark for speech recognition tasks.
Our benchmarking suite measures transcription accuracy using Word Error Rate (WER) and Character Error Rate (CER), along with performance metrics such as model load times and inference speeds.
For detailed benchmark results across all supported audio conditions and model configurations, see our Benchmark Results Summary.
The benchmarking covers:
Results are updated regularly as new model versions are released.
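For reference, WER is the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that metric, not our benchmarking harness.

```javascript
// Minimal Word Error Rate: Levenshtein distance over word tokens,
// normalized by the reference length (assumes a non-empty reference).
function wer (reference, hypothesis) {
  const ref = reference.trim().split(/\s+/)
  const hyp = hypothesis.trim().split(/\s+/)
  // d[i][j] = edit distance between ref[0..i) and hyp[0..j)
  const d = Array.from({ length: ref.length + 1 }, (_, i) =>
    Array.from({ length: hyp.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  )
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const sub = ref[i - 1] === hyp[j - 1] ? 0 : 1
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
    }
  }
  return d[ref.length][hyp.length] / ref.length
}

console.log(wer('the cat sat', 'the cat sat')) // 0
console.log(wer('the cat sat', 'the bat sat')) // one substitution / three words
```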
- Chunked transcription: reload with `offset_ms`, `duration_ms`, and `audio_ctx` per chunk (mirrors the `audio-ctx-chunking` integration test).
- Live streaming (mirrors the `live-stream-simulation` integration test).
• Bare – Small and modular JavaScript runtime for desktop and mobile. Learn more.
• QVAC – QVAC is our open-source AI-SDK for building decentralized AI applications.
• Corestore – Corestore is a Hypercore factory that makes it easier to manage large collections of named Hypercores. Learn more.
All error codes thrown by this library are in the range 6001-7000.
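If you want to route library errors separately from other failures, a range check like the one below works. It assumes the errors expose a numeric `code` property; verify against the errors your version of the library actually throws.

```javascript
// Illustrative check against the documented 6001-7000 error-code range.
// Assumes a numeric `code` property on thrown errors.
function isWhispercppError (err) {
  return typeof err.code === 'number' && err.code >= 6001 && err.code <= 7000
}

console.log(isWhispercppError({ code: 6042 })) // true
console.log(isWhispercppError({ code: 404 })) // false
```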
This project is licensed under the Apache-2.0 License – see the LICENSE file for details.
For questions or issues, please open an issue on the GitHub repository.