
whisper-cpp-node
Node.js bindings for whisper.cpp - fast speech-to-text with GPU acceleration.
Supported platforms: macOS (arm64) and Windows (x64).
npm install whisper-cpp-node
# or
pnpm add whisper-cpp-node
The platform-specific binary is automatically installed:
@whisper-cpp-node/darwin-arm64
@whisper-cpp-node/win32-x64

import {
  createWhisperContext,
  transcribeAsync,
} from "whisper-cpp-node";
// Create a context with your model
const ctx = createWhisperContext({
  model: "./models/ggml-base.en.bin",
  use_gpu: true,
});

// Transcribe audio file
const result = await transcribeAsync(ctx, {
  fname_inp: "./audio.wav",
  language: "en",
});

// Result: { segments: [["00:00:00,000", "00:00:02,500", " Hello world"], ...] }
for (const [start, end, text] of result.segments) {
  console.log(`[${start} --> ${end}]${text}`);
}

// Clean up
ctx.free();
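When only the plain transcript matters, the segment tuples can be flattened with a small helper. A sketch (segmentsToText is our name for illustration, not a package export):

```typescript
// Collapse [start, end, text] tuples into one plain transcript string.
// (Helper sketch - not part of whisper-cpp-node itself.)
function segmentsToText(segments: [string, string, string][]): string {
  // Segment text carries a leading space, so trim before joining
  return segments.map(([, , text]) => text.trim()).join(" ");
}
```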
import {
  createWhisperContext,
  transcribeAsync,
} from "whisper-cpp-node";

const ctx = createWhisperContext({
  model: "./models/ggml-base.en.bin",
  use_gpu: true,
});

// Pass raw PCM audio (16kHz, mono, float32)
const pcmData = new Float32Array(/* your audio samples */);

const result = await transcribeAsync(ctx, {
  pcmf32: pcmData,
  language: "en",
});

for (const [start, end, text] of result.segments) {
  console.log(`[${start} --> ${end}]${text}`);
}

ctx.free();
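Capture sources often hand you 16-bit signed integer PCM rather than float32. A minimal conversion sketch, assuming the usual divide-by-32768 normalization (int16ToFloat32 is a hypothetical helper, not a package export):

```typescript
// Convert 16-bit signed PCM samples to the float32 [-1.0, 1.0] range
// that transcribeAsync expects in `pcmf32`.
// (Helper sketch - not part of whisper-cpp-node itself.)
function int16ToFloat32(input: Int16Array): Float32Array {
  const out = new Float32Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // 0x8000 = 32768, the magnitude of the most negative int16 value
    out[i] = input[i] / 0x8000;
  }
  return out;
}
```

Note this only handles sample format, not sample rate: audio captured at 44.1kHz or 48kHz still needs resampling to 16kHz mono first.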
Get real-time output as audio is processed. The on_new_segment callback fires for each segment as it's generated, while the final callback still receives all segments at completion (backward compatible):
import { createWhisperContext, transcribe } from "whisper-cpp-node";

const ctx = createWhisperContext({
  model: "./models/ggml-base.en.bin",
});

transcribe(ctx, {
  fname_inp: "./long-audio.wav",
  language: "en",
  // Called for each segment as it's generated
  on_new_segment: (segment) => {
    console.log(`[${segment.start}]${segment.text}`);
  },
}, (err, result) => {
  // Final callback still receives ALL segments at completion
  console.log(`Done! ${result.segments.length} segments`);
  ctx.free();
});
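For progress display during streaming it can help to turn each segment's "HH:MM:SS,mmm" timestamp back into milliseconds. A sketch (timestampToMs is our helper for illustration, not part of the package):

```typescript
// Parse a "HH:MM:SS,mmm" timestamp (as delivered in segment.start/end)
// into milliseconds, e.g. to drive a progress bar during streaming.
// (Helper sketch - not part of whisper-cpp-node itself.)
function timestampToMs(ts: string): number {
  const [hms, ms] = ts.split(",");
  const [h, m, s] = hms.split(":").map(Number);
  return ((h * 60 + m) * 60 + s) * 1000 + Number(ms);
}
```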
createWhisperContext(options)

Create a persistent context for transcription.
interface WhisperContextOptions {
  model: string; // Path to GGML model file (required)
  use_gpu?: boolean; // Enable GPU acceleration (default: true)
                     // Uses Metal on macOS, Vulkan on Windows
  use_coreml?: boolean; // Enable Core ML on macOS (default: false)
  use_openvino?: boolean; // Enable OpenVINO encoder on Intel (default: false)
  openvino_device?: string; // OpenVINO device: 'CPU', 'GPU', 'NPU' (default: 'CPU')
  openvino_model_path?: string; // Path to OpenVINO encoder model (auto-derived)
  openvino_cache_dir?: string; // Cache dir for compiled OpenVINO models
  flash_attn?: boolean; // Enable Flash Attention (default: false)
  gpu_device?: number; // GPU device index (default: 0)
  dtw?: string; // DTW preset for word timestamps
  no_prints?: boolean; // Suppress log output (default: false)
}
transcribeAsync(context, options)

Transcribe audio (Promise-based). Accepts either a file path or PCM buffer.
// File input
interface TranscribeOptionsFile {
  fname_inp: string; // Path to audio file
  // ... common options
}

// Buffer input
interface TranscribeOptionsBuffer {
  pcmf32: Float32Array; // Raw PCM (16kHz, mono, float32, -1.0 to 1.0)
  // ... common options
}

// Common options (partial list - see types.ts for full options)
interface TranscribeOptionsBase {
  // Language
  language?: string; // Language code ('en', 'zh', 'auto')
  translate?: boolean; // Translate to English
  detect_language?: boolean; // Auto-detect language

  // Threading
  n_threads?: number; // CPU threads (default: 4)
  n_processors?: number; // Parallel processors

  // Audio processing
  offset_ms?: number; // Start offset in ms
  duration_ms?: number; // Duration to process (0 = all)

  // Output control
  no_timestamps?: boolean; // Disable timestamps
  max_len?: number; // Max segment length (chars)
  max_tokens?: number; // Max tokens per segment
  split_on_word?: boolean; // Split on word boundaries
  token_timestamps?: boolean; // Include token-level timestamps

  // Sampling
  temperature?: number; // Sampling temperature (0.0 = greedy)
  beam_size?: number; // Beam search size (-1 = greedy)
  best_of?: number; // Best-of-N sampling

  // Thresholds
  entropy_thold?: number; // Entropy threshold
  logprob_thold?: number; // Log probability threshold
  no_speech_thold?: number; // No-speech probability threshold

  // Context
  prompt?: string; // Initial prompt text
  no_context?: boolean; // Don't use previous context

  // VAD preprocessing
  vad?: boolean; // Enable VAD preprocessing
  vad_model?: string; // Path to VAD model
  vad_threshold?: number; // VAD threshold (0.0-1.0)
  vad_min_speech_duration_ms?: number;
  vad_min_silence_duration_ms?: number;
  vad_speech_pad_ms?: number;

  // Callbacks
  progress_callback?: (progress: number) => void;
  on_new_segment?: (segment: StreamingSegment) => void; // Streaming callback
}

// Streaming segment (passed to on_new_segment callback)
interface StreamingSegment {
  start: string; // Start timestamp "HH:MM:SS,mmm"
  end: string; // End timestamp
  text: string; // Transcribed text
  segment_index: number; // 0-based index
  is_partial: boolean; // Reserved for future use
  tokens?: StreamingToken[]; // Only if token_timestamps enabled
}

// Result
interface TranscribeResult {
  segments: TranscriptSegment[];
}

// Segment is a tuple: [start, end, text]
type TranscriptSegment = [string, string, string];
// Example: ["00:00:00,000", "00:00:02,500", " Hello world"]
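Since the timestamps already use SRT's "HH:MM:SS,mmm" format, turning a result into a .srt subtitle file is mostly string assembly. A sketch over the tuple shape above (toSrt is our helper for illustration, not a package export):

```typescript
// Same tuple shape as TranscribeResult.segments
type TranscriptSegment = [string, string, string];

// Render segments as SubRip (.srt) text. The "HH:MM:SS,mmm" timestamps
// pass through unchanged because SRT uses the same format.
// (Helper sketch - not part of whisper-cpp-node itself.)
function toSrt(segments: TranscriptSegment[]): string {
  return segments
    .map(([start, end, text], i) => `${i + 1}\n${start} --> ${end}\n${text.trim()}\n`)
    .join("\n");
}
```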
createVadContext(options)

Create a voice activity detection context for streaming audio.
interface VadContextOptions {
  model: string; // Path to Silero VAD model
  threshold?: number; // Speech threshold (default: 0.5)
  n_threads?: number; // Number of threads (default: 1)
  no_prints?: boolean; // Suppress log output
}

interface VadContext {
  getWindowSamples(): number; // Returns 512 (32ms at 16kHz)
  getSampleRate(): number; // Returns 16000
  process(samples: Float32Array): number; // Returns probability 0.0-1.0
  reset(): void; // Reset LSTM state
  free(): void; // Release resources
}
import { createVadContext } from "whisper-cpp-node";

const vad = createVadContext({
  model: "./models/ggml-silero-v6.2.0.bin",
  threshold: 0.5,
});

const windowSize = vad.getWindowSamples(); // 512 samples

// Process audio in 32ms chunks
function processAudioChunk(samples: Float32Array) {
  const probability = vad.process(samples);
  if (probability >= 0.5) {
    console.log("Speech detected!", probability);
  }
}

// Reset when starting new audio stream
vad.reset();

// Clean up when done
vad.free();
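Raw per-window probabilities tend to flicker around the threshold, so streaming pipelines usually add a hangover period before closing a speech segment. A minimal sketch (SpeechGate and its defaults are our invention for illustration, not part of the package):

```typescript
// A minimal speech gate with hangover: stays "open" for `hangoverWindows`
// windows after the probability drops below the threshold, so short pauses
// inside an utterance don't split it. Feed it one VAD probability per window.
// (Helper sketch - not part of whisper-cpp-node itself.)
class SpeechGate {
  private remaining = 0;

  constructor(
    private threshold = 0.5,
    private hangoverWindows = 15, // ~480ms at 32ms per window
  ) {}

  // Returns true while the gate considers the stream to be speech.
  update(probability: number): boolean {
    if (probability >= this.threshold) {
      this.remaining = this.hangoverWindows;
      return true;
    }
    if (this.remaining > 0) {
      this.remaining--;
      return true;
    }
    return false;
  }
}
```

In a streaming setup you would call `gate.update(vad.process(window))` per window, buffering audio while the gate is open and handing the buffered PCM to transcribeAsync when it closes.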
For 3x+ faster encoding on Apple Silicon:
Generate a Core ML model:
pip install ane_transformers openai-whisper coremltools
./models/generate-coreml-model.sh base.en
Place it next to your GGML model:
models/ggml-base.en.bin
models/ggml-base.en-encoder.mlmodelc/
Enable Core ML:
const ctx = createWhisperContext({
  model: "./models/ggml-base.en.bin",
  use_coreml: true,
});
For faster encoder inference on Intel CPUs and GPUs (requires build with OpenVINO support):
Install OpenVINO and convert the model:
pip install openvino openvino-dev
python models/convert-whisper-to-openvino.py --model base.en
The OpenVINO model files are placed next to your GGML model:
models/ggml-base.en.bin
models/ggml-base.en-encoder-openvino.xml
models/ggml-base.en-encoder-openvino.bin
Enable OpenVINO:
const ctx = createWhisperContext({
  model: "./models/ggml-base.en.bin",
  use_openvino: true,
  openvino_device: "CPU", // or "GPU" for Intel iGPU
  openvino_cache_dir: "./openvino_cache", // optional, speeds up init
});
Note: OpenVINO support requires the addon to be built with -DADDON_OPENVINO=ON.
Download models from Hugging Face:
# Base English model (~150MB)
curl -L -o models/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin

# Large v3 Turbo quantized (~500MB)
curl -L -o models/ggml-large-v3-turbo-q4_0.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo-q4_0.bin

# Silero VAD model (for streaming VAD)
curl -L -o models/ggml-silero-v6.2.0.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-silero-v6.2.0.bin
License

MIT