whisper-node-server

Local audio transcription on CPU. Node.js bindings for OpenAI's Whisper. Modified from node-whisper

Version: 1.0.0 (latest)
Source: npm
Version published: 7 months ago
Maintainers: 0
Created: 7 months ago

Node.js bindings for OpenAI's Whisper. Transcription is done locally.

Features

Output transcripts to JSON (also .txt .srt .vtt)
Optimized for CPU (Including Apple Silicon ARM)
Timestamp precision to single word
Server mode with automatic audio conversion
Optional CUDA support for GPU acceleration

Installation

Add dependency to project

npm install whisper-node-server

Download whisper model of choice [OPTIONAL]

npx whisper-node-server download

Build whisper.cpp

Windows

Use w64devkit and CMake.

Usage

Direct Usage

import whisper from 'whisper-node-server';

const transcript = await whisper("example/sample.wav");

console.log(transcript); // output: [ {start,end,speech} ]

Server Mode

Set up environment variables:

WHISPER_MODEL=base.en
AUDIO_SAMPLE_RATE=16000
AUDIO_CHANNELS=1
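
Because the server example interpolates these values into an FFmpeg command, it may be worth validating them before use. The following is a minimal sketch, not part of the package: `readAudioConfig` and `parsePositiveInt` are hypothetical helper names, and the fallback values are simply the ones shown above.

```javascript
// Hypothetical helper (not provided by whisper-node-server): accepts only
// strings made of digits, so nothing unexpected reaches the command line.
function parsePositiveInt(name, raw, fallback) {
  const value = raw ?? fallback;
  if (!/^[1-9]\d*$/.test(value)) {
    throw new Error(`Invalid ${name}: ${value}`);
  }
  return Number(value);
}

// Reads the three variables from an env-like object such as process.env.
// The defaults are assumptions taken from the example values above.
function readAudioConfig(env) {
  return {
    modelName: env.WHISPER_MODEL ?? 'base.en',
    sampleRate: parsePositiveInt('AUDIO_SAMPLE_RATE', env.AUDIO_SAMPLE_RATE, '16000'),
    channels: parsePositiveInt('AUDIO_CHANNELS', env.AUDIO_CHANNELS, '1'),
  };
}
```

A server could call `readAudioConfig(process.env)` once at startup and use the returned numbers instead of reading `process.env` inside the request handler.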

Create the server:

import express from 'express';
import multer from 'multer';
import whisper from 'whisper-node-server';
import { exec } from 'child_process';
import { promisify } from 'util';
import fs from 'fs';

const app = express();
const upload = multer({ dest: 'uploads/' });
const execPromise = promisify(exec);

// Transcribe endpoint
app.post('/transcribe', upload.single('audio'), async (req, res) => {
  try {
    if (!req.file) {
      return res.status(400).send('No audio file uploaded');
    }

    const inputPath = req.file.path;
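    // Multer stores uploads without a file extension, so append a suffix;
    // replacing ".wav" would match nothing and leave both paths identical
    const outputPath = `${inputPath}_converted.wav`;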

    // Convert audio to configured sample rate using FFmpeg
    await execPromise(`ffmpeg -y -i "${inputPath}" -ar ${process.env.AUDIO_SAMPLE_RATE} -ac ${process.env.AUDIO_CHANNELS} -c:a pcm_s16le "${outputPath}"`);

    // Transcribe the audio
    const options = {
      modelName: process.env.WHISPER_MODEL,
      whisperOptions: {
        language: 'auto',
        word_timestamps: true
      }
    };

    const transcript = await whisper(outputPath, options);

    // Clean up temp files
    fs.unlinkSync(inputPath);
    fs.unlinkSync(outputPath);

    // Extract speech text
    const text = transcript ? (Array.isArray(transcript) ? 
      transcript.map(t => t.speech).join(' ') : 
      transcript.toString()) : '';
      
    res.json({ text });

  } catch (error) {
    console.error('Transcription error:', error);
    res.status(500).send('Error processing audio: ' + error.message);
  }
});

app.listen(8080, () => {
  console.log('Server running on port 8080');
});

Send audio for transcription:

// Convert your audio to a blob
const wavBlob = await float32ArrayToWav(audio);
const formData = new FormData();
formData.append('audio', wavBlob, 'recording.wav');

// Send to server
const response = await fetch('http://localhost:8080/transcribe', {
  method: 'POST',
  body: formData,
});

if (!response.ok) {
  throw new Error('Transcription failed');
}

const data = await response.json();
console.log('Transcription:', data.text);
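
The client snippet above calls `float32ArrayToWav`, which is not defined in this README. One possible implementation is sketched below, assuming mono Float32 samples in the range [-1, 1] and 16-bit PCM output. The `encodeWav` helper name and the 16000 default sample rate are assumptions for illustration.

```javascript
// Hypothetical helper: encodes mono Float32 samples as a 16-bit PCM WAV file
// and returns the raw bytes as an ArrayBuffer.
function encodeWav(samples, sampleRate = 16000) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeString = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                // file size minus 8
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // format 1 = PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);                  // sample rate
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  // Clamp each sample to [-1, 1] and scale it to a signed 16-bit integer.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// Wraps the encoded bytes in a Blob so they can be appended to FormData.
function float32ArrayToWav(samples, sampleRate = 16000) {
  return new Blob([encodeWav(samples, sampleRate)], { type: 'audio/wav' });
}
```

Since the server re-encodes uploads with FFmpeg anyway, the sample rate written here only needs to match the rate at which the audio was actually captured.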

Output (JSON)

[
  {
    "start":  "00:00:14.310", // time stamp begin
    "end":    "00:00:16.480", // time stamp end
    "speech": "howdy"         // transcription
  }
]
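
The `start` and `end` values are strings rather than numbers, so any arithmetic on them needs a parsing step first. A small sketch follows, assuming the `HH:MM:SS.mmm` format shown above; `timestampToSeconds` and `segmentDuration` are hypothetical helper names, not part of the package.

```javascript
// Converts a timestamp such as "00:00:14.310" to a number of seconds.
function timestampToSeconds(timestamp) {
  const [hours, minutes, seconds] = timestamp.split(':').map(Number);
  return hours * 3600 + minutes * 60 + seconds;
}

// Length of one transcript segment in seconds.
function segmentDuration(segment) {
  return timestampToSeconds(segment.end) - timestampToSeconds(segment.start);
}

const example = { start: '00:00:14.310', end: '00:00:16.480', speech: 'howdy' };
console.log(segmentDuration(example).toFixed(2)); // prints "2.17"
```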

Full Options List

import whisper from 'whisper-node-server';

const filePath = "example/sample.wav"; // required

const options = {
  modelName: "base.en",       // default
  // modelPath: "/custom/path/to/model.bin", // use model in a custom directory (cannot use along with 'modelName')
  whisperOptions: {
    language: 'auto',         // default (use 'auto' for auto detect)
    gen_file_txt: false,      // outputs .txt file
    gen_file_subtitle: false, // outputs .srt file
    gen_file_vtt: false,      // outputs .vtt file
    word_timestamps: true     // timestamp for every word
    // timestamp_size: 0      // cannot use along with word_timestamps:true
  }
}

const transcript = await whisper(filePath, options);
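
The comments in the options list mention two combinations that cannot be used together. A caller could check for them before invoking `whisper`. The sketch below is a hypothetical pre-flight check (`checkOptions` is not part of the package), and how the library itself reacts to these combinations is not documented here.

```javascript
// Hypothetical helper: returns a list of problems found in an options object,
// based only on the constraints noted in the comments above.
function checkOptions(options = {}) {
  const problems = [];
  if (options.modelName !== undefined && options.modelPath !== undefined) {
    problems.push("'modelName' and 'modelPath' cannot be used together");
  }
  const whisperOptions = options.whisperOptions ?? {};
  if (whisperOptions.word_timestamps === true && whisperOptions.timestamp_size !== undefined) {
    problems.push("'timestamp_size' cannot be used with word_timestamps: true");
  }
  return problems;
}
```

An empty array means neither of the two documented conflicts is present.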

Input File Format

Files must be .wav and sampled at 16 kHz

Example .mp3 file converted with an FFmpeg command: ffmpeg -i input.mp3 -ar 16000 output.wav
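
When running this conversion from Node.js, passing the arguments as an array avoids shell quoting problems with unusual file names. The sketch below only builds the argument list. `buildFfmpegArgs` is a hypothetical helper, and the flags are the ones already used in this README.

```javascript
// Hypothetical helper: builds the FFmpeg argument list for converting any
// input file to 16-bit PCM WAV at the given sample rate and channel count.
function buildFfmpegArgs(inputPath, outputPath, sampleRate = 16000, channels = 1) {
  return [
    '-y',                      // overwrite the output file if it exists
    '-i', inputPath,
    '-ar', String(sampleRate), // output sample rate in Hz
    '-ac', String(channels),   // number of output channels
    '-c:a', 'pcm_s16le',       // 16-bit little-endian PCM
    outputPath,
  ];
}
```

The resulting array can be passed to `execFile('ffmpeg', args)` from Node's `child_process` module, which runs the binary without going through a shell.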

Made with

Modifying whisper-node-server

npm run dev - runs nodemon and tsc on '/src/test.ts'

npm run build - runs tsc, outputs to '/dist' and gives sh permission to 'dist/download.js'

Acknowledgements

Package last updated on 14 Dec 2024
