whisper-node-server

Local audio transcription on CPU. Node.js bindings for OpenAI's Whisper. Modified from node-whisper

Version 1.0.0 (latest), published on npm


Node.js bindings for OpenAI's Whisper. Transcription runs locally.

Features

  • Output transcripts to JSON (also .txt .srt .vtt)
  • Optimized for CPU (Including Apple Silicon ARM)
  • Timestamp precision down to a single word
  • Server mode with automatic audio conversion
  • Optional CUDA support for GPU acceleration

Installation

  • Add dependency to project
npm install whisper-node-server
  • Download whisper model of choice [OPTIONAL]
npx whisper-node-server download
  • Build whisper.cpp

Windows

Use w64devkit and CMake to build.

Usage

Direct Usage

import whisper from 'whisper-node-server';

const transcript = await whisper("example/sample.wav");

console.log(transcript); // output: [ {start,end,speech} ]

Server Mode

  • Set up environment variables:
WHISPER_MODEL=base.en
AUDIO_SAMPLE_RATE=16000
AUDIO_CHANNELS=1
  • Create the server:
import express from 'express';
import multer from 'multer';
import whisper from 'whisper-node-server';
import { exec } from 'child_process';
import { promisify } from 'util';
import fs from 'fs';

const app = express();
const upload = multer({ dest: 'uploads/' });
const execPromise = promisify(exec);

// Transcribe endpoint
app.post('/transcribe', upload.single('audio'), async (req, res) => {
  try {
    if (!req.file) {
      return res.status(400).send('No audio file uploaded');
    }

    const inputPath = req.file.path;
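    // Multer stores uploads without a file extension, so append a suffix;
    // replacing /\.wav$/ would leave the path unchanged and FFmpeg would overwrite its own input.
    const outputPath = `${inputPath}_converted.wav`;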

    // Convert audio to configured sample rate using FFmpeg
    await execPromise(`ffmpeg -y -i "${inputPath}" -ar ${process.env.AUDIO_SAMPLE_RATE} -ac ${process.env.AUDIO_CHANNELS} -c:a pcm_s16le "${outputPath}"`);

    // Transcribe the audio
    const options = {
      modelName: process.env.WHISPER_MODEL,
      whisperOptions: {
        language: 'auto',
        word_timestamps: true
      }
    };

    const transcript = await whisper(outputPath, options);

    // Clean up temp files
    fs.unlinkSync(inputPath);
    fs.unlinkSync(outputPath);

    // Extract speech text
    const text = transcript ? (Array.isArray(transcript) ? 
      transcript.map(t => t.speech).join(' ') : 
      transcript.toString()) : '';
      
    res.json({ text });

  } catch (error) {
    console.error('Transcription error:', error);
    res.status(500).send('Error processing audio: ' + error.message);
  }
});

app.listen(8080, () => {
  console.log('Server running on port 8080');
});
  • Send audio for transcription:
// Convert your audio to a blob
const wavBlob = await float32ArrayToWav(audio);
const formData = new FormData();
formData.append('audio', wavBlob, 'recording.wav');

// Send to server
const response = await fetch('http://localhost:8080/transcribe', {
  method: 'POST',
  body: formData,
});

if (!response.ok) {
  throw new Error('Transcription failed');
}

const data = await response.json();
console.log('Transcription:', data.text);
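
The client snippet above calls float32ArrayToWav, which this README does not define. The following is a minimal sketch of such a helper, not the package's own code. It assumes `audio` holds mono samples in the range [-1, 1] recorded at 16 kHz; the name encodeWav is hypothetical.

```javascript
// Sketch of the float32ArrayToWav helper used above; not part of the package.
// Assumes mono samples in the range [-1, 1] and a 16 kHz sample rate by default.
function encodeWav(samples, sampleRate = 16000) {
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                // remaining file size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // audio format: PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  // Clamp each float sample and scale it to a signed 16-bit integer.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// Wraps the encoded bytes in a Blob so they can be appended to FormData.
async function float32ArrayToWav(samples, sampleRate = 16000) {
  return new Blob([encodeWav(samples, sampleRate)], { type: 'audio/wav' });
}
```

Because the header already declares 16 kHz mono 16-bit PCM, the server's FFmpeg step becomes a near no-op for audio produced this way, but it still normalizes uploads from other clients.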

Output (JSON)

[
  {
    "start":  "00:00:14.310", // time stamp begin
    "end":    "00:00:16.480", // time stamp end
    "speech": "howdy"         // transcription
  }
]
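
The start and end fields are strings rather than numbers, so they need parsing before any arithmetic. A minimal sketch, assuming the HH:MM:SS.mmm shape shown in the sample output above; toSeconds and withDurations are hypothetical helper names, not part of the package.

```javascript
// Hypothetical helpers (not part of whisper-node-server).
// Assumes timestamps look like "HH:MM:SS.mmm", as in the sample output.
function toSeconds(timestamp) {
  const [hours, minutes, seconds] = timestamp.split(':').map(Number);
  return hours * 3600 + minutes * 60 + seconds;
}

// Adds a numeric duration (in seconds) to each transcript segment.
function withDurations(transcript) {
  return transcript.map((segment) => ({
    ...segment,
    duration: toSeconds(segment.end) - toSeconds(segment.start),
  }));
}

const sample = [{ start: '00:00:14.310', end: '00:00:16.480', speech: 'howdy' }];
console.log(withDurations(sample)[0].duration.toFixed(2)); // 2.17
```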

Full Options List

import whisper from 'whisper-node-server';

const filePath = "example/sample.wav"; // required

const options = {
  modelName: "base.en",       // default
  // modelPath: "/custom/path/to/model.bin", // use model in a custom directory (cannot use along with 'modelName')
  whisperOptions: {
    language: 'auto',         // default (use 'auto' for auto detect)
    gen_file_txt: false,      // outputs .txt file
    gen_file_subtitle: false, // outputs .srt file
    gen_file_vtt: false,      // outputs .vtt file
    word_timestamps: true     // timestamp for every word
    // timestamp_size: 0      // cannot use along with word_timestamps:true
  }
}

const transcript = await whisper(filePath, options);
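
The comments in the options list state two constraints: modelPath cannot be combined with modelName, and timestamp_size cannot be combined with word_timestamps: true. A sketch of checking both before calling whisper follows. checkOptions is a hypothetical helper, and whether the package performs similar checks itself is not stated here.

```javascript
// Hypothetical pre-flight check; not part of whisper-node-server.
// Encodes the two constraints stated in the option comments above.
function checkOptions(options = {}) {
  const errors = [];
  if (options.modelName && options.modelPath) {
    errors.push("Use either 'modelName' or 'modelPath', not both.");
  }
  const whisperOptions = options.whisperOptions ?? {};
  if (whisperOptions.word_timestamps && whisperOptions.timestamp_size !== undefined) {
    errors.push("'timestamp_size' cannot be combined with word_timestamps: true.");
  }
  return errors; // an empty array means no conflicts were found
}

console.log(checkOptions({
  modelName: 'base.en',
  modelPath: '/custom/path/to/model.bin',
}));
```

Returning a list of messages instead of throwing lets a caller report every conflict at once.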

Input File Format

Files must be .wav and sampled at 16 kHz

Example .mp3 file converted with an FFmpeg command: ffmpeg -i input.mp3 -ar 16000 output.wav
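
To confirm that a file meets the sample-rate requirement before transcribing it, the WAV header can be inspected directly. A minimal sketch, assuming the file is a standard RIFF/WAVE file already loaded into a Node.js Buffer; readWavFormat is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper; not part of whisper-node-server.
// Walks the RIFF chunks of a WAV file held in a Node.js Buffer and
// returns the channel count and sample rate from the "fmt " chunk.
function readWavFormat(buffer) {
  if (
    buffer.length < 12 ||
    buffer.toString('ascii', 0, 4) !== 'RIFF' ||
    buffer.toString('ascii', 8, 12) !== 'WAVE'
  ) {
    throw new Error('Not a RIFF/WAVE file');
  }
  let offset = 12; // first chunk starts after "RIFF", size, "WAVE"
  while (offset + 8 <= buffer.length) {
    const id = buffer.toString('ascii', offset, offset + 4);
    const size = buffer.readUInt32LE(offset + 4);
    if (id === 'fmt ') {
      return {
        channels: buffer.readUInt16LE(offset + 10),
        sampleRate: buffer.readUInt32LE(offset + 12),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even size
  }
  throw new Error('No "fmt " chunk found');
}
```

For example, passing the result of fs.readFileSync("example/sample.wav") to readWavFormat and checking that sampleRate equals 16000 indicates whether the FFmpeg conversion step is needed.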

Modifying whisper-node-server

npm run dev - runs nodemon and tsc on '/src/test.ts'

npm run build - runs tsc, outputs to '/dist' and gives sh permission to 'dist/download.js'

Package last updated on 14 Dec 2024
