
node-diarization
Transcription and diarization using Whisper and Pyannote with NodeJS
Audio diarization and recognition module built on the faster-whisper and pyannote modules.
Inspired by whisperX and whisper-node.
Since the main part of the library is Python, the main goal is to make it as friendly as possible for Node.js users.
Unlike openai-whisper, FFmpeg does not need to be installed on the system. The audio is decoded with the Python library PyAV which bundles the FFmpeg libraries in its package.
(c) faster-whisper
First of all, you need to install Python.
a) For macOS and Linux users, it is best practice to create a virtual environment.
python -m venv ~/py_envs
Remember this path (~/py_envs) and install all packages there. Run this command before pip install.
. ~/py_envs/bin/activate
b) For Windows users, the commands are slightly different.
python -m venv C:\py_envs
C:\py_envs\Scripts\activate.bat
In this case the path is C:\py_envs.
Install the faster-whisper module.
pip install faster-whisper
Install the pyannote module.
a) Install pyannote.audio with pip install pyannote.audio
b) Accept the pyannote/segmentation-3.0 user conditions
c) Accept the pyannote/speaker-diarization-3.1 user conditions
d) Create an access token at hf.co/settings/tokens.
(c) pyannote
Finally, run npm i node-diarization
OK, we are done! Good job.
import WhisperDiarization from 'node-diarization';
// the only required option is the pyannote Hugging Face auth token
const options = {
    diarization: {
        pyannote_auth_token: 'YOUR_TOKEN'
    }
};
(async () => {
    // only works with .wav files
    const wd = new WhisperDiarization('PATH/TO/WAV', options);
    wd.init().then(r => {
        r.union?.forEach(s => {
            console.log(`${s.speaker} [${s.info.start} - ${s.info.end}]: ${s.text}`);
        });
        // errors from the Python shell scripts
        console.log(r.errors);
    }).catch(err => console.log(err));
    // event listener for streaming transcription,
    // works only if both tasks are provided (recognition and diarization)
    wd.on('data', segments => {
        segments.forEach(s => {
            console.log(`${s.speaker} [${s.info.start} - ${s.info.end}]: ${s.text}`);
        });
    });
    wd.on('end', _ => {
        console.log('Done');
    });
})();
For CommonJS, use require instead:
const { WhisperDiarization } = require('node-diarization');
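If you prefer a single array over handling each 'data' event yourself, the listeners can be wrapped in a Promise. This is a minimal sketch, not part of the package: the helper name collectSegments is hypothetical, and it assumes that wd exposes an EventEmitter-style on(event, handler) method and that every 'data' event carries an array of segments, as the listeners above suggest.

```javascript
// Hypothetical helper: gathers all streamed segments until 'end' fires.
// Assumes `wd.on('data', ...)` delivers an array of segments each time.
function collectSegments(wd) {
  return new Promise((resolve) => {
    const collected = [];
    wd.on('data', (segments) => {
      // Append this batch to everything received so far.
      collected.push(...segments);
    });
    // Resolve with the full list once the stream is finished.
    wd.on('end', () => resolve(collected));
  });
}
```

Attach it before calling wd.init() so that no events are missed, for example: const pending = collectSegments(wd); wd.init(); const segments = await pending;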
As a result, you get diarized JSON output or an error string.
{
    "errors": "[]",
    "rawDiarization": "[Array]",
    "rawRecognition": "[Array]",
    "result": [
        {
            "info": "[Object]",
            "words": "[Array]",
            "speaker": "SPEAKER_00",
            "text": "..."
        },
        {
            "info": "[Object]",
            "words": "[Array]",
            "speaker": "SPEAKER_01",
            "text": "..."
        }
    ]
}
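Once you have the array of segments, a small reducer can turn it into a per-speaker summary. The sketch below is illustrative only: summarizeSpeakers is a hypothetical helper, and it assumes that info.start and info.end are numbers on the same time scale (presumably seconds), which the output above does not confirm.

```javascript
// Hypothetical helper: totals speaking time and collects text per speaker.
// Assumes each segment looks like { speaker, text, info: { start, end } }
// with numeric start/end values.
function summarizeSpeakers(segments) {
  const summary = {};
  for (const s of segments) {
    if (!summary[s.speaker]) {
      summary[s.speaker] = { seconds: 0, texts: [] };
    }
    summary[s.speaker].seconds += s.info.end - s.info.start;
    summary[s.speaker].texts.push(s.text);
  }
  return summary;
}
```

Pass it the segment array from the output, then read for example summary.SPEAKER_00.seconds or summary.SPEAKER_00.texts.join(' ').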
You can run a single task, either recognition or diarization, but you need to set raw to true in that task's options.
const options = {
    recognition: {
        raw: true,
    },
    tasks: ['recognition'],
}
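The same pattern should apply to a diarization-only run. The following sketch builds the options object for either single task; optionsForTask is a hypothetical helper, and the diarization branch is an assumption that mirrors the recognition example above, adding the auth token that the diarization task requires.

```javascript
// Hypothetical helper: builds options for running exactly one task.
// The 'diarization' branch mirrors the documented 'recognition' example
// and is an assumption, not something confirmed by the package docs.
function optionsForTask(task, token) {
  if (task === 'recognition') {
    return { recognition: { raw: true }, tasks: ['recognition'] };
  }
  if (task === 'diarization') {
    return {
      diarization: { pyannote_auth_token: token, raw: true },
      tasks: ['diarization'],
    };
  }
  throw new Error(`Unknown task: ${task}`);
}
```

To keep the token out of source control you could read it from an environment variable, for example optionsForTask('diarization', process.env.HF_TOKEN), where HF_TOKEN is just an illustrative variable name.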
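For recognition models you have 3 choices:
1. An original Whisper model name, for example "tiny".
2. A path to the directory where model.bin is located.
3. A Hugging Face repo in namespace/repo_name form, with hfRepoModel set to true. If the repo does not exist, the error trace will come from Python.
const options = {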
    python: {
        // venv path, e.g. "~/py_envs",
        // https://docs.python.org/3/library/venv.html,
        // default undefined
        venvPath: '~/py_envs',
        // python shell command, can be "python3", "py", etc.,
        // default "python"
        var: 'python',
    },
    diarization: {
        // pyannote hf auth token,
        // https://huggingface.co/settings/tokens,
        // required if the diarization task is set
        pyannote_auth_token: 'YOUR_TOKEN',
        // return the raw diarization object from the py script,
        // default false
        raw: false,
        // number of speakers, when known,
        // default undefined
        num_speakers: 1,
        // minimum number of speakers,
        // has no effect when `num_speakers` is provided,
        // default undefined
        min_speakers: 1,
        // maximum number of speakers,
        // has no effect when `num_speakers` is provided,
        // default undefined
        max_speakers: 1,
    },
    recognition: {
        // return the raw recognition object from the py script,
        // default false
        raw: false,
        // original Whisper model name,
        // or path to the directory where model.bin is located, e.g. /path/to/models,
        // or namespace/repo_name for an hf model,
        // default "tiny"
        model: 'tiny',
        // skip the js check for a standard Whisper model name or a path to model.bin,
        // if the repo does not exist, the error will come from python,
        // default false
        hfRepoModel: false,
        // recognition option,
        // default 5
        beam_size: 5,
        // recognition option,
        // default undefined
        compute_type: 'float32',
    },
    checks: {
        // check that the python commands and py scripts are available
        // before running diarization and recognition,
        // default false
        proceed: false,
        // if proceed is false, the task flags below are ignored
        recognition: true,
        diarization: true,
    },
    // array of tasks,
    // can be ["recognition"], ["diarization"] or both,
    // if undefined, all tasks are set,
    // default undefined
    tasks: [],
    shell: {
        // silence shell console output,
        // default true
        silent: true,
    },
    // information text in console, default true
    consoleTextInfo: true,
}
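Several of the comments above describe constraints between options, and these can be checked before the Python scripts are ever started. The sketch below is one possible way to do that and is not part of the package: lintOptions is a hypothetical helper, the first three rules restate the comments above, and the last rule (min_speakers must not exceed max_speakers) is an added assumption.

```javascript
const KNOWN_TASKS = ['recognition', 'diarization'];

// Hypothetical helper: returns a list of human-readable problems
// found in an options object; an empty list means nothing was flagged.
function lintOptions(options) {
  const problems = [];
  // Per the comments above, undefined tasks means "run all tasks".
  const tasks = options.tasks === undefined ? KNOWN_TASKS : options.tasks;
  for (const t of tasks) {
    if (!KNOWN_TASKS.includes(t)) problems.push(`unknown task "${t}"`);
  }
  const d = options.diarization || {};
  if (tasks.includes('diarization') && !d.pyannote_auth_token) {
    problems.push('diarization requires pyannote_auth_token');
  }
  const hasRange = d.min_speakers !== undefined || d.max_speakers !== undefined;
  if (d.num_speakers !== undefined && hasRange) {
    problems.push('min_speakers/max_speakers have no effect when num_speakers is set');
  }
  // Assumption: a minimum above the maximum is never intended.
  if (d.min_speakers !== undefined && d.max_speakers !== undefined &&
      d.min_speakers > d.max_speakers) {
    problems.push('min_speakers is greater than max_speakers');
  }
  return problems;
}
```

For instance, lintOptions({ tasks: ['diarization'] }) reports the missing token, while lintOptions({ tasks: ['recognition'] }) returns an empty array.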
Before you start, you may want to verify that all requirements are met, so there is a static check function. Note that the checks option of the main WhisperDiarization options is false by default.
import WhisperDiarization from 'node-diarization';
// options.python (if different from the default ones)
const options = {
    venvPath: '~/py_envs',
    var: 'python',
};
(async () => {
    WhisperDiarization.check(options).catch(err => console.log(err));
})();
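If you would rather branch on the outcome than only log it, the check can be wrapped so that it always resolves. This is a sketch under assumptions: preflight is a hypothetical helper, checkFn stands in for WhisperDiarization.check (passed in so the snippet runs without the package), and it assumes that check returns a promise that rejects when a requirement is missing, as the .catch above suggests.

```javascript
// Hypothetical wrapper: converts a rejecting check into a result object.
// `checkFn` is expected to behave like WhisperDiarization.check.
async function preflight(checkFn, pythonOptions) {
  try {
    await checkFn(pythonOptions);
    return { ok: true, error: null };
  } catch (err) {
    return { ok: false, error: err };
  }
}
```

A possible call is: const { ok, error } = await preflight(o => WhisperDiarization.check(o), { venvPath: '~/py_envs', var: 'python' });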
You can also enable these checks in the main options by setting checks.proceed to true.