@solyarisoftware/voskjs
VoskJs is a NodeJs developer toolkit for using the Vosk offline speech recognition engine, including multi-thread (server) usage examples. The project gives you:
voskjs
voskjshttp
VoskJs can be used for speech recognition processing in different scenarios:
Vosk is an open source embedded (offline/on-prem) speech-to-text engine that can run with very low latencies (< 500 msecs on my PC).
Vosk is based on a common DNN-HMM architecture: a deep neural network is used for sound scoring (acoustic scoring), while HMM and WFST frameworks are used for time models (language models).
It's based on Kaldi, but Nickolay V. Shmyrev's Vosk offers a smart, simplified and performant interface!
More details are available on the Vosk home page and in the GitHub repo.
The goal of the project is to create a simple function API layer on top of the existing Vosk NodeJs binding, supplying both sentence-based and streaming-based speech-to-text functionalities.
In this mode, a file or a PCM buffer is processed asynchronously to get the full text transcript of the given speech. Using the simple transcript interface, you can build your own standalone application with async functions suitable for a usual single-thread NodeJs program.
Pseudo code:
// Load a specific Vosk engine model from a model directory into RAM (done once).
const model = loadModel(modelDirectory)
// Transcribe a speech file or buffer (in WAV/PCM format) using the Vosk engine.
// The result supplies detailed speech-to-text transcript info.
const result = await transcriptFromFile(fileName, model, {options})
// or
// const result = await transcriptFromBuffer(buffer, model, {options})
freeModel(model)
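Since the model stays in RAM until it is freed, one possible way to guarantee the release is a try/finally wrapper. The following is only a sketch under stated assumptions, not part of VoskJs: `transcribeOnce` is a hypothetical helper, and `api` stands for any object exposing the three functions named in the pseudo code above (for example the module exports, assuming they are exported under those names). A stand-in object is used so the sketch runs without Vosk installed.

```javascript
// Hypothetical helper (not part of VoskJs): load a model, transcribe one file,
// and always free the model, even when the transcript fails.
// `api` is assumed to expose loadModel, transcriptFromFile and freeModel,
// the function names used in the pseudo code above.
async function transcribeOnce(api, modelDirectory, fileName, options = {}) {
  const model = api.loadModel(modelDirectory)
  try {
    return await api.transcriptFromFile(fileName, model, options)
  } finally {
    // runs on success and on failure
    api.freeModel(model)
  }
}

// Stand-in API, used only to make the sketch runnable without Vosk.
const fakeApi = {
  loadModel: directory => ({ directory, loaded: true }),
  transcriptFromFile: async (fileName, model) => ({ text: `transcript of ${fileName}` }),
  freeModel: model => { model.loaded = false }
}

transcribeOnce(fakeApi, 'models/vosk-model-small-en-us-0.15', 'audio/sentencesWithSilences.wav')
  .then(result => console.log(result.text))
```

With the real module, the same helper would be called with the module's exports in place of `fakeApi`, provided the export names match the pseudo code.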
Following the Vosk-api recognizer result functions, VoskJs emits these NodeJs events:
Event name | Vosk-api recognizer function | description |
---|---|---|
partial | recognizer.partialResult() | silence (text = '') or one or more new words |
endOfSpeech | recognizer.result() | end of speech (words followed by a silence) |
final | recognizer.finalResult() | last part of the audio |
Pseudo code:
// Load a specific Vosk engine model from a model directory into RAM (done once).
const model = loadModel(modelDirectory)
const transcriptEvents = transcriptEventsFromFile(fileName, model, {options})
// or
// const transcriptEvents = transcriptEventsFromBuffer(buffer, model, {options})
// a new word is detected
transcriptEvents.on('partial', data => console.log(data) )
// a complete sentence (followed by silence) is detected
transcriptEvents.on('endOfSpeech', data => console.log(data) )
// final (last) sentence is detected
transcriptEvents.on('final', data => console.log(data) )
freeModel(model)
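One way to use these events is to accumulate the text of each `endOfSpeech` and `final` event into the whole transcript. The sketch below is illustrative only: `collectTranscript` is a hypothetical helper, a plain NodeJs `EventEmitter` stands in for the object returned by `transcriptEventsFromFile`, and it is an assumption that every event payload is an object with a `text` string.

```javascript
const { EventEmitter } = require('node:events')

// Hypothetical helper: resolves with the full transcript once 'final' fires.
// Assumes each event payload is an object carrying a `text` string.
function collectTranscript(transcriptEvents) {
  return new Promise(resolve => {
    const sentences = []
    // skip empty texts (silence)
    const keep = data => { if (data.text) sentences.push(data.text) }

    transcriptEvents.on('endOfSpeech', keep)
    transcriptEvents.on('final', data => {
      keep(data)
      resolve(sentences.join(' '))
    })
  })
}

// Stand-in emitter, so the sketch runs without Vosk installed.
const fakeEvents = new EventEmitter()
const transcript = collectTranscript(fakeEvents)

fakeEvents.emit('partial', { text: 'one' })
fakeEvents.emit('endOfSpeech', { text: 'one two three' })
fakeEvents.emit('endOfSpeech', { text: 'four five six seven eight' })
fakeEvents.emit('final', { text: '' })

transcript.then(text => console.log(text))
```

The `partial` events are ignored here on purpose: they repeat a growing prefix of the sentence that the following `endOfSpeech` event reports in full.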
voskjs: a command line program to test Vosk transcripts with specific models (some tests and command line usage here).
The utility can also be configured to print events as a table. For example:
voskjs --audio=audio/sentencesWithSilences.wav --model=models/vosk-model-small-en-us-0.15 --tableevents
voskjs is a CLI utility to test Vosk-api features
package @solyarisoftware/voskjs version 1.2.7, Vosk-api version 0.3.30
Statistics:
model directory : models/vosk-model-small-en-us-0.15
speech file name : audio/sentencesWithSilences.wav
grammar : not specified. Default: NO
sample rate : not specified. Default: 16000
max alternatives : undefined
text only / JSON : JSON
Vosk debug level : -1
load model latency : 2001ms
transcript latency : 1707ms
transcript text : one two three four five six seven eight nine zero one two three stop
Events table:
| time | event | text |
| ------ | ------------ | ---------------------------------------- |
| 66 | partial |
| 489 | partial | one
| 538 | partial | one two
| 592 | partial | one two three
| 635 | endOfSpeech | one two three
| 668 | partial |
| 847 | partial | for
| 882 | partial | four five six
| 977 | partial | four five six seven
| 1099 | partial | four five six seven eight
| 1169 | endOfSpeech | four five six seven eight
| 1194 | partial |
| 1322 | partial | nine
| 1381 | partial | nine zero
| 1456 | partial | nine zero one
| 1498 | partial | nine zero one two
| 1550 | partial | nine zero one two three
| 1630 | partial | nine zero one two three stop
| 1649 | endOfSpeech | nine zero one two three stop
| 1677 | partial |
| 1706 | final |
voskjshttp: a simple demo HTTP server to transcribe speech files.
Using the above API you can build your own server. Some usage examples here.
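As a rough idea of what building your own server could look like, here is a minimal sketch based on the NodeJs `http` module. It is not the voskjshttp implementation and does not reproduce its endpoints: `createTranscriptServer` is a hypothetical name, and `transcribe` is assumed to be any async function that takes the request body as a Buffer (for instance a wrapper around `transcriptFromBuffer`) and returns a JSON-serializable result.

```javascript
const http = require('node:http')

// Hypothetical sketch, not the voskjshttp implementation.
// `transcribe` is assumed to be an async function (Buffer) => result object.
function createTranscriptServer(transcribe) {
  const sendJson = (res, status, body) => {
    res.writeHead(status, { 'Content-Type': 'application/json' })
    res.end(JSON.stringify(body))
  }

  return http.createServer((req, res) => {
    if (req.method !== 'POST') {
      req.resume() // discard any request body
      return sendJson(res, 405, { error: 'send the audio as the body of a POST request' })
    }

    // collect the binary audio body
    const chunks = []
    req.on('data', chunk => chunks.push(chunk))
    req.on('end', async () => {
      try {
        sendJson(res, 200, await transcribe(Buffer.concat(chunks)))
      } catch (error) {
        sendJson(res, 500, { error: error.message })
      }
    })
  })
}

// Possible usage (assumed names and port):
// const server = createTranscriptServer(buffer => transcriptFromBuffer(buffer, model, {}))
// server.listen(3000)
```

Keeping the transcript function as a parameter leaves the HTTP layer independent of how the model is loaded, which also makes it easy to try the server with a fake function.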
Install vosk-api engine
pip3 install -U vosk
See also: https://alphacephei.com/vosk/install
Install this module as a global package if you want to use the voskjs CLI command:
npm install -g @solyarisoftware/voskjs@latest
mkdir -p your/path/models && cd your/path/models
# English large model
wget https://alphacephei.com/vosk/models/vosk-model-en-us-aspire-0.2.zip
unzip vosk-model-en-us-aspire-0.2.zip
# English small model
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
# Italian model
wget https://alphacephei.com/vosk/models/vosk-model-small-it-0.4.zip
unzip vosk-model-small-it-0.4.zip
More about available Vosk models here: https://alphacephei.com/vosk/models
The audio directory contains some English speech audio files taken from a Mozilla DeepSpeech repo (source: Mozilla DeepSpeech audio samples).
These files are used for some tests and comparisons.
Some VoskJs usage examples:
- voskjs command line utility
- voskjshttp demo speech-to-text HTTP server
- voskjshttp as RHASSPY speech-to-text remote HTTP server

Some tests/notes:
audioutils: some audio utility functions, such as toPCM, a fast transcoding to PCM using an ffmpeg process (install ffmpeg first).
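To illustrate the idea of transcoding through an ffmpeg process, here is a minimal sketch. It is not the audioutils code: `ffmpegPcmArgs` and `toPcmBuffer` are hypothetical names, and the output format (mono, signed 16-bit little-endian, 16000 Hz to match the default sample rate reported by the CLI) is an assumption about what the recognizer should be fed.

```javascript
const { spawn } = require('node:child_process')

// Hypothetical helper: ffmpeg arguments to decode any input file into raw PCM
// (mono, signed 16-bit little-endian) written to stdout.
function ffmpegPcmArgs(inputFile, sampleRate = 16000) {
  return [
    '-loglevel', 'quiet',
    '-i', inputFile,
    '-ar', String(sampleRate), // output sample rate
    '-ac', '1',                // one audio channel (mono)
    '-f', 's16le',             // raw signed 16-bit little-endian samples
    '-'                        // write to stdout
  ]
}

// Hypothetical helper: run ffmpeg and resolve with the PCM data as a Buffer.
// Requires the ffmpeg executable to be installed and on the PATH.
function toPcmBuffer(inputFile, sampleRate = 16000) {
  return new Promise((resolve, reject) => {
    const ffmpeg = spawn('ffmpeg', ffmpegPcmArgs(inputFile, sampleRate))
    const chunks = []
    ffmpeg.stdout.on('data', chunk => chunks.push(chunk))
    ffmpeg.on('error', reject) // e.g. ffmpeg is not installed
    ffmpeg.on('close', code => {
      if (code === 0) resolve(Buffer.concat(chunks))
      else reject(new Error(`ffmpeg exited with code ${code}`))
    })
  })
}

console.log(ffmpegPcmArgs('speech.webm').join(' '))
// Possible usage (hypothetical file name):
// toPcmBuffer('speech.webm').then(buffer => console.log(buffer.length))
```

Streaming the result from stdout avoids writing a temporary file, at the cost of holding the whole PCM buffer in memory.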
voskjshttp:
(--header "Content-Type: audio/wav")
toPcm if input speech files are not specified as wav in header request (e.g. --header "Content-Type: audio/webm"); see https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats#audio-formats-list
If you like the project, please ⭐️ star this repository to show your support! 🙏
Any contribution is welcome:
Thanks to Nickolay V. Shmyrev, author of the Vosk project, for the help with the NodeJs API bindings for multi-threading management.
MIT (c) Giorgio Robino