Word Timings Generator
Uses Vosk to analyze audio files and generate word timings.
Usage
npx word-timings -c ./path/to/my-config.json
The tool reads a JSON configuration file in the following format:
{
  "model": "path/to/model",
  "cache": "path/to/.cachefile",
  "pretty": true,
  "outputs": [
    {
      "file": "path/to/output.json",
      "globs": ["path/to/*.wav"]
    }
  ]
}
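If you generate or edit config files programmatically, it can help to validate them before invoking the tool. The following is a minimal sketch, not part of word-timings itself, that describes the file as a TypeScript type and checks it. The names `WordTimingsConfig`, `OutputSpec` and `parseConfig` are hypothetical, and treating every key from the example as required is an assumption; the tool may accept optional or additional keys.

```typescript
// Hypothetical shape of the config file, inferred only from the example above.
interface OutputSpec {
  file: string;    // JSON file the results are written to
  globs: string[]; // glob patterns selecting the input .wav files
}

interface WordTimingsConfig {
  model: string;   // path to the Vosk model
  cache: string;   // path to the cache file
  pretty: boolean;
  outputs: OutputSpec[];
}

// Parse config text and check each key from the example has the expected type.
// Assumption: all keys are required. Throws on the first problem found.
function parseConfig(text: string): WordTimingsConfig {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("config must be a JSON object");
  }
  const c = raw as Record<string, unknown>;
  for (const key of ["model", "cache"]) {
    if (typeof c[key] !== "string") throw new Error(`"${key}" must be a string`);
  }
  if (typeof c["pretty"] !== "boolean") throw new Error(`"pretty" must be a boolean`);
  const outputs = c["outputs"];
  if (!Array.isArray(outputs)) throw new Error(`"outputs" must be an array`);
  for (const out of outputs) {
    if (typeof out !== "object" || out === null) throw new Error("each output must be an object");
    const o = out as Record<string, unknown>;
    if (typeof o["file"] !== "string") throw new Error(`each output needs a "file" string`);
    const globs = o["globs"];
    if (!Array.isArray(globs) || !globs.every((g) => typeof g === "string")) {
      throw new Error(`each output needs a "globs" array of strings`);
    }
  }
  return c as unknown as WordTimingsConfig;
}
```

For example, `parseConfig(fs.readFileSync("./path/to/my-config.json", "utf8"))` would throw on a config where `"pretty"` is a string rather than a boolean.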
Audio files must be mono PCM .wav files. A sample rate of 16 kHz is recommended, although higher sample rates seem to work as well.
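To catch unsuitable inputs before a run, you can inspect each file's WAV header. The following is a minimal sketch, not part of word-timings, that reads the `fmt ` chunk of a WAV file already loaded into memory (for example with Node's `fs.readFileSync`, whose `Buffer` is a `Uint8Array`) and reports problems. `readWavInfo` and `checkWav` are hypothetical helpers. As an assumption, the sketch accepts only format tag 1 (integer PCM) and does not look inside `WAVE_FORMAT_EXTENSIBLE` headers, which can also carry PCM data.

```typescript
// Fields from the WAV "fmt " chunk that matter for the requirements above.
interface WavInfo {
  audioFormat: number;   // 1 = integer PCM
  channels: number;
  sampleRate: number;    // Hz
  bitsPerSample: number;
}

// Read the "fmt " chunk of a RIFF/WAVE file held in memory.
function readWavInfo(bytes: Uint8Array): WavInfo {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (o: number) =>
    String.fromCharCode(bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]);
  if (bytes.length < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  // Each chunk is a 4-byte id, a little-endian 32-bit size, then the data,
  // padded to an even length.
  let offset = 12;
  while (offset + 8 <= bytes.length) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    if (id === "fmt ") {
      const body = offset + 8;
      if (size < 16 || body + 16 > bytes.length) throw new Error("truncated fmt chunk");
      return {
        audioFormat: view.getUint16(body, true),
        channels: view.getUint16(body + 2, true),
        sampleRate: view.getUint32(body + 4, true),
        bitsPerSample: view.getUint16(body + 14, true),
      };
    }
    offset += 8 + size + (size % 2);
  }
  throw new Error("no fmt chunk found");
}

// Hard requirements become errors; the 16 kHz recommendation is a warning.
function checkWav(info: WavInfo): { errors: string[]; warnings: string[] } {
  const errors: string[] = [];
  const warnings: string[] = [];
  if (info.audioFormat !== 1) errors.push(`format tag ${info.audioFormat} is not PCM (1)`);
  if (info.channels !== 1) errors.push(`${info.channels} channels; mono is required`);
  if (info.sampleRate !== 16000) warnings.push(`${info.sampleRate} Hz; 16 kHz is recommended`);
  return { errors, warnings };
}
```

A stereo 44.1 kHz file, for instance, yields one error (channel count) and one warning (sample rate).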
Output
The output is a JSON object mapping each filename (without path or extension) to an array of time data:
{
  "myfile": [[0.1, 0.3], 0.4, 0.5, 0.6, [0.8, 1.2]]
}
Time data is an array with one element per word. Each element is either a [start, end] pair giving the word's start and end times, or a single number giving only the word's end time, in which case the word starts at the previous word's end time. All times are in seconds.
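Consumers usually want explicit start and end times for every word. The following is a minimal sketch of decoding the compact form, plus the inverse for illustration; `TimeEntry`, `expandTimings` and `compactTimings` are hypothetical names, not part of word-timings. One assumption is labeled in the code: the format does not say what a bare number means for the first word, so the sketch starts it at 0 seconds.

```typescript
// One element of the time data: an explicit [start, end] pair, or just an
// end time whose start is the previous word's end time.
type TimeEntry = [number, number] | number;

// Expand compact time data into explicit [start, end] pairs.
// Assumption: a bare number in first position starts at 0 seconds.
function expandTimings(entries: TimeEntry[]): [number, number][] {
  const result: [number, number][] = [];
  let previousEnd = 0;
  for (const entry of entries) {
    const pair: [number, number] = Array.isArray(entry)
      ? [entry[0], entry[1]]
      : [previousEnd, entry];
    result.push(pair);
    previousEnd = pair[1];
  }
  return result;
}

// Inverse: collapse a pair to its end time when the word starts exactly
// where the previous one ended (same start-at-0 assumption for the first word).
function compactTimings(pairs: [number, number][]): TimeEntry[] {
  const result: TimeEntry[] = [];
  let previousEnd = 0;
  for (const [start, end] of pairs) {
    result.push(start === previousEnd ? end : [start, end]);
    previousEnd = end;
  }
  return result;
}
```

Applied to the `"myfile"` example above, `expandTimings` returns `[[0.1, 0.3], [0.3, 0.4], [0.4, 0.5], [0.5, 0.6], [0.8, 1.2]]`: the three bare numbers each start where the previous word ended, while the gap between 0.6 and 0.8 s forces an explicit pair for the last word.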