This package is based on a fork of web-speech-cognitive-services.
The primary goal is to use the SpeechSynthesizer from microsoft-cognitiveservices-speech-sdk in the TTS part of the package, so that boundaries and visemes are received during speech synthesis. This works around existing issues in the original package.
npm install @davi-ai/web-speech-cognitive-services-davi
Changes compared to original package
In order to use speech synthesis, you still need to use the original process :
- create a speechSynthesisPonyfill with your credentials, containing a speechSynthesis object
- wait for the voices to be loaded
- create a SpeechSynthesisUtterance
- attach events to the utterance
- play the utterance
Use the imports from the new package with :
import { createSpeechSynthesisPonyfill } from '@davi-ai/web-speech-cognitive-services-davi/lib/SpeechServices'
import type { SpeechSynthesisUtterance } from '@davi-ai/web-speech-cognitive-services-davi/lib/SpeechServices'
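The five steps listed above can be sketched as follows. This is a minimal sketch, not the package's own code: the interfaces are local stand-ins for the package's types, speakWhenReady is a hypothetical helper, and it assumes the ponyfill exposes getVoices(), a voiceschanged event and a SpeechSynthesisUtterance constructor in the same way as the original web-speech-cognitive-services package.

```typescript
// Local stand-ins for the package's types (assumptions for this sketch).
interface UtteranceLike {
  text: string
  voice?: unknown
  onstart?: () => void
  onend?: () => void
}

interface SpeechSynthesisLike {
  getVoices(): Array<{ name: string }>
  addEventListener(type: 'voiceschanged', listener: () => void): void
  speak(utterance: UtteranceLike): void
}

interface PonyfillLike {
  speechSynthesis: SpeechSynthesisLike
  SpeechSynthesisUtterance: new (text: string) => UtteranceLike
}

// Hypothetical helper: `ponyfill` is assumed to be the object returned by
// createSpeechSynthesisPonyfill (step 1).
function speakWhenReady(
  ponyfill: PonyfillLike,
  text: string,
  log: (message: string) => void
): void {
  const { speechSynthesis, SpeechSynthesisUtterance } = ponyfill
  let spoken = false

  const speakOnce = (): void => {
    const voices = speechSynthesis.getVoices()
    // Step 2: do nothing until at least one voice is loaded.
    if (spoken || voices.length === 0) return
    spoken = true

    // Step 3: create the utterance.
    const utterance = new SpeechSynthesisUtterance(text)
    utterance.voice = voices[0]

    // Step 4: attach events.
    utterance.onstart = (): void => log('start')
    utterance.onend = (): void => log('end')

    // Step 5: play the utterance.
    speechSynthesis.speak(utterance)
  }

  speechSynthesis.addEventListener('voiceschanged', speakOnce)
  speakOnce() // the voices may already be loaded
}
```

In an application, the ponyfill argument would come from createSpeechSynthesisPonyfill({ credentials: { region, subscriptionKey } }), assuming the fork keeps the options shape of the original package.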
You can now listen to the following events by attaching callbacks to the utterance :
- onsynthesisstart : fired when the synthesis starts
- onsynthesiscompleted : fired when the synthesis is completed
- onboundary : receive an event with the following data
name: string,
elapsedTime: number,
duration: number,
boundaryType: 'WordBoundary' | 'PunctuationBoundary' | 'Viseme'
This event is fired for each boundary and each viseme in the synthesis.
- onmark : receive an event with the following data
name: string,
elapsedTime: number
- onviseme : receive an event with the following data
name: string,
elapsedTime: number,
duration: 0,
boundaryType: 'Viseme'
This event is fired for each viseme in the synthesis (Viseme id documentation here).
- examples :
utterance.onsynthesisstart = (): void => { console.log('Synthesis started !') }
utterance.onsynthesiscompleted = (): void => { console.log('Synthesis ended !') }
utterance.onboundary = (event): void => { console.log('Boundary data : ', event.boundaryType, event.name, event.elapsedTime, event.duration) }
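Since onboundary receives both word boundaries and visemes, it can be handy to sort the events as they arrive, for instance to drive lip-sync separately from text highlighting. The sketch below assumes only the event fields listed above; recordTimeline and the Timeline shape are hypothetical names introduced for illustration.

```typescript
// Event shape taken from the fields listed above.
type BoundaryType = 'WordBoundary' | 'PunctuationBoundary' | 'Viseme'

interface BoundaryEventData {
  name: string
  elapsedTime: number
  duration: number
  boundaryType: BoundaryType
}

interface Timeline {
  words: BoundaryEventData[]
  visemes: BoundaryEventData[]
}

// Hypothetical helper: attaches an onboundary handler to the utterance and
// returns the object that the handler fills as events arrive.
function recordTimeline(
  utterance: { onboundary?: (data: BoundaryEventData) => void }
): Timeline {
  const timeline: Timeline = { words: [], visemes: [] }
  utterance.onboundary = (data: BoundaryEventData): void => {
    if (data.boundaryType === 'Viseme') {
      timeline.visemes.push(data)
    } else if (data.boundaryType === 'WordBoundary') {
      timeline.words.push(data)
    }
    // PunctuationBoundary events are ignored in this sketch.
  }
  return timeline
}
```

Call recordTimeline(utterance) before speaking, then read timeline.words and timeline.visemes once onsynthesiscompleted has fired.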
Using the SpeechSynthesizer class leads to several improvements in functionality :
- the start event is now linked to the oncanplaythrough event of the AudioElement used by the AudioContext. This allows better synchronisation at the beginning of the speech.
- you can call mute() and unmute() on the ponyfill.speechSynthesis object at any time
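A mute button can be wired to these two calls with a small toggle. This is a sketch only: it assumes the speechSynthesis object exposes mute() and unmute() as described above, and createMuteToggle is a hypothetical helper, not part of the package.

```typescript
// Only the two methods this sketch relies on.
interface MuteControls {
  mute(): void
  unmute(): void
}

// Hypothetical helper: returns a function that flips between mute() and
// unmute() on each call and reports the new state (true = muted).
function createMuteToggle(controls: MuteControls, initiallyMuted = false): () => boolean {
  let muted = initiallyMuted
  return (): boolean => {
    muted = !muted
    if (muted) {
      controls.mute()
    } else {
      controls.unmute()
    }
    return muted
  }
}
```

For example, const toggle = createMuteToggle(ponyfill.speechSynthesis) gives a function that can be used directly as a button click handler.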
Other Features
Retrieve synthesized data
You can retrieve all the synthesized data as an ArrayBuffer once the synthesis is finished, by using ponyfill.speechSynthesis.synthesizeAndGetArrayData(utterance: SpeechSynthesisUtterance, callback: (data: ArrayBuffer) => void)
The data will contain the whole synthesis and can then be used (for example, you can create a Blob from the data and play it).
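const callback = (data: ArrayBuffer): void => {
  // Wrap the synthesized audio in a Blob and expose it through an object URL
  const blob = new Blob([data], { type: 'audio/mp3' })
  const url = URL.createObjectURL(blob)
  // 'myaudio' is the id of an <audio> element already present in the page
  const audioElement = document.getElementById('myaudio') as HTMLAudioElement
  audioElement.src = url
}

ponyfill.speechSynthesis.synthesizeAndGetArrayData(utterance, callback)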
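If you prefer async/await over callbacks, the call can be wrapped in a Promise. This is a sketch that relies only on the signature given above; synthesizeToArrayBuffer is a hypothetical helper. Since no error callback is described here, the Promise in this sketch never rejects.

```typescript
// Only the method this sketch relies on, generic over the utterance type.
interface ArrayDataSynthesizer<U> {
  synthesizeAndGetArrayData(utterance: U, callback: (data: ArrayBuffer) => void): void
}

// Hypothetical helper: resolves with the full synthesized audio.
function synthesizeToArrayBuffer<U>(
  synthesizer: ArrayDataSynthesizer<U>,
  utterance: U
): Promise<ArrayBuffer> {
  return new Promise<ArrayBuffer>((resolve) => {
    synthesizer.synthesizeAndGetArrayData(utterance, (data: ArrayBuffer): void => {
      resolve(data)
    })
  })
}
```

Usage would then look like const data = await synthesizeToArrayBuffer(ponyfill.speechSynthesis, utterance) inside an async function.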
Use a stream to get data
You can pass a stream as the second argument to the speak method, to prevent the synthesizer from playing the synthesized data and retrieve them in the stream on your side.
The stream must be an AudioOutputStream.createPullStream() object, with AudioOutputStream coming from microsoft-cognitiveservices-speech-sdk
import { AudioOutputStream } from 'microsoft-cognitiveservices-speech-sdk'
const stream = AudioOutputStream.createPullStream()
ponyfill.speechSynthesis.speak(utterance, stream)
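Once the stream has been passed to speak, the data can be pulled from it in chunks. The sketch below assumes the pull stream's read(buffer) method resolves with the number of bytes written into the buffer, and with 0 once the synthesis is finished and the stream is exhausted; readAll and the PullStreamLike interface are hypothetical names used so the example does not depend on the SDK.

```typescript
// Only the method this sketch relies on.
interface PullStreamLike {
  read(buffer: ArrayBuffer): Promise<number>
}

// Hypothetical helper: reads the stream chunk by chunk until read() reports
// no more data, then returns all bytes in a single array.
async function readAll(source: PullStreamLike, chunkSize = 4096): Promise<Uint8Array> {
  const chunks: Uint8Array[] = []
  let total = 0

  for (;;) {
    const buffer = new ArrayBuffer(chunkSize)
    const bytesRead = await source.read(buffer)
    if (bytesRead <= 0) break // assumed end-of-stream signal
    chunks.push(new Uint8Array(buffer, 0, bytesRead))
    total += bytesRead
  }

  // Concatenate the chunks into one contiguous array.
  const result = new Uint8Array(total)
  let offset = 0
  for (const chunk of chunks) {
    result.set(chunk, offset)
    offset += chunk.length
  }
  return result
}
```

With the real SDK this would be called as const bytes = await readAll(stream) after ponyfill.speechSynthesis.speak(utterance, stream).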