@davi-ai/web-speech-cognitive-services-davi
Polyfill Web Speech API with Cognitive Services Speech-to-Text service
This package is based on a fork of web-speech-cognitive-services. The primary goal is to use the SpeechSynthesizer from the microsoft-cognitiveservices-speech-sdk package in the TTS part, in order to receive the boundaries and visemes of a speech synthesis and overcome the existing issues of the original package. Now written in TypeScript!
npm install @davi-ai/web-speech-cognitive-services-davi
In order to use speech synthesis, you still need to use the original process:
Use the imports from the new package:
import { createSpeechSynthesisPonyfill } from '@davi-ai/web-speech-cognitive-services-davi'
import type { SpeechSynthesisPonyfillType, SpeechSynthesisUtterance } from '@davi-ai/web-speech-cognitive-services-davi'
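A minimal wiring sketch, assuming the ponyfill is created with the same credential shape as the speech recognition example further below (region plus subscriptionKey or authorizationToken; the region and key values here are placeholders):

```typescript
import { createSpeechSynthesisPonyfill } from '@davi-ai/web-speech-cognitive-services-davi'

// Assumed credential shape – mirrors the speech recognition example below.
const ponyfill = createSpeechSynthesisPonyfill({
  credentials: {
    region: 'westus',
    subscriptionKey: '<your subscription key>'
  }
})

// The ponyfill exposes the Web Speech API surface.
const { speechSynthesis, SpeechSynthesisUtterance } = ponyfill
const utterance = new SpeechSynthesisUtterance('Hello world!')
speechSynthesis.speak(utterance)
```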
You can now listen to the following events by attaching callbacks to the utterance:

utterance.onsynthesisstart = (): void => { console.log('Synthesis started!') }
utterance.onsynthesiscompleted = (): void => { console.log('Synthesis ended!') }
utterance.onboundary = (event): void => { console.log('Boundary data: ', event.boundaryType, event.name, event.elapsedTime, event.duration) }

The onboundary event is fired for each boundary and each viseme in the synthesis, with the following payload:

{
  name: string, // the word / punctuation
  elapsedTime: number, // time elapsed since the beginning of the speech
  duration: number, // duration of the speech for this word / punctuation
  boundaryType: 'WordBoundary' | 'PunctuationBoundary' | 'Viseme' // type of the boundary. 'Viseme' was added by us for private needs
}

Bookmark events carry the following payload:

{
  name: string, // the name of the bookmark
  elapsedTime: number // time elapsed since the beginning of the speech
}

For each viseme in the synthesis, an event is fired with the following payload (viseme id documentation is available in the Microsoft docs):

{
  name: string, // the id of the viseme
  elapsedTime: number, // time elapsed since the beginning of the speech
  duration: 0,
  boundaryType: 'Viseme'
}
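As an illustration of how these payloads can be consumed, here is a small, self-contained sketch (the helper and type names are ours, not part of the package) that accumulates boundary events into separate word and viseme tracks, usable for captions or lip-sync:

```typescript
// Shape of the payload delivered for each boundary / viseme (see above).
interface BoundaryEvent {
  name: string
  elapsedTime: number
  duration: number
  boundaryType: 'WordBoundary' | 'PunctuationBoundary' | 'Viseme'
}

interface Timeline {
  words: BoundaryEvent[]
  visemes: BoundaryEvent[]
}

// Pure helper: route each event into the word or viseme track.
const addBoundary = (timeline: Timeline, event: BoundaryEvent): Timeline => {
  if (event.boundaryType === 'Viseme') {
    timeline.visemes.push(event)
  } else {
    timeline.words.push(event)
  }
  return timeline
}

// Wiring it to an utterance:
// const timeline: Timeline = { words: [], visemes: [] }
// utterance.onboundary = (event) => { addBoundary(timeline, event) }
```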
Using the SpeechSynthesizer class leads to several improvements in the functionalities:
- The start event is now linked to the oncanplaythrough event of the AudioElement used by the AudioContext. This allows a better synchronisation at the beginning of the speech.
- You can call mute() and unmute() on the ponyfill.speechSynthesis object at any time.
- You can retrieve all the data synthesized in an ArrayBuffer once the synthesis is finished, by using ponyfill.speechSynthesis.synthesizeAndGetArrayData(utterance: SpeechSynthesisUtterance, callback: (data: ArrayBuffer) => void). The data will contain the whole synthesis and can then be used (for example, you can create a Blob from it and play it).
Example
const callback = (data: ArrayBuffer): void => {
  // Wrap the synthesized audio in a Blob and play it through an <audio> element
  const blob = new Blob([data], { type: 'audio/mp3' })
  const url = URL.createObjectURL(blob)
  const audioElement = document.getElementById('myaudio') as HTMLAudioElement
  audioElement.src = url
  audioElement.play()
}
ponyfill.speechSynthesis.synthesizeAndGetArrayData(utterance, callback)
You can pass a stream as a second argument to the speak method, to prevent the synthesizer from playing the synthesized data and retrieve the data in the stream on your side. The stream must be an AudioOutputStream.createPullStream() object, AudioOutputStream coming from the microsoft-cognitiveservices-speech-sdk package.
Example
import { AudioOutputStream } from 'microsoft-cognitiveservices-speech-sdk'
const stream = AudioOutputStream.createPullStream()
ponyfill.speechSynthesis.speak(utterance, stream)
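On your side you then read raw audio out of the pull stream. A hedged sketch: the read loop follows the PullAudioOutputStream.read signature from microsoft-cognitiveservices-speech-sdk (it resolves with the number of bytes filled), while concatChunks is a helper of our own, not part of either package:

```typescript
// Our helper: merge the chunks read from the pull stream into one buffer.
// lengths[i] is the number of valid bytes in chunks[i].
const concatChunks = (chunks: ArrayBuffer[], lengths: number[]): ArrayBuffer => {
  const total = lengths.reduce((sum, n) => sum + n, 0)
  const out = new Uint8Array(total)
  let offset = 0
  chunks.forEach((chunk, i) => {
    out.set(new Uint8Array(chunk, 0, lengths[i]), offset)
    offset += lengths[i]
  })
  return out.buffer
}

// Sketch of the read loop (stream is the pull stream passed to speak above):
// const chunks: ArrayBuffer[] = []
// const lengths: number[] = []
// let bytesRead: number
// do {
//   const buffer = new ArrayBuffer(4096)
//   bytesRead = await stream.read(buffer) // resolves with the number of bytes filled
//   if (bytesRead > 0) { chunks.push(buffer); lengths.push(bytesRead) }
// } while (bytesRead > 0)
// const audio = concatChunks(chunks, lengths)
```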
In order to use speech recognition, the process has been modified to mimic the one used in speech synthesis:
Use the imports from the new package:
import { createSpeechRecognitionPonyfill } from '@davi-ai/web-speech-cognitive-services-davi'
import type { SpeechRecognitionPonyfillType, SpeechRecognitionProps } from '@davi-ai/web-speech-cognitive-services-davi'
interface SpeechRecognitionProps {
autoStart?: boolean // Start recognizing after creation
passive?: boolean // Passive / active recognition, see below
wakeWords?: Array<string> // List of words that trigger the onwakeup callback in passive mode only
continuous?: boolean
lang?: string
grammarsList?: Array<string> | string
interimResults?: boolean
timerBeforeSpeechEnd?: number // Set delay (in ms) for recognition ending after something has been recognized (silence time at the end of the recognition)
debug?: boolean // Log calls to events when true
}
const options: SpeechRecognitionProps = {
autoStart: false,
passive: true,
wakeWords: ['hello', 'world'],
continuous: true,
interimResults: true,
grammarsList: [],
lang: 'en-US',
timerBeforeSpeechEnd: 3000,
debug: false
}
const ponyfillCredentials = {
  region: 'westus',
  // use either authorizationToken or subscriptionKey
  subscriptionKey: '<connection data>'
}
const ponyfill = createSpeechRecognitionPonyfill(
{ credentials: ponyfillCredentials },
options
)
You can use active (passive = false) or passive (passive = true) mode for recognition. Active mode is the one that existed before, while passive mode has been added. Passive mode is intended to run as a background task (with continuous = true), to detect specific words (wakeWords) and then call the onwakeup callback. It exists because, as of today, custom keyword recognition is not available in the JavaScript Speech SDK (see the Microsoft docs).
Here is the basic implementation. If you need to, you can override any of these callbacks by attaching your own to speechRecognitionPonyfill.speechRecognition.
// These 2 callbacks are called only in active mode
onstart = (): void => {};
onend = (): void => {};
// These 2 callbacks are called only in passive mode
onpassivestart = (): void => {};
onpassiveend = (): void => {};
onaudiostart = (): void => {};
onaudioend = (): void => {};
onsoundstart = (): void => {};
onsoundend = (): void => {};
onspeechstart = (): void => {};
onspeechend = (): void => {};
onerror = (value: any): void => {
console.log('Error : ', value);
};
onabort = (): void => {
this._debug && console.log('Recognition aborted');
}
// List of results from the current recognition (word by word in active mode, only when a recognition is finished in passive mode)
onresult = (value: Array<SpeechRecognitionResultListItem> | SpeechRecognitionResultList): void => {
this._debug && console.log('Result : ', value);
};
// Last result when passive mode is used
onpassiveresult = (value: Array<SpeechRecognitionResultListItem> | SpeechRecognitionResultList): void => {
this._debug && console.log('Passive Result : ', value);
};
// Called when a 'wake word' is found in the current recognition in passive mode
onwakeup = (): void => {
this._debug && console.log('Wake up !');
};
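To illustrate the passive flow, here is a small self-contained sketch of a wake-word check and how it could be wired to onwakeup. The matching logic is ours, for illustration only; the package's internal wake-word detection may differ:

```typescript
// Our helper: case-insensitive check for a wake word in a transcript.
const containsWakeWord = (transcript: string, wakeWords: string[]): boolean => {
  const lowered = transcript.toLowerCase()
  return wakeWords.some((word) => lowered.includes(word.toLowerCase()))
}

// Wiring sketch: react when the package detects a wake word in passive mode.
// ponyfill.speechRecognition.onwakeup = (): void => {
//   console.log('Wake word detected, switching to active recognition')
// }
```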
Three new methods were implemented to make some functionalities easier to use: