web-speech-cognitive-services
Polyfill Web Speech API with Cognitive Services.
This scaffold is provided by react-component-template.
Demo
Try out our demo at https://compulim.github.io/web-speech-cognitive-services?s=your-subscription-key.
We use react-dictate-button to quickly set up the playground.
Background
Web Speech API is not widely adopted on popular browsers and platforms. Polyfilling the API using cloud services is a great way to enable wider adoption. Note that Web Speech API in Google Chrome is itself backed by cloud services.
Microsoft Azure Cognitive Services Speech-to-Text service provides speech recognition with great accuracy. Unfortunately, its APIs are not based on Web Speech API.
This package polyfills Web Speech API by turning Cognitive Services Speech-to-Text API into Web Speech API. We test this package with popular combinations of platforms and browsers.
Test matrix
Browsers were all the latest versions as of 2018-06-28, except:
- macOS was 10.13.1 (2017-10-31), instead of 10.13.5
  - There should be no change to the matrix since Safari does not support Web Speech API
- Xbox was tested on an Insider build (1806)
Overall in point form:
- With Web Speech API only, web developers can enable speech recognition on most popular platforms, except iOS
  - iOS: No browsers on iOS support Web Speech API
  - Some platforms require a non-default browser
- With Cognitive Services Speech-to-Text, all popular platforms with their default browsers are supported
  - iOS: Chrome and Edge do not support Cognitive Services because WebRTC is disabled
Platform | OS | Browser | Cognitive Services (WebRTC) | Web Speech API |
---|---|---|---|---|
PC | Windows 10 (1803) | Chrome 67.0.3396.99 | Yes | Yes |
PC | Windows 10 (1803) | Edge 42.17134.1.0 | Yes | No, SpeechRecognition not implemented |
PC | Windows 10 (1803) | Firefox 61.0 | Yes | No, SpeechRecognition not implemented |
MacBook Pro | macOS High Sierra 10.13.1 | Chrome 67.0.3396.99 | Yes | Yes |
MacBook Pro | macOS High Sierra 10.13.1 | Safari 11.0.1 | Yes | No, SpeechRecognition not implemented |
Apple iPhone X | iOS 11.4 | Chrome 67.0.3396.87 | No, AudioSourceError | No, SpeechRecognition not implemented |
Apple iPhone X | iOS 11.4 | Edge 42.2.2.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
Apple iPhone X | iOS 11.4 | Safari | Yes | No, SpeechRecognition not implemented |
Apple iPod (6th gen) | iOS 11.4 | Chrome 67.0.3396.87 | No, AudioSourceError | No, SpeechRecognition not implemented |
Apple iPod (6th gen) | iOS 11.4 | Edge 42.2.2.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
Apple iPod (6th gen) | iOS 11.4 | Safari | No, AudioSourceError | No, SpeechRecognition not implemented |
Google Pixel 2 | Android 8.1.0 | Chrome 67.0.3396.87 | Yes | Yes |
Google Pixel 2 | Android 8.1.0 | Edge 42.0.0.2057 | Yes | Yes |
Google Pixel 2 | Android 8.1.0 | Firefox 60.1.0 | Yes | Yes |
Microsoft Lumia 950 | Windows 10 (1709) | Edge 40.15254.489.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
Microsoft Xbox One | Windows 10 (1806) 17134.4054 | Edge 42.17134.4054.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
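
The two capability columns in the matrix can be approximated at runtime with feature detection. The following is a minimal sketch, not part of this package: `detectSpeechSupport` and its `scope` parameter are hypothetical names. The presence of `getUserMedia` is only a necessary condition for microphone capture, so a browser that passes this check may still fail with `AudioSourceError`.

```javascript
// Sketch: report which speech recognition back ends a browser-like scope might support.
// `scope` is a hypothetical parameter (defaults to globalThis) so that the check
// can also be run against plain objects outside a browser.
function detectSpeechSupport(scope = globalThis) {
  // Native Web Speech API: Chrome exposes the prefixed constructor.
  const webSpeech =
    typeof scope.SpeechRecognition === 'function' ||
    typeof scope.webkitSpeechRecognition === 'function';

  // Microphone capture through WebRTC, needed to stream audio to a cloud service.
  const mediaDevices = scope.navigator && scope.navigator.mediaDevices;
  const microphoneCapture =
    !!mediaDevices && typeof mediaDevices.getUserMedia === 'function';

  return { webSpeech, microphoneCapture };
}

console.log(detectSpeechSupport());
```

A page could use the result to decide between the native API and a cloud-backed polyfill, and to show a helpful message when neither is available.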
Event lifecycle scenarios
We test multiple scenarios to make sure the package polyfills Web Speech API correctly. The following are the events and their firing order.
Happy path
Everything works, including multiple interim results.
- Cognitive Services
RecognitionTriggeredEvent
ListeningStartedEvent
ConnectingToServiceEvent
RecognitionStartedEvent
SpeechHypothesisEvent (could be more than one)
SpeechEndDetectedEvent
SpeechDetailedPhraseEvent
RecognitionEndedEvent
- Web Speech API
start
audiostart
soundstart
speechstart
result (multiple times)
speechend
soundend
audioend
result (results = [{ isFinal = true }])
end
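To check a firing order like the one listed above, one option is to record events as they fire and compare the log with the expected sequence. This is a minimal sketch that assumes the recognition object is an `EventTarget`, as `SpeechRecognition` is in the Web Speech API. `recordEventOrder` is a hypothetical helper, and the plain `EventTarget` below only simulates a recognizer.

```javascript
// Event names defined on SpeechRecognition by the Web Speech API.
const LIFECYCLE_EVENTS = [
  'start', 'audiostart', 'soundstart', 'speechstart', 'result',
  'speechend', 'soundend', 'audioend', 'nomatch', 'error', 'end'
];

// Hypothetical helper: attach a listener for every lifecycle event and
// return an array that fills up with event names in firing order.
function recordEventOrder(target, eventNames = LIFECYCLE_EVENTS) {
  const log = [];

  for (const name of eventNames) {
    target.addEventListener(name, () => log.push(name));
  }

  return log;
}

// Simulation only: replay the happy path on a plain EventTarget.
const simulatedRecognizer = new EventTarget();
const simulatedLog = recordEventOrder(simulatedRecognizer);
const happyPath = [
  'start', 'audiostart', 'soundstart', 'speechstart', 'result',
  'speechend', 'soundend', 'audioend', 'result', 'end'
];

for (const name of happyPath) {
  simulatedRecognizer.dispatchEvent(new Event(name));
}

console.log(simulatedLog.join(' > '));
```

In a browser, the same helper could be attached to a real recognition object before calling `start()`, and the resulting log compared against each scenario in this section.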
Abort during recognition
Abort before first recognition is made
- Cognitive Services
  - Abort essentially mutes the speech, so the service could still return success, silence, or no match
- Web Speech API
start
audiostart
audioend
error (error = 'aborted')
end
Abort after some speech is recognized
- Cognitive Services
  - Abort essentially mutes the speech, so the service could still return success, silence, or no match
- Web Speech API
start
audiostart
soundstart (optional)
speechstart (optional)
result (optional)
speechend (optional)
soundend (optional)
audioend
error (error = 'aborted')
end
Network issues
Turn on airplane mode.
- Cognitive Services
RecognitionTriggeredEvent
ListeningStartedEvent
ConnectingToServiceEvent
RecognitionEndedEvent (Result.RecognitionStatus = 'ConnectError')
- Web Speech API
start
audiostart
audioend
error (error = 'network')
end
Audio muted or volume too low
- Cognitive Services
RecognitionTriggeredEvent
ListeningStartedEvent
ConnectingToServiceEvent
RecognitionStartedEvent
SpeechEndDetectedEvent
SpeechDetailedPhraseEvent (Result.RecognitionStatus = 'InitialSilenceTimeout')
RecognitionEndedEvent
- Web Speech API
start
audiostart
audioend
error (error = 'no-speech')
end
No speech is recognized
Some sounds are heard, but they cannot be recognized as text. There could be some interim results with recognized text, but their confidence is so low that they are dropped from the final result.
- Cognitive Services
RecognitionTriggeredEvent
ListeningStartedEvent
ConnectingToServiceEvent
RecognitionStartedEvent
SpeechHypothesisEvent (could be more than one)
SpeechEndDetectedEvent
SpeechDetailedPhraseEvent (Result.RecognitionStatus = 'NoMatch')
RecognitionEndedEvent
- Web Speech API
start
audiostart
soundstart
speechstart
result
speechend
soundend
audioend
end
Note: the Web Speech API has an onnomatch event, but unfortunately, Google Chrome did not fire this event.
Not authorized to use microphone
The user clicks "deny" on the permission dialog, or no microphone is detected in the system.
- Cognitive Services
RecognitionTriggeredEvent
RecognitionEndedEvent (Result.RecognitionStatus = 'AudioSourceError')
- Web Speech API
error (error = 'not-allowed')
end
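The scenarios above imply a small mapping from the Cognitive Services `Result.RecognitionStatus` to the Web Speech API `error` code. A minimal sketch of that mapping follows. `toWebSpeechError` is a hypothetical name rather than this package's API, and `'aborted'` is left out because it is triggered by the caller, not by a status.

```javascript
// Status-to-error pairs taken from the scenarios in this section.
const STATUS_TO_ERROR = {
  ConnectError: 'network',
  InitialSilenceTimeout: 'no-speech',
  AudioSourceError: 'not-allowed'
};

// Hypothetical helper: return the Web Speech API error code for a status,
// or null when the scenario ends without an error event (for example 'NoMatch').
function toWebSpeechError(recognitionStatus) {
  return Object.prototype.hasOwnProperty.call(STATUS_TO_ERROR, recognitionStatus)
    ? STATUS_TO_ERROR[recognitionStatus]
    : null;
}

console.log(toWebSpeechError('ConnectError'));
console.log(toWebSpeechError('NoMatch'));
```

Returning `null` for unlisted statuses keeps the "No speech is recognized" case distinct: it fires `end` without an `error` event.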
Known issues
- Interim results do not return confidence; only the final result has confidence
  - We always return 0.5 for interim results
- Cognitive Services supports grammar lists, but not in JSGF format; more work needs to be done in this area
  - Although Google Chrome supports setting the grammar list, it seems the grammar list is not used at all
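
The interim confidence behavior described in the first known issue could be sketched as follows. `toAlternative` and `INTERIM_CONFIDENCE` are hypothetical names for illustration, and only the 0.5 value comes from this document.

```javascript
// Placeholder confidence for interim results, which carry no confidence of their own.
const INTERIM_CONFIDENCE = 0.5;

// Hypothetical helper: build an object shaped like a SpeechRecognitionAlternative
// ({ transcript, confidence }). Final results keep the confidence that was passed in;
// interim results always get the placeholder.
function toAlternative(transcript, isFinal, confidence) {
  return {
    transcript,
    confidence: isFinal ? confidence : INTERIM_CONFIDENCE
  };
}

console.log(toAlternative('hello', false));
console.log(toAlternative('hello world', true, 0.93));
```

Callers that rank or filter by confidence should therefore treat interim values as placeholders and rely only on the final result.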
Contributions
Like us? Star us.
Want to make it better? File us an issue.
Don't like something you see? Submit a pull request.