web-speech-cognitive-services
Polyfill Web Speech API with Cognitive Services Speech-to-Text service.
This scaffold is provided by react-component-template.
Try out our demo at https://compulim.github.io/web-speech-cognitive-services?s=your-subscription-key.
We use react-dictate-button to quickly set up the playground.
The Web Speech API is not widely adopted by popular browsers and platforms. Polyfilling the API with cloud services is a great way to enable wider adoption. In fact, the Web Speech API in Google Chrome is itself backed by cloud services.
The Microsoft Azure Cognitive Services Speech-to-Text service provides highly accurate speech recognition. Unfortunately, its APIs are not based on the Web Speech API.
This package polyfills the Web Speech API by exposing the Cognitive Services Speech-to-Text API through the Web Speech API interface. We test this package with popular combinations of platforms and browsers.
First, run npm install web-speech-cognitive-services for the latest production build, or npm install web-speech-cognitive-services@master for the latest development build.
```js
import CognitiveServicesSpeechRecognition from 'web-speech-cognitive-services';

const recognition = new CognitiveServicesSpeechRecognition();

// There are two ways to provide your credentials (use one of them):
// 1. Provide a subscription key (good for prototyping, not for production)
// 2. Provide a mechanism to obtain/refresh an access token

// If you are using a subscription key
recognition.subscriptionKey = 'your subscription key';

// If you are using an access token: refreshToken is true when the token is
// being renewed, and false otherwise
recognition.tokenFetch = async (authFetchEventID, refreshToken) => {
  // Obtain an access token (for example, from your own server) and return it
};

recognition.lang = 'en-US';

recognition.onresult = ({ results }) => {
  console.log(results);
};

recognition.start();
```
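The tokenFetch callback above is left for you to fill in. Here is a minimal sketch. It assumes that tokenFetch should resolve to the access-token string, which is an assumption you should check against the package documentation. The hypothetical helper createCachedTokenFetch reuses the last token and calls your own token function again only when refreshToken is true. fetchFreshToken is a placeholder for whatever obtains a token in your setup, such as a request to your own backend.

```js
// Hypothetical helper (not part of web-speech-cognitive-services).
// Wraps an async function that obtains a fresh access token and reuses the
// last token unless the recognizer asks for a renewal.
function createCachedTokenFetch(fetchFreshToken) {
  let cachedToken = null;

  // Same parameters as the tokenFetch callback shown above.
  // ASSUMPTION: the recognizer expects the promise to resolve to the token.
  return async (authFetchEventID, refreshToken) => {
    if (refreshToken || cachedToken === null) {
      // Renewal requested, or no token yet: fetch a new one.
      cachedToken = await fetchFreshToken();
    }
    return cachedToken;
  };
}
```

For example, you could write recognition.tokenFetch = createCachedTokenFetch(myTokenFunction), where myTokenFunction is your own async function. The cache is deliberately simple. Two calls that overlap before the first token arrives will both fetch.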
You can use react-dictate-button to integrate speech recognition into your React app.
```jsx
import React from 'react';
import CognitiveServicesSpeechRecognition, { CognitiveServicesSpeechGrammarList } from 'web-speech-cognitive-services';
import DictateButton from 'react-dictate-button';

// A button that starts dictation and shows the recognized transcript
export default props =>
  <DictateButton
    extra={{ subscriptionKey: 'your subscription key' }}
    onDictate={ ({ result }) => alert(result.transcript) }
    speechGrammarList={ CognitiveServicesSpeechGrammarList }
    speechRecognition={ CognitiveServicesSpeechRecognition }
  >
    Start dictation
  </DictateButton>
```
You can also look at our playground page to see how it works.
All browsers were the latest versions as of 2018-06-28, except:
Quick grab:
Platform | OS | Browser | Cognitive Services (WebRTC) | Web Speech API |
---|---|---|---|---|
PC | Windows 10 (1803) | Chrome 67.0.3396.99 | Yes | Yes |
PC | Windows 10 (1803) | Edge 42.17134.1.0 | Yes | No, SpeechRecognition not implemented |
PC | Windows 10 (1803) | Firefox 61.0 | Yes | No, SpeechRecognition not implemented |
MacBook Pro | macOS High Sierra 10.13.1 | Chrome 67.0.3396.99 | Yes | Yes |
MacBook Pro | macOS High Sierra 10.13.1 | Safari 11.0.1 | Yes | No, SpeechRecognition not implemented |
Apple iPhone X | iOS 11.4 | Chrome 67.0.3396.87 | No, AudioSourceError | No, SpeechRecognition not implemented |
Apple iPhone X | iOS 11.4 | Edge 42.2.2.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
Apple iPhone X | iOS 11.4 | Safari | Yes | No, SpeechRecognition not implemented |
Apple iPod (6th gen) | iOS 11.4 | Chrome 67.0.3396.87 | No, AudioSourceError | No, SpeechRecognition not implemented |
Apple iPod (6th gen) | iOS 11.4 | Edge 42.2.2.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
Apple iPod (6th gen) | iOS 11.4 | Safari | No, AudioSourceError | No, SpeechRecognition not implemented |
Google Pixel 2 | Android 8.1.0 | Chrome 67.0.3396.87 | Yes | Yes |
Google Pixel 2 | Android 8.1.0 | Edge 42.0.0.2057 | Yes | Yes |
Google Pixel 2 | Android 8.1.0 | Firefox 60.1.0 | Yes | Yes |
Microsoft Lumia 950 | Windows 10 (1709) | Edge 40.15254.489.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
Microsoft Xbox One | Windows 10 (1806) 17134.4054 | Edge 42.17134.4054.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
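The table shows that native SpeechRecognition is missing on many of the tested browsers. You might prefer the browser's own implementation where one exists and the Cognitive Services polyfill everywhere else. If so, a sketch of that selection could look like the code below. The function name pickSpeechRecognition is hypothetical, and the package does not necessarily do this for you.

```js
// Hypothetical helper: prefer a native Web Speech API recognizer and fall
// back to a polyfill constructor supplied by the caller.
// `globalObject` is normally `window`. Passing it in keeps the function
// testable outside a browser.
function pickSpeechRecognition(globalObject, PolyfillConstructor) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition || // Chrome exposes the prefixed name
    PolyfillConstructor
  );
}
```

For example: const SpeechRecognition = pickSpeechRecognition(window, CognitiveServicesSpeechRecognition). Always using the polyfill gives the same behavior on every browser. Preferring the native recognizer avoids needing a Cognitive Services credential where the browser already supports the API. Neither choice helps on the rows marked AudioSourceError.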
We test multiple scenarios to make sure we polyfill the Web Speech API correctly. For each scenario, the events are listed in firing order, first for Cognitive Services and then for the Web Speech API.
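One way to check firing order in your own tests is to record every Web Speech API event a recognizer dispatches and compare the log with an expected sequence. The sketch below assumes the recognizer supports addEventListener, as the standard SpeechRecognition interface (an EventTarget) does. The expected sequence is copied from the Web Speech API list for the first scenario below, with repeated interim result events collapsed into one. All names here are hypothetical.

```js
// Web Speech API event types that appear in the scenarios below.
const WEB_SPEECH_EVENT_TYPES = [
  'start', 'audiostart', 'soundstart', 'speechstart', 'result',
  'speechend', 'soundend', 'audioend', 'nomatch', 'error', 'end'
];

// Expected happy-path order, with consecutive interim `result` events
// collapsed into a single entry.
const HAPPY_PATH = [
  'start', 'audiostart', 'soundstart', 'speechstart', 'result',
  'speechend', 'soundend', 'audioend', 'result', 'end'
];

// Attach one listener per event type. The returned array fills up with
// event types in the order they are dispatched.
function recordEvents(recognition) {
  const log = [];
  for (const type of WEB_SPEECH_EVENT_TYPES) {
    recognition.addEventListener(type, () => log.push(type));
  }
  return log;
}

// Drop consecutive duplicates, e.g. several interim `result` events in a row.
function collapseRepeats(types) {
  return types.filter((type, index) => index === 0 || type !== types[index - 1]);
}

// True when the recorded log matches the happy path after collapsing repeats.
function matchesHappyPath(log) {
  const collapsed = collapseRepeats(log);
  return collapsed.length === HAPPY_PATH.length &&
    collapsed.every((type, index) => type === HAPPY_PATH[index]);
}
```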
Happy path: everything works, including multiple interim results.

Cognitive Services:

1. RecognitionTriggeredEvent
2. ListeningStartedEvent
3. ConnectingToServiceEvent
4. RecognitionStartedEvent
5. SpeechHypothesisEvent (could be more than one)
6. SpeechEndDetectedEvent
7. SpeechDetailedPhraseEvent
8. RecognitionEndedEvent

Web Speech API:

1. start
2. audiostart
3. soundstart
4. speechstart
5. result (multiple times)
6. speechend
7. soundend
8. audioend
9. result (results = [{ isFinal = true }])
10. end
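In the happy path, interim result events arrive first, and the last result event carries results whose isFinal is true. The minimal sketch below separates final text from interim text. It assumes the polyfill's results follow the standard Web Speech API shape: an array-like list in which each result holds array-like alternatives with a transcript property. splitTranscript is a hypothetical helper name.

```js
// Hypothetical helper: split a Web Speech API results list into final and
// interim text. Each entry in `results` has an `isFinal` flag and holds
// alternatives; only the first alternative of each result is used.
function splitTranscript(results) {
  let finalText = '';
  let interimText = '';

  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    const best = result[0]; // first alternative
    if (!best) continue;

    if (result.isFinal) {
      finalText += best.transcript;
    } else {
      interimText += best.transcript;
    }
  }

  return { finalText, interimText };
}
```

With this helper, the handler in the first example could become recognition.onresult = ({ results }) => console.log(splitTranscript(results).finalText).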
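Abort before any speech is recognized:

Cognitive Services: SpeechEndDetectedEvent is received immediately. Otherwise this is very similar to the happy path and could still end in success, silence, or no match.

Web Speech API:

1. start
2. audiostart
3. audioend
4. error (error = 'aborted')
5. end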
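Abort after some speech is recognized:

Cognitive Services: SpeechEndDetectedEvent is received immediately. Otherwise this is very similar to the happy path and could still end in success, silence, or no match.

Web Speech API:

1. start
2. audiostart
3. soundstart
4. speechstart
5. result (one or more)
6. speechend
7. soundend
8. audioend
9. error (error = 'aborted')
10. end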
Network issue: turn on airplane mode.

Cognitive Services:

1. RecognitionTriggeredEvent
2. ListeningStartedEvent
3. ConnectingToServiceEvent
4. RecognitionEndedEvent (Result.RecognitionStatus = 'ConnectError')

Web Speech API:

1. start
2. audiostart
3. audioend
4. error (error = 'network')
5. end
No speech detected before the initial silence timeout:

Cognitive Services:

1. RecognitionTriggeredEvent
2. ListeningStartedEvent
3. ConnectingToServiceEvent
4. RecognitionStartedEvent
5. SpeechEndDetectedEvent
6. SpeechDetailedPhraseEvent (Result.RecognitionStatus = 'InitialSilenceTimeout')
7. RecognitionEndedEvent

Web Speech API:

1. start
2. audiostart
3. audioend
4. error (error = 'no-speech')
5. end
No match: some sounds are heard, but they cannot be recognized as text. There could be some interim results with recognized text, but their confidence is so low that they are dropped from the final result.

Cognitive Services:

1. RecognitionTriggeredEvent
2. ListeningStartedEvent
3. ConnectingToServiceEvent
4. RecognitionStartedEvent
5. SpeechHypothesisEvent (could be more than one)
6. SpeechEndDetectedEvent
7. SpeechDetailedPhraseEvent (Result.RecognitionStatus = 'NoMatch')
8. RecognitionEndedEvent

Web Speech API:

1. start
2. audiostart
3. soundstart
4. speechstart
5. result
6. speechend
7. soundend
8. audioend
9. end
Note: the Web Speech API defines an onnomatch event, but Google Chrome did not fire it.
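Chrome does not fire nomatch, so an app may need to infer it. In the no-match sequence above, the session reaches end with no error event and, presumably, no result marked isFinal. The sketch below treats exactly that case as "no match". This is a heuristic of our own, not behavior documented by the package. It assumes the recognizer supports addEventListener and that result events carry a standard results list. watchForNoMatch and onNoMatch are hypothetical names.

```js
// Hypothetical helper: call onNoMatch once per session that ends without a
// final result and without an error. If the recognizer does fire `nomatch`
// itself, that event is used instead and the inference is skipped.
function watchForNoMatch(recognition, onNoMatch) {
  let gotFinal = false;
  let gotError = false;
  let gotNoMatch = false;

  // Reset the flags at the beginning of every session.
  recognition.addEventListener('start', () => {
    gotFinal = false;
    gotError = false;
    gotNoMatch = false;
  });

  recognition.addEventListener('result', event => {
    const { results } = event;
    for (let i = 0; i < results.length; i++) {
      if (results[i].isFinal) gotFinal = true;
    }
  });

  recognition.addEventListener('error', () => { gotError = true; });

  recognition.addEventListener('nomatch', () => {
    gotNoMatch = true;
    onNoMatch();
  });

  // Heuristic: an `end` with no final result, no error and no native
  // `nomatch` is treated as "no match".
  recognition.addEventListener('end', () => {
    if (!gotFinal && !gotError && !gotNoMatch) onNoMatch();
  });
}
```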
Microphone not allowed: the user clicks "Deny" on the permission dialog, or no microphone is detected in the system.

Cognitive Services:

1. RecognitionTriggeredEvent
2. RecognitionEndedEvent (Result.RecognitionStatus = 'AudioSourceError')

Web Speech API:

1. error (error = 'not-allowed')
2. end
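The scenarios above surface four error codes: aborted, network, no-speech, and not-allowed. If you want to show friendlier messages for them, a minimal sketch could map the codes as shown below. The message wording and the helper name describeSpeechError are illustrative. Any other code falls back to a generic message.

```js
// Illustrative user-facing messages for the error codes seen in the scenarios.
const SPEECH_ERROR_MESSAGES = {
  'aborted': 'Recognition was stopped before it finished.',
  'network': 'Could not reach the speech service. Check your connection.',
  'no-speech': 'No speech was detected. Please try again.',
  'not-allowed': 'Microphone access was denied or no microphone was found.'
};

// Hypothetical helper: turn an `error` event into a message for the user.
function describeSpeechError(event) {
  return SPEECH_ERROR_MESSAGES[event.error] ||
    `Speech recognition failed (${event.error}).`;
}
```

For example: recognition.onerror = event => console.warn(describeSpeechError(event)).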
0.5 for interim results
microsoft-speech-browser-sdk@0.0.12, tracking on this issue

Like us? Star us.
Want to make it better? File us an issue.
Don't like something you see? Submit a pull request.