web-speech-cognitive-services
@@ -8,3 +8,13 @@ # Changelog | ||
## [Unreleased] | ||
### Added | ||
- New playground for better debuggability | ||
- Support of Speech Services SDK, with automated unit tests for speech recognition | ||
- See [`SPEC-RECOGNITION.md`](SPEC-RECOGNITION.md) and [`SPEC-SYNTHESIS.md`](SPEC-SYNTHESIS.md) for quirks | ||
- Speech recognition: Support `stop` on Speech Services | ||
- Speech synthesis: Support `pause` and `resume` (with `pause` and `resume` event) | ||
### Changed | ||
- Ponyfills are now constructed based on options (authorization token, region, and subscription key) | ||
- A new set of ponyfills will be created whenever an option changes | ||
## [3.0.0] - 2018-10-31 | ||
@@ -11,0 +21,0 @@ ### Added |
@@ -8,42 +8,8 @@ "use strict"; | ||
}); | ||
Object.defineProperty(exports, "createFetchTokenUsingSubscriptionKey", { | ||
enumerable: true, | ||
get: function get() { | ||
return _createFetchTokenUsingSubscriptionKey.default; | ||
} | ||
}); | ||
Object.defineProperty(exports, "SpeechGrammarList", { | ||
enumerable: true, | ||
get: function get() { | ||
return _SpeechGrammarList.default; | ||
} | ||
}); | ||
Object.defineProperty(exports, "SpeechRecognition", { | ||
enumerable: true, | ||
get: function get() { | ||
return _SpeechRecognition.default; | ||
} | ||
}); | ||
Object.defineProperty(exports, "speechSynthesis", { | ||
enumerable: true, | ||
get: function get() { | ||
return _speechSynthesis.default; | ||
} | ||
}); | ||
Object.defineProperty(exports, "SpeechSynthesisUtterance", { | ||
enumerable: true, | ||
get: function get() { | ||
return _SpeechSynthesisUtterance.default; | ||
} | ||
}); | ||
exports.default = void 0; | ||
var _createFetchTokenUsingSubscriptionKey = _interopRequireDefault(require("./util/createFetchTokenUsingSubscriptionKey")); | ||
var _SpeechServices = _interopRequireDefault(require("./SpeechServices")); | ||
var _SpeechGrammarList = _interopRequireDefault(require("./recognition/SpeechGrammarList")); | ||
var _SpeechRecognition = _interopRequireDefault(require("./recognition/SpeechRecognition")); | ||
var _speechSynthesis = _interopRequireDefault(require("./synthesis/speechSynthesis")); | ||
var _SpeechSynthesisUtterance = _interopRequireDefault(require("./synthesis/SpeechSynthesisUtterance")); | ||
var _default = _SpeechServices.default; | ||
exports.default = _default; | ||
//# sourceMappingURL=index.js.map |
{ | ||
"name": "web-speech-cognitive-services", | ||
"version": "3.0.1-master.d2eb07f", | ||
"version": "4.0.0-master.62b04ac", | ||
"description": "Polyfill Web Speech API with Cognitive Services Speech-to-Text service", | ||
@@ -19,2 +19,3 @@ "keywords": [ | ||
"tts", | ||
"unified speech", | ||
"utterance", | ||
@@ -33,3 +34,3 @@ "voice recognition", | ||
"clean": "rimraf lib", | ||
"test": "echo No tests defined", | ||
"test": "jest", | ||
"watch": "npm run build -- --watch" | ||
@@ -47,2 +48,5 @@ }, | ||
"homepage": "https://github.com/compulim/web-speech-cognitive-services#readme", | ||
"jest": { | ||
"testEnvironment": "node" | ||
}, | ||
"peerDependencies": { | ||
@@ -49,0 +53,0 @@ "microsoft-speech-browser-sdk": "^0.0.12", |
README.md
# web-speech-cognitive-services | ||
[](https://badge.fury.io/js/web-speech-cognitive-services) [](https://travis-ci.org/compulim/web-speech-cognitive-services) | ||
Web Speech API adapter to use Cognitive Services Speech Services for both speech-to-text and text-to-speech services. | ||
Polyfill Web Speech API with Cognitive Services Bing Speech for both speech-to-text and text-to-speech services. | ||
> This scaffold is provided by [`react-component-template`](https://github.com/compulim/react-component-template/). | ||
This scaffold is provided by [`react-component-template`](https://github.com/compulim/react-component-template/). | ||
[](https://badge.fury.io/js/web-speech-cognitive-services) [](https://travis-ci.org/compulim/web-speech-cognitive-services) | ||
@@ -19,21 +19,89 @@ # Demo | ||
Microsoft Azure [Cognitive Services Bing Speech](https://azure.microsoft.com/en-us/services/cognitive-services/speech/) provides speech recognition with great accuracy. Unfortunately, its APIs are not based on the Web Speech API. | ||
Microsoft Azure [Cognitive Services Speech Services](https://azure.microsoft.com/en-us/services/cognitive-services/speech-services/) provides speech recognition with great accuracy. Unfortunately, its APIs are not based on the Web Speech API. | ||
This package polyfills the Web Speech API by turning the Cognitive Services Bing Speech API into the Web Speech API. We test this package with popular combinations of platforms and browsers. | ||
This package polyfills the Web Speech API by turning the Cognitive Services Speech Services API into the Web Speech API. We test this package with popular combinations of platforms and browsers. | ||
# How to use | ||
First, run `npm install web-speech-cognitive-services` for the latest production build, or `npm install web-speech-cognitive-services@master` for the latest development build. | ||
For the production build, run `npm install web-speech-cognitive-services`. | ||
Then, install the peer dependency by running `npm install microsoft-speech-browser-sdk`. | ||
For the development build, run `npm install web-speech-cognitive-services@master`. | ||
> Since the [Speech Services SDK](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/quickstart-js-browser) is not on NPM yet, we bundle the SDK inside this package for now. When the Speech Services SDK is released on NPM, we will define it as a peer dependency. | ||
## Polyfilling vs. ponyfilling | ||
In JavaScript, a polyfill is a technique to bring newer features to older environments. A ponyfill is very similar, but instead of polluting the environment by default, it lets developers choose what they want. This [article](https://ponyfoo.com/articles/polyfills-or-ponyfills) discusses polyfills vs. ponyfills. | ||
In this package, we prefer ponyfills because they do not pollute the hosting environment. You are also free to mix-and-match multiple speech recognition engines in a single environment. | ||
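Because ponyfills keep everything off the global object, an app can pick between engines at runtime. A minimal sketch of that idea (the `selectSpeechRecognition` helper and its fallback logic are hypothetical, not part of this package):

```javascript
// Pick a SpeechRecognition implementation without touching globals.
// cloudPonyfill would come from createPonyfill(); here it is any object
// that may expose a SpeechRecognition class.
function selectSpeechRecognition(preferCloud, cloudPonyfill) {
  // Fall back to the browser's native implementation when available.
  const native = typeof window !== 'undefined'
    ? (window.SpeechRecognition || window.webkitSpeechRecognition)
    : undefined;

  if (preferCloud && cloudPonyfill && cloudPonyfill.SpeechRecognition) {
    return cloudPonyfill.SpeechRecognition;
  }

  return native || null;
}
```

A consumer can then call `new (selectSpeechRecognition(true, ponyfill))()` without ever assigning to `window`.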
# Code snippets | ||
> For readability, we omitted the async function in all code snippets. To run the code, you will need to wrap it in an async function. | ||
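One common way to do that wrapping is an async IIFE; the snippet below is a generic sketch, not specific to this package:

```javascript
// Top-level `await` is not available everywhere, so wrap the snippet
// in an immediately-invoked async function.
(async function main() {
  // e.g. const ponyfill = await createPonyfill({ ... });
  const result = await Promise.resolve('done');

  console.log(result);
})().catch(err => {
  // Surface errors from the async body instead of losing the rejection.
  console.error(err);
});
```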
## Polyfilling the environment | ||
If the library you are using does not support ponyfills, you can polyfill the `window` object with our ponyfill. | ||
```jsx | ||
import createPonyfill from 'web-speech-cognitive-services/lib/SpeechServices'; | ||
const ponyfill = await createPonyfill({ | ||
region: 'westus', | ||
subscriptionKey: 'YOUR_SUBSCRIPTION_KEY' | ||
}); | ||
for (let key in ponyfill) { | ||
window[key] = ponyfill[key]; | ||
} | ||
``` | ||
> Note: if you do not specify `region`, we will default to `"westus"`. | ||
> List of supported regions can be found in [this article](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-apis#regions-and-endpoints). | ||
> If you prefer to use the deprecated Bing Speech, import from `'web-speech-cognitive-services/lib/BingSpeech'` instead. | ||
## Using authorization token | ||
Instead of exposing the subscription key in the browser, we strongly recommend using an authorization token. | ||
```jsx | ||
import createPonyfill from 'web-speech-cognitive-services/lib/SpeechServices'; | ||
const ponyfill = await createPonyfill({ | ||
authorizationToken: 'YOUR_AUTHORIZATION_TOKEN', | ||
region: 'westus', | ||
}); | ||
``` | ||
You can also provide an async function that fetches the authorization token on demand. You should cache the authorization token for subsequent requests. | ||
```jsx | ||
import createPonyfill from 'web-speech-cognitive-services/lib/SpeechServices'; | ||
const ponyfill = await createPonyfill({ | ||
authorizationToken: () => fetch('https://example.com/your-token').then(res => res.text()), | ||
region: 'westus', | ||
}); | ||
``` | ||
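Since the token should be cached, a small memoizing wrapper can be layered on top of any fetch function. This is a hypothetical helper, not an API of this package; Speech Services authorization tokens typically expire after about 10 minutes, so the default TTL below is an assumption:

```javascript
// Wrap a token-fetching function so repeated calls reuse the cached
// token until the time-to-live (in milliseconds) elapses.
function createCachedTokenFetcher(fetchToken, ttl = 300000) {
  let cachedToken = null;
  let fetchedAt = 0;

  return async () => {
    if (!cachedToken || Date.now() - fetchedAt > ttl) {
      cachedToken = await fetchToken();
      fetchedAt = Date.now();
    }

    return cachedToken;
  };
}

// Usage (hypothetical endpoint):
// const authorizationToken = createCachedTokenFetcher(
//   () => fetch('https://example.com/your-token').then(res => res.text())
// );
```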
## Speech recognition (speech-to-text) | ||
You can choose to create a ponyfill for speech recognition only. | ||
```jsx | ||
import { createFetchTokenUsingSubscriptionKey, SpeechRecognition } from 'web-speech-cognitive-services'; | ||
import { createSpeechRecognitionPonyfill } from 'web-speech-cognitive-services/lib/SpeechServices/SpeechToText'; | ||
const { | ||
SpeechRecognition | ||
} = await createSpeechRecognitionPonyfill({ | ||
region: 'westus', | ||
subscriptionKey: 'YOUR_SUBSCRIPTION_KEY' | ||
}); | ||
const recognition = new SpeechRecognition(); | ||
recognition.interimResults = true; | ||
recognition.lang = 'en-US'; | ||
recognition.fetchToken = createFetchTokenUsingSubscriptionKey('your subscription key'); | ||
@@ -54,10 +122,15 @@ recognition.onresult = ({ results }) => { | ||
```jsx | ||
import { createFetchTokenUsingSubscriptionKey, SpeechGrammarList, SpeechRecognition } from 'web-speech-cognitive-services'; | ||
import createPonyfill from 'web-speech-cognitive-services/lib/SpeechServices'; | ||
import DictateButton from 'react-dictate-button'; | ||
const extra = { fetchToken: createFetchTokenUsingSubscriptionKey('your subscription key') }; | ||
const { | ||
SpeechGrammarList, | ||
SpeechRecognition | ||
} = await createPonyfill({ | ||
region: 'westus', | ||
subscriptionKey: 'YOUR_SUBSCRIPTION_KEY' | ||
}); | ||
export default props => | ||
<DictateButton | ||
extra={ extra } | ||
onDictate={ ({ result }) => alert(result.transcript) } | ||
@@ -71,6 +144,6 @@ speechGrammarList={ SpeechGrammarList } | ||
You can also look at our [playground page](packages/playground/src/DictationPane.js) to see how it works. | ||
### Speech priming (a.k.a. grammars) | ||
> This section is currently not implemented with the new Speech SDK. We are leaving it here for future reference. | ||
You can prime the speech recognition by giving a list of words. | ||
@@ -81,4 +154,12 @@ | ||
```jsx | ||
import { createFetchTokenUsingSubscriptionKey, SpeechGrammarList, SpeechRecognition } from 'web-speech-cognitive-services'; | ||
import createPonyfill from 'web-speech-cognitive-services/lib/SpeechServices'; | ||
const { | ||
SpeechGrammarList, | ||
SpeechRecognition | ||
} = await createPonyfill({ | ||
region: 'westus', | ||
subscriptionKey: 'YOUR_SUBSCRIPTION_KEY' | ||
}); | ||
const recognition = new SpeechRecognition(); | ||
@@ -88,3 +169,2 @@ | ||
recognition.grammars.words = ['Tuen Mun', 'Yuen Long']; | ||
recognition.fetchToken = createFetchTokenUsingSubscriptionKey('your subscription key'); | ||
@@ -103,11 +183,14 @@ recognition.onresult = ({ results }) => { | ||
```jsx | ||
import { createFetchTokenUsingSubscriptionKey, speechSynthesis, SpeechSynthesisUtterance } from 'web-speech-cognitive-services'; | ||
import { createSpeechSynthesisPonyfill } from 'web-speech-cognitive-services/lib/SpeechServices/TextToSpeech'; | ||
const fetchToken = createFetchTokenUsingSubscriptionKey('your subscription key'); | ||
const { | ||
speechSynthesis, | ||
SpeechSynthesisUtterance | ||
} = await createSpeechSynthesisPonyfill({ | ||
region: 'westus', | ||
subscriptionKey: 'YOUR_SUBSCRIPTION_KEY' | ||
}); | ||
const utterance = new SpeechSynthesisUtterance('Hello, World!'); | ||
speechSynthesis.fetchToken = fetchToken; | ||
// Need to wait until token exchange is complete before speak | ||
await fetchToken(); | ||
await speechSynthesis.speak(utterance); | ||
@@ -118,2 +201,4 @@ ``` | ||
> List of supported regions can be found in [this article](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-apis#text-to-speech-api). | ||
`pitch`, `rate`, `voice`, and `volume` are supported. Only `onstart`, `onerror`, and `onend` events are supported. | ||
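As a sketch of wiring those properties and events together (`configureUtterance` is a hypothetical helper; the real `SpeechSynthesisUtterance` comes from the ponyfill, but any utterance-shaped object works):

```javascript
// Apply the supported prosody properties and lifecycle handlers
// to an utterance before passing it to speechSynthesis.speak().
function configureUtterance(utterance, { pitch = 1, rate = 1, volume = 1 } = {}) {
  utterance.pitch = pitch;   // relative pitch of the voice
  utterance.rate = rate;     // speaking rate
  utterance.volume = volume; // 0 (muted) to 1 (loudest)

  utterance.onstart = () => console.log('speech started');
  utterance.onend = () => console.log('speech ended');
  utterance.onerror = event => console.error('speech error', event);

  return utterance;
}

// Usage: configureUtterance(new SpeechSynthesisUtterance('Hello, World!'), { rate: 1.2 });
```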
@@ -126,3 +211,3 @@ | ||
```jsx | ||
import { createFetchTokenUsingSubscriptionKey, speechSynthesis, SpeechSynthesisUtterance } from 'web-speech-cognitive-services'; | ||
import createPonyfill from 'web-speech-cognitive-services/lib/SpeechServices'; | ||
import React from 'react'; | ||
@@ -135,22 +220,24 @@ import Say from 'react-say'; | ||
speechSynthesis.fetchToken = createFetchTokenUsingSubscriptionKey('your subscription key'); | ||
// We call it here to preload the token, the token is cached | ||
speechSynthesis.fetchToken(); | ||
this.state = { ready: false }; | ||
this.state = {}; | ||
} | ||
async componentDidMount() { | ||
await speechSynthesis.fetchToken(); | ||
const ponyfill = await createPonyfill({ | ||
region: 'westus', | ||
subscriptionKey: 'YOUR_SUBSCRIPTION_KEY' | ||
}); | ||
this.setState(() => ({ ready: true })); | ||
this.setState(() => ({ ponyfill })); | ||
} | ||
render() { | ||
const { | ||
state: { ponyfill } | ||
} = this; | ||
return ( | ||
this.state.ready && | ||
ponyfill && | ||
<Say | ||
speechSynthesis={ speechSynthesis } | ||
speechSynthesisUtterance={ SpeechSynthesisUtterance } | ||
speechSynthesis={ ponyfill.speechSynthesis } | ||
speechSynthesisUtterance={ ponyfill.SpeechSynthesisUtterance } | ||
text="Hello, World!" | ||
@@ -163,2 +250,6 @@ /> | ||
## Lexical and ITN support | ||
[Lexical and ITN support](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-apis#response-parameters) is unique to Cognitive Services Speech Services. Our adapter adds the properties `transcriptITN`, `transcriptLexical`, and `transcriptMaskedITN` to surface the result, in addition to `transcript` and `confidence`. | ||
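A result handler might read those extra properties like this (the result shape is mocked from the Web Speech API's list-of-alternatives layout, and `summarizeResult` is a hypothetical helper, not part of the adapter):

```javascript
// Pick the first alternative of a recognition result and surface the
// extra Speech Services fields alongside the standard ones.
function summarizeResult(result) {
  const [best] = result; // first alternative has the highest confidence

  return {
    confidence: best.confidence,
    display: best.transcript,
    itn: best.transcriptITN,
    lexical: best.transcriptLexical,
    maskedITN: best.transcriptMaskedITN
  };
}

// recognition.onresult = ({ results }) => {
//   console.log(summarizeResult(results[0]));
// };
```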
# Test matrix | ||
@@ -181,26 +272,15 @@ | ||
## To-do | ||
* Add `babel-runtime`, `microsoft-speech-browser-sdk`, and `simple-update-in` | ||
## Plan | ||
* General | ||
* [x] Unified [token exchange mechanism](packages/component/src/util/SubscriptionKey.js) | ||
* Speech recognition | ||
* [x] Add grammar list | ||
* [ ] Add tests for lifecycle events | ||
* [ ] Support `stop()` function | ||
* Currently, only `abort()` is supported | ||
* [x] Add tests for lifecycle events | ||
* [x] Support `stop()` and `abort()` function | ||
* [ ] Add grammar list | ||
* [ ] Investigate continuous mode | ||
* [ ] Enable Opus (OGG) encoding | ||
* [ ] Investigate support of Opus (OGG) encoding | ||
* Currently, there is a problem with `microsoft-speech-browser-sdk@0.0.12`, tracking on [this issue](https://github.com/Azure-Samples/SpeechToText-WebSockets-Javascript/issues/88) | ||
* [ ] Support custom speech | ||
* [ ] Support new [Speech-to-Text](https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/) service | ||
* Point to [new URIs](https://docs.microsoft.com/en-us/azure/cognitive-services/Speech-Service/rest-apis) | ||
* [ ] Support ITN, masked ITN, and lexical output | ||
* Speech synthesis | ||
* [ ] Event: add `pause`/`resume` support | ||
* [ ] Properties: add `paused`/`pending`/`speaking` support | ||
* [ ] Support new [Text-to-Speech](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-text-to-speech) service | ||
* Custom voice fonts | ||
* [ ] Support [custom voice fonts](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-apis#text-to-speech-api) | ||
@@ -207,0 +287,0 @@ # Contributions |
Major refactor | ||
Supply chain risk: Package has recently undergone a major refactor. It may be unstable or indicate significant internal changes. Use caution when updating to versions that include significant changes. | ||
No tests | ||
Quality: Package does not have any tests. This is a strong signal of a poorly maintained or low quality package. | ||