
react-native-spokestack
React Native plugin for adding voice using Spokestack. This includes speech recognition, wakeword, and natural language understanding, as well as synthesizing text to speech using Spokestack voices.
Using npm:
npm install --save react-native-spokestack
or using yarn:
yarn add react-native-spokestack
Then follow the instructions for each platform to link react-native-spokestack to your project:
Before running pod install, make sure to make the following edits.
react-native-spokestack makes use of relatively new APIs only available in iOS 13+. Make sure to set your deployment target to iOS 13 at the top of your Podfile:
platform :ios, '13.0'
Also set your deployment target to 13.0 in your Xcode project.
We also need to use use_frameworks! in our Podfile in order to support dependencies written in Swift.
target 'SpokestackExample' do
use_frameworks!
#...
For the time being, use_frameworks! does not work with Flipper, so we also need to disable Flipper. Remove any Flipper-related lines in your Podfile. In React Native 0.63.2, they look like this:
# Remove or comment out these lines
use_flipper!
post_install do |installer|
flipper_post_install(installer)
end
# End of lines to remove
Remove your existing Podfile.lock and Pods folder to ensure no conflicts, then install the pods:
$ npx pod-install
Add the following to your Info.plist to enable permissions. In Xcode, also ensure your iOS deployment target is set to 13.0 or higher.
<key>NSMicrophoneUsageDescription</key>
<string>This app uses the microphone to hear voice commands</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>This app uses speech recognition to process voice commands</string>
While Flipper works on fixing their pod for use_frameworks!, we must disable Flipper. We already removed the Flipper dependencies from the Podfile above, but there remains some code in AppDelegate.m that imports Flipper. There are two ways to fix this:
1. Comment out or remove the Flipper imports and related code in AppDelegate.m.
2. Remove -DFB_SONARKIT_ENABLED=1 from your build flags so the Flipper code is compiled out.
In our example app, we've done option 1 and left in the Flipper code in case they get it working in the future and we can add it back.
In AppDelegate.m, add AVFoundation to your imports:
#import <AVFoundation/AVFoundation.h>
Set the AudioSession category. There are several configurations that work.
The following is a suggestion that should fit most use cases:
- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
AVAudioSession *session = [AVAudioSession sharedInstance];
[session setCategory:AVAudioSessionCategoryPlayAndRecord
mode:AVAudioSessionModeDefault
options:AVAudioSessionCategoryOptionDefaultToSpeaker | AVAudioSessionCategoryOptionAllowAirPlay | AVAudioSessionCategoryOptionAllowBluetoothA2DP | AVAudioSessionCategoryOptionAllowBluetooth
error:nil];
[session setActive:YES error:nil];
// ...
The example usage relies on the system-provided ASRs (AndroidSpeechRecognizer and AppleSpeechRecognizer). However, AndroidSpeechRecognizer is not available on 100% of devices. If your app supports a device that doesn't have built-in speech recognition, use Spokestack ASR instead by setting pipeline.profile to one of the Spokestack ASR profiles.
See our ASR documentation for more information.
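For example, here is a minimal sketch that opts into Spokestack's cloud ASR via the pipeline profile. It assumes your credentials are available as environment variables; the variable names are placeholders.
import Spokestack from 'react-native-spokestack'

// Use Spokestack ASR instead of the platform recognizer,
// activated by voice activity detection (VAD)
await Spokestack.initialize(
  process.env.SPOKESTACK_CLIENT_ID,
  process.env.SPOKESTACK_CLIENT_SECRET,
  {
    pipeline: {
      profile: Spokestack.PipelineProfile.VAD_SPOKESTACK_ASR
    }
  }
)
await Spokestack.start()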
These options go in your android/build.gradle:
// ...
ext {
// Minimum SDK is 21
minSdkVersion = 21
// ...
dependencies {
// Minimum gradle is 3.0.1+
// The latest React Native already has this
classpath("com.android.tools.build:gradle:3.5.3")
Add the necessary permissions to your AndroidManifest.xml. The first permission is often there already, the second is needed for using the microphone, and the third lets the library check the network state so model downloads don't happen over cellular unless forced.
<!-- For TTS -->
<uses-permission android:name="android.permission.INTERNET" />
<!-- For wakeword and ASR -->
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<!-- For ensuring no downloads happen over cellular, unless forced -->
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
Requesting the RECORD_AUDIO permission at runtime is left to you, since apps handle permissions in different ways. While iOS brings up permission dialogs automatically for any permissions needed, on Android you must request the permission manually.
React Native already provides a module for this. See React Native's PermissionsAndroid for more info.
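As a sketch, the microphone permission could be requested with React Native's PermissionsAndroid before starting the pipeline; the function name here is just illustrative.
import { PermissionsAndroid, Platform } from 'react-native'
import Spokestack from 'react-native-spokestack'

async function startPipeline() {
  if (Platform.OS === 'android') {
    // Ask for the microphone permission at runtime on Android
    const granted = await PermissionsAndroid.request(
      PermissionsAndroid.PERMISSIONS.RECORD_AUDIO
    )
    if (granted !== PermissionsAndroid.RESULTS.GRANTED) {
      return // The user declined; don't start listening
    }
  }
  await Spokestack.start()
}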
To include model files locally in your app (rather than downloading them from a CDN), you also need to add the necessary extensions so the files can be picked up by the bundler. To do this, edit your metro.config.js.
const defaults = require('metro-config/src/defaults/defaults')
module.exports = {
resolver: {
// json is already in the list
assetExts: defaults.assetExts.concat(['tflite', 'txt'])
}
}
Then include model files using source objects:
Spokestack.initialize(clientId, clientSecret, {
wakeword: {
filter: require('./filter.tflite'),
detect: require('./detect.tflite'),
encode: require('./encode.tflite')
},
nlu: {
model: require('./nlu.tflite'),
metadata: require('./metadata.json'),
vocab: require('./vocab.txt')
}
})
Including model files locally is not required. Pass remote URLs to the same config options and the files will be downloaded and cached when first calling initialize.
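For example, a sketch using remote URLs; the URLs below are placeholders for wherever you host your own model files.
Spokestack.initialize(clientId, clientSecret, {
  wakeword: {
    filter: 'https://example.com/models/filter.tflite',
    detect: 'https://example.com/models/detect.tflite',
    encode: 'https://example.com/models/encode.tflite'
  },
  nlu: {
    model: 'https://example.com/models/nlu.tflite',
    metadata: 'https://example.com/models/metadata.json',
    vocab: 'https://example.com/models/vocab.txt'
  }
})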
See the contributing guide to learn how to contribute to the repository and the development workflow.
Get started using Spokestack, or check out our in-depth tutorials on ASR, NLU, and TTS. Also be sure to take a look at the Cookbook for quick solutions to common problems.
A working example app is included in this repo in the example/ folder.
import React, { useEffect, useState } from 'react'
import Spokestack from 'react-native-spokestack'
import { View, Button, Text } from 'react-native'
function App() {
const [listening, setListening] = useState(false)
const onActivate = () => setListening(true)
const onDeactivate = () => setListening(false)
const onRecognize = ({ transcript }) => console.log(transcript)
useEffect(() => {
Spokestack.addEventListener('activate', onActivate)
Spokestack.addEventListener('deactivate', onDeactivate)
Spokestack.addEventListener('recognize', onRecognize)
Spokestack.initialize(
process.env.SPOKESTACK_CLIENT_ID,
process.env.SPOKESTACK_CLIENT_SECRET
)
// This example starts the Spokestack pipeline immediately,
// but it could be delayed until after onboarding or other
// conditions have been met.
.then(Spokestack.start)
return () => {
Spokestack.removeAllListeners()
}
}, [])
return (
<View>
<Button onPress={() => Spokestack.activate()} title="Listen" />
<Text>{listening ? 'Listening...' : 'Idle'}</Text>
</View>
)
}
▸ initialize(clientId: string, clientSecret: string, config?: SpokestackConfig): Promise<void>
Defined in src/index.tsx:59
Initialize the speech pipeline; required for all other methods.
The first 2 args are your Spokestack credentials available for free from https://spokestack.io. Avoid hardcoding these in your app. There are several ways to include environment variables in your code.
Using process.env: https://babeljs.io/docs/en/babel-plugin-transform-inline-environment-variables/
Using a local .env file ignored by git: https://github.com/goatandsheep/react-native-dotenv https://github.com/luggit/react-native-config
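As one illustration, a sketch using react-native-dotenv, which exposes values from a local .env file through the @env module once its Babel plugin is configured (see that library's docs); the variable names are placeholders for whatever you define in .env.
import { SPOKESTACK_CLIENT_ID, SPOKESTACK_CLIENT_SECRET } from '@env'
import Spokestack from 'react-native-spokestack'

// Credentials stay out of source control; .env is ignored by git
await Spokestack.initialize(SPOKESTACK_CLIENT_ID, SPOKESTACK_CLIENT_SECRET)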
See SpokestackConfig for all available options.
import Spokestack from 'react-native-spokestack'
// ...
await Spokestack.initialize(process.env.CLIENT_ID, process.env.CLIENT_SECRET, {
pipeline: {
profile: Spokestack.PipelineProfile.PTT_NATIVE_ASR
}
})
| Name | Type |
|---|---|
| clientId | string |
| clientSecret | string |
| config? | SpokestackConfig |
Returns: Promise<void>
▸ start(): Promise<void>
Defined in src/index.tsx:77
Start the speech pipeline.
The speech pipeline starts in the deactivate state.
import Spokestack from 'react-native-spokestack'
// ...
Spokestack.initialize(process.env.CLIENT_ID, process.env.CLIENT_SECRET)
.then(Spokestack.start)
Returns: Promise<void>
▸ stop(): Promise<void>
Defined in src/index.tsx:90
Stop the speech pipeline. This effectively stops ASR, VAD, and wakeword.
import Spokestack from 'react-native-spokestack'
// ...
await Spokestack.stop()
Returns: Promise<void>
▸ activate(): Promise<void>
Defined in src/index.tsx:105
Manually activate the speech pipeline. This is necessary when using a PTT profile. VAD profiles can also activate ASR without the need to call this method.
import Spokestack from 'react-native-spokestack'
// ...
<Button title="Listen" onClick={() => Spokestack.activate()} />
Returns: Promise<void>
▸ deactivate(): Promise<void>
Defined in src/index.tsx:120
Deactivate the speech pipeline. If the profile includes wakeword, the pipeline will go back to listening for the wakeword. If VAD is active, the pipeline can reactivate without calling activate().
import Spokestack from 'react-native-spokestack'
// ...
<Button title="Stop listening" onClick={() => Spokestack.deactivate()} />
Returns: Promise<void>
▸ synthesize(input: string, format?: TTSFormat, voice?: string): Promise<string>
Defined in src/index.tsx:133
Synthesize some text into speech.
Returns a Promise<string> where the string is the URL of a playable mpeg file.
There is currently only one free voice available ("demo-male").
const url = await Spokestack.synthesize('Hello world')
play(url)
| Name | Type |
|---|---|
| input | string |
| format? | TTSFormat |
| voice? | string |
Returns: Promise<string>
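A sketch passing the optional arguments as well, assuming TTSFormat is exported from the module the same way PipelineProfile is (the TTSFormat values are listed later in this document); the SSML string is only illustrative.
const url = await Spokestack.synthesize(
  '<speak>Hello world</speak>', // SSML input
  Spokestack.TTSFormat.SSML,    // format (TEXT = 0, SSML = 1, SPEECHMARKDOWN = 2)
  'demo-male'                   // currently the only free voice
)
// url points to a playable mpeg file
play(url)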
▸ speak(input: string, format?: TTSFormat, voice?: string): Promise<void>
Defined in src/index.tsx:148
Synthesize some text into speech and then immediately play the audio through the default audio system. Audio session handling can get very complex and we recommend using a RN library focused on audio for anything more than very simple playback.
There is currently only one free voice available ("demo-male").
await Spokestack.speak('Hello world')
| Name | Type |
|---|---|
| input | string |
| format? | TTSFormat |
| voice? | string |
Returns: Promise<void>
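For instance, a sketch that tracks playback state through the "play" event (documented in the Events table below) while using speak():
const playListener = Spokestack.addEventListener('play', ({ playing }) => {
  console.log(playing ? 'TTS audio playing' : 'TTS audio stopped')
})
await Spokestack.speak('Hello world')
// Later, when no longer needed:
playListener.remove()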
▸ classify(utterance: string): Promise<SpokestackNLUResult>
Defined in src/index.tsx:163
Classify the utterance using the intent/slot Natural Language Understanding model passed to Spokestack.initialize(). See https://www.spokestack.io/docs/concepts/nlu for more info.
const result = await Spokestack.classify('hello')
// Here's what the result might look like,
// depending on the NLU model
console.log(result.intent) // launch
| Name | Type |
|---|---|
| utterance | string |
Returns: Promise<SpokestackNLUResult>
• intent: string
Defined in src/types.ts:92
The intent based on the match provided by the NLU model
• slots: { type: string ; value: string }[]
Defined in src/types.ts:96
Data associated with the intent, provided by the NLU model
• confidence: number
Defined in src/types.ts:94
The confidence of the match, given by the TensorFlow model
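A sketch reading all three fields of the result; the intent and slot names depend entirely on your NLU model and are only illustrative here.
const result = await Spokestack.classify('set a timer for ten minutes')
console.log(result.intent)     // e.g. "command.set_timer" (model-specific)
console.log(result.confidence) // e.g. 0.97
for (const slot of result.slots) {
  console.log(slot.type, slot.value) // e.g. "duration" "ten minutes"
}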
• addEventListener: typeof addListener
Defined in src/index.tsx:203
Bind to any event emitted by the native libraries. The events are: "recognize", "partial_recognize", "error", "activate", "deactivate", and "timeout". See the bottom of the README for descriptions of the events.
useEffect(() => {
const listener = Spokestack.addEventListener('recognize', onRecognize)
// Unsubscribe by calling remove when the component unmounts
return () => {
listener.remove()
}
}, [])
• removeEventListener: typeof removeListener
Defined in src/index.tsx:211
Remove an event listener
Spokestack.removeEventListener('recognize', onRecognize)
• removeAllListeners: () => void
Defined in src/index.tsx:221
Remove any existing listeners
componentWillUnmount() {
Spokestack.removeAllListeners()
}
Values of the TTSFormat enum (the format argument to synthesize() and speak()):
• SPEECHMARKDOWN: = 2
Defined in src/types.ts:65
• SSML: = 1
Defined in src/types.ts:64
• TEXT: = 0
Defined in src/types.ts:63
Use addEventListener(), removeEventListener(), and removeAllListeners() to add and remove event handlers. All events are available on both iOS and Android.
| Name | Data | Description |
|---|---|---|
| recognize | { transcript: string } | Fired whenever speech recognition completes successfully. |
| partial_recognize | { transcript: string } | Fired whenever the transcript changes during speech recognition. |
| timeout | null | Fired when an active pipeline times out due to lack of recognition. |
| activate | null | Fired when the speech pipeline activates, either through the VAD or manually. |
| deactivate | null | Fired when the speech pipeline deactivates. |
| play | { playing: boolean } | Fired when TTS playback starts and stops. See the speak() function. |
| error | { error: string } | Fired when there's an error in Spokestack. |
When an error event is triggered, any existing promises are rejected as it's difficult to know exactly from where the error originated and whether it may affect other requests.
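A sketch that covers both paths: catching rejected promises and also subscribing to the "error" event.
Spokestack.addEventListener('error', ({ error }) => {
  console.warn('Spokestack error event:', error)
})

try {
  await Spokestack.initialize(clientId, clientSecret)
  await Spokestack.start()
} catch (e) {
  // Any in-flight promise may reject when an error event fires
  console.warn('Spokestack call failed:', e)
}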
These are the configuration options that can be passed to Spokestack.initialize(_, _, spokestackConfig). No options in SpokestackConfig are required.
SpokestackConfig has the following structure:
interface SpokestackConfig {
/**
* This option is only used when remote URLs are passed to fields such as `wakeword.filter`.
*
* Set this to true to allow downloading models over cellular.
* Note that `Spokestack.initialize()` will still reject the promise if
* models need to be downloaded but there is no network at all.
*
* Ideally, the app will include network handling itself and
* inform the user about file downloads.
*
* Default: false
*/
allowCellularDownloads?: boolean
/**
* Wakeword and NLU model files are cached internally.
* Set this to true whenever a model is changed
* during development to refresh the internal model cache.
*
* This affects models passed with `require()` as well
* as models downloaded from remote URLs.
*
* Default: false
*/
refreshModels?: boolean
/**
* This controls the log level for the underlying native
* iOS and Android libraries.
* See the TraceLevel enum for values.
*/
traceLevel?: TraceLevel
/**
* Most of these options are advanced aside from "profile"
*/
pipeline?: PipelineConfig
/** Only needed if using Spokestack.classify */
nlu?: NLUConfig
/**
* Only required for wakeword
* Most options are advanced aside from
* filter, encode, and decode for specifying config files.
*/
wakeword?: WakewordConfig
}
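To tie these options together, here is a sketch of a fuller config, assuming TraceLevel is exported from the module the same way PipelineProfile is; the model files are the same local ones used earlier.
const config = {
  allowCellularDownloads: false,
  refreshModels: __DEV__, // refresh the model cache while developing
  traceLevel: Spokestack.TraceLevel.DEBUG,
  pipeline: {
    profile: Spokestack.PipelineProfile.TFLITE_WAKEWORD_NATIVE_ASR
  },
  wakeword: {
    filter: require('./filter.tflite'),
    detect: require('./detect.tflite'),
    encode: require('./encode.tflite')
  }
}
await Spokestack.initialize(clientId, clientSecret, config)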
Values of the TraceLevel enum (used for the traceLevel option):
• DEBUG: = 10
Defined in src/types.ts:50
• INFO: = 30
Defined in src/types.ts:52
• NONE: = 100
Defined in src/types.ts:53
• PERF: = 20
Defined in src/types.ts:51
• Optional profile: PipelineProfile
Defined in src/types.ts:109
Profiles are collections of common configurations for Pipeline stages.
If Wakeword config files are specified, the default will be
TFLITE_WAKEWORD_NATIVE_ASR.
Otherwise, the default is PTT_NATIVE_ASR.
Values of the PipelineProfile enum (used for pipeline.profile):
• PTT_NATIVE_ASR: = 2
Defined in src/types.ts:24
Apple/Android Automatic Speech Recognition is on when the speech pipeline is active. This is likely the more common profile when not using wakeword.
• PTT_SPOKESTACK_ASR: = 5
Defined in src/types.ts:42
Spokestack Automatic Speech Recognition is on when the speech pipeline is active. Use this profile when not using wakeword and Spokestack ASR is preferred over the native recognizers.
• TFLITE_WAKEWORD_NATIVE_ASR: = 0
Defined in src/types.ts:12
Set up wakeword and use local Apple/Android ASR. Note that wakeword.filter, wakeword.encode, and wakeword.detect are required if any wakeword profile is used.
• TFLITE_WAKEWORD_SPOKESTACK_ASR: = 3
Defined in src/types.ts:30
Set up wakeword and use remote Spokestack ASR. Note that wakeword.filter, wakeword.encode, and wakeword.detect are required if any wakeword profile is used.
• VAD_NATIVE_ASR: = 1
Defined in src/types.ts:17
Apple/Android Automatic Speech Recognition is on when Voice Activity Detection triggers it.
• VAD_SPOKESTACK_ASR: = 4
Defined in src/types.ts:35
Spokestack Automatic Speech Recognition is on when Voice Activity Detection triggers it.
• Optional sampleRate: number
Defined in src/types.ts:113
Audio sampling rate, in Hz
• Optional frameWidth: number
Defined in src/types.ts:119
advanced
Speech frame width, in ms
• Optional bufferWidth: number
Defined in src/types.ts:125
advanced
Buffer width, used with frameWidth to determine the buffer size
• Optional vadMode: "quality" | "low-bitrate" | "aggressive" | "very-aggressive"
Defined in src/types.ts:129
Voice activity detector mode
• Optional vadFallDelay: number
Defined in src/types.ts:136
advanced
Falling-edge detection run length, in ms; this value determines how many negative samples must be received to flip the detector to negative
• Optional vadRiseDelay: number
Defined in src/types.ts:145
advanced
Android-only
Rising-edge detection run length, in ms; this value determines how many positive samples must be received to flip the detector to positive
• Optional ansPolicy: "mild" | "medium" | "aggressive" | "very-aggressive"
Defined in src/types.ts:153
advanced
Android-only for AcousticNoiseSuppressor
Noise policy
• Optional agcCompressionGainDb: number
Defined in src/types.ts:162
advanced
Android-only for AcousticGainControl
Target peak audio level, in -dB. To maintain a peak of -9dB, configure a value of 9.
• Optional agcTargetLevelDbfs: number
Defined in src/types.ts:170
advanced
Android-only for AcousticGainControl
Dynamic range compression rate, in dBFS
• model: string | RequireSource
Defined in src/types.ts:181
The NLU Tensorflow-Lite model. If specified, metadata and vocab are also required.
This field accepts 2 types of values: a source object from require or import (e.g. model: require('./nlu.tflite')), or a remote URL to the file.
• metadata: string | RequireSource
Defined in src/types.ts:189
The JSON file for NLU metadata. If specified, model and vocab are also required.
This field accepts 2 types of values: a source object from require or import (e.g. metadata: require('./metadata.json')), or a remote URL to the file.
• vocab: string | RequireSource
Defined in src/types.ts:197
A txt file containing the NLU vocabulary. If specified, model and metadata are also required.
This field accepts 2 types of values: a source object from require or import (e.g. vocab: require('./vocab.txt')), or a remote URL to the file.
• Optional inputLength: number
Defined in src/types.ts:207
• filter: string | RequireSource
Defined in src/types.ts:221
The "filter" Tensorflow-Lite model. If specified, detect and encode are also required.
This field accepts 2 types of values: a source object from require or import (e.g. filter: require('./filter.tflite')), or a remote URL to the file.
The filter model is used to calculate a mel spectrogram frame from the linear STFT; its inputs should be shaped [fft-width], and its outputs [mel-width].
• detect: string | RequireSource
Defined in src/types.ts:233
The "detect" Tensorflow-Lite model. If specified, filter and encode are also required.
This field accepts 2 types of values: a source object from require or import (e.g. detect: require('./detect.tflite')), or a remote URL to the file.
The encode model is used to perform each autoregressive step over the mel frames; its inputs should be shaped [mel-length, mel-width], and its outputs [encode-width], with an additional state input/output shaped [state-width].
• encode: string | RequireSource
Defined in src/types.ts:244
The "encode" Tensorflow-Lite model. If specified, filter and detect are also required.
This field accepts 2 types of values: a source object from require or import (e.g. encode: require('./encode.tflite')), or a remote URL to the file.
Its inputs should be shaped [encode-length, encode-width], and its outputs
• Optional activeMax: number
Defined in src/types.ts:254
The maximum length of an activation, in milliseconds, used to time out the activation
• Optional activeMin: number
Defined in src/types.ts:249
The minimum length of an activation, in milliseconds, used to ignore a VAD deactivation after the wakeword
• Optional encodeLength: number
Defined in src/types.ts:282
advanced
The length of the sliding window of encoder output used as an input to the classifier, in milliseconds
• Optional encodeWidth: number
Defined in src/types.ts:288
advanced
The size of the encoder output, in vector units
• Optional fftHopLength: number
Defined in src/types.ts:332
advanced
The length of time to skip each time the overlapping STFT is calculated, in milliseconds
• Optional fftWindowSize: number
Defined in src/types.ts:316
advanced
The size of the signal window used to calculate the STFT, in number of samples - should be a power of 2 for maximum efficiency
• Optional fftWindowType: string
Defined in src/types.ts:325
advanced
Android-only
The name of the windowing function to apply to each audio frame before calculating the STFT; currently the "hann" window is supported
• Optional melFrameLength: number
Defined in src/types.ts:346
advanced
The length of time to skip each time the overlapping STFT is calculated, in milliseconds
• Optional melFrameWidth: number
Defined in src/types.ts:353
advanced
The size of each mel spectrogram frame, in number of filterbank components
• Optional preEmphasis: number
Defined in src/types.ts:339
advanced
The pre-emphasis filter weight to apply to the normalized audio signal (0 for no pre-emphasis)
• Optional requestTimeout: number
Defined in src/types.ts:268
iOS-only
Length of time to allow an Apple ASR request to run, in milliseconds. Apple has an undocumented limit of 60000ms per request.
• Optional rmsAlpha: number
Defined in src/types.ts:309
advanced
The Exponentially-Weighted Moving Average (EWMA) update rate for the current RMS signal energy (0 for no RMS normalization)
• Optional rmsTarget: number
Defined in src/types.ts:302
advanced
The desired linear Root Mean Squared (RMS) signal energy, which is used for signal normalization and should be tuned to the RMS target used during training
• Optional stateWidth: number
Defined in src/types.ts:294
advanced
The size of the encoder state, in vector units (defaults to wake-encode-width)
• Optional threshold: number
Defined in src/types.ts:275
advanced
The threshold of the classifier's posterior output, above which the trigger activates the pipeline, in the range [0, 1]
• Optional wakewords: string
Defined in src/types.ts:261
iOS-only
A comma-separated list of wakeword keywords. Only necessary when not passing the filter, detect, and encode paths.
Apache-2.0
Copyright 2021 Spokestack