


spokestack

Tools for integration with the Spokestack API in Node.js


Version: 3.2.0

Version published: 5 years ago

Maintainers: 1

Created: 6 years ago


node-spokestack

A set of tools for integration with the Spokestack API in Node.js

Installation

$ npm install spokestack --save

Features

Spokestack has all the tools you need to build amazing user experiences for speech. Here are some of the features included in node-spokestack:

Automatic Speech Recognition (ASR): We provide multiple ways to hook up either Spokestack ASR or Google Cloud Speech to your node/express server, including asr functions for one-off ASR requests and websocket server integrations for ASR streaming. Or, use the ASR services directly for more advanced integrations.
Text-to-Speech (TTS): Through our GraphQL API (see below), Spokestack offers multiple ways to generate voice audio from text. Send raw text, Speech Markdown, or SSML and get back a URL for audio to play in the browser.
Wake word and Keyword: Wake word and keyword processing are supported through our speech pipeline (see startPipeline). One of the most powerful features we provide is the ability to define and train custom wake word and keyword models directly on spokestack.io. When training is finished, we host the model files for you on a CDN. Pass the CDN URLs to startPipeline() and the speech pipeline will start listening. These same models can be used in spokestack-python, spokestack-android, spokestack-ios, and react-native-spokestack. The pipeline uses a web worker in the browser to keep all of the speech processing off the main thread so your UI never gets blocked. NOTE: The speech pipeline (specifically TensorFlow's WebGL backend) currently only works in Blink browsers (Chrome, Edge, Opera, Vivaldi, Brave, and most Android browsers) as it requires the experimental OffscreenCanvas API. Firefox is close to full support for that API, and we'll look into supporting Firefox when that's available.
Natural Language Understanding (NLU): The GraphQL API (see below) also provides a way to convert the text from ASR to actionable "intents", or functions that apps can understand. For instance, if a user says, "Find a recipe for chocolate cake", an NLU might return a "SEARCH_RECIPE" intent. To use the NLU, you'll need an NLU model. While we have plans to release an NLU editor, the best way right now to create an NLU model is to use Alexa, DialogFlow, or Jovo and upload the exported model to your Spokestack account. We support exports from all of those platforms.
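To make the idea of "intents as functions apps can understand" concrete, here is a minimal sketch of dispatching a classified intent to app code. The NLUResult shape (intent, confidence, slots), the dish slot, and the 0.5 threshold are all hypothetical and not taken from the Spokestack GraphQL schema; check the API introspection for the real response fields.

```typescript
// Hypothetical shape of an NLU classification result; the real
// Spokestack GraphQL response may use different field names.
interface NLUResult {
  intent: string
  confidence: number
  slots: Record<string, string>
}

type IntentHandler = (slots: Record<string, string>) => string

// Map intent names (e.g. "SEARCH_RECIPE") to app functions.
const handlers: Record<string, IntentHandler> = {
  SEARCH_RECIPE: (slots) => `Searching recipes for ${slots.dish ?? 'anything'}`
}

function dispatchIntent(result: NLUResult, minConfidence = 0.5): string {
  // Ignore low-confidence classifications rather than guessing
  if (result.confidence < minConfidence) {
    return 'Sorry, I did not understand that.'
  }
  const handler = handlers[result.intent]
  return handler ? handler(result.slots) : `No handler for intent ${result.intent}`
}

// prints "Searching recipes for chocolate cake"
console.log(
  dispatchIntent({ intent: 'SEARCH_RECIPE', confidence: 0.92, slots: { dish: 'chocolate cake' } })
)
```

Keeping the handler table separate from the classification step means new intents only require a new entry in handlers.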

This repo includes an example app that demonstrates ASR, text-to-speech, and wake word and keyword processing. It also includes a route (/graphql) for viewing live docs (or "introspection") of the Spokestack API.

The GraphQL API

Text-to-speech and NLU are available through Spokestack's GraphQL API at https://api.spokestack.io/v1. Access requires Spokestack credentials (creating an account is quick and free).

To use the GraphQL API, node-spokestack includes Express middleware to help integrate a proxy into any node/express server. A proxy is necessary to avoid exposing your Spokestack credentials.

The API is used to synthesize text-to-speech using various methods including raw text, speech markdown, and SSML.

It can also be used for NLU classification.

Spokestack GraphQL Introspection

Automatic Speech Recognition (ASR)

ASR is accomplished through the use of a websocket (rather than GraphQL). node-spokestack includes functions to use either Spokestack ASR or Google Cloud Speech, and there are two functions for each platform.

A helper function for adding a websocket to a node server (express or otherwise). This is the main way to use ASR.
A function for processing speech into text in one-off requests. This is useful if you have all of the speech up-front.

Using Google ASR instead of Spokestack ASR

If you'd prefer to use Google ASR, follow these instructions for setting up Google Cloud Speech. Ensure GOOGLE_APPLICATION_CREDENTIALS is set in your environment, and then use the googleASR and googleASRSocketServer functions instead of their Spokestack equivalents.

Wake Word and Keyword (Speech Pipeline)

The speech pipeline uses a custom build of TensorFlow.js in a Web Worker to process speech. It notifies your app (through the onEvent callback) when speech matches the specified wake word or keyword models. The main entry point is the startPipeline() function. To use startPipeline(), you'll need to serve the web worker and TensorFlow.js from your node/express server. Our example Next.js app demonstrates how you might accomplish this in express:

import express from 'express'

const app = express()

app.use(
  '/spokestack-web-worker.js',
  express.static('./node_modules/spokestack/dist/spokestack-web-worker.min.js')
)

With these made available to your front-end, the speech pipeline can be started.

Another option is to copy the file from node_modules to your static/public folder during your build process.

In package.json:

"scripts": {
  "copy:spokestack": "cp node_modules/spokestack/dist/spokestack-web-worker.min.js public/spokestack-web-worker.js",
  "build": "npm run copy:spokestack && next build"
}

Setup

Go to spokestack.io and create an account. Create a token at spokestack.io/account/settings#api. Note that you'll only be able to see the token secret once. If you accidentally leave the page, create another token. Once you have a token, set the following environment variables in your .bash_profile or .zshenv:

# "Identity" field from your Spokestack API token
export SS_API_CLIENT_ID="your-identity"
# "Secret key" field from your Spokestack API token
export SS_API_CLIENT_SECRET="your-secret-key"
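Because the proxy cannot work without these variables, it can help to check them once at startup. The helper below is a hypothetical sketch, not part of node-spokestack; it takes the environment as a parameter so you can call it as readSpokestackCredentials(process.env) on the server and pass a plain object in tests.

```typescript
interface SpokestackCredentials {
  clientId: string
  clientSecret: string
}

// Hypothetical startup check for the variables set in the Setup step.
function readSpokestackCredentials(
  env: Record<string, string | undefined>
): SpokestackCredentials {
  const clientId = env.SS_API_CLIENT_ID
  const clientSecret = env.SS_API_CLIENT_SECRET
  if (!clientId || !clientSecret) {
    // Fail when the server starts instead of on the first proxied request
    throw new Error('Set SS_API_CLIENT_ID and SS_API_CLIENT_SECRET before starting the server')
  }
  return { clientId, clientSecret }
}
```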

Convenience functions for Node.js servers

spokestackMiddleware

▸ spokestackMiddleware(): function

Express middleware that adds a proxy to the Spokestack GraphQL API. A proxy is necessary to avoid exposing your Spokestack token secret on the client. Once the /graphql route is in place, your client can send GraphQL requests to it.

import { spokestackMiddleware } from 'spokestack'
import bodyParser from 'body-parser'
import express from 'express'

const expressApp = express()

expressApp.post('/graphql', bodyParser.json(), spokestackMiddleware())

This is also convenient for setting up GraphiQL introspection. An example fetcher for GraphiQL on the client (browser only) might look like this:

const graphQLFetcher = (graphQLParams) =>
  fetch('/graphql', {
    method: 'post',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(graphQLParams)
  })
    .then((response) => response.text())
    .then((body) => {
      // Fall back to the raw text if the body is not valid JSON
      try {
        return JSON.parse(body)
      } catch {
        return body
      }
    })

Returns: (req: Request, res: Response) => void

Defined in: server/expressMiddleware.ts:37

asrSocketServer

▸ asrSocketServer(serverConfig: WebSocket.ServerOptions, asrConfig?: Omit<SpokestackASRConfig, 'sampleRate'>): void

Adds a web socket server to the given HTTP server to stream ASR using Spokestack ASR. This uses the "ws" node package for the socket server.

import { createServer } from 'http'
import { asrSocketServer } from 'spokestack'

const port = parseInt(process.env.PORT || '3000', 10)
// To share the port with express, use createServer(expressApp) instead
const server = createServer()
// Attach the websocket server to the HTTP server
asrSocketServer({ server })
server.listen(port, () => {
  console.log(`Listening at http://localhost:${port}`)
})

Parameters:

Name	Type
`serverConfig`	WebSocket.ServerOptions
`asrConfig`	Omit<SpokestackASRConfig, 'sampleRate'>

Returns: void

Defined in: server/socketServer.ts:23

SpokestackASRConfig

format

• Optional format: ASRFormat.LINEAR16

Defined in: server/spokestackASRService.ts:9

language

• Optional language: 'en'

Defined in: server/spokestackASRService.ts:10

limit

• Optional limit: number

Defined in: server/spokestackASRService.ts:11

sampleRate

• sampleRate: number

Defined in: server/spokestackASRService.ts:12

spokestackUrl

• Optional spokestackUrl: string

Set a different location for the Spokestack socket URL. This is rarely needed. Default: 'wss://api.spokestack.io/v1/asr/websocket'

Defined in: server/spokestackASRService.ts:27

timeout

• Optional timeout: number

Reset speech recognition and clear the transcript after timeout milliseconds of inactivity: when no new data arrives for timeout milliseconds, the auth message is sent again to begin a new ASR transaction. Set to 0 to disable. Default: 3000

Defined in: server/spokestackASRService.ts:21
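The timeout rule above boils down to a single comparison. The sketch below is a hypothetical helper that mirrors the documented semantics (including 0 meaning "disabled"); it is not the library's internal code. On a server you would typically pass Date.now() as now and record lastDataAt whenever an audio chunk arrives.

```typescript
// Hypothetical helper mirroring the documented `timeout` option:
// returns true when no audio has arrived for `timeout` milliseconds.
function shouldResetTranscript(lastDataAt: number, now: number, timeout = 3000): boolean {
  // A timeout of 0 disables the reset entirely
  if (timeout === 0) {
    return false
  }
  return now - lastDataAt >= timeout
}
```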

asr

▸ asr(content: string | Uint8Array, sampleRate: number): Promise<string | null>

A one-off method for processing speech to text using Spokestack ASR.

import fileUpload from 'express-fileupload'
import { asr } from 'spokestack'
import express from 'express'

const expressApp = express()

expressApp.post('/asr', fileUpload(), (req, res) => {
  const sampleRate = Number(req.body.sampleRate)
  // req.files is undefined when nothing was uploaded
  const audio = req.files?.audio
  if (isNaN(sampleRate)) {
    res.status(400)
    res.send('Parameter required: "sampleRate"')
    return
  }
  // Expect exactly one file in the "audio" field
  if (!audio || Array.isArray(audio)) {
    res.status(400)
    res.send('Parameter required: "audio"')
    return
  }
  // audio.data is already a Buffer; audio.data.buffer may be a larger shared ArrayBuffer
  asr(audio.data, sampleRate)
    .then((text) => {
      res.status(200)
      res.json({ text })
    })
    .catch((error: Error) => {
      console.error(error)
      res.status(500)
      res.send('Unknown error during speech recognition. Check server logs.')
    })
})

Parameters:

Name	Type
`content`	string \| Uint8Array
`sampleRate`	number

Returns: Promise<string | null>

Defined in: server/asr.ts:43

googleASRSocketServer

▸ googleASRSocketServer(serverConfig: WebSocket.ServerOptions): void

Adds a web socket server to the given HTTP server to stream ASR using Google Speech. This uses the "ws" node package for the socket server.

import { createServer } from 'http'
import { googleASRSocketServer } from 'spokestack'

const port = parseInt(process.env.PORT || '3000', 10)
// To share the port with express, use createServer(expressApp) instead
const server = createServer()
// Attach the websocket server to the HTTP server
googleASRSocketServer({ server })
server.listen(port, () => {
  console.log(`Listening at http://localhost:${port}`)
})

Parameters:

Name	Type
`serverConfig`	WebSocket.ServerOptions

Returns: void

Defined in: server/socketServer.ts:108

googleASR

▸ googleASR(content: string | Uint8Array, sampleRate: number): Promise<string | null>

A one-off method for processing speech to text using Google Speech.

import fileUpload from 'express-fileupload'
import { googleASR } from 'spokestack'
import express from 'express'

const expressApp = express()

expressApp.post('/asr', fileUpload(), (req, res) => {
  const sampleRate = Number(req.body.sampleRate)
  // req.files is undefined when nothing was uploaded
  const audio = req.files?.audio
  if (isNaN(sampleRate)) {
    res.status(400)
    res.send('Parameter required: "sampleRate"')
    return
  }
  // Expect exactly one file in the "audio" field
  if (!audio || Array.isArray(audio)) {
    res.status(400)
    res.send('Parameter required: "audio"')
    return
  }
  // audio.data is already a Buffer; audio.data.buffer may be a larger shared ArrayBuffer
  googleASR(audio.data, sampleRate)
    .then((text) => {
      res.status(200)
      res.json({ text })
    })
    .catch((error: Error) => {
      console.error(error)
      res.status(500)
      res.send('Unknown error during speech recognition. Check server logs.')
    })
})

Parameters:

Name	Type
`content`	string \| Uint8Array
`sampleRate`	number

Returns: Promise<string | null>

Defined in: server/asr.ts:108

spokestackASRService

▸ spokestackASRService(config: SpokestackASRConfig, onData: (response: SpokestackResponse) => void): Promise<WebSocket>

A low-level utility for working with the Spokestack ASR service directly. Most applications won't need it; it is intended for custom, advanced integrations. See asr for one-off ASR and asrSocketServer for ASR streaming through a websocket server that can be added to any node server.

Parameters:

Name	Type
`config`	SpokestackASRConfig
`onData`	(`response`: SpokestackResponse) => void

Returns: Promise<WebSocket>

Defined in: server/spokestackASRService.ts:74

SpokestackResponse

error

• Optional error: string

When the status is "error", the error message is available here.

Defined in: server/spokestackASRService.ts:48

final

• final: boolean

The final key indicates that this response contains the highest-confidence transcript for the utterance. It is only set to true after you signal to Spokestack ASR that no more audio data is coming. Signal this by sending an empty buffer (e.g. socket.send(Buffer.from(''))). See the source for asr for an example.

Defined in: server/spokestackASRService.ts:57
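The end-of-audio signal can be sketched as follows. The Sendable interface and the sendUtterance name are hypothetical; in practice the socket would be the WebSocket resolved by spokestackASRService. The sketch sends new Uint8Array(0) where the text above uses Buffer.from(''); both are zero-length payloads.

```typescript
// Minimal interface for anything with a WebSocket-like send().
interface Sendable {
  send(data: Uint8Array): void
}

// Send every audio chunk, then an empty payload so the service knows no more
// audio is coming and can mark its last response as final.
function sendUtterance(socket: Sendable, chunks: Uint8Array[]): void {
  for (const chunk of chunks) {
    socket.send(chunk)
  }
  socket.send(new Uint8Array(0))
}
```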

hypotheses

• hypotheses: ASRHypothesis[]

This is a list of transcripts, each associated with its own confidence level from 0 to 1. It is an array to allow for multiple transcripts in the API, but it almost always contains a single entry.

Defined in: server/spokestackASRService.ts:64

status

• status: 'ok' | 'error'

Defined in: server/spokestackASRService.ts:46

ASRHypothesis

confidence

• confidence: number

A number between 0 and 1 to indicate the tensorflow confidence level for the given transcript.

Defined in: server/spokestackASRService.ts:41

transcript

• transcript: string

Defined in: server/spokestackASRService.ts:42
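Putting SpokestackResponse and ASRHypothesis together, an onData callback usually wants the single best transcript. The sketch below redeclares the documented fields as local types (it does not import them from spokestack), and its choice to throw on status 'error' is a design assumption; returning null instead would also be reasonable.

```typescript
// Local types mirroring the documented SpokestackResponse and ASRHypothesis.
interface ASRHypothesis {
  confidence: number
  transcript: string
}

interface SpokestackResponse {
  status: 'ok' | 'error'
  error?: string
  final: boolean
  hypotheses: ASRHypothesis[]
}

// Return the highest-confidence transcript, or null if there are no hypotheses.
function bestTranscript(response: SpokestackResponse): string | null {
  if (response.status === 'error') {
    throw new Error(response.error ?? 'Unknown ASR error')
  }
  let best: ASRHypothesis | undefined
  for (const hypothesis of response.hypotheses) {
    if (!best || hypothesis.confidence > best.confidence) {
      best = hypothesis
    }
  }
  return best ? best.transcript : null
}
```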

ASRFormat

• LINEAR16: = "PCM16LE"

Defined in: server/spokestackASRService.ts:5

encryptSecret

▸ encryptSecret(body: string): string

This is a convenience method for properly authorizing requests to the Spokestack GraphQL API.

Note: Do not expose your key's secret on the client. Only call this on the server.

See server/expressMiddleware.ts for example usage.

Parameters:

Name	Type
`body`	string

Returns: string

Defined in: server/encryptSecret.ts:13

Convenience functions for the client

These functions are available exports from spokestack/client.

record

▸ record(config?: RecordConfig): Promise<AudioBuffer>

A method to record audio for a given number of seconds.

import { record } from 'spokestack/client'

// Record for 3 seconds (the default) and return an AudioBuffer
const buffer = await record()

// Record for 5 seconds, calling onProgress every second
const longerBuffer = await record({
  time: 5,
  onProgress: (remaining) => {
    console.log(`Recording...${remaining}`)
  }
})

// Record for 5 seconds, calling onStart when recording starts
// Note: the Promise resolves once recording has stopped
const startedBuffer = await record({
  time: 5,
  onStart: () => {
    console.log('Recording started')
  }
})

Then create a file for uploading. See googleASR for an example of how to process the resulting audio file.

import { convertFloat32ToInt16 } from 'spokestack/client'

// buffer is the AudioBuffer returned by record()
const sampleRate = buffer.sampleRate
const file = new File(
  // Convert to LINEAR16 on the front-end instead of the server.
  // This took <10ms in our testing even on a slow phone.
  // It halves the data sent over the wire (2 bytes per sample instead of 4).
  [convertFloat32ToInt16(buffer.getChannelData(0))],
  'recording.raw'
)

The file can then be uploaded using FormData:

const formData = new FormData()
formData.append('sampleRate', sampleRate + '')
formData.append('audio', file)
fetch('/asr', {
  method: 'POST',
  body: formData,
  headers: { Accept: 'application/json' }
})
  .then((res) => {
    if (!res.ok) {
      console.log(`Response status: ${res.status}`)
    }
    return res.json()
  })
  .then(({ text }) => console.log('Processed speech', text))
  .catch(console.error.bind(console))

Parameters:

Name	Type
`config`	RecordConfig

Returns: Promise<AudioBuffer>

Defined in: client/record.ts:84

RecordConfig

onProgress

• Optional onProgress: (remaining: number) => void

A callback function to be called each second of recording.

Parameters:

Name	Type
`remaining`	number

Returns: void

Defined in: client/record.ts:16

onStart

• Optional onStart: () => void

A callback function to be called when recording starts.

Returns: void

Defined in: client/record.ts:14

time

• Optional time: number

The total time to record, in seconds. Default: 3

Defined in: client/record.ts:12

startStream

▸ startStream(__namedParameters: StartStreamOptions): Promise<[WebSocket, ProcessorReturnValue]>

Starts recording and streams the audio over a native WebSocket, resolving with the socket and the processor. This assumes the socket server is hosted on the current server.

import { startStream } from 'spokestack/client'

// ...
try {
  const [ws] = await startStream({
    isPlaying: () => this.isPlaying
  })
  ws.addEventListener('open', () => console.log('Recording started'))
  ws.addEventListener('close', () => console.log('Recording stopped'))
  ws.addEventListener('message', (e) => console.log('Speech processed: ', e.data))
} catch (e) {
  console.error(e)
}

Parameters:

Name	Type
`__namedParameters`	StartStreamOptions

Returns: Promise<[WebSocket, ProcessorReturnValue]>

Defined in: client/recordStream.ts:43
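"Hosted on the current server" means the socket URL is derived from the page's own origin. The helper below is a hypothetical sketch of that derivation, not the library's code: it switches http: to ws: and https: to wss:. The path argument is an assumption; with asrSocketServer({ server }) and no path option, the ws server accepts upgrades on any path of that HTTP server.

```typescript
// Hypothetical helper: build a WebSocket URL on the same origin as the page,
// e.g. sameOriginSocketUrl(window.location.href).
function sameOriginSocketUrl(pageUrl: string, path = '/'): string {
  const url = new URL(path, pageUrl)
  // Use the secure scheme when the page itself is served over HTTPS
  url.protocol = url.protocol === 'https:' ? 'wss:' : 'ws:'
  return url.toString()
}
```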

stopStream

▸ stopStream(): void

Stop the current recording stream if one exists.

import { stopStream } from 'spokestack/client'
stopStream()

Returns: void

Defined in: client/recordStream.ts:96

convertFloat32ToInt16

▸ convertFloat32ToInt16(fp32Samples: Float32Array): Int16Array

A utility method to convert Float32Array audio to an Int16Array that can be passed directly to speech APIs such as Google Speech.

import { convertFloat32ToInt16, record } from 'spokestack/client'

const buffer = await record()
const file = new File([convertFloat32ToInt16(buffer.getChannelData(0))], 'recording.raw')

Parameters:

Name	Type
`fp32Samples`	Float32Array

Returns: Int16Array

Defined in: client/convertFloat32ToInt16.ts:16
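For reference, a common way to perform this conversion is to clamp each sample to [-1, 1] and scale it to the signed 16-bit range. This is a sketch of that general technique under the name float32ToInt16; it is not necessarily identical to the library's convertFloat32ToInt16.

```typescript
// Common Float32 -> 16-bit PCM (LINEAR16) conversion.
function float32ToInt16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length)
  samples.forEach((sample, i) => {
    // Clamp so out-of-range samples don't wrap around
    const s = Math.max(-1, Math.min(1, sample))
    // Negative values scale toward -32768, positive values toward 32767;
    // Int16Array assignment truncates the fractional part
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff
  })
  return out
}

const pcm = float32ToInt16(new Float32Array([0, 0.5, 1, -1, 2]))
// Each Int16 sample takes 2 bytes, versus 4 for Float32
console.log(Array.from(pcm), pcm.byteLength)
```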

startPipeline

▸ startPipeline(config: PipelineConfig): Promise<SpeechPipeline>

Create and immediately start a SpeechPipeline to process user speech using the specified configuration.

To simplify configuration, preset pipeline profiles are provided and can be passed in the config object's profile key. See PipelineProfile for more details.

NOTE: The speech pipeline (specifically tensorflow's webgl backend) currently only works in Blink browsers (Chrome, Edge, Opera, Vivaldi, Brave, and most Android browsers) as it requires the use of the experimental OffscreenCanvas API.

First make sure to serve the web worker and tensorflow.js from your node server at the expected locations.

For example, with express:

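// On the server (express)
app.use(
  '/spokestack-web-worker.js',
  express.static('./node_modules/spokestack/dist/spokestack-web-worker.min.js')
)

// In the browser: start a speech pipeline for wakeword processing
import { startPipeline, PipelineProfile, SpeechEventType } from 'spokestack/client'
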
try {
  await startPipeline({
    profile: PipelineProfile.Wakeword,
    baseUrls: { wakeword: 'https://s.spokestack.io/u/hgmYb/js' },
    onEvent: (event) => {
      switch (event.type) {
        case SpeechEventType.Activate:
          this.setState({ wakeword: { error: '', result: true } })
          break
        case SpeechEventType.Timeout:
          this.setState({ wakeword: { error: 'timeout' } })
          break
        case SpeechEventType.Error:
          console.error(event.error)
          break
      }
    }
  })
} catch (e) {
  console.error(e)
}

Parameters:

Name	Type
`config`	PipelineConfig

Returns: Promise<SpeechPipeline>

Defined in: client/pipeline.ts:161

SpeechPipeline

Spokestack's speech pipeline comprises a voice activity detection (VAD) component and a series of stages that manage voice interaction.

Audio is processed off the main thread, currently via a ScriptProcessorNode and web worker. Each chunk of audio samples is passed to the worker along with an indication of speech activity, and each of the stages processes it in order to, e.g., detect whether the user said a wakeword or transcribe an occurrence of a keyword. See documentation for the individual stages for more information on their purpose.

+ new SpeechPipeline(config: SpeechPipelineConfig): SpeechPipeline

Create a new speech pipeline.

Parameters:

Name	Type	Description
`config`	SpeechPipelineConfig	A SpeechPipelineConfig object describing basic pipeline configuration as well as options specific to certain stages (URLs to models, classes for keyword models, etc.).

Returns: SpeechPipeline

Defined in: client/SpeechPipeline.ts:40

Methods

▸ start(): Promise<SpeechPipeline>

Start processing audio with the pipeline. If this is the first use of the pipeline, the microphone permission will be requested from the user if they have not already granted it.

Returns: Promise<SpeechPipeline>

Defined in: client/SpeechPipeline.ts:85

▸ stop(): void

Stop the pipeline, destroying the internal audio processors and relinquishing the microphone.

Returns: void

Defined in: client/SpeechPipeline.ts:206

SpeechPipelineConfig

onEvent

• Optional onEvent: PipelineEventHandler

Defined in: client/SpeechPipeline.ts:19

speechConfig

• speechConfig: SpeechConfig

Defined in: client/SpeechPipeline.ts:16

stages

• stages: Stage[]

Defined in: client/SpeechPipeline.ts:17

workerUrl

• Optional workerUrl: string

Defined in: client/SpeechPipeline.ts:18

PipelineProfile

Preset profiles for use with startPipeline that include both default configuration and lists of processing stages. Individual stages may require additional configuration that cannot be provided automatically, so see each stage for more details. The stages used by each profile are as follows:

Keyword: VadTrigger and KeywordRecognizer: actively listens for any user speech and delivers a transcript if a keyword is recognized.
Wakeword: WakewordTrigger: listens passively until a wakeword is recognized, then activates the pipeline so that ASR can be performed.

• Keyword: = "KEYWORD"

A profile that activates on voice activity and transcribes speech using pretrained keyword recognizer models that support a limited vocabulary.

Defined in: client/pipeline.ts:30

• Wakeword: = "WAKEWORD"

A profile that sends an Activate event when a wakeword is detected by a set of pretrained wakeword models. Once that event is received, subsequent audio should be sent to a speech recognizer for transcription.

Defined in: client/pipeline.ts:36

SpeechEventType

• Activate: = "ACTIVATE"

Defined in: client/types.ts:83

• Deactivate: = "DEACTIVATE"

Defined in: client/types.ts:84

• Error: = "ERROR"

Defined in: client/types.ts:87

• Recognize: = "RECOGNIZE"

Defined in: client/types.ts:86

• Timeout: = "TIMEOUT"

Defined in: client/types.ts:85

Stage

• KeywordRecognizer: = "keyword"

Defined in: client/types.ts:100

• VadTrigger: = "vadTrigger"

Defined in: client/types.ts:98

• WakewordTrigger: = "wakeword"

Defined in: client/types.ts:99
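The PipelineProfile description above amounts to a lookup from profile to an ordered list of stages. The sketch below redeclares the documented string values locally (it does not import the real enums) to show how a profile could expand into the stages field of SpeechPipelineConfig; it is an illustration, not the library's implementation.

```typescript
// Local constants mirroring the documented Stage and PipelineProfile values.
const Stage = {
  VadTrigger: 'vadTrigger',
  WakewordTrigger: 'wakeword',
  KeywordRecognizer: 'keyword'
} as const
type Stage = (typeof Stage)[keyof typeof Stage]

const PipelineProfile = {
  Keyword: 'KEYWORD',
  Wakeword: 'WAKEWORD'
} as const
type PipelineProfile = (typeof PipelineProfile)[keyof typeof PipelineProfile]

// Stage lists as described under PipelineProfile.
const PROFILE_STAGES: Record<PipelineProfile, readonly Stage[]> = {
  [PipelineProfile.Keyword]: [Stage.VadTrigger, Stage.KeywordRecognizer],
  [PipelineProfile.Wakeword]: [Stage.WakewordTrigger]
}

function stagesFor(profile: PipelineProfile): Stage[] {
  // Return a copy so callers can modify the list without changing the preset
  return [...PROFILE_STAGES[profile]]
}
```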

stopPipeline

▸ stopPipeline(): void

Stop the speech pipeline and relinquish its resources, including the microphone.

stopPipeline()

Returns: void

Defined in: client/pipeline.ts:195

countdown

▸ countdown(time: number, progress: (remaining: number) => void, complete: () => void): void

Count down a number of seconds. record() uses this to record for a given number of seconds.

Parameters:

Name	Type	Description
`time`	number	Number of seconds
`progress`	(`remaining`: number) => void	Callback for each second (includes first second)
`complete`	() => void	Callback for completion

Returns: void

Defined in: client/countdown.ts:8
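The parameter table implies a simple timer loop: report the first second immediately, report each remaining second, then call complete. The following is a sketch of those semantics under the same signature, not the library's source; the immediate completion for a non-positive time is an assumption.

```typescript
function countdown(
  time: number,
  progress: (remaining: number) => void,
  complete: () => void
): void {
  let remaining = time
  // Assumption: nothing to count down, so complete right away
  if (remaining <= 0) {
    complete()
    return
  }
  // The first second is reported immediately
  progress(remaining)
  const timer = setInterval(() => {
    remaining -= 1
    if (remaining > 0) {
      progress(remaining)
    } else {
      clearInterval(timer)
      complete()
    }
  }, 1000)
}
```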

Low-level processor functions

These are low-level functions if you need to work with your own audio processors, available from spokestack/client.

startProcessor

▸ startProcessor(): Promise<[Error] | [null, ProcessorReturnValue]>

Underlying utility method for recording audio, used by the record and startStream methods.

While createScriptProcessor is deprecated, the replacement (AudioWorklet) does not yet have broad support (currently only supported in Blink browsers). See https://caniuse.com/#feat=mdn-api_audioworkletnode

We'll switch to AudioWorklet when it does.

Returns: Promise<[Error] | [null, ProcessorReturnValue]>

Defined in: client/processor.ts:22
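The return type is an error-first tuple: [Error] on failure or [null, value] on success. The sketch below shows one way to consume that shape with a TypeScript type guard. ProcessorResult, isFailure, and unwrapProcessor are hypothetical names, and the tuple shape is inferred from the signature above; in the browser you might call unwrapProcessor(await startProcessor()).

```typescript
// Error-first tuple inferred from the documented return type.
type ProcessorResult<T> = [Error] | [null, T]

function isFailure<T>(result: ProcessorResult<T>): result is [Error] {
  return result[0] !== null
}

function unwrapProcessor<T>(result: ProcessorResult<T>): T {
  if (isFailure(result)) {
    // For example, microphone access could not be obtained
    throw result[0]
  }
  // Narrowed to [null, T] here
  return result[1]
}
```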

ProcessorReturnValue

context

• context: AudioContext

Defined in: client/processor.ts:8

processor

• processor: ScriptProcessorNode

Defined in: client/processor.ts:9

stopProcessor

▸ stopProcessor(): void

Underlying utility method to stop the current processor if it exists and disconnect the microphone.

Returns: void

Defined in: client/processor.ts:53

concatenateAudioBuffers

▸ concatenateAudioBuffers(buffer1: AudioBuffer | null, buffer2: AudioBuffer | null, context: AudioContext): null | AudioBuffer

A utility method to concatenate two AudioBuffers

Parameters:

Name	Type
`buffer1`	AudioBuffer \| null
`buffer2`	AudioBuffer \| null
`context`	AudioContext

Returns: null | AudioBuffer

Defined in: client/concatenateAudioBuffers.ts:4
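AudioBuffer and AudioContext only exist in the browser, so this sketch shows the same concatenation on raw channel data (Float32Array, as returned by getChannelData). The null handling, returning the other input when one side is null, is an assumption based on the nullable signature. In the browser, the result could be copied into a buffer from context.createBuffer() with copyToChannel().

```typescript
// Concatenate two channels of samples, tolerating null inputs.
function concatenateSamples(
  a: Float32Array | null,
  b: Float32Array | null
): Float32Array | null {
  if (!a) return b
  if (!b) return a
  const out = new Float32Array(a.length + b.length)
  // Copy the first chunk, then the second right after it
  out.set(a, 0)
  out.set(b, a.length)
  return out
}
```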

Keywords

spokestack

voice

node

Package last updated on 26 Apr 2021
