@aws-sdk/client-transcribe-streaming
Introduction
Amazon Transcribe streaming enables you to send an audio stream and receive back a stream of text in real time.
The API makes it easy for developers to add real-time speech-to-text capability to their applications. It can be used
for a variety of purposes. For example:
- Streaming transcriptions can generate real-time subtitles for live broadcast media.
- Lawyers can make real-time annotations on top of streaming transcriptions during courtroom depositions.
- Video game chat can be transcribed in real time so that hosts can moderate content or run real-time analysis.
- Streaming transcriptions can provide assistance to the hearing impaired.
The JavaScript SDK Transcribe Streaming client encapsulates the API into a JavaScript library that can run on browsers, Node.js, and potentially React Native. By default, the client uses an HTTP/2 connection on Node.js, and a WebSocket connection on browsers and React Native.
Installing
To install this package, add or install @aws-sdk/client-transcribe-streaming using your favorite package manager:
npm install @aws-sdk/client-transcribe-streaming
yarn add @aws-sdk/client-transcribe-streaming
pnpm add @aws-sdk/client-transcribe-streaming
Getting Started
In the sections below, we will explain the library through an example that uses the StartStreamTranscription operation to transcribe English speech to text.
If you haven't already, please read the root README for guidance on creating a sample application and installing the package. After installation, you can import the Transcribe Streaming client in index.js like:
const { TranscribeStreamingClient, StartStreamTranscriptionCommand } = require("@aws-sdk/client-transcribe-streaming");
If require is not available on the platform you are working on (for example, browsers), you can import the client like:
import {
TranscribeStreamingClient,
StartStreamTranscriptionCommand,
} from "@aws-sdk/client-transcribe-streaming";
Constructing the Service Client
You can create a service client as shown below:
const client = new TranscribeStreamingClient({
region,
credentials,
});
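For example, a minimal Node.js setup might look like the following, where "us-east-1" is a placeholder region; omitting credentials lets the SDK resolve them from the default provider chain (environment variables, shared config files, IAM roles):
const { TranscribeStreamingClient } = require("@aws-sdk/client-transcribe-streaming");
// "us-east-1" is a placeholder; pick the region you run Transcribe in.
const client = new TranscribeStreamingClient({ region: "us-east-1" });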
Acquire Speech Stream
The Transcribe Streaming client accepts streaming speech input as an async iterable. You can construct one either from an async generator or from any object implementing Symbol.asyncIterator; both approaches are shown below.
Here's an example using an async generator, where device is a placeholder for your audio source:
const audioSource = async function* () {
await device.start();
while (device.ends !== true) {
const chunk = await device.read();
yield chunk;
}
};
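Alternatively, here's a minimal sketch of the Symbol.asyncIterator approach, using the same hypothetical device:
const audioSource = {
  [Symbol.asyncIterator]() {
    return {
      async next() {
        // Stop iteration once the device reports the end of input.
        if (device.ends === true) return { done: true, value: undefined };
        const chunk = await device.read();
        return { done: false, value: chunk };
      },
    };
  },
};
Note that unlike the generator function, you iterate this object directly (for await (const chunk of audioSource)) instead of calling it first.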
Then you need to wrap each binary chunk into an audio event shape that the SDK can recognize:
const audioStream = async function* () {
for await (const chunk of audioSource()) {
yield { AudioEvent: { AudioChunk: chunk } };
}
};
Acquire from Node.js Stream API
In Node.js you will usually acquire the speech through the Stream API, from an HTTP request or a device. A Node.js stream (>= 10.0.0) is itself an async iterable, so you can supply it to the SDK input without explicit conversion. You only need to wrap each chunk into the audio event shape that the SDK recognizes:
const audioSource = req;
const audioStream = async function* () {
for await (const payloadChunk of audioSource) {
yield { AudioEvent: { AudioChunk: payloadChunk } };
}
};
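For instance, here's a minimal sketch of wiring an incoming HTTP request into this shape; the port and the assumption that the request body carries raw PCM audio are hypothetical:
const http = require("http");

http
  .createServer((req, res) => {
    // req is a Readable stream of raw audio bytes (assumed PCM here).
    const audioStream = async function* () {
      for await (const payloadChunk of req) {
        yield { AudioEvent: { AudioChunk: payloadChunk } };
      }
    };
    // ... send audioStream() with a StartStreamTranscriptionCommand here ...
    res.end();
  })
  .listen(8080);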
If you don't limit the chunk size on the client side (for example, when streaming from fs), you might see a "The chunk is too big" error from Transcribe Streaming. You can solve it by setting the highWaterMark on a PassThrough stream:
const { PassThrough } = require("stream");
const { createReadStream } = require("fs");
const audioSource = createReadStream("path/to/speech.wav");
const audioPayloadStream = new PassThrough({ highWaterMark: 1 * 1024 });
audioSource.pipe(audioPayloadStream);
const audioStream = async function* () {
for await (const payloadChunk of audioPayloadStream) {
yield { AudioEvent: { AudioChunk: payloadChunk } };
}
};
Depending on the audio source, you may need to PCM encode your audio chunks; see the PCM encoding section below.
Acquire from Browsers
The Transcribe Streaming SDK client also supports streaming from browsers. You can acquire the microphone data through the getUserMedia API. Note that this API is supported only by a subset of browsers.
Here's a code snippet that acquires the microphone audio stream using microphone-stream:
const MicrophoneStream = require("microphone-stream");
const micStream = new MicrophoneStream();
// getUserMedia returns a promise, so run this inside an async function:
micStream.setStream(
  await window.navigator.mediaDevices.getUserMedia({
    video: false,
    audio: true,
  })
);
const audioStream = async function* () {
for await (const chunk of micStream) {
yield { AudioEvent: { AudioChunk: pcmEncodeChunk(chunk) } };
}
};
You can find a full front-end example here.
PCM encoding
Currently, the Transcribe Streaming service only accepts PCM encoding. If your audio source is not already PCM encoded, you need to encode the chunks yourself. Here's an example:
const pcmEncodeChunk = (chunk) => {
  // Convert the raw chunk to a Float32Array of samples in [-1, 1].
  const input = MicrophoneStream.toRaw(chunk);
  let offset = 0;
  const buffer = new ArrayBuffer(input.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < input.length; i++, offset += 2) {
    // Clamp each sample and scale it to a little-endian signed 16-bit integer.
    const s = Math.max(-1, Math.min(1, input[i]));
    view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return Buffer.from(buffer);
};
Send the Speech Stream
With the audio stream ready, construct a StartStreamTranscriptionCommand and send it with the client. Note that MediaSampleRateHertz must match the actual sample rate of your audio source:
const command = new StartStreamTranscriptionCommand({
LanguageCode: "en-US",
MediaEncoding: "pcm",
MediaSampleRateHertz: 44100,
AudioStream: audioStream(),
});
const response = await client.send(command);
Handling Text Stream
If the request succeeds, you will get a response containing the transcript stream. Just like the input speech stream, the transcript stream is an async iterable emitting partial transcripts. Here is a code snippet for accessing the transcripts:
for await (const event of response.TranscriptResultStream) {
  if (event.TranscriptEvent) {
    const results = event.TranscriptEvent.Transcript.Results;
    results.forEach((result) => {
      (result.Alternatives || []).forEach((alternative) => {
        const transcript = alternative.Items.map((item) => item.Content).join(" ");
        console.log(transcript);
      });
    });
  }
}
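Each Alternative also carries a ready-made Transcript string, so if you don't need the word-level Items you can log it directly; here's a sketch over the same response shape:
for await (const event of response.TranscriptResultStream) {
  for (const result of event.TranscriptEvent?.Transcript?.Results ?? []) {
    for (const alternative of result.Alternatives ?? []) {
      console.log(alternative.Transcript); // the partial transcript so far
    }
  }
}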
Pipe Transcripts Stream
In Node.js, you can easily pipe this TranscriptResultStream to other destinations with the Readable.from API:
const { Readable } = require("stream");
const transcriptsStream = Readable.from(response.TranscriptResultStream);
transcriptsStream.pipe(destination); // destination: an object-mode Writable, e.g. the sketch below
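For example, here's a sketch of a hypothetical destination that logs each transcript event as JSON:
const { Writable } = require("stream");

const destination = new Writable({
  objectMode: true, // TranscriptResultStream events are objects, not bytes
  write(event, _encoding, callback) {
    console.log(JSON.stringify(event));
    callback();
  },
});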
Error Handling
If you are using async...await style code, you can catch the errors with a try...catch block. There are two categories of exceptions that can be thrown:
- Immediate exceptions thrown before the transcription starts, like signature exceptions, invalid parameter exceptions, and network errors;
- Streaming exceptions thrown after the transcription starts, like InternalFailureException or ConflictException.
For immediate exceptions, the SDK client will retry the request if the error is retryable, like network errors. You can configure the client's retry behavior as you intend.
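For example, retry behavior can be tuned through the standard client configuration; maxAttempts is a real client option, and the value 3 below is just an illustration:
const client = new TranscribeStreamingClient({
  region,
  credentials,
  maxAttempts: 3, // total attempts for retryable immediate errors
});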
For streaming exceptions, because the streaming transcription has already started, the client cannot retry the request automatically. The client will throw these exceptions so users can handle the stream behavior accordingly.
Here's an example of the error handling flow; the exception classes are exported by the client package:
const {
  InternalFailureException,
  ConflictException,
} = require("@aws-sdk/client-transcribe-streaming");

try {
  const response = await client.send(command);
  await handleResponse(response);
} catch (e) {
  if (e instanceof InternalFailureException) {
    // Handle the streaming exception, e.g. surface it to the user.
  } else if (e instanceof ConflictException) {
    // Handle the streaming exception.
  }
  // Immediate exceptions (validation, signature, network) also land here.
} finally {
  // Clean up resources, e.g. stop the audio source.
}
Notes for React Native
This package is compatible with React Native (>= 0.60). However, it has not been tested with any React Native libraries that convert microphone recordings into streaming data. Community input for integrating streaming microphone data is welcome.
Thank you for reading this guide. If you want to know more about how the streams are encoded and how the connection is established, please refer to the Service API guide.
Contributing
This client code is generated automatically. Any modifications will be overwritten the next time the @aws-sdk/client-transcribe-streaming package is updated. To contribute to the client, you can check our generate clients scripts.
License
This SDK is distributed under the
Apache License, Version 2.0,
see LICENSE for more information.