AssemblyAI JavaScript SDK
The AssemblyAI JavaScript SDK provides an easy-to-use interface for interacting with the AssemblyAI API,
which supports async and real-time transcription, as well as the latest LeMUR models.
It is written primarily for Node.js in TypeScript with all types exported, but it is also compatible with other runtimes.
Documentation
Visit the AssemblyAI documentation for step-by-step instructions and a lot more details about our AI models and API.
Explore the SDK API reference for more details on the SDK types, functions, and classes.
Quickstart
Install the AssemblyAI SDK using your preferred package manager:
npm install assemblyai
yarn add assemblyai
pnpm add assemblyai
bun add assemblyai
Then, import the assemblyai module and create an AssemblyAI object with your API key:
import { AssemblyAI } from "assemblyai";
const client = new AssemblyAI({
apiKey: process.env.ASSEMBLYAI_API_KEY,
});
You can now use the client object to interact with the AssemblyAI API.
Speech-To-Text
Transcribe audio and video files
Transcribe an audio file with a public URL
When you create a transcript, you can either pass in a URL to an audio file or upload a file directly.
let transcript = await client.transcripts.transcribe({
audio: "https://storage.googleapis.com/aai-web-samples/espn-bears.m4a",
});
Note
You can also pass a local file path, a stream, or a buffer as the audio property.
transcribe queues a transcription job and polls it until the status is completed or error.
If you don't want to wait until the transcript is ready, you can use submit:
let transcript = await client.transcripts.submit({
audio: "https://storage.googleapis.com/aai-web-samples/espn-bears.m4a",
});
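Whichever method you use, the returned transcript carries a status, so it is worth branching on it before reading the text. The following is a minimal sketch of that check. TranscriptLike and summarizeOutcome are illustrative names rather than SDK exports, and the sketch assumes the transcript object exposes id, status, text, and error fields.

```typescript
// Minimal shape of the fields this sketch relies on (an assumption, not the SDK's full type).
type TranscriptLike = {
  id: string;
  status: "queued" | "processing" | "completed" | "error";
  text?: string | null;
  error?: string | null;
};

// Hypothetical helper: describe a transcript in one line based on its status.
function summarizeOutcome(t: TranscriptLike): string {
  switch (t.status) {
    case "completed":
      return `Transcript ${t.id} is ready: ${t.text ?? ""}`;
    case "error":
      return `Transcript ${t.id} failed: ${t.error ?? "unknown error"}`;
    default:
      // "queued" or "processing", which is what you would typically see right after submit.
      return `Transcript ${t.id} is still ${t.status}`;
  }
}
```

After transcribe you would normally expect the completed or error branch; after submit the default branch is the likely one.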
Transcribe a local audio file
When you create a transcript, you can either pass in a URL to an audio file or upload a file directly.
let transcript = await client.transcripts.transcribe({
audio: "./news.mp4",
});
Note
You can also pass a file URL, a stream, or a buffer as the audio property.
transcribe queues a transcription job and polls it until the status is completed or error.
If you don't want to wait until the transcript is ready, you can use submit:
let transcript = await client.transcripts.submit({
audio: "./news.mp4",
});
Enable additional AI models
You can extract even more insights from the audio by enabling any of our AI models using transcription options.
For example, here's how to enable the speaker diarization model to detect who said what.
let transcript = await client.transcripts.transcribe({
  audio: "https://storage.googleapis.com/aai-web-samples/espn-bears.m4a",
  speaker_labels: true,
});
// utterances may be absent, so fall back to an empty array
for (const utterance of transcript.utterances ?? []) {
  console.log(`Speaker ${utterance.speaker}: ${utterance.text}`);
}
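Once you have speaker-labeled utterances, you can aggregate them. The sketch below totals speaking time per speaker. UtteranceLike and speakingTimeBySpeaker are illustrative names, and the sketch assumes each utterance carries start and end timestamps in milliseconds alongside speaker and text.

```typescript
// Assumed minimal shape of an utterance; the SDK's own type has more fields.
type UtteranceLike = {
  speaker: string;
  text: string;
  start: number; // assumed to be milliseconds from the start of the audio
  end: number;   // assumed to be milliseconds from the start of the audio
};

// Hypothetical helper: sum the duration of every utterance per speaker label.
function speakingTimeBySpeaker(utterances: UtteranceLike[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const u of utterances) {
    const previous = totals.get(u.speaker) ?? 0;
    totals.set(u.speaker, previous + (u.end - u.start));
  }
  return totals;
}
```

You could pass it transcript.utterances ?? [] and print each entry of the resulting map.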
Get a transcript
This will return the transcript object in its current state. If the transcript is still processing, the status field will be queued or processing. Once the transcript is complete, the status field will be completed.
transcript = await client.transcripts.get(transcript.id);
If you created a transcript using .submit(), you can still poll until the transcript status is completed or error using .waitUntilReady():
transcript = await client.transcripts.waitUntilReady(transcript.id, {
  pollingInterval: 1000,
  pollingTimeout: 5000,
});
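To make the two polling options concrete, here is a rough sketch of what a poll-until-ready loop could look like. It is not the SDK's implementation: pollUntilReady, StatusFetcher, and sleep are hypothetical names, the status is fetched through an injected function instead of the API, and both options are assumed to be in milliseconds.

```typescript
type Status = "queued" | "processing" | "completed" | "error";

// Hypothetical stand-in for "fetch the transcript and read its status".
type StatusFetcher = () => Promise<Status>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the status is terminal, or give up once pollingTimeout has elapsed.
async function pollUntilReady(
  fetchStatus: StatusFetcher,
  options: { pollingInterval: number; pollingTimeout: number },
): Promise<Status> {
  const startedAt = Date.now();
  while (true) {
    const status = await fetchStatus();
    if (status === "completed" || status === "error") {
      return status;
    }
    if (Date.now() - startedAt >= options.pollingTimeout) {
      throw new Error("Polling timed out before the transcript was ready");
    }
    await sleep(options.pollingInterval);
  }
}
```

With an interval of 1000 and a timeout of 5000, a loop like this would check the status roughly once per second and stop with an error after about five seconds.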
Get sentences and paragraphs
const sentences = await client.transcripts.sentences(transcript.id);
const paragraphs = await client.transcripts.paragraphs(transcript.id);
Get subtitles
const charsPerCaption = 32;
let srt = await client.transcripts.subtitles(transcript.id, "srt");
srt = await client.transcripts.subtitles(transcript.id, "srt", charsPerCaption);
let vtt = await client.transcripts.subtitles(transcript.id, "vtt");
vtt = await client.transcripts.subtitles(transcript.id, "vtt", charsPerCaption);
List transcripts
This will return a page of transcripts you created.
const page = await client.transcripts.list();
You can also paginate over all pages.
let previousPageUrl: string | null = null;
do {
const page = await client.transcripts.list(previousPageUrl);
previousPageUrl = page.page_details.prev_url;
} while (previousPageUrl !== null);
[!NOTE]
To paginate over all pages, use page.page_details.prev_url, because transcripts are returned in descending order by creation date and time.
The first page contains the most recent transcripts, and each "previous" page contains older transcripts.
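If you want every transcript in a single array, the loop above can be wrapped in a helper. The following is a sketch under a few assumptions: collectAll, PageLike, and ListPage are illustrative names, the page is assumed to expose its items in a transcripts array, and the page fetcher is injected so the helper does not depend on the client. The maxPages cap is an added safety limit, not something the API requires.

```typescript
// Assumed minimal shape of a page of results.
type PageLike<T> = {
  transcripts: T[];
  page_details: { prev_url: string | null };
};

// Hypothetical page fetcher: pass null for the first (most recent) page.
type ListPage<T> = (url: string | null) => Promise<PageLike<T>>;

// Follow prev_url links, newest to oldest, and collect every item.
async function collectAll<T>(listPage: ListPage<T>, maxPages = 100): Promise<T[]> {
  const all: T[] = [];
  let url: string | null = null;
  for (let fetched = 0; fetched < maxPages; fetched++) {
    // The explicit annotation avoids a circular type inference between page and url.
    const page: PageLike<T> = await listPage(url);
    all.push(...page.transcripts);
    url = page.page_details.prev_url;
    if (url === null) break;
  }
  return all;
}
```

With the client, the fetcher could be something like (url) => client.transcripts.list(url), provided list accepts null for the first page as the loop above suggests.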
Delete a transcript
const res = await client.transcripts.delete(transcript.id);
Transcribe in real-time
Create the real-time transcriber.
const rt = client.realtime.transcriber();
You can also pass in the following options.
const rt = client.realtime.transcriber({
  realtimeUrl: 'wss://localhost/override',
  apiKey: process.env.ASSEMBLYAI_API_KEY,
  sampleRate: 16_000,
  wordBoost: ['foo', 'bar']
});
[!WARNING]
Storing your API key in client-facing applications exposes your API key.
Generate a temporary auth token on the server and pass it to your client.
Server code:
const token = await client.realtime.createTemporaryToken({ expires_in: 60 });
Client code:
import { RealtimeTranscriber } from "assemblyai";
const token = await getToken();
const rt = new RealtimeTranscriber({
token,
});
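The getToken function in the client code is left for you to implement. One possible shape is sketched below. It assumes your server exposes an endpoint (here the hypothetical path /token) that responds with JSON of the form { "token": "..." }. The fetcher is injected so the helper can be exercised without a network, which means this variant takes an argument, unlike the zero-argument call above.

```typescript
// Hypothetical fetcher: resolves to the parsed JSON body of a GET request.
type FetchJson = (url: string) => Promise<unknown>;

// Ask your own server for a temporary token and validate the payload.
// The "/token" path and the { token } payload shape are assumptions about your server.
async function getToken(fetchJson: FetchJson, tokenUrl = "/token"): Promise<string> {
  const body = await fetchJson(tokenUrl);
  if (typeof body === "object" && body !== null) {
    const token = (body as { token?: unknown }).token;
    if (typeof token === "string" && token.length > 0) {
      return token;
    }
  }
  throw new Error("Token endpoint returned an unexpected payload");
}
```

In a browser, the fetcher could be (url) => fetch(url).then((res) => res.json()).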
You can configure the following events.
rt.on("open", ({ sessionId, expiresAt }) => console.log('Session ID:', sessionId, 'Expires at:', expiresAt));
rt.on("close", (code: number, reason: string) => console.log('Closed', code, reason));
rt.on("transcript", (transcript: TranscriptMessage) => console.log('Transcript:', transcript));
rt.on("transcript.partial", (transcript: PartialTranscriptMessage) => console.log('Partial transcript:', transcript));
rt.on("transcript.final", (transcript: FinalTranscriptMessage) => console.log('Final transcript:', transcript));
rt.on("error", (error: Error) => console.error('Error', error));
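Partial transcripts are typically superseded by later partials and then by a final transcript, so a common pattern is to keep finalized text and show only the latest partial after it. The sketch below illustrates that pattern. TranscriptBuffer and RealtimeMessageLike are illustrative names, and the message_type values "PartialTranscript" and "FinalTranscript" are an assumption about the message payload that you should check against the API reference.

```typescript
// Assumed minimal shape of a real-time transcript message.
type RealtimeMessageLike = {
  message_type: "PartialTranscript" | "FinalTranscript";
  text: string;
};

// Hypothetical helper: accumulates final text and tracks the in-progress partial.
class TranscriptBuffer {
  private finals: string[] = [];
  private partial = "";

  handle(message: RealtimeMessageLike): void {
    if (message.message_type === "FinalTranscript") {
      if (message.text.length > 0) this.finals.push(message.text);
      this.partial = ""; // the final replaces whatever partial was pending
    } else {
      this.partial = message.text; // each partial replaces the previous one
    }
  }

  // Finalized text followed by the current partial, if any.
  render(): string {
    return [...this.finals, this.partial].filter((s) => s.length > 0).join(" ");
  }
}
```

You could call buffer.handle(transcript) from the "transcript" handler and print buffer.render() after each message.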
After configuring your events, connect to the server.
await rt.connect();
Send audio data via chunks.
getAudio((chunk) => {
rt.sendAudio(chunk);
});
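getAudio above is a placeholder for your own audio source. If your audio is already in memory, one way to produce chunks is to slice it into fixed-duration pieces, as in the sketch below. splitPcm is a hypothetical helper, and it assumes raw 16-bit mono PCM at the sample rate you configured; other encodings would need a different bytes-per-sample value.

```typescript
// Split raw PCM audio into chunks of roughly chunkMs milliseconds each.
// Assumes 16-bit (2 bytes per sample), single-channel audio.
function splitPcm(audio: Uint8Array, sampleRate: number, chunkMs: number): Uint8Array[] {
  const bytesPerSample = 2;
  const samplesPerChunk = Math.floor((sampleRate * chunkMs) / 1000);
  const chunkBytes = samplesPerChunk * bytesPerSample;
  if (chunkBytes <= 0) {
    throw new Error("chunkMs is too small for this sample rate");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < audio.length; offset += chunkBytes) {
    // slice copies, so each chunk owns a buffer of exactly its own length
    chunks.push(audio.slice(offset, offset + chunkBytes));
  }
  return chunks;
}
```

At 16,000 Hz, a 100 ms chunk is 1,600 samples, or 3,200 bytes. Each chunk could then be handed to rt.sendAudio; whether it expects a typed array or its underlying buffer is worth confirming in the SDK API reference.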
Or send audio data via a stream by piping to the real-time stream.
audioStream.pipeTo(rt.stream());
Close the connection when you're finished.
await rt.close();
Apply LLMs to your audio with LeMUR
Call LeMUR endpoints to apply LLMs to your transcript.
Prompt your audio with LeMUR
const { response } = await client.lemur.task({
transcript_ids: ["0d295578-8c75-421a-885a-2c487f188927"],
prompt: "Write a haiku about this conversation.",
});
Summarize with LeMUR
const { response } = await client.lemur.summary({
transcript_ids: ["0d295578-8c75-421a-885a-2c487f188927"],
answer_format: "one sentence",
context: {
speakers: ["Alex", "Bob"],
},
});
Ask questions
const { response } = await client.lemur.questionAnswer({
transcript_ids: ["0d295578-8c75-421a-885a-2c487f188927"],
questions: [
{
question: "What are they discussing?",
answer_format: "text",
},
],
});
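Unlike the task and summary calls, the question-answer response is expected to be a list with one entry per question rather than a single string. The sketch below formats such a list for display. formatAnswers and QuestionAnswerLike are illustrative names, and the question and answer field names are an assumption about the response items.

```typescript
// Assumed minimal shape of one item in the question-answer response.
type QuestionAnswerLike = { question: string; answer: string };

// Hypothetical helper: render each question with its answer, separated by blank lines.
function formatAnswers(items: QuestionAnswerLike[]): string {
  return items.map((qa) => `Q: ${qa.question}\nA: ${qa.answer}`).join("\n\n");
}
```

You could pass the destructured response straight to formatAnswers and log the result.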
Generate action items
const { response } = await client.lemur.actionItems({
transcript_ids: ["0d295578-8c75-421a-885a-2c487f188927"],
});
Delete LeMUR request
const response = await client.lemur.purgeRequestData(lemurResponse.request_id);