assemblyai

4.3.4

Version published: 8 months ago

Weekly downloads: 59K; increased by 9.76%

Maintainers: 1

Created: 6 years ago

AssemblyAI JavaScript SDK

The AssemblyAI JavaScript SDK provides an easy-to-use interface for interacting with the AssemblyAI API, which supports async and real-time transcription, as well as the latest LeMUR models. It is written primarily for Node.js in TypeScript with all types exported, but it is also compatible with other runtimes.

Documentation

Visit the AssemblyAI documentation for step-by-step instructions and a lot more details about our AI models and API.

Quickstart

Install the AssemblyAI SDK using your preferred package manager:

npm install assemblyai

yarn add assemblyai

pnpm add assemblyai

bun add assemblyai

Then, import the assemblyai module and create an AssemblyAI object with your API key:

import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({
  apiKey: process.env.ASSEMBLYAI_API_KEY,
});

You can now use the client object to interact with the AssemblyAI API.
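Because `process.env.ASSEMBLYAI_API_KEY` is `undefined` when the variable is not set, it can help to fail early with a clear message before constructing the client. The following is a minimal sketch; `requireEnv` is a hypothetical helper introduced here for illustration and is not part of the SDK.

```typescript
// Hypothetical helper (not part of the assemblyai package): read a required
// environment variable and fail fast if it is missing or blank.
function requireEnv(
  env: Record<string, string | undefined>,
  name: string,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage (in Node.js):
//   const client = new AssemblyAI({
//     apiKey: requireEnv(process.env, "ASSEMBLYAI_API_KEY"),
//   });
```

Passing the environment object explicitly keeps the helper easy to test and independent of the runtime.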

Speech-To-Text

Transcribe audio and video files

Transcribe an audio file with a public URL

When you create a transcript, you can either pass in a URL to an audio file or upload a file directly.

// Transcribe file at remote URL
let transcript = await client.transcripts.transcribe({
  audio: "https://storage.googleapis.com/aai-web-samples/espn-bears.m4a",
});

Note: You can also pass a local file path, a stream, or a buffer as the audio property.

transcribe queues a transcription job and polls it until the status is completed or error.

If you don't want to wait until the transcript is ready, you can use submit:

let transcript = await client.transcripts.submit({
  audio: "https://storage.googleapis.com/aai-web-samples/espn-bears.m4a",
});

Transcribe a local audio file

When you create a transcript, you can either pass in a URL to an audio file or upload a file directly.

// Upload a file via local path and transcribe
let transcript = await client.transcripts.transcribe({
  audio: "./news.mp4",
});

Note: You can also pass a file URL, a stream, or a buffer as the audio property.

transcribe queues a transcription job and polls it until the status is completed or error.

If you don't want to wait until the transcript is ready, you can use submit:

let transcript = await client.transcripts.submit({
  audio: "./news.mp4",
});

Enable additional AI models

You can extract even more insights from the audio by enabling any of our AI models using transcription options. For example, here's how to enable the Speaker Diarization model to detect who said what.

let transcript = await client.transcripts.transcribe({
  audio: "https://storage.googleapis.com/aai-web-samples/espn-bears.m4a",
  speaker_labels: true,
});
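// utterances may be null or undefined (for example, if transcription failed), so fall back to an empty list
for (const utterance of transcript.utterances ?? []) {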
  console.log(`Speaker ${utterance.speaker}: ${utterance.text}`);
}
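Once utterances carry speaker labels, a common next step is to collect what each speaker said. The sketch below groups utterance text by speaker. `UtteranceLike` and `groupBySpeaker` are hypothetical names for this example; the interface models only the two fields used in the loop above, not the SDK's full utterance type.

```typescript
// Minimal shape of an utterance as used in the loop above; the SDK's real
// type has more fields.
interface UtteranceLike {
  speaker: string;
  text: string;
}

// Group utterance texts by speaker label, preserving the order in which
// each speaker's utterances occurred.
function groupBySpeaker(
  utterances: UtteranceLike[] | null | undefined,
): Map<string, string[]> {
  const bySpeaker = new Map<string, string[]>();
  for (const { speaker, text } of utterances ?? []) {
    const lines = bySpeaker.get(speaker) ?? [];
    lines.push(text);
    bySpeaker.set(speaker, lines);
  }
  return bySpeaker;
}

// Usage: const bySpeaker = groupBySpeaker(transcript.utterances);
```

Accepting `null` and `undefined` means the helper can be called directly on a transcript that has no utterances.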

Get a transcript

This will return the transcript object in its current state. If the transcript is still processing, the status field will be queued or processing. Once the transcript is complete, the status field will be completed.

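// Reassign the `transcript` variable declared with `let` earlier; a `const` cannot reference itself in its own initializer.
transcript = await client.transcripts.get(transcript.id);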
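Since the returned object can be in any of the four states mentioned in this README (queued, processing, completed, or error), calling code usually needs to branch on `status`. The following sketch shows one way to do that. `TranscriptLike` and `describeTranscript` are hypothetical names, and the assumption that `text` holds the result on completion and `error` holds the failure message should be checked against the SDK's exported types.

```typescript
// Minimal shape assumed for this sketch; the SDK's Transcript type has many
// more fields. `text` and `error` are assumed to be populated for the
// "completed" and "error" states respectively.
interface TranscriptLike {
  id: string;
  status: "queued" | "processing" | "completed" | "error";
  text?: string | null;
  error?: string | null;
}

// Return the transcript text when it is ready, a progress message while it
// is pending, and throw when the transcription failed.
function describeTranscript(t: TranscriptLike): string {
  switch (t.status) {
    case "queued":
    case "processing":
      return `Transcript ${t.id} is still ${t.status}.`;
    case "completed":
      return t.text ?? "";
    case "error":
      throw new Error(
        `Transcription ${t.id} failed: ${t.error ?? "unknown error"}`,
      );
  }
}
```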

If you created a transcript using .submit(), you can still poll until the transcript status is completed or error using .waitUntilReady():

transcript = await client.transcripts.waitUntilReady(transcript.id, {
  // How frequently the transcript is polled in ms. Defaults to 3000.
  pollingInterval: 1000,
  // How long to wait in ms until the "Polling timeout" error is thrown. Defaults to infinite (-1).
  pollingTimeout: 5000,
});
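To make the two options concrete, here is a rough sketch of what a polling loop with an interval and a timeout looks like. It illustrates the idea only and is not the SDK's actual implementation. `pollUntilReady`, `PollOptions`, and `fetchCurrent` are hypothetical names; the defaults mirror the ones stated in the comments above (3000 ms, and a negative timeout meaning no limit).

```typescript
interface PollOptions {
  pollingInterval?: number; // ms to wait between attempts
  pollingTimeout?: number; // ms before giving up; negative means no limit
}

// Repeatedly call `fetchCurrent` until it reports a terminal status
// ("completed" or "error"), or throw once the timeout has elapsed.
async function pollUntilReady<T extends { status: string }>(
  fetchCurrent: () => Promise<T>,
  { pollingInterval = 3000, pollingTimeout = -1 }: PollOptions = {},
): Promise<T> {
  const startedAt = Date.now();
  for (;;) {
    const current = await fetchCurrent();
    if (current.status === "completed" || current.status === "error") {
      return current;
    }
    if (pollingTimeout >= 0 && Date.now() - startedAt >= pollingTimeout) {
      throw new Error("Polling timeout");
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollingInterval));
  }
}

// Usage: await pollUntilReady(() => client.transcripts.get(id), { pollingInterval: 1000 });
```

Note that with a short timeout such as 5000 ms, longer audio files are likely to hit the timeout before transcription finishes.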

Get sentences and paragraphs

const sentences = await client.transcripts.sentences(transcript.id);
const paragraphs = await client.transcripts.paragraphs(transcript.id);

Get subtitles

const charsPerCaption = 32;
let srt = await client.transcripts.subtitles(transcript.id, "srt");
srt = await client.transcripts.subtitles(transcript.id, "srt", charsPerCaption);

let vtt = await client.transcripts.subtitles(transcript.id, "vtt");
vtt = await client.transcripts.subtitles(transcript.id, "vtt", charsPerCaption);

List transcripts

This will return a page of transcripts you created.

const page = await client.transcripts.list();

You can also paginate over all pages.

let previousPageUrl: string | null = null;
do {
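  // Pass undefined rather than null on the first iteration to request the most recent page.
  const page = await client.transcripts.list(previousPageUrl ?? undefined);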
  previousPageUrl = page.page_details.prev_url;
} while (previousPageUrl !== null);

[!NOTE] To paginate over all pages, you need to use page.page_details.prev_url because the transcripts are returned in descending order by creation date and time. The first page contains the most recent transcripts, and each "previous" page contains older transcripts.
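If you want every transcript in a single array, the same loop can be wrapped in a helper that follows `prev_url` until it is `null`. This is a sketch under assumptions: `collectAllPages` and `PageLike` are hypothetical names, and the `transcripts` field on each page is assumed from the API's list response, so verify it against the SDK's exported types.

```typescript
// Minimal page shape assumed for this sketch.
interface PageLike<T> {
  transcripts: T[];
  page_details: { prev_url: string | null };
}

// Follow `prev_url` links until there are no older pages, collecting every
// item. `listPage` is called with no URL for the first (most recent) page.
async function collectAllPages<T>(
  listPage: (url?: string) => Promise<PageLike<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let url: string | undefined = undefined;
  do {
    const page: PageLike<T> = await listPage(url);
    all.push(...page.transcripts);
    url = page.page_details.prev_url ?? undefined;
  } while (url !== undefined);
  return all;
}

// Usage: const all = await collectAllPages((url) => client.transcripts.list(url));
```

The result is ordered from newest to oldest, matching the order in which pages are returned.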

Delete a transcript

const res = await client.transcripts.delete(transcript.id);

Transcribe in real-time

Create the real-time transcriber.

const rt = client.realtime.transcriber();

You can also pass in the following options.

const rt = client.realtime.transcriber({
  realtimeUrl: 'wss://localhost/override',
  apiKey: process.env.ASSEMBLYAI_API_KEY, // The API key passed to `AssemblyAI` will be used by default
  sampleRate: 16_000,
  wordBoost: ['foo', 'bar']
});

You can also generate a temporary auth token for real-time.

const token = await client.realtime.createTemporaryToken({ expires_in: 60 });
const rt = client.realtime.transcriber({
  token: token,
});

[!WARNING] Storing your API key in client-facing applications exposes your API key. Generate a temporary auth token on the server and pass it to your client.

You can configure the following events.

rt.on("open", ({ sessionId, expiresAt }) => console.log('Session ID:', sessionId, 'Expires at:', expiresAt));
rt.on("close", (code: number, reason: string) => console.log('Closed', code, reason));
rt.on("transcript", (transcript: TranscriptMessage) => console.log('Transcript:', transcript));
rt.on("transcript.partial", (transcript: PartialTranscriptMessage) => console.log('Partial transcript:', transcript));
rt.on("transcript.final", (transcript: FinalTranscriptMessage) => console.log('Final transcript:', transcript));
rt.on("error", (error: Error) => console.error('Error', error));

After configuring your events, connect to the server.

await rt.connect();

Send audio data via chunks.

// Pseudo code for getting audio
getAudio((chunk) => {
  rt.sendAudio(chunk);
});
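If your audio is already in memory rather than arriving from a microphone callback, one way to produce chunks is to slice the raw bytes into fixed-duration pieces. The sketch below assumes 16-bit mono PCM (2 bytes per sample) at the 16,000 Hz sample rate used in the options above. `chunkAudio` is a hypothetical helper, and the 100 ms chunk length is an illustrative choice rather than a documented requirement.

```typescript
// Split raw audio bytes into chunks covering `chunkMs` milliseconds each.
// Assumes 16-bit mono PCM by default; adjust `bytesPerSample` otherwise.
// The final chunk may be shorter than the others.
function chunkAudio(
  audio: Uint8Array,
  sampleRate = 16_000,
  chunkMs = 100,
  bytesPerSample = 2,
): Uint8Array[] {
  const chunkBytes =
    Math.floor((sampleRate * chunkMs) / 1000) * bytesPerSample;
  if (chunkBytes <= 0) {
    throw new Error("Chunk size must be at least one sample");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < audio.length; offset += chunkBytes) {
    // slice() copies, so each chunk owns its own buffer.
    chunks.push(audio.slice(offset, offset + chunkBytes));
  }
  return chunks;
}

// Usage (assuming sendAudio accepts the chunk's underlying buffer):
//   for (const chunk of chunkAudio(pcmBytes)) rt.sendAudio(chunk.buffer);
```

With the defaults, each full chunk is 1,600 samples, or 3,200 bytes.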

Or send audio data via a stream by piping to the real-time stream.

audioStream.pipeTo(rt.stream());

Close the connection when you're finished.

await rt.close();

Apply LLMs to your audio with LeMUR

Call LeMUR endpoints to apply LLMs to your transcript.

Prompt your audio with LeMUR

const { response } = await client.lemur.task({
  transcript_ids: ["0d295578-8c75-421a-885a-2c487f188927"],
  prompt: "Write a haiku about this conversation.",
});

Summarize with LeMUR

const { response } = await client.lemur.summary({
  transcript_ids: ["0d295578-8c75-421a-885a-2c487f188927"],
  answer_format: "one sentence",
  context: {
    speakers: ["Alex", "Bob"],
  },
});

Ask questions

const { response } = await client.lemur.questionAnswer({
  transcript_ids: ["0d295578-8c75-421a-885a-2c487f188927"],
  questions: [
    {
      question: "What are they discussing?",
      answer_format: "text",
    },
  ],
});

Generate action items

const { response } = await client.lemur.actionItems({
  transcript_ids: ["0d295578-8c75-421a-885a-2c487f188927"],
});

Delete LeMUR request

const response = await client.lemur.purgeRequestData(lemurResponse.request_id);

[4.3.4] - 2024-04-02

Added

SpeechModel.Best enum
TranscriptListItem.error property

Changed

Make PageDetails.prev_url nullable
Rename Realtime to Streaming inside code documentation
More inline code documentation

Fixed

Rename SubstitutionPolicy literal "entity_type" to "entity_name"
Fix the pagination example in "List transcripts" sample on README

Package last updated on 02 Apr 2024