audio-transcripter

A lightweight TypeScript library for transcribing audio files using Google Gemini 2.0 models.
Supports local files, remote URLs, and in-memory buffers/blobs.
Ideal for meetings, interviews, podcasts, technical content, and more.
Installation
npm install audio-transcripter
Features
- Supports local files (.wav, .mp3, .aac, .flac, .ogg, .webm, etc.)
- Supports remote URLs (HTTP/HTTPS)
- Supports Blobs / Buffers
- Multiple transcription styles: accurate, clean, structured, technical, conversational
- Verbose logging (optional)
- Written in TypeScript with full type safety
Usage
1. Transcribe Local File
import { runTranscription } from "audio-transcripter";

const result = await runTranscription({
  audioFile: "./assets/audio.webm",
  style: "structured",
  language: "english",
});

if (result.success) {
  console.log("Transcription:", result.transcription);
} else {
  console.error("Error:", result.error);
}
2. Transcribe Remote URL
import { runTranscription } from "audio-transcripter";

const result = await runTranscription({
  audioFile: "https://example.com/audio.mp3",
  style: "clean",
  language: "english",
});
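Because audioFile accepts either a local path or a remote URL, calling code may want to tell the two apart before calling runTranscription, for example to skip a filesystem existence check. The helper below is a minimal sketch of one way to do that. The name isRemoteUrl is hypothetical and not part of the package; how the library itself distinguishes the two cases is not documented here.

```typescript
// Hypothetical helper (not part of audio-transcripter).
// Returns true when the input starts with an http:// or https:// scheme,
// matching the "HTTP/HTTPS" remote sources listed under Features.
function isRemoteUrl(audioFile: string): boolean {
  return /^https?:\/\//i.test(audioFile.trim());
}

// Relative paths, absolute paths, and Windows drive paths are all treated as local.
const sources = ["https://example.com/audio.mp3", "./assets/audio.webm"];
const remoteSources = sources.filter(isRemoteUrl);
```

The check is deliberately strict about the scheme, so other protocols such as ftp:// or file:// are treated as local paths rather than remote URLs.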
3. Transcribe Blob / Buffer (for browser or Node.js)
import { runTranscriptionWithBlob } from "audio-transcripter";

// Node.js: read the audio file into a Buffer.
const fs = await import("fs/promises");
const audioBuffer = await fs.readFile("./assets/audio.wav");

const result = await runTranscriptionWithBlob(audioBuffer, {
  style: "technical",
  language: "english",
});

if (result.success) {
  console.log("Transcription:", result.transcription);
} else {
  console.error("Error:", result.error);
}
Configuration Options
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| audioFile | string | required | Local file path or remote URL |
| style | string | 'conversational' | Transcription style (see below) |
| language | string | 'english' | Language of the audio |
| verbose | boolean | true | Enable verbose console logs |
| timeout | number | 5000 | Timeout in milliseconds for the remote URL HEAD check (if applicable) |
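To make the defaults concrete, here is a small sketch that fills in every option the caller leaves out, using the default values from the options listed above. resolveConfig and DEFAULTS are hypothetical names for illustration only; the library applies its defaults internally, and its actual handling (in particular of language: null) may differ.

```typescript
type TranscriptionStyle =
  | "accurate"
  | "clean"
  | "structured"
  | "technical"
  | "conversational";

interface TranscriptionConfig {
  audioFile: string;
  style?: TranscriptionStyle;
  language?: string | null;
  verbose?: boolean;
  timeout?: number;
}

// Default values copied from the configuration options above.
const DEFAULTS = {
  style: "conversational",
  language: "english",
  verbose: true,
  timeout: 5000, // milliseconds
} as const;

// Hypothetical helper: returns a config with every option populated.
// Assumption: null and undefined are both treated as "not provided".
function resolveConfig(config: TranscriptionConfig): Required<TranscriptionConfig> {
  return {
    audioFile: config.audioFile,
    style: config.style ?? DEFAULTS.style,
    language: config.language ?? DEFAULTS.language,
    verbose: config.verbose ?? DEFAULTS.verbose,
    timeout: config.timeout ?? DEFAULTS.timeout,
  };
}

// Only audioFile is required; here verbose logging is switched off explicitly.
const resolved = resolveConfig({ audioFile: "./assets/audio.mp3", verbose: false });
```

Using ?? rather than || keeps explicit falsy values such as verbose: false or timeout: 0 instead of overwriting them with the defaults.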
Supported Transcription Styles
| Style | Description |
| --- | --- |
| accurate | High accuracy, raw transcription including filler words |
| clean | Edited for readability (filler words removed, grammar fixed) |
| structured | Meeting/interview format with speakers and structure |
| technical | Technical content with jargon preserved |
| conversational | Casual, creative, natural conversation transcription |
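When the style comes from user input (a CLI flag, a query parameter, a form field), TypeScript's compile-time checks no longer help, so a runtime guard can be useful. The following is a sketch only: isTranscriptionStyle and parseStyle are hypothetical helpers, not exports of the package, and falling back to "conversational" simply mirrors the documented default.

```typescript
// The five styles listed above, kept in one place so the type and the
// runtime check cannot drift apart.
const STYLES = ["accurate", "clean", "structured", "technical", "conversational"] as const;
type TranscriptionStyle = (typeof STYLES)[number];

// Type guard: narrows an unknown value to TranscriptionStyle.
// The comparison is case-sensitive, so "Clean" is rejected.
function isTranscriptionStyle(value: unknown): value is TranscriptionStyle {
  return typeof value === "string" && (STYLES as readonly string[]).includes(value);
}

// Hypothetical helper: returns the value if it is a known style, otherwise the fallback.
function parseStyle(
  value: unknown,
  fallback: TranscriptionStyle = "conversational",
): TranscriptionStyle {
  return isTranscriptionStyle(value) ? value : fallback;
}

const styleFromInput = parseStyle("structured");
```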
Supported File Formats
- .mp3
- .wav
- .aac
- .flac
- .ogg
- .webm / .weba
Unknown formats fall back to audio/octet-stream.
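The fallback behaviour can be pictured as a lookup from file extension to MIME type. The sketch below is illustrative: mimeTypeFor is a hypothetical function, and the extension-to-MIME pairs are commonly used values assumed here, not copied from the library's source. Only the audio/octet-stream fallback comes from this README.

```typescript
// Assumed mapping for the formats listed above (not taken from the library).
const MIME_BY_EXTENSION: Record<string, string> = {
  ".mp3": "audio/mpeg",
  ".wav": "audio/wav",
  ".aac": "audio/aac",
  ".flac": "audio/flac",
  ".ogg": "audio/ogg",
  ".webm": "audio/webm",
  ".weba": "audio/webm",
};

// Hypothetical helper: picks a MIME type from the file extension,
// falling back to audio/octet-stream for anything unrecognised.
function mimeTypeFor(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot).toLowerCase();
  return MIME_BY_EXTENSION[ext] ?? "audio/octet-stream";
}

const webmType = mimeTypeFor("./assets/audio.webm");
```

This version looks only at the text after the last dot, so a URL with a query string (for example audio.mp3?token=abc) would hit the fallback unless the query is stripped first.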
API Reference
runTranscription(config: TranscriptionConfig)
Runs transcription on a local file path or remote URL.
Returns: Promise<RunTranscriptionResult>
type RunTranscriptionResult = {
  success: boolean;
  transcription?: string;
  error?: string;
};
runTranscriptionWithBlob(audioBlob: Blob | Buffer, options?)
Runs transcription on an in-memory Blob or Node.js Buffer.
Returns: Promise<RunTranscriptionResult>
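Both functions report failure through the success flag rather than by throwing, and transcription and error are both optional on the result type. Code that prefers exceptions can wrap the result, as in this sketch. unwrapTranscription is a hypothetical helper, and the result type is redeclared locally so the snippet stands alone.

```typescript
// Local copy of the documented result shape.
type RunTranscriptionResult = {
  success: boolean;
  transcription?: string;
  error?: string;
};

// Hypothetical helper: returns the transcription text or throws.
// Also guards against success being true while transcription is missing,
// which the optional field allows at the type level.
function unwrapTranscription(result: RunTranscriptionResult): string {
  if (!result.success || result.transcription === undefined) {
    throw new Error(result.error ?? "Transcription failed without an error message");
  }
  return result.transcription;
}

// Example with a hand-written result object standing in for a real call.
const text = unwrapTranscription({ success: true, transcription: "Hello, world." });
```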
Type Definitions
export type TranscriptionStyle =
  | "accurate"
  | "clean"
  | "structured"
  | "technical"
  | "conversational";

export interface TranscriptionConfig {
  audioFile: string;
  style?: TranscriptionStyle;
  language?: string | null;
  verbose?: boolean;
  timeout?: number;
}

export interface RunTranscriptionResult {
  success: boolean;
  transcription?: string;
  error?: string;
}
Authentication
This package requires a Gemini API key. Provide it in one of two ways:
1. Set TRANSCRIBER_KEY in your environment:
export TRANSCRIBER_KEY=your-gemini-api-key-here
2. Or create a .env file:
TRANSCRIBER_KEY=your-gemini-api-key-here
Get your API key from Google AI Studio (formerly MakerSuite).
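A missing key is easier to diagnose at startup than in the middle of a transcription request. The sketch below checks for TRANSCRIBER_KEY up front. requireApiKey is a hypothetical helper, not part of the package, and it takes the environment as a parameter so it can be exercised without touching the real process environment.

```typescript
// Hypothetical helper: returns the API key or throws a descriptive error.
// Whitespace-only values are treated as missing.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["TRANSCRIBER_KEY"]?.trim();
  if (!key) {
    throw new Error("TRANSCRIBER_KEY is not set; export it or add it to a .env file");
  }
  return key;
}

// Example with a stand-in environment object.
const apiKey = requireApiKey({ TRANSCRIBER_KEY: "your-gemini-api-key-here" });
```

In a Node.js application this would typically be called as requireApiKey(process.env). Whether the package loads a .env file on its own is not stated here; if it does not, a loader such as dotenv or Node's --env-file flag (Node 20.6 and later) can populate the environment before the check runs.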
Tech Stack
License
MIT License © 2025 Shriansh Agarwal
FAQ
Q: Does this upload my file to third-party storage?
A: No. Files are uploaded only to Gemini's File API endpoint.
Q: Can I use this in the browser?
A: runTranscriptionWithBlob works with both browser Blobs and Node.js Buffers.
Q: What models are used?
A: The gemini-2.0-flash model, via the Google GenAI SDK.
Summary
- Lightweight
- Flexible API
- Multiple transcription styles
- Works with Files, URLs, Blobs/Buffers
- Production-ready TypeScript types