yaytt

Blazingly fast YouTube caption extractor with deduplication.

Version: 0.9.4 (latest)
Source: npm
Maintainers: 1

YAYTT - Yet Another Youtube Transcriptor

Features

  • Smart deduplication - Removes overlapping auto-generated caption segments
  • TypeScript support - Full type definitions included
  • Zero dependencies - Lightweight and self-contained

Installation

bun add yaytt
npm install yaytt
yarn add yaytt
pnpm add yaytt

Quick Start

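import { extractCaptions } from "yaytt";

// By video ID, using the default language
const captions = await extractCaptions("WcBA3QEXJ2o");

// By video ID, requesting English captions
const englishCaptions = await extractCaptions("WcBA3QEXJ2o", { lang: "en" });

// By full URL (named differently so the snippet compiles as one file)
const captionsFromUrl = await extractCaptions(
  "https://www.youtube.com/watch?v=WcBA3QEXJ2o",
);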

Advanced Usage

Ultra-aggressive deduplication for heavily overlapping captions

import { extractCaptions } from "yaytt";

const cleanCaptions = await extractCaptions("WcBA3QEXJ2o", {
  deduplicationOptions: {
    aggressiveMode: true, // Maximum deduplication
  },
});

Check available languages

import { getAvailableLanguages } from "yaytt";

const languages = await getAvailableLanguages("WcBA3QEXJ2o");
console.log(languages);
// [{ code: 'pt', name: 'Portuguese (auto-generated)', isAutomatic: true }]

Full configuration

import { YouTubeCaptionExtractor } from "yaytt";

const extractor = new YouTubeCaptionExtractor({
  userAgent: "MyApp/1.0",
  timeout: 15000,
  rateLimitDelay: 3000,
});

const captions = await extractor.extractCaptions("WcBA3QEXJ2o", {
  lang: "pt",
  retries: 3,
  deduplicate: true,
  deduplicationOptions: {
    timeThreshold: 3, // Seconds
    similarityThreshold: 0.8, // 80% similarity
    mergePartialMatches: true,
    aggressiveMode: false, // Set to true for maximum deduplication
  },
});

CLI

npx yaytt WcBA3QEXJ2o

npx yaytt WcBA3QEXJ2o --aggressive

npx yaytt "https://www.youtube.com/watch?v=WcBA3QEXJ2o"

API Reference

extractCaptions(videoIdOrUrl, options?)

Extract captions from a YouTube video.

Parameters:

  • videoIdOrUrl (string): YouTube video ID or full URL
  • options (object, optional):
    • lang (string): Language code (default: 'pt' for Portuguese)
    • deduplicate (boolean): Enable deduplication (default: true)
    • deduplicationOptions (object): Deduplication settings

Returns: Promise<Caption[]>

getAvailableLanguages(videoIdOrUrl)

Get all available caption languages for a video.

Parameters:

  • videoIdOrUrl (string): YouTube video ID or full URL

Returns: Promise<{ code: string, name: string, isAutomatic: boolean }[]>
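The returned list can be used to choose a track before calling extractCaptions. The sketch below is one possible way to do that. `CaptionLanguage` and `pickLanguage` are hypothetical names that are not exported by yaytt; only the object shape comes from the return type above.

```typescript
// Shape of one entry returned by getAvailableLanguages, as documented above.
interface CaptionLanguage {
  code: string;
  name: string;
  isAutomatic: boolean;
}

// Hypothetical helper (not part of yaytt): walks the preferred codes in order
// and returns the first available track, favouring manually created tracks
// over auto-generated ones. Returns undefined when no preferred code matches.
function pickLanguage(
  languages: CaptionLanguage[],
  preferred: string[],
): CaptionLanguage | undefined {
  for (const code of preferred) {
    const matches = languages.filter((language) => language.code === code);
    const manual = matches.find((language) => !language.isAutomatic);
    if (manual) return manual;
    if (matches[0]) return matches[0];
  }
  return undefined;
}
```

The `code` of the chosen entry could then be passed as the `lang` option of extractCaptions.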

Types

interface Caption {
  start: number; // Start time in seconds
  dur: number; // Duration in seconds
  text: string; // Caption text
}

interface CaptionOptions {
  lang?: string;
  retries?: number;
  fallback?: boolean;
  deduplicate?: boolean;
  deduplicationOptions?: {
    timeThreshold?: number; // Default: 3 seconds
    similarityThreshold?: number; // Default: 0.8 (80% similarity)
    mergePartialMatches?: boolean; // Default: true
    aggressiveMode?: boolean; // Default: false
  };
}
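As a small illustration of working with the Caption type, the sketch below renders captions as `[m:ss] text` lines, similar to the listing in the Deduplication section. `formatTimestamp` and `toTranscript` are hypothetical helpers, not part of yaytt, and the interface is a local copy so the snippet stands alone.

```typescript
// Local copy of the Caption shape documented above.
interface Caption {
  start: number; // Start time in seconds
  dur: number; // Duration in seconds
  text: string; // Caption text
}

// Hypothetical helper: formats whole seconds as "m:ss" (fractions are dropped).
function formatTimestamp(seconds: number): string {
  const total = Math.floor(seconds);
  const minutes = Math.floor(total / 60);
  const secs = String(total % 60).padStart(2, "0");
  return `${minutes}:${secs}`;
}

// Hypothetical helper: one "[m:ss] text" line per caption.
function toTranscript(captions: Caption[]): string {
  return captions
    .map((caption) => `[${formatTimestamp(caption.start)}] ${caption.text}`)
    .join("\n");
}
```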

Deduplication

YouTube's auto-generated captions often contain overlapping segments:

Before:
[0:00] [Música]
[0:00] [Música] O podcast que você ouve agora é uma
[0:02] O podcast que você ouve agora é uma
[0:02] O podcast que você ouve agora é uma produção da Central 3.

After:
[0:02] O podcast que você ouve agora é uma produção da Central 3.

Results:

  • Normal mode: ~50% reduction in caption count
  • Aggressive mode: ~70% reduction for heavily overlapping content
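To make the idea concrete, here is a minimal sketch of one way overlapping segments could be collapsed. It is not yaytt's actual algorithm: it only handles the case where a caption repeats and extends the previous one word for word, and it does no similarity matching. `dropPrefixOverlaps` is a hypothetical name, and its `timeThreshold` parameter merely mirrors the option of the same name.

```typescript
// Local copy of the Caption shape documented in the Types section.
interface Caption {
  start: number; // Start time in seconds
  dur: number; // Duration in seconds
  text: string; // Caption text
}

// Hypothetical helper (not part of yaytt): when a caption starts within
// `timeThreshold` seconds of the previously kept one and its text begins with
// that caption's text, the longer caption replaces the shorter one.
function dropPrefixOverlaps(captions: Caption[], timeThreshold = 3): Caption[] {
  const kept: Caption[] = [];
  for (const current of captions) {
    const text = current.text.trim();
    if (text === "") continue; // skip empty segments
    const previous = kept[kept.length - 1];
    if (
      previous !== undefined &&
      current.start - previous.start <= timeThreshold &&
      text.startsWith(previous.text)
    ) {
      // The new segment repeats and extends the previous one: replace it.
      kept[kept.length - 1] = { ...current, text };
    } else {
      kept.push({ ...current, text });
    }
  }
  return kept;
}
```

A real implementation would also need to handle partial and fuzzy overlaps, which is presumably what similarityThreshold and mergePartialMatches control.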

How It Works

  • Extracts API keys from YouTube video pages
  • Calls YouTube's Innertube API directly (same API used by youtube.com)
  • Fetches caption track URLs from video metadata
  • Downloads VTT caption files directly from YouTube's servers
  • Parses timestamps and text into a clean format
  • Applies smart deduplication to remove overlapping segments
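The parsing step can be sketched as follows. This is a simplified illustration under stated assumptions, not yaytt's internal parser: `parseVtt` and `vttTimeToSeconds` are hypothetical names, the input is assumed to be standard WebVTT with blank lines between cues, and cue settings after the end timestamp are ignored.

```typescript
// Local copy of the Caption shape documented in the Types section.
interface Caption {
  start: number; // Start time in seconds
  dur: number; // Duration in seconds
  text: string; // Caption text
}

// Converts "hh:mm:ss.ttt" or "mm:ss.ttt" into seconds.
function vttTimeToSeconds(timestamp: string): number {
  return timestamp
    .split(":")
    .map(Number)
    .reduce((total, part) => total * 60 + part, 0);
}

// Hypothetical parser: turns WebVTT text into Caption objects.
function parseVtt(vtt: string): Caption[] {
  const captions: Caption[] = [];
  // Normalise line endings, then split into blank-line separated blocks.
  const blocks = vtt.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // header, NOTE, or STYLE block
    const match = /^(\S+)\s+-->\s+(\S+)/.exec(lines[timingIndex] ?? "");
    if (!match) continue;
    const start = vttTimeToSeconds(match[1] ?? "");
    const end = vttTimeToSeconds(match[2] ?? "");
    const text = lines
      .slice(timingIndex + 1)
      .join(" ")
      .replace(/<[^>]+>/g, "") // strip inline tags such as <c> or word timestamps
      .trim();
    if (text) captions.push({ start, dur: end - start, text });
  }
  return captions;
}
```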

Requirements

  • Node.js 16+ or compatible runtime
  • Server-side only (not for browser use due to CORS)

Error Handling

import { extractCaptions, CaptionExtractionError } from "yaytt";

try {
  const captions = await extractCaptions("invalid-video-id");
} catch (error) {
  if (error instanceof CaptionExtractionError) {
    console.error(`Caption extraction failed: ${error.message}`);
    console.error(`Video ID: ${error.videoId}`);
  }
}

License

MIT

Keywords

youtube

Package last updated on 04 Sep 2025
