yt-video-text-md

Fetch YouTube video transcripts and save them to markdown files.

Version 0.1.0 (PyPI)

YouTube Video to Text Markdown Converter

yt-video-text-md is a Python package that retrieves YouTube video transcripts/subtitles and converts them into Markdown files. It works on individual videos as well as entire playlists. It uses youtube-transcript-api to fetch existing subtitles directly and falls back to whisper for audio-to-text transcription when no transcript is available.

Features

Playlist and Video Support: Extracts subtitles from both individual videos and entire playlists.
Fallback Mechanism: Utilizes whisper to transcribe audio if subtitles are not available.
Markdown Formatting: Outputs transcripts in Markdown format with video titles as headers.
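
To make the Markdown formatting feature concrete, here is a minimal sketch of how a transcript could be rendered with the video title as a header. The function name `transcript_to_markdown` and the choice of a level-1 header are assumptions for illustration; the package's actual output layout may differ.

```python
def transcript_to_markdown(title: str, segments: list[str]) -> str:
    """Render one video's transcript as a Markdown section (illustrative only)."""
    # Collapse internal whitespace in each segment; empty segments become "".
    cleaned = [" ".join(segment.split()) for segment in segments]
    # Join the non-empty segments into a single paragraph.
    body = " ".join(segment for segment in cleaned if segment)
    # Assumption: the video title becomes a level-1 header above the text.
    return f"# {title.strip()}\n\n{body}\n"
```

Calling `transcript_to_markdown("My Video", ["first line", "second line"])` yields a header line, a blank line, and the joined transcript text.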

Installation

Via pip

To install the latest version directly from the GitHub repository, use:

```
pip install git+https://github.com/kothiyarajesh/yt-video-text-md.git
```

Building from Source

Clone the repository:

```
git clone https://github.com/kothiyarajesh/yt-video-text-md.git
```

Navigate to the project directory:
```
cd yt-video-text-md
```
Install the package:
```
python setup.py install
```
If installing from source, make sure to install the dependencies manually:
```
pip install -r requirements.txt
```

Usage

Python Script

Here's a simple example of how to use the yt-video-text-md library in a Python script:

```
from yt_video_text_md import YTVideoTextMD

# Define the URL of the YouTube video or playlist you want to process
video_url = "https://www.youtube.com/watch?v=pzo13OPXZS4"

# Specify the directory where the output Markdown file will be saved
output_directory = "."

# Set the default name for the generated Markdown file
markdown_file_name = "yt_video_2_text_md_"

# Define the directory where temporary audio files will be stored (Used only if a transcript is not available)
temporary_audio_directory = "/tmp"

# Create an instance of YTVideoTextMD with the specified parameters
YTVideoTextMD(
    url=video_url,
    output_dir=output_directory,
    default_md_file_name=markdown_file_name,
    audio_output_dir=temporary_audio_directory
)
```
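
Because the `url` parameter accepts either a video or a playlist, it can help to see how the two kinds of URL differ. The following is a minimal sketch using only the standard library; `classify_youtube_url` is a hypothetical helper, not part of the package, and the package may detect playlists in a different way.

```python
from urllib.parse import parse_qs, urlparse


def classify_youtube_url(url: str) -> tuple[str, str]:
    """Return ("playlist", list_id) or ("video", video_id) for a YouTube URL."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    # Playlist pages look like /playlist?list=<id>.
    if parsed.path.rstrip("/") == "/playlist" and "list" in query:
        return "playlist", query["list"][0]
    # Standard watch pages look like /watch?v=<id>.
    if "v" in query:
        return "video", query["v"][0]
    # Short links look like https://youtu.be/<id>.
    if parsed.netloc.endswith("youtu.be") and parsed.path.strip("/"):
        return "video", parsed.path.strip("/")
    raise ValueError(f"Unrecognized YouTube URL: {url}")
```

For the video URL used in the example above, this returns the kind `"video"` together with the ID taken from the `v` query parameter.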

Command-Line Interface

You can also use the package from the command line:

```
yt-video-text-md -u "https://www.youtube.com/playlist?list=PLMrJAkhIeNNQV7wi9r7Kut8liLFMWQOXn" -d "." -f "playlist_video_" -ad "/tmp"
```

Options:

-u or --url: URL of the YouTube video or playlist.
-d or --output-dir: Directory where the output Markdown file will be saved.
-f or --file-name: Name for the generated Markdown file.
-ad or --audio-dir: Directory where temporary audio files will be stored (used only if a transcript is not available).
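
For readers who want to wrap or script the tool, the options above map naturally onto `argparse`. This is a sketch of how such flags could be declared, not the package's actual CLI code; the `build_parser` name, the default values, and making `--url` required are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the four documented options (defaults here are assumed)."""
    parser = argparse.ArgumentParser(prog="yt-video-text-md")
    parser.add_argument("-u", "--url", required=True,
                        help="URL of the YouTube video or playlist")
    parser.add_argument("-d", "--output-dir", default=".",
                        help="Directory for the output Markdown file")
    parser.add_argument("-f", "--file-name", default="yt_video_2_text_md_",
                        help="Name for the generated Markdown file")
    # argparse accepts multi-character short flags such as -ad.
    parser.add_argument("-ad", "--audio-dir", default="/tmp",
                        help="Directory for temporary audio files")
    return parser
```

Parsed values are exposed as `args.url`, `args.output_dir`, `args.file_name`, and `args.audio_dir`.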

Notes

Dependencies: This package relies on external libraries, including youtube-transcript-api and whisper. Make sure all dependencies are installed before running it.
Audio Extraction: If a video does not have an available transcript, the script will download the video, extract the audio, and convert it to text. This process requires a stable internet connection and may be resource-intensive, especially for long videos.
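
The transcript-first, audio-second flow described in these notes can be sketched as a small fallback function. The two callables are hypothetical stand-ins for the real transcript lookup and the whisper-based transcription, so the sketch runs without network access or extra libraries; it is not the package's internal implementation.

```python
from typing import Callable, Optional


def get_text_with_fallback(
    video_id: str,
    fetch_transcript: Callable[[str], Optional[str]],
    transcribe_audio: Callable[[str], str],
) -> tuple[str, str]:
    """Return (text, source), preferring an existing transcript."""
    try:
        text = fetch_transcript(video_id)
    except Exception:
        # Simplification: treat any lookup failure as "no transcript available".
        text = None
    if text:
        return text, "transcript"
    # No usable transcript: fall back to the slower audio transcription path.
    return transcribe_audio(video_id), "audio"
```

Keeping the two steps behind separate callables means the expensive download-and-transcribe path only runs when the cheap transcript lookup comes back empty or fails.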

License

This project is licensed under the MIT License. See the LICENSE file for details.
