Youtube Transcript Loader
pip install llama-hub-youtube-transcript
pip install llama-index-readers-youtube-transcript
This loader fetches the text transcript of YouTube videos using the youtube_transcript_api
Python package.
Usage
To use this loader, you first need to pip install youtube_transcript_api.
Then pass a list of YouTube links into load_data:
from llama_index.readers.youtube_transcript import YoutubeTranscriptReader
loader = YoutubeTranscriptReader()
documents = loader.load_data(
    ytlinks=["https://www.youtube.com/watch?v=i3OYlaoj-BM"]
)
Supported URL formats:
+ youtube.com/watch?v={video_id} (with or without 'www.')
+ youtube.com/embed/{video_id} (with or without 'www.')
+ youtu.be/{video_id} (never includes the 'www.' subdomain)
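URL formats like these can be matched with regular expressions. The following is a minimal sketch, not the loader's own implementation: URL_PATTERNS and extract_video_id are hypothetical names introduced for illustration, and the patterns assume an http or https scheme and the usual /embed/{video_id} path form for embed links.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; each captures the video ID in group 1.
URL_PATTERNS = [
    re.compile(r"^https?://(?:www\.)?youtube\.com/watch\?v=([\w-]+)"),
    re.compile(r"^https?://(?:www\.)?youtube\.com/embed/([\w-]+)"),
    re.compile(r"^https?://youtu\.be/([\w-]+)"),  # no www subdomain here
]


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID if the URL matches one of the patterns, else None."""
    for pattern in URL_PATTERNS:
        match = pattern.match(url)
        if match:
            return match.group(1)
    return None
```

Anchoring each pattern at the start of the string keeps links that merely contain a YouTube URL somewhere in their query string from matching.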
To programmatically check if a URL is supported:
from llama_index.readers.youtube_transcript.utils import is_youtube_video
is_youtube_video("https://youtube.com/watch?v=j83jrh2")  # returns True
is_youtube_video("https://vimeo.com/272134160")  # returns False
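One possible use of such a check is to split a mixed list of links before calling load_data. This is a minimal sketch: partition_links is a hypothetical helper, not part of the loader, and it takes the check as a parameter so that it does not depend on any particular package.

```python
from typing import Callable, List, Tuple


def partition_links(
    links: List[str], is_supported: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split links into (supported, unsupported), preserving input order."""
    supported: List[str] = []
    unsupported: List[str] = []
    for link in links:
        # Route each link to one of the two lists based on the check.
        (supported if is_supported(link) else unsupported).append(link)
    return supported, unsupported
```

Passing is_youtube_video as is_supported would give a first list suitable for load_data(ytlinks=...), while the second list can be logged or reported back to the user.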
This loader is designed to load data into LlamaIndex.