youtool - Easily access YouTube Data API v3 in batches
Python library (and future command-line interface) for crawling the YouTube Data API v3 in batch operations and for
other related tasks. It is easier to use than the alternatives: you don't need to spend time learning the YouTube API
and its caveats. With this library you can get:
- Channel ID from channel URL (scraping) or username (API)
- Channel information (title, subscribers etc.)
- List of playlists for a channel
- List of videos for a playlist
- Video search (many parameters)
- Video information (title, description, likes, comments etc.)
- Comments
- Livechat, including superchat (scraping using chat-downloader)
- Automatic transcription (scraping using yt-dlp)
The library will automatically:
- Try as many keys as you provide
- Use batches of 50 items in supported API endpoints
- Paginate when needed
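To picture the batching behaviour, here is a small standalone sketch. The `chunked` helper is hypothetical and not part of youtool; it only illustrates how a list of IDs could be split into groups of at most 50 so that each group fits in a single API request.

```python
from itertools import islice


def chunked(items, size=50):
    """Yield successive lists of at most `size` items (hypothetical helper, not youtool's API)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# 120 made-up IDs are split into batches of 50, 50 and 20
video_ids = [f"video{n:03d}" for n in range(120)]
batch_sizes = [len(batch) for batch in chunked(video_ids)]
print(batch_sizes)
```

The library does this for you, so you can pass lists of any length to methods such as videos_infos.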
Installing
pip install youtool
You may also want some extras:
pip install youtool[livechat]
pip install youtool[transcription]
Using as a library
Just follow the tutorial/examples below and check help() for the YouTube methods.
Note: the examples below will use 135 units of your API key quota.
from pprint import pprint
from pathlib import Path
from youtool import YouTube
api_keys = ["key1", "key2", ...]
yt = YouTube(api_keys, disable_ipv6=True)
channel_id_1 = yt.channel_id_from_url("https://youtube.com/c/PythonicCafe/")
print(f"Pythonic Café's channel ID (got from URL): {channel_id_1}")
channel_id_2 = yt.channel_id_from_username("turicas")
print(f"Turicas' channel ID (got from username): {channel_id_2}")
print("Playlists found on Turicas' channel (the \"uploads\" playlist is not here):")
for playlist in yt.channel_playlists(channel_id_2):
    print(f"Playlist: {playlist}")
    for video in yt.playlist_videos(playlist["id"]):
        print(f" Video: {video}")
print("-" * 80)
assert channel_id_1[:2] == "UC"
print("Last 3 uploads for Pythonic Café:")
for index, video in enumerate(yt.playlist_videos("UU" + channel_id_1[2:])):
    print(f" Video: {video}")
    if index == 2:
        break
print("-" * 80)
print("5 videos found on search:")
for index, video in enumerate(yt.video_search(term="Álvaro Justen")):
    print(f" Video: {video}")
    if index == 4:
        break
print("-" * 80)
last_video = list(yt.videos_infos([video["id"]]))[0]
print("Complete information for last video:")
pprint(last_video)
print("-" * 80)
print("Channel information (2 channels in one request):")
for channel in yt.channels_infos([channel_id_1, channel_id_2]):
    print(channel)
print("-" * 80)
video_id = "b1FjmUzgFB0"
print(f"Comments for video {video_id}:")
for comment in yt.video_comments(video_id):
    print(comment)
print("-" * 80)
live_video_id = "yyzIPQsa98A"
print(f"Live chat for video {live_video_id}:")
for chat_message in yt.video_livechat(live_video_id):
    print(chat_message)
print("-" * 80)
download_path = Path("transcriptions")
if not download_path.exists():
    download_path.mkdir(parents=True)
print(f"Downloading Portuguese (pt) transcriptions for videos {video_id} and {live_video_id} - saving at {download_path.absolute()}")
for downloaded in yt.download_transcriptions([video_id, live_video_id], language_code="pt", path=download_path):
    vid, status, filename = downloaded["video_id"], downloaded["status"], downloaded["filename"]
    if status == "error":
        print(f" {vid}: error downloading!")
    elif status == "skipped":
        print(f" {vid}: skipped, file already exists ({filename}: {filename.stat().st_size / 1024:.1f} KiB)")
    elif status == "done":
        print(f" {vid}: done ({filename}: {filename.stat().st_size / 1024:.1f} KiB)")
print("-" * 80)
print("Categories in Brazilian YouTube:")
for category in yt.categories(region_code="BR"):
    print(category)
print("-" * 80)
print("Current most popular videos in Brazil:")
for video in yt.most_popular(region_code="BR"):
    print(f"{video['id']} {video['title']}")
print("-" * 80)
print("Total quota used during this session:")
total_used = 0
for method, units_used in yt.used_quota.items():
    print(f"{method:20}: {units_used:05d} unit{'' if units_used == 1 else 's'}")
    total_used += units_used
print(f"{'TOTAL':20}: {total_used:05d} unit{'' if total_used == 1 else 's'}")
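The uploads example above relies on a naming convention: a channel ID starts with "UC", and swapping that prefix for "UU" gives the ID of the channel's "uploads" playlist. A minimal sketch of that step as a reusable function follows; `uploads_playlist_id` is a hypothetical helper name, not a youtool method, and the channel ID used is made up.

```python
def uploads_playlist_id(channel_id):
    """Derive the "uploads" playlist ID from a channel ID (hypothetical helper)."""
    if not channel_id.startswith("UC"):
        raise ValueError(f"Unexpected channel ID format: {channel_id!r}")
    return "UU" + channel_id[2:]


# Made-up channel ID, for illustration only
example_channel_id = "UC" + "x" * 22
print(uploads_playlist_id(example_channel_id))
```

Raising an error instead of using assert keeps the check active even when Python runs with optimizations enabled.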
Tests
To run all tests, execute:
make test
Future improvements
Pull requests are welcome! :)
- Command-line interface with the following subcommands:
  - channel-id: get channel IDs from a list of URLs (or CSV filename with URLs inside), generate CSV output (just the
    IDs)
  - channel-info: get channel info from a list of IDs (or CSV filename with IDs inside), generate CSV output (same
    schema for channel dicts)
  - video-info: get video info from a list of IDs or URLs (or CSV filename with URLs/IDs inside), generate CSV output
    (same schema for video dicts)
  - video-search: get video info from a list of IDs or URLs (or CSV filename with URLs/IDs inside), generate CSV output
    (simplified video dict schema or option to get full video info after)
  - video-comments: get comments from a video ID, generate CSV output (same schema for comment dicts)
  - video-livechat: get comments from a video ID, generate CSV output (same schema for chat_message dicts)
  - video-transcriptions: download video transcriptions based on language code, path and list of video IDs or URLs (or
    CSV filename with URLs/IDs inside), download files to destination and report results
- Replace dicts with dataclasses
- Create a website with docs/reference
- Deal with quotas (wait some time before using a key, for example)
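Most of the planned subcommands end by writing CSV output that shares the schema of the corresponding dicts. One way this could look is sketched below, assuming each result is a flat dict with the same keys. `dicts_to_csv` is a hypothetical helper and the rows are made-up data, not real API output.

```python
import csv
import io


def dicts_to_csv(rows, fobj):
    """Write an iterable of flat dicts to `fobj` as CSV, using the first row's keys as header."""
    writer = None
    for row in rows:
        if writer is None:
            # Create the writer lazily so `rows` can be a generator
            writer = csv.DictWriter(fobj, fieldnames=list(row.keys()))
            writer.writeheader()
        writer.writerow(row)


# Made-up rows standing in for channel dicts
rows = [
    {"id": "UC-example-1", "title": "First channel"},
    {"id": "UC-example-2", "title": "Second channel"},
]
output = io.StringIO()
dicts_to_csv(rows, output)
print(output.getvalue())
```

Because the helper consumes any iterable, results could be streamed to disk as they arrive instead of being collected in memory first.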
License
GNU Lesser General Public License (LGPL) version 3.
This project was developed in a partnership between Pythonic Café and Novelo Data.