youte: A command-line tool to retrieve and tidy YouTube metadata and comments from YouTube Data API
Big thanks to @Lingomat (Mat Bettinson) for code review and suggestions.
Installation
python -m pip install youte
YouTube API key
To get data from YouTube API, you will need a YouTube API key. Follow YouTube instructions to obtain a YouTube API key if you do not already have one.
Configure API key (recommended)
You can save your API key in the youte config file for reuse. To do so, run:
youte config add-key
The interactive prompt will ask you to input your API key and name it. The name is used to identify the key, and can be anything you choose.
The prompt will also ask if you want to set the given key as default.
When running queries, if no API key or name is specified, youte
will automatically use the default key.
Manually set a key as default
If you want to manually set an existing key as a default, run:
youte config set-default <name-of-existing-key>
Note that what is passed to this command is the name of the API key, not the API key itself. It follows that the API key has to be first added to the config file using youte config add-key
. If you use a name that has not been added to the config file, an error will be raised.
See the list of all keys
To see the list of all keys, run:
youte config list-keys
The default key, if there is one, will have an asterisk next to it.
Remove a key
To remove a stored key, run:
youte config remove <name-of-key>
About the config file
youte's config file is stored in a central place whose exact location depends on the running operating system:
- Linux/Unix: ~/.config/youte/
- Mac OS X: ~/Library/Application Support/youte/
- Windows: C:\Users\<user>\AppData\Roaming\youte
search
Searching can be as simple as:
youte search <search-terms> --key <API-key> --outfile <name-of-file.json>
youte search <search-terms> --key <API-key> -o <name-of-file.json>
If you have a default key set up using youte config
, then there is no need to specify an API key using --key
.
This will return the maximum number of results pages (around 12-13) matching the search terms and store them in a JSON file. Unlike version 1.3, youte 2.0 does not print results to the terminal. Instead, --outfile
is now a required option. and --outfile
must be specified.
In the search terms, you can also use the Boolean NOT (-) and OR (|) operators to exclude videos or to find videos that match one of several search terms. If the terms contain spaces, the entire search term value has to be wrapped in quotes.
Use the flag --pretty
to pretty format the JSON output.
youte search <search-terms> --key <API-key> --outfile <name-of-file> --pretty
Limit pages returned
Searching is very expensive in terms of API usage - a single results page uses up 100 points - 1% of your standard daily quota. Therefore, you can limit the maximum number of result pages returned, so that a search doesn't go on and exhaust your API quota.
youte search <search-terms> --max-pages 5
youte search <search-terms> -m 5
Tidy data
Raw JSONs from YouTube API contain request metadata and nested fields. You can tidy these data into a CSV or a flat JSON using --tidy-to
. The default format that youte will tidy raw JSON into will be CSV.
youte search <search-terms> --tidy-to <file.csv>
You can specify the encoding for the CSV. By default, youte
uses utf-8-sig
for compatibility with Excel readers. To change this, use the --encoding
argument.
youte search <search-terms> --tidy-to <file.csv> --encoding "utf-8"
Specify --format json
if you want to tidy raw data into an array of flat JSON objects.
youte search <search-terms> --tidy-to <file-name.json> --format json
--tidy-to
option is available for all youte
commands that retrieve data, and works the same way.
Advanced search
There are multiple filters to refine your search. A full list of these are provided below:
Options:
--type TEXT Type of resource to search for [default:
video]
--order [date|rating|relevance|title|videoCount|viewCount]
Sort results [default: date]
--safe-search [none|moderate|strict]
Include or exclude restricted content
[default: none]
--lang TEXT Return results most relevant to a language
(ISO 639-1 two-letter code)
--region TEXT Return videos viewable in the specified
country (ISO 3166-1 alpha-2 code) [default:
US]
--video-duration [any|long|medium|short]
Include videos of a certain duration
--channel-type [any|show] Restrict search to a particular type of
channel
--video-type [any|episode|movie]
Search a particular type of videos
--caption [any|closedCaption|none]
Filter videos based on if they have captions
--definition, --video-definition [any|high|standard]
Include videos by definition
--dimension, --video-dimension [any|2d|3d]
Search 2D or 3D videos
--embeddable, --video-embeddable [any|true]
Search only embeddable videos
--license, --video-license [any|creativeCommon|youtube]
Include videos with a certain license
--location FLOAT... Lat and long coordinates to restrict search
to. --radius must be specified
--radius TEXT Define the geographic area to restrict
search. Must be a number with a unit
Restrict by date range
The --from
and --to
options allow you to restrict your search to a specific period. The input values have to be in ISO format (YYYY-MM-DD). Currently, all dates and times in youte are in UTC.
Restrict by type
The --type
option specifies the type of results returned, which by default is videos. The accepted values are channel
, playlist
, video
, or a combination of these three. If more than one type is specified, separate each by a comma.
youte search <search-terms> --limit 5 --type playlist,video
Restrict by language and region
The --lang
returns results most relevant to a language. Not all results will be in the specified language: results in other languages will still be returned if they are highly relevant to the search query term. To specify the language, use ISO 639-1 two letter code, except that you should use the values zh-Hans
for simplified Chinese and zh-Hant
for traditional Chinese.
The --region
returns results viewable in a region. To specify the region, use ISO 3166-1 alpha-2. Note that this option does not filter videos uploaded in that region, but rather videos that can be viewed in that region.
The --location
and --radius
options define a circular geographic area to filter videos that specify, in their metadata, a location within this area. This is not a robust and reliable way to geolocate YouTube videos, and hence should be used with care.
--location
takes in 2 values - the latitude/longitude coordinates that represent the centre of the area--radius
specifies the maximum distance that the location associated with a video can be from that point for the video to still be included in the search results. It must be a number followed by a unit. Valid units are m
, km
, ft
, and mi
. For example, 1500m
, 5km
, 10000ft
, and 0.75mi
.
Both --location
and --radius
have to be specified if they are to be used, otherwise an API error will be thrown.
Sort results
The --order
option specifies how results will be sorted. The following values are accepted:
date
: Resources are sorted in reverse chronological order based on the date they were created (default value).rating
: Resources are sorted from highest to lowest rating.relevance
: Resources are sorted based on their relevance to the search query.title
– Resources are sorted alphabetically by title.videoCount
– Channels are sorted in descending order of their number of uploaded videos.viewCount
– Resources are sorted from highest to lowest number of views. For live broadcasts, videos are sorted by number of concurrent viewers while the broadcasts are ongoing.
videos
youte videos
takes in one or multiple video IDs and retrieve all public metadata for those videos. This complements results returned from youte search
, as they contain only limited information.
youte hydrate <resource-id>.... --outfile <file.json>
You can put as many IDs as you need, separating each with a space.
Like search, you can also tidy the data to a CSV using the --tidy-to
option.
youte hydrate <resource-id>... --outfile <file.json> --tidy-to <file-name.csv>
Use IDs from text file
You can hydrate a list of video ids stored in a text file by using --file-path
or -f
. The text file should contain a line-separated list of video ids, with no header.
youte hydrate -f <id-file.csv>
This option is often used in combination with youte dehydrate
, which retrieves the ids from raw JSON returned by youte search
and stores them in a text file.
channels
youte channels
works the same as youte videos
, except it retrieves channel metadata from channel ids.
You can either hydrate channels by channel IDs, or handles, i.e. @stanfordgsb. To pass handles, use the option --handles
followed by a comma separated list of handles, each prefererably prepended by @
. For example:
youte channels --handles @stanfordgsb,@TED,@TEDEd -o <file.json>
To use a file containing handles, pass --handle-file
.
You can hydrate both channel IDs and handles in one command:
youte channels -f <id-file.csv> --handles @stanfordgsb,@TED,@TEDEd -o <file.json>
This will hydrate both the channel IDs in the file and the handles specified in the terminal.
youte comments
retrieves top-level comments (comment threads) on one or multiple videos or channels. It takes in a list of ids and a flag indicating which type these ids are (i.e. videos or channels).
To retrieve comments on videos, specify the video ids and pass the --by-video-id
or -v
flag.
youte comments <id>... --by-video-id --outfile <file.json>
#OR
youte comments <id>... -v --outfile <file.json>
To retrieve comments on channels, specify channel ids and pass the --by-channel-id
or -c
flag.
youte comments <id>... --by-channel-id --outfile <file.json>
OR
youte comments <id>... -c --outfile <file.json>
If neither of the flags are specified, youte comments
will assume the ids are thread ids and retrieve the full metadata for those threads.
You can search within the threads and filter threads that match the search terms, by using the --query
or -q
option.
youte comments <ids>... -v --outfile <file.json> -q "search term"
replies
While youte comments
only retrieve top-level comment threads, if those threads have replies, they can be retrieved using youte replies
. youte replies
takes a list of thread ids and return the replies to those threads.
youte replies <ids>... --outfile <file.json>
chart
youte chart
retrieves the most popular videos in a region, specified by ISO 3166-1 alpha-2 country codes. If no argument or option is given, it retrieves the most popular videos in the United States.
For example:
youte chart <region-code> -o <file.json>
full-archive
A new feature added in youte
2.1.0 is the ability to run a full archive workflow in one command. youte full-archive
runs youte search
, then retrieving video and channel metadata for the search results, as well as getting comments and replies for those videos as well. All data are then tidied and stored in multiple tables in an SQLite database.
youte full-archive <query> [options] -o <name-of-database-file>
The search options are identical to youte search
. Name of the file given to -o
has to have SQLite extension (i.e. .db
or .sqlite
).
Below are the list of tables and the corresponding YouTube resource that they contain:
search_result
: search results from youte search
video
: videoschannel
: channelscommment
: comment threads and replies
Warning: since full-archive
will potentially run a large number of queries, it's important to ensure you have enough API quota. You can select which resources to retrieve by using the --select
option. --select
takes one or a comma-separated list of YouTube resource types, namely video
, channel
, thread
, and reply
. Note that if you select reply
, thread
also has to be selected. This is because comment thread replies are retrieved using thread IDs, thus collecting comment threads is a must before getting replies. Because of that, if you want to archive the replies, both 'thread' and 'reply' will have to be specified.
dehydrate
dehydrate
extracts the IDs from a JSON file returned from YouTube API.
youte dehydrate <file-name.json>
Options:
-o, --output FILENAME Output text file to store IDs in
related-to (deprecated)
Note. From August 7, 2023, this endpoint was deprecated by YouTube. This command is no longer usable.
youte related-to
retrieves a list of videos related to a video.
youte related-to <video-ids>... -o <file.json>
You can pass one or many video IDs. If multiple video IDs are inputted, youte will iterate through those video IDs, retrieving all related videos to each video separately. The end result contains both the related videos and the id of the video that they are related to.
Other options include:
--safe-search [none|moderate|strict]
Include or exclude restricted content
[default: none]
--region TEXT Specify region the videos can be viewed in
(ISO 3166-1 alpha-2 country code)
--lang TEXT Return results most relevant to a language
(ISO 639-1 two-letter code)
YouTube API Quota system and youte handling of quota
Most often, there is a limit to how many requests you can make to YouTube API per day. YouTube Data API uses a quota system, whereby each request costs a number of units depending on the endpoint the request is made to.
For example:
- search endpoint costs 100 units per request
- video, channel, commentThread, and comment endpoints each costs 1 unit per request
Free accounts get an API quota cap of 10,000 units per project per day, which resets at midnight Pacific Time.
At present, you can only check your quota usage on the Quotas page in the API Console. It is not possible to monitor quota usage via metadata returned in the API response. youte
does not monitor quota usage.