YouTube bot to make a YouTube videos list (including all video titles and URLs uploaded by a channel) with end-to-end web scraping - no API tokens required. 🌟 Star this repo if you found it useful! 🌟
This package uses f-strings, and so requires Python 3.6+.
If you have an older version of Python, you can download Python 3.9.1 and follow the instructions to set up Python for your machine. If you want to install a different version, visit the Python Downloads page and select the version you want.
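To confirm that the interpreter you plan to use meets the 3.6+ requirement, a small check such as the following can help. This is a minimal sketch; the helper name `supports_f_strings` is hypothetical and not part of this package.

```python
import sys

# f-strings were introduced in Python 3.6
MIN_VERSION = (3, 6)


def supports_f_strings(version_info=sys.version_info):
    """Return True if the given interpreter version is at least 3.6.

    NOTE: hypothetical helper for illustration, not part of yt-videos-list.
    """
    return tuple(version_info[:2]) >= MIN_VERSION


if supports_f_strings():
    print("This Python version is new enough (3.6+).")
else:
    print("Please install Python 3.6 or newer.")
```

Passing a tuple such as `(3, 5, 9)` instead of relying on the default lets you check a version other than the one currently running.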
This is required to make sure you can download and install the required Selenium binary dependencies.
Windows: run Command Prompt or Powershell (both work) in "Run as Administrator" mode.
MacOS/Linux: make sure your user owns /usr/local/bin/:
sudo chown $USER /usr/local/bin/
venv (optional)
While creating a virtual environment is not required to use this package, it is useful for avoiding dependency conflicts with other projects. If you are sure you do not need to worry about dependency conflicts with other projects, skip this step.
Python has many ways to set up and use a virtual environment. The following instructions use the venv module provided with the Python standard library for simplicity. You do not need to use this particular implementation, but other virtual environment tools are outside the scope of this project, so if you choose a different one you will need to set it up on your own; there are too many variations to cover here.
### CREATING the virtual environment on MacOS/Linux ###
python3 -m venv ytvl-venv
source ytvl-venv/bin/activate
# python3 # enter the python shell inside this virtual environment
deactivate # exit this virtual environment

### USING the virtual environment on MacOS/Linux ###
# if ytvl-venv is in the directory you are currently in:
source ytvl-venv/bin/activate
# if ytvl-venv is NOT in the directory you are currently in:
source /absolute/path/to/ytvl-venv/bin/activate
deactivate # exit this virtual environment
### CREATING the virtual environment on Windows (NOT FOR git BASH) ###
python -m venv ytvl-venv
ytvl-venv\Scripts\activate
# python # enter the python shell inside this virtual environment
deactivate # exit this virtual environment

### USING the virtual environment on Windows (NOT FOR git BASH) ###
# if ytvl-venv is in the directory you are currently in:
ytvl-venv\Scripts\activate
# if ytvl-venv is NOT in the directory you are currently in:
## you may need to
## include the .ps1 extension (activate.ps1) in Powershell
## or include the .bat extension (activate.bat) in Command Prompt
\absolute\path\to\ytvl-venv\Scripts\activate
deactivate # exit this virtual environment
### CREATING the virtual environment on Windows (FOR git BASH) ###
python -m venv ytvl-venv
source ytvl-venv/Scripts/activate
# python # enter the python shell inside this virtual environment
deactivate # exit this virtual environment

### USING the virtual environment on Windows (FOR git BASH) ###
# if ytvl-venv is in the directory you are currently in:
source ytvl-venv/Scripts/activate
# if ytvl-venv is NOT in the directory you are currently in:
source /absolute/path/to/ytvl-venv/Scripts/activate
deactivate # exit this virtual environment
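The same environment can also be created from Python with the standard library `venv` module. The sketch below mirrors the commands above; the helper names `create_env` and `activate_script` are hypothetical, and the Windows path shown is the Command Prompt/Powershell form (git BASH uses forward slashes, as shown above).

```python
import os
import venv
from pathlib import PurePosixPath, PureWindowsPath


def create_env(env_dir="ytvl-venv"):
    """Create a virtual environment with pip available (like `python -m venv`)."""
    venv.create(env_dir, with_pip=True)


def activate_script(env_dir="ytvl-venv", os_name=os.name):
    """Return the relative path of the activation script for the given OS.

    NOTE: hypothetical helper; os_name follows the values of `os.name`
    ("nt" for Windows, "posix" for MacOS/Linux).
    """
    if os_name == "nt":
        return str(PureWindowsPath(env_dir) / "Scripts" / "activate")
    return str(PurePosixPath(env_dir) / "bin" / "activate")


# Example (not run here): create_env() followed by
# `source ytvl-venv/bin/activate` on MacOS/Linux.
print(activate_script())
```

Activation itself still has to happen in your shell, since it changes the environment of the shell process rather than of a Python script.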
After you install Python 3.6+, make sure you have the required permissions, and activate your virtual environment (only if you decided to use one; the venv instructions are above), enter the following in your command line:
# if something isn't working properly, try rerunning this
# the problem may have been fixed with a newer version
pip3 install -U yt-videos-list # MacOS/Linux
pip install -U yt-videos-list # Windows
# if that doesn't work:
python3 -m pip install -U yt-videos-list # MacOS/Linux
python -m pip install -U yt-videos-list # Windows
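Before rerunning the install, it can be useful to see which version (if any) is currently installed. The following is a minimal sketch using the standard library `importlib.metadata` module, which assumes Python 3.8+ (newer than the 3.6 minimum of this package); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata  # available in Python 3.8+


def installed_version(package="yt-videos-list"):
    """Return the installed version string of a package, or None if absent.

    NOTE: hypothetical helper for illustration, not part of yt-videos-list.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("yt-videos-list is not installed in this environment")
else:
    print(f"yt-videos-list {version} is installed")
```

If you use a virtual environment, run this with the environment activated so the check looks at the right set of packages.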
NOTE for Windows: run Command Prompt or Powershell (both work) in "Run as Administrator" mode!
This lets yt_videos_list update selenium webdriver binaries to be compatible with newer browser versions as browsers are updated (e.g. your Firefox browser updates from version 77 to version 82).
See the yt_videos_list/docs/dependencies.json file.
python3 # MacOS/Linux
python # Windows
from yt_videos_list import ListCreator
my_driver = 'firefox' # SUBSTITUTE DRIVER YOU WANT (options below)
lc = ListCreator(driver=my_driver, scroll_pause_time=0.8)
lc.create_list_for(url='https://www.youtube.com/user/schafer5')
lc.create_list_for(url='https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ', log_silently=True)
# Set `log_silently` to `True` to mute program logging to the console.
# The program will log the program status and any program information
# to only the log file for the channel being scraped
# (this is useful when scraping multiple channels at once with multi-threading).
# By default, the program logs to both the log file for the channel being scraped AND the console.
# to name the file using the channel ID instead of the channel name, set file_name='id'
# this is useful when scraping multiple channels with the same name:
lc.create_list_for(url='https://www.youtube.com/channel/UCb2EYjrzI6WpNAmPZeihhag', file_name='id')
lc.create_list_for(url='https://www.youtube.com/channel/UCDzYhlGOvGqsYw8IaTKDT8g', file_name='id')
# for more details about this method:
help(lc.create_list_for)
# see the new files that were just created:
import os
os.system('ls -lt | head') # MacOS/Linux
os.system('dir /O-D | find "_videos_list"') # Windows
# for more information on using the module:
help(lc)
driver options include:
'firefox'
'opera'
'safari' (MacOS only)
'chrome'
'brave'
'edge' (Windows only!)
Increase scroll_pause_time for laggy internet and decrease scroll_pause_time for fast internet.
Add the url to every channel you want to extract information from in a txt file with every url placed on a new line.
Example: channels.txt (NOTE this is a relative link, so this might not link properly on non-GitHub hosted sites!)
Enter the python interpreter:
python3 # MacOS/Linux
python # Windows
from yt_videos_list import ListCreator
lc = ListCreator(driver='firefox', scroll_pause_time=1.2)
lc.create_list_from(path_to_channel_urls_file='channels.txt', number_of_threads=4)
# configuring settings:
lc.create_list_from(
    path_to_channel_urls_file='channels.txt',
    number_of_threads=4,
    min_sleep=1,
    max_sleep=5,
    after_n_channels_pause_for_s=(20, 10),
    log_subthread_status_silently=False,
    log_subthread_info_silently=False
) # defaults (keyword argument form)
lc.create_list_from('channels.txt', 4, 1, 5, (20, 10), False, False) # defaults (positional argument form)
lc.create_list_from('channels.txt', min_sleep=3, max_sleep=10) # modifying only min_sleep and max_sleep
help(lc.create_list_from) # see API method details
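The channels.txt file used above can also be generated from Python. This is a minimal sketch of the "one url per line" format described earlier; `write_channels_file` is a hypothetical helper and not part of this package.

```python
from pathlib import Path


def write_channels_file(urls, path="channels.txt"):
    """Write one channel url per line, skipping blank entries.

    NOTE: hypothetical helper for illustration, not part of yt-videos-list.
    Returns the number of urls written.
    """
    cleaned = [url.strip() for url in urls if url.strip()]
    Path(path).write_text("".join(url + "\n" for url in cleaned), encoding="utf-8")
    return len(cleaned)


# Example usage (the urls are the ones used earlier in this README):
# write_channels_file([
#     'https://www.youtube.com/user/schafer5',
#     'https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ',
# ])
# lc.create_list_from(path_to_channel_urls_file='channels.txt')
```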
Ideal if you use Selenium for other projects 😎
Make sure you have the yt-videos-list package installed (follow directions above for getting set up), then run the following:
pip3 install -U yt-videos-list # MacOS/Linux: ensure latest package
python3 # MacOS/Linux: enter python interpreter
pip install -U yt-videos-list # Windows: ensure latest package
python # Windows: enter python interpreter
from yt_videos_list.download import selenium_webdriver_dependencies
selenium_webdriver_dependencies.download_all()
That's all! 🤓
NOTE that you can also access all the information below from the Python interpreter by entering
import yt_videos_list
help(yt_videos_list)
# default options for the ListCreator instance
ListCreator(
    txt=True,
    csv=True,
    md=True,
    file_suffix=True,
    all_video_data_in_memory=False,
    video_data_returned=False,
    video_id_only=False,
    reverse_chronological=True,
    headless=False,
    scroll_pause_time=0.8,
    driver='firefox',
    cookie_consent=False,
    verify_page_bottom_n_times=3,
    file_buffering=-1,
)
There are a number of optional arguments you can specify when instantiating ListCreator. The preceding values are used by default, but in case you want more flexibility, you can specify the:
driver argument:
driver='firefox'
driver='opera'
driver='safari'
driver='chrome'
driver='brave'
driver='edge'
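Since this README notes that 'safari' is MacOS only and 'edge' is Windows only, you may want to narrow the driver choices to the current operating system. The following is a minimal sketch using the standard library `platform` module; `drivers_for` and the two constants are hypothetical names, not part of this package.

```python
import platform

# Driver names as listed in this README
ALL_DRIVERS = ["firefox", "opera", "safari", "chrome", "brave", "edge"]

# Drivers this README marks as platform specific, keyed to the value
# returned by platform.system() ("Darwin" is MacOS)
PLATFORM_ONLY = {"safari": "Darwin", "edge": "Windows"}


def drivers_for(system=None):
    """Return the driver names usable on the given OS (default: current OS).

    NOTE: hypothetical helper for illustration, not part of yt-videos-list.
    """
    system = system or platform.system()
    return [d for d in ALL_DRIVERS if PLATFORM_ONLY.get(d, system) == system]


print(drivers_for())
```

The selected name can then be passed as the driver argument, for example `ListCreator(driver=drivers_for()[0])`, provided the matching browser is installed.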
cookie_consent argument:
False (default) - block all cookie options if prompted by YouTube (at consent.youtube.com)
True - accept all cookie options if prompted by YouTube (also at consent.youtube.com)
cookie_consent=False (default) OR cookie_consent=True
txt, csv, md file type argument:
True (default) - create a file for the specified type
False - do not create a file for the specified type
txt=True (default) OR txt=False
csv=True (default) OR csv=False
md=True (default) OR md=False
file_suffix argument:
True (default) - add a file suffix to the output file name
ChannelName_reverse_chronological_videos_list.csv
ChannelName_chronological_videos_list.csv
False - do NOT add a file suffix to the output file name
ChannelName.csv (reverse chronological output file)
ChannelName.csv (chronological output file)
file_suffix=True (default) OR file_suffix=False
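The naming rule above can be expressed as a small function, which is handy if you need to locate the output files from another script. This is a sketch of the documented pattern only, not the package's internal code; `output_file_name` is a hypothetical name.

```python
def output_file_name(channel_name, extension,
                     file_suffix=True, reverse_chronological=True):
    """Build the output file name following the pattern documented above.

    NOTE: hypothetical helper that mirrors the README examples; the
    package itself may build names differently internally.
    """
    if not file_suffix:
        return f"{channel_name}.{extension}"
    order = "reverse_chronological" if reverse_chronological else "chronological"
    return f"{channel_name}_{order}_videos_list.{extension}"


print(output_file_name("ChannelName", "csv"))
print(output_file_name("ChannelName", "csv", file_suffix=False))
```

Note that with file_suffix=False both orderings map to the same name, so the file name alone no longer tells you which ordering a file contains.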
all_video_data_in_memory argument:
False (default) - do not scrape the entire page
True - scrape the entire page (must ALSO set the video_data_returned attribute to True to return this data!)
all_video_data_in_memory=False (default) OR all_video_data_in_memory=True
video_data_returned argument:
False (default) - do not return video data collected from the current scrape job (return dummy data instead: [[0, '', '', '']])
True - return video data collected from the current scrape job
With the all_video_data_in_memory attribute set to False, the returned data MIGHT not be the full data, and video numbering MIGHT be incorrect.
Set the all_video_data_in_memory attribute to True to return ALL video data for the channel (video numbering will then also ALWAYS be correct).
video_data_returned=False (default) OR video_data_returned=True
video_id_only argument:
False (default) - include the full URL to video: https://www.youtube.com/watch?v=ElevenChars
True - include only the identifier parameter to video: ElevenChars
video_id_only=False (default) OR video_id_only=True
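If you already have output in one form and need the other, converting between a full URL and its identifier is straightforward with the standard library. This is a minimal sketch; `to_video_id` and `to_video_url` are hypothetical helpers, not part of this package.

```python
from urllib.parse import parse_qs, urlparse

WATCH_PREFIX = "https://www.youtube.com/watch?v="


def to_video_id(url):
    """Extract the `v` query parameter from a full watch URL.

    Raises KeyError if the URL has no `v` parameter.
    """
    query = parse_qs(urlparse(url).query)
    return query["v"][0]


def to_video_url(video_id):
    """Build the full watch URL from a video identifier."""
    return WATCH_PREFIX + video_id


print(to_video_id("https://www.youtube.com/watch?v=ElevenChars"))
print(to_video_url("ElevenChars"))
```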
reverse_chronological argument:
True (default) - write the files in order from most recent video to the oldest video
False - write the files in order from oldest video to the most recent video
reverse_chronological=True (default) OR reverse_chronological=False
headless argument:
False (default) - run the driver with an open Selenium instance for viewing
True - run the driver in "invisible" mode
headless=False (default) OR headless=True
scroll_pause_time argument:
values greater than 0 (default 0.8)
scroll_pause_time=0.8 (default)
verify_page_bottom_n_times argument:
values greater than 0 (defaults to 3)
This argument is used together with the scroll_pause_time argument.
Example with the scroll_pause_time value set to 1.0:
-> your_time / scroll_pause_time
-> 45 / 1.0
-> 45
-> therefore: verify_page_bottom_n_times=45
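The division above can be wrapped in a small function. This is a minimal sketch: `verify_page_bottom_n_times_for` is a hypothetical helper, and rounding up to a whole number is an assumption made here so that non-integer results still yield a usable count.

```python
import math


def verify_page_bottom_n_times_for(your_time, scroll_pause_time):
    """Compute your_time / scroll_pause_time, rounded up to a whole number.

    NOTE: hypothetical helper; rounding up is an assumption of this sketch.
    """
    if scroll_pause_time <= 0:
        raise ValueError("scroll_pause_time must be greater than 0")
    return math.ceil(your_time / scroll_pause_time)


# Same numbers as the worked example above: 45 / 1.0
print(verify_page_bottom_n_times_for(45, 1.0))
```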
The default verify_page_bottom_n_times=3 should be sufficient.
file_buffering argument:
int values greater than 0 (default -1, which uses the default OS setting)
scrapetube integration
scrapetube is a much more efficient backend developer tool that loads the videos uploaded by a channel. scrapetube also supports loading information from playlists and searches, which yt-videos-list currently does not do. Integration with scrapetube will be available in a future yt-videos-list release!
To keep things backwards-compatible and maintainable, the scrapetube integration will be accessible through a separate interface that is almost identical to the ListCreator interface, and the original ListCreator interface will continue to be available and continue to receive updates. 🤓
To install the most up-to-date version of the package, which may not yet be available in the latest release on PyPI, clone this repository and run:
cd yt_videos_list/python # MacOS/Linux
python3 -m pip install . # MacOS/Linux
cd yt_videos_list\python # Windows
python -m pip install . # Windows
To make your own changes to the yt_videos_list python package and run the changes locally:
# make changes to the codebase in the
# ===> /dev <=== directory
python3 minifier.py # MacOS/Linux
pip3 install . # MacOS/Linux
python minifier.py # Windows
pip install . # Windows
NOTE: make the changes to the codebase in the yt_videos_list/python/dev directory!!
The code in the yt_videos_list/python/yt-videos-list directory is minified, so that directory is NOT human readable, and the yt_videos_list/python/dev directory should be used for development instead!
The minifier.py module performs all the code preprocessing and packages the code from yt_videos_list/python/dev into the final version seen in the yt_videos_list/python/yt-videos-list directory.
Running minifier.py before installing the local package with pip install . (Windows) or pip3 install . (MacOS/Linux) is essential!
The tests use the custom ThreadWithResult subclass of threading.Thread provided by the save-thread-result package, so make sure you install that module using:
pip3 install -U save-thread-result # MacOS/Linux
pip install -U save-thread-result # Windows
# if that doesn't work:
python3 -m pip install -U save-thread-result # MacOS/Linux
python -m pip install -U save-thread-result # Windows
Then, make sure you're in the yt_videos_list/python directory and run:
tests\run_tests.bat # Windows
#### Any shell on MacOS/Linux
bash tests/run_tests.sh # this works
csh tests/run_tests.sh # this works
dash tests/run_tests.sh # this works
ksh tests/run_tests.sh # this also works
tcsh tests/run_tests.sh # this works too
zsh tests/run_tests.sh # this works as well
# you can try other shells and
# they should work too, since
# there's no special syntax in
# the run_tests.sh file
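The tests depend on a threading.Thread subclass that keeps the return value of its target. The idea can be sketched with the standard library alone, as below. This is NOT the save-thread-result implementation; `ResultThread` is a hypothetical name used only to illustrate the concept.

```python
import threading


class ResultThread(threading.Thread):
    """Thread that stores its target's return value in `.result`.

    NOTE: hypothetical illustration of the concept, not the
    ThreadWithResult class from the save-thread-result package.
    """

    def __init__(self, target, args=(), kwargs=None):
        super().__init__()
        self._fn = target
        self._fn_args = args
        self._fn_kwargs = kwargs or {}
        self.result = None  # stays None until run() finishes

    def run(self):
        # Called in the new thread after start(); keep the return value
        self.result = self._fn(*self._fn_args, **self._fn_kwargs)


thread = ResultThread(target=pow, args=(2, 5))
thread.start()
thread.join()  # wait for the thread before reading the result
print(thread.result)
```

A plain threading.Thread discards the return value of its target, which is why a subclass like this is useful when a test needs to compare results produced in several threads.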
We found that yt-videos-list demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.