Pywebcapture
A package that allows users to capture full-page screenshots of websites using Selenium and Chrome webdriver.
Tested with Python version 3.8.3
Installation
- Download the latest version of Chrome webdriver (it should match the version of Chrome installed on your machine)
- Add the Chrome webdriver path to your system PATH (it's also possible to pass the absolute path of your driver to the Driver instance)
- Run
pip install pywebcapture
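To check that the driver is discoverable before running anything, a small standard-library sketch like the one below may help. It is not part of pywebcapture; the helper name find_chromedriver is hypothetical, and it assumes the executable is named "chromedriver".

```python
import shutil


def find_chromedriver(name="chromedriver"):
    """Return the full path of the executable if it is on PATH, else None.

    Hypothetical helper for illustration; not part of pywebcapture.
    """
    return shutil.which(name)


path = find_chromedriver()
if path is None:
    print("chromedriver not found on PATH; pass its absolute path to Driver instead")
else:
    print(f"chromedriver found at {path}")
```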
Basic Usage
Import the modules:
from pywebcapture import loader, driver
Use the CSVLoader to load the CSV file containing the URLs and optional file names:
Options:
- input_filepath - The absolute path to your csv file (str)
- has_header - Whether your csv has a header row or not (bool)
- uri_column - The column that contains the URIs, given either as the column name (str) or the index position (int)
- filename_column - The column that contains the desired file names (str); can be set to None, in which case the driver will use the URI netloc as the filename
csv_file = loader.CSVLoader("example.csv", True, 3, None)
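When filename_column is None, the file name is derived from the URI netloc. The sketch below shows what a netloc is, using the standard library. It is an illustration only: the helper default_filename and the ".png" extension are assumptions, not the package's actual code.

```python
from urllib.parse import urlparse


def default_filename(uri, extension=".png"):
    """Build a file name from the network location (host) part of a URI.

    Hypothetical helper; the ".png" extension is an assumption.
    """
    netloc = urlparse(uri).netloc
    return f"{netloc}{extension}"


print(default_filename("https://www.example.com/some/page?q=1"))
# prints: www.example.com.png
```

Note that urlparse only fills in netloc when the URI includes a scheme: "example.com/page" yields an empty netloc, so it is safest to keep "http://" or "https://" in every row of the csv.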
Call the get_uri_dict() method on the CSVLoader instance. This parses the CSV into a Python dictionary:
uri_dict = csv_file.get_uri_dict()
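To give a rough idea of what this parsing step involves, here is a minimal sketch using the standard csv module. It is not the package's implementation: the function build_uri_dict is hypothetical, and both the dictionary shape (URI mapped to file name, or None) and the zero-based column index are assumptions.

```python
import csv
import io


def build_uri_dict(csv_text, has_header, uri_column, filename_column=None):
    """Parse CSV text into {uri: filename_or_None}.

    Hypothetical sketch; the real get_uri_dict() may return a different shape.
    Columns may be given by header name (str) or zero-based index (int).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = None
    if has_header and rows:
        header, rows = rows[0], rows[1:]

    def resolve(column):
        # Translate a column name into its index using the header row.
        if isinstance(column, str):
            if header is None:
                raise ValueError("column names require a header row")
            return header.index(column)
        return column

    uri_idx = resolve(uri_column)
    name_idx = None if filename_column is None else resolve(filename_column)

    result = {}
    for row in rows:
        if not row:  # skip blank lines
            continue
        uri = row[uri_idx].strip()
        name = row[name_idx].strip() if name_idx is not None else None
        result[uri] = name
    return result


sample = (
    "id,name,notes,url\n"
    "1,home,main page,https://example.com\n"
    "2,docs,reference,https://example.org/docs\n"
)
print(build_uri_dict(sample, True, 3, "name"))
```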
Create an instance of the web driver:
Options:
- driver_path - The absolute path to the Chrome webdriver; if None or "chromedriver", it will attempt to find the driver on your system PATH
- output_path - The directory where you want the screenshots to be saved (str)
- delay - The delay in seconds between each page request; the minimum is 2 seconds, please crawl pages respectfully :)
- uri_dict - The Python dictionary containing your file names and URIs
d = driver.Driver("path/to/chrome/webdriver", None, 3, uri_dict)
Run the driver. This loops through all URIs and, for each page, gets the maximum scroll height and then takes a screenshot:
d.run()
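The loop described above can be sketched roughly as follows. This is an illustration of the general technique (read the page's scroll height, resize the window to it, then save a screenshot), not the package's actual code. The names clamp_delay, capture_full_page, capture_all and main are hypothetical, and the default width of 1920 pixels and the headless option are assumptions. The browser argument is expected to be a Selenium WebDriver.

```python
import time

MIN_DELAY_SECONDS = 2  # the documented minimum delay between requests


def clamp_delay(delay):
    """Never wait less than the minimum delay."""
    return max(MIN_DELAY_SECONDS, delay)


def capture_full_page(browser, uri, output_file, width=1920):
    """Load a page, resize the window to its full height, and save a screenshot."""
    browser.get(uri)
    height = browser.execute_script(
        "return Math.max(document.body.scrollHeight,"
        " document.documentElement.scrollHeight);"
    )
    browser.set_window_size(width, height)
    browser.save_screenshot(output_file)
    return height


def capture_all(browser, uri_to_file, delay):
    """Capture every URI, pausing between consecutive requests."""
    wait = clamp_delay(delay)
    for i, (uri, output_file) in enumerate(uri_to_file.items()):
        if i > 0:
            time.sleep(wait)
        capture_full_page(browser, uri, output_file)


def main():
    # Imported here so the helpers above can be read without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # assumption: run without a visible window
    browser = webdriver.Chrome(options=options)
    try:
        capture_all(browser, {"https://example.com": "example.com.png"}, delay=3)
    finally:
        browser.quit()
```

Calling main() requires Selenium, Chrome and a matching webdriver. Resizing the window to the page height generally works in headless mode; with a visible window, the operating system may cap the window at the screen size, so the screenshot can come out shorter than the page.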