
PyWebScraping

PyWebScraping is a Python library for browser automation and web scraping. It supports Chrome, Firefox, Edge, and Yandex, providing a consistent API for managing browser sessions, options, and common actions like scrolling, element interaction, and JavaScript execution. It also facilitates remote webdriver control.

Version 1.3.13 on PyPI

Maintainers: 1

PyWebScraping: A Python Library for Browser Automation

PyWebScraping simplifies interaction with web browsers for scraping and automation tasks. It currently supports Chrome, Firefox, Edge, and Yandex browsers, providing a consistent interface for managing browser sessions, handling options, and performing common actions.

Key Features:

Cross-Browser Support: Work with Chrome, Firefox, Edge, and Yandex through a single, unified API.
Remote WebDriver Control: Connect to and manage existing browser sessions remotely.
Headless Browsing: Run tasks in the background without a visible browser window.
Proxy Support: Integrate proxies for managing network requests.
User Agent Spoofing: Set a custom user agent string so the browser identifies itself as a different browser or device.
Window Management: Control window size, position, and manage multiple tabs/windows.
Simplified API: Perform common actions like scrolling, hovering, finding elements, and executing JavaScript.
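
As a rough illustration of what headless mode, proxy support, user agent spoofing, and window management usually mean for Chromium-based browsers (Chrome, Edge, Yandex), the sketch below builds the corresponding standard Chromium command-line switches. The function name `chromium_flags` and its parameters are hypothetical and are not part of PyWebScraping's API; Firefox uses different switches, and the library may configure these features in another way.

```python
from typing import List, Optional, Tuple


def chromium_flags(
    headless: bool = False,
    proxy: Optional[str] = None,
    user_agent: Optional[str] = None,
    window_size: Optional[Tuple[int, int]] = None,
    window_position: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Build standard Chromium switches (hypothetical helper, not the library's API)."""
    flags: List[str] = []
    if headless:
        flags.append("--headless")  # no visible browser window
    if proxy is not None:
        flags.append(f"--proxy-server={proxy}")  # route traffic through a proxy
    if user_agent is not None:
        flags.append(f"--user-agent={user_agent}")  # override the UA string
    if window_size is not None:
        width, height = window_size
        flags.append(f"--window-size={width},{height}")
    if window_position is not None:
        x, y = window_position
        flags.append(f"--window-position={x},{y}")
    return flags


print(chromium_flags(headless=True, proxy="http://127.0.0.1:8080", window_size=(800, 600)))
```

Keeping the switches in a plain list makes it easy to log exactly how a browser was launched when debugging a scraping run.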

Installation:

With pip:

pip install PyWebScraping

With git:

pip install git+https://github.com/oddshellnick/PyWebScraping.git

API Reference:

BaseDriver: Provides the base classes EmptyWebDriver, BrowserOptionsManager, BrowserStartArgs, and BrowserWebDriver, which implement the core browser management functionality.
ChromeDriver/EdgeDriver/FirefoxDriver/YandexDriver: Contains specific implementations for each browser, including options management, startup argument handling, and remote webdriver connection classes.
browsers_handler: Includes helpers such as WindowRect for managing window dimensions, and get_installed_browsers/get_browser_version for retrieving information about the browsers installed on the system.
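
To give a feel for what retrieving a browser version can involve, here is a minimal sketch that asks an executable for its version and extracts the dotted number. It is not the implementation of get_browser_version; `parse_browser_version` and `installed_version` are hypothetical names, and the approach assumes a browser binary that prints its version when run with `--version` (as Chrome and Firefox do on Linux and macOS).

```python
import re
import shutil
import subprocess
from typing import Optional

# Matches dotted version numbers such as "121.0" or "120.0.6099.109".
_VERSION_RE = re.compile(r"\d+(?:\.\d+)+")


def parse_browser_version(version_output: str) -> Optional[str]:
    """Extract the first dotted version number from `--version` output."""
    match = _VERSION_RE.search(version_output)
    return match.group(0) if match else None


def installed_version(executable: str) -> Optional[str]:
    """Return the version of an executable found on PATH, or None if absent."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=10
    )
    return parse_browser_version(result.stdout)


print(parse_browser_version("Google Chrome 120.0.6099.109 "))
print(parse_browser_version("Mozilla Firefox 121.0"))
```

On Windows, browsers generally do not print a version to the console this way, so a real implementation would need a different lookup there.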

Modules Overview:

EmptyWebDriver: A base class offering essential methods for interacting with a webdriver.
BrowserOptionsManager: Base class for managing browser-specific options. Subclassed for each browser type.
BrowserStartArgs: Base class for managing browser startup arguments. Subclassed for each browser.
BrowserWebDriver: Base class for managing the lifecycle of a webdriver instance. Subclassed for each browser.
Chrome(Remote)WebDriver, Edge(Remote)WebDriver, Firefox(Remote)WebDriver, Yandex(Remote)WebDriver: Concrete implementations for managing local and remote sessions for each browser.

This library aims to simplify browser automation in Python. Contributions and feedback are welcome!

Usage Examples:

Starting a Chrome Webdriver:

from PyWebScraping.webdrivers.Chrome import ChromeWebDriver
from PyWebScraping.utilities import WindowRect

webdriver = ChromeWebDriver(webdriver_path="/path/to/chromedriver", window_rect=WindowRect(0, 0, 800, 600))
webdriver.start_webdriver(headless_mode=True)
webdriver.driver.get("https://www.example.com")
# ... perform actions ...
webdriver.close_webdriver()
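
If an exception is raised between start_webdriver() and close_webdriver(), the browser process may be left running. One way to guard against that is a small context manager, sketched below. `managed_webdriver` is a hypothetical helper, not something the library is known to provide; it relies only on the start_webdriver() and close_webdriver() methods shown above. `FakeWebDriver` is a stand-in so the example runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def managed_webdriver(webdriver, **start_kwargs):
    """Start the webdriver, yield it, and always close it afterwards."""
    webdriver.start_webdriver(**start_kwargs)
    try:
        yield webdriver
    finally:
        # Runs even if the body of the `with` block raises.
        webdriver.close_webdriver()


class FakeWebDriver:
    """Stand-in that records calls; replace with a real ChromeWebDriver."""

    def __init__(self):
        self.calls = []

    def start_webdriver(self, **kwargs):
        self.calls.append(("start", kwargs))

    def close_webdriver(self):
        self.calls.append(("close", {}))


fake = FakeWebDriver()
try:
    with managed_webdriver(fake, headless_mode=True):
        raise RuntimeError("scrape failed")
except RuntimeError:
    pass

print(fake.calls)  # [('start', {'headless_mode': True}), ('close', {})]
```

With a real driver, the body of the `with` block would contain calls such as webdriver.driver.get(...).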

Connecting to a Remote Chrome Instance:

from PyWebScraping.webdrivers.Chrome import ChromeRemoteWebDriver

# Placeholders: obtain both values from your running remote webdriver instance.
command_executor, session_id = ..., ...
remote_webdriver = ChromeRemoteWebDriver(command_executor, session_id)
remote_webdriver.create_driver()
# ...Interact with the remote browser...
remote_webdriver.close_webdriver()
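
Reconnecting from a different process requires passing the two connection values along somehow. The sketch below stores them in a small JSON file and reads them back. It assumes command_executor is a URL string, which the source does not confirm; `save_session` and `load_session` are hypothetical helpers, and the URL and session ID shown are made-up sample values. (In plain Selenium, the session ID is available as the `session_id` attribute of a driver.)

```python
import json
import tempfile
from pathlib import Path
from typing import Tuple


def save_session(path: Path, command_executor: str, session_id: str) -> None:
    """Write the connection details of a running session to a JSON file."""
    payload = {"command_executor": command_executor, "session_id": session_id}
    path.write_text(json.dumps(payload), encoding="utf-8")


def load_session(path: Path) -> Tuple[str, str]:
    """Read the connection details back as (command_executor, session_id)."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return data["command_executor"], data["session_id"]


# Round trip with sample values (assumptions for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    session_file = Path(tmp) / "session.json"
    save_session(session_file, "http://127.0.0.1:4444", "abc123")
    restored = load_session(session_file)

print(restored)
```

The restored pair could then be passed to ChromeRemoteWebDriver(command_executor, session_id) as in the example above.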

Working with Edge (similar for Firefox and Yandex with their respective classes):

from PyWebScraping.webdrivers.Edge import EdgeWebDriver

webdriver = EdgeWebDriver(webdriver_path="/path/to/msedgedriver")
webdriver.start_webdriver()
# ... interact ...
webdriver.close_webdriver()
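
When the browser is chosen at runtime, the class can be looked up by name. The sketch below assumes that Firefox and Yandex follow the same module and class naming pattern as the Chrome and Edge imports shown above (PyWebScraping.webdrivers.<Browser> and <Browser>WebDriver or <Browser>RemoteWebDriver); that pattern is an assumption, and both function names are hypothetical.

```python
import importlib
from typing import Tuple

SUPPORTED_BROWSERS = ("Chrome", "Edge", "Firefox", "Yandex")


def webdriver_class_path(browser: str, remote: bool = False) -> Tuple[str, str]:
    """Return the (module name, class name) assumed for the given browser."""
    name = browser.strip().capitalize()
    if name not in SUPPORTED_BROWSERS:
        raise ValueError(f"Unsupported browser: {browser!r}")
    class_name = f"{name}{'Remote' if remote else ''}WebDriver"
    return f"PyWebScraping.webdrivers.{name}", class_name


def load_webdriver_class(browser: str, remote: bool = False):
    """Import and return the class; requires PyWebScraping to be installed."""
    module_name, class_name = webdriver_class_path(browser, remote)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


print(webdriver_class_path("edge"))
print(webdriver_class_path("Chrome", remote=True))
```

Only webdriver_class_path is exercised here, so the example runs without the package installed; load_webdriver_class performs the actual import.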

Future Notes

PyWebScraping is under active development. Planned enhancements include support for additional browsers, more advanced interaction features, and better handling of dynamic web content. Suggestions for new features are welcome: open an issue or submit a pull request on the project's repository.
