PyWebScraping

PyWebScraping is a Python library for browser automation and web scraping. It supports Chrome, Firefox, Edge, and Yandex, providing a consistent API for managing browser sessions, options, and common actions like scrolling, element interaction, and JavaScript execution. It also facilitates remote webdriver control.

  • Version: 1.3.13
  • Registry: PyPI
  • Maintainers: 1

PyWebScraping: A Python Library for Browser Automation

PyWebScraping simplifies interaction with web browsers for scraping and automation tasks. It currently supports Chrome, Firefox, Edge, and Yandex browsers, providing a consistent interface for managing browser sessions, handling options, and performing common actions.

Key Features:

  • Cross-Browser Support: Seamlessly work with Chrome, Firefox, Edge, and Yandex browsers using a unified API.
  • Remote WebDriver Control: Connect to and manage existing browser sessions remotely.
  • Headless Browsing: Execute tasks discreetly in the background without a visible browser window.
  • Proxy Support: Integrate proxies for managing network requests.
  • User Agent Spoofing: Customize the user agent string for various browser impersonations.
  • Window Management: Control window size, position, and manage multiple tabs/windows.
  • Simplified API: Perform common actions like scrolling, hovering, finding elements, and executing JavaScript.
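
For Chromium-based browsers (Chrome, Edge, Yandex), the headless, proxy, user-agent, and window features listed above correspond to standard Chromium command-line switches. The sketch below only builds such a switch list in plain Python. `build_chromium_args` and its parameters are hypothetical names chosen for illustration; how PyWebScraping assembles startup arguments internally is not documented here, and Firefox uses different mechanisms.

```python
from typing import List, Optional, Tuple


def build_chromium_args(
    headless: bool = False,
    proxy: Optional[str] = None,
    user_agent: Optional[str] = None,
    window_size: Optional[Tuple[int, int]] = None,
    window_position: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return Chromium command-line switches for the requested features.

    Hypothetical helper: it only builds strings and does not start a browser.
    """
    args: List[str] = []
    if headless:
        args.append("--headless")
    if proxy is not None:
        args.append(f"--proxy-server={proxy}")
    if user_agent is not None:
        args.append(f"--user-agent={user_agent}")
    if window_size is not None:
        width, height = window_size
        args.append(f"--window-size={width},{height}")
    if window_position is not None:
        x, y = window_position
        args.append(f"--window-position={x},{y}")
    return args


if __name__ == "__main__":
    print(build_chromium_args(headless=True, proxy="127.0.0.1:8080", window_size=(800, 600)))
```

Only the options that are explicitly set produce a switch, so the defaults of the browser stay in effect for everything else.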

Installation:

  • With pip:

    pip install PyWebScraping
    
  • With git:

    pip install git+https://github.com/oddshellnick/PyWebScraping.git
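
After either command, the installation can be checked from Python. A minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name, and the distribution name is assumed to match the name used with pip above.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("PyWebScraping")
    print(version if version is not None else "PyWebScraping is not installed")
```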
    

API Reference:

  • BaseDriver: Provides fundamental classes like EmptyWebDriver, BrowserOptionsManager, BrowserStartArgs, and BrowserWebDriver for core browser management functionality.
  • ChromeDriver/EdgeDriver/FirefoxDriver/YandexDriver: Contains specific implementations for each browser, including options management, startup argument handling, and remote webdriver connection classes.
  • browsers_handler: Includes helpers such as WindowRect for managing window dimensions, and get_installed_browsers/get_browser_version for retrieving information about the browsers installed on the system.

Modules Overview:

  • EmptyWebDriver: A base class offering essential methods for interacting with a webdriver.
  • BrowserOptionsManager: Base class for managing browser-specific options. Subclassed for each browser type.
  • BrowserStartArgs: Base class for managing browser startup arguments. Subclassed for each browser.
  • BrowserWebDriver: Base class for managing the lifecycle of a webdriver instance. Subclassed for each browser.
  • Chrome(Remote)WebDriver, Edge(Remote)WebDriver, Firefox(Remote)WebDriver, Yandex(Remote)WebDriver: Concrete implementations for managing local and remote sessions for each browser.

This library aims to simplify browser automation in Python. Contributions and feedback are welcome!

Usage Examples:

Starting a Chrome Webdriver:

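    from PyWebScraping.webdrivers.Chrome import ChromeWebDriver
    from PyWebScraping.utilities import WindowRect

    # Configure a Chrome session with an explicit chromedriver path and window geometry.
    webdriver = ChromeWebDriver(webdriver_path="/path/to/chromedriver", window_rect=WindowRect(0, 0, 800, 600))
    webdriver.start_webdriver(headless_mode=True)  # launch without a visible window
    webdriver.driver.get("https://www.example.com")  # navigate through the underlying driver
    # ... perform actions ...
    webdriver.close_webdriver()  # shut the session down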
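
If an action between start and close raises, the browser would be left running. One possible way to guard against that is a small context manager. This is a sketch, not part of PyWebScraping: `managed_session` is a hypothetical helper that assumes only the `start_webdriver()` and `close_webdriver()` methods used in the example above.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def managed_session(webdriver: Any, **start_kwargs: Any) -> Iterator[Any]:
    """Start a webdriver wrapper and always close it afterwards.

    Assumes only that the object has start_webdriver() and close_webdriver().
    Keyword arguments are forwarded unchanged to start_webdriver().
    """
    webdriver.start_webdriver(**start_kwargs)
    try:
        yield webdriver
    finally:
        # Runs on normal exit and when the body raises.
        webdriver.close_webdriver()


# Usage (requires PyWebScraping and a chromedriver binary):
# with managed_session(ChromeWebDriver(webdriver_path="/path/to/chromedriver"),
#                      headless_mode=True) as wd:
#     wd.driver.get("https://www.example.com")
```

Note that the session is closed only if `start_webdriver()` itself succeeded; a failure during startup propagates without a close call.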

Connecting to a Remote Chrome Instance:

    from PyWebScraping.webdrivers.Chrome import ChromeRemoteWebDriver

    # Obtain both values from your running remote webdriver instance.
    command_executor = ...  # placeholder
    session_id = ...  # placeholder
    remote_webdriver = ChromeRemoteWebDriver(command_executor, session_id)
    remote_webdriver.create_driver()  # attach to the existing session
    # ... interact with the remote browser ...
    remote_webdriver.close_webdriver()
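
Reconnecting from another process requires the two connection values to be stored somewhere. A minimal sketch, assuming both the command executor and the session ID can be represented as strings (as with a Selenium command executor URL and session ID). The `save_session` and `load_session` helpers and the JSON layout are hypothetical and not part of PyWebScraping.

```python
import json
from pathlib import Path
from typing import Tuple


def save_session(path: Path, command_executor: str, session_id: str) -> None:
    """Write the connection details of a running session to a JSON file."""
    payload = {"command_executor": command_executor, "session_id": session_id}
    path.write_text(json.dumps(payload), encoding="utf-8")


def load_session(path: Path) -> Tuple[str, str]:
    """Read back the (command_executor, session_id) pair saved earlier."""
    payload = json.loads(path.read_text(encoding="utf-8"))
    return payload["command_executor"], payload["session_id"]
```

The pair returned by `load_session` can then be passed to `ChromeRemoteWebDriver` in the same order as in the example above.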

Working with Edge (similar for Firefox and Yandex with their respective classes):

    from PyWebScraping.webdrivers.Edge import EdgeWebDriver

    # Same lifecycle as the Chrome example: construct, start, interact, close.
    webdriver = EdgeWebDriver(webdriver_path="/path/to/msedgedriver")
    webdriver.start_webdriver()
    # ... interact ...
    webdriver.close_webdriver()
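
Because each browser follows the same pattern, the class can be selected by name at run time. The sketch below is an illustration under stated assumptions: the Chrome and Edge import paths are taken from the examples above, while the Firefox and Yandex module paths are assumed to follow the same naming and should be checked against the installed package. `resolve_webdriver` and `load_webdriver_class` are hypothetical helpers.

```python
import importlib
from typing import Any, Dict, Tuple

# Chrome and Edge paths come from the examples above; the Firefox and Yandex
# module paths are ASSUMED to follow the same pattern.
WEBDRIVER_CLASSES: Dict[str, Tuple[str, str]] = {
    "chrome": ("PyWebScraping.webdrivers.Chrome", "ChromeWebDriver"),
    "edge": ("PyWebScraping.webdrivers.Edge", "EdgeWebDriver"),
    "firefox": ("PyWebScraping.webdrivers.Firefox", "FirefoxWebDriver"),
    "yandex": ("PyWebScraping.webdrivers.Yandex", "YandexWebDriver"),
}


def resolve_webdriver(browser: str) -> Tuple[str, str]:
    """Return the (module path, class name) pair for a browser name."""
    try:
        return WEBDRIVER_CLASSES[browser.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported browser: {browser!r}") from None


def load_webdriver_class(browser: str) -> Any:
    """Import the wrapper class lazily; requires PyWebScraping to be installed."""
    module_path, class_name = resolve_webdriver(browser)
    return getattr(importlib.import_module(module_path), class_name)
```

Keeping the lookup separate from the import means the browser name can be validated without PyWebScraping or any browser driver being present.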

Future Notes

PyWebScraping is under active development. Planned enhancements include support for additional browsers, advanced interaction features, and improved handling of dynamic web content. To suggest a feature or contribute, open an issue or submit a pull request on the project's repository.
