PyPI package · Version 0.1.0 · 1 maintainer

URL Scrub

A tool for parsing the webpage at a given URL into JSON and RDF.

Setup

Dependencies

Installation Process

  • Install urlscrub with pip

    python3.10 -m pip install urlscrub
    
  • Install geckodriver

    • Download and install Firefox.

      • Linux (Ubuntu):

        sudo apt-get install firefox
        
    • Download geckodriver.zip.

    • Unzip the geckodriver binary (geckodriver.exe on Windows) into a directory of your choice.

    • Append the directory containing geckodriver to your PATH environment variable.

  • Install chromedriver

    • Download and install Google Chrome.

    • Find the version of Google Chrome you have installed:

      • Open the Google Chrome web browser.

      • Click the three vertical dots at the top right.

      • At the bottom of the dropdown menu, select Help, then About Google Chrome.

      • Note the version number displayed (e.g., 102.0.5005.115).

    • Download the chromedriver.zip whose version most closely matches your Chrome version.

      • An exact match is not required (e.g., chromedriver 102.0.5005.61 works with Google Chrome 102.0.5005.115).

    • Unzip the chromedriver binary (chromedriver.exe on Windows) into a directory of your choice.

    • Append the directory containing chromedriver to your PATH environment variable.
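To confirm that the installation steps took effect, the following minimal Python sketch checks that the interpreter is at least 3.10, that the urlscrub distribution is installed, and that geckodriver and chromedriver can be found on PATH. The 3.10 minimum is an assumption inferred from the python3.10 command in the pip step, and check_setup is a hypothetical helper, not part of urlscrub. It only looks up the installation and the executables; it does not launch a browser.

```python
import shutil
import sys
from importlib import metadata


def check_setup(drivers=("geckodriver", "chromedriver")) -> dict[str, bool]:
    """Report whether each setup step appears to have taken effect.

    Hypothetical helper: only checks installation and PATH lookup,
    it never launches a browser.
    """
    report: dict[str, bool] = {}
    # Assumption: Python 3.10+ is required (inferred from the python3.10 pip command).
    report["python>=3.10"] = sys.version_info >= (3, 10)
    # Is the urlscrub distribution installed for this interpreter?
    try:
        metadata.version("urlscrub")
        report["urlscrub"] = True
    except metadata.PackageNotFoundError:
        report["urlscrub"] = False
    # shutil.which searches PATH like the shell does (and honours PATHEXT on Windows).
    for name in drivers:
        report[name] = shutil.which(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_setup().items():
        print(f"{item:<15} {'OK' if ok else 'MISSING'}")
```

A MISSING entry for a driver usually means its directory was not added to PATH, or the current shell was opened before PATH was changed.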
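The chromedriver version-matching rule can be expressed as a small comparison. This sketch assumes a driver is compatible when the first three components of its version (major.minor.build) equal the browser's, which agrees with the example of chromedriver 102.0.5005.61 working with Chrome 102.0.5005.115. It is an illustration rather than a check urlscrub performs, and parse_version and driver_matches_browser are hypothetical names.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '102.0.5005.115' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_matches_browser(chrome_version: str, driver_version: str, parts: int = 3) -> bool:
    """Return True if the first `parts` version components agree.

    Assumption: only the trailing patch number may differ, as in the
    README example (chromedriver 102.0.5005.61 with Chrome 102.0.5005.115).
    """
    chrome = parse_version(chrome_version)
    driver = parse_version(driver_version)
    return chrome[:parts] == driver[:parts]


if __name__ == "__main__":
    # The pairing from the README, then a mismatched major version.
    print(driver_matches_browser("102.0.5005.115", "102.0.5005.61"))
    print(driver_matches_browser("102.0.5005.115", "101.0.4951.41"))
```

The parts parameter makes the strictness explicit; lowering it to 1 would accept any driver with the same major version.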

Command Line Usage

  • Command:

    urlscrub --skip-cookies --driver "chrome" -l "https://www.amazon.com/All-new-Kindle-Oasis-now-with-adjustable-warm-light/dp/B07GRSK3HC"
    
  • Response:

    {
      "results": [
        {
          "type": "product",
          "productTitle": "Kindle Oasis \u2013 With adjustable warm light",
          "availability": "In Stock.",
          "rating": "19,734 ratings",
          "imageURL": "https://m.media-amazon.com/images/I/614TlIaYBvL._AC_SX679_.jpg"
        }
      ]
    }
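The same command can be driven from Python. This sketch assumes urlscrub writes the JSON response to standard output and exits with status 0 on success; build_command, run_urlscrub and product_titles are hypothetical helpers. It uses only the flags from the command above, and the parsing step is exercised against the sample response so it runs without a browser.

```python
import json
import subprocess

# Sample response copied from the README, used to exercise the parser offline.
SAMPLE_RESPONSE = r'''
{
  "results": [
    {
      "type": "product",
      "productTitle": "Kindle Oasis \u2013 With adjustable warm light",
      "availability": "In Stock.",
      "rating": "19,734 ratings",
      "imageURL": "https://m.media-amazon.com/images/I/614TlIaYBvL._AC_SX679_.jpg"
    }
  ]
}
'''


def build_command(url: str, driver: str = "chrome", skip_cookies: bool = True) -> list[str]:
    """Build the argument list for the urlscrub CLI, mirroring the README command."""
    cmd = ["urlscrub"]
    if skip_cookies:
        cmd.append("--skip-cookies")
    cmd += ["--driver", driver, "-l", url]
    return cmd


def run_urlscrub(url: str, driver: str = "chrome") -> dict:
    """Run urlscrub and parse its output (assumes JSON on stdout)."""
    completed = subprocess.run(
        build_command(url, driver), capture_output=True, text=True, check=True
    )
    return json.loads(completed.stdout)


def product_titles(response: dict) -> list[str]:
    """Collect productTitle from every result whose type is 'product'."""
    return [
        item["productTitle"]
        for item in response.get("results", [])
        if item.get("type") == "product" and "productTitle" in item
    ]


if __name__ == "__main__":
    # Offline demo on the sample; swap in run_urlscrub(...) for a live page.
    print(product_titles(json.loads(SAMPLE_RESPONSE)))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with URLs that contain characters such as & or ?.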
    
