sitemapcrawler

A simple sitemap crawler that acts as the backbone for other operations

  • 0.0.1
  • PyPI

Sitemap Crawler

sitemapcrawler is a simple, blocking Python crawler that is the backbone of a few other projects.

You're welcome to use it, but it's only as modular as our own projects have required, so it's probably not a good fit for projects that weren't built with it in mind.

It works pretty simply.

Installation

pip install sitemapcrawler

Usage

from sitemapcrawler import Crawler
crawler = Crawler(
    domain="https://yourdomain.com",
    sitemap="https://yourdomain.com/sitemap.xml",
    fetch=True,
)
crawler.run()

If you just want to fetch a given page, create an instance of the crawler and call fetch_page directly:

crawler.fetch_page(url="https://yourdomain.com/blog/title")

On initialization, the crawler generates a nanoid crawl_id. When results are persisted, they're associated with that crawl, making it easy to build reports against individual crawls.
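To illustrate what a nanoid-style crawl_id looks like, here is a minimal standard-library sketch; the package itself presumably uses the `nanoid` library, and the helper name `make_crawl_id` is hypothetical:

```python
import secrets

# Default nanoid alphabet: 64 URL-safe characters.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-"

def make_crawl_id(size: int = 21) -> str:
    """Return a random 21-character URL-safe id, matching nanoid's defaults."""
    return "".join(secrets.choice(ALPHABET) for _ in range(size))

# Each crawl gets its own id, so persisted results can be grouped per crawl.
crawl_id = make_crawl_id()
print(crawl_id)
```

Because the id is generated per Crawler instance, two crawls of the same domain never collide in storage.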

Building / Distributing

python3 -m build
python3 -m twine upload dist/* --skip-existing
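For `python3 -m build` to produce a distribution, the project needs build metadata; a hypothetical minimal pyproject.toml might look like this (the name, version, and description come from this package's listing, but the build backend and dependencies are illustrative assumptions, not the actual configuration):

```toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "sitemapcrawler"
version = "0.0.1"
description = "A simple sitemap crawler that acts as the backbone for other operations"
# Assumed runtime dependencies, inferred from the README (fetching pages, nanoid ids).
dependencies = ["requests", "nanoid"]
```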
