Web scraping framework based on Python 3 asyncio
CrawlerDetect is a Python class for detecting bots/crawlers/spiders via the user agent.
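User-agent-based bot detection usually boils down to matching the UA string against a list of known crawler signatures. A minimal sketch of the idea (the pattern list here is a tiny illustrative sample, not CrawlerDetect's actual database):

```python
import re

# A few common crawler signatures; real detection libraries ship
# much larger, regularly updated pattern lists.
CRAWLER_PATTERNS = re.compile(
    r"bot|crawler|spider|slurp|curl|wget|facebookexternalhit",
    re.IGNORECASE,
)

def is_crawler(user_agent: str) -> bool:
    """Return True if the user-agent string matches a known crawler pattern."""
    return bool(CRAWLER_PATTERNS.search(user_agent or ""))

print(is_crawler("Mozilla/5.0 (compatible; Googlebot/2.1)"))      # True
print(is_crawler("Mozilla/5.0 (Windows NT 10.0) Firefox/124.0"))  # False
```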
crawler commons
Crawlera middleware for Scrapy
Framework for crawling
Python package and command-line tool designed to gather text on the Web; it includes all necessary discovery and text-processing components to perform web crawling, downloads, scraping, and extraction of main texts, metadata, and comments.
Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters.
Class that provides decorators and functions for easy handling of Crawlera sessions in a Scrapy spider.
Tools for Spiders
Automate downloads using predefined sites and the My JDownloader API
Core programs for crawling
A web crawler for DTC (Dans Ton Chat), VDM (Vie De Merde), and SCMB (Se Coucher Moins Bête)
Data collection tool
Python Test Crawler
An image crawler, including multiple modules and GUI.
Open source tool to display/filter/export information about PCI or PCI Express devices, as well as their topology.
A distributed crawler framework based on Python
Python implementation of a Bloom filter
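Crawlers commonly use a Bloom filter to remember visited URLs in constant memory. A minimal self-contained sketch using double hashing over a SHA-256 digest (the sizing formulas are the standard optimal-parameter derivation; this is not any particular package's implementation):

```python
import hashlib
import math

class BloomFilter:
    """Minimal Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, capacity: int, error_rate: float = 0.01):
        # Optimal bit-array size and hash count for the target error rate.
        self.size = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.hashes = math.ceil(self.size / capacity * math.log(2))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        # Double hashing simulates k independent hash functions.
        return ((h1 + i * h2) % self.size for i in range(self.hashes))

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] >> (pos % 8) & 1
                   for pos in self._positions(item))

bf = BloomFilter(capacity=1000, error_rate=0.01)
bf.add("https://example.com/page")
print("https://example.com/page" in bf)  # True
```

Membership tests can yield false positives (at roughly the configured error rate) but never false negatives, which is acceptable for URL deduplication in a crawl frontier.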
An app to download novels from online sources and generate e-books.
crawler_studio
Proxy rotation with PostgreSQL
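Proxy rotation hands each outgoing request a different exit IP so no single proxy gets rate-limited or banned. A minimal round-robin sketch; the proxy URLs are hypothetical placeholders, and a PostgreSQL-backed rotator would load them from a table and track per-proxy health instead of using an in-memory list:

```python
from itertools import cycle

# Hypothetical proxy endpoints; a database-backed rotator would
# fetch these from storage and skip proxies marked unhealthy.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
]

_rotation = cycle(PROXIES)

def next_proxy() -> str:
    """Hand out proxies round-robin so no single exit IP gets hammered."""
    return next(_rotation)

print([next_proxy() for _ in range(4)])
# ['http://10.0.0.1:8080', 'http://10.0.0.2:8080',
#  'http://10.0.0.3:8080', 'http://10.0.0.1:8080']
```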
A sample Crawler API
Notion news macro
An Aparat crawler library
A common package for crawling
Crawl telegra.ph searching for nudes!
Command-line program to download image galleries and collections from several image hosting sites
template tools
Scrapy utils for Modis crawler projects.
Clark University package for crawling YouTube and cleaning the data
This is a web application that extracts image URLs from web pages.
Load, analyze, move, and manipulate GitLab data
A client to interact with the freud-net API
Asynchronous high-concurrency DBLP crawler; use with caution!
crawler utils
Video Crawler
Asynchronous high-concurrency citation crawler, use with caution!
A simple and efficient web crawler in Python.
DataLad extension package for crawling external web resources into an automated data distribution
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page
Help you to build web crawlers easily and quickly