Web scraping framework based on Python 3 asyncio
CrawlerDetect is a Python library designed to identify bots, crawlers, and spiders by analyzing their user agents.
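The core idea behind user-agent bot detection is matching the UA string against a database of known crawler signatures. A minimal illustrative sketch of that idea, with a tiny hand-rolled pattern list rather than CrawlerDetect's actual (much larger) signature database or API:

```python
import re

# Illustrative only: a handful of common crawler markers, not the
# exhaustive signature list a real detection library ships with.
BOT_PATTERNS = re.compile(r"bot|crawler|spider|slurp|curl", re.IGNORECASE)

def looks_like_crawler(user_agent: str) -> bool:
    """Return True if the user agent matches a known crawler marker."""
    return bool(BOT_PATTERNS.search(user_agent))

print(looks_like_crawler("Googlebot/2.1 (+http://www.google.com/bot.html)"))  # True
print(looks_like_crawler("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/126.0"))  # False
```

Real libraries maintain hundreds of such patterns and update them regularly, which is the main value over rolling your own regex.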
crawler commons
Framework for crawling
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.
Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters.
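URL filtering of this kind typically combines scheme checks, content-type heuristics on the path extension, and rejection of tracking or session noise in the query string. A minimal hypothetical sketch of such a filter (the names and pattern lists here are illustrative, not this tool's actual API):

```python
from urllib.parse import urlparse

# Hypothetical filter lists for illustration only.
REJECT_EXTENSIONS = {".jpg", ".png", ".gif", ".css", ".js", ".zip"}
NOISE_MARKERS = ("utm_", "sessionid=")

def keep_url(url: str) -> bool:
    """Keep only http(s) URLs that look like text content without tracking noise."""
    parsed = urlparse(url)
    if not parsed.scheme.startswith("http"):
        return False
    path = parsed.path.lower()
    if any(path.endswith(ext) for ext in REJECT_EXTENSIONS):
        return False
    if any(marker in parsed.query.lower() for marker in NOISE_MARKERS):
        return False
    return True

print(keep_url("https://example.com/article"))          # True
print(keep_url("https://example.com/logo.png"))         # False
print(keep_url("https://example.com/p?utm_source=ad"))  # False
```

Production tools layer language detection and spam classifiers on top of cheap structural checks like these, so the expensive steps run on fewer candidates.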
A class providing decorators and functions for easy handling of Crawlera sessions in a Scrapy spider.
Tools for Spiders
Core programs for crawling
A web crawler for DTC (dans ton chat), VDM (vie de merde), and SCMB (se coucher moins bête)
An image crawler, including multiple modules and GUI.
Data collection tool
Open source tool to display/filter/export information about PCI or PCI Express devices, as well as their topology.
Crawlera middleware for Scrapy
A distributed crawler framework based on Python
Python Test Crawler
Automate downloads using predefined sites and the My-JDownloader-API
Python implementation of a Bloom filter
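A Bloom filter is a probabilistic set-membership structure often used in crawlers to deduplicate seen URLs with low memory: it can report false positives but never false negatives. A minimal illustrative sketch of the structure (not this package's actual API):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions over a fixed-size bit array."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)  # one byte per bit for simplicity

    def _positions(self, item):
        # Derive k positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # All k bits set => "probably present"; any unset bit => definitely absent.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("https://example.com/page")
print("https://example.com/page" in bf)   # True
print("https://example.com/other" in bf)  # almost certainly False at this load
```

Real implementations size the bit array and hash count from a target false-positive rate; this sketch keeps both fixed for clarity.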
An app to download novels from online sources and generate e-books.
A shared library for web scraping utilities.
Clark University package for crawling YouTube and cleaning the data
Video Crawler
template tools
Proxy rotation with PostgreSQL
crawler_studio
Command-line program to download image galleries and collections from several image hosting sites
A sample Crawler API
Scrapy utils for Modis crawlers projects.
A common package for crawling
Crawl telegra.ph searching for nudes!
An Aparat crawler library
crawler utils
A sample Crawler API
Asynchronous high-concurrency dblp crawler; use with caution
A client to interact with freud-net API
A web application that extracts image URLs from web pages.
CrawlerDetect is a Python class for detecting bots/crawlers/spiders via the user agent.
Selenium crawler for scraping billing data from the AmoCRM partner cabinet
Python package to query DeFi data from several subgraphs of The Graph
A group of crawlers for private tracker websites
DataLad extension package for crawling external web resources into an automated data distribution
Asynchronous high-concurrency citation crawler, use with caution!