Web scraping framework based on Python 3 asyncio
CrawlerDetect is a Python library designed to identify bots, crawlers, and spiders by analyzing their user agents.
crawler commons
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML.
Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters.
Framework for crawling
Class that provides decorators and functions for easy handling of Crawlera sessions in a Scrapy spider.
Tools for Spiders
Core programs for crawling
A web crawler for DTC (Dans Ton Chat), VDM (Vie de Merde) and SCMB (Se Coucher Moins Bête)
A shared library for web scraping utilities.
Python Test Crawler
Automate downloads using predefined sites and the My-JDownloader-API
Crawlera middleware for Scrapy
Data collection tool
An image crawler, including multiple modules and GUI.
Cryptocurrency exchange announcement news crawler for major crypto exchanges
Download light novels from various online sources and generate output in different formats, e.g. epub, mobi, json, html, text, docx, pdf.
Python implementation of a Bloom filter
A distributed crawler framework based on Python
A client implementation of Firefox DevTools over remote debug protocol.
Command-line program to download image galleries and collections from several image hosting sites
crawler utils
Proxy rotation with PostgreSQL
Clark University package for crawling YouTube and cleaning data
crawler_studio
Browser fingerprint datapoints collected by Apify
A client to interact with the freud-net API
Open source tool to display/filter/export information about PCI or PCI Express devices, as well as their topology.
A sample Crawler API
template tools
Python SDK for WebCrawler API
An Aparat crawler library
Scrapy utils for Modis crawlers projects.
Crawl your favorite images, photo albums, and comics from websites. Supports pixiv and yande.re for now.
Asynchronous, high-concurrency DBLP crawler; use with caution
Python package to query DeFi data from several The Graph subgraphs
DataLad extension package for crawling external web resources into an automated data distribution
A common package for crawling
A small example package
Crawl telegra.ph searching for nudes!
CrawlerDetect is a Python class for detecting bots/crawlers/spiders via the user agent.