Web Scraping Framework based on Python 3 asyncio
CrawlerDetect is a Python library designed to identify bots, crawlers, and spiders by analyzing their user agents.
crawler commons
Framework for crawling
Python package and command-line tool designed to gather text on the Web. It includes all the discovery and text-processing components needed for web crawling, downloading, scraping, and extraction of main texts, metadata and comments.
Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters.
Class that provides decorators and functions for easy handling of Crawlera sessions in a Scrapy spider.
Tools for Spiders
Core programs for crawling
A web crawler for DTC (Dans ton chat), VDM (Vie de merde) and SCMB (Se coucher moins bête)
An image crawler, including multiple modules and GUI.
Data collection tool
Automate downloads using predefined sites and the My JDownloader API
A distributed crawler framework based on Python
Python implementation of a Bloom filter
Python Test Crawler
Crawlera middleware for Scrapy
Clark University package for crawling YouTube and cleaning the collected data
An app to download novels from online sources and generate e-books.
crawler_studio
Proxy rotation with PostgreSQL
A sample Crawler API
Scrapy utilities for Modis crawler projects.
Selenium crawler for scraping billing data from the amoCRM partner cabinet
Open source tool to display/filter/export information about PCI or PCI Express devices, as well as their topology.
A group of crawlers for private tracker websites
Template tools
A common package for crawling
A client to interact with freud-net API
A crawler library for Aparat
Asynchronous high-concurrency dblp crawler, use with caution
Asynchronous high-concurrency citation crawler, use with caution!
This is a web application that extracts images URLs from web pages.
DataLad extension package for crawling external web resources into an automated data distribution
CrawlerDetect is a Python class for detecting bots/crawlers/spiders via the user agent.
Crawl your favorite images, photo albums and comics from websites. Currently supports pixiv and yande.re.
A simple web-crawling framework, based on aiohttp.
Crawl telegra.ph searching for nudes!
A new crawler for LinkAhead
Command-line program to download image galleries and collections from several image hosting sites
Video Crawler
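Crawlers commonly use a Bloom filter, like the Python implementation listed above, to remember which URLs they have already seen without storing every URL. The sketch below shows the general technique, assuming SHA-256 with double hashing to derive the bit positions. It is not that package's API: the class name `BloomFilter` and its parameters are hypothetical.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, capacity: int, error_rate: float = 0.01) -> None:
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hashes.
        self.size = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        # Set every bit position the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # Present only if all k bits are set; a single clear bit proves absence.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

For `capacity=1000` and `error_rate=0.01` this sizes the filter at 9,586 bits with 7 hash positions per item. A crawler would call `add(url)` after scheduling a URL and skip any URL for which `url in seen` is already true, accepting a small chance of wrongly skipping a new page.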
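The CrawlerDetect entries above identify bots by matching the user agent string against known crawler signatures. A minimal sketch of that idea follows, assuming a single case-insensitive regular expression built from a short pattern list. The pattern list and the function `is_crawler` are hypothetical; this is not CrawlerDetect's API, and real detectors ship far larger curated lists.

```python
import re
from typing import Optional

# Hypothetical, deliberately short pattern list for illustration only.
BOT_PATTERNS = [
    r"bot\b",
    r"crawl",
    r"spider",
    r"slurp",
    r"curl/",
    r"wget/",
    r"python-requests",
    r"scrapy",
]
# One alternation regex, matched case-insensitively anywhere in the string.
_BOT_RE = re.compile("|".join(BOT_PATTERNS), re.IGNORECASE)


def is_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the user agent looks like a bot, crawler, or spider."""
    if user_agent is None or not user_agent.strip():
        # Assumption: treat a missing or blank user agent as automated traffic.
        return True
    return _BOT_RE.search(user_agent) is not None
```

With these patterns a Googlebot user agent such as `Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)` is flagged, while an ordinary desktop Chrome user agent is not. Whether an empty user agent counts as a bot is a policy choice, not a fixed rule.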
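The URL-cleaning entry above ("Clean, filter and sample URLs") combines normalization with filtering. The sketch below covers only two of those steps using the standard library's `urllib.parse`: normalizing a URL (lowercase scheme and host, fragment dropped, tracking parameters removed) and a content-type filter based on file extension. It is not that package's API and does not implement its spam or language filters or sampling. The extension list, the `utm_` rule, and the function names are assumptions.

```python
from typing import Iterable, List, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical filter settings for illustration only.
SKIPPED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".pdf", ".zip", ".mp4")
TRACKING_PREFIXES = ("utm_",)


def clean_url(url: str) -> Optional[str]:
    """Normalize a URL, or return None if it should be filtered out."""
    parts = urlsplit(url.strip())
    if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
        return None  # only absolute web URLs are kept
    if parts.path.lower().endswith(SKIPPED_EXTENSIONS):
        return None  # content-type filter based on the file extension
    # Drop tracking parameters but keep the order of the remaining ones.
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
         if not k.lower().startswith(TRACKING_PREFIXES)]
    )
    # Lowercase scheme and host, default the path to "/", drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, ""))


def clean_and_deduplicate(urls: Iterable[str]) -> List[str]:
    """Clean every URL and keep the first occurrence of each normalized form."""
    seen = set()
    result = []
    for url in urls:
        cleaned = clean_url(url)
        if cleaned is not None and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result
```

For example, `clean_url("HTTPS://Example.com/page?utm_source=x&id=3#top")` yields `https://example.com/page?id=3`, so variants of the same page collapse to one entry before they reach the download queue.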
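The "Proxy rotation with PostgreSQL" entry keeps the proxy pool in a database table. A minimal sketch of least-recently-used rotation over such a table follows. To keep it self-contained it swaps in the standard library's `sqlite3` for PostgreSQL; the table layout, the `ProxyPool` class, and the failure limit are hypothetical and not taken from that package.

```python
import sqlite3
from typing import Optional

# sqlite3 stands in for PostgreSQL so the sketch runs without a database server.
# The table and column names are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS proxies (
    address  TEXT PRIMARY KEY,
    use_seq  INTEGER NOT NULL DEFAULT 0,  -- higher means used more recently
    failures INTEGER NOT NULL DEFAULT 0
)
"""


class ProxyPool:
    """Least-recently-used proxy rotation backed by a SQL table."""

    def __init__(self, conn: sqlite3.Connection, max_failures: int = 3) -> None:
        self.conn = conn
        self.max_failures = max_failures
        self.conn.execute(SCHEMA)

    def add(self, address: str) -> None:
        # SQLite syntax; PostgreSQL would use INSERT ... ON CONFLICT DO NOTHING.
        self.conn.execute("INSERT OR IGNORE INTO proxies (address) VALUES (?)", (address,))
        self.conn.commit()

    def next_proxy(self) -> Optional[str]:
        """Pick the healthy proxy used least recently and mark it as just used."""
        row = self.conn.execute(
            "SELECT address FROM proxies WHERE failures < ? "
            "ORDER BY use_seq, address LIMIT 1",
            (self.max_failures,),
        ).fetchone()
        if row is None:
            return None  # every proxy has reached the failure limit
        # A counter instead of a timestamp gives a strict order even for rapid calls.
        self.conn.execute(
            "UPDATE proxies SET use_seq = (SELECT MAX(use_seq) + 1 FROM proxies) "
            "WHERE address = ?",
            (row[0],),
        )
        self.conn.commit()
        return row[0]

    def report_failure(self, address: str) -> None:
        # Proxies that fail max_failures times drop out of the rotation.
        self.conn.execute(
            "UPDATE proxies SET failures = failures + 1 WHERE address = ?", (address,)
        )
        self.conn.commit()
```

With a real PostgreSQL pool shared by several crawler processes, the select-then-update step would need row locking; `SELECT ... FOR UPDATE SKIP LOCKED` inside a transaction is a common way to keep two workers from taking the same proxy at once.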
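Two entries above describe crawling frameworks built on asyncio, one directly on Python 3 asyncio and one on aiohttp. The sketch below shows a common core of such frameworks: a shared `asyncio.Queue` drained by a fixed number of worker tasks, with a `seen` set to avoid refetching pages. It is not either package's API. The `crawl` function and the `fetch` callable are hypothetical, and `fetch` is passed in so the example runs offline; a real crawler would implement it with an async HTTP client and an HTML link extractor.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Set, Tuple

# Hypothetical fetch signature: takes a URL, returns (page body, outgoing links).
Fetch = Callable[[str], Awaitable[Tuple[str, List[str]]]]


async def crawl(start_url: str, fetch: Fetch, max_pages: int = 100,
                concurrency: int = 5) -> Dict[str, str]:
    """Breadth-first crawl with a fixed pool of concurrent worker tasks."""
    queue: "asyncio.Queue[str]" = asyncio.Queue()
    seen: Set[str] = {start_url}  # URLs already scheduled, to avoid refetching
    pages: Dict[str, str] = {}
    queue.put_nowait(start_url)

    async def worker() -> None:
        while True:
            url = await queue.get()
            try:
                body, links = await fetch(url)
                pages[url] = body
                for link in links:
                    # Schedule new links until the page budget is used up.
                    if link not in seen and len(seen) < max_pages:
                        seen.add(link)
                        queue.put_nowait(link)
            except Exception:
                pass  # a real crawler would log the error and maybe retry
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    await queue.join()  # returns once every scheduled URL has been processed
    for task in workers:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return pages
```

The `concurrency` parameter bounds how many requests are in flight at once, which is the main politeness and resource control in this design; `max_pages` caps how many distinct URLs are ever scheduled.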