11 packages
bda-util
commonly used tools for bda projects
bottleneckp
asynchronous rate limiter with priority
captcha-recognizer
extracts text from PNG, JPG, or TIFF images, based on Tesseract, an open-source C++ OCR engine
crawler
Crawler is a ready-to-use web spider with support for proxies, asynchrony, rate limiting, configurable request pools, jQuery, and HTTP/2.
flowesh
Flowesh is the non-cluster version of floodesh. It is a middleware-based web spider that is lightweight and easy to maintain.
mof-genestamp
middleware of floodesh, which prints the gene and URL of a task, along with the number of new tasks and the number of records
mof-normalizer
middleware of floodesh, which normalizes queue options
mof-reqadapter
middleware of floodesh, which adapts queued options to request options
mof-statsdclient
middleware of floodesh, which provides a StatsD client for sending data to a StatsD server and automatically sends general statistics that apply to all requests
mof-uarotate
middleware of floodesh, which rotates the User-Agent header automatically
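Several of the packages above are described as "middleware of floodesh." The following is a minimal, hypothetical sketch of that general pattern: a Koa-style `(ctx, next)` middleware chain with a round-robin User-Agent rotator, in the spirit of mof-uarotate. It is not the floodesh or mof-uarotate API. The names `TaskContext`, `compose`, and `uaRotate`, the context shape, and the agent strings are all assumptions made for illustration.

```typescript
// Hypothetical task context; floodesh's real context object may differ.
interface TaskContext {
  url: string;
  headers: Record<string, string>;
}

type Next = () => Promise<void>;
type Middleware = (ctx: TaskContext, next: Next) => Promise<void>;

// Compose middleware so each one can run code before and after the rest of the chain.
function compose(middlewares: Middleware[]): (ctx: TaskContext) => Promise<void> {
  return (ctx) => {
    const dispatch = (i: number): Promise<void> => {
      const mw = middlewares[i];
      if (!mw) return Promise.resolve(); // end of chain
      return mw(ctx, () => dispatch(i + 1));
    };
    return dispatch(0);
  };
}

// Hypothetical round-robin User-Agent rotation: each task gets the next agent in the list.
function uaRotate(agents: string[]): Middleware {
  if (agents.length === 0) throw new Error("uaRotate needs at least one agent");
  let index = 0;
  return async (ctx, next) => {
    ctx.headers["User-Agent"] = agents[index] as string;
    index = (index + 1) % agents.length;
    await next();
  };
}

// Usage: a logging middleware placed after the rotator sees the chosen agent.
const run = compose([
  uaRotate(["agent-A", "agent-B"]),
  async (ctx, next) => {
    console.log(ctx.url, ctx.headers["User-Agent"]);
    await next();
  },
]);

(async () => {
  for (let i = 0; i < 3; i++) {
    await run({ url: `http://example.com/${i}`, headers: {} });
  }
})();
```

Keeping the rotation state inside the closure returned by `uaRotate` means each pipeline instance rotates independently. Placing it early in the chain lets later middleware, such as a request adapter, see the header that was chosen.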