====================================
ScrapyDD (Scrapy Distributed Daemon)
.. image:: https://img.shields.io/pypi/v/scrapydd.svg
:target: https://pypi.python.org/pypi/scrapydd
:alt: PyPI Version
.. image:: https://img.shields.io/travis/kevenli/scrapydd/master.svg
:target: http://travis-ci.org/kevenli/scrapydd
:alt: Build Status
.. image:: https://img.shields.io/codecov/c/github/kevenli/scrapydd/master.svg
:target: http://codecov.io/gh/kevenli/scrapydd?branch=master
:alt: Coverage report
Scrapydd is a system for scrapy spiders distributed running and scheduleing system, including server and client agent.
Advantages:
- Distributed, easily add runner(agent) to scale out.
- Project requirements auto install on demand.
- Cron expression time driven trigger, run your spider on time.
- Webhook loosely couple the data crawling and data processing.
- Spider status insight, system will look into the log to clarify spider run status.
Installing Scrapydd
By pip::
pip install scrapydd
You can also install scrapydd manually:
- Download compressed package from
github releases
_. - Decompress the package
- Run
python setup.py install
Run Scrapydd Server
scrapydd server
The server default serve on 0.0.0.0:6800, with both api and web ui.
Add --daemon parameter in commmand line to run in background.
Run Scrapydd Agent
scrapydd agent
Add --daemon parameter in commmand line to run in background.
Docs
The docs is hosted here
_
Docker-Compose
.. code-block:: yaml
version: '3'
services:
db:
image: mysql
command: --default-authentication-plugin=mysql_native_password
restart: always
ports:
- "3306:3306"
volumes:
- "datadb:/var/lib/mysql"
environment:
MYSQL_ROOT_PASSWORD: mysqlPassword
MYSQL_DATABASE: scrapydd
MYSQL_USER: scrapydd
MYSQL_PASSWORD: scrapyddPwd
server:
image: "kevenli/scrapydd"
ports:
- "6800:6800"
volumes:
- "./server:/scrapydd"
- "/var/run/docker.sock:/var/run/docker.sock"
command: scrapydd server
agent:
image: "kevenli/scrapydd"
volumes:
- "./agent:/scrapydd"
- "/var/run/docker.sock:/var/run/docker.sock"
links:
- server
environment:
- SCRAPYDD_SERVER=server
command: scrapydd agent
volumes:
datadb:
.. _here: http://scrapydd.readthedocs.org
.. _github releases
: https://github.com/kevenli/scrapydd/releases