Ranunculus
Another web crawler that runs on Amazon SQS and ElastiCache (Redis).
You also need a worker, which is available on GitHub.
Installation
Add this line to your application's Gemfile:
gem 'ranunculus'
And then execute:
$ bundle
Or install it yourself as:
$ gem install ranunculus
Usage
require 'aws-sdk'
require 'redis'
require 'ranunculus'
sqs = Aws::SQS::Client.new(
  access_key_id: ENV['ACCESS_KEY_ID'],
  secret_access_key: ENV['SECRET_ACCESS_KEY'],
  region: 'ap-northeast-1'
)
redis = Redis.new(host: "localhost", port: 6379, db: 0)
URL_QUEUE_URL = "https://sqs.ap-northeast-1.amazonaws.com/12345/UrlQueue"
RESULT_QUEUE_URL = "https://sqs.ap-northeast-1.amazonaws.com/12345/ResultQueue"
c = Ranunculus::Crawler.new(sqs, redis, URL_QUEUE_URL, RESULT_QUEUE_URL)
page = Ranunculus::Page.new("http://www.yahoo.com")
c.on_every_page do |page|
  puts page.url
end
c.start(page, 1, "Ranunculus")
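The usage example hard-codes the two queue URLs and reads the AWS credentials with ENV[...], which silently yields nil when a variable is unset. As a rough sketch only, the helpers below show one way to build queue URLs of the same shape and to fail fast on missing settings. Both sqs_queue_url and required_env are hypothetical names that are not part of the ranunculus gem, and the URL pattern is simply copied from the example above; for real queues it is safer to copy the URL from the AWS console or look it up with Aws::SQS::Client#get_queue_url.

```ruby
# Hypothetical helper (not part of ranunculus): builds a queue URL in the
# same shape as the hard-coded URLs in the usage example.
def sqs_queue_url(region, account_id, queue_name)
  "https://sqs.#{region}.amazonaws.com/#{account_id}/#{queue_name}"
end

# Hypothetical helper (not part of ranunculus): returns a hash of the
# requested settings and raises KeyError if any of them is missing.
# `env` defaults to the process environment; pass a Hash to test it.
def required_env(*keys, env: ENV)
  keys.map { |key| [key, env.fetch(key)] }.to_h
end

# Region, account ID, and queue names are the placeholder values from the
# usage example, not real resources.
url_queue_url    = sqs_queue_url("ap-northeast-1", "12345", "UrlQueue")
result_queue_url = sqs_queue_url("ap-northeast-1", "12345", "ResultQueue")

puts url_queue_url
puts result_queue_url

# Before creating the SQS client, credentials could be checked like this:
#   creds = required_env("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")
```

The resulting strings can be passed to Ranunculus::Crawler.new in place of the URL_QUEUE_URL and RESULT_QUEUE_URL constants.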
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/k-kawa/ranunculus-ruby. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.