web_crawler

0.5.4
Rubygems

Version published: 13 years ago

Maintainers: 1

Created: 13 years ago

web_crawler helps you parse and collect data from the web.

==How it works.

  class StackoverflowCrawler < WebCrawler::Base

    target "http://stackoverflow.com/questions/tagged/:tag", :tag => %w{ruby ruby-on-rails ruby-on-rails-3}
    logger "path/to/log/file" # or Logger.new(...)

    cache_to '/tmp/cache/stackoverflow'

    context "#questions .question-summary", :jobs do

      # TODO: defaults :format => lambda{ |v| v.to_i }

      # Numeric fields: convert the extracted text to integers.
      map '.vote-count-post strong', :to => :vote_count, :format => lambda{ |v| v.to_i }
      map '.views', :to => :view_count, :format => lambda{ |v| v[/\d+/].to_i }
      map '.status strong', :to => :answer_count, :format => lambda{ |v| v.to_i }

      # Text fields: keep the text instead of calling to_i, which would turn a title into 0.
      # strip assumes v is a String, as the :view_count mapping implies.
      map '.summary h3 a', :to => :title, :format => lambda{ |v| v.strip }
      map '.summary .excerpt', :to => :excerpt, :format => lambda{ |v| v.strip }

      # Read the timestamp from the element's title attribute.
      map '.user-action-time .relativetime', :to => :posted_at, :on => [:attr, :title]
      map '.tags .post-tag', :to => :tags

    end

  end
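To make the moving parts of the example above more concrete, here is a small sketch in plain Ruby that does not call web_crawler at all. It illustrates two ideas: how a `:tag` placeholder in the target URL could be expanded into one URL per tag, and what the `:format` lambdas do to extracted text. The `expand_target` helper and the sample strings are hypothetical and exist only for this illustration; the gem's own expansion logic may well differ.

```ruby
# Illustration only: plain Ruby, no web_crawler calls.

# Hypothetical helper (not part of the gem): substitutes every value
# for its ":name" placeholder and returns the resulting URLs.
def expand_target(template, params)
  params.reduce([template]) do |urls, (name, values)|
    urls.flat_map do |url|
      Array(values).map { |value| url.sub(":#{name}", value.to_s) }
    end
  end
end

URLS = expand_target(
  "http://stackoverflow.com/questions/tagged/:tag",
  :tag => %w{ruby ruby-on-rails ruby-on-rails-3}
)

# The same shape of lambdas the example passes as :format.
TO_INT       = lambda { |v| v.to_i }
FIRST_NUMBER = lambda { |v| v.match(/\d+/)[0].to_i }

# Made-up sample strings standing in for text extracted from a page.
VOTES = TO_INT.call("42")
VIEWS = FIRST_NUMBER.call("120 views")

puts URLS
puts VOTES
puts VIEWS
```

Note that `to_i` returns 0 for text that does not start with a number, so a `:format` of `lambda{ |v| v.to_i }` only makes sense for numeric fields.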

#TODO

Add documentation
...
PROFIT!!!1 (:

Package last updated on 24 Jun 2011
