anemone

Rubygems
Version 0.7.2
Maintainers: 1

= Anemone

Anemone is a web spider framework that can spider a domain and collect useful information about the pages it visits. It is versatile, allowing you to write your own specialized spider tasks quickly and easily.

See http://anemone.rubyforge.org for more information.

== Features

  • Multi-threaded design for high performance
  • Tracks 301 HTTP redirects
  • Built-in BFS algorithm for determining page depth
  • Allows exclusion of URLs based on regular expressions
  • Choose the links to follow on each page with focus_crawl()
  • HTTPS support
  • Records response time for each page
  • CLI program can list all pages in a domain, calculate page depths, and more
  • Obeys robots.txt
  • In-memory or persistent storage of pages during crawl, using TokyoCabinet, SQLite3, MongoDB, or Redis
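The breadth-first depth calculation and the regular-expression exclusion listed above can be illustrated without touching the network. The following is a minimal sketch in plain Ruby, not Anemone's own implementation: the LINKS graph, the page_depths helper, and its skip: parameter are all hypothetical names made up for this illustration, with the hash standing in for pages a crawler would otherwise fetch.

```ruby
# Hypothetical in-memory link graph standing in for fetched pages:
# each key is a URL and each value lists the URLs that page links to.
LINKS = {
  '/'            => ['/about', '/blog', '/login'],
  '/about'       => ['/', '/team'],
  '/blog'        => ['/blog/post-1', '/about'],
  '/blog/post-1' => ['/blog', '/team'],
  '/team'        => [],
  '/login'       => ['/admin']
}.freeze

# Breadth-first traversal from +start+ (illustrative helper, not part of Anemone).
# Returns a hash of URL => depth, where depth is the minimum number of
# links followed to reach that URL. URLs matching any pattern in +skip+
# are never queued, so pages reachable only through them are not visited.
def page_depths(links, start, skip: [])
  depths = { start => 0 }
  queue = [start]
  until queue.empty?
    url = queue.shift
    links.fetch(url, []).each do |link|
      next if depths.key?(link)                              # already discovered
      next if skip.any? { |pattern| pattern.match?(link) }   # excluded by regex
      depths[link] = depths[url] + 1
      queue << link
    end
  end
  depths
end

depths = page_depths(LINKS, '/', skip: [/login/, /admin/])
depths.each { |url, depth| puts "#{depth} #{url}" }
```

Because the queue is processed first-in, first-out, every page is first reached along a shortest path, so the recorded depth is the minimum one. Excluding /login also keeps /admin out of the result, since no other page links to it.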

== Examples

See the scripts under the lib/anemone/cli directory for examples of several useful Anemone tasks.

== Requirements

  • nokogiri
  • robots

== Development

To test and develop this gem, additional requirements are:

  • rspec
  • fakeweb
  • tokyocabinet
  • kyotocabinet-ruby
  • mongo
  • redis
  • sqlite3

You will need to have KyotoCabinet, {Tokyo Cabinet}[http://fallabs.com/tokyocabinet/], {MongoDB}[http://www.mongodb.org/], and {Redis}[http://code.google.com/p/redis/] installed on your system and running.


Package last updated on 30 May 2012
