RDig provides an HTTP crawler and content extraction utilities to help build a site search for web sites or intranets. Internally, Ferret is used for full-text indexing. After creating a config file for your site, the index can be built with a single call to rdig.

RDig depends on Ferret (>= 0.10.0) and, for parsing HTML, on either Hpricot (>= 0.4) or the RubyfulSoup library (>= 1.0.4). As I know of no way to specify such an OR dependency in a gem specification, the gem depends on Hpricot. If this is a problem for you, install the gem with --force and manually run +gem install rubyful_soup+.

== basic usage

=== Index creation

- create a config file based on the template in doc/examples
- to create an index:
    rdig -c CONFIGFILE
- to run a query against the index (just to try it out):
    rdig -c CONFIGFILE -q 'your query'
  this will dump the first 10 search results to STDOUT

=== Handle search in your application:

  require 'rdig'
  require 'rdig_config'   # load your config file here
  search_results = RDig.searcher.search(query)

see RDig::Search::Searcher for more information.
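
As a minimal sketch of how an application might wrap the search call, the helper below guards against blank queries and normalizes the result hash. +safe_search+ and the stand-in searcher are hypothetical names introduced for illustration; the only behavior assumed of RDig is what this README shows, namely that +search+ returns a hash with <tt>:list</tt> and <tt>:hitcount</tt> keys.

```ruby
# Hypothetical helper, not part of RDig: runs a query through any object
# that responds to #search and returns a hash with :list and :hitcount.
def safe_search(searcher, query)
  query = query.to_s.strip
  # Skip the index lookup entirely for nil or whitespace-only queries.
  return { list: [], hitcount: 0 } if query.empty?

  results = searcher.search(query)
  { list: results[:list] || [], hitcount: results[:hitcount] || 0 }
end

# Stand-in searcher so the sketch runs without a built index.
fake_searcher = Object.new
def fake_searcher.search(query)
  { list: ["hit for #{query}"], hitcount: 1 }
end

summary = safe_search(fake_searcher, '  ferret ')
puts "#{summary[:hitcount]} result(s)"
```

In a real application the first argument would be <tt>RDig.searcher</tt>, after requiring 'rdig' and your config file as shown above.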

== usage in rails

- add to config/environment.rb :
    require 'rdig'
    require 'rdig_config'
- place rdig_config.rb into config/ directory.
- build index:
    rdig -c config/rdig_config.rb
- in your controller that handles the search form:
    search_results = RDig.searcher.search(params[:query])
    @results = search_results[:list]
    @hitcount = search_results[:hitcount]

=== search result paging

Use the :first_doc and :num_docs options to implement paging through search results. (:num_docs is 10 by default, so without using these options only the first 10 results will be retrieved.)
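
The paging arithmetic can be sketched as follows. +paging_options+ and +page_count+ are hypothetical helpers, not part of RDig. The sketch assumes that <tt>:first_doc</tt> is a zero-based offset into the result list and that the options hash is passed as a second argument to +search+; check RDig::Search::Searcher to confirm both before relying on it.

```ruby
# Hypothetical helper: turns a 1-based page number into the :first_doc and
# :num_docs options. Assumes :first_doc is a zero-based offset.
def paging_options(page, per_page = 10)
  page = [page.to_i, 1].max # treat nil, 0 and negative values as page 1
  { first_doc: (page - 1) * per_page, num_docs: per_page }
end

# Hypothetical helper: number of pages needed to show hitcount results.
def page_count(hitcount, per_page = 10)
  (hitcount.to_f / per_page).ceil
end

opts = paging_options(3)
puts "first_doc=#{opts[:first_doc]} num_docs=#{opts[:num_docs]}"
puts "pages for 25 hits: #{page_count(25)}"

# Possible use in a controller (assumed call shape, see lead-in):
#   search_results = RDig.searcher.search(params[:query], paging_options(params[:page]))
```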

== sample configuration

Taken from doc/examples/config.rb. The tag_selector properties are called with a BeautifulSoup instance as parameter. See the RubyfulSoup Site[http://www.crummy.com/software/RubyfulSoup/documentation.html] for more info about this cool lib. You can also have a look at the +html_content_extractor+ unit test.

:include:doc/examples/config.rb


Package last updated on 11 Apr 2012
