Sunspot
Sunspot is a Ruby library for expressive, powerful interaction with the Solr
search engine. Sunspot is built on top of the RSolr library, which
provides a low-level interface for Solr interaction; Sunspot provides a simple,
intuitive, expressive DSL backed by powerful features for indexing objects and
searching for them.
Sunspot is designed to be easily plugged into any ORM, or even non-database-backed
objects such as the filesystem.
This README provides a high-level overview; class-by-class and
method-by-method documentation is available in the API reference.
Quickstart with Rails 3
Add to Gemfile:
gem 'sunspot_rails'
gem 'sunspot_solr'
Bundle it!
bundle install
Generate a default configuration file:
rails generate sunspot_rails:install
If sunspot_solr was installed, start the packaged Solr distribution with:
bundle exec rake sunspot:solr:start
Setting Up Objects
Add a searchable block to the objects you wish to index.
class Post < ActiveRecord::Base
searchable do
text :title, :body
text :comments do
comments.map { |comment| comment.body }
end
boolean :featured
integer :blog_id
integer :author_id
integer :category_ids, :multiple => true
double :average_rating
time :published_at
time :expired_at
string :sort_title do
title.downcase.gsub(/^(an?|the)/, '')
end
end
end
text fields will be full-text searchable. Other fields (e.g., integer
and string) can be used to scope queries.
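The sort_title field in the example above strips a leading article so that titles sort by their first significant word. The following plain-Ruby sketch runs that normalization outside Sunspot. Both helper names are hypothetical. It also shows that the pattern has no word boundary, so it trims the first letters of a word such as "another"; the stricter variant is one possible alternative, not something the library prescribes.

```ruby
# Hypothetical helpers illustrating the :sort_title normalization.
# Neither method is part of Sunspot.

# Same expression as in the searchable block above.
def readme_sort_title(title)
  title.downcase.gsub(/^(an?|the)/, '')
end

# Stricter variant (an assumption, not the README's code): only strip an
# article that is followed by whitespace, and drop that whitespace too.
def strict_sort_title(title)
  title.downcase.sub(/\A(an?|the)\s+/, '')
end

puts readme_sort_title('The Best Pizza').inspect  # " best pizza"
puts readme_sort_title('Another Slice').inspect   # "other slice"
puts strict_sort_title('The Best Pizza').inspect  # "best pizza"
puts strict_sort_title('Another Slice').inspect   # "another slice"
```

Changing the block of a searchable field changes what is indexed, so existing documents would need to be reindexed afterwards.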
Searching Objects
Post.search do
fulltext 'best pizza'
with :blog_id, 1
with(:published_at).less_than Time.now
order_by :published_at, :desc
paginate :page => 2, :per_page => 15
facet :category_ids, :author_id
end
Search In Depth
Given the Post class set up in the earlier steps:
Full Text
Post.search { fulltext 'pizza' }
Post.search do
fulltext 'pizza' do
boost_fields :title => 2.0
end
end
Post.search do
fulltext 'pizza' do
boost(2.0) { with(:featured, true) }
end
end
Post.search do
fulltext 'pizza' do
fields(:title)
end
end
Post.search do
fulltext 'pizza' do
fields(:body, :title => 2.0)
end
end
Phrases
Solr allows searching for phrases: search terms that are close together.
In the default query parser used by Sunspot (dismax), phrase searches
are represented as a double quoted group of words.
Post.search do
fulltext '"great pizza"'
end
If specified, query_phrase_slop sets the number of words that may
appear between the words in a phrase.
Post.search do
fulltext '"great pizza"' do
query_phrase_slop 1
end
end
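To make the effect of query_phrase_slop concrete, here is a deliberately simplified plain-Ruby model of the description above: a two-word phrase matches when the second word follows the first with at most slop other words in between. The method name is hypothetical. Solr's real matching also applies text analysis, handles longer phrases, and tolerates reordered terms at a higher slop cost, none of which is modeled here.

```ruby
# Simplified, hypothetical model of phrase slop for a two-word phrase.
# This is not how Solr is implemented; it only mirrors the README's description.
def phrase_match?(text, phrase, slop: 0)
  words = text.downcase.scan(/\w+/)
  first, second = phrase.downcase.scan(/\w+/)

  words.each_index.any? do |i|
    next false unless words[i] == first

    # The second word may sit up to `slop` positions later than adjacent.
    ((i + 1)..(i + 1 + slop)).any? { |j| words[j] == second }
  end
end

puts phrase_match?('Great pizza downtown', 'great pizza')            # true
puts phrase_match?('A great cheese pizza', 'great pizza')            # false
puts phrase_match?('A great cheese pizza', 'great pizza', slop: 1)   # true
```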
Phrase Boosts
Phrase boosts add boost to terms that appear in close proximity;
the terms do not have to appear in a phrase, but if they do, the
document will score more highly.
Post.search do
fulltext 'great pizza' do
phrase_fields :title => 2.0
end
end
Post.search do
fulltext 'great pizza' do
phrase_fields :title => 2.0
phrase_slop 1
end
end
Scoping (Scalar Fields)
Fields not defined as text (e.g., integer, boolean, time, etc.) can be used
to scope (restrict) queries before full-text matching is performed.
Positive Restrictions
Post.search do
with(:blog_id, 1)
end
Post.search do
with(:average_rating, 3.0..5.0)
end
Post.search do
with(:category_ids, [1, 3, 5])
end
Post.search do
with(:published_at).greater_than(1.week.ago)
end
Negative Restrictions
Post.search do
without(:category_ids, [1, 3])
end
Disjunctions and Conjunctions
Post.search do
any_of do
with(:expired_at).greater_than(Time.now)
with(:expired_at, nil)
end
end
Post.search do
all_of do
with(:blog_id, 1)
with(:author_id, 2)
end
end
Disjunctions and conjunctions may be nested:
Post.search do
any_of do
with(:blog_id, 1)
all_of do
with(:blog_id, 2)
with(:category_ids, 3)
end
end
end
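The nested restriction above reads as "blog 1, or (blog 2 and category 3)". As a rough illustration of that boolean logic, the sketch below applies the same predicate to in-memory hashes. The data and method names are made up for the example; in a real search Solr evaluates the restriction against the index.

```ruby
# Hypothetical sample documents standing in for indexed posts.
def scoped_posts
  [
    { id: 1, blog_id: 1, category_ids: [5] },
    { id: 2, blog_id: 2, category_ids: [3, 4] },
    { id: 3, blog_id: 2, category_ids: [7] },
    { id: 4, blog_id: 9, category_ids: [3] }
  ]
end

# any_of  => ||      all_of => &&
# A multi-valued field matches when any of its values equals the argument.
def matches_scope?(post)
  post[:blog_id] == 1 ||
    (post[:blog_id] == 2 && post[:category_ids].include?(3))
end

def matching_ids(posts)
  posts.select { |post| matches_scope?(post) }.map { |post| post[:id] }
end

puts matching_ids(scoped_posts).inspect  # [1, 2]
```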
Combined with Full-Text
Scopes/restrictions can be combined with full-text searching. The
scope/restriction pares down the objects that are searched for the
full-text term.
Post.search do
with(:blog_id, 1)
fulltext("pizza")
end
All results from Solr are paginated
The results array that is returned has methods mixed in that allow it to
operate seamlessly with common pagination libraries like will_paginate
and kaminari.
By default, Sunspot requests the first 30 results from Solr.
search = Post.search do
fulltext "pizza"
end
results = search.results
search.total
results.total_pages
results.first_page?
results.last_page?
results.previous_page
results.next_page
results.out_of_bounds?
results.offset
To retrieve the next page of results, recreate the search and use the
paginate method.
search = Post.search do
fulltext "pizza"
paginate :page => 2
end
results = search.results
search.total
results.total_pages
results.first_page?
results.last_page?
results.previous_page
results.next_page
results.out_of_bounds?
results.offset
A custom number of results per page can be specified with the
:per_page option to paginate:
search = Post.search do
fulltext "pizza"
paginate :page => 1, :per_page => 50
end
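The pagination accessors listed earlier all follow from three numbers: the total hit count, the requested page, and the page size. The sketch below spells out that arithmetic in plain Ruby. The page_info helper is hypothetical and only approximates what the paginated results object reports; 30 is the default page size mentioned above.

```ruby
# Hypothetical helper showing the arithmetic behind the pagination accessors.
# Sunspot computes these for you on the results collection.
def page_info(total:, page: 1, per_page: 30)
  total_pages = (total.to_f / per_page).ceil

  {
    total_pages: total_pages,
    offset: (page - 1) * per_page,           # index of the first row on this page
    first_page: page == 1,
    last_page: page == total_pages,
    previous_page: page > 1 ? page - 1 : nil,
    next_page: page < total_pages ? page + 1 : nil,
    out_of_bounds: page > total_pages
  }
end

info = page_info(total: 95, page: 2)
puts info[:total_pages]  # 4
puts info[:offset]       # 30
puts info[:next_page]    # 3
```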
Faceting
Faceting is a feature of Solr that determines the number of documents
that match a given search and an additional criterion. This allows you
to build powerful drill-down interfaces for search.
Each facet returns zero or more rows, each of which represents a
particular criterion conjoined with the actual query being performed.
For field facets, each row represents a particular value for a given
field. For query facets, each row represents an arbitrary scope; the
facet itself is just a means of logically grouping the scopes.
Field Facets
search = Post.search do
fulltext "pizza"
facet :author_id
end
search.facet(:author_id).rows.each do |facet|
puts "Author #{facet.value} has #{facet.count} pizza posts!"
end
Query Facets
search = Post.search do
facet(:average_rating) do
row(1.0..2.0) do
with(:average_rating, 1.0..2.0)
end
row(2.0..3.0) do
with(:average_rating, 2.0..3.0)
end
row(3.0..4.0) do
with(:average_rating, 3.0..4.0)
end
row(4.0..5.0) do
with(:average_rating, 4.0..5.0)
end
end
end
search.facet(:average_rating).rows.each do |facet|
puts "Number of posts with rating within #{facet.value}: #{facet.count}"
end
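To illustrate what the facet rows contain, the following plain-Ruby sketch computes the same kind of counts over a small in-memory array. All names and data are hypothetical; Solr computes real facets server-side against the matching documents.

```ruby
# Hypothetical sample documents standing in for the posts matching a search.
def facet_posts
  [
    { author_id: 1, average_rating: 4.5 },
    { author_id: 1, average_rating: 2.5 },
    { author_id: 2, average_rating: 3.0 },
    { author_id: 3, average_rating: 1.5 }
  ]
end

# Field facet: one row per distinct value of a field, with its count.
def field_facet(posts, field)
  posts.group_by { |post| post[field] }
       .map { |value, group| { value: value, count: group.size } }
       .sort_by { |row| [-row[:count], row[:value]] }
end

# Query facet: one row per arbitrary scope (here, inclusive rating ranges).
def query_facet(posts, field, ranges)
  ranges.map do |range|
    { value: range, count: posts.count { |post| range.cover?(post[field]) } }
  end
end

ranges = [1.0..2.0, 2.0..3.0, 3.0..4.0, 4.0..5.0]
puts field_facet(facet_posts, :author_id).map { |r| r[:count] }.inspect          # [2, 1, 1]
puts query_facet(facet_posts, :average_rating, ranges).map { |r| r[:count] }.inspect  # [1, 2, 1, 1]
```

Note that the ranges are inclusive at both ends, so the post rated exactly 3.0 is counted in two rows. The row counts of a query facet therefore need not add up to the number of matching documents.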
Ordering
By default, Sunspot orders results by "score": the Solr-determined
relevancy metric. Sorting can be customized with the order_by method:
Post.search do
fulltext("pizza")
order_by(:average_rating, :desc)
end
Post.search do
fulltext("pizza")
order_by(:score, :desc)
order_by(:average_rating, :desc)
end
Post.search do
fulltext("pizza")
order_by(:random)
end
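When order_by is called more than once, later calls act as tie-breakers for earlier ones. The sketch below reproduces the second example's ordering (score descending, then average rating descending) on hypothetical in-memory data, purely to show the resulting sequence.

```ruby
# Hypothetical hits with a relevance score and a rating.
def ordering_hits
  [
    { id: 1, score: 2.0, average_rating: 3.0 },
    { id: 2, score: 2.0, average_rating: 4.5 },
    { id: 3, score: 3.1, average_rating: 1.0 }
  ]
end

# Sort by score descending; rating descending only breaks ties on score.
def order_hits(hits)
  hits.sort_by { |hit| [-hit[:score], -hit[:average_rating]] }
end

puts order_hits(ordering_hits).map { |hit| hit[:id] }.inspect  # [3, 2, 1]
```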
Grouping
Solr 3.3 and above
Solr supports grouping documents, similar to an SQL GROUP BY. More
information about result grouping/field collapsing is available on the
Solr Wiki.
Grouping is only supported on string fields that are not
multivalued. To group on a field of a different type (e.g., integer),
add a denormalized string field:
class Post < ActiveRecord::Base
searchable do
string(:blog_id_str) { |p| p.blog_id.to_s }
end
end
search = Post.search do
group :blog_id_str
end
search.group(:blog_id_str).matches
search.group(:blog_id_str).groups.each do |group|
puts group.value
group.results.each do |result|
end
end
Additional options are supported by the DSL:
Post.search do
group :blog_id_str do
limit 3
end
end
Post.search do
group :blog_id_str do
order_by(:average_rating, :desc)
end
end
Post.search do
group :blog_id_str do
truncate
end
facet :blog_id_str, :extra => :any
end
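As a rough picture of what grouping with limit and order_by returns, the sketch below groups hypothetical in-memory posts by blog_id_str, orders each group by rating, and keeps the top documents per group. The helper is an illustration only. Its default limit of 1 matches Solr's documented group.limit default.

```ruby
# Hypothetical sample documents standing in for indexed posts.
def grouping_posts
  [
    { id: 1, blog_id_str: '1', average_rating: 4.0 },
    { id: 2, blog_id_str: '1', average_rating: 5.0 },
    { id: 3, blog_id_str: '1', average_rating: 3.0 },
    { id: 4, blog_id_str: '2', average_rating: 2.0 }
  ]
end

# Group by a field, order each group by rating (descending), keep `limit` docs.
def group_posts(posts, field, limit: 1)
  posts.group_by { |post| post[field] }.map do |value, members|
    top = members.sort_by { |post| -post[:average_rating] }.first(limit)
    { value: value, results: top }
  end
end

group_posts(grouping_posts, :blog_id_str, limit: 2).each do |group|
  puts "#{group[:value]}: #{group[:results].map { |post| post[:id] }.inspect}"
end
# 1: [2, 1]
# 2: [4]
```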
Geospatial
Experimental and unreleased. The DSL may change.
Sunspot 2.0 supports geospatial features of Solr 3.1 and above.
Geospatial features require a field defined with latlon:
class Post < ActiveRecord::Base
searchable do
latlon(:location) { Sunspot::Util::Coordinates.new(lat, lon) }
end
end
Filter By Radius
Post.search do
with(:location).in_radius(32, -68, 100)
end
Filter By Radius (inexact with bbox)
Post.search do
with(:location).in_radius(32, -68, 100, :bbox => true)
end
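The difference between the exact radius filter and the bbox variant can be seen with a little geometry. The plain-Ruby sketch below uses the haversine formula for the exact test and a latitude/longitude box around the circle for the inexact one. It assumes distances are in kilometres and a spherical Earth of radius 6371 km; the helper names are hypothetical and this is not how Solr implements either filter.

```ruby
# Great-circle distance in kilometres (haversine formula).
def haversine_km(lat1, lon1, lat2, lon2, radius_km = 6371.0)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * radius_km * Math.asin(Math.sqrt(a))
end

# Exact test: is the point within radius_km of the center?
def within_radius?(point, center, radius_km)
  haversine_km(*point, *center) <= radius_km
end

# Inexact test: is the point inside the box that encloses the circle?
# Cheaper to evaluate, but it also admits points in the box's corners.
def within_bbox?(point, center, radius_km)
  lat, lon = point
  clat, clon = center
  km_per_degree = 6371.0 * Math::PI / 180
  dlat = radius_km / km_per_degree
  dlon = dlat / Math.cos(clat * Math::PI / 180)
  (lat - clat).abs <= dlat && (lon - clon).abs <= dlon
end

center = [32, -68]
corner = [32.8, -68.9]  # roughly 123 km away, but inside the 100 km box
puts within_radius?(corner, center, 100)  # false
puts within_bbox?(corner, center, 100)    # true
```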
Filter By Bounding Box
Post.search do
with(:location).in_bounding_box([45, -94], [46, -93])
end
Sort By Distance
Post.search do
order_by_geodist(:location, 32, -68)
end
Highlighting
Highlighting allows you to display snippets of the part of the document
that matched the query.
The fields you wish to highlight must be stored.
class Post < ActiveRecord::Base
searchable do
text :body, :stored => true
end
end
Highlighting matches on the body field, for instance, can be achieved
like:
search = Post.search do
fulltext "pizza" do
highlight :body
end
end
search.hits.each do |hit|
puts "Post ##{hit.primary_key}"
hit.highlights(:body).each do |highlight|
puts " " + highlight.format { |word| "*#{word}*" }
end
end
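The format block above receives each matched fragment and returns its decorated form. The sketch below imitates that idea in plain Ruby on a snippet whose matches are wrapped in marker strings. The method name and the marker values are hypothetical and chosen only for illustration; they say nothing about how Sunspot represents highlights internally.

```ruby
# Hypothetical stand-in for formatting a highlight: every fragment wrapped in
# the marker strings is passed to the block and replaced by the block's result.
def format_highlight(snippet, open_mark: '<<', close_mark: '>>')
  pattern = /#{Regexp.escape(open_mark)}(.*?)#{Regexp.escape(close_mark)}/
  snippet.gsub(pattern) { yield Regexp.last_match(1) }
end

puts format_highlight('the best <<pizza>> in town') { |word| "*#{word}*" }
# the best *pizza* in town
```

In a web view the block would typically wrap the fragment in markup instead of asterisks, taking care to escape the surrounding text.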
Functions
TODO
More Like This
Sunspot can extract related items using more_like_this. When searching
for similar items, you can pass a block with the following options:
- fields :field_1[, :field_2, ...]
- minimum_term_frequency ##
- minimum_document_frequency ##
- minimum_word_length ##
- maximum_word_length ##
- maximum_query_terms ##
- boost_by_relevance true/false
class Post < ActiveRecord::Base
searchable do
text :body, :more_like_this => true
end
end
post = Post.first
results = Sunspot.more_like_this(post) do
fields :body
minimum_term_frequency 5
end
Indexing In Depth
TODO
Index-Time Boosts
To specify that a field should be boosted in relation to other fields for
all queries, you can specify the boost at index time:
class Post < ActiveRecord::Base
searchable do
text :title, :boost => 5.0
text :body
end
end
Stored Fields
Stored fields keep an original (untokenized/unanalyzed) version of their
contents in Solr.
Stored fields allow data to be retrieved without also hitting the
underlying database (usually an SQL server). They are also required for
highlighting and More Like This queries.
Stored fields come at some performance cost in the Solr index, so use
them wisely.
class Post < ActiveRecord::Base
searchable do
text :body, :stored => true
end
end
Post.search.hits.each do |hit|
puts hit.stored(:body)
end
Hits vs. Results
Sunspot simply stores the type and primary key of objects in Solr.
When results are retrieved, those primary keys are used to load the
actual object (usually from an SQL database).
Post.search.results.each do |result|
puts result.body
end
To access information about the results without querying the underlying
database, use hits:
Post.search.hits.each do |hit|
puts hit.stored(:body)
end
If you need both the result (ORM-loaded object) and the Hit (e.g., for
faceting, highlighting, etc.), you can use the convenience method
each_hit_with_result:
Post.search.each_hit_with_result do |hit, result|
end
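The distinction between hits and results can be pictured with two in-memory stand-ins: a list of hits carrying only a type, a primary key, and any stored fields, and a "database" keyed by those primary keys. Everything in this sketch is hypothetical; it only illustrates why reading stored fields from hits avoids the database lookup that loading results requires.

```ruby
# Stand-in for the SQL database: records keyed by class name and primary key.
def fake_database
  {
    'Post' => {
      1 => { title: 'Best pizza', body: 'Thin crust, wood oven.' },
      2 => { title: 'Worst pizza', body: 'Soggy and cold.' }
    }
  }
end

# Stand-in for what Solr returns: type, primary key, and stored fields only.
def fake_hits
  [
    { class_name: 'Post', primary_key: 2, stored: { body: 'Soggy and cold.' } },
    { class_name: 'Post', primary_key: 1, stored: { body: 'Thin crust, wood oven.' } }
  ]
end

# "results": each hit is resolved to a full record through the database.
def results_for(hits, database)
  hits.map { |hit| database.fetch(hit[:class_name]).fetch(hit[:primary_key]) }
end

# "hits": stored values are read directly, with no database access.
def stored_values(hits, field)
  hits.map { |hit| hit[:stored][field] }
end

puts results_for(fake_hits, fake_database).map { |post| post[:title] }.inspect
# ["Worst pizza", "Best pizza"]
puts stored_values(fake_hits, :body).inspect
# ["Soggy and cold.", "Thin crust, wood oven."]
```

Fields that were not stored (title in this sketch) are only available on the loaded result, which is the trade-off the Stored Fields section describes.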
Reindexing Objects
If you are using Rails, objects are automatically indexed to Solr as a
part of the save callbacks.
If you make a change to the object's "schema" (code in the searchable
block), you must reindex all objects so the changes are reflected in Solr:
bundle exec rake sunspot:solr:reindex
bundle exec rake sunspot:solr:reindex[500,Post]
Use Without Rails
TODO
Manually Adjusting Solr Parameters
To add or modify parameters sent to Solr, use adjust_solr_params:
Post.search do
adjust_solr_params do |params|
params[:q] += " AND something_s:more"
end
end
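The block form used here is a common Ruby pattern: the library builds a parameter hash and yields it so the caller can edit it before the request is sent. The sketch below shows that pattern in isolation. The build_solr_params method and the parameter values are hypothetical and do not reflect Sunspot's internals.

```ruby
# Hypothetical stand-in for the hook: copy the base parameters, let the
# caller's block modify the copy, and return the final hash.
def build_solr_params(base_params)
  params = base_params.dup
  yield params if block_given?
  params
end

def adjusted_example
  build_solr_params({ q: 'pizza', rows: 30 }) do |params|
    params[:q] += ' AND something_s:more'  # append to an existing parameter
    params[:rows] = 10                     # overwrite a parameter
  end
end

puts adjusted_example[:q]     # pizza AND something_s:more
puts adjusted_example[:rows]  # 10
```

Because += is called on the existing value, a block like this assumes the parameter is already present; appending to a missing key would raise NoMethodError on nil.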
Session Proxies
TODO
Type Reference
TODO
Development
Running Tests
sunspot
Install the required gem dependencies:
cd /path/to/sunspot/sunspot
bundle install
Start a Solr instance on port 8983:
bundle exec sunspot-solr start -p 8983
Run the tests:
bundle exec rake spec
If desired, stop the Solr instance:
bundle exec sunspot-solr stop
sunspot_rails
Install the gem dependencies for sunspot:
cd /path/to/sunspot/sunspot
bundle install
Start a Solr instance on port 8983:
bundle exec sunspot-solr start -p 8983
Navigate to the sunspot_rails directory:
cd ../sunspot_rails
Run the tests:
rake spec
rake spec RAILS=3.1.1
If desired, stop the Solr instance:
cd ../sunspot
bundle exec sunspot-solr stop
Generating Documentation
Install the yard and redcarpet gems:
$ gem install yard redcarpet
Uninstall the rdiscount gem, if installed:
$ gem uninstall rdiscount
Generate the documentation from the topmost directory:
$ yardoc -o docs */lib/**/*.rb - README.md
Tutorials and Articles
License
Sunspot is distributed under the MIT License, copyright (c) 2008-2009 Mat Brown