# Elasticsearch::Extensions::Documents
A service wrapper to manage Elasticsearch index documents. Built on the
`elasticsearch-ruby` gem.
## Installation
Add this line to your application's Gemfile:

```ruby
gem 'elasticsearch-documents'
```

And then execute:

```shell
$ bundle
```

Or install it yourself as:

```shell
$ gem install elasticsearch-documents
```
## Configuration
Before making any calls to Elasticsearch you need to configure the Documents
extension. Configuration options namespaced under `client` are passed through
to `Elasticsearch::Client`.
```ruby
ES_MAPPINGS = {
  user: {
    _all: { analyzer: "snowball" },
    properties: {
      id:         { type: "integer", index: :not_analyzed },
      name:       { type: "string",  analyzer: "snowball" },
      bio:        { type: "string",  analyzer: "snowball" },
      updated_at: { type: "date",    include_in_all: false }
    }
  }
}

ES_SETTINGS = {
  index: {
    number_of_shards: 3,
    number_of_replicas: 2,
  }
}

Elasticsearch::Extensions::Documents.configure do |config|
  config.index_name    = 'test_index'
  config.mappings      = ES_MAPPINGS
  config.settings      = ES_SETTINGS
  config.client.url    = 'http://example.com:9200'
  config.client.logger = Rails.logger
end
```
If you are using this extension with a Rails application, this configuration
could live in an initializer such as `config/initializers/elasticsearch.rb`.
## Usage
The Documents extension builds on the `elasticsearch-ruby` gem, adding
conventions and helper classes to aid in the serialization and flow of data
between your application code and the `elasticsearch-ruby` client. To
accomplish this, application data models are serialized into instances of
`Document` classes. These `Document` instances are then indexed and searched
with wrappers around the `elasticsearch-ruby` client.
### Saving a Document
Assume your application has a `User` model. To index the `User` records you
define a `Document` that maps `User` records to a search index mapping.
```ruby
class UserDocument < Elasticsearch::Extensions::Documents::Document
  indexes_as_type :user

  def as_hash
    {
      name: object.name,
      title: object.title,
      bio: object.bio,
    }
  end
end

user = User.new
user_doc = UserDocument.new(user)

index = Elasticsearch::Extensions::Documents::Index.new
index.index(user_doc)
```
### Deleting a Document
Deleting a document is just as easy:
```ruby
user_doc = UserDocument.new(user)
index.delete(user_doc)
```
### Searching for Documents
Create classes which include `Elasticsearch::Extensions::Documents::Queryable`.
Then implement an `#as_hash` method to define the structure of an Elasticsearch
query using the Query DSL. This hash should be formatted appropriately to be
passed on to the `Elasticsearch::Transport::Client#search` method.
```ruby
class GeneralSiteSearchQuery
  include Elasticsearch::Extensions::Documents::Queryable

  def as_hash
    {
      index: 'test_index',
      body: {
        query: {
          query_string: {
            analyzer: "snowball",
            query: "something to search for",
          }
        }
      }
    }
  end
end
```
You could elaborate on this class with a constructor that takes the search
term and other options specific to your use case as arguments. The important
part is to define the `#as_hash` method.
You can then call the `#execute` method to run the query. The entire
Elasticsearch JSON response is returned, wrapped in a `Hashie::Mash` instance
so the results can be accessed with object notation instead of hash notation.
```ruby
query = GeneralSiteSearchQuery.new
results = query.execute

results.hits.total
results.hits.max_score
results.hits.hits.each { |hit| puts hit._source }
```
You can also define a custom result format by overriding the `#parse_results`
method in your Queryable class.
```ruby
class GeneralSiteSearchQuery
  include Elasticsearch::Extensions::Documents::Queryable

  def as_hash
    # ...
  end

  def parse_results(raw_results)
    CustomQueryResults.new(raw_results)
  end
end
```
Here `CustomQueryResults` is passed the `Hashie::Mash` results object and can
parse and coerce that data into whatever structure is most useful for your
application.
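For instance, a hypothetical `CustomQueryResults` might pull out just the hit
count and the source documents (the attribute names below are assumptions
based on the standard Elasticsearch response shape, not part of this gem):

```ruby
# Hypothetical result wrapper: extracts the pieces of the raw
# Elasticsearch response this application cares about.
class CustomQueryResults
  attr_reader :total, :records

  def initialize(raw_results)
    # raw_results is the wrapped Elasticsearch response, so the hits
    # are reachable with object notation.
    @total   = raw_results.hits.total
    @records = raw_results.hits.hits.map(&:_source)
  end
end
```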
### Index Management
The `Indexer` uses the `Elasticsearch::Extensions::Documents.configuration` to
create the index with the configured `#index_name`, `#mappings`, and
`#settings`.
```ruby
indexer = Elasticsearch::Extensions::Documents::Indexer.new
indexer.create_index
indexer.drop_index
```
The `Indexer` can `#bulk_index` documents, sending multiple documents to
Elasticsearch in a single request. This may be more efficient when
programmatically re-indexing entire sets of documents.
```ruby
user_documents = users.collect { |user| UserDocument.new(user) }
indexer.bulk_index(user_documents)
```
The `Indexer` accepts a block to the `#reindex` method to encapsulate the
process of dropping the old index, creating a new index with the latest
configured mappings and settings, and bulk indexing a set of documents into
the newly created index. The content of the block should be the code that
creates your documents in batches and passes them to the `#bulk_index` method
of the `Indexer`.
```ruby
indexer.reindex do |indexer|
  User.find_in_batches(batch_size: 500) do |batch|
    documents = batch.map { |user| UserDocument.new(user) }
    indexer.bulk_index(documents)
  end
end
```
By default the call to `#reindex` will create the index if it does not yet
exist. If the index already exists it is left in place, and the documents
provided are added or updated as needed. You can force the index to be dropped
and recreated during the reindex process by passing the `force_create: true`
option:
```ruby
indexer.reindex(force_create: true) do |indexer|
  # ...
end
```
Different reindexing strategies may be added in the future to allow "zero
downtime reindexing". This could be accomplished using index names with a
timestamp appended and index aliases.
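One possible shape for such a strategy, sketched directly against the
`elasticsearch-ruby` client's indices API (the `AliasedReindex` class, its
method names, and the overall flow are assumptions for illustration, not part
of this gem):

```ruby
# Sketch of zero-downtime reindexing: build a fresh timestamped index,
# populate it, then atomically repoint an alias that searches read from.
class AliasedReindex
  def initialize(client, alias_name)
    @client     = client
    @alias_name = alias_name
  end

  # e.g. "test_index_20240101120000"
  def timestamped_index_name(time = Time.now.utc)
    "#{@alias_name}_#{time.strftime('%Y%m%d%H%M%S')}"
  end

  def reindex
    new_index = timestamped_index_name
    @client.indices.create(index: new_index)
    yield new_index  # bulk index documents into new_index here
    # Swap the alias in a single atomic update_aliases call.
    @client.indices.update_aliases(body: {
      actions: [
        { remove: { index: "#{@alias_name}_*", alias: @alias_name } },
        { add:    { index: new_index,          alias: @alias_name } },
      ]
    })
  end
end
```

Because searches go through the alias, readers never see a half-built index;
the old index can be deleted once the swap succeeds.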
## Contributing
- Fork it
- Create your feature branch (`git checkout -b my-new-feature`)
- Commit your changes (`git commit -am 'Add some feature'`)
- Push to the branch (`git push origin my-new-feature`)
- Create new Pull Request