search_flip
Full-Featured Elasticsearch Ruby Client with a Chainable DSL
SearchFlip makes it simple to create index classes that correspond to
Elasticsearch indices and to manipulate, query and
aggregate these indices using a chainable, concise, yet powerful DSL.
SearchFlip supports Elasticsearch 2.x, 5.x, 6.x and 7.x. See the
Feature Support section for version-dependent features.
CommentIndex.search("hello world", default_field: "title").where(visible: true).aggregate(:user_id).sort(id: "desc")
CommentIndex.aggregate(:user_id) do |aggregation|
aggregation.aggregate(histogram: { date_histogram: { field: "created_at", interval: "month" }})
end
CommentIndex.range(:created_at, gt: Date.today - 1.week, lt: Date.today).where(state: ["approved", "pending"])
Updating from previous SearchFlip versions
Check out UPDATING.md for detailed instructions.
Comparison with other gems
There are already great Ruby gems for working with Elasticsearch, e.g.
searchkick and elasticsearch-ruby. However, they don't have a chainable API.
Compare for yourself.
Comment.search(
query: {
query_string: {
query: "hello world",
default_operator: "AND"
}
}
)
Comment.search("hello world", where: { available: true }, order: { id: "desc" }, aggs: [:username])
CommentIndex.search("hello world").where(available: true).sort(id: "desc").aggregate(:username)
Finally, SearchFlip comes with a minimal set of dependencies.
Reference Docs
SearchFlip has great documentation.
See for yourself at http://www.rubydoc.info/github/mrkamel/search_flip
Install
Add this line to your application's Gemfile:
gem 'search_flip'
and then execute
$ bundle
or install it via
$ gem install search_flip
Config
You can change global config options like:
SearchFlip::Config[:environment] = "development"
SearchFlip::Config[:base_url] = "http://127.0.0.1:9200"
Available config options are:
- index_prefix: adds a prefix to your index names automatically. This can be useful to separate the indices of e.g. testing and development environments.
- base_url: tells SearchFlip how to connect to your cluster.
- bulk_limit: a global limit for bulk requests.
- bulk_max_mb: a global limit for the payload of bulk requests.
- auto_refresh: tells SearchFlip to automatically refresh an index after import, index, delete, etc. operations. This is useful e.g. for testing. Defaults to false.
Usage
First, create a separate class for your index and include SearchFlip::Index.
class CommentIndex
include SearchFlip::Index
end
Then tell the Index about the index name, the corresponding model and how to
serialize the model for indexing.
class CommentIndex
include SearchFlip::Index
def self.index_name
"comments"
end
def self.model
Comment
end
def self.serialize(comment)
{
id: comment.id,
username: comment.username,
title: comment.title,
message: comment.message
}
end
end
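To make the shape of the serialized document concrete, here is a minimal sketch in plain Ruby. It assumes a Struct as a stand-in for the Comment model and uses the standard library's JSON module rather than SearchFlip's own JSON handling; serialize_comment is a hypothetical helper that mirrors CommentIndex.serialize, so the sketch runs without the gem or a cluster.

```ruby
require "json"

# Stand-in for an ActiveRecord model (assumption for illustration only).
Comment = Struct.new(:id, :username, :title, :message, keyword_init: true)

# Mirrors CommentIndex.serialize from above: one plain Hash per record.
def serialize_comment(comment)
  {
    id: comment.id,
    username: comment.username,
    title: comment.title,
    message: comment.message
  }
end

comment = Comment.new(id: 1, username: "mrkamel", title: "Hello", message: "Hello world")
document = serialize_comment(comment)

# Print the JSON document that would be handed to Elasticsearch for this record.
puts JSON.generate(document)
```

Keeping serialize a pure function from record to Hash makes it easy to unit test without touching Elasticsearch.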
Optionally, you can specify a custom type_name, but note that starting with
Elasticsearch 7, types are deprecated.
class CommentIndex
def self.type_name
"comment"
end
end
You can additionally specify an index_scope, which will automatically be
applied to scopes, e.g. ActiveRecord::Relation objects, passed to #import,
#index, etc. This can be used to preload associations that are used when
serializing records or to restrict the records you want to index.
class CommentIndex
def self.index_scope(scope)
scope.preload(:user)
end
end
CommentIndex.import(Comment.all)
To specify a custom mapping:
class CommentIndex
def self.mapping
{
properties: {
}
}
end
end
Please note that you need to specify the mapping without a type name, even for
Elasticsearch versions before 7, as SearchFlip will add the type name
automatically if necessary.
To specify index settings:
def self.index_settings
{
settings: {
number_of_shards: 10,
number_of_replicas: 2
}
}
end
Then you can interact with the index:
CommentIndex.create_index
CommentIndex.index_exists?
CommentIndex.delete_index
CommentIndex.update_mapping
CommentIndex.close_index
CommentIndex.open_index
Index records (automatically uses the Bulk API):
CommentIndex.import(Comment.all)
CommentIndex.import(Comment.first)
CommentIndex.import([Comment.find(1), Comment.find(2)])
CommentIndex.import(Comment.where("created_at > ?", Time.now - 7.days))
Query records:
CommentIndex.total_entries
CommentIndex.search("title:hello").records
CommentIndex.where(username: "mrkamel").total_entries
CommentIndex.aggregate(:username).aggregations(:username)
...
Please note that you can check the request that will be sent to Elasticsearch
by calling #request on the query:
CommentIndex.search("hello world").sort(id: "desc").aggregate(:username).request
Delete records:
CommentIndex.match_all.delete
CommentIndex.bulk do |indexer|
CommentIndex.match_all.find_each do |record|
indexer.delete record.id
end
end
When indexing or deleting documents, you can pass options to control the bulk
indexing and you can use all options provided by the Bulk API:
CommentIndex.import(Comment.first, { bulk_limit: 1_000 }, op_type: "create", routing: "routing_key")
CommentIndex.create(Comment.first, { bulk_max_mb: 100 }, routing: "routing_key")
CommentIndex.update(Comment.first, ...)
Check out the Elasticsearch Bulk API docs for more info, as well as
SearchFlip::Bulk for a complete list of the options available to control
SearchFlip's bulk indexing.
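For orientation, the Elasticsearch Bulk API expects a newline-delimited JSON body: one action line per operation, followed by a source line for every operation except delete. The following is a minimal sketch of that wire format in plain Ruby. bulk_payload is a hypothetical helper, not part of SearchFlip, and the routing key in the action line assumes a recent Elasticsearch version.

```ruby
require "json"

# Hypothetical helper (not part of SearchFlip): builds a newline-delimited
# JSON body in the format expected by the Elasticsearch Bulk API.
def bulk_payload(operations)
  lines = operations.flat_map do |op|
    # Action line: operation name plus metadata such as _id and routing.
    action = { op[:action] => { _id: op[:id] }.merge(op.fetch(:params, {})) }

    # Delete actions carry no document source line.
    op[:action] == :delete ? [action] : [action, op.fetch(:source)]
  end

  # Every line, including the last one, must be terminated by a newline.
  lines.map { |line| JSON.generate(line) }.join("\n") + "\n"
end

payload = bulk_payload([
  { action: :index, id: 1, source: { title: "Hello" }, params: { routing: "routing_key" } },
  { action: :delete, id: 2 }
])

puts payload
```

Options such as bulk_limit and bulk_max_mb determine how many of these lines are sent per request, whereas options like routing end up in the action lines.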
Working with Elasticsearch Aliases
You can use and manage Elasticsearch Aliases like the following:
class UserIndex
include SearchFlip::Index
def self.index_name
alias_name
end
def self.alias_name
"users"
end
end
Then, create an index, import the records and add the alias like:
new_user_index = UserIndex.with_settings(index_name: "users-#{SecureRandom.hex}")
new_user_index.create_index
new_user_index.import User.all
new_user_index.connection.update_aliases(actions: [
add: { index: new_user_index.index_name, alias: new_user_index.alias_name }
])
If the alias already exists, you have to remove it as well within the same
update_aliases call.
Please note: with_settings(index_name: '...') returns an anonymous (i.e.
temporary) class which inherits from UserIndex and overrides index_name.
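When the alias already points to an older index, both the removal and the addition can go into one actions list. The following is a minimal sketch of building such a list in plain Ruby. alias_swap_actions and the index names are hypothetical; only the remove/add action shape comes from the Elasticsearch aliases API.

```ruby
# Hypothetical helper (not part of SearchFlip): builds the actions for an
# alias switch, removing the alias from the old index (if any) and adding it
# to the new index within a single request.
def alias_swap_actions(alias_name:, new_index:, old_index: nil)
  actions = []
  actions << { remove: { index: old_index, alias: alias_name } } if old_index
  actions << { add: { index: new_index, alias: alias_name } }

  { actions: actions }
end

payload = alias_swap_actions(alias_name: "users", old_index: "users-old", new_index: "users-new")

p payload
```

The resulting hash has the same actions shape as the update_aliases call shown above.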
Chainable Methods
SearchFlip also supports more advanced use cases, e.g. post filters, filtered
aggregations or nested aggregations, via simple-to-use API methods.
Query/Filter Criteria Methods
SearchFlip provides powerful methods to query/filter Elasticsearch:
The .where method feels like ActiveRecord's where
and adds a bool filter clause to the request:
CommentIndex.where(reviewed: true)
CommentIndex.where(likes: 0 .. 10_000)
CommentIndex.where(state: ["approved", "rejected"])
The .where_not method is like .where, but excludes the matching documents:
CommentIndex.where_not(id: [1, 2, 3])
Use .range to add a range filter query:
CommentIndex.range(:created_at, gt: Date.today - 1.week, lt: Date.today)
Use .filter to add raw filter queries:
CommentIndex.filter(term: { state: "approved" })
Use .should to add raw should queries:
CommentIndex.should([
{ term: { state: "approved" } },
{ term: { user: "mrkamel" } },
])
Use .must to add raw must queries:
CommentIndex.must(term: { state: "approved" })
.must_not is like .must, but excludes the matching documents:
CommentIndex.must_not(term: { state: "approved" })
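The raw clauses passed to these methods correspond to the sections of an Elasticsearch bool query (must, must_not, should and filter). The following plain-Ruby sketch shows how such sections fit together in the raw query DSL. bool_query is a hypothetical helper, and the request SearchFlip actually builds may differ in detail, so use #request to inspect the real one.

```ruby
require "json"

# Hypothetical helper (not part of SearchFlip): assembles a raw Elasticsearch
# bool query from lists of raw clauses, omitting sections without clauses.
def bool_query(must: [], must_not: [], should: [], filter: [])
  sections = { must: must, must_not: must_not, should: should, filter: filter }

  { bool: sections.reject { |_, clauses| clauses.empty? } }
end

query = bool_query(
  filter: [{ term: { state: "approved" } }],
  must_not: [{ term: { user: "mrkamel" } }]
)

puts JSON.pretty_generate(query)
```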
Use .search to add a query string query, with AND as the default operator:
CommentIndex.search("hello world")
CommentIndex.search("state:approved")
CommentIndex.search("username:a*")
CommentIndex.search("state:approved OR state:rejected")
CommentIndex.search("hello world", default_operator: "OR")
Use .exists to add an exists query:
CommentIndex.exists(:state)
.exists_not is like .exists, but excludes the matching documents:
CommentIndex.exists_not(:state)
Use .match_all to match all documents:
CommentIndex.match_all
Use .match_none to match no documents at all:
CommentIndex.match_none
.all returns the criteria as is, or an empty criteria when called on the index
class directly. Useful for chaining.
CommentIndex.all
Sometimes, you want to convert the constraints of a SearchFlip query to a raw
query, e.g. to use it in a should clause:
CommentIndex.should([
CommentIndex.range(:likes_count, gt: 10).to_query,
CommentIndex.search("search term").to_query
])
#to_query returns all added queries and filters, including post filters, as a
raw query:
CommentIndex.where(state: "new").search("text").to_query
Post Query/Filter Criteria Methods
All query/filter criteria methods (#where, #where_not, #range, etc.) are
available in post filter mode as well, i.e. as filters/queries applied after
aggregations are calculated. Check out the Elasticsearch docs for further info.
query = CommentIndex.aggregate(:user_id)
query = query.post_where(reviewed: true)
query = query.post_search("username:a*")
Check out PostFilterable for a complete API reference.
Aggregations
SearchFlip lets you elegantly specify nested aggregations, no matter how deeply
nested:
query = OrderIndex.aggregate(:username, order: { revenue: "desc" }) do |aggregation|
aggregation.aggregate(revenue: { sum: { field: "price" }})
end
Generally, aggregation results returned by Elasticsearch are wrapped in a
SearchFlip::Result, which is basically a Hash with method-like access, so
that you can access them via:
query.aggregations(:username)["mrkamel"].revenue.value
Still, if you want to get the raw aggregations returned by Elasticsearch,
access them without supplying any aggregation name to #aggregations:
query.aggregations
query.aggregations["username"]["buckets"].detect { |bucket| bucket["key"] == "mrkamel" }["revenue"]["value"]
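To illustrate what navigating the raw response involves, here is a minimal sketch in plain Ruby. The hash follows the bucket layout Elasticsearch uses for a terms aggregation with a nested sum, but the values are made up, and indexing the buckets by key is only an approximation of the convenience that #aggregations(:username) provides.

```ruby
# Illustrative raw response fragment (made-up values) in the shape that
# Elasticsearch uses for a terms aggregation with a nested sum aggregation.
raw_aggregations = {
  "username" => {
    "buckets" => [
      { "key" => "mrkamel", "doc_count" => 3, "revenue" => { "value" => 42.0 } },
      { "key" => "someone", "doc_count" => 1, "revenue" => { "value" => 9.5 } }
    ]
  }
}

# Index the buckets by their key so that lookups no longer need #detect.
buckets_by_key = raw_aggregations["username"]["buckets"].each_with_object({}) do |bucket, hash|
  hash[bucket["key"]] = bucket
end

puts buckets_by_key["mrkamel"]["revenue"]["value"]
```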
Once again, the criteria methods (#where, #range, etc.) are available in
aggregations as well:
query = OrderIndex.aggregate(average_price: {}) do |aggregation|
aggregation = aggregation.match_all
aggregation = aggregation.where(user_id: current_user.id) if current_user
aggregation.aggregate(average_price: { avg: { field: "price" }})
end
query.aggregations(:average_price).average_price.value
Even various criteria for top hits aggregations can be specified elegantly:
query = ProductIndex.aggregate(sponsored: { top_hits: {} }) do |aggregation|
aggregation.sort(:rank).highlight(:title).source([:id, :title])
end
Check out Aggregatable as well as Aggregation for a complete API reference.
Suggestions
query = CommentIndex.suggest(:suggestion, text: "helo", term: { field: "message" })
query.suggestions(:suggestion).first["text"]
Highlighting
CommentIndex.highlight([:title, :message])
CommentIndex.highlight(:title).highlight(:description)
CommentIndex.highlight(:title, require_field_match: false)
CommentIndex.highlight(title: { type: "fvh" })
query = CommentIndex.highlight(:title).search("hello")
query.results[0]._hit.highlight.title
Other Criteria Methods
There are even more chainable criteria methods to make your life easier. For a
full list, check out the reference docs.
In case you want to restrict the returned fields, simply specify
the fields via #source:
CommentIndex.source([:id, :message]).search("hello world")
SearchFlip supports will_paginate and kaminari compatible pagination. Thus,
you can either use #paginate or #page in combination with #per:
CommentIndex.paginate(page: 3, per_page: 50)
CommentIndex.page(3).per(50)
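Elasticsearch itself paginates with from/size offsets rather than page numbers. As a minimal sketch of the arithmetic involved (pagination_params is a hypothetical helper, not part of SearchFlip), a 1-based page number translates to an offset like this:

```ruby
# Hypothetical helper (not part of SearchFlip): translates a 1-based page
# number into the from/size offsets used by Elasticsearch searches.
def pagination_params(page:, per_page:)
  # Treat anything below the first page as the first page.
  page = [page.to_i, 1].max

  { from: (page - 1) * per_page, size: per_page }
end

p pagination_params(page: 3, per_page: 50)
```

Page 3 with 50 records per page therefore skips the first 100 hits.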
Use #profile to enable query profiling:
query = CommentIndex.profile(true)
query.raw_response["profile"]
preload, eager_load and includes use the well-known methods from ActiveRecord
to load associated database records when fetching the respective
records themselves. This works with other ORMs as well, if
supported.
Using #preload:
CommentIndex.preload(:user, :post).records
PostIndex.preload(comments: :user).records
or #eager_load:
CommentIndex.eager_load(:user, :post).records
PostIndex.eager_load(comments: :user).records
or #includes:
CommentIndex.includes(:user, :post).records
PostIndex.includes(comments: :user).records
#find_in_batches fetches and yields records in batches using the Elasticsearch
scroll API. The batch size and scroll API timeout can be specified.
CommentIndex.search("hello world").find_in_batches(batch_size: 100) do |batch|
end
#find_results_in_batches is used like #find_in_batches, but yields the raw
results (as SearchFlip::Result objects) instead of database records.
CommentIndex.search("hello world").find_results_in_batches(batch_size: 100) do |batch|
end
#find_each is like #find_in_batches, but yields one record at a time.
CommentIndex.search("hello world").find_each(batch_size: 100) do |record|
end
#find_each_result is like #find_results_in_batches, but yields one result at a time.
CommentIndex.search("hello world").find_each_result(batch_size: 100) do |result|
end
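The relationship between the batch-wise and the per-record methods can be sketched in plain Ruby. This is not SearchFlip's implementation: each_batch and each_record are hypothetical helpers, and an in-memory array stands in for the scroll API.

```ruby
# Stand-in for a scrolling data source (assumption for illustration): yields
# the given records in slices of at most batch_size elements.
def each_batch(records, batch_size: 1_000)
  records.each_slice(batch_size) { |batch| yield batch }
end

# Per-record iteration expressed on top of batch iteration.
def each_record(records, batch_size: 1_000)
  each_batch(records, batch_size: batch_size) do |batch|
    batch.each { |record| yield record }
  end
end

batches = []
each_batch([1, 2, 3, 4, 5], batch_size: 2) { |batch| batches << batch }

records = []
each_record([1, 2, 3, 4, 5], batch_size: 2) { |record| records << record }

p batches
p records
```

Batch-wise iteration is the better fit when each batch is processed as a unit, e.g. for bulk writes, while per-record iteration keeps the calling code simpler.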
You can also use the underlying scroll API directly, i.e. without using the
higher-level scrolling methods:
query = CommentIndex.scroll(timeout: "5m")
until query.records.empty?
query = query.scroll(id: query.scroll_id, timeout: "5m")
end
Use #failsafe to prevent any exceptions from being raised for query string
syntax errors, Elasticsearch being unavailable, etc.
CommentIndex.search("invalid/request").execute
CommentIndex.search("invalid/request").failsafe(true).execute
You can merge criteria, i.e. combine the attributes (constraints, settings,
etc.) of two individual criteria:
CommentIndex.where(approved: true).merge(CommentIndex.search("hello"))
Specify a timeout to limit query processing time:
CommentIndex.timeout("3s").execute
Specify an HTTP timeout for the request that will be sent to Elasticsearch:
CommentIndex.http_timeout(3).execute
Activate early query termination to stop query processing after the specified
number of records has been found:
CommentIndex.terminate_after(10).execute
For further details and a full list of methods, check out the reference docs.
You can add a custom clause to the request via #custom:
CommentIndex.custom(custom_clause: '...')
This can be useful for Elasticsearch features not yet supported via criteria
methods by SearchFlip, custom plugin clauses, etc.
Custom Criteria Methods
To add custom criteria methods, you can add class methods to your index class.
class HotelIndex
def self.where_geo(lat:, lon:, distance:)
filter(geo_distance: { distance: distance, location: { lat: lat, lon: lon } })
end
end
HotelIndex.search("bed and breakfast").where_geo(lat: 53.57532, lon: 10.01534, distance: '50km').aggregate(:rating)
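Custom criteria methods chain well because they are built from methods that return a criteria object themselves. The following is a minimal, self-contained sketch of that pattern. The Criteria class is hypothetical and is not SearchFlip's implementation; it only demonstrates copy-on-chain behaviour with a where_geo style method.

```ruby
# Minimal, hypothetical criteria class (not SearchFlip's implementation):
# every method returns a modified copy, so calls can be chained freely and
# intermediate criteria can be reused without side effects.
class Criteria
  attr_reader :filters

  def initialize(filters = [])
    @filters = filters.freeze
  end

  # Adds a raw filter clause and returns a new criteria object.
  def filter(clause)
    self.class.new(filters + [clause])
  end

  # Custom criteria method built on top of #filter, analogous to where_geo.
  def where_geo(lat:, lon:, distance:)
    filter(geo_distance: { distance: distance, location: { lat: lat, lon: lon } })
  end
end

base = Criteria.new.filter(term: { stars: 4 })
nearby = base.where_geo(lat: 53.57532, lon: 10.01534, distance: "50km")

p base.filters.size
p nearby.filters.size
```

Because base is left untouched, it can serve as the starting point for several differently filtered queries.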
Using multiple Elasticsearch clusters
To use multiple Elasticsearch clusters, specify a connection within your
indices:
MyConnection = SearchFlip::Connection.new(base_url: "http://elasticsearch.host:9200")
class MyIndex
include SearchFlip::Index
def self.connection
MyConnection
end
end
This allows you to use different clusters per index, e.g. when migrating
indices to new versions of Elasticsearch.
You can specify basic auth, additional headers, request timeouts, etc via:
http_client = SearchFlip::HTTPClient.new
http_client = http_client.basic_auth(user: "username", pass: "password")
http_client = http_client.auth("Bearer VGhlIEhUVFAgR2VtLCBST0NLUw")
http_client = http_client.via("proxy.host", 8080)
http_client = http_client.headers(key: "value")
http_client = http_client.timeout(20)
SearchFlip::Connection.new(base_url: "...", http_client: http_client)
AWS Elasticsearch / Signed Requests
To use SearchFlip with AWS Elasticsearch and signed requests, you have to add
aws-sdk-core to your Gemfile and tell SearchFlip to use the
SearchFlip::AwsSigv4Plugin:
require "search_flip/aws_sigv4_plugin"
MyConnection = SearchFlip::Connection.new(
base_url: "https://your-elasticsearch-cluster.es.amazonaws.com",
http_client: SearchFlip::HTTPClient.new(
plugins: [
SearchFlip::AwsSigv4Plugin.new(
region: "...",
access_key_id: "...",
secret_access_key: "..."
)
]
)
)
Again, in your index you need to specify this connection:
class MyIndex
include SearchFlip::Index
def self.connection
MyConnection
end
end
Routing and other index-time options
Override index_options in case you want to use routing or pass other
index-time options:
class CommentIndex
include SearchFlip::Index
def self.index_options(comment)
{
routing: comment.user_id,
version: comment.version,
version_type: "external_gte"
}
end
end
These options will be passed whenever records get indexed, deleted, etc.
Instrumentation
SearchFlip supports instrumentation for request execution via
ActiveSupport::Notifications compatible instrumenters, e.g. to allow global
performance tracing.
To use instrumentation, configure the instrumenter:
SearchFlip::Config[:instrumenter] = ActiveSupport::Notifications.notifier
Subsequently, you can subscribe to notifications for request.search_flip:
ActiveSupport::Notifications.subscribe("request.search_flip") do |name, start, finish, id, payload|
payload[:index]
payload[:request]
payload[:response]
end
A notification will be sent for every request that is sent to Elasticsearch.
Non-ActiveRecord models
SearchFlip ships with built-in support for ActiveRecord models, but using
non-ActiveRecord models is very easy. The model must implement a find_each
class method and the index class needs to implement Index.record_id and
Index.fetch_records. The default implementations for the index class are as
follows:
class MyIndex
include SearchFlip::Index
def self.record_id(object)
object.id
end
def self.fetch_records(ids)
model.where(id: ids)
end
end
Thus, if your ORM supports .find_each, #id and #where, you are already
good to go. Otherwise, simply add custom implementations of those methods
that work with whatever ORM you use.
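As a minimal sketch of what such a model might look like, the following plain-Ruby class keeps its records in memory and provides find_each, #id and where. InMemoryComment and CommentIndexHooks are hypothetical names. The exact arguments SearchFlip passes to these methods should be checked against the reference docs.

```ruby
# Hypothetical in-memory model (illustration only, no ORM involved).
class InMemoryComment
  STORE = {}

  attr_reader :id, :message

  def initialize(id:, message:)
    @id = id
    @message = message
    STORE[id] = self
  end

  # Yields every stored record, one at a time.
  def self.find_each(&block)
    STORE.values.each(&block)
  end

  # Supports the where(id: ids) call used by the default fetch_records.
  def self.where(id:)
    STORE.values_at(*Array(id)).compact
  end
end

# The two index-level hooks, written as plain module functions here. In a real
# index class they would be class methods next to include SearchFlip::Index.
module CommentIndexHooks
  def self.record_id(object)
    object.id
  end

  def self.fetch_records(ids)
    InMemoryComment.where(id: ids)
  end
end

first = InMemoryComment.new(id: 1, message: "first")
InMemoryComment.new(id: 2, message: "second")

p CommentIndexHooks.record_id(first)
p CommentIndexHooks.fetch_records([2, 3]).map(&:message)
```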
JSON
SearchFlip uses the Oj gem to generate
JSON. More concretely, SearchFlip uses:
Oj.dump({ key: "value" }, mode: :custom, use_to_json: true, time_format: :xmlschema, bigdecimal_as_decimal: false)
The use_to_json option is used for maximum compatibility, most importantly
when using Rails ActiveSupport::TimeWithZone timestamps, which Oj cannot
serialize natively. However, use_to_json adds performance overhead. You can
change the JSON options via:
SearchFlip::Config[:json_options] = {
mode: :custom,
use_to_json: false,
time_format: :xmlschema,
bigdecimal_as_decimal: false
}
However, you then have to convert timestamps manually for indexation via e.g.:
class MyIndex
def self.serialize(model)
{
created_at: model.created_at.to_time
}
end
end
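An alternative, shown here only as a sketch under stated assumptions, is to emit timestamps as ISO 8601 strings inside serialize, so the result does not depend on how the JSON encoder treats time objects. The example uses the standard library's JSON and Time#xmlschema instead of Oj; Post and serialize_post are hypothetical names.

```ruby
require "json"
require "time"

# Stand-in for a model with a timestamp (assumption for illustration only).
Post = Struct.new(:id, :created_at)

# Hypothetical serialize method that emits the timestamp as an ISO 8601
# string in UTC, independent of the JSON encoder's handling of Time objects.
def serialize_post(post)
  { id: post.id, created_at: post.created_at.getutc.xmlschema }
end

post = Post.new(1, Time.utc(2020, 1, 2, 3, 4, 5))

puts JSON.generate(serialize_post(post))
```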
Please check out the oj docs for more details.
Feature Support
- for Elasticsearch 2.x, the delete-by-query plugin is required to delete
records via queries
- #match_none is only available with Elasticsearch >= 5
- #track_total_hits is only available with Elasticsearch >= 7
Keeping your Models and Indices in Sync
Besides the most basic approach to get you started, SearchFlip currently doesn't
ship with any means to automatically keep your models and indices in sync,
because every method is very much bound to the concrete environment and depends
on your concrete requirements. In addition, the methods to achieve model/index
consistency can get arbitrarily complex and we want to keep this bloat out of
the SearchFlip codebase.
class Comment < ActiveRecord::Base
include SearchFlip::Model
notifies_index(CommentIndex)
end
It uses after_commit (if applicable; after_save, after_destroy and
after_touch otherwise) hooks to synchronously update the index when your
model changes.
Semantic Versioning
SearchFlip uses Semantic Versioning (SemVer).
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
Running the test suite
Running the tests is super easy. The test suite uses sqlite, such that you only
need to install Elasticsearch. You can install Elasticsearch on your own, or
you can e.g. use docker-compose:
$ cd search_flip
$ sudo ES_IMAGE=elasticsearch:5.4 docker-compose up
$ rspec
That's it.