CassandraStore
CassandraStore is a fun-to-use ORM for Cassandra with a chainable,
ActiveRecord-like DSL for querying, inserting, updating and deleting records
plus built-in migration support. It is built on top of the cassandra-driver
gem and uses its built-in automated paging, which drastically reduces the
complexity of the code base.
Install
Add this line to your application's Gemfile:
gem 'cassandra-store'
And then execute:
$ bundle
Or install it yourself as:
$ gem install cassandra-store
Usage
Connecting
First and foremost, you need to connect to your Cassandra cluster like so:
CassandraStore::Base.configure(
  hosts: ["127.0.0.1"],
  keyspace: "my_keyspace",
  cluster_settings: { consistency: :quorum }
)
When using Rails, you want to do that in an initializer. If you do not yet have
a keyspace, you additionally want to pass replication settings:
CassandraStore::Base.configure(
  hosts: ["127.0.0.1"],
  keyspace: "my_keyspace",
  cluster_settings: { consistency: :quorum },
  replication: { class: 'SimpleStrategy', replication_factor: 1 }
)
Afterwards, you can create/drop the specified keyspace:
rake cassandra:keyspace:create
rake cassandra:keyspace:drop
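For orientation, the replication hash above corresponds to the replication
options of a CQL CREATE KEYSPACE statement. The following is a minimal sketch
of that mapping in plain Ruby. It is not taken from CassandraStore's source;
create_keyspace_cql and cql_literal are hypothetical helper names, and the
exact statement the rake task issues may differ.

```ruby
# Hypothetical helper: quote strings and symbols, leave numbers bare.
def cql_literal(value)
  value.is_a?(Numeric) ? value.to_s : "'#{value}'"
end

# Hypothetical helper: build a CREATE KEYSPACE statement from a
# replication hash like the one passed to configure.
def create_keyspace_cql(keyspace, replication)
  options = replication.map { |key, value| "'#{key}': #{cql_literal(value)}" }.join(", ")

  "CREATE KEYSPACE IF NOT EXISTS #{keyspace} WITH REPLICATION = { #{options} }"
end

puts create_keyspace_cql("my_keyspace", { class: "SimpleStrategy", replication_factor: 1 })
# prints: CREATE KEYSPACE IF NOT EXISTS my_keyspace WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 1 }
```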
Migrations
If you are on Rails and you don't have any tables yet, you can add migrations
now. There is no generator yet, so you have to create them manually:
class CreatePosts < CassandraStore::Migration
  def up
    execute <<-CQL
      CREATE TABLE posts (
        user TEXT,
        domain TEXT,
        id TIMEUUID,
        message TEXT,
        PRIMARY KEY ((user, domain), id)
      )
    CQL
  end

  def down
    execute "DROP TABLE posts"
  end
end
Afterwards, simply run rake cassandra:migrate.
Models
Creating models couldn't be easier:
class Post < CassandraStore::Base
  column :user, :text, partition_key: true
  column :domain, :text, partition_key: true
  column :id, :timeuuid, clustering_key: true
  column :message, :text

  validates_presence_of :user, :domain, :message

  before_create do
    self.id ||= generate_timeuuid
  end
end
Let's check this out in detail:
column :user, :text, partition_key: true
column :domain, :text, partition_key: true
tells CassandraStore that your partition key consists of the user column as
well as the domain column. For more information regarding partition keys and
the data model of Cassandra, please check out the Cassandra docs. Afterwards,
the clustering/sorting key is specified via:
column :id, :timeuuid, clustering_key: true
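To illustrate what these keys imply for the data layout, here is a small
plain-Ruby sketch, independent of CassandraStore: rows sharing the same
(user, domain) pair belong to one partition, and within a partition they are
ordered by the clustering key. Row and group_into_partitions are hypothetical
names, and plain integers stand in for TIMEUUID values.

```ruby
# Hypothetical in-memory stand-in for a row of the posts table.
Row = Struct.new(:user, :domain, :id, :message)

# Group rows by the composite partition key and sort each group by the
# clustering key, mimicking how the rows of a partition are ordered.
def group_into_partitions(rows)
  rows
    .group_by { |row| [row.user, row.domain] }
    .transform_values { |group| group.sort_by(&:id) }
end

rows = [
  Row.new("mrkamel", "example.com", 3, "third"),
  Row.new("alice", "example.org", 1, "hello"),
  Row.new("mrkamel", "example.com", 1, "first"),
  Row.new("mrkamel", "example.com", 2, "second")
]

group_into_partitions(rows).each do |(user, domain), group|
  puts "#{user}/#{domain}: #{group.map(&:message).join(', ')}"
end
```

This is also why queries such as Post.where(user: ..., domain: ...) name both
partition key columns: together they identify a single partition.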
The id is assigned here:
self.id ||= generate_timeuuid
Please note that CassandraStore never auto-assigns any values for you; you
have to assign them yourself. You can pass a timestamp to generate_timeuuid
as well:
generate_timeuuid(Time.now)
This is desirable when you have timestamp columns as well and you want them
to match with your timeuuid key.
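The reason a timeuuid can match a timestamp column is that a TIMEUUID is a
version 1 UUID, which embeds a 60-bit count of 100-nanosecond intervals since
1582-10-15. The following is a minimal plain-Ruby sketch of that layout. It is
not CassandraStore's or the driver's implementation; sketch_timeuuid and
timeuuid_to_time are hypothetical helpers, and the clock sequence and node
parts are simply random here.

```ruby
require "securerandom"

# Number of 100ns intervals between 1582-10-15 and 1970-01-01 (UTC).
GREGORIAN_OFFSET = 0x01B21DD213814000

# Hypothetical helper: build a version 1 UUID string for the given time.
def sketch_timeuuid(time)
  timestamp = time.to_i * 10_000_000 + time.nsec / 100 + GREGORIAN_OFFSET

  time_low = timestamp & 0xFFFFFFFF
  time_mid = (timestamp >> 32) & 0xFFFF
  time_hi = ((timestamp >> 48) & 0x0FFF) | 0x1000              # version 1
  clock_seq = SecureRandom.random_number(1 << 14) | 0x8000     # RFC 4122 variant
  node = SecureRandom.random_number(1 << 48) | 0x010000000000  # random node, multicast bit set

  format("%08x-%04x-%04x-%04x-%012x", time_low, time_mid, time_hi, clock_seq, node)
end

# Hypothetical helper: recover the embedded time from a version 1 UUID string.
def timeuuid_to_time(uuid)
  low, mid, hi = uuid.split("-")
  timestamp = ((hi.to_i(16) & 0x0FFF) << 48) | (mid.to_i(16) << 32) | low.to_i(16)
  unix_100ns = timestamp - GREGORIAN_OFFSET

  Time.at(unix_100ns / 10_000_000, (unix_100ns % 10_000_000) * 100, :nsec).utc
end

now = Time.now
uuid = sketch_timeuuid(now)
puts uuid
puts timeuuid_to_time(uuid)
```

Note that the embedded time has a resolution of 100 nanoseconds, so anything
finer than that is dropped in the round trip.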
Similarly, when using UUID instead of TIMEUUID, you have to use generate_uuid
instead.
In addition, you can of course use all kinds of validations, hooks, etc.
Querying
The interface for dealing with records and querying them is very similar
to the interface of ActiveRecord:
Post.create!(user: "mrkamel", ...)
Post.create(...)
Post.new(...).save
Post.new(...).save!
Post.first.delete
Post.first.destroy
CassandraStore supports comprehensive query methods in a chainable way:
Post.all
Post.where(user: "mrkamel", domain: "example.com")
Post.where_cql("user = :user", user: "mrkamel")
Post.where(...).limit(10)
Post.where(...).order(id: "asc")
Post.select(:user, :domain).distinct
Post.select(:user, :domain)
Please note that when using select, an array of hashes is returned instead of
an array of Post objects.
Post.where(...).count
Post.where(...).first
Post.where(...).find_each(batch_size: 100) do |post|
end
Post.where(...).find_in_batches(batch_size: 100) do |batch|
end
Post.where(...).update_all("message = 'test'")
Post.where(...).update_all(message: "test")
Post.where(...).delete_all
Please note that delete_all can time out when dealing with large numbers of
records. In that case you usually want to use delete_in_batches instead,
which runs find_in_batches iteratively and then deletes each batch.
Post.where(...).delete_in_batches
Again, please note that delete_in_batches runs find_in_batches iteratively
and then deletes each batch. When dealing with large numbers of records to
delete, you usually want to use delete_in_batches instead of delete_all, as
delete_all can time out.
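The relationship between find_in_batches, find_each and delete_in_batches
described above can be sketched with an in-memory stand-in. This is only an
illustration in plain Ruby, assuming unique records. InMemoryRelation is a
hypothetical class, not part of CassandraStore, and the real methods page
through Cassandra rather than slicing an array.

```ruby
# Hypothetical in-memory stand-in for a chainable relation.
class InMemoryRelation
  def initialize(records)
    @records = records.dup
  end

  # Yield the records in slices of at most batch_size.
  def find_in_batches(batch_size: 1_000)
    return enum_for(:find_in_batches, batch_size: batch_size) unless block_given?

    # Iterate over a copy so that deleting inside the block is safe.
    @records.dup.each_slice(batch_size) { |batch| yield batch }
  end

  # Yield single records, fetched batch by batch.
  def find_each(batch_size: 1_000, &block)
    find_in_batches(batch_size: batch_size) { |batch| batch.each(&block) }
  end

  # Fetch batch by batch and delete each batch, returning the number deleted.
  def delete_in_batches(batch_size: 1_000)
    deleted = 0

    find_in_batches(batch_size: batch_size) do |batch|
      @records -= batch
      deleted += batch.size
    end

    deleted
  end

  def count
    @records.size
  end
end

relation = InMemoryRelation.new((1..10).to_a)
relation.find_in_batches(batch_size: 4) { |batch| puts batch.inspect }
puts relation.delete_in_batches(batch_size: 3)
puts relation.count
```

Each delete here touches at most batch_size records, which is what keeps a
single operation small compared to one delete over everything.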
Post.truncate_table
Deletes all records from the table. This is much faster than delete_all or
delete_in_batches. However, it is not chainable, so your only option is to
remove all records from the table.
Semantic Versioning
CassandraStore uses Semantic Versioning: SemVer
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request