datastar
"Now witness the power of this FULLY ARMED AND OPERATIONAL DATASTAR!"
npm install datastar --save
Contributing
This module is open source! That means we want your contributions!
- Install it and use it in your project.
- Log bugs, issues, and questions on GitHub
- Make pull requests and help us make it better!
Usage
var Datastar = require('datastar');
var datastar = new Datastar({
config: {
user: 'cassandra',
password: 'cassandra',
keyspace: 'a_fancy_keyspace',
hosts: ['127.0.0.1', 'host2', 'host3']
}
}).connect();
var cql = datastar.schema.cql;
var Artist = datastar.define('artist', {
schema: datastar.schema.object({
artist_id: cql.uuid(),
name: cql.text(),
create_date: cql.timestamp({ default: 'create' }),
update_date: cql.timestamp({ default: 'update' }),
members: cql.set(cql.text()),
related_artists: cql.set(cql.uuid()).allow(null),
traits: cql.set(cql.text()),
metadata: cql.map(cql.text(), cql.text()).allow(null)
}).partitionKey('artist_id'),
with: {
compaction: {
class: 'LeveledCompactionStrategy'
}
}
});
Artist.create({
artistId: '12345678-1234-1234-1234-123456789012',
}, function (err, res) {
if (err) return;
});
Warnings
- Define schemas in snake_case. Everywhere outside of schema definitions, always use camelCase.
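The convention can be pictured as a key translation between the two casings. The helpers below are a minimal, self-contained sketch: the names `toSnakeCase`, `toCamelCase`, and `convertKeys` are hypothetical and are not part of datastar's API, which performs its own conversion internally.

```javascript
// Hypothetical helpers illustrating the casing convention; not part of datastar.
function toSnakeCase(key) {
  return key.replace(/[A-Z]/g, function (ch) {
    return '_' + ch.toLowerCase();
  });
}

function toCamelCase(key) {
  return key.replace(/_([a-z])/g, function (match, ch) {
    return ch.toUpperCase();
  });
}

// Returns a copy of `obj` with every key passed through `convert`.
function convertKeys(obj, convert) {
  return Object.keys(obj).reduce(function (acc, key) {
    acc[convert(key)] = obj[key];
    return acc;
  }, {});
}

// What application code passes in (camelCase) ...
var entity = {
  artistId: '12345678-1234-1234-1234-123456789012',
  createDate: new Date(0)
};
// ... and the column names the schema declares (snake_case).
var row = convertKeys(entity, toSnakeCase);
```

Here `row` ends up with the keys `artist_id` and `create_date`, matching how the schema is written.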
API Documentation
Constructor
All constructor options are passed directly to Priam so any options Priam supports, Datastar supports.
var Datastar = require('datastar');
var datastar = new Datastar({
config: {
user: 'cassandra',
password: 'cassandra',
keyspace: 'a_fancy_keyspace',
hosts: ['127.0.0.1', 'host2', 'host3']
}
});
Connect
Given a set of Cassandra connection information, connect to the Cassandra cluster. This must be called before you do any model creation, finding, schema definition, etc.
var datastar = new Datastar(...);
datastar = datastar.connect();
Define
Define is the primary way to create Models while using Datastar. See the following long example for an explanation of all options to define. Define your schemas in snake case, but use camel case everywhere else!
var Album = datastar.define('album', {
ensureTables: true,
schema: datastar.schema.object({
album_id: cql.uuid(),
artist_id: cql.uuid(),
name: cql.text(),
track_list: cql.list(cql.text()),
song_list: cql.list(cql.uuid()),
release_date: cql.timestamp(),
create_date: cql.timestamp(),
producer: cql.text()
}).partitionKey('artist_id')
.clusteringKey('album_id'),
with: {
compaction: {
class: 'LeveledCompactionStrategy'
},
gcGraceSeconds: 9860,
orderBy: {
key: 'album_id',
order: 'desc'
}
}
});
Consistency
Since Cassandra is a distributed database, we need a way to specify what our consistency threshold is for both reading from and writing to the database. We can set consistency, readConsistency and writeConsistency when we define our model. Use consistency if you want to set both to the same threshold; otherwise you can go more granular with readConsistency and writeConsistency. Consistency is defined using a camelCase string that corresponds to a consistency level that Cassandra allows.
var Album = datastar.define('album', {
schema: albumSchema,
readConsistency: 'localQuorum',
writeConsistency: 'one'
});
We also support setting consistency on a per-operation basis if you want to override the default set on the model for specific cases.
Album.create({
entity: {
albumId: uuid.v4(),
name: 'nevermind'
},
consistency: 'localOne'
}, function (err) {
if (err) return;
});
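One way to picture how these settings combine is a small resolver. This is a sketch only: `resolveConsistency` is a hypothetical function, and the precedence it encodes (per-operation value, then the granular model option, then the shared model option) is an assumption for illustration rather than documented datastar behavior.

```javascript
// Hypothetical resolver; the precedence shown here is an assumption.
function resolveConsistency(kind, modelOptions, operationOptions) {
  var op = operationOptions || {};
  var specific = kind === 'read' ? 'readConsistency' : 'writeConsistency';
  return op.consistency            // per-operation override
    || modelOptions[specific]      // granular model default
    || modelOptions.consistency;   // shared model default (may be undefined)
}

var albumOptions = { readConsistency: 'localQuorum', writeConsistency: 'one' };

var readLevel = resolveConsistency('read', albumOptions);
var writeLevel = resolveConsistency('write', albumOptions);
var overridden = resolveConsistency('write', albumOptions, { consistency: 'localOne' });
```

With these inputs, reads resolve to `localQuorum`, writes to `one`, and the overridden write to `localOne`.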
We also support setting an optional expiration period called TTL (Time To Live) for data expiration and removal. You can set the ttl option either when creating the data entry or when updating it, e.g. { ttl: 3 } means the data expires after 3 seconds. Updating the data entry resets its TTL.
Album.create({
entity: {
albumId: uuid.v4(),
name: 'nevermind'
},
ttl: 3
}, function (err) {
if (err) return;
});
OR
Album.update({
entity: {
albumId: uuid.v4(),
name: 'whatever'
},
ttl: 5
}, function (err) {
if (err) return;
});
Note: The ttl option must be set on every update call. It is not maintained from the initial entity creation. If you don't set it in an update call, the entity will not have a TTL set.
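In CQL, a TTL is attached to an individual statement with `USING TTL`, which is consistent with the note above: a statement issued without it carries no expiry. The builders below are a hedged sketch of that idea; `insertCql` and `updateCql` are hypothetical names, and datastar's own statement-builder may produce different text.

```javascript
// Hypothetical statement builders; datastar's own statement-builder may differ.
function insertCql(table, columns, ttl) {
  var placeholders = columns.map(function () { return '?'; });
  var cql = 'INSERT INTO ' + table +
    ' (' + columns.join(', ') + ') VALUES (' + placeholders.join(', ') + ')';
  // The TTL clause is only present when a ttl was supplied for this call.
  return ttl ? cql + ' USING TTL ' + ttl : cql;
}

function updateCql(table, setColumns, keyColumns, ttl) {
  var assignments = setColumns.map(function (c) { return c + ' = ?'; });
  var conditions = keyColumns.map(function (c) { return c + ' = ?'; });
  return 'UPDATE ' + table +
    (ttl ? ' USING TTL ' + ttl : '') +
    ' SET ' + assignments.join(', ') +
    ' WHERE ' + conditions.join(' AND ');
}

var withTtl = insertCql('album', ['album_id', 'name'], 3);
var updateWithoutTtl = updateCql('album', ['name'], ['album_id']);
```

The second statement has no `USING TTL` clause at all, so nothing from the earlier insert's TTL is carried over by the statement itself.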
Schema Validation
Schemas are validated on .define. As you call each function on a Model, such as create or find, the calls to the functions are validated against the schema. See here for detailed information on supported CQL data types.
Validation is performed using joi.
Notes on null:
- Use the .allow(null) function of joi on any property you want to allow to be null when creating your schema.
The following table shows how certain data types are validated:
CQL Data Type | Validation Type |
---|---|
ascii | cql.ascii() |
bigint | cql.bigint() |
blob | cql.blob() |
boolean | cql.boolean() |
counter | cql.counter() |
decimal | cql.decimal() |
double | cql.double() |
float | cql.float() |
inet | cql.inet() |
text | cql.text() |
timestamp | cql.timestamp() |
timeuuid | cql.timeuuid() |
uuid | cql.uuid() |
int | cql.int() |
varchar | cql.varchar() |
varint | cql.varint() |
map | cql.map(cql.text(), cql.text()) |
set | cql.set(cql.text()) |
Lookup tables
This functionality, which we built into datastar, exists in order to optimize queries for other unique keys on your main table. By default Cassandra has the ability to do this for you by building an index for that key. The only problem is that the current backend storage of Cassandra can make these indexes very slow. If this is a high-traffic query pattern, this could lead to issues with your database. We work around this limitation by simply creating more tables and doing an extra write to the database. Since Cassandra is optimized for handling a heavy write workload, this becomes trivial. We take care of the complexity of keeping these tables in sync for you. Let's look at an example by modifying our Artist model.
var Artist = datastar.define('artist', {
schema: datastar.schema.object({
artist_id: cql.uuid(),
name: cql.text(),
create_date: cql.timestamp({ default: 'create' }),
update_date: cql.timestamp({ default: 'update' }),
members: cql.set(cql.text()),
related_artists: cql.set(cql.uuid()).allow(null),
traits: cql.set(cql.text()),
metadata: cql.map(cql.text(), cql.text()).allow(null)
}).partitionKey('artist_id')
.lookupKeys('name'),
with: {
compaction: {
class: 'LeveledCompactionStrategy'
}
}
});
In our example above we added name as a lookupKey to our Artist model. This means a few things:
- We must provide a name when we create an Artist. name, as with any lookupKey, MUST be unique.
- We must provide the name when removing an Artist, as it is now the primary key of a different table.
- When updating an Artist, a fully formed previous value must be given or else an implicit find operation will happen in order to properly assess if a lookupKey has changed.
Keeping these restrictions in mind, we can now have fast lookups by name without having to worry about too much.
Artist.findOne({
name: 'kurt cobain'
}, function (err, artist) {
if (err) return;
console.log('Fetched artist by name!');
});
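The bookkeeping behind a lookup table can be illustrated with two in-memory maps standing in for the main table and the lookup table. This is a conceptual sketch under stated assumptions, not datastar's implementation: `createArtist`, `updateArtist`, and `findByName` are hypothetical names, and real writes go to Cassandra rather than to JavaScript objects.

```javascript
// In-memory stand-ins for the main table and its lookup table.
// This only illustrates the bookkeeping; it is not datastar's implementation.
var artistsById = Object.create(null);     // main table, keyed by artistId
var artistsByName = Object.create(null);   // lookup table, keyed by name

function createArtist(artist) {
  if (artistsByName[artist.name]) throw new Error('name must be unique');
  artistsById[artist.artistId] = artist;
  artistsByName[artist.name] = artist;     // the extra write
}

function updateArtist(previous, changes) {
  var next = Object.assign({}, previous, changes);
  artistsById[next.artistId] = next;
  // The previous value is needed to know which lookup row to retire.
  if (previous.name !== next.name) delete artistsByName[previous.name];
  artistsByName[next.name] = next;
  return next;
}

function findByName(name) {
  return artistsByName[name] || null;
}

createArtist({ artistId: '00000000-0000-0000-0000-000000000001', name: 'nirvana' });
updateArtist(findByName('nirvana'), { name: 'Nirvana' });
```

After the update, the old name no longer resolves and the new one does. Without the previous record there would be no way to know that the `nirvana` row had to be removed, which is why the restrictions above exist.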
Model.create
Once you have created a Model using datastar.define you can start creating records against the Cassandra database you have configured in your options or passed to datastar.connect:
var cql = datastar.schema.cql;
var Beverage = datastar.define('beverage', {
schema: datastar.schema.object({
'beverage_id': cql.uuid({ default: 'v4' }),
'name': cql.text(),
'type': cql.text().allow(null),
'sugar': cql.int(),
'notes': cql.text(),
'other_ingredients': cql.map(cql.text(), cql.text()),
'tags': cql.set(cql.text()),
'siblings': cql.set(cql.uuid())
}).partitionKey('beverage_id'),
});
Beverage.create({
name: 'brawndo',
sugar: 1000000,
notes: "It's got what plants crave"
}, function (err) {
if (err) return;
});
The create method (like all CRUD methods) accepts four different argument forms for convenience:
Model.create(properties);
Model.create({ entity: properties });
Model.create({ entities: properties });
Model.create({ entities: [properties, properties2] });
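To make the equivalence of these forms concrete, here is a minimal sketch of how they could be reduced to one array of entities. `normalizeEntities` is a hypothetical helper written for this illustration; how datastar handles the arguments internally may differ.

```javascript
// Hypothetical normalizer; datastar's internal handling may differ.
function normalizeEntities(options) {
  var entities;
  if (options.entities) {
    entities = options.entities;   // { entities: one } or { entities: [many] }
  } else if (options.entity) {
    entities = options.entity;     // { entity: one }
  } else {
    entities = options;            // bare properties
  }
  return Array.isArray(entities) ? entities : [entities];
}

var brawndo = { name: 'brawndo', sugar: 1000000 };
var water = { name: 'water', sugar: 0 };

// The four call shapes listed above, reduced to the same representation.
var forms = [
  normalizeEntities(brawndo),
  normalizeEntities({ entity: brawndo }),
  normalizeEntities({ entities: brawndo }),
  normalizeEntities({ entities: [brawndo, water] })
];
```

The first three forms all produce a one-element array containing `brawndo`; the last produces both entities.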
Model.update
Updating records in the database is something that is fairly common. We expose a simple method to do this where you just provide a partial object representing your Beverage and it will figure out how to update all the fields! Let's see what it looks like.
Beverage.update({
name: 'brawndo',
sugar: 900000000,
notes: "It's got what plants crave, now with more sugar!",
otherIngredients: {
energy: '9001'
},
tags: ['healthy', 'energy', 'gives you wings']
}, function (err) {
if (err) return;
});
It even supports higher level functions on set and list types. Let's look at what set looks like (list is covered further down).
Beverage.update({
name: 'brawndo',
tags: {
add: ['amazing', 'invincible'],
remove: ['gives you wings']
}
}, function (err) {
if (err) return;
});
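The effect of `add` and `remove` on a set can be modeled locally. The sketch below applies the update to a plain array standing in for a Cassandra set; `applySetUpdate` is a hypothetical helper, and in practice Cassandra performs these operations server-side (and does not preserve insertion order for sets).

```javascript
// Applies a set update to a plain array standing in for a Cassandra set.
// Illustrative only; Cassandra performs these operations server-side.
function applySetUpdate(current, update) {
  if (Array.isArray(update)) return update.slice();   // full replacement
  var removed = update.remove || [];
  var result = current.filter(function (item) {
    return removed.indexOf(item) === -1;
  });
  (update.add || []).forEach(function (item) {
    // A set holds each value at most once.
    if (result.indexOf(item) === -1) result.push(item);
  });
  return result;
}

var tags = ['healthy', 'energy', 'gives you wings'];
var updated = applySetUpdate(tags, {
  add: ['amazing', 'invincible'],
  remove: ['gives you wings']
});
```

`updated` now contains `healthy`, `energy`, `amazing`, and `invincible`, while the original `tags` array is left untouched.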
If we decide to create a model that needs to use Lookup Tables, we require a previous value to be passed in as well as the entity being updated. If no previous value is passed in, we will implicitly run a find on the primary table to get the latest record before executing the update. This previous value is required because we have to detect whether a primaryKey of a lookup table has changed.
IMPORTANT: If you have a case where you are modifying the primaryKey of a lookup table and you are PASSING IN the previous value into the update function, that previous value MUST be a fully formed object of the previous record, otherwise you are guaranteed to have the changed lookup table go out of sync. Passing in your own previous value is done at your own risk. If you do not understand this warning or the implications, please post an issue.
var Person = datastar.define('person', {
ensureTables: true,
schema: datastar.schema.object({
person_id: datastar.schema.cql.uuid({ default: 'v4' }),
name: datastar.schema.cql.text(),
characteristics: datastar.schema.cql.list(datastar.schema.cql.text()),
attributes: datastar.schema.cql.map(
datastar.schema.cql.text(),
datastar.schema.cql.text()
)
}).rename('id', 'person_id')
.lookupKeys('name')
});
var person = {
name: 'Fred Flintstone',
attributes: {
height: '6 foot 1 inch'
}
};
Person.create(person, function (err) {
if (err) return;
});
Person.update({
previous: person,
entity: {
name: 'Barney Rubble',
}
}, function (err) {
if (err) return;
});
Person.remove(previous, function(err) {
if (err) return;
previous.name = 'Barney Rubble';
previous.personId = '12345678-1234-1234-1234-123456789012';
Person.create(previous, function (err) {
if (err) return;
});
});
Person.update({
id: person.id,
name: 'Barney Rubble',
attributes: {
hair: 'blonde'
}
}, function (err) {
if (err) return;
});
Person.update({
id: person.id,
characteristics: ['fast', 'hard working']
}, function (err) {
if (err) return;
});
Person.update({
id: person.id,
characteristics: {
prepend: ['lazy']
}
}, function (err) {
if (err) return;
});
Person.update({
id: person.id,
characteristics: {
append: ['helpful']
}
}, function (err) {
if (err) return;
});
Person.update({
id: person.id,
characteristics: {
remove: ['fast']
}
}, function (err) {
if (err) return;
});
Person.update({
id: person.id,
characteristics: {
index: { '1': 'disabled' }
}
}, function (err) {
if (err) return;
});
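The list operations used above (full replacement, `prepend`, `append`, `remove`, and `index`) can be traced on a plain array. This is a sketch of the intended effect only: `applyListUpdate` is a hypothetical helper, and it assumes prepended items keep the order in which they are given.

```javascript
// Applies a list update to a plain array standing in for a Cassandra list.
// Illustrative only; the database applies these operations itself.
function applyListUpdate(current, update) {
  if (Array.isArray(update)) return update.slice();   // full replacement
  var result = current.slice();
  if (update.prepend) result = update.prepend.concat(result);
  if (update.append) result = result.concat(update.append);
  if (update.remove) {
    // Removes every occurrence of each listed value.
    result = result.filter(function (item) {
      return update.remove.indexOf(item) === -1;
    });
  }
  if (update.index) {
    // Overwrites the element at each given position.
    Object.keys(update.index).forEach(function (i) {
      result[Number(i)] = update.index[i];
    });
  }
  return result;
}

// The same sequence of updates as the Person examples above.
var characteristics = applyListUpdate([], ['fast', 'hard working']);
characteristics = applyListUpdate(characteristics, { prepend: ['lazy'] });
characteristics = applyListUpdate(characteristics, { append: ['helpful'] });
characteristics = applyListUpdate(characteristics, { remove: ['fast'] });
characteristics = applyListUpdate(characteristics, { index: { '1': 'disabled' } });
```

Following the five updates in order, the list goes from `['fast', 'hard working']` to `['lazy', 'disabled', 'helpful']`.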
Model.find
Querying Cassandra can be the source of much pain, which is why datastar will only allow queries on models based on primary keys. Any post-query filtering is the responsibility of the consumer.
There are four variants to find:
- Model.find(options, callback)
- Model.findOne(options || key, callback) <-- Also has an alias Model.get
- Model.findFirst(options, callback)
- Model.count(options, callback)
The latter three (findOne/get, findFirst, and count) are all facades to find and simply do not need the type parameter in the example below. Another note here is that type is implied to be all if none is given.
Album.find({
type: 'all',
conditions: {
artistId: '00000000-0000-0000-0000-000000000001'
}
}, function (err, result) {
if (err) return;
});
In the latter three facades, you can also pass in conditions as the options object!
Album.findOne({
artistId: '00000000-0000-0000-0000-000000000001',
albumId: '00000000-0000-0000-0000-000000000005'
}, function (err, result) {
if (err) return;
});
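The facade relationship described above can be sketched with a toy in-memory model. Everything here is hypothetical (`ToyModel` is not a datastar class), and the sketch does not try to reproduce any behavioral difference between `one` and `first`; it only shows each facade fixing `type` and accepting bare conditions.

```javascript
// A toy in-memory model; only the facade pattern is the point here.
function ToyModel(rows) {
  this.rows = rows;
}

ToyModel.prototype.find = function (options, callback) {
  var type = options.type || 'all';            // `type` defaults to 'all'
  var conditions = options.conditions || {};
  var matches = this.rows.filter(function (row) {
    return Object.keys(conditions).every(function (key) {
      return row[key] === conditions[key];
    });
  });
  if (type === 'count') return callback(null, matches.length);
  if (type === 'one' || type === 'first') return callback(null, matches[0] || null);
  callback(null, matches);
};

// Each facade fixes `type` and treats a bare object as the conditions.
['one', 'first', 'count'].forEach(function (type) {
  var name = type === 'count'
    ? 'count'
    : 'find' + type[0].toUpperCase() + type.slice(1);
  ToyModel.prototype[name] = function (options, callback) {
    var conditions = options.conditions || options;
    this.find({ type: type, conditions: conditions }, callback);
  };
});

var albums = new ToyModel([
  { artistId: 'a1', albumId: 'b1' },
  { artistId: 'a1', albumId: 'b2' }
]);
```

Calling `albums.count({ artistId: 'a1' }, cb)` and `albums.find({ type: 'count', conditions: { artistId: 'a1' } }, cb)` go through the same code path and report the same number.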
You only need to pass a separate conditions object when you want to add additional parameters to the query like LIMIT. Limit allows us to limit how many records we are retrieving for any range query.
Album.findAll({
conditions: {
artistId: '00000000-0000-0000-0000-000000000001'
},
limit: 1
}, function (err, results) {
if (err) return;
});
We can also just pass in a single key in order to fetch our record. NOTE: This only works if your schema only has a single partition/primary key and assumes you are passing in that key. This will not work for lookup tables.
Artist.get('00000000-0000-0000-0000-000000000001', function(err, result) {
if (err) return;
});
Stream API
While also providing a standard callback API, the find function supports first-class streams! This is very convenient when doing a findAll and processing those records as they come instead of waiting to buffer them all into memory. For example, if we were doing this inside a request handler:
var through = require('through2');
function handler(req, res) {
Album.findAll({
artistId: '00000000-0000-0000-0000-000000000001'
})
.on('error', function (err) {
res.writeHead(500);
res.end(JSON.stringify({ error: err.message }));
})
.pipe(through.obj(function (bev, enc, callback) {
callback(null, massageBeverage(bev));
}))
.pipe(res);
}
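A response stream expects strings or buffers rather than objects, so records usually need to be serialized somewhere in the pipeline. The sketch below shows one way to do that with Node's built-in `stream` module instead of through2, emitting newline-delimited JSON. The names `toNdjson`, `sendRecords`, and `fakeResults` are hypothetical, and `fakeResults` stands in for a query stream so the example runs without a database.

```javascript
var stream = require('stream');

// Serializes each record as one line of JSON (newline-delimited JSON).
function toNdjson() {
  return new stream.Transform({
    writableObjectMode: true,   // accepts objects, emits strings
    transform: function (record, enc, callback) {
      callback(null, JSON.stringify(record) + '\n');
    }
  });
}

// `source` is any object-mode readable (for example a streaming find);
// `sink` is any writable, such as an HTTP response.
// `done` is called once, with an error if any stage failed.
function sendRecords(source, sink, done) {
  stream.pipeline(source, toNdjson(), sink, done);
}

// Stand-in for a query result so the sketch runs without a database.
var fakeResults = stream.Readable.from([
  { albumId: 'b1', name: 'bleach' },
  { albumId: 'b2', name: 'nevermind' }
]);
```

Using `stream.pipeline` rather than chained `.pipe()` calls means an error in any stage reaches the single `done` callback, where a handler could decide how to end the response.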
Model.remove
If you would like to remove a record or a set of records from the database, you just need to pass in the right set of conditions.
Model.remove(options, callback);
One thing to note here is that when deleting a single record, you can pass in
the fully formed object and we will handle stripping the unsafe parameters that
you cannot query upon based on your defined schema.
Album.remove({
artistId: '00000000-0000-0000-0000-000000000001',
albumId: '00000000-0000-0000-0000-000000000005'
}, function (err) {
if (err) return;
console.log('Successfully removed album');
});
This also works on a range of values, given a schema with artistId as the partition key and albumId as the clustering key, like our album model at the top of this readme:
Album.remove({
artistId: '00000000-0000-0000-0000-000000000001'
}, function (err) {
if (err) return;
console.log('Successfully removed all albums');
});
When you remove an entity that has an associated lookup table, you need to pass
in both the partition keys of the main table AND the lookup table.
Person.remove({
personId: '12345678-1234-1234-1234-123456789012',
name: 'steve belaruse'
}, function(err) {
if (err) return;
console.log('Successfully removed person');
});
This is necessary because a lookupKey defines the partition key of a different table that is created for lookups.
Model hooks
Arguably one of the most powerful features hidden in datastar are the model hooks, or life-cycle events, that allow you to hook into and modify the execution of a given statement. First let's define the operations and the hooks that they have associated.
Operation | Life-cycle event / model hook |
---|---|
create, update, remove, ensure-tables | build, execute |
find | all, count, one, first |
datastar utilizes a module called Understudy under the hood. This provides a way to add extensibility to any operation you perform on a specific model. Let's take a look at what this could look like.
Beverage.before('create:build', function (options, callback) {
if (options.multiply) {
options.entities = options.entities.map(function (ent) {
return Object.keys(ent).reduce(function (acc, key) {
acc[key] = typeof ent[key] === 'number'
? ent[key] * options.multiply
: ent[key];
return acc;
}, {})
});
}
callback();
});
Beverage.before('create:execute', function (options, callback) {
if (options.commitFast) {
options.statements.consistency('one');
options.statements.strategy = 7;
}
callback();
});
var otherDataCenterConnection = new Priam(connectOpts);
Beverage.after('create:execute', function (options, callback) {
options.statements.connection = otherDataCenterConnection;
options.statements.execute(callback);
});
Beverage.after('find:one', function (result, callback) {
async.map(result.siblings,
function (id, next) {
Beverage.get(id, next);
}, function (err, siblingModels) {
if (err) { return callback(err); }
result.siblings = siblingModels;
callback()
});
});
Beverage.after('find:one', function (result, callback) {
async.each(result.siblings, function (bev, next) {
if (bev.siblings.indexOf(result.beverageId) !== -1) {
return next();
}
var update = bev.toJSON();
update.siblings = {
add: [result.beverageId]
};
bev.siblings.push(result.beverageId);
Beverage.update(update, function (err) {
if (err) { return next(err); }
next(null, bev);
});
}, function (err, res) {
if (err) { return callback(err); }
result.siblings = res;
callback();
});
});
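For readers unfamiliar with the before/after pattern, here is a deliberately small hook runner that shows the control flow. It is not Understudy's implementation and does not claim to match its API in detail; `Hookable`, `run`, and the shape of `perform` are assumptions made for this sketch.

```javascript
// A tiny before/after hook runner. This is NOT Understudy's implementation,
// only an illustration of the life-cycle idea.
function Hookable() {
  this.hooks = { before: {}, after: {} };
}

['before', 'after'].forEach(function (phase) {
  Hookable.prototype[phase] = function (name, fn) {
    (this.hooks[phase][name] = this.hooks[phase][name] || []).push(fn);
  };
});

// Runs the hooks of one phase in order, stopping at the first error.
Hookable.prototype.run = function (phase, name, options, done) {
  var fns = (this.hooks[phase][name] || []).slice();
  (function next(err) {
    if (err || !fns.length) return done(err);
    fns.shift()(options, next);
  })();
};

// before hooks -> the actual work -> after hooks -> callback
Hookable.prototype.perform = function (name, options, work, callback) {
  var self = this;
  self.run('before', name, options, function (err) {
    if (err) return callback(err);
    work(options, function (err, result) {
      if (err) return callback(err);
      self.run('after', name, options, function (err) {
        callback(err, result);
      });
    });
  });
};

var model = new Hookable();
var log = [];
model.before('create:build', function (options, next) {
  options.entity.sugar *= options.multiply;   // hooks may modify the options
  log.push('before');
  next();
});
model.after('create:build', function (options, next) {
  log.push('after');
  next();
});
```

A call such as `model.perform('create:build', options, work, callback)` would then run the registered before hook, the work function, and the after hook in that order, with any error short-circuiting to the callback.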
Create tables
Each Model is capable of creating the Cassandra tables associated with its schema. To ensure that a table is created you can pass the ensureTables option:
var Spice = datastar.define('spice', {
ensureTables: true,
schema: spiceSchema // hypothetical name: any schema built with datastar.schema.object(...)
})
Or call Model.ensureTables whenever it is appropriate for your application:
Spice.ensureTables(function (err) {
if (err) return;
console.log('Spice tables created successfully.');
});
You can also specify a with.orderBy option to enable CLUSTERING ORDER BY on a clustering key. This is useful if you want your Spice table to store the newest items on disk first, making it faster to read in that order.
Spice.ensureTables({
with: {
orderBy: { key: 'createdAt', order: 'DESC' }
}
}, function (err) {
if (err) return;
console.log('Spice tables created ordered descending');
})
//
// We can also pass an option to enable setting other properties of a table as well!
//
Spice.ensureTables({
with: {
compaction: {
class: 'LeveledCompactionStrategy',
enabled: true,
sstableSizeInMb: 160,
tombstoneCompactionInterval: '86400'
},
gcGraceSeconds: 86400
}
}, function (err) {
if (err) return;
console.log('Successfully created and altered the table');
});
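It can help to see roughly what CQL such options correspond to. The function below is a sketch under stated assumptions: `withClause`, `snake`, and `cqlMap` are hypothetical helpers, and the exact text datastar generates (including how it quotes values) may differ. The CQL keywords themselves (`CLUSTERING ORDER BY`, `compaction`, `gc_grace_seconds`) are standard Cassandra table options.

```javascript
// Hypothetical mapping from camelCase options to a CQL WITH clause.
function snake(key) {
  return key.replace(/[A-Z]/g, function (ch) { return '_' + ch.toLowerCase(); });
}

// Renders an options object as a CQL map literal with quoted values.
function cqlMap(obj) {
  var pairs = Object.keys(obj).map(function (key) {
    return "'" + snake(key) + "': '" + obj[key] + "'";
  });
  return '{ ' + pairs.join(', ') + ' }';
}

function withClause(options) {
  var parts = [];
  if (options.orderBy) {
    parts.push('CLUSTERING ORDER BY (' + snake(options.orderBy.key) + ' ' +
      options.orderBy.order.toUpperCase() + ')');
  }
  Object.keys(options).forEach(function (key) {
    if (key === 'orderBy') return;
    var value = options[key];
    parts.push(snake(key) + ' = ' +
      (value && typeof value === 'object' ? cqlMap(value) : value));
  });
  return parts.length ? 'WITH ' + parts.join(' AND ') : '';
}

var clause = withClause({
  compaction: { class: 'LeveledCompactionStrategy', sstableSizeInMb: 160 },
  gcGraceSeconds: 86400,
  orderBy: { key: 'createdAt', order: 'desc' }
});
```

The resulting clause would be appended to a `CREATE TABLE` statement; note how every camelCase option name becomes snake_case on the CQL side.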
Drop Tables
In datastar, each model also has the ability to drop tables. This assumes the user used to establish the connection has these permissions. Let's see what this looks like with our spice model.
Spice.dropTables(function (err) {
if (err) return;
console.log('Spice tables dropped!');
});
It's as simple as that. We will drop the spice table and any associated Lookup Tables if they were configured. With .dropTables and .ensureTables it's super easy to use datastar as a building block for managing all of your Cassandra tables without executing any manual CQL.
Statement Building
Currently this happens within the model.js code, in how it interacts with the statement-builder and appends statements to the StatementCollection. The best place to learn about this is to read through the create/update/remove pathway and follow how a statement is created and then executed. In the future we will have comprehensive documentation on this.
Conventions
We make a few assumptions which have manifested as conventions in this library.
- We do not store null values in Cassandra itself for an update or create operation on a model. We store a null representation for the given Cassandra type. This also means that when we fetch the data back from Cassandra, we return the data back to you with the proper nulls you would expect. It just may be unintuitive if you look at the Cassandra tables directly and do not see null values. This prevents tombstones from being created, which has been crucial for our production uses of Cassandra at GoDaddy. This is something that could be configurable in the future.
- Casing: as mentioned briefly in the warning at the top of the readme, we assume camelCase as the casing convention when interacting with datastar and the models created with it. The schema is the only place where the keys used MUST be written as snake_case.
Tests
Tests are written with mocha and code coverage is provided with istanbul. They can be run with:
# Run all tests with "pretest"
npm test
# Just run tests
npm run coverage
Contributors