scattered-store
Dead simple key-value store for large datasets in Node.js.
Way of storing data
scattered-store borrows its way of storing data from Git objects. Let's say we have this code:
var store = scatteredStore.create('my_store');
store.set('abc', 'Hello World!');
When run, the code above stores the data in this file:
/my_store/a9/993e364706816aba3e25717850c26c9cd0d89d
The algorithm went as follows:
- The key abc was hashed with SHA-1 to: a9993e364706816aba3e25717850c26c9cd0d89d
- The hash was then split into two parts:
  - The first two characters (a9) became the name of the directory where the entry ended up.
  - The remaining 38 characters (993e364706816aba3e25717850c26c9cd0d89d) became the name of the file where the data Hello World! was stored.
So every entry is stored in a separate file, and all files are scattered across a maximum of 256 directories (two hex characters give 16 × 16 = 256 combinations) to work around limits on the number of files in a single directory. That's why it's called scattered-store.
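The steps above can be sketched with Node's built-in crypto module. This is a minimal illustration of the scheme as described, not the library's actual source; entryPath is a hypothetical helper name and is not part of the scattered-store API.

```javascript
const crypto = require('crypto');
const path = require('path');

// Hypothetical helper (not part of scattered-store's API):
// maps a key to the file path described above.
function entryPath(storeDir, key) {
  // SHA-1 of the key as a 40-character hex string
  const hash = crypto.createHash('sha1').update(key).digest('hex');
  // First 2 characters name the directory, remaining 38 name the file
  return path.join(storeDir, hash.slice(0, 2), hash.slice(2));
}

console.log(entryPath('my_store', 'abc'));
```

On a POSIX system this prints my_store/a9/993e364706816aba3e25717850c26c9cd0d89d, matching the example path above.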
Pros
Every entry is stored in a separate file, which means:
- Implementation is very, very simple. All heavy lifting is done by file system.
- Dataset can safely grow to ridiculous sizes.
Cons
Every entry is stored in a separate file, which means:
- Even if an entry is only 10 bytes of data, it still occupies a whole block on disk.
- Every operation is a separate I/O, so there is not much room for performance improvements through batching.
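To put the first point in numbers, here is a rough sketch of the space overhead. It assumes a block size of 4096 bytes, which is common but not universal, and it ignores file system metadata and any small-file optimizations your file system may apply. allocatedBytes is a hypothetical helper, not something scattered-store provides.

```javascript
// Assumed block size; 4096 bytes is common, but check your own file system.
const BLOCK_SIZE = 4096;

// Hypothetical helper: bytes allocated on disk for a file of the given size,
// assuming space is handed out in whole blocks.
function allocatedBytes(fileSize, blockSize) {
  return Math.ceil(fileSize / blockSize) * blockSize;
}

console.log(allocatedBytes(10, BLOCK_SIZE));        // 4096
console.log(allocatedBytes(50 * 1024, BLOCK_SIZE)); // 53248
```

Under that assumption a 10-byte entry takes up 4096 bytes, while a 50KB entry wastes only half a block, so the overhead matters mostly for datasets made of many tiny entries.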
Installation
npm install scattered-store
Usage
var scatteredStore = require('scattered-store');
var store = scatteredStore.create('path/to/my/store', function (err) {
  if (err) {
    // The store could not be initialized
  } else {
    // The store is ready to use
  }
});

store.set('abc', 'Hello World!')
.then(function () {
  return store.get('abc');
})
.then(function (value) {
  console.log(value); // Hello World!
});
Supported key and value types
Only strings can be used as keys. A value can be anything that can be serialized to JSON, or any binary data (passed as a Buffer). JSON deserialization also automatically turns ISO-format date strings into Date objects.
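The Date conversion can be pictured with a JSON.parse reviver. This is only a sketch of the general technique, not scattered-store's actual code: the names ISO_DATE and reviveDates are hypothetical, and the exact pattern the library recognizes is an assumption.

```javascript
// Matches strings in the format produced by Date.prototype.toJSON(),
// e.g. 1970-01-01T00:00:00.000Z (assumed pattern, for illustration only).
const ISO_DATE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;

// Hypothetical reviver: turns ISO date strings back into Date objects
// and leaves every other value untouched.
function reviveDates(key, value) {
  if (typeof value === 'string' && ISO_DATE.test(value)) {
    return new Date(value);
  }
  return value;
}

// JSON.stringify serializes the Date as an ISO string...
const json = JSON.stringify({ name: 'abc', created: new Date(0) });
// ...and the reviver restores it while parsing.
const parsed = JSON.parse(json, reviveDates);

console.log(parsed.created instanceof Date); // true
console.log(typeof parsed.name);             // string
```

One consequence worth keeping in mind: a plain string value that happens to look like an ISO date would also come back as a Date.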
API
set(key, value)
Stores the given value under the given key. String, Object, Array and Buffer are supported as value.
Returns: promise
store.set('abc', 'Hello World!')
.then(function () {
  // The value has been stored
});
get(key)
Returns the value stored under the given key. If the given key doesn't exist in the database, null is returned.
Returns: promise that resolves with the value
store.get('abc')
.then(function (value) {
console.log(value);
});
getMany(keys)
Accepts an array of key strings as keys and returns all values for those keys.
Returns: readable stream
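var stream = store.getMany(["abc", "xyz"]);
stream.on('readable', function () {
  var entry;
  // read() returns null once the internal buffer is drained
  while ((entry = stream.read()) !== null) {
    console.log(entry);
  }
});
stream.on('end', function () {
  // All requested entries have been delivered
});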
getAll()
Returns all data stored in the database through a stream (one entry at a time).
Returns: readable stream
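var stream = store.getAll();
stream.on('readable', function () {
  var entry;
  // read() returns null once the internal buffer is drained
  while ((entry = stream.read()) !== null) {
    console.log(entry);
  }
});
stream.on('end', function () {
  // Every entry in the store has been delivered
});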
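Because getMany and getAll return readable streams, it can be handy to gather their entries into an array when the result is known to be small. The sketch below uses only Node's stream module; collect is a hypothetical helper, and Readable.from stands in for a real store stream so the example runs on its own. The placeholder entries are arbitrary and do not reflect the library's actual entry format.

```javascript
const { Readable } = require('stream');

// Hypothetical helper (not part of scattered-store's API):
// resolves with an array of everything an object-mode stream emits.
function collect(stream) {
  return new Promise(function (resolve, reject) {
    const entries = [];
    stream.on('data', function (entry) {
      entries.push(entry);
    });
    stream.on('end', function () {
      resolve(entries);
    });
    stream.on('error', reject);
  });
}

// Stand-in for store.getAll() or store.getMany([...]);
// the entries here are arbitrary placeholders.
const fakeStream = Readable.from(['first', 'second', 'third']);

const done = collect(fakeStream).then(function (entries) {
  console.log(entries.length); // 3
  return entries;
});
```

Collecting everything in memory defeats the purpose of streaming for large datasets, so this is best kept for getMany calls with a short list of keys.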
delete(key)
Deletes the entry stored under the given key.
Returns: promise
store.delete('abc')
.then(function () {
  // The entry has been deleted
});
Performance
npm run benchmark
Here are the results of this test on a few machines for comparison:
Desktop PC (HDD 7200rpm)
Testing scattered-store performance: 20000 items, 50KB each, 977MB combined.
set... 2522 ops/s
get... 4471 ops/s
getAll... 8428 ops/s
delete... 5605 ops/s
MacBook Pro (SSD)
Testing scattered-store performance: 20000 items, 50KB each, 977MB combined.
set... 1694 ops/s
get... 4018 ops/s
getAll... 6416 ops/s
delete... 4030 ops/s
Mac Mini (HDD 5400rpm)
Testing scattered-store performance: 20000 items, 50KB each, 977MB combined.
set... 726 ops/s
get... 3860 ops/s
getAll... 5071 ops/s
delete... 1130 ops/s