fash: consistent hashing library for node.js
This module provides a consistent hashing library. Notably, this module has the
ability to deterministically generate the same hash ring topology across a
set of distributed hosts. Fash also handles collisions of nodes on the ring
and ensures no two nodes will share the same spot on the ring. Additionally,
fash provides the ability to add, remove, or remap physical nodes on the ring
-- useful when a particular physical node has hit its scaling bottleneck.
Index
Design Summary
Fash consists of a mapping of a set of fixed virtual nodes (vnodes) -- usually
a large number, say 1000000 -- distributed across the hash ring. It is then
possible to map these virtual nodes to a set of physical nodes (pnodes). In
practice, pnodes are usually physical shards or servers in a distributed
system. This gives the flexibility of mutating the hashspace of pnodes and the
number of pnodes by re-mapping the vnode assignments. For a more in-depth
explanation, see the Development Information section.
Examples for Consumers
Examples can be found in the unit tests. Here are a few.
Bootstrapping a New Hash Ring
var fash = require('fash');
var Logger = require('bunyan');
var LOG = new Logger({
name: 'fash',
level: 'info'
});
var chash = fash.create({
log: LOG, // optional [bunyan](https://github.com/trentm/node-bunyan) log object.
algorithm: 'sha-256', // Can be any algorithm supported by openssl.
pnodes: ['A', 'B', 'C', 'D', 'E'], // The set of physical nodes to insert into the ring.
vnodes: 1000000 // The virtual nodes to place onto the ring. Once set, this can't be changed for the lifetime of the ring.
backend: fash.BACKEND.IN_MEMORY
});
var node = chash.getNode('someKeyToHash');
console.log('key hashes to pnode', node);
If the config used to bootstrap fash is the same across all clients, then the
ring toplogy will be the same as well. By default, fash will evenly distribute
vnodes across the set of pnodes. If you wish to have a custom mapping of pnodes
to vnodes, see the later section on serialization.
Remapping Vnodes in the Ring
Fash gives you the ability to add and rebalance the vnodes in the ring by using
the remapVnode() function, which returns an optional callback.
You can also remove pnodes from the ring, but you must first rebalance the
ring by reassigning its vnodes to other pnodes via remapVnode(). Then you can
invoke removePnode(), which will return an optional callback.
You can assign an arbitrary number of vnodes to the new pnode -- also -- the
pnode can be a new node, or an existing one. Again, as long as the order of
removes and remaps is consistent across all clients, the ring toplogy will be
consistent as well.
var fash = require('fash');
var Logger = require('bunyan');
var LOG = new Logger({
name: 'fash',
level: 'info'
});
var chash = fash.create({
log: LOG,
algorithm: 'sha256',
pnodes: ['A', 'B', 'C', 'D', 'E'],
backend: fash.BACKEND.IN_MEMORY,
vnodes: 100000
});
// get vnodes from A
var aVnodes = chash.getVnodes('A');
aVnodes = aVnodes.slice(aVnodes.length / 2);
// remap some of A's vnodes to B
chash.remapVnode('B', aVnodes, function(ring, pnodes) {
console.log('new ring topology', ring);
console.log('changed pnode->vnode mappings', pnodes);
});
Adding More Pnodes to the Ring
You can add additional pnodes to the ring after fash has been initialized by
invoking remapVnode(), which optionally returns a callback. Note that adding
the callback will cause fash to create a new copy of the ring topology across
each invocation -- do not do this if you have millions of vnodes, as this is
quite slow and may consume all available memory, bringing the operation to a
standstill and locking up other resources.
var fash = require('fash');
var Logger = require('bunyan');
var LOG = new Logger({
name: 'fash',
level: 'info'
});
var chash = fash.create({
log: LOG,
algorithm: 'sha256',
pnodes: ['A', 'B', 'C', 'D', 'E'],
backend: fash.BACKEND.IN_MEMORY,
vnodes: 100000
});
// specify the set of virtual nodes to assign to the new physical node.
var vnodes = [0, 1, 2, 3, 4];
// add the physical node 'F' to the ring.
chash.remapVnode('F', vnodes, function(ring, changedNodes) {
console.log('ring topology updated', ring);
console.log('removed mappings', changedNodes);
});
Fash will remove the vnodes from their previously mapped physical nodes, and
map them to the new pnode.
Removing Pnodes from the Ring
You can remove physical nodes from the ring by first remapping the pnode's
vnodes to another pnode, and then removing the pnode.
var fash = require('fash');
var Logger = require('bunyan');
var LOG = new Logger({
name: 'fash',
level: 'info'
});
var chash = fash.create({
log: LOG,
algorithm: 'sha256',
pnodes: ['A', 'B', 'C', 'D', 'E'],
backend: fash.BACKEND.IN_MEMORY,
vnodes: 10000
});
// get the vnodes that map to B
var vnodes = chash.getVnodes('B');
// rebalance them to A
chash.remapVnode('A', vnodes, function(ring, removedMap) {
// remove B
chash.removePnode('B', function(ring, pnode) {
if (!err) {
console.log('removed pnode %s', pnode);
}
});
});
Adding Optional Data to a Virtual Node
Sometimes, it might be helpful to associate state with a set of vnodes. An
example of this would be during routine system maintenance, an administrator
may want to set a certain set of vnodes to read only, and then set them back to
write mode after the maintenance has completed.
Fash gives enables you to add arbitrary objects to vnodes by invoking the
addData
function.
chash.addData(10, 'foo');
Subsequence chash.getNode()
invocations which map to vnode 10 will return:
{
pnode: 'A',
vnode: 10,
data: 'foo'
}
The data associated with a virtual node is persistent across serializations and
remaps.
Serializing and Persisting the Ring Toplogy
At any time, the ring toplogy can be accessed by:
chash.serialize();
Which returns the ring topology, which is a JSON serialized object string which
looks like
{
pnodeToVnodeMap: {
A: {0, 1, 2},
...
}, // the pnode to vnode mapings.
vnode // the total number of vnodes in the ring.
}
Additionally, anytime remapVnode() is invoked, it can return a cb which
contains an updated version of this object that is not JSON serialized.
Fash can be instantiated given a topology object instead of a list of nodes.
This also allows you to specify a custom pnode to vnode topology -- as
mentioned in the earlier bootstrapping section.
var fash = require('fash');
var Logger = require('bunyan');
var LOG = new Logger({
name: 'fash',
level: 'info'
});
var chash = fash.create({
log: LOG,
algorithm: 'sha256',
pnodes: ['A', 'B', 'C', 'D', 'E'],
backend: fash.BACKEND.IN_MEMORY,
vnodes: 10000
});
var topology = chash.serialize();
var chash2 = fash.deserialize({
log: LOG,
topology: topology
});
That's it, chash and chash2 now contain the same ring toplogy.
Development Information
The following information is intended to serve developers making changes or
enhancements to this repository. The command-line interface provides insight
into the structure of the hash ring topology at a deeper level than is made
available by the functions exposed for consumers of the node program. Though
help
can be a good reference once the data structure is understood, this
documentation seeks to give an overview of this structure more comprehensively
than makes sense to include in the tool itself.
The "data structure" is, at a high level, a mapping of pnodes or "physical
nodes" to vnodes or "virtual nodes." It is intended to provide database-level
associations for an operator's (your) key-value store. The advantage of this
setup is the flexibility to move vnodes between pnodes, while the association
between the key-value pairings (where the "key" is hashed to an integer value
and the "value" is a piece of plaintext data) and their assigned vnode is
preserved. This gives users the ability to retain relative groupings of keys
(such as entries in the same file directory) that you've stored even in the
event that you need to move data around. For example, if the data set grows
large enough that additional pnodes are necessary to contain it all, this would
necessitate re-assignment of key-value pairings as some are moved to the new
pnode. The level of abstraction the vnodes enable makes it possible to keep
different key-value pairings associated with one another in a granular and
simple way.
Command Line Ring Creation
Here is the creation of a LevelDB-backed (meaning the key-value store is
represented by the specs of LevelDB -- more information on that can be found here.
This is a hash ring for distributing 6 vnodes throughout 2 pnodes using the
default hashing algorithm, sha256, which will be stored at the location /var/tmp/rings/readme_sample
.
$ ./bin/fash.js create -l /var/tmp/rings/readme_sample -v 6 -a sha256 -p "tcp://1.moray.emy-11.joyent.us:2020, tcp://2.moray.emy-11.joyent.us:2020" -b leveldb
Here is a diagram of what that ring would then look like:
PNODES VNODES
+-----------+
+-------------------------------------------+ +--------+ | "0": 1 |
| | +-----------+
| |
| "tcp://1.moray.emy-11.joyent.us:2020" | +-----------+
| | +--------+ | "2": 1 |
| | +-----------+
| |
+-------------------------------------------+ +-----------+
+--------+ | "4": 1 |
+-----------+
+-----------+
+-------------------------------------------+ +--------+ | "1": 1 |
| | +-----------+
| |
| "tcp://2.moray.emy-11.joyent.us:2020" | +-----------+
| | +--------+ | "3": 1 |
| | +-----------+
| |
+-------------------------------------------+ +-----------+
+--------+ | "5": 1 |
+-----------+
There is another way to create a ring topology if you already have a JSON file
representation of one. If you want to manufacture one, you can use the print-hash
command to print a JSON representation of a ring after creating it from scratch
(as above). Copying that into a .json
file will give you the raw materials
for testing out the deserialize-ring
method of ring creation
below.
This is what the .json
file should look like:
$ cat readme_sample.json
{"vnodes":6,"pnodeToVnodeMap":{"tcp://1.moray.emy-11.joyent.us:2020":{"0":1,"2":1,"4":1},"tcp://2.moray.emy-11.joyent.us:2020":{"1":1,"3":1,"5":1}},"algorithm":{"NAME":"sha256","MAX":"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF","VNODE_HASH_INTERVAL":"2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"},"version":"2.1.0"}
And then to actually generate and store that LevelDB-backed ring:
$ ./bin/fash.js deserialize-ring -l /var/tmp/rings/readme_sample_deserialized -f /var/tmp/rings/readme_sample.json
To see that the deserialized ring is the same as the original, the new ring can
be viewed with the same print-hash
command used to show the original.
$ ./bin/fash.js print-hash -l /var/tmp/rings/readme_sample_deserialized -b leveldb
{"vnodes":6,"pnodeToVnodeMap":{"tcp://1.moray.emy-11.joyent.us:2020":{"0":1,"2":1,"4":1},"tcp://2.moray.emy-11.joyent.us:2020":{"1":1,"3":1,"5":1}},"algorithm":{"NAME":"sha256","MAX":"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF","VNODE_HASH_INTERVAL":"2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"},"version":"2.1.0"}
The fash tool can also do a direct comparison on its own with diff
:
$ ./bin/fash.js diff -b leveldb /var/tmp/rings/readme_sample /var/tmp/rings/readme_sample_deserialized
{"tcp://1.moray.emy-11.joyent.us:2020":{"removed":[0],"added":[0]},"tcp://2.moray.emy-11.joyent.us:2020":{"removed":[1],"added":[1]}}
Backends: an interlude
Please note that for commands added before version 2.4.0 (of which create
is
one), -b
option has to be specified. This command's purpose is to indicate
whether the topology is backed by a LevelDB or in-memory data store.
As of version 2, fash supported a LevelDB backend in addition to its original
in-memory backend. The LevelDB backend has several advantages over the
in-memory backend. Notably, performance at scale should be faster since the
ring is no longer in v8. Also the ring can be persisted on disk via LevelDB,
removing the need to load the ring into memory when the process is restarted.
For these reasons, at the time of writing, the in-memory option for
creating hash rings is considered deprecated, and may be removed in the
future, once any existing dependencies on it can be verified and mitigated.
That is why this section focuses on working with the LevelDB-backed hash ring
functionality, though it would be remiss not to explain why some commands will
not work without passing in -b leveldb
.
The way to create a ring -- specifying a backend -- from the command line is
demonstrated above. Choosing a backend as a
consumer is done during fash initialization via the backend
field.
var fash = require('fash');
var Logger = require('bunyan');
var LOG = new Logger({
name: 'fash',
level: 'info'
});
fash.create({
log: LOG,
algorithm: 'sha-256',
pnodes: ['A', 'B', 'C', 'D', 'E'],
vnodes: 1000000
backend: fash.BACKEND.LEVEL_DB,
location: '/tmp/chash'
}, function(err, chash) {
console.log('chash created');
});
Walkthrough: working with the hash ring
Getting back to the example topology created above; in order to
understand how this is related to both the LevelDB key-value store and also
the pnode-vnode mapping, we will take just one "vnode" and break down the
information inside it.
+-----------+
| "4": 1 |
+-----------+
The 4
is the vnode number. It is also the post-hash key in the sense of the
key-value LevelDB store. What this mean is that if you have the key
/yunong/yunong.txt
, and then hash it using one of the available algorithms --
say, sha256 -- you will get back, in this ring topology, the value 4
. So
vnode 4
is where any data (value) you wanted to associate with the /yunong/yunong.txt
key will be stored. In this case, we are storing the default value of 1 in
that vnode, associated with the key /yunong/yunong.txt
. To help illustrate
this point, there is a get-node
command to show operators what any key is
associated with in terms of its data, pnode, and vnode mapping.
$ ./bin/fash.js get-node "/yunong/yunong.txt" -l /var/tmp/rings/readme_sample -b leveldb
{ pnode: 'tcp://1.moray.emy-11.joyent.us:2020',
vnode: 4,
data: 1 }
This operation is idempotent as long as the topology does not change. So /yunong/yunong.txt
will always map to vnode 4
UNLESS the number of pnodes or vnodes changes.
For this reason, remapping vnodes must always be a manual process -- we
cannot recreate the ring with more pnodes or vnodes and expect that /yunong/yunong.txt
will still map to vnode 4
on the tcp://1.moray.emy-11.joyent.us:2020
pnode.
Adding Data to Vnodes
So in an example where we are approaching full capacity on our two pnodes and
need to add a third, it might make sense to mark the vnodes we're going to move
in some way. Then our application can check for that demarcation and, say, not
accept writes that might be lost during a move. If the chosen identifer was ro
(for READ ONLY), we could assign the value ro
to the vnodes with the keys
we're shifting over. This is done with the add-data
command:
$ ./bin/fash.js add-data -v 4 -d 'ro' -l /var/tmp/rings/readme_sample -b leveldb -o
{"4":"ro"}
{"vnodes":6,"pnodeToVnodeMap":{"tcp://1.moray.emy-11.joyent.us:2020":{"0":1,"2":1,"4":"ro"},"tcp://2.moray.emy-11.joyent.us:2020":{"1":1,"3":1,"5":1}},"algorithm":{"NAME":"sha256","MAX":"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF","VNODE_HASH_INTERVAL":"2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"},"version":"2.1.0"}
To achieve this:
+-----------+
| "4": "ro" |
+-----------+
So now vnode 4
, to which the key /yunong/yunong.txt
maps, contains the
value ro
.
Remapping Vnodes
Now that the vnode we want to move can be identified, we will want to actually
move it to the new pnode. To see a full list of vnodes you have marked use
the command get-data-vnodes
.
$ ./bin/fash.js get-data-vnodes -l /var/tmp/rings/readme_sample
vnodeArray: [ 4 ]
If this array were much larger, say, hundreds or thousands of vnodes, but you
were still just interested in knowing what data vnode 4
contained (or if at
any point you needed to know which pnode it was assigned to) the get-vnode-pnode-and-data
command displays all available information about a given vnode.
$ ./bin/fash.js get-vnode-pnode-and-data -v 4 -l /var/tmp/rings/readme_sample
vnodes: { '4': { pnode: 'tcp://1.moray.emy-11.joyent.us:2020', vnodeData: 1 } }
In our example, from this you could conclude that vnode 4 was marked as
read-only for relocation, so it is intended to be remapped to a new pnode tcp://3.moray.emy-11.joyent.us:2020
with remap-vnode
:
$ ./bin/fash.js remap-vnode -v 4 -b leveldb -p "tcp://3.moray.emy-11.joyent.us:2020" -l /var/tmp/rings/readme_sample -o
{"tcp://3.moray.emy-11.joyent.us:2020":"4"}
{"vnodes":6,"pnodeToVnodeMap":{"tcp://1.moray.emy-11.joyent.us:2020":{"0":1,"2":1},"tcp://2.moray.emy-11.joyent.us:2020":{"1":1,"3":1,"5":1},"tcp://3.moray.emy-11.joyent.us:2020":{"4":"ro"}},"algorithm":{"NAME":"sha256","MAX":"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF","VNODE_HASH_INTERVAL":"2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"},"version":"2.1.0"}
So that the topology now looks like this:
PNODES VNODES
+-----------+
+-------------------------------------------+ +--------+ | "0": 1 |
| | +-----------+
| |
| "tcp://1.moray.emy-11.joyent.us:2020" | +-----------+
| | +--------+ | "2": 1 |
| | +-----------+
| |
+-------------------------------------------+
+-----------+
+-------------------------------------------+ +--------+ | "1": 1 |
| | +-----------+
| |
| "tcp://2.moray.emy-11.joyent.us:2020" | +-----------+
| | +--------+ | "3": 1 |
| | +-----------+
| |
+-------------------------------------------+ +-----------+
+--------+ | "5": 1 |
+-----------+
+-------------------------------------------+
| |
| |
| "tcp://3.moray.emy-11.joyent.us:2020" | +-----------+
| | +--------+ | "4": "ro" |
| | +-----------+
| |
+-------------------------------------------+
This command will also take a comma-separated list of vnodes (rather than just
a single one) to move a group of vnodes to the same pnode in one batch. It is
also possible to remap vnodes to an existing pnode, rather than a new one.
Removing Pnodes
If you've decided that you no longer need a particular pnode, you can remove
it. There is one caveat, which is that you have to remap the vnodes it
contains first. A list of vnodes on any pnode is available with get-vnodes
.
$ ./bin/fash.js get-vnodes 'tcp://3.moray.emy-11.joyent.us:2020' -l /var/tmp/rings/readme_sample -b leveldb
[ 4 ]
So now we would remap vnode 4
, this time to an existing pnode:
$ ./bin/fash.js remap-vnode -v 4 -b leveldb -p "tcp://1.moray.emy-11.joyent.us:2020" -l /var/tmp/rings/readme_sample -o
{"tcp://1.moray.emy-11.joyent.us:2020":"4"}
{"vnodes":6,"pnodeToVnodeMap":{"tcp://1.moray.emy-11.joyent.us:2020":{"0":1,"2":1,"4":"ro"},"tcp://2.moray.emy-11.joyent.us:2020":{"1":1,"3":1,"5":1}},"algorithm":{"NAME":"sha256","MAX":"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF","VNODE_HASH_INTERVAL":"2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"},"version":"2.1.0"}
Although it appears from that output that the vnode
"tcp://3.moray.emy-11.joyent.us:2020" is already gone, that is not true. This
is because the LevelDB keys the print-hash command uses to
display the LevelDB keys does not use the /PNODE
key, which keeps track of
all the pnodes in the ring. This behavior is also present in the -o flag available on some commands. In any case, we can see the full list of the pnodes on the ring using get-pnodes
.
$ ./bin/fash.js get-pnodes -l /var/tmp/rings/readme_sample -b leveldb
[ 'tcp://1.moray.emy-11.joyent.us:2020',
'tcp://2.moray.emy-11.joyent.us:2020',
'tcp://3.moray.emy-11.joyent.us:2020' ]
So now we can remove the pnode from LevelDB:
$ ./bin/fash.js remove-pnode -p "tcp://3.moray.emy-11.joyent.us:2020" -b leveldb -l /var/tmp/rings/readme_sample -o
{"vnodes":6,"pnodeToVnodeMap":{"tcp://1.moray.emy-11.joyent.us:2020":{"0":1,"2":1,"4":"ro"},"tcp://2.moray.emy-11.joyent.us:2020":{"1":1,"3":1,"5":1}},"algorithm":{"NAME":"sha256","MAX":"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF","VNODE_HASH_INTERVAL":"2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"},"version":"2.1.0"}
And check that it's really gone:
$ ./bin/fash.js get-pnodes -l /var/tmp/rings/readme_sample -b leveldb
[ 'tcp://1.moray.emy-11.joyent.us:2020',
'tcp://2.moray.emy-11.joyent.us:2020' ]
LevelDB Schema
The key-value store that we use is called LevelDB. For more information on how
LevelDB works generally, see https://github.com/google/leveldb. For
information on the LevelDOWN library that serves as a Node.js binding for
LevelDB, see https://github.com/level/leveldown/.
In fash's setup, we create the following keys to manipulate and describe the
structure (LevelDB key-value store) that underlies a ring topology (pnode-vnode
mapping). In the leveldb.js
file, each of these are assigned to a constant
variable prefaced with LKEY_
with P
and V
standing in for %s
and %d
.
The fact that the key names resemble file paths does not imply that a navigable
file system backs this structure. (For example, there is no key that
corresponds to /VNODE
).
KEY VALUE
VNODE_COUNT | The total number of vnodes in the ring
VNODE_DATA | The array of vnodes that contain optional data
/VNODE/%d | The pnode a given vnode (%d) is assigned to
/PNODE | The array of all pnodes in the hash ring
/PNODE/%s | The array of all vnodes that are assigned to the given pnode (%s)
/PNODE/%s/%d | The value of the data on the given vnode (%d), which is on the given pnode (%s)
ALGORITHM | The hashing algorithm used to create the ring topology
VERSION | Supposedly the version of fash we are on -- at this time, hard-coded to 2.1.0
COMPLETE | Essentially a flag that is set to 1 when the ring creation finishes
LevelDB-supported CLI
The purpose of these descriptions is to identify and explain each command's
relationship to the LevelDB keys it needs to function. Many of these are
explained in the context of their use in the walkthrough
section. This section will describe them in terms of their relationship to
LevelDB.
create
Click here to see an example
create
opens the LevelDB database and populates all of the keys in the
schema. A step-by-step listing of the order in which the keys are populated is
below:
- First, the count of vnodes is stored in
VNODE_COUNT
. - In batches of 1000, each
/VNODE/%d
key is written to its
correspondent pnode. - Each
/PNODE/%s/%d
key is assigned the LVAL_NULL
default
data value of 1
. - The
/PNODE/%s
keys are written from the pnodeToVnodeMap
created during the previous step, and the /PNODE
array is
populated from the keys of the pnodeMap
object created from the pnode options
passed in this command. - (and 6 and 7) The
ALGORITHM
, VERSION
,
and COMPLETE
keys are written -- the latter two hard-coded
to default values. The algorithm is taken from the options passed, and, if none
are found, defaults to sha256.
The effect of this is to create a hash ring with a one-to-many relationship
between pnodes and vnodes, and a one-to-one relationship between vnodes and the
data they hold. The LevelDB keys end up serving as a kind of
metadata for describing aspects of this structure.
deserialize-ring
Click here to see an example
deserialize-ring
takes a json representation of a topology from a file or
stdin and turns it into a LevelDB data structure. In this case, only the
LevelDB keys used for accessing the underlying mappings will be written. Since
the hashing has already necessarily completed in order to create the input
given, ALGORITHM
, VERSION
, and COMPLETE
do not need to be written. This command writes /VNODE/%d
, /PNODE/%s/%d
,
/PNODE/%s
, and VNODE_DATA
.
add-data
Click here to see an example
add-data
takes a vnode number and allows you to assign any data to it. If
there is any use-case for which the consuming application needs to
differentiate between arbitrary vnodes, assigning an identifiable data string
to it is the way to do that. The way it works is that the value of the key /PNODE/%s/%d
has to be overwritten to contain the data you've passed, and the vnode chosen
to put the data inside has to be added to the array stored in VNODE_DATA
.
To access /PNODE/%s/%d
, you need the pnode name. The function
does a lookup of the /VNODE/%d
key to find the pnode that
the vnode you passed in is assigned to, and then assigns the data to that key.
For VNODE_DATA
, the function does a lookup of the current
value of the key (an array vnodeArray
), and if the chosen vnode isn't in that
array, and the data is not null, it is pushed into vnodeArray
, which is
reassigned to VNODE_DATA
.
remap-vnode
Click here to see an example
remap-vnode
takes a list of vnodes (which can be just one vnode) and a
target pnode, and moves those vnodes from their current location onto the
target pnode. This is a multi-step process that involves multiple LevelDB
keys.
- First there is an attempt to look up the vnodes under the target pnode using
the
/PNODE/%s
key, not to retrieve the
list of vnodes, but to see if a NotFoundError
is generated. By
recording this if/else in a boolean variable, this gives the rest of the
function access to whether or not the pnode is a new or an existing one. - Next there is a lookup on the
/VNODE/%d
key to retrieve the current pnode of the vnode(s) set to be remapped. If the
current and intended pnode are the same, an error is thrown and the operation
errors out. - Then there is a lookup of the
/PNODE/%s/%d
key to get the data on the vnode. - Next the keys to assign data to the moving vnodes --
/PNODE/%s/%d
-- are deleted. The vnodes for the old pnode are retrieved from the /PNODE/%s
key, then put into a batch for later. Then, the /PNODE/%s/%d
keys are deleted. - The new vnodes-from-pnode mappings key (
/PNODE/%s
),
data keys (/PNODE/%s/%d
), and
pnode-from-vnode (/VNODE/%d
) are put
into batches to be committed as LevelDB keys in the final commit step. - The new pnode is written to the
/PNODE
key.
remove-pnode
Click here to see an example
remove-pnode
checks that the /PNODE/%s
key exists, and if it has any vnodes. If it has vnodes, it will ask the
operator to re-assign those first. If not, it will delete the the /PNODE/%s
key and alter the the /PNODE
key's array so that it no longer
contains the pnode in question.
get-pnodes
Click here to see an example
get-pnodes
lists the array of pnodes in the hash ring. It does this via a get
on the /PNODE
key in LevelDB.
get-vnodes
Click here to see an example
get-vnodes
lists the array of vnodes in a given pnode in the hash ring. It
does this via a get on the /PNODE/%s
key in LevelDB, where %s
is the argument you pass to the -p
flag when invoking this command.
get-vnode-pnode-and-data
Click here to see an example
get-vnode-pnode-and-data
takes a (number of) vnode(s), and from that returns
the values of the pnode it is assigned to via the /VNODE/%d
key and also the data it contains via the /PNODE/%s/%d
key.
get-data-vnodes
Click here to see an example
get-data-vnodes
returns the value of the VNODE_DATA
key.
This will return any vnode that has a value other than the default value of 1 --
it cannot, at the time of writing, select on different data values held in
vnodes. However, it offers a verbose flag so that an operator can view the data
held within each vnode, as well as which pnode it is assigned to. The keys used
to accomplish this are those used in get-vnode-pnode-and-data
.
get-node
Click here to see an example
get-node
takes a key, runs it through fash's deterministic hashing algorithm
to determine the vnode it is on, and does a get on the /VNODE/%d
key in LevelDB, which returns the pnode information needed to do a get on the /PNODE/%s/%d
key in LevelDB, which gives back the data on the vnode and pnode derived from
the key.
print-hash
Click here to see an example
print-hash
lets you view the hash ring topology. This operation is also
invoked by passing the -o flag to the commands
create, add-data, remap-vnode, or remove-pnode
-- but do so WITH EXTREME CAUTION. For large hash rings, such as those with a
million or more vnodes, this operation will be extremely slow, or may not even
complete.
./bin/fash.js print-hash -l /var/tmp/rings/readme_sample -b leveldb
{"vnodes":6,"pnodeToVnodeMap":{"tcp://1.moray.emy-11.joyent.us:2020":{"0":1,"2":1,"4":"ro"},"tcp://2.moray.emy-11.joyent.us:2020":{"1":1,"3":1,"5":1}},"algorithm":{"NAME":"sha256","MAX":"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF","VNODE_HASH_INTERVAL":"2aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"},"version":"2.1.0"}
diff
Click here to see an example
diff
shows certain differences between two hash ring topologies. The first
topology you pass will be compared to the second -- bear this in mind for the
differences returned will be in a format that uses the words "added" and
"removed." In particular, the names of the pnodes in the ring, as well as the
number and allocations of vnodes, will be compared. Data within each vnode
will be ignored. A sample return is below:
$ ./bin/fash.js diff -b leveldb /var/tmp/rings/sample1 /var/tmp/rings/sample2
{"tcp://1.moray.emy-11.joyent.us:2020":{"added":[20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99]}}
In terms of schema keys, the function gets the values of /PNODE
from the two topologies passed to the command, then looks up the values of /PNODE/%s
to get the complete set of data it compares.
Running Tests
Just run make prepush
or make test
, first exporting the DB_LOCATION
variable for the load test file to give it the location of your hash ring.
Copyright (c) 2017, Joyent, Inc.
This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.