epidemic-broadcast-trees
This module is based on plumtree Epidemic Broadcast Trees paper,
but adapted to also replicate logs, and optimized to achive a minimal
overhead (the cost of the protocol is linear with the number of messages to be sent)
It's a algorithm that combines the robustness of a flooding epidemic gossip broadcast,
with the efficiency of a tree model. It's intended for implementing realtime protocols
(such as chat, scuttlebutt, also radio/video) over networks with random topology -
or networks where otherwise peers may be unable to all connect to each other or to a central hub.
Although the primary motivation for this module is to use it in secure scuttlebutt,
it's intended to be decoupled sufficiently to use for other applications.
example
implement a simple in memory log replicator.
var clocks = {}
var logs = {}
function append (msg, cb) {
var log = logs[msg.author] = logs[msg.author] || []
if(msg.sequence != log.length)
cb(new Error('out of order, found:'+msg.sequence+', expected:'+log.length))
else {
log.push(msg)
ebt.onAppend(msg)
cb()
}
}
var ebt = EBT({
id: 'alice',
getClock: function (id, cb) {
cb(null, clocks[id] || {})
},
setClock: function (id, clock) {
clocks[id] = clock
},
getAt: function (pair, cb) {
if(!logs[pair.id] || !logs[pair.id][pair.sequence])
cb(new Error('not found'))
else
cb(null, logs[pair.id][pair.sequence])
},
append: append
})
ebt.append({
author: 'alice', sequence: 1, content: {}
}, function () {})
ebt.request('alice', true)
ebt.request('bob', true)
var stream = ebt.createStream('bob')
stream.pipe(remote_stream).pipe(stream)
note about push-stream: push-stream is only new, so you'll probably
need to convert this to a pull-stream to connect stream to a network
io stream and serialization
var pushToPull = require('push-stream-to-pull-stream')
var stream = pushToPull(ebt.createStream(remote_id))
pull(stream, remote_pull_stream, stream)
API
EBT(opts) => ebt
where opts provides the necessary things to connect ebt
to your system.
opts = {
id: string,
timeout: 3000, //default,
getClock: function (id, cb),
setClock: function (id, clock),
getAt: function ({id:string, sequence:number}, cb),
append: function (msg, cb)
}
Create a new EBT instance. id
is a unique identifier of the current peer.
In secure-scuttlebutt this is a ed25519 public key.
getClock(id, cb)
and setClock(id, clock)
save a peer's clock object.
This is used to save bandwidth when reconnecting to a peer again.
getAt({id, sequence}, cb)
retrives a message in a feed and an sequence.
messages must have {author, sequence, content}
fields.
append(msg, cb)
append a particular message to the log.
timeout
is used to decide when to switch a feed to another peer.
This is essential to detecting when a peer may have stalled.
ebt.onAppend (msg)
When a message is appended to the database, tell ebt about it.
this must be called whenever a message is successfully appended to the database.
ebt.request(id, follow)
Tell ebt to replicate a particular feed. id
is a feed id, and follow
is a boolean
.
If follow
is false
, but previously was called with true, ebt will stop replicating
that feed.
ebt.progress()
returns an object which represents the current replication progress.
an example object output looks like this, all values are integers >= 0.
{
start: S,
current: C,
total: T
}
this follows a common pattern I've used across ssbc modules for representing progress,
used for example here: https://github.com/ssbc/scuttlebot/blob/master/lib/progress.js
comparison to plumtree
I had an idea for a gossip protocol that avoided retransmitting messages by putting
unneeded connections into standby mode (which can be brought back into service when necessary)
and then was pleasantly surprised to discover it was not a new idea, but had already been described
in a paper - and there is an implementation of that paper in erlang here: https://github.com/helium/plumtree
There are some small differences, mainly because I want to send messages in order, which makes
it easy to represent what messages have not been seen using just a incrementing sequence number per feed.
But plumbtree is solely a broadcast protocol, not an eventually consistent replication protocol.
Since we are replicating logs it's also necessary to send a handshake to request the feeds
from the right points. If you are replicating thousands of feeds the size of the handshake is
significant, so we introduce an algorithm for "request skipping" that avoids sending unnecessary
requests, and saves a lot of bandwidth compared to just requesting all feeds each connection.
todo
License
MIT