new_protocol (working title)
This is a data replication protocol oriented towards applications where a group of semitrusted peers collaborate
on a shared data structure. For example, a chat room, a wiki article, a multiplayer game. Prehaps a market even.
The simplest is a chat room. In a chat room each message is displayed on it's own.
There is no complicated way to interpret updates and decide which has precidence such as in a wiki.
But the core distributed systems problem is the same - ensuring that every peer receives every message.
A multi party chat room is a good model, because even if you think you want 1:1 chat, these days people
are likely to have more than one device, so that becomes many parties.
core api
msg
{
//timestamp representing author's message creation time.
ts: timestamp,
//ids of messages directly preceding this one.
prev: [ids...],
//height is 1 more than the height of prev messages
height: integer > 0,
//arbitrary data
content: ...
}
branching messages
usually a new message has a single previous message.
msg1 <--- msg2
but, it is possible for two peers to create a new message at approximately the same time,
aka, "concurrently". This means, after one peer created the message, but before it was sent to them.
msg1 <--- msg2_1
^
`----- msg2_2
if a message is created after a pair of messages like this, it has two prev items.
msg1 <--- msg2_1 <---,
^ msg3
`----- msg2_2 <---'
exception: the very first message has no previous message.
init ()
create a new state object
update(state, msg) => true | false | new_state
add msg to state.
returns true
if msg is already contained in state.
returns false
if msg
cannot be added because of missing messages. (see missing
)
returns new_state if msg could be successfully added.
create(state, content, ts) => msg
creates a new valid message, given state
and ts
.
state is needed because the message will contain pointers to the leaves from the current state, and the height.
request (state, prev) => {have: [ids...], need: [ids...]}
construct a request object, from state and a received msg's prev field.
it is possible that the received prev has multiple elements,
but you may not be missing all of them. request
will sort that out.
missing (state, request.have, request.need)
when you receive a message, it should be added to the current state, but sometimes a message cannot be added
because the message as arrived in the wrong order. Probably there was another message before it, that you did not receive yet because you were offline, or the packet was dropped, or maybe it just crossed the other packet in the air.
in that case, a request => {have, need}
should be sent to your peers requesting the missing message(s).
They will use missing(state, have, need) => messages
to calculate the messages to be sent to you.
posets
posets (partially ordered sets) provide concepts to conviently talk about messages in distributed systems.
This section establishes some notation about Posets that will be used to describe the protocol further on.
lowercase variables denote an item (a message). Uppercase variables denote a set.
The following operators evaluate to boolean when applied to individual messages,
and can also be used to filter and combine sets of messages.
equality: a == a any message is equal to itself a = a
causation: a < b when message b was created a was known (or another message that causally after a)
concurrency: c >< d when message c was created, they had no knowledge of d, and vice versa
A <= b means messages from poset A that causally preceded b, and message b itself.
A >= b means messages causally after b, and b.
A >< b means messages concurrent with b from set A.
finally, we have some operators and, or, subtract, that are applied to two sets.
A & B means intersection of A and B
A | B means union of A and B
A - B means subtract set B from A.
concurrent set is the same as messages not causally before,
A >< b == ((A - (A < b | A > b))
message transmission
Normally, messages are broadcast one at a time.
but after being offline, a peer will require messages
that have happened since the last one they received.
If they were up to message b in set A, they need messages following from b,
or messages that were concurrent with b.
that is: A - (A <= b), aka A >< b | A > b
in ./naive.js
is a simple way to calculate these:
take the set _A = A, then iterate over the ancestors of b, and delete them from _A.
_A now contains the updates needed. This requires duplicating the entire set, then deleting (probably most) messages from it.
in practice, a request for missing messages with be
request(have, missing)
where have
is a list of known messages,
missing
is a list of messages that we know we do not have.
(probably because we received another message that mentioned them)
So, we assume that the peer has everything before have.
so whats missing is (x <= missing) - have
by using message height,
as messages are added, keep track of the heads.
if since_heads is equal to our current heads, result is empty set.
else check using hashtable or bloom filter if since_heads is in our message set.
if not, return an error.
if yes, heads are in set, then for each head, if prev is not a since_head, then it is in result.
repeat, recursively.
I have an idea for an efficient way to calculate this, in one pass, using the height field.
since I'm intending this to be a udp protocol, which may have dropped packets,
it's possible to recive a message, without some prev messages.
in that case, a peer requests the missing messages `caused_after (have, missing)`
which requests M < missing - M <= have. that is, enough messages to correctly interpret M.
for every x in A > b, height(x) > height(b)
but there *may* exist items in A >< b that have height(x) > height(b)
if we know the height of the missing fields, we can incrementally expand two sets.
first we check the messages prev to missing. if any are not in Have, we expand Have
by adding every Have->prev until at the same min height as Missing.
then we repeat with Missing->prev