new_protocol (working title)
SUMMARY
This is a, eventually consistent data replication protocol, with single-writer message blocks that are causally ordered and cryptographically linked. It is intended for peer-to-peer applications where peers are semi-trusted.
USE CASES
This protocol sits between network data and application data. Use cases include, chat rooms, multiplayer games, file sharing, or market places. The following table tries to provide a visual reference.
Stack | Description | Characteristics |
---|
CRDT, OT, etc. | Application Data | Merge Strategies, Presentation |
new_protocol | Causal Messages | Partially Ordered, Reliable, Partition Tolerant |
introducer | Framing | Framing, Routing, Encodinng |
UDP, TCP, etc. | Network Packets | Raw Data |
SPEC
Message branching
Messages are displayed from left to right to illustrate their causal ordering. Usually a new message has a single previous message.
msg1 <- msg2
However, it is possible for two peers to create a new message at approximately the same time (aka "concurrently"). This means, after one peer created the message, but before it was sent to them. In this diagram, msg2_1
and msg2_2
are considered heads
.
msg1 <-- msg2_1
^
`--- msg2_2
If a message is created after a pair of messages like this, it has two prev items. Although the exception to this rule is where the very first message has no previous message. In this diagram, msg3
is considered the head
of this data.
msg1 <-- msg2_1 <--,
^ msg3
`--- msg2_2 <--'
Posets
Posets partially ordered sets provide concepts to conviently talk about messages in distributed systems. This section establishes some notation about Posets that will be used to describe the protocol further on. Lowercase variables denote an item (a message). Uppercase variables denote a set.
A set may have one or more head
messages. A head
message indicates that it is the latest in the chain of messages. A set may have more than one head
if the latest messages in the chain are concurrent.
The following operators evaluate to boolean when applied to individual messages, and can also be used to filter and combine sets of messages.
-
a == a
means equality, a message is equal to itself a = a
-
a < b
means causation, a message b
was created a was known (or another message that causally after a
)
-
c >< d
means concurrency, a message c
was created, they had no knowledge of d
, and vice versa
-
A <= b
means messages from poset A
that causally preceded b
, and message b
itself.
-
A >= b
means messages causally after b
, and b
.
-
A >< b
means messages concurrent with b
from set A
.
The operators and
, or
, and subtract
are applied to two sets.
A & B
means the intersection of A
and B
.A | B
means union of A
and B
.A - B
means subtract set B
from A
.
The concurrent set is the same as messages not causally before,
A >< b == ((A - (A < b | A > b))
Message Transmission
Normally, messages are broadcast one at a time. But after being offline, a peer will require messages that have happened since the last one they received. If they had message b
in set A
, they need messages that come after b
or were concurrent with b
. As posets, this can be expressed as A - (A <= b)
, or it can also be understood as A >< b | A > b
.
The reference implementation in ./naive.js
demonstrates a simple way to calculate these:
Let the variable _A
be a copy of the set A
. Iterate over the ancestors of b
recursively, and delete them from _A
. _A
now contains the updates needed. This implementation is considered naive because this requires duplicating the entire set, then deleting (probably most) messages from it.
In practice, a request for missing messages will include a list of most . request(have, missing)
where have
is a list of known messages, missing
is a list of messages that we know we do not have (probably because we received another message that mentioned them).
So, we assume that the peer has everything before have. so whats missing is
(x <= missing) - have
by using message height,
API
msg
- the primary data strucure of a message consists of the following fields.
{
ts: timestamp,
prev: [ids...],
height: integer > 0,
content: ...
}
state
- an object that represents all known messages
{
messages: [...msg],
heads: [],
height: 0,
waiting: []
}
init ()
Create a new state object
@returns {object} state - an object that represents the state of all known messages.
update (state, msg) => true | false | new_state
@param {object} state - the state object
@param {object} msg - a message object
Integrate a msg to with the current state.
returns true
if msg is already contained in state.
returns false
if msg
cannot be added because of missing messages. (see missing
)
returns new_state if msg could be successfully added.
create (state, content, ts) => msg
@param {object} state - the state object
@param {object} content - an arbitrary value of any type and of a length that will be accepted by the underlying framing protocol.
@param {string} ts - a unix timestamp
Creates a new valid message that can be updated into the state. State is needed because the message will contain pointers to the leaves from the current state, and the height.
request (state, prev) => { have: [ids...], need: [ids...] }
@param {object} state - the state object
@param {string} prev - a message id
Construct a request object, from state and a received msg's prev field. It is possible that the received prev has multiple elements, but you may not be missing all of them. request
will sort that out.
missing (state, { have, need })
@param {object} state - the state object
@param {object} delta - an object that describes what you already have and what you need, as returned by the request method.
Check if a message is missing by walking the chain of previous links.
When you receive a message, it should be added to the current state, but sometimes a message cannot be added because the message as arrived in the wrong order. Probably there was another message before it, that you did not receive yet because you were offline, or the packet was dropped, or maybe it just crossed the other packet in the air.
In that case, a request => {have, need}
should be sent to your peers requesting the missing message(s). They will use missing(state, have, need) => messages
to calculate the messages to be sent to you.