DOT
The DOT project is a blend of operational transformation, CmRDT,
persistent/immutable data structures and reactive stream processing.
This is an implementation of distributed data synchronization of rich
custom data structures with conflict-free merging.
Status
This is very close to a v1 release. The ES6 version
interoperates well right now, but the outstanding short-term issues have
more to do with consistency of the API surface than with features:
- The ES6 version has a simpler polling-based Network API that seems worth adopting here. **Adopted**
- The ES6 branch/undo integration also feels a lot simpler. **Adopted**
- The ES6 version prefers replace() instead of update().
- Nullable value types (i.e. typed Nil values vs change.Nil vs nil) seem confusing.
Features
- Small, well tested mutations and immutable persistent values
- Support for rich user-defined types, not just collaborative text
- Streams and Git-like branching, merging support
- Simple network support (Gob serialization) and storage support
- Strong references support that are automatically updated with changes
- Rich builtin undo support for any type and mutation
- Folding (committed changes on top of uncommitted changes)
- Support for CmRDT types (see crdt)
An interoperable ES6 version is available on dotchain/dotjs with a TODO MVC demo of it here
Contents
- Status
- Features
- CRDTs
- TODO Example
- Server
- Types
- Type registration
- Code generation
- Toggling Complete
- Changing description
- Adding Todos
- Client connection
- Running the demo
- In browser demo
- How it all works
- Applying changes
- Applying changes with streams
- Composition of changes
- Convergence
- Convergence using streams
- Revert and undo
- Folding
- Branching of streams
- References
- Network synchronization and server
- Broad Issues
- Contributing
CRDTs
Much of the framework can support operation-based CRDT changes which
simply appear as commutative operations (and so the merge operation is
trivial). A set of types built this way is available in the
crdt
folder.
TODO Example
The standard TODO-MVC example demonstrates the features of
collaborative (eventually consistent) distributed data structures.
Server
The DOT backend is essentially a simple log store. All mutations to
the application state are represented as a sequence of operations
and written in append-only fashion onto the log. The following
snippet shows how to start a web server (though it does not include
authentication or CORS support, for example):
func Server() {
http.Handle("/dot/", dot.BoltServer("file.bolt"))
http.ListenAndServe(":8080", nil)
}
The example above uses the
Bolt
backend for the actual storage of the operations. There is also a
Postgres backend
available.
Note that the server above has no real reference to any application
logic: it simply accepts operations and writes them out in a
guaranteed order, broadcasting them to all the clients.
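To make the "simple log store" idea concrete, here is a minimal in-memory sketch of an append-only log. It is not DOT's storage code: `OpLog`, `Append` and `Since` are hypothetical names, and operations are reduced to plain strings. A version is assumed to be just the index of an entry in the log.

```go
package main

import (
	"fmt"
	"sync"
)

// OpLog is a hypothetical in-memory append-only log; it is not part of DOT.
type OpLog struct {
	mu  sync.Mutex
	ops []string
}

// Append stores op at the end of the log and returns its version (its index).
func (l *OpLog) Append(op string) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.ops = append(l.ops, op)
	return len(l.ops) - 1
}

// Since returns a copy of every operation at or after the given version.
func (l *OpLog) Since(version int) []string {
	l.mu.Lock()
	defer l.mu.Unlock()
	if version < 0 {
		version = 0
	}
	if version > len(l.ops) {
		version = len(l.ops)
	}
	return append([]string(nil), l.ops[version:]...)
}

func main() {
	log := &OpLog{}
	log.Append("add todo")
	v := log.Append("toggle todo 0")
	// A client that has already seen version 0 only needs the later entries.
	fmt.Println(v, log.Since(1))
}
```

Because entries are never rewritten, every client that reads the log sees the operations in the same order, which is the only guarantee the rest of the system needs from the server.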
Types
A TODO MVC app consists of only two core types, Todo and TodoList:
type Todo struct {
Complete bool
Description string
}
type TodoList []Todo
Type registration
To use the types across the network, they have to be registered with
the codec (which will be sjson in this example):
func init() {
nw.Register(Todo{})
nw.Register(TodoList{})
}
Code generation
For use with DOT, these types need to be augmented with standard
methods of the Value interface (or in the case of lists like TodoList,
also implement the Collection interface).
These interfaces are essentially the ability to take changes of the
form replace a sub field or replace items in the array and
calculate the result of applying them. They are mostly boilerplate
and so can be autogenerated easily via the
dotc package. See
code generation for augmenting the above type
information.
The code generation not only implements these two interfaces, it also
produces a new Stream type for Todo and TodoList. A
stream type is like a linked list with the Value field being the
underlying value and Next() returning the next entry in the stream
(in case the value was modified). Latest() returns the
last entry in the stream at that point. Also, each stream type
implements mutation methods to easily modify the value associated with
a stream.
What makes the streams interesting is that two different modifications
from the same state cause the Latest of both to be the same with
the effect of both merged. (This is done using the magic of
operational transformations.)
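The linked-list shape of a stream can be sketched without any of the DOT machinery. The toy below is an assumption-laden simplification: `IntStream` is a hypothetical type, the only mutation is adding an integer, and because additions commute no operational transformation is needed. It only illustrates how two edits made from the same node end up sharing the same latest node.

```go
package main

import "fmt"

// IntStream is a hypothetical toy stream over an int; it is not DOT's stream type.
type IntStream struct {
	Value int
	delta int        // the change that leads from this node to next
	next  *IntStream // nil while this node is the latest
}

// Add applies a delta at this node and returns the node holding the result.
func (s *IntStream) Add(d int) *IntStream {
	if s.next == nil {
		s.delta, s.next = d, &IntStream{Value: s.Value + d}
		return s.next
	}
	// Another edit already happened here: carry d forward along the existing
	// chain, and let the new node rejoin that chain via the earlier delta.
	merged := s.next.Add(d)
	return &IntStream{Value: s.Value + d, delta: s.delta, next: merged}
}

// Latest follows the chain to the most recent node.
func (s *IntStream) Latest() *IntStream {
	for s.next != nil {
		s = s.next
	}
	return s
}

func main() {
	root := &IntStream{Value: 10}
	a := root.Add(1) // one writer
	b := root.Add(5) // another writer, starting from the same state
	fmt.Println(a.Value, b.Value, a.Latest().Value, a.Latest() == b.Latest())
}
```

Each writer sees only its own edit in the node it gets back, while following the chain from either node reaches the same merged value.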
Toggling Complete
The code to toggle the Complete field of a particular todo item
looks like the following:
func Toggle(t *TodoListStream, index int) {
todoStream := t.Item(index)
completeStream := todoStream.Complete()
completeStream.Update(!completeStream.Value)
}
Note that the function does not return any value here but the updates
can be fetched by calling .Latest() on any of the corresponding
streams. If a single stream instance has multiple edits, the
Latest() value is the merged value of all those edits.
Changing description
The code for changing the Description field is similar. The string
Description field in Todo maps to a streams.S16 stream. This
implements an Update() method like all streams.
But to make things interesting, let's look at splicing rather
than replacing the whole string. Splicing is taking a subsequence of
the string at a particular position and replacing it with the provided
value. It captures insert, delete and replace in one operation.
This probably better mimics what text editors do, and a benefit of such
fine-grained edits is that when two users edit the same text, the
edits will merge quite cleanly so
long as they don't directly touch the same characters.
func SpliceDescription(t *TodoListStream, index, offset, count int, replacement string) {
todoStream := t.Item(index)
descStream := todoStream.Description()
descStream.Splice(offset, count, replacement)
}
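The splice operation itself is small enough to show on a plain Go string. This is a sketch only: the helper `splice` is a hypothetical name, it works on byte offsets (the stream types may count offsets differently), and it does no bounds checking.

```go
package main

import "fmt"

// splice replaces count bytes of s starting at offset with replacement.
// Insert is count == 0, delete is an empty replacement.
func splice(s string, offset, count int, replacement string) string {
	return s[:offset] + replacement + s[offset+count:]
}

func main() {
	fmt.Println(splice("hello", 5, 0, " world")) // insert at the end
	fmt.Println(splice("hello", 3, 2, ""))       // delete "lo"
	fmt.Println(splice("hello", 0, 1, "H"))      // replace "h"
}
```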
Adding Todos
Adding a Todo is relatively simple as well:
func AddTodo(t *TodoListStream, todo Todo) {
t.Splice(len(t.Value), 0, todo)
}
The use of Splice in this example should hint that (just like
strings) collections support insertion/deletion at arbitrary points
via the Splice method. In addition, collections also
support the Move(offset, count, distance) method to move some items
around within the collection.
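A rough picture of what a move does, on a plain slice of strings. The interpretation of the parameters is an assumption here: the `count` items starting at `offset` are shifted right by `distance` when it is positive and left when it is negative. The helper `move` is a hypothetical name and does no bounds checking.

```go
package main

import "fmt"

// move returns a copy of items with the count items starting at offset
// shifted by distance positions (right if positive, left if negative).
func move(items []string, offset, count, distance int) []string {
	out := make([]string, 0, len(items))
	end := offset + count
	if distance >= 0 {
		out = append(out, items[:offset]...)
		out = append(out, items[end:end+distance]...) // items jumped over
		out = append(out, items[offset:end]...)       // the moved items
		out = append(out, items[end+distance:]...)
	} else {
		out = append(out, items[:offset+distance]...)
		out = append(out, items[offset:end]...)             // the moved items
		out = append(out, items[offset+distance:offset]...) // items jumped over
		out = append(out, items[end:]...)
	}
	return out
}

func main() {
	todos := []string{"a", "b", "c", "d", "e"}
	fmt.Println(move(todos, 1, 2, 2))  // shift "b", "c" two places right
	fmt.Println(move(todos, 3, 2, -2)) // shift "d", "e" two places left
}
```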
Client connection
Setting up the client requires connecting to the URL where the server
is hosted. In addition, the code below illustrates how sessions
could be saved and restarted if needed.
var Lock sync.Mutex
func Client(stop chan struct{}, render func(*TodoListStream)) {
url := "http://localhost:8080/dot/"
session, todos := SavedSession()
s, store := session.NonBlockingStream(url, nil)
defer store.Close()
todosStream := &TodoListStream{Stream: s, Value: todos}
ticker := time.NewTicker(500*time.Millisecond)
changed := true
for {
if changed {
render(todosStream)
}
select {
case <- stop:
// Save the session before returning; a statement placed after
// the infinite loop would never be reached.
SaveSession(session, todosStream.Value)
return
case <- ticker.C:
}
Lock.Lock()
s.Push()
s.Pull()
next := todosStream.Latest()
changed = next != todosStream
todosStream, s = next, next.Stream
Lock.Unlock()
}
}
func SaveSession(s *dot.Session, todos TodoList) {
}
func SavedSession() (s *dot.Session, todos TodoList) {
return dot.NewSession(), nil
}
Running the demo
The TODO MVC demo is in the
example
folder.
The snippets in this markdown file can be used to generate the
todo.go file and then auto-generate the "generated.go" file:
$ go get github.com/tvastar/test/cmd/testmd
$ testmd -pkg example -o examples/todo.go README.md
$ testmd -pkg main codegen.md > examples/generated.go
The server can then be started by:
$ go run server.go
The client can then be started by:
$ go run client.go
The provided client.go stub file simply appends a task every 10
seconds.
In browser demo
The fuss project has demos of a
TODO-MVC app built on top of this framework using
gopherjs. In particular, the
collab
folder illustrates how simple the code is to make something work
collaboratively (the rest of the code base is not even aware of
whether things are collaborative).
How it all works
There are values, changes and streams.
- Values implement the Value interface. If the value represents a
collection, it also implements the Collection interface.
- Changes represent mutations to values that can be merged. If
two independent changes A and B are made to the same value, they can be
merged so that A + merged(B) = B + merged(A). This is represented by
the Change interface. The changes package implements the core changes
with composition that allows richer changes to be implemented.
- Streams represent a sequence of changes to a value, except that the
sequence is convergent: if multiple writers modify a value, they each get
a separate stream instance that only reflects their local change, but
following the Next chain guarantees that all versions end up with
the same final value.
Applying changes
The following example illustrates how to edit a string with values and
changes:
initial := types.S8("hello")
append := changes.Splice{
Offset: len("hello"),
Before: types.S8(""),
After: types.S8(" world"),
}
updated := initial.Apply(nil, append)
fmt.Println(updated)
Applying changes with streams
A less verbose stream based version (preferred) would look like so:
initial := &streams.S8{Stream: streams.New(), Value: "hello"}
updated := initial.Splice(5, 0, " world")
fmt.Println(updated.Value)
The changes
package implements the core changes: Splice, Move and
Replace. The logical model for these changes is to treat all
values as either being like arrays or like maps. The actual
underlying datatype can be different as long as the array/map
semantics is implemented.
Composition of changes
Changes can be composed together. A simple form of composition is
just a set of changes:
initial := types.S8("hello")
append1 := changes.Splice{
Offset: len("hello"),
Before: types.S8(""),
After: types.S8(" world"),
}
append2 := changes.Splice{
Offset: len("hello world"),
Before: types.S8(""),
After: types.S8("."),
}
both := changes.ChangeSet{append1, append2}
updated := initial.Apply(nil, both)
fmt.Println(updated)
Another form of composition is modifying a sub-element such as an
array element or a dictionary path:
initial := types.A{types.M{"hello": types.S8("world")}}
replace := changes.Replace{Before: types.S8("world"), After: types.S8("world!")}
path := []interface{}{0, "hello"}
c := changes.PathChange{Path: path, Change: replace}
updated := initial.Apply(nil, c)
fmt.Println(updated)
Convergence
The core property of all changes is the ability to guarantee
convergence when two mutations are attempted on the same state:
initial := types.S8("hello")
insert := changes.Splice{Offset: 5, Before: types.S8(""), After: types.S8(" world")}
remove := changes.Splice{Offset: 3, Before: types.S8("lo"), After: types.S8("")}
inserted := initial.Apply(nil, insert)
removed := initial.Apply(nil, remove)
removex, insertx := insert.Merge(remove)
final1 := inserted.Apply(nil, removex)
final2 := removed.Apply(nil, insertx)
fmt.Println(final1, final1 == final2)
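What a merge has to compute can be sketched for the easy case where the two splices do not overlap. The code below is a simplified stand-in, not the changes package: `Splice`, `Apply`, `shift` and `Merge` are hypothetical definitions on plain strings with byte offsets, and overlapping edits are simply rejected instead of being resolved.

```go
package main

import "fmt"

// Splice replaces Before with After at Offset (hypothetical, string-only).
type Splice struct {
	Offset        int
	Before, After string
}

// Apply returns text with the splice applied.
func (s Splice) Apply(text string) string {
	return text[:s.Offset] + s.After + text[s.Offset+len(s.Before):]
}

// shift moves s by the net length change caused by other.
func shift(s, other Splice) Splice {
	s.Offset += len(other.After) - len(other.Before)
	return s
}

// Merge takes two splices made against the same text and returns bx (to
// apply after a) and ax (to apply after b). ok is false if they overlap.
func Merge(a, b Splice) (bx, ax Splice, ok bool) {
	switch {
	case a.Offset+len(a.Before) <= b.Offset: // a lies entirely before b
		return shift(b, a), a, true
	case b.Offset+len(b.Before) <= a.Offset: // b lies entirely before a
		return b, shift(a, b), true
	}
	return Splice{}, Splice{}, false
}

func main() {
	insert := Splice{Offset: 5, After: " world"}
	remove := Splice{Offset: 3, Before: "lo"}
	removex, insertx, ok := Merge(insert, remove)
	final1 := removex.Apply(insert.Apply("hello"))
	final2 := insertx.Apply(remove.Apply("hello"))
	fmt.Println(ok, final1, final1 == final2)
}
```

The only adjustment needed in the non-overlapping case is to shift the later splice by the length change of the earlier one; the hard cases handled by a real implementation are the overlapping ones this sketch refuses.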
Convergence using streams
The same convergence example is a lot easier to read with streams:
initial := streams.S8{Stream: streams.New(), Value: "hello"}
s1 := initial.Splice(5, 0, " world")
s2 := initial.Splice(3, len("lo"), "")
s1 = s1.Latest()
s2 = s2.Latest()
fmt.Println(s1.Value, s1.Value == s2.Value)
The ability to merge two independent changes done to the same
initial state is the basis for the eventual convergence of the data
structures. The
changes package
has fairly intensive tests to cover the change types defined there,
both individually and in composition.
Revert and undo
All the predefined types of changes in DOT (see
changes) are
carefully designed so that every change can be inverted easily without
reference to the underlying value. For example,
changes.Replace
has both the Before and After fields instead of just keeping
the After. This allows the reverse to be computed quite easily by
swapping the two fields. This does generally incur additional storage
expenses but the tradeoff is that code gets much simpler to work
with.
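The swap described above can be sketched in a few lines. This is an illustration on plain strings rather than the changes package itself: the `Splice` type and its `Apply` and `Revert` methods are hypothetical definitions for this example.

```go
package main

import "fmt"

// Splice replaces Before with After at Offset (hypothetical, string-only).
type Splice struct {
	Offset        int
	Before, After string
}

// Apply returns text with the splice applied.
func (s Splice) Apply(text string) string {
	return text[:s.Offset] + s.After + text[s.Offset+len(s.Before):]
}

// Revert builds the inverse change without looking at the underlying text.
func (s Splice) Revert() Splice {
	return Splice{Offset: s.Offset, Before: s.After, After: s.Before}
}

func main() {
	c := Splice{Offset: 0, Before: "h", After: "H"}
	edited := c.Apply("hello")
	restored := c.Revert().Apply(edited)
	fmt.Println(edited, restored)
}
```

If the change stored only After, reverting would need the old text to recover what was overwritten; keeping Before makes the change self-contained.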
In particular, it is possible to build generic
undo support
quite easily and naturally. The following example shows both Undo
and Redo being invoked from an undo stack.
master := &streams.S16{Stream: streams.New(), Value: "hello"}
s := undo.New(master.Stream)
undoableChild := &streams.S16{Stream: s, Value: master.Value}
undoableChild = undoableChild.Splice(0, len("h"), "H")
fmt.Println(undoableChild.Value)
master.Splice(len("hello"), 0, "$")
s.Undo()
undoableChild = undoableChild.Latest()
fmt.Println(undoableChild.Value)
s.Redo()
undoableChild = undoableChild.Latest()
fmt.Println(undoableChild.Value)
Folding
In the case of editors, folding refers to a piece of text that has
been hidden away. The difficulty with implementing this in a
collaborative setting is that as external edits come in, the fold has
to be maintained.
The design of DOT allows for an elegant way to achieve this: consider
the "folding" as a local change (replacing the folded region with
say "..."). This local change is never meant to be sent out. All
changes to the unfolded and folded versions can be proxied quite
nicely without much app involvement:
master := &streams.S16{Stream: streams.New(), Value: "hello world!"}
foldChange := changes.Splice{
Offset: len("hello"),
Before: types.S16(" world"),
After: types.S16("..."),
}
foldedStream := fold.New(foldChange, master.Stream)
folded := &streams.S16{Stream: foldedStream, Value :"hello...!"}
folded = folded.Splice(0, len("h"), "H")
folded = folded.Splice(len("Hello...!"), 0, "!!")
fmt.Println(folded.Value)
master = master.Splice(len("h"), len("e"), "u")
fmt.Println(master.Value)
fmt.Println(folded.Latest().Value)
fmt.Println(master.Latest().Value)
Branching of streams
Streams in DOT can also be branched a la Git. Changes made in
branches do not affect the master or vice-versa -- until one of Pull
or Push is called.
master := &streams.S16{Stream: streams.New(), Value: "hello"}
local := &streams.S16{Stream: streams.Branch(master.Stream), Value: master.Value}
local.Splice(len("h"), len("e"), "a")
fmt.Println(master.Latest().Value)
local.Stream.Push()
fmt.Println(master.Latest().Value)
There are other neat benefits to the branching model: it provides
fine-grained control for pulling changes from the network on demand
and suspending it, as well as a way to make local
changes.
References
There are two broad cases where a JSON-like structure is not quite
enough.
- Editors often need to track the cursor or selection which can be
thought of as offsets in the editor text. When changes happen to the
text, for example, the offset would need to be updated.
- Objects often need to refer to other parts of the JSON-tree. For
example, one can represent a graph using the array, map primitives
with the addition of references. When changes happen, these too would
need to be updated.
The refs package
implements a set of types that help work with these. In particular,
it defines a
Container
value that allows elements within to refer to other elements.
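The cursor case can be illustrated with a small function that adjusts an offset when a splice happens in the text. It is a sketch of the idea, not the refs package: `updateCaret` is a hypothetical helper, and the policy for a caret inside a removed region (collapse to the start of that region) is an assumption.

```go
package main

import "fmt"

// updateCaret returns the new caret position after a splice that removes
// `removed` characters at `offset` and inserts `inserted` characters there.
func updateCaret(caret, offset, removed, inserted int) int {
	switch {
	case caret <= offset:
		// The edit is at or after the caret: nothing to do.
		return caret
	case caret >= offset+removed:
		// The edit is entirely before the caret: shift by the net change.
		return caret + inserted - removed
	default:
		// The caret was inside the removed region: collapse to its start.
		return offset
	}
}

func main() {
	fmt.Println(updateCaret(8, 0, 0, 3)) // text inserted before the caret
	fmt.Println(updateCaret(2, 5, 6, 0)) // text deleted after the caret
	fmt.Println(updateCaret(7, 5, 6, 0)) // caret inside the deleted text
}
```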
Network synchronization and server
DOT uses a fairly simple backend
Store
interface: an append-only dumb log. The
Bolt and
Postgres
implementations are quite simple and other data backends can be
easily added.
See Server and Client connection for
sample server and client applications. Note that the journal approach
used implies that the journal size only increases and so clients will
eventually take a while to rebuild their state from the journal. The
client API allows snapshotting state to make the rebuilds faster.
There is no server support for snapshots, though it is possible to
build one rather easily.
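The effect of a snapshot on rebuild time can be sketched as follows. None of these names come from the DOT client API: `Splice`, `Snapshot` and `Rebuild` are hypothetical, the state is a plain string, and a snapshot is assumed to record how many journal entries it already includes.

```go
package main

import "fmt"

// Splice replaces Before with After at Offset (hypothetical, string-only).
type Splice struct {
	Offset        int
	Before, After string
}

// Apply returns text with the splice applied.
func (s Splice) Apply(text string) string {
	return text[:s.Offset] + s.After + text[s.Offset+len(s.Before):]
}

// Snapshot is the state after the first Version journal entries.
type Snapshot struct {
	Version int
	Text    string
}

// Rebuild replays only the journal entries the snapshot has not seen.
func Rebuild(snap Snapshot, journal []Splice) string {
	text := snap.Text
	for _, op := range journal[snap.Version:] {
		text = op.Apply(text)
	}
	return text
}

func main() {
	journal := []Splice{
		{Offset: 0, After: "hello"},
		{Offset: 5, After: " world"},
		{Offset: 0, Before: "h", After: "H"},
	}
	full := Rebuild(Snapshot{}, journal) // replays all three entries
	fast := Rebuild(Snapshot{Version: 2, Text: "hello world"}, journal)
	fmt.Println(full, full == fast)
}
```

Both calls produce the same state, but the second one applies a single entry instead of the whole journal.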
Broad Issues
- changes.Context/changes.Meta are not fully integrated.
- gob-encoding makes it harder to deal with other languages, but JSON
encoding won't work with interfaces.
  - Added sjson encoding as a portable (if verbose) format.
  - The ES6 dotjs package uses this as the native format.
- Cross-object merging and persisted branches need more platform support
- Snapshots are somewhat related to this as well.
- Full rich-text support with collaborative cursors still needs work
with references and reference containers.
- Code generation can infer types from regular go declarations
- Snapshots and transient states need some sugar.
Contributing
Please see CONTRIBUTING.md.