@fluidframework/sequence
Using Fluid Framework libraries
When taking a dependency on a Fluid Framework library, we recommend using a ^
(caret) version range, such as ^1.3.4
.
While Fluid Framework libraries may use different ranges with interdependencies between other Fluid Framework libraries,
library consumers should always prefer ^
.
Description
The @fluidframework/sequence package supports distributed data structures which are list-like.
Its main export is SharedString, a DDS for storing and simultaneously editing a sequence of text.
Note that SharedString is a sequence DDS but it has additional specialized features and behaviors for working with text.
This package historically contained several other sequence-based DDSes, but because they have unintuitive behaviors,
they are deprecated and being moved to the experimental folder.
The main reason for this is the lack of move semantics within the sequence, which becomes crucial when dealing with sequences of
complex content.
For that reason, all of the examples in this README use SharedString
. However, the APIs discussed are available on the common base class: SharedSegmentSequence
.
For the remainder of this document, the term sequence will refer to this base class.
Items are the individual units that are stored within the sequence (e.g. in a SharedString, the items are characters),
but regardless of the type of data stored in the sequence, every item in a sequence is at a specific position starting
at 0, similar to an array. However, sequences differ from arrays in that the positions can move as local and remote
editors make modifications to the sequence.
As its name suggests, SharedSegmentSequence is composed of segments. Segments are the unit that the sequence works
with internally, and contain items within them. Thus, every segment has a length of at least 1 -- that is, it contains
at least one item -- and segments may be split and merged arbitrarily as the sequence is edited. This means the length
of the sequence is not the number of segments, but rather the sum of the length of all the segments.
For example, consider a SharedString that is initially empty. User A adds the characters a, b, and c to the
sequence. Its length is now 3 -- it contains 3 items. Internally, however, the sequence could have either 1, 2, or 3
segments.
Segments: [S1] [S2] [S3]
Items: a b c
Segments: [ S1 ] [S2]
Items: a b c
Segments: [ S1 ]
Items: a b c
In typical use, the splitting and merging of segments is an implementation detail that is not relevant to using the
sequence. However, it is possible to enumerate the segments that intersect a range of positions for performance reasons.
In this case it is important to not retain references to the segments (outside of the enumeration), and to make no
assumptions based on the length of the segments themselves.
Using a Sequence
Sequences support three basic operations: insert, remove, and annotate.
Insert and remove are used to add and remove items from the sequence, while annotate is used to add metadata to items.
Notably, sequences do not support a notion of "moving" a range of content.
If "move" semantics are a hard requirement for your scenario, this github issue outlines some reasonable alternatives.
Insert
Insert operations on the sequence take a single position argument along with the content. This position is inclusive and
can be any position in the sequence including 0, to insert at the beginning of the sequence, and the length of the
sequence, to insert at the end.
sharedString.insertText(0, "hi");
sharedString.insertText(sharedString.getLength(), "!");
sharedString.insertText(2, " world");
Remove
Remove operations take a start and an end position, referred to as a range. The start position is inclusive and can be
any position in the sequence from 0 to its length - 1
. The start position cannot be the length of the sequence like it
can in insert, because there is nothing at that position. The end position is exclusive and must be greater than the
start, so it can be any value from 1 to n (where n is the length of the sequence).
sharedString.removeRange(0, 3);
sharedString.removeRange(0, sharedString.getLength());
Annotate
Annotate operations can add or remove map-like properties to or from items in the sequence. They can store any JSON
serializable data and have the same merge behavior as a SharedMap (last writer wins). Annotate takes a start and end
position which work the same way as the start and end of the remove operation. In addition to start and end, annotate
also takes a map-like properties object. Each key of the provided properties object will be set on each position of the
specified range. Setting a property key to null will remove that property from the positions in the range.
let props1 = sharedString.getPropertiesAtPosition(1);
let props5 = sharedString.getPropertiesAtPosition(5);
sharedString.annotateRange(0, 2, { weight: 5 });
props1 = sharedString.getPropertiesAtPosition(1);
props5 = sharedString.getPropertiesAtPosition(5);
sharedString.annotateRange(0, sharedString.getLength(), { decoration: "underline" });
props1 = sharedString.getPropertiesAtPosition(1);
props5 = sharedString.getPropertiesAtPosition(5);
sharedString.annotateRange(0, sharedString.getLength(), { weight: null });
props1 = sharedString.getPropertiesAtPosition(1);
props5 = sharedString.getPropertiesAtPosition(5);
Sequence delta event
Whenever an operation is performed on a sequence a sequenceDelta event will be raised. This event provides the ranges
affected by the operation, the type of the operation, and the properties that were changed by the operation.
sharedString.on("sequenceDelta", ({ deltaOperation, ranges, isLocal }) => {
if (isLocal) {
addOperationToUndoStack(deltaOperation, ranges);
}
if (deltaOperation === MergeTreeDeltaType.INSERT) {
syncInsertSegmentToModel(deltaOperation, ranges);
}
});
Internally, the sequence package depends on @fluidframework/merge-tree
, and also raises MergeTreeMaintenance
events on that tree as maintenance events.
These events don't correspond directly to APIs invoked on a sequence DDS, but may be useful for advanced users.
Maintenance events are raised whenever the underlying structure of the merge-tree changes (segments are merged, split, unlinked, etc),
so applications attempting to synchronize a data model dependent on the segment structure of merge-tree should look into the semantics of these events; see MergeTreeMaintenanceType
.
Both sequenceDelta and maintenance events are commonly used to synchronize or invalidate a view an application might have over a backing sequence DDS.
Sequence merge strategy
The Fluid sequence data structures are eventually consistent, which means all editors will end up in the same
final state. However, the intermediate states seen by each collaborator may not be seen by other collaborators. These
intermediate states occur when two or more collaborators modify the same position in the sequence which results in a
conflict.
Merge strategy for insert
Consider a sequence like this:
// content: hi mar
// positions: 012345
Now two users simultaneously insert characters at the end of the sequence. One inserts k
and the other inserts a c
.
This is an insert conflict. The basic strategy for insert conflict resolution in the sequence is to merge far,
closer to the end of the sequence.
This merge strategy is possible because of a fundamental property of the Fluid Framework, which is guaranteed ordering.
That is, while the two inserts occurred simultaneously, the operations will be given a global order and all clients will
see the order of the operations when applying them locally. This enables each client to converge to the same state
eventually.
In the earlier example, assuming the k
operation was ordered before the c
operation, then the k
would be
inserted at position 6 first. Then the c
op is applied -- this is the merge conflict. The c
op is inserted at the
position requested (6), and the k
is pushed out towards the end of the sequence.
// content: hi mar
// positions: 012345
// insert(6, "k")
// k op is ordered first
// content: hi mark
// positions: 0123456
// insert(6, "c")
// c op is now applied, pushing the k towards the end of the sequence
// content: hi marck
// positions: 01234567
This same logic applies if multiple items are inserted at the same position -- the earlier ordered items will be pushed
towards the end of the sequence as the later items are merged.
Merge strategies for remove
Like insert, the strategies for remove and annotate also use the guaranteed ordering provided by the Fluid Framework.
Consider again the example from above. Now one user inserts a y
at position 6, and another user removes the c
and
the k
(positions 6 and 7).
// content: hi marck
// positions: 01234567
// REMOVE BEFORE INSERT
// remove(6, 7)
// remove op now applied
// content: hi mar
// positions: 012345
// insert(6, "y")
// no merge conflict -- position 6 is empty
// content: hi mary
// positions: 0123456
// OR
// INSERT BEFORE REMOVE
// insert(6, "y")
// y op is now applied, pushing the c and k towards the end of the sequence
// content: hi maryck
// positions: 012345678
// remove(6, 7)
// remove op now applied, but only removes content ordered before it
// content: hi mary
// positions: 0123456
The key to this merge behavior is that a remove operation will only remove content that was visible to it when the
operation was made. In the example above, the remove op adjusted the range it removed, ensuring only the ck
was
removed.
Another way to consider this behavior is that a remove operation will only remove content that was inserted earlier in
the order. Anything inserted after a remove operation will be ignored. The sequence also detects overlapping remove
operations, and the merge resolution is straightforward -- the data is removed.
Merge strategy for annotate
As mentioned above, annotate operations behave like operations on SharedMaps. The merge strategy used is last writer
wins. If two collaborators set the same key on the annotate properties the operation that gets ordered last will
determine the value.
Local references
Sequences support addition and manipulation of local references to locally track positions in the sequence over time.
As the name suggests, any created references will only exist locally; other clients will not see them.
This can be used to implement user interactions with sequence data in a way that is robust to concurrent editing.
For example, consider a text editor which tracks a user's cursor state.
The application can store a local reference to the character after the cursor position:
const { segment, offset } = sharedString.getContainingSegment(5);
const cursor = sharedString.createLocalReferencePosition(
segment,
offset,
ReferenceType.SlideOnRemove,
{ cursorColor: "blue" },
);
const pos = sharedString.localReferencePositionToPosition(cursor);
otherSharedString.replaceText(1, 2, "ello");
const pos = sharedString.localReferencePositionToPosition(cursor);
Notice that even though another client concurrently edited the string, the local reference representing the cursor is still in the correct location with no further work for the API consumer.
The ReferenceType.SlideOnRemove
parameter changes what happens when the segment that reference is associated with is removed.
SlideOnRemove
instructs the sequence to attempt to slide the reference to the start of the next furthest segment, or if no such segment exists (i.e. the end of the string has been removed), the end of the next nearest one.
The webflow example demonstrates this idea in more detail.
Unlike segments, it is safe to persist local references in auxiliary data structures, such as an undo-redo stack.
Interval collections
Sequences support creation of interval collections, an auxiliary collection of intervals associated with positions in the sequence.
Like segments, intervals support adding arbitrary properties, including handles (references) to other DDSes.
The interval collection implementation uses local references, and so benefits from all of the robustness to concurrent editing
described in the previous section.
Unlike local references, operations on interval collections are sent to all clients and updated in an eventually consistent way.
This makes them suitable for implementing features like comment threads on a text-based documents.
The following example illustrates these properties and highlights the major APIs supported by IntervalCollection.
const comments = sharedString.getIntervalCollection("comments");
const comment = comments.add(
3,
8,
IntervalType.SlideOnRemove,
{
creator: "my-user-id",
handle: myCommentThreadDDS.handle,
},
);
const allIntervalsInCollection = Array.from(comments);
const allProperties = comments.map((comment) => comment.properties);
const intervalsOverlappingFirstHalf = comments.findOverlappingIntervals(0, 4);
const startPosition = sharedString.localReferencePositionToPosition(comment.start);
const endPosition = sharedString.localReferencePositionToPosition(comment.end);
comments.change(comment.getIntervalId(), 0, 1);
comments.changeProperties(comment.getIntervalId(), { status: "resolved" });
comments.removeIntervalById(comment.getIntervalId());
Interval stickiness
"Stickiness" refers to the behavior of intervals when text is inserted on either
side of the interval. A "sticky" interval is one which expands to include text
inserted directly adjacent to it.
A "start sticky" interval is one which expands only to include text inserted to
the start of it. An "end sticky" interval is the same, but with regard to text
inserted adjacent to the end.
For example, let's look at the string "abc". If we have an interval on the
character "b", what happens when we insert text on either side of it? In the
below diagrams, we represent an interval by putting a caret directly underneath
the characters it contains.
Example
Original string
abc
^
No stickiness
aXbYc
^
The interval does not expand to include the newly inserted characters X
and Y
.
Start stickiness
aXbYc
^^
End stickiness
aXbYc
^^
Full stickiness
aXbYc
^^^
Concrete Implementation
The above is a description of the abstract semantics of the concept of stickiness.
In practice, this is implemented using the concept of "sides."
For a given sequence of N characters, there are 2N + 2 positions which can be
referenced: the position immediately before and after each character, and two
special endpoint segments denoting the position before and after the start and
end of the sequence respectively.
By placing the endpoints of an interval either before or after a character, it
is possible to make the endpoints inclusive or exclusive. An exclusive endpoint
in a given direction implies stickiness in that direction. Whether an endpoint
is inclusive or exclusive depends on both the Side and if it is the start or the
end.
Given the string "ABCD":
collection.add(
{ pos: 0, side: Side.After },
{ pos: 3, side: Side.Before },
IntervalType.SlideOnRemove,
);
collection.add(0, 3, IntervalType.SlideOnRemove);
In the case of the first interval shown, if text is deleted,
string.removeRange(1, 2);
The start point of the interval will slide to the position immediately before
"C", and the same will be true.
{start} - A[- C -]D - {end}
In this case, text inserted immediately before "C" would be included in the
interval.
string.insertText(1, "EFG");
With the string now being,
{start} - A[- E - F - G - C -]D - {end}
Note that the endpoint continues to remain with the associated character, except
when the character is removed. When content containing endpoints is removed,
After
endpoints move backward and Before
endpoints move forward to maintain their
side value and inclusive/exclusive behavior.
SharedString
SharedString is a specialized data structure for handling collaborative text. It is based on a more general
Sequence data structure but has additional features that make working with text easier.
In addition to text, a SharedString can also contain markers.
Markers can be used to store metadata at positions within the text, like a reference to an image or Fluid object that should be rendered with the text.
Both markers and text are stored as segments in the SharedString.
Text segments will be split and merged when modifications are made to the SharedString and will therefore have variable length
matching the length of the text content they contain.
Marker segments are never split or merged, and always have a length of 1.
The length of the SharedString will be the combined length of all the text and marker segments.
Just like with other sequences, when talking about positions in a SharedString we use the terms near and far.
The nearest position in a SharedString is 0, and the farthest position is its length.
When comparing two positions the nearer positions is closer to 0, and the farther position is closer to the length.
Intervals vs. markers
Interval endpoints and markers both implement ReferencePosition and seem to serve a similar function so it's not obvious how they differ and why you would choose one or the other.
Using the interval collection API has two main benefits:
-
Efficient spatial querying
- Interval collections support iterating all intervals overlapping the region
[start, end]
in O(log N) + O(overlap size)
time, where N
is the total number of intervals in the collection.
This may be critical for applications that display only a small view of the document contents.
On the other hand, using markers to implement intervals would require a linear scan from the start or end of the sequence to determine which intervals overlap.
-
More ergonomic modification APIs
- Interval collections natively support a modify operation on the intervals, which allows moving the endpoints of the interval to a different place in the sequence.
This operation is atomic, whereas with markers one would have to submit a delete operation for the existing position and an insert for the new one.
In order to achieve the same atomicity, those operations would need to leverage the
SharedSegmentSequence.groupOperation
API,
which is less user-friendly.
If the ops were submitted using standard insert and delete APIs instead, there would be some potential for data loss if the delete
operation ended up acknowledged by the server but the insert operation did not.
Attribution
Important: Attribution is currently in alpha development and is marked internal: expect breaking changes in minor releases as we get feedback on the API shape.
SharedString supports storing information attributing each character position to the user who inserted it and the time at which that insertion happened.
This functionality is off by default.
To enable this functionality, first ensure that all clients are created with an attribution policy factory in the loader settings:
import { createInsertOnlyAttributionPolicy } from "@fluidframework/merge-tree";
const options: ILoaderOptions = {
attribution: {
policyFactory: createInsertOnlyAttributionPolicy,
},
};
This ensures that the client is able to load existing documents containing attribution information,
and specifies which kinds of operations should be attributed at the SharedString level.
The stored attribution information can be found on the attribution
field of the SharedString's segments.
To attribute property changes as well as insertions, use createPropertyTrackingAndInsertionAttributionPolicyFactory
in place of the insert-only factory.
Next, enable the "Fluid.Attribution.EnableOnNewFile"
config flag to start tracking attribution information for new files.
const { segment, offset } = sharedString.getContainingSegment(5);
const key = segment.attribution.getAtOffset(offset);
Note that because attribution information is only finalized upon receiving acknowledgement from the server,
any queries for attribution keys on unacked changes will return LocalAttributionKey
.
To listen for changes to attribution information (e.g. to synchronize a data model with a SharedString
),
use the "maintenance" event for acknowledgement.
For further reading on attribution, see the @fluid-experimental/attributor README.
Examples
-
Rich Text Editor Implementations
-
Integrations with Open Source Rich Text Editors
-
Plain Text Editor Implementations