Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

@fluidframework/tree

Package Overview
Dependencies
Maintainers
0
Versions
115
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

@fluidframework/tree

Distributed tree

  • 2.0.9
  • Source
  • npm
  • Socket score

Version published
Maintainers
0
Created
Source

@fluidframework/tree

A tree data structure for the Fluid Framework.

The contents of this package are also reported as part of the fluid-framework package which provides an alternative way to consume the functionality from this package.

SharedTree Philosophy covers the goals of the SharedTree project, and some of the implications of those goals for developers working on this package.

Status

Notable consideration that early adopters should be wary of:

  • The persisted format is not stable and long term support for persisted format is not yet in effect. This means clients running more recent versions of SharedTree may find themselves unable to load old document or may corrupt them when editing. Current estimate for LTS is April 2024.
  • SharedTree currently has unbounded memory growth:
    • Removed content is retained forever.
    • Accessing an EditableField object (from its parent, e.g., with getField) in a loop will leak unbounded memory.
  • All changes are atomized by the visitDelta function. This means that, when inserting/removing/moving 2 contiguous nodes, the visitDelta function will call the DeltaVisitor twice (once for each node) instead of once for both nodes. Change notification consumers that are downstream from the DeltaVisitor will therefore also see those changes as atomized.
  • Some documentation (such as this readme and the roadmap) are out of date. The API documentation which is derived from the documentation comments in the source code should be more up to date.

More details on the development status of various features can be found in the roadmap.

Using Fluid Framework libraries

When taking a dependency on a Fluid Framework library, we recommend using a ^ (caret) version range, such as ^1.3.4. While Fluid Framework libraries may use different ranges with interdependencies between other Fluid Framework libraries, library consumers should always prefer ^.

Motivation

There are a lot of different factors motivating the creation of this Tree DDS. A wide variety of possible consumers (across several companies) have overlapping feature requirements which seem like they can best be met by collaborating on a single feature rich tree implementation powered by Fluid. The current feature focus is on:

  • Semantics:
    • High quality semantic merges, including moves of parts of sequences (called "slice moves").
    • Transactionality.
    • Support schema in a semantically robust way.
  • Scalability:
    • Support for partial views: allow efficiently viewing and editing parts of larger datasets without downloading the whole thing.
    • Ability to easily (sharing code with client) spin up optional services to improve scalability further (ex: server side summaries, indexing, permissions etc.)
    • Efficient data encodings.
  • Expressiveness:
    • Efficient support for moves, including moves of large sections of large sequences, and large subtrees.
    • Support history operations (ex: undo and redo).
    • Flexible schema system that has design patterns for making schema changes over time.
  • Workflows:
    • Good support for offline.
    • Optional support for branching and history.
  • Extensibility: It must be practical to accommodate future users with needs beyond what we can afford to support in the initial version. This includes needs like:
    • New field kinds to allow data-modeling with more specific merge semantics (ex: adding support for special collections like sets, or sorted sequences)
    • New services (ex: to support permissions, server side indexing etc.)

What's missing from existing DDSes?

directory and map can not provide merge resolution that guarantees well-formedness of trees while supporting the desired editing APIs (like subsequence move), and are missing (and cannot be practically extended to have) efficient ways to handle large data or schema.

sequence does not capture the hierarchy or schema, and also does not handle partial views. Additionally its actual merge resolution leaves some things to be desired in some cases which tree aims to improve on.

experimental/tree does not have a built in schema system reducing the data available to make semantically high quality merges. It also does merge resolution in a way that requires having the whole tree in memory due to it being based entirely on node identifiers (including constraints within transactions that can't be verified without reading large parts of the tree).

experimental/PropertyDDS currently does not have as high quality merge logic as desired, currently not even supporting efficient moves. Much of what is desired is theoretically possible as additional feature work on PropertyDDS, but it was decided that it makes more sense to build up this more featureful DDS from scratch leveraging the learnings from PropertyDDS and experimental/tree.

Why not a tree made out of existing DDS implementations and how this relates to the Fluid Framework itself?

Currently existing DDS implementations can not support cross DDS transactions. For example, moving part of a sequence from one sequence DDS to another cannot be done transactionally, meaning if the source of the move conflicts, the destination half can't be updated or aborted if it's in a different DDS. Cross DDS moves also currently can't be as efficient as moves within a single DDS, and there isn't a good way to do cross DDS history or branching without major framework changes. There are also some significant per DDS performance and storage costs that make this approach much more costly than using a single DDS.

One way to think about this new tree DDS is to try and mix some of the Fluid-Framework features (like the ability to view a subset of the data) with features from DDSes (ex: lower overhead per item, efficient moves of sub-sequences, transactions). If this effort is successful, it might reveal some improved abstractions for modularizing hierarchical collaborative data-structures (perhaps "field kinds"), which could make their way back into the framework, enabling some features specific to this tree (ex: history, branching, transactional moves, reduced overhead) to be framework features instead.

From this perspective, this tree serves as a proof of concept for abstractions and features which could benefit the framework, but are easier to implement within a DDS initially. This tree serves to get these feature into the hands of users much faster than could be done at the framework level.

This package can be developed using any of the regular workflows for working on Fluid Framework and/or its Client release group of packages, but for work only touching the tree package, there is an optional workflow that might be more ergonomic:

  • Follow the Setup and Building instructions.
  • Open the .vscode/Tree.code-workspace in VS Code. This will recommend a test runner extension, which should be installed.
  • Build the the tree package as normal (for run example: pnpm i && pnpm run build in the tree directory).
  • After editing the tree project, run npm run build (still in the tree) directory.
  • Run and debug tests using the "Testing" side panel in VS Code, or using the inline Run | Debug buttons which should show up above tests in the source: both of these are provided by the mocha testing extension thats recommended by the workspace. Note that this does not build the tests, so always be sure to build first.

Architecture

This section covers the internal structure of the Tree DDS. In this section the user of this package is called "the application". "The application" is full of "application code", meaning code which can be specific to particular schema and use-cases. This typically means the client side "business logic" or "view" part of some graphical web application, but it could also mean something headless like a service.

Ownership and Lifetimes

This diagram shows the ownership hierarchy during a transaction with solid arrows, and some important references with dashed arrows:

graph TD;
    subgraph "Persisted Data"
        store["Data Store"]-->doc["Persisted Summaries"]
    end
    container["Fluid Container"]-->shared-tree
    subgraph "@fluidframework/tree"
        shared-tree--"extends"-->shared-tree-core
        shared-tree-core-."reads".->doc
        shared-tree-core-->EditManager-->X["collab window & branches"]
        shared-tree-core-->Indexes-->ForestIndex
        shared-tree-->view["SharedTreeView"]
        transaction-."updates".->view
        transaction-->EditBuilder
        view-."reads".->ForestIndex
        view-->transaction
    end

tree is a DDS, and therefore it stores its persisted data in a Fluid Container, and is also owned by that same container. When nothing in that container references the DDS anymore, it may get garbage collected by the Fluid GC.

The tree DDS itself, or more specifically shared-tree-core is composed of a collection of indexes (just like a database) which contribute data which get persisted as part of the summary in the container. shared-tree-core owns these databases, and is responsible for populating them from summaries and updating them when summarizing.

See indexes and branches for details on how this works with branches.

When applications want access to the tree's data, they do so through an ISharedTreeView which abstracts the indexes into nice application facing APIs. Views may also have state from the application, including:

  • view-schema
  • adapters for out-of-schema data
  • request or hints for what subsets of the tree to keep in memory
  • pending transactions
  • registrations for application callbacks / events.

shared-tree provides a default view which it owns, but applications can create more if desired, which they will own. Since views subscribe to events from shared-tree, explicitly disposing any additionally created ones is required to avoid leaks.

transactions are created by ISharedTreeViews and are currently synchronous. Support for asynchronous transactions, with the application managing the lifetime and ensuring it does not exceed the lifetime of the view, could be added in the future.

Data Flow

Viewing
flowchart LR;
    doc["Persisted Summaries"]--"Summary+Trailing ops"-->shared-tree-core
    subgraph "@fluidframework/tree"
        shared-tree--"configures"-->shared-tree-core
        shared-tree-core--"Summary"-->Indexes--"Summary"-->ForestIndex;
        ForestIndex--"Exposed by"-->ISharedTreeView
    end
    ISharedTreeView--"viewed by"-->app

shared-tree configures shared-tree-core with a set of indexes. shared-tree-core downloads the summary data from the Fluid Container, feeding the summary data (and any future edits) into the indexes. shared-tree then constructs the default view. The application using the shared-tree can get the view from which it can read data (which the view internally gets from the indexes). For any given part of the application this will typically follow one of two patterns:

  • read the tree data as needed to create the view. Register invalidation call backs for when the observed parts of the tree change. When invalidated, reconstruct the invalidated parts of the view by rereading the tree.
  • read the tree data as needed to create the view. Register delta callbacks for when the observed parts of the tree change. When a delta is received, update the view in place according to the delta.

TODO: Eventually these two approaches should be able to be mixed and matched for different parts of the application as desired, receiving scoped deltas. For now deltas are global.

Note that the first pattern is implemented using the second. It works by storing the tree data in a forest which updates itself using deltas. When an application chooses to use the second pattern, it can be thought of as opting into a specialized application (or domain) specific tree representation. From that perspective the first pattern amounts to using the platform-provided general purpose tree representation: this should usually be easier, but may incur some performance overhead in specific cases.

When views want to hold onto part of the tree (for the first pattern), they do so with "anchors" which have well defined behavior across edits.

TODO: Note that as some point the application will want their view-schema applied to the tree from the view. The system for doing this is called "schematize" and is currently not implemented. When it is more designed, some details for how it works belong in this section (as well as the section below).

Editing

Edit related data flow with solid arrows. Key view related updates made in response with dotted arrows.

This shows editing during a transaction:

flowchart RL
    subgraph "@fluidframework/tree"
        transaction--"collects edits in"-->EditBuilder
        EditBuilder--"updates anchors"-->AnchorSet
        EditBuilder--"deltas for edits"-->transaction
        transaction--"applies deltas to"-->forest["ISharedTreeView's forest"]
    end
    command["App's command callback"]
    command--"Edits"-->transaction
    forest-."invalidation".->command

The application can use their view to locate places they want to edit. The application passes a "command" to the view which create a transaction that runs the command. This "command" can interactively edit the tree. Internally the transaction implements these edits by creating changes. Each change is processed in two ways:

  • the change is converted to a delta which is applied to the forest and any existing anchors allowing the application to read the updated tree afterwards.
  • the changes applied to the EditBuilder are accumulated and used to create/encode the actual edit to send to Fluid.

Once the command ends, the transaction is rolled back leaving the forest in a clean state. Then if the command did not error, a changeset is created from the changes applied to the EditBuilder, which is encoded into a Fluid Op. The view then rebases the op if any Ops came in while the transaction was pending (only possible for async transactions or if the view was behind due to it being async for some reason). Finally the view sends the op to shared-tree-core which submits it to Fluid. This submission results in the op becoming a local op, which shared-tree-core creates a delta for. This delta goes to the indexes, resulting in the ForestIndex and thus views getting updated, as well as anything else subscribing to deltas.

This shows completion of a transaction. Not shown are the rollback or changes to forest (and the resulting invalidation) and AnchorSet, then the updating of them with the final version of the edit. In the common case this can be skipped (since they cancel out). Also not shown is the (also usually unneeded) step of rebasing the changeset before storing it and sending it to the service.

flowchart LR
    command["App's command callback"]--"commit"-->transaction
    subgraph "@fluidframework/tree"
        transaction--"build"-->EditBuilder
        EditBuilder--"changeset"-->transaction
        transaction--"changeset (from builder)"-->core["shared-tree-core"]
        core--"changeset"-->EditManager--"changeset"-->local["Local Branch"]
    end
    core--"Op"-->service["Fluid ordering service (Kafka)"]
    service--"Sequenced Op"-->clients["All clients"]
    service--"Sequenced Op"-->log["Op Log"]

When the op gets sequenced, shared-tree-core receives it back from the ordering service, rebases it as needed, and sends another delta to the indexes.

graph LR;
    service["Fluid Service"]--"Sequenced Op"-->core["shared-tree-core"]
    subgraph "@fluidframework/tree"
        core--"changeset"-->EditManager
        EditManager--"add changeset"-->remote["remote branch"]
        remote--"rebase into"-->main[main branch]
        main--"rebase over new changeset"-->local["Local Branch"]
        main--"sequenced changeset"-->Indexes
        local--"delta"-->Indexes
        Indexes--"delta"-->ForestIndex
    end
    ForestIndex--"invalidates"-->app
    Indexes--"delta (for apps that want deltas)"-->app

Schema Evolvability

Over time, application authors may want to change the schema for their documents. For example, they might want to add support for a new application feature or represent existing content in some new way.

Before doing so, application authors must consider compatibility constraints within their ecosystem. Most ecosystems don't have a way to ensure all documents an application may open are using the new schema or even that all users within a collaborative session are using the same code version. This can be problematic when two clients using code versions with different document schema attempt to collaborate.

As a result, applications must be forward-thinking about policies around when their code supports working with some particular document.

See Schema Evolution for a comprehensive treatment of this problem.

Dependencies

@fluidframework/tree depends on the Fluid runtime (various packages in @fluidframework/*) and will be depended on directly by application using it (though at that time it will be moved out of @fluid-experimental). @fluidframework/tree is also complex, so its implementation is broken up into several parts which have carefully controlled dependencies to help ensure the codebase is maintainable. The goal of this internal structuring is to make evolution and maintenance easy. Some of the principles used to guide this are:

  • Avoid cyclic dependencies:

    Cyclic dependencies can make it hard to learn a codebase incrementally, as well as make it hard to update or replace parts of the codebase incrementally. Additionally they can cause runtime issues with initialization.

  • Minimize coupling:

    Reducing the number and complexity of edges in the dependency graph. This often involves approaches like making a component generic instead of depending on a concrete type directly, or combining related components that have a lot of coupling.

  • Reducing transitive dependencies:

    Try to keep the total number of dependencies of a given component small when possible. This applies both at the module level, but also for the actual object defined by those modules. One particular kind of dependency we make a particular effort to avoid are dependencies on stateful systems from code that has complex conditional logic. One example of this is in rebase where we ensured that the stateful system, Rebaser is not depended on by the actual change specific rebase policy. Instead the actual replace policy logic for changes is behind the ChangeRebaser interface, which does not depend on Rebaser and exposes the policy as pure functions (and thus is stateless). This is important for testability, since complex conditional logic (like ChangeRebaser implementations) require extensive unit testing, which is very difficult (and often slow) for stateful systems and systems with lots of dependencies. If we instead took the pattern of putting the change rebasing policy in Rebaser subclasses, this would violate this guiding principle and result in much harder to isolate and test policy logic.

    Another aspect of reducing transitive dependencies is reducing the required dependencies for particular scenarios. This means factoring out code that is not always required (such as support for extra features and optimizations) such that they can be omitted when not needed. shared-tree-core is an excellent example of this: it can be run with no indexes, and trivial a change family allowing it to have very few required dependencies. This often takes the form of either depending on interfaces (which can have their implementation swapped out or mocked), like ChangeFamily, or collection functionality in a registry, like we do for FieldKinds and shared-tree-core's indexes. Dependency injection is one example of a useful pattern for reducing transitive dependencies. In addition to simplifying reasoning about the system (less total to think about for a given scenario) and simplifying testing, this approach also makes the lifecycle for new features easier to manage, since they can be fully implemented and tested without having to modify code outside of themselves. This makes pre-releases, stabilization and eventual deprecation of these features much easier, and even makes publishing them from separate packages possible if it ends up needing an even more separated lifecycle.

    Additionally, this architectural approach can lead to smaller applications by not pulling in unneeded functionality.

These approaches have led to a dependency structure that looks roughly like the diagram below. In this diagram, some dependency arrows for dependencies which are already included transitively are omitted.

flowchart
    direction TB
    subgraph package ["@fluidframework/tree"]
        direction TB
        subgraph core ["core libraries"]
            direction TB
            schema-view
            forest-->schema-stored
            rebase-->tree
            schema-stored-->dependency-tracking
            schema-view-->schema-stored
            dependency-tracking
            forest-->tree
            revertible-->rebase
        end
        core-->events-->util
        core-->id-compressor-->util
        core-->codec-->util
        feature-->shared-tree-core
        shared-tree-core-->core
        shared-tree-->simple-tree
        simple-tree-->feature
        external-utilities-->feature
        subgraph feature ["feature-libraries"]
            direction TB
            flex-tree-->contextuallyTyped
            flex-tree-->node-key
            defaultRebaser
            contextuallyTyped-->defaultFieldKinds
            defaultSchema-->defaultFieldKinds-->modular-schema
            forestIndex-->treeTextCursor
            modular-schema
            node-key-->modular-schema
            node-key-->defaultFieldKinds
            object-forest-->mapTreeCursor-->treeCursorUtils
            chunked-forest-->treeCursorUtils
            schemaIndex
            sequence-change-family-->treeTextCursor
        end
        subgraph domains
            JSON
        end
        domains-->simple-tree
    end
    package-->runtime["Fluid runtime"]

Open Design Questions

The design issues here all impact the architectural role of top-level modules in this package in a way that when fixed will likely require changes to the architectural details covered above. Smaller scoped issues which will not impact the overall architecture should be documented in more localized locations.

How should specialized sub-tree handling compose?

Applications should have a domain model that can mix editable tree nodes with custom implementations as needed. Custom implementations should probably be able to be projections of editable trees, the forest content (via cursors), and updated via either regeneration from the input, or updated by a delta. This is important for performance/scalability and might be how we do virtualization (maybe subtrees that aren't downloaded are just one custom representation?). This might also be the layer at which we hook up schematize. Alternatively, it might be an explicitly two-phase setup (schematize then normalize), but we might share logic between the two and have non-copying bypasses.

How all this relates to dependency-tracking is to be determined.

FAQs

Package last updated on 26 Sep 2024

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc