Socket
Book a DemoInstallSign in
Socket

@fluidframework/id-compressor

Package Overview
Dependencies
Maintainers
1
Versions
166
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

@fluidframework/id-compressor

ID compressor

2.52.0
Source
npmnpm
Version published
Weekly downloads
4K
4.28%
Maintainers
1
Weekly downloads
 
Created
Source

@fluidframework/id-compressor

A library which generates small number representations of arbitrary non-colliding Version 4 UUIDs ("stable IDs") across multiple sessions in a network. This scheme enables a distributed application to utilize the global uniqueness guarantees of UUIDs while maintaining the performance advantages of small integers.

Using Fluid Framework libraries

When taking a dependency on a Fluid Framework library's public APIs, we recommend using a ^ (caret) version range, such as ^1.3.4. While Fluid Framework libraries may use different ranges with interdependencies between other Fluid Framework libraries, library consumers should always prefer ^.

If using any of Fluid Framework's unstable APIs (for example, its beta APIs), we recommend using a more constrained version range, such as ~.

Installation

To get started, install the package by running the following command:

npm i @fluidframework/id-compressor

Importing from this package

This package leverages package.json exports to separate its APIs by support level. For more information on the related support guarantees, see API Support Levels.

To access the public (SemVer) APIs, import via @fluidframework/id-compressor like normal.

To access the legacy APIs, import via @fluidframework/id-compressor/legacy.

API Documentation

API documentation for @fluidframework/id-compressor is available at https://fluidframework.com/docs/apis/id-compressor.

Introduction

The Fluid Framework ID Compressor (package @fluidframework/id-compressor) provides a distributed unique ID generation and compression scheme. It allows multiple clients (“sessions”) to generate stable IDs (v4 UUIDs) that are globally unique, while representing them as small integers locally. This yields significant benefits in memory and storage efficiency and faster lookups (e.g., using compressed IDs as keys) compared to using full UUID strings. The compressor works in tandem with a centralized ordering service (the Fluid runtime’s total order broadcast) to assign each generated ID a final numeric value that is unique within a shared document (collaboration session).

Each client session uses the compressor to synchronously generate IDs without coordination, possibly using a temporary session-local form (negative integer) that is only unique within that client. Later, when the operation defining the ID is sequenced by the service, the ID is assigned a permanent final form (a non-negative integer) that is globally unique within the document. An ID’s numeric sign (negative or non-negative) at creation depends on whether a final value was immediately assigned (which in turn depends on the state of the allocator at that moment) – once created, an ID’s form remains stable within that session’s context and does not change implicitly. The compressor’s API provides explicit normalization functions to convert IDs between the session space (local view) and op space (final view), as needed.

Key Concepts

Sessions and Documents

A session refers to a unique identifier for an individual ID compressor (typically one client or one component of the app). Each session has its own local ID sequence. Even if a compressor is saved and restored (e.g., for offline use), as long as it retains the same session identifier, it is considered the same logical session. All compressors operating on the same collaborative document share the same ID space after finalization, but before finalization they may use overlapping local IDs distinguished by session.

A document in this context is the collection of all IDs generated by all sessions collaborating on a particular Fluid container. Compressed IDs are only unique within the scope of a single document. (In their full UUID form, they remain globally unique, but the small integer representations are not intended to be compared or merged across different documents.)

Stable IDs and Compressed IDs

A stable ID is the full 128-bit UUID string that serves as the true globally unique identifier. The ID Compressor generates stable IDs in a deterministic, low-entropy way: each session chooses a random 128-bit starting UUID when it begins, and thereafter generates subsequent UUIDs by incrementing a counter (essentially producing sequential UUIDs). This strategy greatly reduces entropy in the generated IDs (making them more compressible and easier to store) while improving collision resistance relative to fully random generation. In fact, by allocating each session a distinct block of the UUID space starting at a random base, the probability of any two sessions ever producing the same UUID is astronomically low – even lower than if all IDs were purely random – since only the initial UUID selection is random and subsequent IDs from the same session are guaranteed unique increments.

A compressed ID is the small integer representation of a stable ID used within the document. The compressor deals with two kinds of compressed IDs:

  • Session-space IDs: An ID in session space is in its most local form. For the session that created the ID, this means if the ID has not been finalized yet it remains in a local form (denoted by a negative number in the current implementation), and if it was created with a final assignment, it will be that final number (non-negative). For any other session, a session-space ID for an ID they didn’t create would appear as a final number or not exist at all in their local space. In short, in session space each compressor sees its own IDs in local form when possible. Application code should use session-space IDs when referring to IDs internally, since these IDs are stable for the lifetime of that compressor and avoid confusion between local vs final forms.
  • Op-space IDs: An ID in op space is normalized to the most final form known. This means that any ID which has a final (document-wide) assignment will be represented by that final (non-negative) number. Only IDs that are not yet sequenced will appear in their session-local form (and those are only meaningful if accompanied by the session that created them). Op-space IDs are used when serializing IDs for communication (ops) or persistence, as they are as close to final as possible and often require minimal or no further normalization on the receiving side. However, op-space IDs may still include un-finalized (session-scoped) IDs if a client sent them before those were ordered; therefore, any op containing op-space IDs must indicate the originating session so that receivers can interpret those IDs correctly.

Important: The compressor’s implementation uses negative integers to represent session-local IDs and non-negative integers to represent final IDs. For example, when a client generates a new ID that has not yet been ordered by the server, it might be represented as –1, –2, etc. within that session. Once finalized, that ID will have a non-negative integer (e.g., 0, 1, 42, etc.) as its final form. A given compressor will always refer to that ID by the same numeric value in session space (e.g., –2) for its lifetime – it will not implicitly “flip” to the final number. To avoid bugs, one should not mix the two spaces without normalization: e.g., do not use an op space ID to index a data structure keyed by session space IDs (and indeed never use op space IDs for anything other than serialized transportation). The compressor provides normalization APIs to convert between these spaces when needed.

ID Allocation and Lifecycle State Machine

At a high level, the ID Compressor partitions the infinite universe of small integers among all sessions via clusters of IDs, allowing each session to carve out contiguous blocks of the positive integer space for its own use. This partitioning is dynamic and driven by the sequencing of operations. Each session can generate IDs independently using negative placeholders, which enables immediate, synchronous creation of new IDs without waiting for coordination. Later, when the server assigns an order, the session is awarded a block of positive IDs (a cluster) to cover those creations. Subsequent creations by that session can then directly use those reserved positive IDs (no placeholder needed) until the block is exhausted. In this way, sessions alternate between using pre-allocated final IDs (when available) and using local negative IDs (when they run out of pre-allocated space), depending on the timing of acknowledgements.

State Machine Overview: Each ID generated by the system goes through the following states or transitions:

  • Generated (Session-Local): When a new ID is created via generateCompressedId() on a compressor, it belongs to that compressor’s local session. If the session currently has no reserved final IDs available, the new ID will be assigned a session-local (negative) ID. This means the ID is unique only in the context of that session and awaits finalization. If instead the session already has a cluster of final IDs reserved (from a previously finalized range) with unused capacity, the compressor will immediately assign the new ID the next available final (non-negative) ID in that cluster. In this case, the ID’s final value is known a priori. The compressor can do this safely because the cluster was obtained via a prior operation that will be sequenced before any operation that uses these final IDs, preserving consistency.

  • Sequenced (Finalizing): After one or more IDs are generated (some possibly in local form), the client’s next operation will include the creation range of new IDs. The runtime internally calls takeNextCreationRange() to obtain a batch of newly generated IDs and their details. This range, containing the session’s ID context (session ID and a contiguous generation index range, plus a hint for cluster size), is sent to the server and then relayed to all clients. It’s crucial that ranges are sent in the order they were taken from the compressor to maintain consistency. At the point of sequencing, the ID Compressor on each client finalizes that range of IDs via finalizeCreationRange(). Finalization allocates a block of final ID numbers (a cluster) for the session if one doesn’t already exist, or extends an existing cluster if possible, and maps each session-local ID in the range to a specific final ID. After this, those IDs have a permanent document-unique numeric identity.

  • Final (Document-Unique): Once finalized, an ID’s final compressed ID is known to all participants and will be used in all contexts requiring document-wide uniqueness (e.g., in ops, storage, other clients’ local state). The ID Compressor guarantees that a given session-local ID always maps to the same final ID on all clients. It also guarantees that the stable UUID associated with the ID remains the same throughout. In other words, after finalization, everyone agrees on the mapping: e.g., session X’s local ID –1 corresponds to final ID 100 (which in turn corresponds to a particular UUID). The local session that created the ID can still refer to it by –1 in session space, but it could also obtain 100 by normalizing to op space. Other sessions will refer to it as 100 in both session and op space (since for them it’s not a local ID at all, but a finalized foreign ID).

Because each session reserves contiguous clusters of the final ID space when finalizing ranges, the final IDs produced by different sessions are sharded (partitioned) across the number line. For example, session A might get a cluster of final IDs [0..99] for its first 100 IDs, session B then gets [100..199] for its IDs, etc., depending on ordering. These assignments are monotonic; no two sessions will ever produce the same final ID, and each cluster is owned by one session. Over time, a session may have multiple clusters (if it generated IDs at different times interleaved with other sessions’ generations), forming that session’s cluster chain.

Cluster Chains: A cluster is a contiguous range of final IDs reserved for a session, represented by a base final ID and a capacity (number of IDs in that range). A session’s cluster chain is the sequence of all clusters assigned to that session in the order they were created (which is also ascending order of final IDs for that session). The cluster chain enables efficient mapping from a session-local ID to the final ID: within one cluster, the mapping is an offset. For instance, if session A’s cluster has base final ID 0 and capacity 5, and it’s the first cluster (hence base local ID –1), then A’s local IDs -1, -2, -3, -4, -5 correspond to final IDs 0, 1, 2, 3, 4 respectively. If session A later gets a second cluster with base final ID 10 and capacity 5, with base local ID –6 (meaning A had generated 5 IDs prior), then A’s -6, -7, -8, -9, -10 will map to final IDs 10, 11, 12, 13, 14 respectively. Each cluster thus stores only two integers – its starting final ID and its size – which is a very compact representation. Given a session-local ID, the compressor can determine which cluster’s range it falls into (via binary search on the cluster chain) and compute the final ID by offset, or determine that it’s not finalized yet if it falls beyond the last finalized local. This cluster structure is central to the compressor’s space efficiency and speed.

Why Clusters? The use of cluster chains serves multiple purposes:

  • Uniqueness and Sharding: By allocating disjoint blocks of the final ID space to each session, the scheme avoids per-ID coordination – sessions don’t step on each other’s IDs because each session uses a different numeric range. Final IDs are essentially sharded by session ownership.
  • Compact Mapping: Instead of storing a mapping for each ID, the compressor only needs to store cluster descriptors. If a session generates a large sequence of IDs, they will likely fall into a few contiguous clusters. Each cluster covers many IDs, so the memory overhead is low (two numbers per cluster).
  • Local Generation without Penalty: Negative session-local IDs allow an app to mint an unbounded number of new IDs immediately. The cluster mechanism ensures that once those IDs are acknowledged, they get folded into a contiguous block of the final number line. The cost of that finalization step is the same regardless of how many IDs were created.
  • Eager Finalization: If a session already has an allocated cluster with free slots, new IDs can skip the placeholder stage entirely – they are eagerly assigned final IDs at creation. In the cluster chain, such IDs are marked as final from the start (the compressor knows “ID number N for this session will ultimately be final X” without waiting). This greatly improves efficiency for bursty ID generation, because after one round trip to obtain a cluster, the next many IDs incur no additional coordination cost.

Example – ID Generation Workflow: Consider a simple scenario with two client sessions (A and B) to illustrate local and final ID assignment:

  • Step 1: Session A creates its first ID by calling generateCompressedId(). No cluster exists yet for A, so the ID is assigned a session-local ID -1. Similarly, A creates a second ID, getting -2. These negative IDs are unique within A, but session B could concurrently also create an ID -1 in its own space (no conflict because B is a different session).

  • Step 2: Session A prepares an op to announce its new IDs. The Fluid runtime calls takeNextCreationRange() on A’s compressor, which returns a range describing the two new IDs (session A’s ID, first generation count = 1, count = 2, etc.). This is included in A’s op and sent to the server. The server sequences this op (ensuring all clients process it in the same order).

  • Step 3: When A’s op is sequenced, each client (including A itself) calls finalizeCreationRange() with that range. The ID Compressor on each side allocates A’s first cluster. Suppose the implementation chooses a cluster of size, say, 5 for A. If no final IDs were yet allocated in the document, A’s cluster might start at final ID 0. The cluster is recorded as belonging to session A with base final ID 0 and capacity 5. Now the two IDs are finalized: A’s -1 → 0 and -2 → 1. The cluster still has 3 unused slots (final IDs 2, 3, 4) reserved for future IDs from A.

  • Step 4: Session A generates a third ID. Now A’s compressor finds an existing cluster with free space (3 slots remain). Therefore, it eagerly assigns the next final ID: the new ID is given final ID 2 immediately (instead of a negative number). Likewise, a fourth and fifth ID from A would get final IDs 3 and 4. At this point, A’s cluster has been filled up (all 5 of its reserved IDs 0–4 are used). Session A’s compressor has not needed to send another range yet because these were already reserved in the first cluster.

  • Step 5: Meanwhile, suppose Session B also generated some IDs and got its first cluster when its op was sequenced. Perhaps B’s first cluster starts at final ID 5 with capacity 5 (covering 5–9). If B created one ID and sent it, B’s -1 would map to final 5, and B now has final IDs 6–9 in reserve for more creations.

  • Step 6: Session A now generates a sixth ID. A’s first cluster is exhausted, and because another session (B) has since allocated the next chunk of the ID space (up through 9), A cannot extend its previous cluster contiguously. As a result, A’s sixth ID is assigned a session-local ID -6 (negative) at creation. The runtime sends another creation range for this one new ID in a subsequent op.

  • Step 7: When that op is sequenced, A’s compressor finalizes the new range. Since A’s previous cluster (0–4) was not the last in the global ordering (B had 5–9), the compressor allocates a new cluster for A. This second cluster might start at final ID 10 (the next available after 9). The cluster’s capacity could again be, say, 5 (covering 10–14). Now A’s ID -6 is finalized to 10. With that cluster in place, if A generates more IDs (7th, 8th, etc.), up to four of them can be assigned final IDs 11, 12, 13, 14 immediately without going local.

This example shows how session A’s cluster chain consists of two clusters: one spanning final IDs 0–4 and another spanning 10–14. Session B has its own cluster (5–9). Each session’s local IDs start at -1 for their first cluster, and continue negatively (A used -1 through -5 for the first 5 IDs, then -6 for the next cluster’s first ID, and so on). In all cases, the baseLocalId of the first cluster is -1 (by design of finalizeCreationRange, which will error if it isn’t). The mapping within each cluster is straightforward linear offset, and no two sessions’ final ID ranges overlap. By using negative IDs, the system allowed each session to act independently when it ran out of reserved space, and by using generous cluster reservations and expansions, it minimized how often those independent actions are needed.

API Overview and Usage

The ID Compressor provides methods to generate IDs and to convert between the various representations (session space, op space, stable IDs):

  • generateCompressedId(): Creates a new stable ID (UUID) and returns its compressed form in session space. The returned SessionSpaceCompressedId is either a negative number (if no final ID was immediately available) or a non-negative final ID (if allocated from an existing cluster) as described above. In either case, the compressor internally generates a new UUID for it. This call is synchronous and always succeeds in producing a unique ID for the session. The stable UUID can be retrieved at any time via decompress(). Generation is O(1) in time.

  • takeNextCreationRange(): Gathers all IDs generated since the last range was taken, packaging them into an IdCreationRange object. This includes the session’s ID (sessionId) and, if any IDs were created, a description of the first ID’s generation index, how many IDs (count), the requested cluster size, and the spans of those IDs that were session-local. The Fluid runtime uses this method internally when preparing an outbound operation that includes newly created IDs. Application developers do not need to call this directly – it is handled by the framework when submitting ops. The ranges must be sent in the order they are taken; calling this will also increment an internal marker to enforce that ordering.

  • finalizeCreationRange(range: IdCreationRange): Incorporates a range of IDs (which could be from a remote session or the local session) into the compressor’s state as finalized. This call will allocate or update clusters for the specified session. If it’s the first time that sessionId is seen, a new cluster chain is started (and the base local ID of that first cluster must align to -1 as checked internally). If the session already has an existing cluster chain, the compressor will attempt to expand the last cluster if possible or create a new cluster. Specifically, if the range being finalized picks up exactly where the session’s last cluster left off and that last cluster is also the highest final ID in the entire system, the compressor will extend the cluster’s capacity to accommodate the new IDs (and possibly some additional reserved capacity). This is the cluster expansion optimization: by enlarging the last cluster when safe, the compressor avoids fragmenting the ID space and allows the session to continue with a single contiguous range, reducing the need for future local IDs. If the last cluster cannot be expanded (e.g., another session’s cluster comes after it globally), a new cluster is allocated for this session’s new range. After finalization, the mapping of those session-local IDs to final IDs is fixed and consistent across all clients. This method runs in O(1) time.

  • decompress(id: SessionSpaceCompressedId): StableId: Converts a compressed ID (in session space) back into the 128-bit stable UUID string. This works for any compressed ID that the compressor knows about – whether it’s a local negative ID or a final ID. If the ID is session-local and negative (not final), the compressor still knows the stable UUID because it was generated at creation time, and will return it by computing a simple offset from the base session UUID. If the ID is final, it likewise returns the stable UUID by finding the containing cluster and computing the offset. Decompression is O(log max(n, m)) where n is the number of clusters in final space and m is the number of "runs" of local IDs in the local session (contiguous negative IDs generated with no eager finals). The former is the time when decompressing a final space ID, and the latter for a negative ID (it would be O(1) but for error checking querying the normalizer).

  • recompress(uncompressed: StableId): SessionSpaceCompressedId and tryRecompress(uncompressed: StableId): SessionSpaceCompressedId | undefined: These APIs perform the reverse of decompression – given a stable UUID, they attempt to find its compressed ID representation for this compressor’s session space. If the UUID was originally generated by this compressor (or is otherwise recognized in the compressor’s cluster records), this will return the corresponding session-space ID. For example, if you pass in a stable ID that this session created, recompress will return the same negative ID (or final ID) that was originally issued for it. If you pass a stable ID created by another session that has since been finalized, it will return the final ID (since in this session’s space, that is the only form of that ID). If the stable ID is not known to the compressor at all (meaning it wasn’t generated by any session in the document’s history that this compressor is aware of), tryRecompress will return undefined and recompress will throw an error. This situation simply indicates the UUID was never introduced via this compression scheme – for instance, it might be a completely external ID or from a different document.

  • normalizeToOpSpace(id: SessionSpaceCompressedId): OpSpaceCompressedId: Converts a session-space ID into op space, for use in an outgoing op or other serialized form. If the given ID is from the local session and has a final ID (either it was eagerly finalized or has since been finalized), this will yield that final ID. If it’s from the local session and still unfinalized (a negative ID with no assigned final yet), it will remain as a negative but will be tagged as an OpSpaceCompressedId type (essentially the same numeric value). If the ID passed in is actually already a final ID (including any non-local session-space ID), it simply returns it unchanged (already normalized). This method will throw an error if the ID is invalid or unknown to the compressor (which should not happen if the ID came from a valid source).

  • normalizeToSessionSpace(id: OpSpaceCompressedId, originSessionId: SessionId): SessionSpaceCompressedId: Converts an op-space ID plus its originating session into this compressor’s session space form. This is used when receiving an op or reading persisted data. The originSessionId tells the compressor whose context the ID was created in. The logic is as follows: If the id is a final ID (non-negative), the compressor checks if it falls into any cluster of the local session. If it belongs to this local session’s own cluster chain (meaning the ID was actually created by this session and finalized), the compressor will see if that ID had a local form. If it did (normalizer knows it was created as local), it returns the local negative ID. If it didn’t (it was an eagerly assigned final for this session), it will just return the same final ID as a SessionSpaceCompressedId. If the final ID does not belong to the local session’s clusters, the compressor checks that it is within the range of allocated finals in the entire document (to ensure it’s not beyond any known ID). If it’s known and belongs to some other session, it returns the ID as-is (since from this session’s perspective, that final ID is the only form of the ID; there is no local negative for it in this compressor). On the other hand, if the id is a negative number (meaning the op carried a session-local ID that wasn’t finalized at the time of send), then if the originSessionId matches this compressor’s session, it simply verifies the ID is one of our own (we generated it) and returns it. If the originSessionId is some other session, the compressor looks up that session’s cluster chain and converts the remote local ID to its final ID if possible. If the remote local ID isn’t yet finalized in our state, it means we haven’t processed the creation op for it yet (this can happen if ops were received out of order), and normalizeToSessionSpace will throw an error for an “unknown op space ID”. In practice, the Fluid runtime ensures that creation ranges are processed before any ops that reference those IDs, so such an out-of-order scenario is typically avoided.

  • serialize(withSession: boolean): Exports the compressor’s state to a binary/blittable format for storage. If withSession is true, it includes the current session’s unfinalized state (any locally generated IDs not yet finalized, and the session’s ability to continue generating sequential UUIDs). This is used for offline storage (e.g., snapshotting a local client’s state) so that the exact sequence can resume on reload. If withSession is false, it serializes only the finalized state (all known clusters and sessions, but no info about ongoing local generation). This is suitable for persisting in summaries that go to the service, as it does not tie the summary to a particular client’s in-progress state. On deserialization, a compressor can be reconstructed from the serialized data. Typically the Fluid container’s runtime handles serialization/deserialization of the ID Compressor as part of summarizing or loading a document. Application developers don’t usually call this directly, but maintainers should ensure any changes to compressor internals maintain compatibility with the serialized format (versioned via an internal version number).

Performance and Optimizations

The ID Compressor is designed for high performance in the collaborative runtime:

  • Generation Throughput: Creating new IDs (generateCompressedId) is an O(1) operation – it just increments a counter and does a couple of checks for cluster availability. There is no contention or waiting; even if an ID ends up being session-local, it’s generated instantly without needing consensus.

  • Compression/Decompression: Converting between stable IDs and compressed IDs relies on searching through a session’s clusters or local-ID ranges. Because clusters are stored in sorted order and typically few in number, and because local-ID ranges are stored in a compact run-length format, these conversions are very fast. The worst-case complexity is O(log n), where n is the number of clusters in final space.

  • Memory Efficiency: Each cluster is stored as a tuple of four integers plus the session reference: the base final ID, the capacity, the current count, and the base local ID (the local ID associated with the first final ID in the cluster). This means even a very large number of IDs can be represented compactly. For example, 10,000 IDs allocated by one session with default cluster sizes would result in at most 20 cluster entries (depending on how interleaved other sessions’ ops were, there may be substantially fewer). The SessionSpaceNormalizer (which tracks which IDs were created as local) stores run-length encoded ranges of generation indices, also using very little memory (two numbers per contiguous local run). The compressor avoids storing full UUIDs for every ID by exploiting the sequential generation: it can store a single base UUID and generate others by offset math.

  • Cluster Reservation and Expansion: The default behavior is to request a reasonably large cluster on the first allocation (e.g., 512 IDs by default) to amortize the cost of coordination. If a session only generates a few IDs, the cost is just one small cluster. If it generates many, having a large cluster allows most IDs to be finalized eagerly. The cluster expansion optimization further improves efficiency: when a session is actively producing IDs and remains at the forefront of the global ID space (its cluster is last in the global ordering), finalizing a new range will simply extend that cluster’s capacity rather than start a new one. This means the session can continue using final IDs without interruption. Only when another session “overtakes” by allocating IDs beyond the end of your cluster would you need a new cluster and thus incur another local ID. In the best case (one session generating all IDs or generating the most IDs), that one session might get away with using a single expanding cluster for a very long time. In typical multi-session scenarios, each session will have a modest number of clusters that correlates with how often its ID generation interleaves with others. Overall, this strategy ensures near-optimal use of the numeric space and minimizes the overhead of temporary IDs.

  • Session Space Normalization Cost: The SessionSpaceNormalizer structure helps determine if a given ID index was allocated as local or final. It performs containment checks with binary search in a sorted list of ranges, which is O(log m) where m is the number of local-ID ranges in the session. In practice, m is the number of times the session had to fall back to local ID generation – essentially the count of cluster boundaries. With large cluster sizes and expansions, m is expected to remain very small (often one or just a few) for most sessions. Thus, checking or normalizing an ID’s form is effectively constant-time in common cases. The normalizer is only updated when a new local ID is generated (simple append/merge at the end of the range list, O(1)).

Appendix: Session Space Normalizer (Detailed)

The SessionSpaceNormalizer is an internal helper that records which IDs a session created in local form. Its purpose is to answer the question: “Given the n-th ID generated by this session, was it originally created as a local (negative) ID or as a final (positive) ID?” This is crucial for the compressor to correctly normalize IDs between session space and op space. For example, if the local session generated an ID and gave it a final number immediately (eager final), then when converting that ID to session space later, we should leave it as final (since there was never a negative form). Conversely, if the ID was created as a negative, we should represent it as such in session space even after it has a final ID.

How it works: The normalizer maintains a sorted list of ranges (intervals) of generation indices that were allocated as local IDs. A generation index is essentially a counter (starting at 1 for the first ID generated, 2 for the second, and so on). When the compressor generates a new ID and determines it must be a local ID, the normalizer’s addLocalRange() is called to record that index (or extend the latest range). If multiple local IDs are created in succession, they form a contiguous range of generation indices. If a final ID is created, that index is simply not recorded as local. Thus, the normalizer ends up with something like: “IDs 1-2 were local, 3-5 were final, 6 was local, 7-9 final, 10 was local, etc.” internally.

For example, referring to the earlier scenario for session A:

  • Generation indices 1 and 2 (IDs -1 and -2) were local, so the normalizer holds range [1, count=2].
  • Indices 3,4,5 were final (eager, cluster had space), so they are not in any local range.
  • Index 6 was local (when cluster1 exhausted), so a new range [6, count=1] is added.
  • Indices 7,8,9 were final (cluster2 space), so not in local ranges.
  • Index 10 was local (cluster2 exhausted, awaiting cluster3), so a new range [10, count=1].

The normalizer would thus store something like: {(1,2), (6,1), (10,1)} as the local ranges (using base index and count). From this, one can deduce any index’s form.

Concretely, if asked “was the 5th ID local?”, we see 5 is not in any recorded range, meaning it was not allocated as local (it was final). If asked about 6th, 6 falls in the [6,1] range, so yes, that was local. This information is used when normalizing to session space: e.g., if we have a final ID that corresponds to generation 5 of our session, the compressor checks normalizer – seeing 5 was not a local allocation, it knows the ID had no negative form, so it stays as final in session space. If it was generation 6, normalizer says that was local, so in our session space we prefer the local representation (the negative number).

Data structure: SessionSpaceNormalizer uses an append-only sorted map of ranges keyed by the starting generation index. It merges consecutive ranges for efficiency (if two local IDs are created back-to-back, it merges them into one range). Querying if an ID was local is a matter of finding the range at or immediately before that index and checking if the index falls within it – a binary search operation. It can also retrieve all local ranges within a given span of indices, which is used when forming the localIdRanges field for IdCreationRange (to inform other clients exactly which IDs in the range were local). These operations are fast due to the small number of ranges and binary search.

Performance: As noted, the normalizer’s operations are O(log m) for queries, where m is the number of disjoint local ranges. In practice m is very small. Every time the compressor switches from generating final IDs to a local ID (i.e., when a cluster runs out), a new range might be added. But with large clusters and cluster expansion, this might happen rarely. Even in worst-case scenarios with many interleavings, the number of ranges will be far less than the number of IDs. Memory usage is minimal (two integers per range). The normalizer does not impose any significant overhead and does not need to be reset or pruned during normal operation – once an ID is marked as local, that fact remains relevant for the life of the compressor.

In summary, the SessionSpaceNormalizer ensures that the temporal aspect of ID creation (whether an ID had to be a placeholder or not) is remembered. This guarantees that the compressor can consistently interpret and present IDs in the correct form (local vs final) to the application and for serialization, maintaining the invariants of ID stability and equivalence across the system.

FAQs

Package last updated on 04 Aug 2025

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

About

Packages

Stay in touch

Get open source security insights delivered straight into your inbox.

  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc

U.S. Patent No. 12,346,443 & 12,314,394. Other pending.