pointer-set
A general-purpose pointer implementation using an automatically
expanding set of UintArray blocks, for use in building linked
lists and other sorts of data structures.
Some minimal protections are in place if using this module with
TypeScript, but it will not prevent you from memory leaks,
dangling pointers, and other perils of manual memory management.
Use with caution!
USAGE
import { PointerSet } from '@isaacs/pointer-set'
const { PointerSet } = require('@isaacs/pointer-set')
const fields = ['next', 'prev'] as const
const store = new PointerSet<string, typeof fields, []>(fields, 256)
import { PointerSetInferFields } from '@isaacs/pointer-set'
const FieldsInferred = PointerSetInferFields(fields)
const storeInferred = new FieldsInferred<string>()
const store16 = new PointerSet<string, typeof fields, []>(fields, 65536)
const zro = store.alloc('zro')
const one = store.alloc('one')
store.ref(zro, 'next', one)
store.ref(one, 'prev', zro)
const two = store.ref(one, 'next', store.alloc('two'))
store.ref(two, 'prev', one)
const tre = store.alloc('tre', { prev: two })
store.ref(two, 'next', tre)
const fur = store.ref(tre, 'next', store.alloc('fur', { prev: one }))
store.free(tre)
store.ref(two, 'next', fur)
store.ref(fur, 'prev', two)
for (let p = zro; p; p = store.ref(p, 'next')) {
console.log(store.value(p))
}
for (let p = fur; p; p = store.ref(p, 'prev')) {
if (p === tre) {
store.value(p, 'ert')
}
console.log(store.value(p))
}
See the scripts in ./examples
for more.
PointerSetInferFields(fields: readonly string[], rawFields?: readonly string[]) => class
Because TypeScript generics are all-or-nothing, we cannot do
this:
new PointerSet<string>(['field'] as const, blockSize)
// ^-- Expected 2-3 type arguments, but got 1. (tsserver 2558)
and would instead have to do:
const fields = ['field'] as const
new PointerSet<string, typeof fields, []>(fields, blockSize)
specifying all of the types in the generic, which can get
rather verbose.
To work around this, the PointerSetInfer
method takes a const
array of field names, and a const array of raw field names, and
returns a class where those are fixed.
const fields = ['field'] as const
const MyClass = PointerSetInfer(fields)
const store = new MyClass<string>()
The other option, of course, is to leave the value type
unspecified as well. Then the field and rawField names can be
inferred, but the value will be left as any
, so there won't be
any type checking of the data passed to store.value(pointer, data)
method.
nullPointer: 0 as Pointer
For convenience, a reference to the null pointer is exported.
It's just the number 0
cast to the Pointer
type.
type Pointer
A pointer is just a number in the unsigned 32 bit range (ie,
between 0
and 2**32 - 1
). The exported type is branded with
a unique symbol.
For safety, all methods that expect to get a Pointer
will use
this branded type, to prevent you from accentally using a pointer
reference that was not provided by this library. However, for
cases where you may need to cast to the type, it's exported.
Class PointerSet<T, K extends readonly string[], R extends readonly string[] = []>
This is the class that represents an expanding data store of
entries, where each entry is a value of type T
, and a set of
pointers to other entries for each of the field names in K
.
The type T
can be anything except undefined
. (If you aren't
storing JavaScript values, null
is a good option.)
For notes on using type inferrence to set K
and R
types, see
the section above regarding the PointerSetInferFields
method.
store = new PointerSet<T, K, R>(fieldNames: K, blockSize: number = 256, rawFieldNamess?: R)
Create a new PointerSet to store entries with a value of T
, and
the internal references named by fieldNames
.
Any names in rawFieldNames
will be marked as unsafe for pointer
dereferencing. They may be used to store arbitrary uint32
values, which is more efficient than filling up a number[]
,
especially if multiple numbers are needed which would require a
number[][]
.
fieldNames
and rawFieldNames
When instantiating, fieldNames
(and optionally rawFieldNames
)
should be an inline array of string literals or a string array
marked with as const
or readonly
, so that TypeScript can
properly infer the list of types via static analysis.
For example:
const treeStore = new PointerSet(['prev', 'next', 'children'], 256)
const fields = ['prev', 'next', 'children'] as const
const treeStore = new PointerSet(fields, 256)
const fields: readonly string[] = ['prev', 'next', 'children']
const treeStore = new PointerSet(fields, 256)
class MyTree<T> {
store: PointerSet<T, ['prev', 'next', 'children'], []>
constructor() {
this.store = new PointerSet<T, ['prev', 'next', 'children'], []>()
}
}
const fields = ['prev', 'next', 'children'] as const
class MyTree<T> {
store: PointerSet<T, typeof fields, []>
constructor() {
this.store = new PointerSet<T, fields>()
}
}
const fields: readonly string[] = ['prev', 'next', 'children']
class MyTree<T> {
store: PointerSet<T, typeof fields, []>
constructor() {
this.store = new PointerSet<T, fields>()
}
}
// not as good, doesn't type check or autocomplete field names
const fields = ['prev', 'next', 'children']
const treeStore = new PointerSet(fields, 256)
blockSize
The default blockSize value is 256, and affects the allocation
and storage behavior. In general, this module is tuned more for
speed than for memory efficiency, so it's wise to take the
trade-offs into account.
blockSize
can be any number between 1 and 65536, and sets the
number of entries that can be stored in a single block.
blockSize
Tuning
tl;dr
- Leave the default for a good trade-off between allocation
speed, with the caveat that heap expansion allocations can
occur at run time more frequently (at every multiple of 256
items).
- Setting a larger value will mean fewer slower allocations.
- 256 and 65536 result in the maximum possible hard capacity
limit.
- Setting any smaller value does not cause PointerSet to
consume less memory (it'll be either 13 or 14 bytes per entry,
regardless), but it does change the manner in which it is
consumed.
If set to 256 or smaller, then each pointer will use a single
byte for the index within the block, and 3 bytes for the blockId.
If set to a number greater than 256 and less than 65535, then 2
bytes will be used for item index, and 2 bytes for block id.
If set to either 256 or 65536, then 2**32
items can be
stored, allocating either in blocks of 256 or 65536,
respectively. In the first case, up to 16777216 blocks can be
allocated, each holding 256 items. In the second, up to 65536
blocks can be allocated, each storing 65536 items. Both result
in 2**32
items as an upper bound.
If set to some number other than 256 or 65536, then the total
allocation capacity will be limited to less than 2**32
, because
it'll still use the same number of bytes for item index and block
id, so this is something to keep in mind.
An internal freeList
stack of free indexes is allocated as
well. When the blockSize
is 256 or less, the freeList
is a
Uint8Array
of blockSize
items. When the blockSize
is
greater than 256
, the freeList
is a Uint16Array
of
blockSize
items.
The storage capacity overhead for each block is thus:
blockSize * 4 + fields.length * blockSize * 4 + freeListSize
So, for new PointerSet<T>(['two', 'fields'], 256)
, that works
out to 256 * 4 + 2 * 256 * 4 + 256
, or 3328 bytes per block,
with a hard capacity limit of 4294967296 items, stored across
16777216 blocks, for a total allocation overhead of 52GiB,
averaging 13 bytes allocated per item at this limit.
For new Pointer<T>(['two', 'fields'], 65536)
, that is 65536 * 4 + 2 * 65536 * 4 + 65536 * 2
, or 896kb per block. At full
capacity of 65536 blocks, this results in 56GiB, or 14 bytes per
item. Each block allocation will take significantly longer, but
they will happen 1/256 as often, making it a reasonable trade
off in many cases.
If the blockSize is set to 32768
, then the block count will
still be limited to 65536, but each block will contain 1/2 as
many times, for a total capacity limit of 2,147,483,648 items
(that is, half as many as with a blockSize of 256 or 65536).
In the case of a PointerSet with two fields, each block will be
32768 * 4 + 2 * 32768 * 4 + 32768 * 2
or 448KiB per block. At
full capacity, this results in 14GiB of allocation overhead, or
14 bytes per item.
This scheme ensures that all reference lookups between blocks
are O(1)
, because there is never any need to scan the list of
blocks. It also prevents excessive garbage collection of
JavaScript objects, because there aren't many; each pointer is
just a number.
When to use rawFields
If the value that you would be storing can be expressed as one or
more unsigned 32-bit integer values, then you can get more
efficient and performant storage by specifying them as
rawFields
rather than setting them in the value type.
This is handy if you want to store a string and some other
arbitrary numbers, because managing an array of a large number of
small objects can be very slow due to garbage collection
overhead.
You should not use this if your value property would otherwise be
JavaScript numbers, since those are highly optimized, and a
JavaScript array of integers is slightly more performant in
many cases than a Uint32Array.
However, if it saves putting objects into the values array (for
example, if you are storing an object that is just 3 integers),
then it can be worthwhile.
For example:
interface Point {
x: number
y: number
z: number
}
const fields = ['prev', 'next'] as const
const store = new PointerSet<Point, typeof fields, []>(fields)
const p = store.alloc(new Point(1,3,7))
const x = store.value(p).x
const y = store.value(p).y
const z = store.value(p).z
Since we know the number is going to be between 0 and 2**32
, we
can get a better result by doing it this way:
const fields = ['prev', 'next'] as const
const raw = ['x', 'y', 'z'] as const
const store = new PointerSet<null, typeof fields, typeof raw>(
fields,
256,
raw
)
const p = store.alloc(null)
const x = store.raw(p, 'x')
const y = store.raw(p, 'y')
const z = store.raw(p, 'z')
Tradeoffs:
- Using raw values increases the allocation size of each block.
However, it does not increase overall memory usage in the
full block state as much as using almost any JavaScript value
type would. It's just pre-allocating rather than allocating it
on demand.
- raw values are limited to integers in the range from
0
to
2**32-1
, and there is no type checking to prevent overflow. - raw values avoids JavaScript garbage collection costs incurred
by having to track and clean up the entries in the block's
values
array. This becomes relevant for complex value types;
it's rarely beneficial if you would only be storing a single
number anyway.
store.size(): number
The total number of entries stored in all blocks in the
PointerSet.
store.totalAvailable(): number
The total number of available entry slots in all blocks in the
PointerSet. Note that the storage will expand on demand, so this
is actually just the number of entries that can be set before
another allocation will occur.
store.entryCount(blockId: number): number
The number of entries stored in a given block.
store.available(blockId: number): number
The number of available entry slots in a given block.
store.alloc(value: T, refs?: {[k: string]: Pointer}) => Pointer
Allocate a new memory location, and set the stored value to
value
. If refs
is provided, then each of the fields in refs
will be set to the provided pointer value.
store.free(pointer: Pointer) => void
Remove the pointer's value from the store, and mark its memory
location as available for reuse.
Note that this does not unset any entries that reference this
pointer, or erase any data from any of the pointer fields. It
only drops the value and marks the index as available for
reuse.
store.erase(pointer: Pointer) => void
Calls store.free(pointer)
and also sets all of its field
references to null.
Note that this does not unset any entries that reference this
pointer.
store.ref(pointer: Pointer, field: string) => Pointer
Get the value of the specified field. If 0
is returned (a null
pointer) then the field is unset.
store.ref(pointer: Pointer, field: string, target: Pointer) => Pointer
Set the value of the field on the referenced pointer to the
target. Returns the target pointer.
store.refAll(pointer: Pointer }) => refs
Get a JavaScript object containing all references from the
pointer provided. Returned object has a key with each of the
fieldNames
provided, where the value is a Pointer
(or the
nullPointer
if not yet set).
store.refAll(pointer: Pointer, refs: { [k: string]: Pointer }) => refs
Iterate over the keys in refs
, calling store.ref(pointer, key, refs[key])
.
Returns the refs
object provided.
store.value(pointer: Pointer) => T | undefined
Returns the value stored for the supplied pointer, or undefined
if no value provided.
Always returns undefined
for a null pointer.
store.value(pointer: Pointer, value: T) => T
Set the value store for the pointer to the supplied value.
Returns the supplied value.
store.raw(pointer: Pointer, field: FieldName<R>): number
Specify one of the names provided in the rawFields
list, and
get the number value between 0
and 2**32
stored at the
apporpriate address.
store.raw(pointer: Pointer, field: FieldName<R>, val: number): number
Specify one of the names provided in the rawFields
list, and
set the number value between 0
and 2**32
stored at the
apporpriate address to the provided value.
store.rawAll(pointer: Pointer) => raws
Get all the raw values as a JavaScript object, where the keys are
the field names and the values are the numbers stored in the
slab.
store.rawAll(pointer: Pointer, raws: { [k: string]: number }) => raws
Set zero or more raw values. Returns the same values object
provided.
store.wipeBlock() => void
Erases all entries on a given block.
store.dropEmpty() => void
Drop any empty blocks from the end of the set.
store.drop() => void
Internal Method
Remove a block from the set.
Throws if the block has entries, or if any subsequent blocks
exist. When called on the root block, calls store.wipeBlock()
instead.