Structurae
A collection of data structures for high-performance JavaScript applications
that includes:
- Binary Protocol - simple binary protocol based on
DataView and defined with JSONSchema
- Bit Structures:
- BitField & BigBitField - stores and operates on
data in Numbers and BigInts treating them as bitfields.
- BitArray - an array of bits implemented with Uint32Array.
- Pool - manages availability of objects in object pools.
- RankedBitArray - extends BitArray with O(1) time rank and
O(logN) select methods.
- Graphs:
- Adjacency Structures - implement adjacency list &
matrix data structures.
- Graph - extends an adjacency list/matrix structure and provides
methods for traversal (BFS, DFS), pathfinding (Dijkstra, Bellman-Ford),
spanning tree construction (BFS, Prim), etc.
- Grids:
- BinaryGrid - creates a grid or 2D matrix of bits.
- Grid - extends built-in indexed collections to handle 2 dimensional
data (e.g. nested arrays).
- SymmetricGrid - a grid to handle symmetric or triangular
matrices using half the space required for a normal grid.
- Sorted Structures:
- BinaryHeap - extends Array to implement the Binary Heap data
structure.
- SortedArray - extends Array to handle sorted data.
Usage
Node.js:
npm i structurae
import {...} from "structurae";
Deno:
import {...} from "https://deno.land/x/structurae@4.0.0/index.ts"
Documentation
Overview
Binary Protocol
Binary data in JavaScript is represented by ArrayBuffer and accessed through
TypedArrays and DataView. However, both of those interfaces are limited to
working with numbers. Structurae offers a set of classes that extend the
DataView interface to support using ArrayBuffers for strings, objects, and
arrays. These classes ("views") form the basis for a simple binary protocol with
the following features:
- smaller and faster than schema-less binary formats (e.g. BSON, MessagePack);
- supports zero-copy operations, e.g. reading and changing object fields without
decoding the whole object;
- supports static typing through TypeScript;
- uses JSON Schema for schema definitions;
- does not require compilation unlike most other schema-based formats (e.g.
FlatBuffers).
The protocol is operated through the View
class that handles creation and
caching of necessary structures for a given JSON Schema as well as simplifying
serialization of tagged objects.
import { View } from "structurae";
const view = new View();
interface Animal {
name: string;
age: number;
}
const AnimalView = view.create<Animal>({
$id: "Pet",
type: "object",
properties: {
name: { type: "string", maxLength: 10 },
age: { type: "number", btype: "uint8" },
},
});
const animal = AnimalView.from({ name: "Gaspode", age: 10 });
animal instanceof DataView;
animal.byteLength;
animal.get("age");
animal.set("age", 20);
animal.toJSON();
Objects and Maps
Objects by default are treated as C-like structs, the data is laid out
sequentially with fixed sizes, all standard JavaScript values are supported,
inluding other objects and arrays of fixed size:
interface Friend {
name: string;
}
interface Person {
name: string;
fullName: Array<string>;
bestFriend: Friend;
friends: Array<Friend>;
}
const PersonView = view.create<Person>({
$id: "Person",
type: "object",
properties: {
name: { type: "string", maxLength: 10 },
fullName: {
type: "array",
maxItems: 2,
items: { type: "string", maxLength: 20 },
},
bestFriend: { $ref: "#Friend" },
friends: {
type: "array",
maxItems: 3,
items: {
$id: "Friend",
type: "object",
properties: {
name: { type: "string", maxLength: 20 },
},
},
},
},
});
const person = Person.from({
name: "Carrot",
fullName: ["Carrot", "Ironfoundersson"],
bestFriend: { name: "Sam Vimes" },
friends: [{ name: "Sam Vimes" }],
});
person.get("name");
person.getView("name");
person.get("fullName");
person.toJSON();
Objects that support optional fields and fields of variable size ("maps") should
additionally specify btype
map
and list non-optional (fixed sized) fields as
required
:
interface Town {
name: string;
railstation: boolean;
clacks?: number;
}
const TownView = view.create<Town>({
$id: "Town",
type: "object",
btype: "map",
properties: {
name: { type: "string" },
railstation: { type: "boolean" },
clacks: { type: "integer" },
}
required: ["railstation"],
});
const lancre = TownView.from({ name: "Lancre", railstation: false });
lancre.get("name")
lancre.get("clacks")
lancre.byteLength
const stoLat = TownView.from({ name: "Sto Lat", railstation: true, clacks: 1 });
stoLat.get("clacks")
stoLat.byteLength
The size and layout of each map instance is calculated upon creation and stored
within the instance (unlike fixed sized objects, where each instance have the
same size and layout). Maps are useful for densely packing objects and arrays
whose size my vary greatly. There is a limitation, though, since ArrayBuffers
cannot be resized, optional fields that were absent upon creation of a map view
cannot be set later, and those set cannot be resized, that is, assigned a value
that is greater than their current size.
For performance sake, all variable size views are encoded using single global
ArrayBuffer that is 8192 bytes long, if you expect to handle bigger views,
supply a bigger DataView when instantiating a view protocol:
import { View } from "structurae";
const view = new View(new DataView(new ArrayBuffer(65536)));
There are certain requirements for a JSON Schema used for fixed sized objects:
- Each object should have a unique id defined with
$id
field. Upon
initialization, the view class is stored in View.Views
and accessed with the
id used as the key. References made with $ref
are also resolved against the
id. - For fixed sized objects, sizes of strings and arrays should be defined using
maxLength
and maxItems
properties respectfully. $ref
can be used to reference objects by their $id
. The referenced object
should be defined either in the same schema or in a schema initialized
previously.- Type
number
by default resolves to float64
and type integer
to int32
,
you can use any other type by specifying it in btype
property.
Objects and maps support setting default values of required fields. Default
values are applied upon creation of a view:
interface House {
size: number;
}
const House = view.create<House>({
$id: "House",
type: "object",
properties: {
size: { type: "integer", btype: "uint32", default: 100 },
},
});
const house = House.from({} as House);
house.get("size");
Default values of an object can be overridden when it is nested inside another
object:
interface Neighborhood {
house: House;
biggerHouse: House;
}
const Neighborhood = view.create<Neighborhood>({
$id: "Neighborhood",
type: "object",
properties: {
house: { $ref: "#House" },
biggerHouse: { $ref: "#House", default: { size: 200 } },
},
});
const neighborhood = Neighborhood.from({} as Neighborhood);
neighborhood.get("house");
neighborhood.get("biggerHouse");
Dictionaries
Objects and maps described above assume that all properties of encoded objects
are known and defined beforehand, however, if the properties are not known, and
we are dealing with an object used as a lookup table (also called map, hash map,
or records in TypeScript) with varying amount of properties and known type of
values, we can use a dictionary view:
const NumberDict = view.create<Record<number, string | undefined>>({
$id: "NumberDict",
type: "object",
btype: "dict",
propertyNames: { type: "number", btype: "uint8" },
additionalProperties: { type: "string" },
});
const dict = NumberDict.from({ 1: "a", 2: "bcd", 3: undefined });
dict.get(1);
dict.get(3);
dict.get(10);
dict.get(2);
Arrays and Vectors
The protocol supports arrays of non-nullable fixed sized values (numbers,
strings of fixed maximum size, objects) and vectors--arrays with nullable or
variable sized values. The type of items held by both "container" views is
defined in items
field of the schema.
const Street = view.create<Array<House>>({
type: "array",
items: {
type: "object",
$ref: "#House",
},
});
const street = Street.from([{ size: 10 }, { size: 20 }]);
street.byteLength;
street.get(0);
street.getView(0).get("size");
street.size;
street.set(0, { size: 100 });
street.get(0);
For vectors set btype
to vector
:
const Names = view.create<Array<string | undefined>>({
type: "array",
btype: "vector",
items: {
type: "string",
},
});
const witches = Names.from([
"Granny Weatherwax",
"Nanny Ogg",
undefined,
"Magrat Garlick",
]);
witches.byteLength;
witches.get(0);
witches.get(2);
As with maps, the layout of vectors is calculated upon creation and editing is
limited to the items present upon creation.
Strings
The protocol handles strings through StringView, an extension of DataView that
handles string serialization. It also offers a handful of convenience methods to
operate on encoded strings so that some common operations can be performed
without decoding the string:
import { StringView } from "structurae";
let stringView = StringView.from("abc😀a");
stringView.toString();
stringView == "abc😀a";
stringView = StringView.from("abc😀");
stringView.length;
stringView.size;
stringView.charAt(0);
stringView.charAt(3);
[...stringView.characters()];
stringView.substring(0, 4);
stringView = StringView.from("abc😀a");
const searchValue = StringView.from("😀");
stringView.indexOf(searchValue);
const replacement = StringView.from("d");
stringView.replace(searchValue, replacement).toString();
stringView.reverse().toString();
Tagged Objects
When transferring our buffers encoded with views we can often rely on meta
information to know what kind of view to use in order to decode a received
buffer. However, sometimes we might want our views to carry that information
within themselves. To do that, we can prepend or tag each view with a value
indicating its class, i.e. add a field that defaults to a certain value for each
view class. Now upon receiving a buffer we can read that field using the
DataView and convert it into an appropriate view.
The View
class offers a few convenience methods to simplify this process:
import { View } from "structurae";
interface Dog {
tag: 0;
name: string;
}
interface Cat {
tag: 1;
name: string;
}
const DogView = view.create<Dog>({
type: "object",
$id: "Dog",
properties: {
tag: { type: "number", btype: "uint8", default: 0 }
name: { type: "string", maxLength: 10 }
},
});
const CatView = view.create<Cat>({
type: "object",
$id: "Cat",
properties: {
tag: { type: "number", btype: "uint8", default: 1 }
name: { type: "string", maxLength: 10 }
},
});
const animal = view.encode({ tag: 0, name: "Gaspode" });
view.decode(animal)
Extending View Types
The view protocol is designed with extensibility in mind. While built-in view
types are ample for most cases, creating a special type can reduce boilerplate
in certain situations. You can check out a full example of creating and using a
custom view type for BitArray in
examples/bit-array-view.
To create a new view type, first create a class extending DataView and
implementing one of the view type interfaces, for example PrimitiveView
:
export class BitArrayView extends DataView implements PrimitiveView<BitArray> {
...
}
To let TypeScript know about our new type, we use
module augmentation
to add our new type name to ViewSchemaTypeMap
interface:
declare module "structurae" {
interface ViewSchemaTypeMap {
bitarray: "string";
}
}
This way, it will be a binary subtype (or btype
) of JSONSchema type string
.
And finally, we add the new class to the list of views used by our protocol
instance:
const protocol = new View();
protocol.Views.set("bitarray", BitArrayView);
Now we can use the new type in our schemas, for example:
class UserSettings {
id = 0;
settings = new BitArray(3);
}
const UserSettingsView = protocol.create<UserSettings>({
$id: "UserSettings",
type: "object",
properties: {
id: { type: "integer" },
settings: {
type: "string",
btype: "bitarray",
maxLength: 12,
},
},
}, UserSettings);
Bit Structures
BitField & BigBitField
BitField and BigBitField use JavaScript Numbers and BigInts respectively as
bitfields to store and operate on data using bitwise operations. By default,
BitField operates on 31 bit long bitfield where bits are indexed from least
significant to most:
import { BitField } from "structurae";
const bitfield = new BitField(29);
bitfield.get(0);
bitfield.get(1);
bitfield.has(2, 3, 4);
You can use BitFieldMixin or BigBitFieldMixin with your own schema by specifying
field names and their respective sizes in bits:
const Field = BitFieldMixin({ width: 8, height: 8 });
const field = new Field({ width: 100, height: 200 });
field.get("width");
field.get("height");
field.set("width", 18);
field.get("width");
field.toObject();
If the total size of your fields exceeds 31 bits, use BigBitFieldMixin that
internally uses a BigInt to represent the resulting number, however, you can
still use normal numbers to set each field and get their value as a number as
well:
const LargeField = BitFieldMixin({ width: 20, height: 20 });
const largeField = new LargeField([1048576, 1048576]);
largeField.value;
largeField.set("width", 1000).get("width");
If you have to add more fields to your schema later on, you do not have to
re-encode your existing values, just add new fields at the end of your new
schema:
const OldField = BitFieldMixin({ width: 8, height: 8 });
const oldField = OldField.encode([20, 1]);
const NewField = BitFieldMixin({ width: 8, height: 8, area: 10 });
const newField = new NewField(oldField);
newField.get("width");
newField.get("height");
newField.set("weight", 100).get("weight");
If you only want to encode or decode a set of field values without creating an
instance, you can do so by using static methods BitField.encode
and
BitField.decode
respectively:
const Field = BitFieldMixin({ width: 7, height: 1 });
Field.encode([20, 1]);
Field.encode({ height: 1, width: 20 });
Field.decode(41);
If you don't know beforehand how many bits you need for your field, you can call
BitField.getMinSize
with the maximum possible value of your field to find out:
BitField.getMinSize(100);
const Field = BitFieldMixin({ width: BitField.getMinSize(250), height: 8 });
For performance sake, BitField doesn't check the size of values being set and
setting values that exceed the specified field size will lead to undefined
behavior. If you want to check whether values fit their respective fields, you
can use BitField.isValid
:
const Field = BitFieldMixin({ width: 7, height: 1 });
Field.isValid({ width: 100 });
Field.isValid({ width: 100, height: 3 });
BitField#match
(and its static variation BitField.match
) can be used to
check values of multiple fields at once:
const Field = BitFieldMixin({ width: 7, height: 1 });
const field = new Field([20, 1]);
field.match({ width: 20 });
field.match({ height: 1, width: 20 });
field.match({ height: 1, width: 19 });
Field.match(field.valueOf(), { height: 1, width: 20 });
If you have to check multiple BitField instances for the same values, create a
special matcher with BitField.getMatcher
and use it in the match method, that
way each check will require only one bitwise operation and a comparison:
const Field = BitFieldMixin({ width: 7, height: 1 });
const matcher = Field.getMatcher({ height: 1, width: 20 });
Field.match(new Field([20, 1]).valueOf(), matcher);
Field.match(new Field([19, 1]).valueOf(), matcher);
BitArray
BitArray uses Uint32Array as an array or vector of bits. It's a simpler version
of BitField that only sets and checks individual bits:
const array = new BitArray(10);
array.getBit(0);
array.setBit(0).getBit(0);
array.size;
array.length;
BitArray is the base class for
Pool and
RankedBitArray classes.
It's useful in cases where one needs more bits than can be stored in a number,
but doesn't want to use BigInts as it is done by
BitField.
Pool
Implements a fast algorithm to manage availability of objects in an object pool
using a BitArray.
const { Pool } = require("structurae");
const pool = new Pool(100 * 16);
pool.get();
pool.get();
pool.free(0);
pool.get();
pool.get();
RankedBitArray
RankedBitArray is an extension of BitArray with methods to efficiently calculate
rank and select. The rank is calculated in constant time where as select has
O(logN) time complexity. This is often used as a basic element in implementing
succinct data structures.
const array = new RankedBitArray(10);
array.setBit(1).setBit(3).setBit(7);
array.rank(2);
array.rank(7);
array.select(2);
Graphs
Structurae offers classes that implement adjacency list (AdjacencyList
) and
adjacency matrix (AdjacencyMatrixUnweightedDirected
,
AdjacencyMatrixUnweightedUndirected
, AdjacencyMatrixWeightedDirected
,
AdjacencyMatrixWeightedUnirected
) as basic primitives to represent graphs
using TypedArrays, and the Graph
class that extends the adjacency structures
to offer methods for traversing graphs (BFS, DFS), pathfinding (Dijkstra,
Bellman-Ford), and spanning tree construction (BFS, Prim).
Adjacency Structures
AdjacencyList
implements adjacency list data structure extending a TypedArray
class. The adjacency list requires less storage space: number of vertices +
number of edges * 2 (for a weighted list). However, adding and removing edges is
much slower since it involves shifting/unshifting values in the underlying typed
array.
import { AdjacencyListMixin } from "structurae";
const List = AdjacencyListMixin(Int32Array);
const graph = List.create(6, 6);
graph.length;
graph.addEdge(0, 1, 5);
graph.addEdge(0, 2, 1);
graph.addEdge(2, 4, 1);
graph.addEdge(2, 5, 2);
graph.hasEdge(0, 1);
graph.hasEdge(0, 4);
graph.outEdges(2);
graph.inEdges(2);
graph.hasEdge(0, 1);
graph.getEdge(0, 1);
Since the maximum amount of egdes in AdjacencyList is limited to the number
specified at creation, adding edges can overflow throwing a RangeError. If
that's a possibility, use AdjacencyList#isFull
method to check if the limit is
reached before adding an edge.
AdjacencyMatrixUnweightedDirected
and AdjacencyMatrixUnweightedUndirected
implement adjacency matrix data structure for unweighted graphs representing
each edge by a single bit in an underlying ArrayBuffer.
AdjacencyMatrixWeightedDirected
and AdjacencyMatrixWeightedUnirected
implement adjacency matrix for weighted graphs extending a given TypedArray to
store the weights akin to Grid.
import {
AdjacencyMatrixUnweightedDirected,
AdjacencyMatrixWeightedDirectedMixin,
} from "structurae";
const Matrix = AdjacencyMatrixWeightedDirectedMixin(Int32Array);
const unweightedMatrix = new AdjacencyMatrixUnweightedDirected.create(6);
unweightedMatrix.addEdge(0, 1);
unweightedMatrix.addEdge(0, 2);
unweightedMatrix.addEdge(0, 3);
unweightedMatrix.addEdge(2, 4);
unweightedMatrix.addEdge(2, 5);
unweightedMatrix.hasEdge(0, 1);
unweightedMatrix.hasEdge(0, 4);
unweightedMatrix.outEdges(2);
unweightedMatrix.inEdges(2);
const weightedMatrix = Matrix.create(6);
weightedMatrix.addEdge(0, 1, 3);
weightedMatrix.hasEdge(0, 1);
weightedMatrix.hasEdge(1, 0);
weightedMatrix.getEdge(1, 0);
Graph
Graph
extends a provided adjacency structure with methods for traversing,
pathfinding, and spanning tree construction that use various graph algorithms.
import {
AdjacencyMatrixUnweightedDirected,
AdjacencyMatrixWeightedDirectedMixin,
GraphMixin,
} from "structurae";
const UnweightedGraph = GraphMixin(AdjacencyMatrixUnweightedDirected);
const WeightedGraph = GraphMixin(
AdjacencyMatrixWeightedDirectedMixin(Int32Array),
);
The traversal is done by a generator function Graph#traverse
that can be
configured to use Breadth-First or Depth-First traversal, as well as returning
vertices on various stages of processing, i.e. only return vertices that are
fully processed (black
), or being processed (gray
), or just encountered
(white
):
const graph = WeightedGraph.create(6);
graph.addEdge(0, 1, 3);
graph.addEdge(0, 2, 2);
graph.addEdge(0, 3, 1);
graph.addEdge(2, 4, 8);
graph.addEdge(2, 5, 6);
[...graph.traverse()];
[...graph.traverse(true)];
[...graph.traverse(false, 0, false, true)];
Graph#path
returns the list of vertices constituting the shortest path between
two given vertices. By default, the class uses BFS based search for unweighted
graphs, and Bellman-Ford algorithm for weighted graphs. However, the method can
be configured to use other algorithms by specifying arguments of the function:
graph.path(0, 5);
graph.path(0, 5, true);
graph.path(0, 5, false, true);
Grids
BinaryGrid
BinaryGrid creates a grid or 2D matrix of bits and provides methods to operate
on it:
import { BinaryGrid } from "structurae";
const bitGrid = BinaryGrid.create(2, 8);
bitGrid.setValue(0, 0).setValue(0, 2).setValue(0, 5);
bitGrid.getValue(0, 0);
bitGrid.getValue(0, 1);
bitGrid.getValue(0, 2);
BinaryGrid packs bits into numbers like
BitField and holds them in an
ArrayBuffer, thus occupying the smallest possible space.
Grid
Grid extends a provided Array or TypedArray class to efficiently handle 2
dimensional data without creating nested arrays. Grid "unrolls" nested arrays
into a single array and pads its "columns" to the nearest power of 2 in order to
employ quick lookups with bitwise operations.
import { GridMixin } from "structurae";
const ArrayGrid = GridMixin(Array);
const grid = ArrayGrid.create(5, 4);
grid.length;
grid[0];
const dataGrid = new ArrayGrid([
1,
2,
3,
4,
5,
6,
7,
8,
]);
dataGrid.columns = 4;
dataGrid.getValue(1, 0);
dataGrid.columns = 2;
dataGrid.getValue(1, 0);
You can get and set elements using their row and column indexes:
grid.getValue(0, 1);
grid.setValue(0, 1, 10);
grid.getValue(0, 1);
grid.getIndex(0, 1);
grid.getCoordinates(0);
grid.getCoordinates(1);
A grid can be turned to and from an array of nested arrays using respectively
Grid.fromArrays
and Grid#toArrays
methods:
const grid = ArrayGrid.fromArrays([[1, 2], [3, 4]]);
grid.getValue(1, 1);
const grid = ArrayGrid.fromArrays([[1, 2], [3, 4, 5]]);
grid.getValue(1, 1);
grid.toArrays();
grid.toArrays(true);
SymmetricGrid
SymmetricGrid is a Grid that offers a more compact way of encoding symmetric or
triangular square matrices using half as much space.
import { SymmetricGrid } from "structurae";
const grid = ArrayGrid.create(100, 100);
grid.length;
const symmetricGrid = SymmetricGrid.create(100);
symmetricGrid.length;
Since the grid is symmetric, it returns the same value for a given pair of
coordinates regardless of their position:
symmetricGrid.setValue(0, 5, 10);
symmetricGrid.getValue(0, 5);
symmetricGrid.getValue(5, 0);
Sorted Structures
BinaryHeap
BinaryHeap extends built-in Array to implement the binary heap data structure.
All the mutating methods (push, shift, splice, etc.) do so while maintaining the
valid heap structure. By default, BinaryHeap implements min-heap, but it can be
changed by providing a different comparator function:
import { BinaryHeap } from "structurae";
class MaxHeap extends BinaryHeap {}
MaxHeap.compare = (a, b) => a > b;
In addition to all array methods, BinaryHeap provides a few methods to traverse
or change the heap:
const heap = new BinaryHeap(10, 1, 20, 3, 9, 8);
heap[0];
heap.left(0);
heap.right(0);
heap.parent(1);
heap.replace(4);
heap[0];
heap[0] = 6;
heap.update(0);
SortedArray
SortedArray extends Array with methods to efficiently handle sorted data. The
methods that change the contents of an array do so while preserving the sorted
order:
import { SortedArray } from "structurae";
const sortedArray = new SortedArray();
sortedArray.push(1);
sortedArray.unshift(8);
sortedArray.splice(0, 2, 6);
uniquify
can be used to remove duplicating elements from the array:
const a = SortedArray.from([1, 1, 2, 2, 3, 4]);
a.uniquify();
If the instance property unique
of an array is set to true
, the array will
behave as a set and avoid duplicating elements:
const a = new SortedArray();
a.unique = true;
a.push(1);
a.push(2);
a.push(1);
a;
To create a sorted collection from unsorted array-like objects or items use
from
and of
static methods respectively:
SortedArray.from([3, 2, 9, 5, 4]);
SortedArray.of(8, 5, 6);
new SortedArray
behaves the same way as new Array
and should be used with
already sorted elements:
new SortedArray(...[1, 2, 3, 4, 8]);
indexOf
and includes
use binary search that increasingly outperforms the
built-in methods as the size of the array grows.
SortedArray provides isSorted
method to check if the collection is sorted, and
range
method to get elements of the collection whose values are between the
specified range:
sortedArray.range(3, 5);
sortedArray.range(undefined, 4);
sortedArray.range(4);
SortedArray also provides a set of functions to perform common set operations
and find statistics of any sorted array-like objects.
License
MIT © Maga D. Zandaqo