@solana/codecs-data-structures
This package contains codecs for various data structures such as arrays, maps, structs, tuples, enums, etc. It can be used standalone, but it is also exported as part of the Solana JavaScript SDK @solana/web3.js@rc.
This package is also part of the @solana/codecs package which acts as an entry point for all codec packages as well as for their documentation.
Array codec
The getArrayCodec
function accepts any codec of type T
and returns a codec of type Array<T>
. For instance, here’s how we can create a codec for arrays of numbers that each fit in a single byte.
const bytes = getArrayCodec(getU8Codec()).encode([1, 2, 3]);
const array = getArrayCodec(getU8Codec()).decode(bytes);
By default, the size of the array is stored as a u32
prefix before encoding the items.
getArrayCodec(getU8Codec()).encode([1, 2, 3]);
However, you may use the size option to configure this behaviour. It can be one of the following three strategies:
- Codec<number>: When a number codec is provided, that codec will be used to encode and decode the size prefix.
- number: When a number is provided, the codec will expect a fixed number of items in the array. An error will be thrown when trying to encode an array of a different length.
- "remainder": When the string "remainder" is passed as a size, the codec will use the remainder of the bytes to encode/decode its items. This means the size is not stored or known in advance but simply inferred from the rest of the buffer. For instance, if we have an array of u16 numbers and 10 bytes remaining, we know there are 5 items in this array.
getArrayCodec(getU8Codec(), { size: getU16Codec() }).encode([1, 2, 3]);
getArrayCodec(getU8Codec(), { size: 3 }).encode([1, 2, 3]);
getArrayCodec(getU8Codec(), { size: 'remainder' }).encode([1, 2, 3]);
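To make the three size strategies concrete, here is a minimal hand-rolled sketch of the byte layouts they produce for arrays of single-byte numbers. It does not call the library: the names SizeStrategy, encodeU8Array, decodeRemainderU16Array and toHex are hypothetical helpers introduced for illustration only, and the sketch assumes little-endian size prefixes.

```typescript
// Illustrative sketch only: these helpers are NOT part of @solana/codecs-data-structures.
type SizeStrategy = 'u32' | 'u16' | number | 'remainder';

// Little-endian encoding of an unsigned integer over `byteLength` bytes.
function encodeUintLE(value: number, byteLength: number): number[] {
    const out: number[] = [];
    for (let i = 0; i < byteLength; i++) {
        out.push(Math.floor(value / Math.pow(256, i)) % 256);
    }
    return out;
}

// Encodes items that each fit in one byte, using the given size strategy.
function encodeU8Array(items: number[], size: SizeStrategy = 'u32'): Uint8Array {
    let prefix: number[] = [];
    if (size === 'u32') prefix = encodeUintLE(items.length, 4);
    else if (size === 'u16') prefix = encodeUintLE(items.length, 2);
    else if (typeof size === 'number' && items.length !== size) {
        throw new Error('Expected ' + size + ' items, got ' + items.length);
    }
    // Fixed and 'remainder' sizes store no prefix at all.
    return new Uint8Array(prefix.concat(items));
}

// With 'remainder', the item count is inferred from the bytes that are left.
function decodeRemainderU16Array(bytes: Uint8Array, offset = 0): number[] {
    const items: number[] = [];
    for (let i = offset; i + 1 < bytes.length; i += 2) {
        items.push(bytes[i] | (bytes[i + 1] << 8));
    }
    return items;
}

function toHex(bytes: Uint8Array): string {
    let hex = '';
    for (let i = 0; i < bytes.length; i++) {
        hex += (bytes[i] < 16 ? '0' : '') + bytes[i].toString(16);
    }
    return hex;
}

const withU32Prefix = toHex(encodeU8Array([1, 2, 3])); // '03000000010203'
const withU16Prefix = toHex(encodeU8Array([1, 2, 3], 'u16')); // '0300010203'
const withFixedSize = toHex(encodeU8Array([1, 2, 3], 3)); // '010203'
const withRemainder = toHex(encodeU8Array([1, 2, 3], 'remainder')); // '010203'
```

Fixed and "remainder" sizes produce the same bytes when encoding. They differ on decoding: a fixed size reads exactly that many items, whereas "remainder" keeps reading until the buffer is exhausted.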
Separate getArrayEncoder
and getArrayDecoder
functions are also available.
const bytes = getArrayEncoder(getU8Encoder()).encode([1, 2, 3]);
const array = getArrayDecoder(getU8Decoder()).decode(bytes);
Set codec
The getSetCodec
function accepts any codec of type T
and returns a codec of type Set<T>
. For instance, here’s how we can create a codec for sets of numbers that each fit in a single byte.
const bytes = getSetCodec(getU8Codec()).encode(new Set([1, 2, 3]));
const set = getSetCodec(getU8Codec()).decode(bytes);
Just like the array codec, it uses a u32
size prefix by default but can be configured using the size
option. See the array codec for more details.
getSetCodec(getU8Codec(), { size: getU16Codec() }).encode(new Set([1, 2, 3]));
getSetCodec(getU8Codec(), { size: 3 }).encode(new Set([1, 2, 3]));
getSetCodec(getU8Codec(), { size: 'remainder' }).encode(new Set([1, 2, 3]));
Separate getSetEncoder
and getSetDecoder
functions are also available.
const bytes = getSetEncoder(getU8Encoder()).encode(new Set([1, 2, 3]));
const set = getSetDecoder(getU8Decoder()).decode(bytes);
Map codec
The getMapCodec
function accepts two codecs of type K
and V
and returns a codec of type Map<K, V>
. For instance, here’s how we can create a codec for maps such that the keys are fixed strings of 8 bytes and the values are u8
numbers.
const keyCodec = fixCodecSize(getUtf8Codec(), 8);
const valueCodec = getU8Codec();
const bytes = getMapCodec(keyCodec, valueCodec).encode(new Map([['alice', 42]]));
const map = getMapCodec(keyCodec, valueCodec).decode(bytes);
Just like the array codec, it uses a u32
size prefix by default.
const keyCodec = fixCodecSize(getUtf8Codec(), 8);
const valueCodec = getU8Codec();
const myMap = new Map<string, number>();
myMap.set('alice', 42);
myMap.set('bob', 5);
getMapCodec(keyCodec, valueCodec).encode(myMap);
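As a rough illustration of the resulting layout, the sketch below writes a little-endian u32 entry count followed by each key (padded with zeroes to 8 bytes) and its single-byte value. It is a hand-rolled approximation rather than the library's code: encodeFixedAscii and encodeStringU8Map are hypothetical names, and keys are assumed to be ASCII-only.

```typescript
// Illustrative sketch only: these helpers are NOT part of @solana/codecs-data-structures.

// Writes an ASCII string into exactly `size` bytes, truncating or zero-padding as needed.
function encodeFixedAscii(text: string, size: number): number[] {
    const out: number[] = [];
    for (let i = 0; i < size; i++) {
        out.push(i < text.length ? text.charCodeAt(i) & 0xff : 0);
    }
    return out;
}

// Layout: u32 entry count (little-endian), then key and value for each entry in insertion order.
function encodeStringU8Map(map: Map<string, number>): Uint8Array {
    const n = map.size;
    let out: number[] = [n & 0xff, (n >>> 8) & 0xff, (n >>> 16) & 0xff, (n >>> 24) & 0xff];
    map.forEach((value, key) => {
        out = out.concat(encodeFixedAscii(key, 8), [value & 0xff]);
    });
    return new Uint8Array(out);
}

const sketchMap = new Map<string, number>();
sketchMap.set('alice', 42);
sketchMap.set('bob', 5);

// 4 bytes of prefix + 2 entries of (8 + 1) bytes each = 22 bytes.
const sketchMapBytes = encodeStringU8Map(sketchMap);
```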
However, it can be configured using the size
option. See the size
option of the array codec for more details.
getMapCodec(keyCodec, valueCodec, { size: getU16Codec() }).encode(myMap);
getMapCodec(keyCodec, valueCodec, { size: 3 }).encode(myMap);
getMapCodec(keyCodec, valueCodec, { size: 'remainder' }).encode(myMap);
Separate getMapEncoder
and getMapDecoder
functions are also available.
const bytes = getMapEncoder(keyEncoder, valueEncoder).encode(myMap);
const map = getMapDecoder(keyDecoder, valueDecoder).decode(bytes);
Tuple codec
The getTupleCodec
function accepts any number of codecs — T
, U
, V
, etc. — and returns a tuple codec of type [T, U, V, …]
such that each item is in the order of the provided codecs.
const codec = getTupleCodec([addCodecSizePrefix(getUtf8Codec(), getU32Codec()), getU8Codec(), getU64Codec()]);
const bytes = codec.encode(['alice', 42, 123]);
const tuple = codec.decode(bytes);
Separate getTupleEncoder
and getTupleDecoder
functions are also available.
const bytes = getTupleEncoder([getU8Encoder(), getU64Encoder()]).encode([42, 123]);
const tuple = getTupleDecoder([getU8Decoder(), getU64Decoder()]).decode(bytes);
Struct codec
The getStructCodec
function accepts any number of field codecs and returns a codec for an object containing all these fields. Each provided field is an array such that the first item is the name of the field and the second item is the codec used to encode and decode that field type.
type Person = { name: string; age: number };
const personCodec: Codec<Person> = getStructCodec([
['name', addCodecSizePrefix(getUtf8Codec(), getU32Codec())],
['age', getU8Codec()],
]);
const bytes = personCodec.encode({ name: 'alice', age: 42 });
const person = personCodec.decode(bytes);
Separate getStructEncoder
and getStructDecoder
functions are also available.
const personEncoder: Encoder<Person> = getStructEncoder([
['name', addEncoderSizePrefix(getUtf8Encoder(), getU32Encoder())],
['age', getU8Encoder()],
]);
const personDecoder: Decoder<Person> = getStructDecoder([
['name', addDecoderSizePrefix(getUtf8Decoder(), getU32Decoder())],
['age', getU8Decoder()],
]);
const bytes = personEncoder.encode({ name: 'alice', age: 42 });
const person = personDecoder.decode(bytes);
Enum codec
The getEnumCodec
function accepts a JavaScript enum constructor and returns a codec for encoding and decoding values of that enum.
enum Direction {
Left,
Right,
}
const bytes = getEnumCodec(Direction).encode(Direction.Left);
const direction = getEnumCodec(Direction).decode(bytes);
When encoding an enum, you may either provide the value of the enum variant — e.g. Direction.Left
— or its key — e.g. 'Left'
.
enum Direction {
Left,
Right,
}
getEnumCodec(Direction).encode(Direction.Left);
getEnumCodec(Direction).encode(Direction.Right);
getEnumCodec(Direction).encode('Left');
getEnumCodec(Direction).encode('Right');
As you can see, by default, a u8
number is being used to store the enum value. However, a number codec may be passed as the size
option to configure that behaviour.
const u32DirectionCodec = getEnumCodec(Direction, { size: getU32Codec() });
u32DirectionCodec.encode(Direction.Left);
u32DirectionCodec.encode(Direction.Right);
This function also works with lexical enums — e.g. enum Direction { Left = '←' }
— explicit numerical enums — e.g. enum Speed { Left = 50 }
— and hybrid enums with a mix of both.
enum Numbers {
One,
Five = 5,
Six,
Nine = 'nine',
}
getEnumCodec(Numbers).encode(Numbers.One);
getEnumCodec(Numbers).encode(Numbers.Five);
getEnumCodec(Numbers).encode(Numbers.Six);
getEnumCodec(Numbers).encode(Numbers.Nine);
getEnumCodec(Numbers).encode('One');
getEnumCodec(Numbers).encode('Five');
getEnumCodec(Numbers).encode('Six');
getEnumCodec(Numbers).encode('Nine');
Notice how, by default, the index of the enum variant is used to encode the value of the enum. For instance, in the example above, Numbers.Five
is encoded as 0x01
even though its value is 5
. This is also true for lexical enums.
However, when dealing with numerical enums that have explicit values, you may use the useValuesAsDiscriminators
option to encode the value of the enum variant instead of its index.
enum Numbers {
One,
Five = 5,
Six,
Nine = 9,
}
const codec = getEnumCodec(Numbers, { useValuesAsDiscriminators: true });
codec.encode(Numbers.One);
codec.encode(Numbers.Five);
codec.encode(Numbers.Six);
codec.encode(Numbers.Nine);
codec.encode('One');
codec.encode('Five');
codec.encode('Six');
codec.encode('Nine');
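The difference between index-based and value-based discriminators can be sketched without the library. In the snippet below, the enum is modelled as an ordered list of key/value pairs; the names variants and discriminatorFor are hypothetical and exist only for this illustration.

```typescript
// Illustrative sketch only: a stand-in for `enum Numbers { One, Five = 5, Six, Nine = 9 }`,
// written as ordered [key, value] pairs so the sketch does not depend on enum syntax.
const variants: [string, number][] = [
    ['One', 0],
    ['Five', 5],
    ['Six', 6], // auto-incremented from the previous explicit value
    ['Nine', 9],
];

// Returns the number that would be stored for a given variant key.
function discriminatorFor(key: string, useValuesAsDiscriminators: boolean): number {
    for (let index = 0; index < variants.length; index++) {
        if (variants[index][0] === key) {
            return useValuesAsDiscriminators ? variants[index][1] : index;
        }
    }
    throw new Error('Unknown variant: ' + key);
}

const fiveByIndex = discriminatorFor('Five', false); // 1, its position in the enum
const fiveByValue = discriminatorFor('Five', true); // 5, its explicit value
```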
Note that when using the useValuesAsDiscriminators
option on an enum that contains a lexical value, an error will be thrown.
enum Lexical {
One,
Two = 'two',
}
getEnumCodec(Lexical, { useValuesAsDiscriminators: true });
Separate getEnumEncoder
and getEnumDecoder
functions are also available.
const bytes = getEnumEncoder(Direction).encode(Direction.Left);
const direction = getEnumDecoder(Direction).decode(bytes);
Literal union codec
The getLiteralUnionCodec function works similarly to the getEnumCodec function but does not require a JavaScript enum to exist.
It accepts an array of literal values, such as string, number, boolean, etc., and returns a codec that encodes and decodes such values by using their index in the array. It uses TypeScript unions to represent all the possible values.
const codec = getLiteralUnionCodec(['left', 'right', 'up', 'down']);
const bytes = codec.encode('left');
const value = codec.decode(bytes);
As you can see, it uses a u8
number by default to store the index of the value. However, you may provide a number codec as the size
option of the getLiteralUnionCodec
function to customise that behaviour.
const codec = getLiteralUnionCodec(['left', 'right', 'up', 'down'], {
size: getU32Codec(),
});
codec.encode('left');
codec.encode('right');
codec.encode('up');
codec.encode('down');
Separate getLiteralUnionEncoder
and getLiteralUnionDecoder
functions are also available.
const bytes = getLiteralUnionEncoder(['left', 'right']).encode('left');
const value = getLiteralUnionDecoder(['left', 'right']).decode(bytes);
Discriminated union codec
In Rust, enums are powerful data types whose variants can be one of the following:
- An empty variant, e.g. enum Message { Quit }.
- A tuple variant, e.g. enum Message { Write(String) }.
- A struct variant, e.g. enum Message { Move { x: i32, y: i32 } }.
Whilst we do not have such powerful enums in JavaScript, we can emulate them in TypeScript using a union of objects such that each object is differentiated by a specific field. We call this a discriminated union.
We use a special field named __kind
to distinguish between the different variants of a discriminated union. Additionally, since all variants are objects, we can use a fields
property to wrap the array of tuple variants. Here is an example.
type Message =
| { __kind: 'Quit' }
| { __kind: 'Write'; fields: [string] }
| { __kind: 'Move'; x: number; y: number };
The getDiscriminatedUnionCodec
function helps us encode and decode these discriminated unions.
It requires the discriminator and codec of each variant as a first argument. Similarly to the struct codec, these are defined as an array of variant tuples where the first item is the discriminator of the variant and the second item is its codec. Since empty variants do not have data to encode, they simply use the unit codec — documented below — which does nothing.
Here is how we can create a discriminated union codec for our previous example.
const messageCodec = getDiscriminatedUnionCodec([
['Quit', getUnitCodec()],
['Write', getStructCodec([['fields', getTupleCodec([addCodecSizePrefix(getUtf8Codec(), getU32Codec())])]])],
[
'Move',
getStructCodec([
['x', getI32Codec()],
['y', getI32Codec()],
]),
],
]);
And here’s how we can use such a codec to encode discriminated unions. Notice that by default, they use a u8
number prefix to distinguish between the different types of variants.
messageCodec.encode({ __kind: 'Quit' });
messageCodec.encode({ __kind: 'Write', fields: ['Hi'] });
messageCodec.encode({ __kind: 'Move', x: 5, y: 6 });
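To visualise the bytes these calls produce, here is a hand-rolled sketch of the same layout: a u8 variant index followed by the variant's own data. It is an approximation written for this explanation, not the library's implementation; int32LE and encodeMessage are hypothetical names and the string handling assumes ASCII text.

```typescript
// Illustrative sketch only: these helpers are NOT part of @solana/codecs-data-structures.
type Message =
    | { __kind: 'Quit' }
    | { __kind: 'Write'; fields: [string] }
    | { __kind: 'Move'; x: number; y: number };

// Little-endian 32-bit integer (the same bytes serve u32 and i32 values).
function int32LE(value: number): number[] {
    return [value & 0xff, (value >>> 8) & 0xff, (value >>> 16) & 0xff, (value >>> 24) & 0xff];
}

function encodeMessage(message: Message): Uint8Array {
    switch (message.__kind) {
        case 'Quit':
            // Empty variant: only the discriminator is written.
            return new Uint8Array([0]);
        case 'Write': {
            // Tuple variant: discriminator, u32 string length, then the characters.
            const text = message.fields[0];
            const chars: number[] = [];
            for (let i = 0; i < text.length; i++) chars.push(text.charCodeAt(i) & 0xff); // ASCII only
            return new Uint8Array([1].concat(int32LE(text.length), chars));
        }
        case 'Move':
            // Struct variant: discriminator, then x and y as i32 values.
            return new Uint8Array([2].concat(int32LE(message.x), int32LE(message.y)));
    }
}

const quitBytes = encodeMessage({ __kind: 'Quit' }); // 00
const writeBytes = encodeMessage({ __kind: 'Write', fields: ['Hi'] }); // 01 02000000 4869
const moveBytes = encodeMessage({ __kind: 'Move', x: 5, y: 6 }); // 02 05000000 06000000
```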
However, you may provide a number codec as the size
option of the getDiscriminatedUnionCodec
function to customise that behaviour.
const u32MessageCodec = getDiscriminatedUnionCodec([...], {
size: getU32Codec(),
});
u32MessageCodec.encode({ __kind: 'Quit' });
u32MessageCodec.encode({ __kind: 'Write', fields: ['Hi'] });
u32MessageCodec.encode({ __kind: 'Move', x: 5, y: 6 });
You may also customize the discriminator property — which defaults to __kind
— by providing the desired property name as the discriminator
option like so:
const messageCodec = getDiscriminatedUnionCodec([...], {
discriminator: 'message',
});
messageCodec.encode({ message: 'Quit' });
messageCodec.encode({ message: 'Write', fields: ['Hi'] });
messageCodec.encode({ message: 'Move', x: 5, y: 6 });
Note that the discriminator value of a variant may be any scalar value, such as number, bigint, boolean, a JavaScript enum, etc. For instance, the following is also valid:
enum Message {
Quit,
Write,
Move,
}
const messageCodec = getDiscriminatedUnionCodec([
[Message.Quit, getUnitCodec()],
[Message.Write, getStructCodec([...])],
[Message.Move, getStructCodec([...])],
]);
messageCodec.encode({ __kind: Message.Quit });
messageCodec.encode({ __kind: Message.Write, fields: ['Hi'] });
messageCodec.encode({ __kind: Message.Move, x: 5, y: 6 });
Finally, note that separate getDiscriminatedUnionEncoder
and getDiscriminatedUnionDecoder
functions are available.
const bytes = getDiscriminatedUnionEncoder(variantEncoders).encode({ __kind: 'Quit' });
const message = getDiscriminatedUnionDecoder(variantDecoders).decode(bytes);
Union codec
The getUnionCodec function is a lower-level codec helper that can be used to encode/decode any TypeScript union.
It accepts the following arguments:
- An array of codecs, each defining a variant of the union.
- A getIndexFromValue function which, given a value of the union, returns the index of the codec that should be used to encode that value.
- A getIndexFromBytes function which, given the byte array to decode at a given offset, returns the index of the codec that should be used to decode the next bytes.
const codec: Codec<number | boolean> = getUnionCodec(
[getU16Codec(), getBooleanCodec()],
value => (typeof value === 'number' ? 0 : 1),
(bytes, offset) => (bytes.slice(offset).length > 1 ? 0 : 1),
);
codec.encode(42);
codec.encode(true);
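The dispatch logic behind such a union can be sketched in a few lines. The snippet below is a simplified model, not the library's code: MiniCodec, miniU16, miniBoolean and miniUnion are hypothetical names, and byte arrays are plain number arrays for brevity. Note that no discriminator is written, so the bytes alone must be enough for getIndexFromBytes to pick the right variant.

```typescript
// Illustrative sketch only: a simplified model of union dispatch, not the library's types.
type MiniCodec<T> = {
    encode(value: T): number[];
    decode(bytes: number[], offset: number): T;
};

const miniU16: MiniCodec<number> = {
    encode: value => [value & 0xff, (value >> 8) & 0xff],
    decode: (bytes, offset) => bytes[offset] | (bytes[offset + 1] << 8),
};

const miniBoolean: MiniCodec<boolean> = {
    encode: value => [value ? 1 : 0],
    decode: (bytes, offset) => bytes[offset] === 1,
};

// Picks a variant by index when encoding (from the value) and when decoding (from the bytes).
function miniUnion<T>(
    variantCodecs: MiniCodec<any>[],
    getIndexFromValue: (value: T) => number,
    getIndexFromBytes: (bytes: number[], offset: number) => number,
): MiniCodec<T> {
    return {
        encode: value => variantCodecs[getIndexFromValue(value)].encode(value),
        decode: (bytes, offset) => variantCodecs[getIndexFromBytes(bytes, offset)].decode(bytes, offset),
    };
}

const numberOrBoolean = miniUnion<number | boolean>(
    [miniU16, miniBoolean],
    value => (typeof value === 'number' ? 0 : 1),
    (bytes, offset) => (bytes.length - offset > 1 ? 0 : 1),
);

const encodedNumber = numberOrBoolean.encode(42); // [0x2a, 0x00]
const encodedBoolean = numberOrBoolean.encode(true); // [0x01]
```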
As usual, separate getUnionEncoder
and getUnionDecoder
functions are also available.
const bytes = getUnionEncoder(encoders, getIndexFromValue).encode(42);
const value = getUnionDecoder(decoders, getIndexFromBytes).decode(bytes);
Boolean codec
The getBooleanCodec
function returns a Codec<boolean>
that stores the boolean as 0
or 1
using a u8
number by default.
const bytes = getBooleanCodec().encode(true);
const value = getBooleanCodec().decode(bytes);
You may configure that behaviour by providing an explicit number codec as the size
option of the getBooleanCodec
function. That number codec will then be used to encode and decode the values 0
and 1
accordingly.
getBooleanCodec({ size: getU16Codec() }).encode(false);
getBooleanCodec({ size: getU16Codec() }).encode(true);
getBooleanCodec({ size: getU32Codec() }).encode(false);
getBooleanCodec({ size: getU32Codec() }).encode(true);
Separate getBooleanEncoder
and getBooleanDecoder
functions are also available.
const bytes = getBooleanEncoder().encode(true);
const value = getBooleanDecoder().decode(bytes);
Nullable codec
The getNullableCodec
function accepts a codec of type T
and returns a codec of type T | null
. It stores whether or not the item exists as a boolean prefix using a u8
by default.
const stringCodec = addCodecSizePrefix(getUtf8Codec(), getU32Codec());
getNullableCodec(stringCodec).encode('Hi');
getNullableCodec(stringCodec).encode(null);
You may provide a number codec as the prefix
option of the getNullableCodec
function to configure how to store the boolean prefix.
const u32NullableStringCodec = getNullableCodec(stringCodec, {
prefix: getU32Codec(),
});
u32NullableStringCodec.encode('Hi');
u32NullableStringCodec.encode(null);
Additionally, if the item is a FixedSizeCodec
, you may set the noneValue
option to "zeroes"
to also make the returned nullable codec a FixedSizeCodec
. To do so, it will pad null
values with zeroes to match the length of existing values.
const fixedNullableStringCodec = getNullableCodec(
fixCodecSize(getUtf8Codec(), 8),
{ noneValue: 'zeroes' },
);
fixedNullableStringCodec.encode('Hi');
fixedNullableStringCodec.encode(null);
The noneValue
option can also be set to an explicit byte array to use as the padding for null
values. Note that, in this case, the returned codec will not be a FixedSizeCodec
as the byte array representing null
values may be of any length.
const codec = getNullableCodec(getUtf8Codec(), {
noneValue: new Uint8Array([255]),
});
codec.encode('Hi');
codec.encode(null);
Last but not least, the prefix
option of the getNullableCodec
function can also be set to null
, meaning no prefix will be used to determine whether the item exists. In this case, the codec will rely on the noneValue
option to determine whether the item is null
.
const codecWithZeroNoneValue = getNullableCodec(getU16Codec(), {
noneValue: 'zeroes',
prefix: null,
});
codecWithZeroNoneValue.encode(42);
codecWithZeroNoneValue.encode(null);
const codecWithCustomNoneValue = getNullableCodec(getU16Codec(), {
noneValue: new Uint8Array([255]),
prefix: null,
});
codecWithCustomNoneValue.encode(42);
codecWithCustomNoneValue.encode(null);
Finally, note that if prefix
is set to null
and no noneValue
is provided, the codec assumes that the item exists if and only if some remaining bytes are available to decode. This could be useful to describe data structures that may or may not have additional data at the end of the buffer.
const codec = getNullableCodec(getU16Codec(), { prefix: null });
codec.encode(42);
codec.encode(null);
codec.decode(new Uint8Array([42, 0]));
codec.decode(new Uint8Array([]));
To recap, here are all the possible configurations of the getNullableCodec
function, using a u16
codec as an example.
| encode(42) / encode(null) | No noneValue (default) | noneValue: "zeroes" | Custom noneValue (0xff) |
| --- | --- | --- | --- |
| u8 prefix (default) | 0x012a00 / 0x00 | 0x012a00 / 0x000000 | 0x012a00 / 0x00ff |
| Custom prefix (u16) | 0x01002a00 / 0x0000 | 0x01002a00 / 0x00000000 | 0x01002a00 / 0x0000ff |
| No prefix | 0x2a00 / 0x | 0x2a00 / 0x0000 | 0x2a00 / 0xff |
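The nine combinations in this table follow a simple rule, which the sketch below reproduces for a u16 item: write the prefix if there is one, then write either the value or the configured noneValue. This is a hand-written model for illustration, not the library's implementation; NullableOptions and encodeNullableU16 are hypothetical names.

```typescript
// Illustrative sketch only: these helpers are NOT part of @solana/codecs-data-structures.
type NullableOptions = {
    prefix?: 'u8' | 'u16' | null; // null means "no prefix"; omitted means the u8 default
    noneValue?: 'zeroes' | number[]; // omitted means "no noneValue"
};

function encodeNullableU16(value: number | null, options: NullableOptions = {}): number[] {
    const prefix = options.prefix === undefined ? 'u8' : options.prefix;
    const flag = value === null ? 0 : 1;

    // 1. The boolean prefix, if any (little-endian for u16).
    let out: number[] = [];
    if (prefix === 'u8') out = [flag];
    if (prefix === 'u16') out = [flag, 0];

    // 2. The value itself, or whatever stands in for null.
    if (value !== null) return out.concat([value & 0xff, (value >> 8) & 0xff]);
    if (options.noneValue === 'zeroes') return out.concat([0, 0]); // pad to the item size
    if (options.noneValue !== undefined) return out.concat(options.noneValue);
    return out; // nothing is written for null
}

const defaultSome = encodeNullableU16(42); // [0x01, 0x2a, 0x00]
const defaultNone = encodeNullableU16(null); // [0x00]
const noPrefixCustomNone = encodeNullableU16(null, { prefix: null, noneValue: [255] }); // [0xff]
```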
Note that you might be interested in the Rust-like alternative version of nullable codecs, available in the @solana/options
package.
Separate getNullableEncoder
and getNullableDecoder
functions are also available.
const bytes = getNullableEncoder(getU32Encoder()).encode(42);
const value = getNullableDecoder(getU32Decoder()).decode(bytes);
Bytes codec
The getBytesCodec
function returns a Codec<Uint8Array>
meaning it converts Uint8Arrays
to and from… Uint8Arrays
! Whilst this might seem a bit useless, it can be useful when composed into other codecs. For example, you could use it in a struct codec to say that a particular field should be left unserialised.
const bytes = getBytesCodec().encode(new Uint8Array([42]));
const value = getBytesCodec().decode(bytes);
The getBytesCodec function will encode and decode Uint8Arrays using as many bytes as necessary. If you'd like to restrict the number of bytes used by this codec, you may combine it with the fixCodecSize or addCodecSizePrefix primitives.
Here are some examples of how you might use the getBytesCodec
function.
getBytesCodec().encode(new Uint8Array([42]));
addCodecSizePrefix(getBytesCodec(), getU16Codec()).encode(new Uint8Array([42]));
fixCodecSize(getBytesCodec(), 5).encode(new Uint8Array([42]));
Separate getBytesEncoder
and getBytesDecoder
functions are also available.
const bytes = getBytesEncoder().encode(new Uint8Array([42]));
const value = getBytesDecoder().decode(bytes);
Bit array codec
The getBitArrayCodec
function returns a codec that encodes and decodes an array of booleans such that each boolean is represented by a single bit. It requires the size of the codec in bytes and an optional backward
flag that can be used to reverse the order of the bits.
const booleans = [true, false, true, false, true, false, true, false];
getBitArrayCodec(1).encode(booleans);
getBitArrayCodec(1, { backward: true }).encode(booleans);
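For a single byte, the effect of the backward flag can be sketched as follows. This is a simplified illustration, not the library's code: encodeBitsInByte is a hypothetical helper, and it assumes that without backward the first boolean maps to the most significant bit. How multi-byte sizes order their bytes is not covered here.

```typescript
// Illustrative sketch only: packs up to 8 booleans into one byte.
function encodeBitsInByte(booleans: boolean[], backward = false): number {
    let byte = 0;
    for (let j = 0; j < 8; j++) {
        const bit = booleans[j] ? 1 : 0; // missing entries count as false
        // Forward: first boolean is the highest bit. Backward: first boolean is the lowest bit.
        byte |= bit << (backward ? j : 7 - j);
    }
    return byte;
}

const sampleBooleans = [true, false, true, false, true, false, true, false];
const forwardByte = encodeBitsInByte(sampleBooleans); // 0b10101010 (0xaa)
const backwardByte = encodeBitsInByte(sampleBooleans, true); // 0b01010101 (0x55)
```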
Separate getBitArrayEncoder
and getBitArrayDecoder
functions are also available.
const bytes = getBitArrayEncoder(1).encode(booleans);
const decodedBooleans = getBitArrayDecoder(1).decode(bytes);
Constant codec
The getConstantCodec
function accepts any Uint8Array
and returns a Codec<void>
. When encoding, it will set the provided Uint8Array
as-is. When decoding, it will assert that the next bytes contain the provided Uint8Array
and move the offset forward.
const codec = getConstantCodec(new Uint8Array([1, 2, 3]));
codec.encode(undefined);
codec.decode(new Uint8Array([1, 2, 3]));
codec.decode(new Uint8Array([1, 2, 4]));
Separate getConstantEncoder
and getConstantDecoder
functions are also available.
getConstantEncoder(new Uint8Array([1, 2, 3])).encode(undefined);
getConstantDecoder(new Uint8Array([1, 2, 3])).decode(new Uint8Array([1, 2, 3]));
Unit codec
The getUnitCodec
function returns a Codec<void>
that encodes undefined
into an empty Uint8Array
and returns undefined
without consuming any bytes when decoding. This is more of a low-level codec that can be used internally by other codecs. For instance, this is how discriminated union codecs describe the codecs of empty variants.
getUnitCodec().encode(undefined);
getUnitCodec().decode(anyBytes);
Separate getUnitEncoder
and getUnitDecoder
functions are also available.
getUnitEncoder().encode(undefined);
getUnitDecoder().decode(anyBytes);
To read more about the available codecs and how to use them, check out the documentation of the main @solana/codecs
package.
Hidden prefix and suffix codec
The getHiddenPrefixCodec
and getHiddenSuffixCodec
functions allow us to respectively prepend or append a list of hidden Codec<void>
to a given codec. When encoding, the hidden codecs will be encoded before or after the main codec and the offset will be moved accordingly. When decoding, the hidden codecs will be decoded but only the result of the main codec will be returned. This is particularly helpful when creating data structures that include constant values that should not be included in the final type.
const codec: Codec<number> = getHiddenPrefixCodec(getU16Codec(), [
getConstantCodec(new Uint8Array([1, 2, 3])),
getConstantCodec(new Uint8Array([4, 5, 6])),
]);
codec.encode(42);
codec.decode(new Uint8Array([1, 2, 3, 4, 5, 6, 42, 0]));
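The behaviour described above can be approximated by hand as follows: the constants are written before the value when encoding, and checked then skipped when decoding. The names hiddenPrefixBytes, encodeWithHiddenPrefix and decodeWithHiddenPrefix are hypothetical, and plain number arrays stand in for Uint8Arrays to keep the sketch short.

```typescript
// Illustrative sketch only: these helpers are NOT part of @solana/codecs-data-structures.
const hiddenPrefixBytes = [1, 2, 3, 4, 5, 6]; // both constants, concatenated

// Encoding: hidden constants first, then the u16 value in little-endian order.
function encodeWithHiddenPrefix(value: number): number[] {
    return hiddenPrefixBytes.concat([value & 0xff, (value >> 8) & 0xff]);
}

// Decoding: assert the constants are present, skip them, and return only the value.
function decodeWithHiddenPrefix(bytes: number[]): number {
    for (let i = 0; i < hiddenPrefixBytes.length; i++) {
        if (bytes[i] !== hiddenPrefixBytes[i]) {
            throw new Error('Unexpected constant at byte ' + i);
        }
    }
    const offset = hiddenPrefixBytes.length;
    return bytes[offset] | (bytes[offset + 1] << 8);
}

const hiddenEncoded = encodeWithHiddenPrefix(42); // [1, 2, 3, 4, 5, 6, 42, 0]
const hiddenDecoded = decodeWithHiddenPrefix(hiddenEncoded); // 42
```

The caller only ever sees the number: the constants never appear in the encoded or decoded type.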
As usual, separate encoder and decoder functions are also available.
getHiddenPrefixEncoder(encoder, prefixedEncoders);
getHiddenPrefixDecoder(decoder, prefixedDecoders);
getHiddenSuffixEncoder(encoder, suffixedEncoders);
getHiddenSuffixDecoder(decoder, suffixedDecoders);