@solana/codecs-core
Core types and helpers for encoding and decoding byte arrays on Solana
This package contains the core types and functions for encoding and decoding data structures on Solana. It can be used standalone, but it is also exported as part of the Solana JavaScript SDK @solana/web3.js@experimental.
This package is also part of the @solana/codecs package, which acts as an entry point for all codec packages as well as for their documentation.
The easiest way to create your own codecs is to compose the various codecs offered by this library. For instance, here’s how you would define a codec for a Person object that contains a name string attribute and an age number stored in 4 bytes.
type Person = { name: string; age: number };
const getPersonCodec = (): Codec<Person> =>
getStructCodec([
['name', getStringCodec()],
['age', getU32Codec()],
]);
This function returns a Codec object which contains both an encode and decode function that can be used to convert a Person type to and from a Uint8Array.
const personCodec = getPersonCodec();
const bytes = personCodec.encode({ name: 'John', age: 42 });
const person = personCodec.decode(bytes);
There is a significant library of composable codecs at your disposal, enabling you to compose complex types. You may be interested in the documentation of these other packages to learn more about them:
@solana/codecs-numbers for number codecs.
@solana/codecs-strings for string codecs.
@solana/codecs-data-structures for many data structure codecs such as objects, arrays, tuples, sets, maps, scalar enums, data enums, booleans, etc.
@solana/options for a Rust-like Option type and associated codec.
You may also be interested in some of the helpers of this @solana/codecs-core library such as mapCodec, fixCodec or reverseCodec that create new codecs from existing ones.
Note that all of these libraries are included in the @solana/codecs package as well as the main @solana/web3.js package for your convenience.
Whilst Codecs can both encode and decode, it is possible to only focus on encoding or decoding data, enabling the unused logic to be tree-shaken. For instance, here’s our previous example using Encoders only to encode a Person type.
const getPersonEncoder = (): Encoder<Person> =>
getStructEncoder([
['name', getStringEncoder()],
['age', getU32Encoder()],
]);
const bytes = getPersonEncoder().encode({ name: 'John', age: 42 });
The same can be done for decoding the Person type by using Decoders like so.
const getPersonDecoder = (): Decoder<Person> =>
getStructDecoder([
['name', getStringDecoder()],
['age', getU32Decoder()],
]);
const person = getPersonDecoder().decode(bytes);
Separating Codecs into Encoders and Decoders is particularly good practice for library maintainers as it allows their users to tree-shake any of the encoders and/or decoders they don’t need. However, we may still want to offer a codec helper for users who need both for convenience.
That’s why this library offers a combineCodec helper that creates a Codec instance from a matching Encoder and Decoder.
const getPersonCodec = (): Codec<Person> => combineCodec(getPersonEncoder(), getPersonDecoder());
This means library maintainers can offer Encoders, Decoders and Codecs for all their types whilst staying efficient and tree-shakeable. In summary, we recommend the following pattern when creating codecs for library types.
type MyType = /* ... */;
const getMyTypeEncoder = (): Encoder<MyType> => { /* ... */ };
const getMyTypeDecoder = (): Decoder<MyType> => { /* ... */ };
const getMyTypeCodec = (): Codec<MyType> =>
combineCodec(getMyTypeEncoder(), getMyTypeDecoder());
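To make the pattern concrete, here is a minimal, self-contained sketch of what a combine helper can look like. The MiniEncoder, MiniDecoder and MiniCodec shapes and the combineMini function are hypothetical stand-ins written for this illustration; the real types in this library carry more members (such as read, write and size information), so treat this as an approximation rather than the library's implementation.

```typescript
// Hypothetical, simplified shapes; the real library types carry more members.
type MiniEncoder<T> = { encode: (value: T) => Uint8Array };
type MiniDecoder<T> = { decode: (bytes: Uint8Array) => T };
type MiniCodec<TFrom, TTo extends TFrom = TFrom> = MiniEncoder<TFrom> & MiniDecoder<TTo>;

// Sketch of a combine helper: it only wires the two halves together.
function combineMini<TFrom, TTo extends TFrom>(
    encoder: MiniEncoder<TFrom>,
    decoder: MiniDecoder<TTo>,
): MiniCodec<TFrom, TTo> {
    return { encode: encoder.encode, decode: decoder.decode };
}

// A one-byte boolean, split into two halves that can be imported separately.
const getMiniBoolEncoder = (): MiniEncoder<boolean> => ({
    encode: value => new Uint8Array([value ? 1 : 0]),
});
const getMiniBoolDecoder = (): MiniDecoder<boolean> => ({
    decode: bytes => bytes[0] === 1,
});
const getMiniBoolCodec = (): MiniCodec<boolean> =>
    combineMini(getMiniBoolEncoder(), getMiniBoolDecoder());
```

A consumer that only ever encodes can import getMiniBoolEncoder alone, which is what lets a bundler drop the decoding half.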
When creating codecs, the encoded type is allowed to be looser than the decoded type. A good example of that is the u64 number codec:
const u64Codec: Codec<number | bigint, bigint> = getU64Codec();
As you can see, the first type parameter is looser since it accepts numbers or big integers, whereas the second type parameter only accepts big integers. That’s because when encoding a u64 number, you may provide either a bigint or a number for convenience. However, when you decode a u64 number, you will always get a bigint because not all u64 values can fit in a JavaScript number type.
const bytes = u64Codec.encode(42);
const value = u64Codec.decode(bytes); // BigInt(42)
This relationship between the type we encode “From” and decode “To” can be generalized in TypeScript as To extends From.
Here’s another example using an object with default values. You can read more about the mapEncoder helper below.
type Person = { name: string; age: number };
type PersonInput = { name: string; age?: number };
const getPersonEncoder = (): Encoder<PersonInput> =>
    mapEncoder(
        getStructEncoder([
            ['name', getStringEncoder()],
            ['age', getU32Encoder()],
        ]),
        // Parentheses are required so the braces are parsed as an object literal.
        (input: PersonInput): Person => ({ ...input, age: input.age ?? 42 }),
    );
const getPersonDecoder = (): Decoder<Person> =>
    getStructDecoder([
        ['name', getStringDecoder()],
        ['age', getU32Decoder()],
    ]);
const getPersonCodec = (): Codec<PersonInput, Person> =>
    combineCodec(getPersonEncoder(), getPersonDecoder());
It is also worth noting that Codecs can either be of fixed size or variable size.
FixedSizeCodecs have a fixedSize number attribute that tells us exactly how big their encoded data is in bytes.
const myCodec: FixedSizeCodec<number> = getU32Codec();
myCodec.fixedSize; // 4 bytes.
On the other hand, VariableSizeCodecs do not know the size of their encoded data in advance. Instead, they will grab that information either from the provided encoded data or from the value to encode. For the former, we can simply access the length of the Uint8Array. For the latter, they provide a getSizeFromValue function that tells us the encoded byte size of the provided value.
const myCodec: VariableSizeCodec<string> = getStringCodec({
size: getU32Codec(),
});
myCodec.getSizeFromValue('hello world'); // 4 + 11 bytes.
Also note that, if the VariableSizeCodec is bounded by a maximum size, it can be provided as a maxSize number attribute.
The following type guards are available to identify and/or assert the size of codecs: isFixedSize, isVariableSize, assertIsFixedSize and assertIsVariableSize.
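As an illustration of the fixed-size versus variable-size distinction, the following sketch shows one way a type guard and a size lookup could be written. FixedSizeInfo, VariableSizeInfo, isSketchFixedSize and getSketchEncodedSize are hypothetical names invented for this example; they only mirror the fixedSize, getSizeFromValue and maxSize attributes described above and are not the library's own definitions.

```typescript
// Hypothetical, simplified shapes used for illustration only.
type FixedSizeInfo = { fixedSize: number };
type VariableSizeInfo<T> = { getSizeFromValue: (value: T) => number; maxSize?: number };
type SizedCodec<T> = FixedSizeInfo | VariableSizeInfo<T>;

// Type guard sketch: a codec is fixed-size when it exposes a numeric `fixedSize`.
function isSketchFixedSize<T>(codec: SizedCodec<T>): codec is FixedSizeInfo {
    return 'fixedSize' in codec && typeof codec.fixedSize === 'number';
}

// Resolves the encoded size of a value for either kind of codec.
function getSketchEncodedSize<T>(value: T, codec: SizedCodec<T>): number {
    return isSketchFixedSize(codec) ? codec.fixedSize : codec.getSizeFromValue(value);
}

// A u32 always takes 4 bytes.
const sketchU32Size: SizedCodec<number> = { fixedSize: 4 };
// A string prefixed by a 4-byte length (ASCII only, to keep the sketch short).
const sketchStringSize: SizedCodec<string> = {
    getSizeFromValue: (value: string) => 4 + value.length,
};
```

With these definitions, getSketchEncodedSize('hello world', sketchStringSize) yields 15, matching the "4 + 11 bytes" example above.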
Finally, note that the same is true for Encoders and Decoders.
FixedSizeEncoder has a fixedSize number attribute.
VariableSizeEncoder has a getSizeFromValue function and an optional maxSize number attribute.
FixedSizeDecoder has a fixedSize number attribute.
VariableSizeDecoder has an optional maxSize number attribute.
If composing codecs isn’t enough for you, you may implement your own codec logic by using the createCodec function. This function requires an object with a read and a write function telling us how to read from and write to an existing byte array.
The read function accepts the bytes to decode from and the offset at which we should start reading. It returns an array with two items: the decoded value and the next offset to read from.
createCodec({
read(bytes, offset) {
const value = bytes[offset];
return [value, offset + 1];
},
// ...
});
Reciprocally, the write function accepts the value to encode, the array of bytes to write the encoded value to and the offset at which it should be written. It should encode the given value, insert it in the byte array, and provide the next offset to write to as the return value.
createCodec({
write(value, bytes, offset) {
bytes.set([value], offset);
return offset + 1;
},
// ...
});
Additionally, we must specify the size of the codec. If we are defining a FixedSizeCodec, we must simply provide the fixedSize number attribute. For VariableSizeCodecs, we must provide the getSizeFromValue function as described in the previous section.
// FixedSizeCodec.
createCodec({
fixedSize: 1,
// ...
});
// VariableSizeCodec.
createCodec({
getSizeFromValue: (value: string) => value.length,
// ...
});
Here’s a concrete example of a custom codec that encodes any unsigned integer in a single byte. Since a single byte can only store integers from 0 to 255, if any other integer is provided it will take its modulo 256 to ensure it fits in a single byte. Because it always requires a single byte, that codec is a FixedSizeCodec of size 1.
const getModuloU8Codec = () =>
createCodec<number>({
fixedSize: 1,
read(bytes, offset) {
const value = bytes[offset];
return [value, offset + 1];
},
write(value, bytes, offset) {
bytes.set([value % 256], offset);
return offset + 1;
},
});
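To show how read and write relate to encode and decode, here is a rough, self-contained sketch under the assumption that encode allocates a buffer of the codec's size and delegates to write, while decode delegates to read and discards the returned offset. ReadWriteCodec, encodeViaWrite and decodeViaRead are hypothetical names for this illustration and are not part of this package.

```typescript
// Hypothetical, simplified shape: a fixed-size codec described by `read` and `write`.
type ReadWriteCodec<T> = {
    fixedSize: number;
    read: (bytes: Uint8Array, offset: number) => [T, number];
    write: (value: T, bytes: Uint8Array, offset: number) => number;
};

// Sketch of how `encode` can be derived: allocate the buffer, then delegate to `write`.
function encodeViaWrite<T>(codec: ReadWriteCodec<T>, value: T): Uint8Array {
    const bytes = new Uint8Array(codec.fixedSize);
    codec.write(value, bytes, 0);
    return bytes;
}

// Sketch of how `decode` can be derived: delegate to `read` and drop the next offset.
function decodeViaRead<T>(codec: ReadWriteCodec<T>, bytes: Uint8Array, offset = 0): T {
    return codec.read(bytes, offset)[0];
}

// The modulo u8 logic from above, expressed with the simplified shape.
const sketchModuloU8: ReadWriteCodec<number> = {
    fixedSize: 1,
    read: (bytes, offset) => [bytes[offset] ?? 0, offset + 1],
    write: (value, bytes, offset) => {
        bytes.set([value % 256], offset); // `set` expects an array-like, hence the brackets.
        return offset + 1;
    },
};
```

Encoding 300 with this sketch stores 300 % 256 = 44 in the single output byte.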
Note that it is also possible to create custom encoders and decoders separately by using the createEncoder and createDecoder functions respectively and then use the combineCodec function on them just like we were doing with composed codecs.
This approach is recommended to library maintainers as it allows their users to tree-shake any of the encoders and/or decoders they don’t need.
Here’s our previous modulo u8 example but split into separate Encoder, Decoder and Codec instances.
const getModuloU8Encoder = () =>
createEncoder<number>({
fixedSize: 1,
write(value, bytes, offset) {
bytes.set([value % 256], offset);
return offset + 1;
},
});
const getModuloU8Decoder = () =>
createDecoder<number>({
fixedSize: 1,
read(bytes, offset) {
const value = bytes[offset];
return [value, offset + 1];
},
});
const getModuloU8Codec = () => combineCodec(getModuloU8Encoder(), getModuloU8Decoder());
Here’s another example returning a VariableSizeCodec. This one transforms a simple string composed of characters from a to z to a buffer of numbers from 1 to 26 where 0 bytes are spaces.
const alphabet = ' abcdefghijklmnopqrstuvwxyz';
const getCipherEncoder = () =>
createEncoder<string>({
getSizeFromValue: value => value.length,
write(value, bytes, offset) {
const bytesToAdd = [...value].map(char => alphabet.indexOf(char));
bytes.set(bytesToAdd, offset);
return offset + bytesToAdd.length;
},
});
const getCipherDecoder = () =>
createDecoder<string>({
read(bytes, offset) {
const value = [...bytes.slice(offset)].map(byte => alphabet.charAt(byte)).join('');
return [value, bytes.length];
},
});
const getCipherCodec = () => combineCodec(getCipherEncoder(), getCipherDecoder());
It is possible to transform a Codec<T> to a Codec<U> by providing two mapping functions: one that goes from T to U and one that does the opposite.
For instance, here’s how you would map a u32 integer into a string representation of that number.
const getStringU32Codec = () =>
mapCodec(
getU32Codec(),
(integerAsString: string): number => parseInt(integerAsString),
(integer: number): string => integer.toString(),
);
getStringU32Codec().encode('42'); // new Uint8Array([42, 0, 0, 0])
getStringU32Codec().decode(new Uint8Array([42, 0, 0, 0])); // "42"
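The mapping mechanics can be sketched without the library as follows. MappableCodec, mapSketchCodec and sketchU32Codec are hypothetical names made up for this example; the u32 stand-in uses a DataView in little-endian order and is only an approximation of what a real number codec does.

```typescript
// Hypothetical, simplified codec shape used for this sketch only.
type MappableCodec<TFrom, TTo extends TFrom = TFrom> = {
    encode: (value: TFrom) => Uint8Array;
    decode: (bytes: Uint8Array) => TTo;
};

// `unmap` converts the new input type back to the old one before encoding;
// `map` converts the old decoded type to the new one after decoding.
function mapSketchCodec<OldFrom, NewFrom, OldTo extends OldFrom, NewTo extends NewFrom>(
    codec: MappableCodec<OldFrom, OldTo>,
    unmap: (value: NewFrom) => OldFrom,
    map: (value: OldTo) => NewTo,
): MappableCodec<NewFrom, NewTo> {
    return {
        encode: value => codec.encode(unmap(value)),
        decode: bytes => map(codec.decode(bytes)),
    };
}

// A tiny little-endian u32 codec built on DataView (not the library's implementation).
const sketchU32Codec: MappableCodec<number> = {
    encode: value => {
        const bytes = new Uint8Array(4);
        new DataView(bytes.buffer).setUint32(0, value, true);
        return bytes;
    },
    decode: bytes =>
        new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).getUint32(0, true),
};

const sketchStringU32Codec = mapSketchCodec(
    sketchU32Codec,
    (integerAsString: string): number => parseInt(integerAsString, 10),
    (integer: number): string => integer.toString(),
);
```

Note that the wrapped codec still writes four bytes: only the types on the outside change, not the byte layout.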
If a Codec has different From and To types, say Codec<OldFrom, OldTo>, and we want to map it to Codec<NewFrom, NewTo>, we must provide functions that map from NewFrom to OldFrom and from OldTo to NewTo.
To illustrate that, let’s take our previous getStringU32Codec example but make it use a getU64Codec codec instead as it returns a Codec<number | bigint, bigint>. Additionally, let’s make it so our getStringU64Codec function returns a Codec<number | string, string> so that it also accepts numbers when encoding values. Here’s what our mapping functions look like:
const getStringU64Codec = () =>
mapCodec(
getU64Codec(),
(integerInput: number | string): number | bigint =>
typeof integerInput === 'string' ? BigInt(integerInput) : integerInput,
(integer: bigint): string => integer.toString(),
);
Note that the second function that maps the decoded type is optional. That means you can omit it to simply update or loosen the type to encode whilst keeping the decoded type the same.
This is particularly useful to provide default values to object structures. For instance, here’s how we can map our Person codec to give a default value to its age attribute.
type Person = { name: string; age: number; }
const getPersonCodec = (): Codec<Person> => { /*...*/ }
type PersonInput = { name: string; age?: number; }
const getPersonWithDefaultValueCodec = (): Codec<PersonInput, Person> =>
mapCodec(
getPersonCodec(),
(person: PersonInput): Person => ({ ...person, age: person.age ?? 42 }),
)
Similar helpers exist to map Encoder and Decoder instances allowing you to separate your codec logic into tree-shakeable functions. Here’s our getStringU32Codec written that way.
const getStringU32Encoder = () =>
mapEncoder(getU32Encoder(), (integerAsString: string): number => parseInt(integerAsString));
const getStringU32Decoder = () => mapDecoder(getU32Decoder(), (integer: number): string => integer.toString());
const getStringU32Codec = () => combineCodec(getStringU32Encoder(), getStringU32Decoder());
The fixCodec function allows you to bind the size of a given codec to the given fixed size.
For instance, say you want to represent a base-58 string that uses exactly 32 bytes when decoded. Here’s how you can use the fixCodec helper to achieve that.
const get32BytesBase58Codec = () => fixCodec(getBase58Codec(), 32);
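One plausible way to think about fixing a codec's size is sketched below, assuming that encoded bytes are padded with trailing zeroes or truncated to the target size and that decoding only looks at that many bytes. This is an assumption made for illustration; BytesCodec, fixSketchCodec and sketchAsciiCodec are hypothetical names, and the ASCII stand-in replaces the base-58 codec to keep the example self-contained.

```typescript
// Hypothetical, simplified codec shape used for this sketch only.
type BytesCodec<T> = {
    encode: (value: T) => Uint8Array;
    decode: (bytes: Uint8Array) => T;
};

// Binds a codec to `fixedSize` bytes: pad with zeroes or truncate when encoding,
// and only hand the first `fixedSize` bytes to the inner codec when decoding.
function fixSketchCodec<T>(codec: BytesCodec<T>, fixedSize: number): BytesCodec<T> & { fixedSize: number } {
    return {
        fixedSize,
        encode: value => {
            const fixed = new Uint8Array(fixedSize); // zero-filled by default
            fixed.set(codec.encode(value).subarray(0, fixedSize));
            return fixed;
        },
        decode: bytes => codec.decode(bytes.subarray(0, fixedSize)),
    };
}

// Stand-in variable-size codec: one byte per ASCII character.
const sketchAsciiCodec: BytesCodec<string> = {
    encode: text => {
        const bytes = new Uint8Array(text.length);
        for (let i = 0; i < text.length; i++) bytes[i] = text.charCodeAt(i);
        return bytes;
    },
    decode: bytes => {
        let text = '';
        for (let i = 0; i < bytes.length; i++) text += String.fromCharCode(bytes[i] ?? 0);
        return text;
    },
};

const fourByteAscii = fixSketchCodec(sketchAsciiCodec, 4);
```

Here fourByteAscii.encode('ab') produces four bytes with two trailing zeroes, and fourByteAscii.encode('abcdef') keeps only the first four characters.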
You may also use the fixEncoder and fixDecoder functions to separate your codec logic like so:
const get32BytesBase58Encoder = () => fixEncoder(getBase58Encoder(), 32);
const get32BytesBase58Decoder = () => fixDecoder(getBase58Decoder(), 32);
const get32BytesBase58Codec = () => combineCodec(get32BytesBase58Encoder(), get32BytesBase58Decoder());
The resizeCodec helper re-defines the size of a given codec by accepting a function that takes the current size of the codec and returns a new size. This works for both fixed-size and variable-size codecs.
// Fixed-size codec.
const getBiggerU32Codec = () => resizeCodec(getU32Codec(), size => size + 4);
getBiggerU32Codec().encode(42);
// 0x2a00000000000000
// | └-- Empty buffer space caused by the resizeCodec function.
// └-- Our encoded u32 number.
// Variable-size codec.
const getBiggerStringCodec = () => resizeCodec(getStringCodec(), size => size + 4);
getBiggerStringCodec().encode('ABC');
// 0x0300000041424300000000
// | └-- Empty buffer space caused by the resizeCodec function.
// └-- Our encoded string with a 4-byte size prefix.
Note that the resizeCodec function doesn't change any encoded or decoded bytes; it merely tells the encode and decode functions how big the Uint8Array should be before delegating to their respective write and read functions. In fact, this is completely bypassed when using the write and read functions directly. For instance:
const getBiggerU32Codec = () => resizeCodec(getU32Codec(), size => size + 4);
// Using the encode function.
getBiggerU32Codec().encode(42);
// 0x2a00000000000000
// Using the lower-level write function.
const myCustomBytes = new Uint8Array(4);
getBiggerU32Codec().write(42, myCustomBytes, 0);
// 0x2a000000
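The behaviour described above can be reproduced with a small stand-alone sketch, assuming encode simply allocates fixedSize bytes before calling write. WritableEncoder, encodeViaSize and resizeSketchEncoder are hypothetical names used only for this illustration.

```typescript
// Hypothetical, simplified encoder shape used for this sketch only.
type WritableEncoder<T> = {
    fixedSize: number;
    write: (value: T, bytes: Uint8Array, offset: number) => number;
};

// `encode` allocates `fixedSize` bytes and then delegates to `write`.
function encodeViaSize<T>(encoder: WritableEncoder<T>, value: T): Uint8Array {
    const bytes = new Uint8Array(encoder.fixedSize);
    encoder.write(value, bytes, 0);
    return bytes;
}

// Resizing only changes the advertised size; `write` is reused untouched.
function resizeSketchEncoder<T>(
    encoder: WritableEncoder<T>,
    resize: (size: number) => number,
): WritableEncoder<T> {
    return { fixedSize: resize(encoder.fixedSize), write: encoder.write };
}

// A little-endian u32 writer built on DataView (not the library's implementation).
const plainU32: WritableEncoder<number> = {
    fixedSize: 4,
    write: (value, bytes, offset) => {
        new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).setUint32(offset, value, true);
        return offset + 4;
    },
};
const biggerU32 = resizeSketchEncoder(plainU32, size => size + 4);
```

Calling encodeViaSize(biggerU32, 42) returns 8 bytes, whereas calling biggerU32.write directly on a 4-byte array works as before because the resize step never touches write.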
So when would it make sense to use the resizeCodec function? This function is particularly useful when combined with the offsetCodec function described below. Whilst the offsetCodec may help us push the offset forward (e.g. to skip some padding), it won't change the size of the encoded data, which means the last bytes will be truncated by how much we pushed the offset forward. The resizeCodec function can be used to fix that. For instance, here's how we can use the resizeCodec and the offsetCodec functions together to create a struct codec that includes some padding.
const personCodec = getStructCodec([
['name', getStringCodec({ size: 8 })],
// There is a 4-byte padding between name and age.
[
'age',
offsetCodec(
resizeCodec(getU32Codec(), size => size + 4),
{ preOffset: ({ preOffset }) => preOffset + 4 },
),
],
]);
personCodec.encode({ name: 'Alice', age: 42 });
// 0x416c696365000000000000002a000000
// | | └-- Our encoded u32 (42).
// | └-- The 4-bytes of padding we are skipping.
// └-- Our 8-byte encoded string ("Alice").
As usual, the resizeEncoder and resizeDecoder functions can also be used to achieve that.
const getBiggerU32Encoder = () => resizeEncoder(getU32Encoder(), size => size + 4);
const getBiggerU32Decoder = () => resizeDecoder(getU32Decoder(), size => size + 4);
const getBiggerU32Codec = () => combineCodec(getBiggerU32Encoder(), getBiggerU32Decoder());
The offsetCodec function is a powerful codec primitive that allows you to move the offset of a given codec forward or backwards. It accepts one or two functions that take the current offset and return a new offset.
To understand how this works, let's take our previous biggerU32Codec example which encodes a u32 number inside an 8-byte buffer.
const biggerU32Codec = resizeCodec(getU32Codec(), size => size + 4);
biggerU32Codec.encode(0xffffffff);
// 0xffffffff00000000
// | └-- Empty buffer space caused by the resizeCodec function.
// └-- Our encoded u32 number.
Now, let's say we want to move the offset of that codec 2 bytes forward so that the encoded number sits in the middle of the buffer. To achieve this, we can use the offsetCodec helper and provide a preOffset function that moves the "pre-offset" of the codec 2 bytes forward.
const u32InTheMiddleCodec = offsetCodec(biggerU32Codec, {
preOffset: ({ preOffset }) => preOffset + 2,
});
u32InTheMiddleCodec.encode(0xffffffff);
// 0x0000ffffffff0000
// └-- Our encoded u32 number is now in the middle of the buffer.
We refer to this offset as the "pre-offset" because, once the inner codec is encoded or decoded, an additional offset will be returned which we refer to as the "post-offset". That "post-offset" is important as, unless we are reaching the end of our codec, it will be used by any further codecs to continue encoding or decoding data.
By default, that "post-offset" is simply the addition of the "pre-offset" and the size of the encoded or decoded inner data.
const u32InTheMiddleCodec = offsetCodec(biggerU32Codec, {
preOffset: ({ preOffset }) => preOffset + 2,
});
u32InTheMiddleCodec.encode(0xffffffff);
// 0x0000ffffffff0000
// | | └-- Post-offset.
// | └-- New pre-offset: The original pre-offset + 2.
// └-- Pre-offset: The original pre-offset before we adjusted it.
However, you may also provide a postOffset function to adjust the "post-offset". For instance, let's push the "post-offset" 2 bytes forward as well such that any further codecs will start doing their job at the end of our 8-byte u32 number.
const u32InTheMiddleCodec = offsetCodec(biggerU32Codec, {
preOffset: ({ preOffset }) => preOffset + 2,
postOffset: ({ postOffset }) => postOffset + 2,
});
u32InTheMiddleCodec.encode(0xffffffff);
// 0x0000ffffffff0000
// | | | └-- New post-offset: The original post-offset + 2.
// | | └-- Post-offset: The original post-offset before we adjusted it.
// | └-- New pre-offset: The original pre-offset + 2.
// └-- Pre-offset: The original pre-offset before we adjusted it.
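The pre-offset and post-offset flow can be sketched as a thin wrapper around a write function. This is a simplified approximation, not the library's code: OffsetableEncoder, OffsetConfig and offsetSketchEncoder are hypothetical names, and range checks and offset wrapping are deliberately left out.

```typescript
// Hypothetical, simplified shapes used for this sketch only.
type OffsetableEncoder<T> = {
    fixedSize: number;
    write: (value: T, bytes: Uint8Array, offset: number) => number;
};
type OffsetConfig = {
    preOffset?: (scope: { bytes: Uint8Array; preOffset: number }) => number;
    postOffset?: (scope: {
        bytes: Uint8Array;
        preOffset: number;
        newPreOffset: number;
        postOffset: number;
    }) => number;
};

function offsetSketchEncoder<T>(encoder: OffsetableEncoder<T>, config: OffsetConfig): OffsetableEncoder<T> {
    return {
        fixedSize: encoder.fixedSize,
        write: (value, bytes, preOffset) => {
            // 1. Adjust where the inner encoder starts writing.
            const newPreOffset = config.preOffset ? config.preOffset({ bytes, preOffset }) : preOffset;
            // 2. Let the inner encoder write; it returns the unadjusted post-offset.
            const postOffset = encoder.write(value, bytes, newPreOffset);
            // 3. Optionally adjust the offset handed to whatever comes next.
            return config.postOffset
                ? config.postOffset({ bytes, preOffset, newPreOffset, postOffset })
                : postOffset;
        },
    };
}

// A u32 writer that advertises 8 bytes, mirroring the biggerU32Codec example.
const eightByteU32: OffsetableEncoder<number> = {
    fixedSize: 8,
    write: (value, bytes, offset) => {
        new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).setUint32(offset, value, true);
        return offset + 4;
    },
};
const u32InTheMiddle = offsetSketchEncoder(eightByteU32, {
    preOffset: ({ preOffset }) => preOffset + 2,
    postOffset: ({ postOffset }) => postOffset + 2,
});
```

Writing 0xffffffff at offset 0 of an 8-byte array places the number in bytes 2 to 5 and returns 8 as the next offset (2 + 4 + 2).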
Both the preOffset and postOffset functions offer the following attributes:
bytes: The entire byte array being encoded or decoded.
preOffset: The original and unaltered pre-offset.
wrapBytes: A helper function that wraps the given offset around the byte array length. E.g. wrapBytes(-1) will refer to the last byte of the byte array.
Additionally, the post-offset function also provides the following attributes:
newPreOffset: The new pre-offset after the pre-offset function has been applied.
postOffset: The original and unaltered post-offset.
Note that you may also decide to ignore these attributes to achieve absolute offsets. However, relative offsets are usually recommended as they won't break your codecs when composed with other codecs.
const u32InTheMiddleCodec = offsetCodec(biggerU32Codec, {
preOffset: () => 2,
postOffset: () => 8,
});
u32InTheMiddleCodec.encode(0xffffffff);
// 0x0000ffffffff0000
Also note that any negative offset or offset that exceeds the size of the byte array will throw a SolanaError of code SOLANA_ERROR__CODECS__OFFSET_OUT_OF_RANGE.
const u32InTheEndCodec = offsetCodec(biggerU32Codec, { preOffset: () => -4 });
u32InTheEndCodec.encode(0xffffffff);
// throws new SolanaError(SOLANA_ERROR__CODECS__OFFSET_OUT_OF_RANGE)
To avoid this, you may use the wrapBytes function to wrap the offset around the byte array length. For instance, here's how we can use the wrapBytes function to move the pre-offset 4 bytes from the end of the byte array.
const u32InTheEndCodec = offsetCodec(biggerU32Codec, {
preOffset: ({ wrapBytes }) => wrapBytes(-4),
});
u32InTheEndCodec.encode(0xffffffff);
// 0x00000000ffffffff
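Wrapping an offset around a buffer length is essentially modular arithmetic. The helper below is a hypothetical sketch named wrapSketchOffset; it assumes a plain modulo-based wrap and returns 0 for an empty buffer, both of which are choices made for this example rather than documented behaviour of wrapBytes.

```typescript
// Wraps an offset around a buffer length so that -1 maps to the last index.
function wrapSketchOffset(offset: number, length: number): number {
    // Assumption for this sketch: there is nothing to wrap around in an empty buffer.
    if (length === 0) return 0;
    // The double modulo keeps the result non-negative for negative offsets.
    return ((offset % length) + length) % length;
}
```

For an 8-byte buffer, wrapSketchOffset(-4, 8) gives 4, which is where the u32 starts in the example above, and wrapSketchOffset(-1, 8) gives 7, the index of the last byte.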
As you can see, the offsetCodec helper allows you to jump all over the place with your codecs. This non-linear approach to encoding and decoding data allows you to achieve complex serialization strategies that would otherwise be impossible.
As usual, the offsetEncoder and offsetDecoder functions can also be used to split your codec logic into tree-shakeable functions.
const getU32InTheMiddleEncoder = () => offsetEncoder(biggerU32Encoder, { preOffset: ({ preOffset }) => preOffset + 2 });
const getU32InTheMiddleDecoder = () => offsetDecoder(biggerU32Decoder, { preOffset: ({ preOffset }) => preOffset + 2 });
const getU32InTheMiddleCodec = () => combineCodec(getU32InTheMiddleEncoder(), getU32InTheMiddleDecoder());
The padLeftCodec and padRightCodec helpers can be used to add padding to the left or right of a given codec. They accept an offset number that tells us how big the padding should be.
const getLeftPaddedCodec = () => padLeftCodec(getU16Codec(), 4);
getLeftPaddedCodec().encode(0xffff);
// 0x00000000ffff
// | └-- Our encoded u16 number.
// └-- Our 4-byte padding.
const getRightPaddedCodec = () => padRightCodec(getU16Codec(), 4);
getRightPaddedCodec().encode(0xffff);
// 0xffff00000000
// | └-- Our 4-byte padding.
// └-- Our encoded u16 number.
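Padding can be viewed as a size increase combined with an offset shift. The following sketch illustrates that idea on the encoding side only; PaddableEncoder, encodePadded, padLeftSketch and padRightSketch are hypothetical names for this example and do not reproduce the library's implementation.

```typescript
// Hypothetical, simplified encoder shape used for this sketch only.
type PaddableEncoder<T> = {
    fixedSize: number;
    write: (value: T, bytes: Uint8Array, offset: number) => number;
};

function encodePadded<T>(encoder: PaddableEncoder<T>, value: T): Uint8Array {
    const bytes = new Uint8Array(encoder.fixedSize);
    encoder.write(value, bytes, 0);
    return bytes;
}

// Left padding: grow the size, then start writing `padding` bytes later.
function padLeftSketch<T>(encoder: PaddableEncoder<T>, padding: number): PaddableEncoder<T> {
    return {
        fixedSize: encoder.fixedSize + padding,
        write: (value, bytes, offset) => encoder.write(value, bytes, offset + padding),
    };
}

// Right padding: grow the size, then push the returned offset past the padding.
function padRightSketch<T>(encoder: PaddableEncoder<T>, padding: number): PaddableEncoder<T> {
    return {
        fixedSize: encoder.fixedSize + padding,
        write: (value, bytes, offset) => encoder.write(value, bytes, offset) + padding,
    };
}

// A little-endian u16 writer built on DataView (not the library's implementation).
const sketchU16: PaddableEncoder<number> = {
    fixedSize: 2,
    write: (value, bytes, offset) => {
        new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).setUint16(offset, value, true);
        return offset + 2;
    },
};
```

With a padding of 4, encodePadded(padLeftSketch(sketchU16, 4), 0xffff) yields four zero bytes followed by ff ff, and the right-padded variant yields ff ff followed by four zero bytes.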
Note that both the padLeftCodec and padRightCodec functions are simple wrappers around the offsetCodec and resizeCodec functions. For more complex padding strategies, you may want to use the offsetCodec and resizeCodec functions directly instead.
As usual, encoder-only and decoder-only helpers are available for these padding functions. Namely, padLeftEncoder, padRightEncoder, padLeftDecoder and padRightDecoder.
const getMyPaddedEncoder = () => padLeftEncoder(getU16Encoder(), 4);
const getMyPaddedDecoder = () => padLeftDecoder(getU16Decoder(), 4);
const getMyPaddedCodec = () => combineCodec(getMyPaddedEncoder(), getMyPaddedDecoder());
The reverseCodec helper reverses the bytes of the provided FixedSizeCodec.
const getBigEndianU64Codec = () => reverseCodec(getU64Codec());
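As a rough illustration of byte reversal on a fixed-size codec, the sketch below reverses the output of a little-endian u32 stand-in to obtain a big-endian layout. ReversibleCodec and reverseSketchCodec are hypothetical names, and a u32 is used instead of a u64 to avoid relying on bigint support.

```typescript
// Hypothetical, simplified codec shape used for this sketch only.
type ReversibleCodec<T> = {
    fixedSize: number;
    encode: (value: T) => Uint8Array;
    decode: (bytes: Uint8Array) => T;
};

function reverseSketchCodec<T>(codec: ReversibleCodec<T>): ReversibleCodec<T> {
    return {
        fixedSize: codec.fixedSize,
        // `slice` copies first so that the caller's bytes are never mutated.
        encode: value => codec.encode(value).slice().reverse(),
        decode: bytes => codec.decode(bytes.slice(0, codec.fixedSize).reverse()),
    };
}

// A little-endian u32 codec built on DataView (not the library's implementation).
const littleEndianU32: ReversibleCodec<number> = {
    fixedSize: 4,
    encode: value => {
        const bytes = new Uint8Array(4);
        new DataView(bytes.buffer).setUint32(0, value, true);
        return bytes;
    },
    decode: bytes =>
        new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).getUint32(0, true),
};
const bigEndianU32 = reverseSketchCodec(littleEndianU32);
```

Reversal only makes sense when the size is known up front, which is why the sketch requires a fixedSize.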
Note that number codecs can already do that for you via their endian option.
const getBigEndianU64Codec = () => getU64Codec({ endian: Endian.BIG });
As usual, the reverseEncoder and reverseDecoder functions can also be used to achieve that.
const getBigEndianU64Encoder = () => reverseEncoder(getU64Encoder());
const getBigEndianU64Decoder = () => reverseDecoder(getU64Decoder());
const getBigEndianU64Codec = () => combineCodec(getBigEndianU64Encoder(), getBigEndianU64Decoder());
This package also provides utility functions for managing bytes such as:
mergeBytes: Concatenates an array of Uint8Arrays into a single Uint8Array.
padBytes: Pads a Uint8Array with zeroes (to the right) to the specified length.
fixBytes: Pads or truncates a Uint8Array so it has the specified length.
// Merge multiple Uint8Array buffers into one.
mergeBytes([new Uint8Array([1, 2]), new Uint8Array([3, 4])]); // Uint8Array([1, 2, 3, 4])
// Pad a Uint8Array buffer to the given size.
padBytes(new Uint8Array([1, 2]), 4); // Uint8Array([1, 2, 0, 0])
padBytes(new Uint8Array([1, 2, 3, 4]), 2); // Uint8Array([1, 2, 3, 4])
// Pad and truncate a Uint8Array buffer to the given size.
fixBytes(new Uint8Array([1, 2]), 4); // Uint8Array([1, 2, 0, 0])
fixBytes(new Uint8Array([1, 2, 3, 4]), 2); // Uint8Array([1, 2])
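For readers curious about what such helpers involve, here is a minimal sketch of equivalent logic written from the descriptions above. mergeSketchBytes, padSketchBytes and fixSketchBytes are hypothetical re-implementations for illustration; the package's own functions may differ in details such as whether they copy their input.

```typescript
// Concatenates several byte arrays into one newly allocated array.
function mergeSketchBytes(arrays: Uint8Array[]): Uint8Array {
    const total = arrays.reduce((sum, bytes) => sum + bytes.length, 0);
    const merged = new Uint8Array(total);
    let offset = 0;
    arrays.forEach(bytes => {
        merged.set(bytes, offset);
        offset += bytes.length;
    });
    return merged;
}

// Pads with trailing zeroes up to `length`; longer inputs are returned unchanged.
function padSketchBytes(bytes: Uint8Array, length: number): Uint8Array {
    if (bytes.length >= length) return bytes;
    const padded = new Uint8Array(length); // zero-filled by default
    padded.set(bytes);
    return padded;
}

// Pads or truncates so that the result is exactly `length` bytes long.
function fixSketchBytes(bytes: Uint8Array, length: number): Uint8Array {
    return padSketchBytes(bytes.length <= length ? bytes : bytes.slice(0, length), length);
}
```

These mirror the examples above: padding never shortens its input, whereas fixing always returns exactly the requested length.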
To read more about the available codecs and how to use them, check out the documentation of the main @solana/codecs package.
FAQs
Core types and helpers for encoding and decoding byte arrays on Solana
The npm package @solana/codecs-core receives a total of 228,253 weekly downloads. As such, @solana/codecs-core popularity was classified as popular.
We found that @solana/codecs-core demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 14 open source maintainers collaborating on the project.