@solana/codecs-strings
This package contains codecs for strings of different sizes and encodings. It can be used standalone, but it is also exported as part of the Solana JavaScript SDK @solana/web3.js@experimental.
This package is also part of the @solana/codecs package, which acts as an entry point for all codec packages as well as for their documentation.
Sizing string codecs
The @solana/codecs-strings package offers a variety of string codecs, such as utf8, base58 and base64, which we discuss in more detail below. Before digging into the available string codecs, however, it's important to understand the different sizing strategies available for them.
By default, all available string codecs return a VariableSizeCodec<string>, meaning that:
- When encoding a string, all bytes necessary to encode the string will be used.
- When decoding a byte array at a given offset, all bytes starting from that offset will be decoded as a string.
For instance, here's how you can encode and decode utf8 strings without any size boundary:
import { getUtf8Codec } from '@solana/codecs-strings';
const codec = getUtf8Codec();
codec.encode('hello'); // 0x68656c6c6f
codec.decode(new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f])); // 'hello'
This might be what you want, for example when the string sits at the end of a data structure, but in many cases you will want a size boundary for your string. You can achieve this by composing your string codec with the fixCodecSize or addCodecSizePrefix functions.
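To illustrate why an unbounded string is best placed last, here is a minimal sketch that does not use the library. It assumes a hypothetical record made of a one-byte tag followed by a UTF-8 string, built with the platform TextEncoder and TextDecoder; the helpers encodeTaggedString and decodeTaggedString are illustrative names only. Because the string has no boundary, decoding reads every remaining byte, so any byte appended after it is swallowed into the string.

```typescript
// Illustrative only: a hypothetical record made of a 1-byte tag followed by
// an unbounded UTF-8 string, built with the platform TextEncoder/TextDecoder.
const textEncoder = new TextEncoder();
const textDecoder = new TextDecoder();

function encodeTaggedString(tag: number, value: string): Uint8Array {
  const stringBytes = textEncoder.encode(value);
  const bytes = new Uint8Array(1 + stringBytes.length);
  bytes[0] = tag; // fixed-size field first
  bytes.set(stringBytes, 1); // variable-size string last
  return bytes;
}

function decodeTaggedString(bytes: Uint8Array): { tag: number; value: string } {
  // Without a size boundary, every byte after the tag is read as the string.
  return { tag: bytes[0], value: textDecoder.decode(bytes.subarray(1)) };
}

// Appending one more byte (0x21, '!') changes the decoded string: the extra
// byte is not left over for a following field.
const record = encodeTaggedString(7, 'hello');
const withExtraByte = new Uint8Array(record.length + 1);
withExtraByte.set(record);
withExtraByte[record.length] = 0x21;
console.log(decodeTaggedString(record).value); // hello
console.log(decodeTaggedString(withExtraByte).value); // hello!
```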
The fixCodecSize function accepts a fixed byte length and returns a FixedSizeCodec<string> that always uses that number of bytes to encode and decode a string. Strings longer than that size are truncated and shorter ones are padded. Here's how you can use it with a utf8 codec:
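import { fixCodecSize, getUtf8Codec } from '@solana/codecs';
const codec = fixCodecSize(getUtf8Codec(), 5);
codec.encode('hello'); // 0x68656c6c6f
codec.encode('hello world'); // 0x68656c6c6f (truncated)
codec.encode('hell'); // 0x68656c6c00 (padded)
codec.decode(new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f, 0xff, 0xff, 0xff, 0xff])); // 'hello'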
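The truncate-or-pad behaviour can be approximated without the library. The sketch below is illustrative rather than the library's implementation. Its hypothetical helpers encodeFixedUtf8 and decodeFixedUtf8 assume padding with 0x00 bytes and strip trailing 0x00 bytes when decoding.

```typescript
// Illustrative approximation of fixed-size UTF-8 strings (not the library's code).
// Assumption: padding uses 0x00 bytes, and trailing 0x00 bytes are stripped on decode.
const fixedEncoder = new TextEncoder();
const fixedDecoder = new TextDecoder();

function encodeFixedUtf8(value: string, size: number): Uint8Array {
  const bytes = new Uint8Array(size); // starts zero-filled, i.e. already padded
  // Copy at most `size` bytes: longer strings are truncated.
  bytes.set(fixedEncoder.encode(value).subarray(0, size));
  return bytes;
}

function decodeFixedUtf8(bytes: Uint8Array, size: number, offset = 0): string {
  // Read exactly `size` bytes from `offset`, ignoring anything after them.
  const slice = bytes.subarray(offset, offset + size);
  return fixedDecoder.decode(slice).replace(/\u0000+$/, '');
}
```

Since the size is measured in bytes, truncation can cut a multi-byte UTF-8 character in half. This sketch does not guard against that.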
The addCodecSizePrefix function accepts an additional number codec that is used to encode and decode a size prefix for the string. When decoding a given byte array, this prefix tells us where the string ends. Here's how you can use it with a utf8 codec:
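import { addCodecSizePrefix, getU32Codec, getUtf8Codec } from '@solana/codecs';
const codec = addCodecSizePrefix(getUtf8Codec(), getU32Codec());
codec.encode('hello'); // 0x0500000068656c6c6f
codec.decode(new Uint8Array([0x05, 0x00, 0x00, 0x00, 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0xff, 0xff, 0xff, 0xff])); // 'hello'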
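A size prefix can be sketched the same way, again without the library. The hypothetical helpers below write the string's UTF-8 byte length as a little-endian u32, matching the 0x05, 0x00, 0x00, 0x00 prefix in the example above. The decoder returns the offset right after the string, so a caller could continue reading further fields. This is an illustration under those assumptions, not the library's code.

```typescript
// Illustrative u32 little-endian size prefix for UTF-8 strings (not the library's code).
const prefixEncoder = new TextEncoder();
const prefixDecoder = new TextDecoder();

function encodePrefixedUtf8(value: string): Uint8Array {
  const stringBytes = prefixEncoder.encode(value);
  const bytes = new Uint8Array(4 + stringBytes.length);
  // The prefix stores the byte length (not the character count), little-endian.
  new DataView(bytes.buffer).setUint32(0, stringBytes.length, true);
  bytes.set(stringBytes, 4);
  return bytes;
}

// Returns the decoded string and the offset just after it.
function decodePrefixedUtf8(bytes: Uint8Array, offset = 0): [string, number] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const length = view.getUint32(offset, true);
  const start = offset + 4;
  const end = start + length;
  if (end > bytes.length) throw new Error('Not enough bytes to decode the string');
  // Bytes after `end` are left untouched for whatever comes next.
  return [prefixDecoder.decode(bytes.subarray(start, end)), end];
}
```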
Now, let's take a look at the available string encodings. Just remember that you can use the fixCodecSize or addCodecSizePrefix functions on any of these encodings to add a size boundary to them.
Utf8 codec
The getUtf8Codec function encodes and decodes a UTF-8 string to and from a byte array.
const bytes = getUtf8Codec().encode('hello');
const value = getUtf8Codec().decode(bytes);
As usual, separate getUtf8Encoder and getUtf8Decoder functions are also available.
const bytes = getUtf8Encoder().encode('hello');
const value = getUtf8Decoder().decode(bytes);
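UTF-8 uses one to four bytes per character, so a string's byte length can differ from its JavaScript length, which counts UTF-16 code units. This matters when picking a size for fixCodecSize, which is expressed in bytes. The small sketch below uses only the platform TextEncoder to show the difference; utf8ByteLength is a hypothetical helper.

```typescript
// Hypothetical helper: number of bytes needed to UTF-8 encode a string.
const utf8Encoder = new TextEncoder();

function utf8ByteLength(value: string): number {
  return utf8Encoder.encode(value).length;
}

// `value.length` counts UTF-16 code units, not bytes.
console.log('hello'.length, utf8ByteLength('hello')); // 5 5
console.log('café'.length, utf8ByteLength('café')); // 4 5
console.log('日本'.length, utf8ByteLength('日本')); // 2 6
console.log('🙂'.length, utf8ByteLength('🙂')); // 2 4
```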
Base 64 codec
The getBase64Codec function encodes and decodes a base-64 string to and from a byte array.
const bytes = getBase64Codec().encode('hello+world');
const value = getBase64Codec().decode(bytes);
As usual, separate getBase64Encoder and getBase64Decoder functions are also available.
const bytes = getBase64Encoder().encode('hello+world');
const value = getBase64Decoder().decode(bytes);
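With these string codecs, encode goes from the string to bytes. Conceptually, each base-64 character carries 6 bits. The hand-written sketch below makes that concrete but is not the library's implementation. It assumes the standard alphabet, ignores '=' padding, and drops leftover bits that do not fill a whole byte. Under those assumptions, 'hello+world' (11 characters, 66 bits) yields 8 bytes with 2 bits left over.

```typescript
// Illustrative base-64 string -> bytes conversion (not the library's code).
// Assumptions: standard alphabet, no '=' padding, leftover bits are dropped.
const BASE64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function base64ToBytes(value: string): Uint8Array {
  const bytes: number[] = [];
  let buffer = 0; // bits read but not yet emitted
  let bitCount = 0; // how many bits `buffer` holds
  for (let i = 0; i < value.length; i++) {
    const index = BASE64_ALPHABET.indexOf(value.charAt(i));
    if (index === -1) throw new Error(`Invalid base-64 character: ${value.charAt(i)}`);
    buffer = (buffer << 6) | index; // append 6 bits
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      bytes.push((buffer >> bitCount) & 0xff); // emit the oldest 8 bits
      buffer &= (1 << bitCount) - 1; // keep only the unread bits
    }
  }
  return new Uint8Array(bytes);
}
```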
Base 58 codec
The getBase58Codec function encodes and decodes a base-58 string to and from a byte array.
const bytes = getBase58Codec().encode('heLLo');
const value = getBase58Codec().decode(bytes);
As usual, separate getBase58Encoder and getBase58Decoder functions are also available.
const bytes = getBase58Encoder().encode('heLLo');
const value = getBase58Decoder().decode(bytes);
Base 16 codec
The getBase16Codec function encodes and decodes a base-16 string to and from a byte array.
const bytes = getBase16Codec().encode('deadface');
const value = getBase16Codec().decode(bytes);
As usual, separate getBase16Encoder and getBase16Decoder functions are also available.
const bytes = getBase16Encoder().encode('deadface');
const value = getBase16Decoder().decode(bytes);
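In base-16, each pair of hex characters maps to exactly one byte, since each character carries 4 bits. The sketch below illustrates this mapping. It is not the library's implementation: it assumes even-length input and accepts either case.

```typescript
// Illustrative hex conversion (not the library's implementation).
// Assumption: input strings have an even length; digits are case-insensitive.
function base16ToBytes(value: string): Uint8Array {
  if (value.length % 2 !== 0) throw new Error('Expected an even number of hex characters');
  const bytes = new Uint8Array(value.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    const pair = value.slice(2 * i, 2 * i + 2);
    if (!/^[0-9a-f]{2}$/i.test(pair)) throw new Error(`Invalid hex pair: ${pair}`);
    bytes[i] = parseInt(pair, 16); // two 4-bit digits -> one byte
  }
  return bytes;
}

function bytesToBase16(bytes: Uint8Array): string {
  // Each byte becomes exactly two lowercase hex digits.
  return Array.from(bytes, (byte) => byte.toString(16).padStart(2, '0')).join('');
}
```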
Base 10 codec
The getBase10Codec function encodes and decodes a base-10 string to and from a byte array.
const bytes = getBase10Codec().encode('1024');
const value = getBase10Codec().decode(bytes);
As usual, separate getBase10Encoder and getBase10Decoder functions are also available.
const bytes = getBase10Encoder().encode('1024');
const value = getBase10Decoder().decode(bytes);
Base X codec
The getBaseXCodec function accepts a custom alphabet of X characters and creates a base-X codec using that alphabet. It does so by iteratively dividing by X and handling leading zeros.
The base-10 and base-58 codecs use this base-x codec under the hood.
const alphabet = '0ehlo';
const bytes = getBaseXCodec(alphabet).encode('hello');
const value = getBaseXCodec(alphabet).decode(bytes);
As usual, separate getBaseXEncoder
and getBaseXDecoder
functions are also available.
const bytes = getBaseXEncoder(alphabet).encode('hello');
const value = getBaseXDecoder(alphabet).decode(bytes);
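The divide-and-handle-leading-zeros approach described above can be sketched with BigInt. This is an illustrative approximation, not the library's source. It assumes that each leading occurrence of the first alphabet character stands for one 0x00 byte, and it reads the rest of the string as a big-endian number in base X. With the alphabet '0123456789' it behaves like a base-10 conversion, and with the Bitcoin-style base-58 alphabet like base-58.

```typescript
// Illustrative base-X conversion (not the library's implementation).
// Assumption: a leading run of alphabet[0] characters maps to the same number
// of leading 0x00 bytes, and vice versa.
const ZERO = BigInt(0);
const BYTE = BigInt(256);

// String -> bytes (the direction of `encode` in this package).
function baseXToBytes(value: string, alphabet: string): Uint8Array {
  const base = BigInt(alphabet.length);
  let leadingZeros = 0;
  while (leadingZeros < value.length && value[leadingZeros] === alphabet[0]) leadingZeros++;

  // Read the remaining characters as a big-endian number in base X.
  let n = ZERO;
  for (const char of value.slice(leadingZeros)) {
    const digit = alphabet.indexOf(char);
    if (digit === -1) throw new Error(`Invalid character: ${char}`);
    n = n * base + BigInt(digit);
  }

  // Peel off bytes from the least significant end; unshift keeps big-endian order.
  const tail: number[] = [];
  while (n > ZERO) {
    tail.unshift(Number(n % BYTE));
    n /= BYTE;
  }
  return new Uint8Array([...new Array<number>(leadingZeros).fill(0), ...tail]);
}

// Bytes -> string (the direction of `decode` in this package).
function bytesToBaseX(bytes: Uint8Array, alphabet: string): string {
  const base = BigInt(alphabet.length);
  let leadingZeros = 0;
  while (leadingZeros < bytes.length && bytes[leadingZeros] === 0) leadingZeros++;

  let n = ZERO;
  for (let i = leadingZeros; i < bytes.length; i++) n = n * BYTE + BigInt(bytes[i]);

  // Iteratively divide by X; each remainder is the next digit from the right.
  let digits = '';
  while (n > ZERO) {
    digits = alphabet[Number(n % base)] + digits;
    n /= base;
  }
  return alphabet[0].repeat(leadingZeros) + digits;
}
```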
Re-slicing base X codec
The getBaseXResliceCodec function also creates a base-X codec but uses a different strategy: it re-slices bytes into custom chunks of bits that are then mapped to a provided alphabet. The number of bits per chunk is also provided and should typically be set to log2(alphabet.length).
This is mostly used to create codecs whose alphabet's length is a power of 2, such as base-16 or base-64.
const bytes = getBaseXResliceCodec('elho', 2).encode('hellolol');
const value = getBaseXResliceCodec('elho', 2).decode(bytes);
As usual, separate getBaseXResliceEncoder and getBaseXResliceDecoder functions are also available.
const bytes = getBaseXResliceEncoder('elho', 2).encode('hellolol');
const value = getBaseXResliceDecoder('elho', 2).decode(bytes);
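The re-slicing strategy can be sketched as a generic bit accumulator that regroups chunks of one width into chunks of another. This sketch is illustrative and not the library's implementation. It assumes 1 to 8 bits per character. When turning characters into bytes, it drops leftover bits that do not fill a whole byte. When turning bytes into characters, it right-pads the last incomplete chunk with zero bits. The function names are hypothetical.

```typescript
// Illustrative re-slicing sketch (not the library's implementation).
// Assumptions: 1 <= bits <= 8; string -> bytes drops leftover bits, while
// bytes -> string right-pads the last incomplete chunk with zero bits.
function reslice(input: number[], inputBits: number, outputBits: number, useRemainder: boolean): number[] {
  const output: number[] = [];
  const mask = (1 << outputBits) - 1;
  let accumulator = 0; // bits not yet emitted
  let pendingBits = 0; // how many bits `accumulator` holds
  for (const chunk of input) {
    accumulator = (accumulator << inputBits) | chunk;
    pendingBits += inputBits;
    while (pendingBits >= outputBits) {
      pendingBits -= outputBits;
      output.push((accumulator >> pendingBits) & mask);
    }
    accumulator &= (1 << pendingBits) - 1; // discard bits already emitted
  }
  if (useRemainder && pendingBits > 0) {
    output.push((accumulator << (outputBits - pendingBits)) & mask);
  }
  return output;
}

// String -> bytes (the direction of `encode` in this package).
function resliceStringToBytes(value: string, alphabet: string, bits: number): Uint8Array {
  const indices = Array.from(value, (char) => {
    const index = alphabet.indexOf(char);
    if (index === -1) throw new Error(`Invalid character: ${char}`);
    return index;
  });
  return new Uint8Array(reslice(indices, bits, 8, false));
}

// Bytes -> string (the direction of `decode` in this package).
function resliceBytesToString(bytes: Uint8Array, alphabet: string, bits: number): string {
  return reslice(Array.from(bytes), 8, bits, true).map((index) => alphabet[index]).join('');
}
```

For example, resliceStringToBytes('hellolol', 'elho', Math.log2(4)) packs the eight 2-bit chunks into the two bytes 0x85 and 0xdd.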
To read more about the available codecs and how to use them, check out the documentation of the main @solana/codecs package.