Bijective varint encoding
This is a simple, zero dependency javascript library which implements length-prefixed varint encoding. The format is similar to LEB128 but:
- It uses a bijective base, based off this HN comment. This means every number has exactly 1 canonical binary encoding.
- The length prefixed encoding should play better with the branch predictor in native implementations. (Though this implementation isn't super optimized).
This format is extremely similar to how UTF8 works internally. Its almost certainly possible to reuse existing efficient UTF8 <-> UTF32 SIMD encoders and decoders to make this code faster, but frankly its not a priority right now.
0 - 2^7-1
encodes as 0xxx_xxxx
2^7 - 2^14+2^7-1
encodes as 10xx_xxxx xxxx_xxxx
2^14+2^7 - 2^21+2^14+2^7-1
encodes as 110x_xxxx xxxx_xxxx xxxx_xxxx
2^21 - 2^28-1
encodes as 1110_xxxx xxxx_xxxx xxxx_xxxx xxxx_xxxx
... And so on, where xxxx_xxxx
is the next bytes of the number within its range, in big endian. Eg, to encode 130:
- Its within the 2 byte range, which starts at 128. So we first subtract 130-128 = 2. It will be put into the 2 byte pattern:
10xx_xxxx xxxx_xxxx
. - 2 is then encoded to the big endian bits
0b10
. - The end result is
1000_0000 0000_0010
, or 0x8002
.
The encoding supports any integer up to 128 bits.
For numbers above 64 bits, it would be tempting to use 0x1111_1111 1111_1111 xxxx_xxxx xxxx_xxxx ...
since then there would be at most 2 bytes of overhead (or 4 bytes of overhead for 128 bits). But that breaks the pattern, so instead it uses this as the maximum encoding for 64 bits: 0x1111_1111 1111_1111 0xxx_xxxx
... And for 128 bits: 0x1111_1111 1111_1111 1111_1111 1111_1111 0xxx_xxxx
.
For negative integers, we use protobuf's zigzag encoding format. (0 encodes as 0, -1 encodes as 1, 1 encodes as 2, -2 encodes as 3, and so on).
For more examples, or if you want to test compatibility with another implementation, see compatibility tests in varint_tests.txt.
How to use it
npm install --save bijective-varint
Then:
const varint = require('bijective-varint')
const enc = varint.encode(2020304050)
const dec = varint.decode(enc)
API reference
The API is defined and exposed via the following typescript definitions:
const MAX_INT_LEN = 9;
const MAX_BIGINT_LEN = 19;
function bytesUsed(bytes: Uint8Array): number;
function bufContainsVarint(bytes: Uint8Array): boolean;
function encode(num: number): Uint8Array;
function encodeInto(num: number, dest: Uint8Array, offset: number): number;
function decode(bytes: Uint8Array): number;
function encodeBN(num: bigint): Uint8Array;
const MAX_SAFE_BIGINT: bigint;
function encodeIntoBN(num: bigint, dest: Uint8Array, offset: number): number;
function decodeBN(bytes: Uint8Array): bigint;
function zigzagEncode(val: number): number;
function zigzagDecode(val: number): number;
function zigzagEncodeBN(val: bigint): bigint;
function zigzagDecodeBN(val: bigint): bigint;
License
MIT licensed.