Security News
Maven Central Adds Sigstore Signature Validation
Maven Central now validates Sigstore signatures, making it easier for developers to verify the provenance of Java packages.
znana.top/arno/base62
base62 is a correctly implemented, compact and fast implementation of Base62
encoding/decoding algorithm.
It is inspired by the java implementation by glowfall.
This Base62
implementation can encode/decode both integers and bytes of arbitrary length,
the correctness is tested using a large number of randomly generated bytes.
It is much faster than big.Int
based implementation and is not much slower than typical Base64
implementations. See the benchmark results below.
In comparison with Base64/Base32, Base62 is more friendly to human:
Variations of Base62 algorithm cann be widely used to represent authentication data in printable and
easy-copyable form, for example to encode OAuth 2.0 access_token
data.
// Basic usage.
Encode(src []byte) []byte
EncodeToString(src []byte) string
Decode(src []byte) ([]byte, error)
DecodeString(src string) ([]byte, error)
FormatInt(num int64) []byte
FormatUint(num uint64) []byte
ParseInt(src []byte) (int64, error)
ParseUint(src []byte) (uint64, error)
// Providing a dst buffer, you may reuse buffers to reduce memory allocation.
EncodeToBuf(dst []byte, src []byte) []byte
DecodeToBuf(dst []byte, src []byte) ([]byte, error)
AppendInt(dst []byte, num int64) []byte
AppendUint(dst []byte, num uint64) []byte
// Or you may use a custom encoding alphabet.
enc := NewEncoding("...my-62-byte-string-alphabet...")
enc.XXX()
Benchmark_Encode-12 11654754 97.41 ns/op
Benchmark_Decode-12 15481666 73.60 ns/op
Benchmark_EncodeToString-12 11950086 99.54 ns/op
Benchmark_DecodeString-12 16301325 74.36 ns/op
Benchmark_EncodeToBuf-12 13855840 84.68 ns/op
Benchmark_DecodeToBuf-12 97695962 12.21 ns/op
Benchmark_EncodeInteger-12 29119437 41.30 ns/op
Benchmark_DecodeInteger-12 120328183 9.917 ns/op
Benchmark_Encode_BigInt-12 1000000 1048 ns/op
Benchmark_Base64_EncodeToString-12 17803440 70.12 ns/op
Benchmark_Base64_DecodeString-12 19884616 55.09 ns/op
Benchmark_Base64_Encode-12 68163142 17.93 ns/op
Benchmark_Base64_Decode-12 41990004 28.25 ns/op
Encoding Base64 and Base32 both are bit-aligned, 6 bits for Base64 (2^6 = 64) and 5 bits for Base32 (2^5 = 32), but Base62 is not bit-aligned, it's inefficient to do divmod operation for non-bit-aligned integers, typical Base62 implementations are BigInt based, which encodes input data block by block to get better performance, (e.g. https://github.com/keybase/saltpack/blob/master/encoding/basex).
64 characters can fully represent 6 bits, but 62 characters can not, if each character represents 5 bits, that is the Base32 encoding.
A naive BigInt based algorithm gives the shortest result, which is roughly like this (e.g. https://github.com/eknkc/basex/blob/6baac8ea8b19cc66d125286d213770fec0691867/basex.go#L46):
digits := []int{0}
for i := 0; i < len(src); i++ {
carry := int(src[i])
for j := 0; j < len(digits); j++ {
carry += digits[j] << 8
digits[j] = carry % e.base
carry = carry / e.base
}
for carry > 0 {
digits = append(digits, carry%e.base)
carry = carry / e.base
}
}
// map to encoding alphabet
_ = digits
This works but the time complexity is O(n^2) where n is the length of src. If the input can be very large, the cost is unacceptable.
Improved algorithm splits the input into blocks, which reduce the time complexity to O(kn) where k is the block size, and n is the length of src, it generates slightly longer result than naive BigInt algorithm, but it is worth for the reduced time complexity. When k is n, it degrades to the naive algorithm, when k is 1, the output length is twice of the input, we don't want that. 32 is chosen as the block size in library saltpack/encoding/basex.
Inspired by the java implementation by glowfall, this library is not BigInt based, it encodes and decodes any arbitrary bytes in O(n) time complexity. Here is a brief description of the algorithm.
This library uses a variadic length encoding, for each 6 bits, if the value is in range [0, 62), it can be directly map to the 62 characters, if the value is 62 or 63 which exceeds 61, we turn to encoding just the lower 5 bits. If we find a way to recognize the 5 bits pattern, then we can correctly decode it back to the source data.
The binary representation of 62 and 63 is:
62 - 0b_0011_1110
63 - 0b_0011_1111
They have a common mask 0b_0011_1110
, in range [0, 62), there are another two integers
have a similar mask 0b_0001_1110
, 30 and 31, which is:
30 - 0b_0001_1110
31 - 0b_0001_1111
The four integers 30, 31, 62, 63 share a common mask 0b_0001_1110
, while all other
integers in range [0, 64) don't share the mask, i.e. for all other integers, the
expression value & 0b_0001_1110 == 0b_0001_1110
evaluates to false.
This is the key point!
We define a compactMask
as 0b_0001_1110
.
When encoding, for each 6 bits integer x, if x & compactMask == compactMask
is true,
it must be one of 30, 31, 62 or 63, we just encode the lower 5 bits, which are
0b_0001_1110
(30) and 0b_0001_1111
(31), we leave the 6-th bit to next byte.
When decoding, for each encoded byte x, we check x & compactMask == compactMask
, when
it is false, we know that the byte represents 6 bits in the raw data, it is in range
[0, 30) or [32, 62), else it represents 5 bits in the raw data, it is 30 or 31.
That is it, by using variadic length encoding, we successfully limit the value range to [0, 62), we get very compact result and a simple O(n) time complexity for encoding and decoding data of arbitrary length.
This library guarantees that it can correctly decode data encoded by itself.
The encoded result is not compatible with BigInt based algorithm, (e.g. saltpack/encoding/basex, GMP and GnuPG).
The algorithm may be ported to other languages in future (if someone does a porting, I'm glad to link it here :-) ).
First stable release, the package has been used in several small and medium projects.
This release adds new APIs which help to reuse buffers to reduce memory allocation.
FAQs
Unknown package
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Maven Central now validates Sigstore signatures, making it easier for developers to verify the provenance of Java packages.
Security News
CISOs are racing to adopt AI for cybersecurity, but hurdles in budgets and governance may leave some falling behind in the fight against cyber threats.
Research
Security News
Socket researchers uncovered a backdoored typosquat of BoltDB in the Go ecosystem, exploiting Go Module Proxy caching to persist undetected for years.