brotli-unicode
Comparing version 1.0.0 to 1.0.1
{ | ||
"name": "brotli-unicode", | ||
"version": "1.0.0", | ||
"version": "1.0.1", | ||
"main": "index.js", | ||
@@ -44,3 +44,3 @@ "typings": "dist/index.d.ts", | ||
"base-unicode": "^1.0.0", | ||
"brotli-compress": "^1.3.0" | ||
"brotli-compress": "^1.3.1" | ||
}, | ||
@@ -47,0 +47,0 @@ "devDependencies": { |
README.md
@@ -1,15 +0,17 @@ | ||
# base-unicode | ||
# brotli-unicode | ||
Transcodes `string` and `Uint8Array` (binary) blob data to and from Unicode. | ||
This algorithm allows for character compression, as two bytes are usually represented | ||
by one Unicode character in the alphabet that base-unicode uses. | ||
This library compresses using the Brotli algorithm, based on WebAssembly. | ||
After the compression has been done, another character encoding/compression algorithm is applied: `base-unicode`. | ||
`base-unicode` transcodes the `Uint8Array` into a Unicode string that is shorter than the original text (character-wise). | ||
Data in this form can also be copy-pasted when modern system fonts are used. Also, modern browsers allow Unicode | ||
in URIs. Therefore, data compressed and encoded with this library can be transmitted via URLs. | ||
base-unicode therefore allows for a lossless conversion of binary data to and from | ||
Unicode. This is useful for storing binary data in a database, for example, but | ||
also for shortening binary data for a text representation that can be copy-pasted. | ||
For decoding, you can either use the JS variant, which is much smaller in code size, or you can also use the | ||
WebAssembly implementation. | ||
This again allows e.g. for sharing binary and text data in a character compressed | ||
form that can be easily copied and pasted, for example as a parameter in a URL or | ||
even via twitter. | ||
This algorithm allows for stellar compression ratios on text and binary data. | ||
In our test scenario we're proud to present a compression rate of 558%. | ||
<img src="jest_results.png" /> | ||
## Setup | ||
@@ -20,10 +22,10 @@ | ||
```bash | ||
yarn add base-unicode | ||
yarn add brotli-unicode | ||
# or | ||
npm i base-unicode | ||
npm i brotli-unicode | ||
``` | ||
## Usage | ||
## Usage of the WASM variant | ||
@@ -33,30 +35,96 @@ The usage in a Node.js or Browser environment is trivial: | ||
```ts | ||
import { encode, decodeToString, decodeToUint8Array } from 'base-unicode' | ||
// import size (uncompressed, but minified) / WASM version / max performance: 1.8M | ||
import { compress, decompress } from 'brotli-unicode' | ||
// encoding + decoding strings | ||
const encoded = encode('Hello, world!') // 1劒碶翚禼誎藝矚h | ||
const decoded = decodeToString(encoded) // Hello, world! | ||
// Node.js or using the buffer package | ||
let input = Buffer.from('Hello🤖!') | ||
//encoding + decoding binary data | ||
const input = new Uint8Array([0xb, 0xa, 0xb, 0xe]) // a.k.a. [ 11, 10, 11, 14 ] | ||
// alternatively, in-browser (without any third-party libraries) | ||
input = new TextEncoder().encode('Hello🤖!') | ||
// you can of course use File, Blob and Buffer as well | ||
const encodedBinary = encode(input) // 0A坘存 | ||
const decodedBinary = decodeToUint8Array(encodedBinary) // [ 11, 10, 11, 14 ] | ||
// it takes a Uint8Array and returns a base-unicode encoded string (copy-pasteable) | ||
const compressed = await compress(input) | ||
// it takes the base-unicode encoded string and returns a Uint8Array | ||
const decompressed = await decompress(compressed) | ||
// Node.js or using the buffer package | ||
let output = Buffer.from(decompressed) | ||
// alternatively, in-browser (without any third-party libraries) | ||
output = new TextDecoder().decode(decompressed) | ||
``` | ||
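In the browser, `encode` and `decode` are instance methods, so a `TextEncoder` and a `TextDecoder` have to be constructed first. A minimal sketch of the string-to-bytes round trip that surrounds `compress`/`decompress` (the compression step itself is left out here, and the variable names are only illustrative):

```ts
// Standard Web APIs, also available as globals in current Node.js versions.
const encoder = new TextEncoder()
const decoder = new TextDecoder()

// string -> Uint8Array (UTF-8); this is the kind of input `compress` expects
const bytes: Uint8Array = encoder.encode('Hello🤖!')

// Uint8Array -> string; this is how the result of `decompress` can be read back
const text: string = decoder.decode(bytes)

console.log(bytes.length, text)
```

Note that the emoji occupies four bytes in UTF-8, so the byte length differs from the string length.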
Please note that the minified WASM version comes with a whopping size of | ||
1.8MiB. This is because the binary is base64-encoded and inlined. | ||
If you prefer maximum performance and memory efficiency over small bundle size, | ||
choose the WASM variant. Also, if you need compression, use the WASM version. | ||
## Usage of the pure JS variant | ||
If you need a small bundle size, can afford the slowdown and | ||
only need decompression, use the hand-written JavaScript decompressor: | ||
```ts | ||
// import size (uncompressed, but minified) / JS version / only decompress / slower: 152K | ||
import { decompress } from 'brotli-unicode/js' | ||
// please also note that the pure JS variant is synchronous | ||
// for large inputs, you could optimize the execution by moving | ||
// this call into a Worker | ||
// it takes a base-unicode encoded string and returns a Uint8Array | ||
const decompressed = decompress(compressed) | ||
// Node.js or using the buffer package | ||
let output = Buffer.from(decompressed) | ||
// alternatively, in-browser (without any third-party libraries) | ||
output = new TextDecoder().decode(decompressed) | ||
``` | ||
## Options | ||
The `compress` method comes with a second `options` parameter. | ||
### Quality level | ||
The most common setting is `quality` with a scale from 0 to 11. | ||
By default, the quality is set to best quality (11). | ||
```ts | ||
const compressed = await compress(Buffer.from('foobar'), { quality: 9 }) | ||
``` | ||
A lower quality value makes the output bigger but improves compression time. | ||
In 99.9% of the cases, you don't want to change this value. | ||
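To get a feeling for the trade-off, the following sketch uses the Brotli implementation built into Node.js (`node:zlib`) as a stand-in. It is not the `brotli-unicode` API, and the helper name `compressedSize` and the sample input are assumptions made for illustration only.

```ts
import { brotliCompressSync, brotliDecompressSync, constants } from 'node:zlib'

// Repetitive sample input (hypothetical); real-world ratios will differ.
const sample = Buffer.from('let foo = 123; let bar = "foo"; '.repeat(200))

// Compress with Node's built-in Brotli at the given quality (0 to 11)
// and return the compressed size in bytes.
function compressedSize(data: Buffer, quality: number): number {
  const out = brotliCompressSync(data, {
    params: { [constants.BROTLI_PARAM_QUALITY]: quality },
  })
  return out.length
}

// Collect the compressed size for a few quality levels.
const report = [0, 5, 9, 11].map((quality) => ({
  quality,
  bytes: compressedSize(sample, quality),
}))

// Decompression does not need to know which quality was used.
const roundTrip = brotliDecompressSync(brotliCompressSync(sample)).equals(sample)

console.log(report, roundTrip)
```

The quality only affects the compressor, which is why it appears as an option of `compress` and not of `decompress`.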
### Custom dictionary | ||
The relevant option here is `customDictionary`. You can set this to a `Uint8Array` | ||
of tokens which should be part of the `a priori` known dictionary. This can be useful | ||
if you control both the sender and the receiver and if you know exactly | ||
which tokens will be used a lot in the input. For example, if you know that you'll | ||
be compressing text encoded as UTF-16/UCS-2 and you know that the content is TypeScript code, | ||
you could include the keywords of the TypeScript language in the custom dictionary. | ||
Please note that you need to set the same value for decoding as well. | ||
```ts | ||
// with this configuration, "let" does not have to be encoded in the dictionary and carried as part of the | ||
// payload. The binary tree (huffman coding tree) | ||
const customDictionary = Buffer.from('let') | ||
// compress takes a Uint8Array, so the text is converted first | ||
const compressed = await compress(Buffer.from('let foo = 123; let bar = "foo";'), { customDictionary }) | ||
const decompressed = await decompress(compressed, { customDictionary }) | ||
``` | ||
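Why both sides need the same dictionary can be illustrated with any dictionary-based compressor. The sketch below deliberately swaps in Deflate from Node's built-in `node:zlib`, which accepts a preset `dictionary` option. It is neither Brotli nor the `brotli-unicode` API, and the dictionary content is an assumption chosen for the example.

```ts
import { deflateSync, inflateSync } from 'node:zlib'

// Hypothetical shared dictionary: tokens expected to occur often in the input.
const dictionary = Buffer.from('let const function return')
const source = Buffer.from('let foo = 123; let bar = "foo";')

// Sender: compress with the preset dictionary.
const packed = deflateSync(source, { dictionary })

// Receiver: the very same dictionary has to be supplied to decompress.
const unpacked = inflateSync(packed, { dictionary })

// Without the dictionary, decompression is rejected.
let failedWithoutDictionary = false
try {
  inflateSync(packed)
} catch {
  failedWithoutDictionary = true
}

console.log(unpacked.equals(source), failedWithoutDictionary)
```

The same constraint applies to `customDictionary`: it is not part of the payload, so the receiver has to know it in advance.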
## Limitations | ||
The alphabet of `base-unicode` is `21091` characters long. It has been carefully | ||
selected to be supported by the majority of system fonts. The default base-unicode | ||
alphabet consists of the following Unicode character ranges (always upper- and lower-case included): | ||
a-z, α-ω, а-я 一-龯 | ||
There is no streaming compression/decompression yet. It could be added by exposing the corresponding API of the WASM implementation. | ||
If you need that, please open an issue. | ||
To make sure that the alphabet is URL-safe and doesn't run into invisible character issues, | ||
all non-printable control characters and non-URL-safe characters are excluded. | ||
## Build | ||
However, some fonts don't support all of these characters. To check if your | ||
system supports copying and pasting text that has been encoded with `base-unicode`, | ||
you can simply check the ALPHABET file. If you can spot one character that shows | ||
as a non-renderable square, this algorithm doesn't work on your system. | ||
yarn build | ||
@@ -63,0 +131,0 @@ ## Test |
Sorry, the diff of this file is too big to display
Sorry, the diff of this file is not supported yet
Updated brotli-compress@^1.3.1