What is text-encoding?
The text-encoding npm package provides a polyfill for the TextEncoder and TextDecoder APIs, which convert between JavaScript strings and bytes held in typed arrays. It is useful in environments that lack these APIs natively, and for decoding text that is stored in legacy character encodings.
What are text-encoding's main functionalities?
Text Encoding
This feature allows you to encode a string into a Uint8Array. Per the specification, TextEncoder only produces UTF-8; other target encodings are available only through the polyfill's non-standard option. The code sample demonstrates how to encode the string 'Hello, world!' into a Uint8Array.
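const { TextEncoder } = require('text-encoding');
// No label is needed: UTF-8 is the only standard target encoding.
const encoder = new TextEncoder();
const encoded = encoder.encode('Hello, world!');
console.log(encoded); // 13 bytes: 72, 101, 108, 108, 111, 44, 32, ...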
Text Decoding
This feature allows you to decode a Uint8Array back into a string using a specified encoding (UTF-8 in this case). The code sample demonstrates how to decode a Uint8Array back into the string 'Hello, world!'.
const { TextDecoder } = require('text-encoding');
const decoder = new TextDecoder('utf-8');
const encoded = new Uint8Array([72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100, 33]);
const decoded = decoder.decode(encoded);
console.log(decoded); // 'Hello, world!'
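The two features above can be combined into a round trip. The following is a minimal sketch, not part of the package's documentation; it assumes a runtime that already exposes TextEncoder and TextDecoder as globals (modern browsers and Node.js do), and the variable names are illustrative. With the polyfill, the first line would instead read from require('text-encoding').

```javascript
// Assumption: built-in globals are available. With the polyfill, use:
// const { TextEncoder, TextDecoder } = require('text-encoding');
const { TextEncoder, TextDecoder } = globalThis;

// Escapes are used so the accented letters are single code points.
const original = 'na\u00EFve caf\u00E9';

const bytes = new TextEncoder().encode(original);            // UTF-8 bytes
const roundTripped = new TextDecoder('utf-8').decode(bytes); // back to a string

console.log(original.length);           // 10 UTF-16 code units
console.log(bytes.length);              // 12: the two accented letters take 2 bytes each
console.log(roundTripped === original); // true
```

The byte count differs from the string length because UTF-8 uses more than one byte for non-ASCII characters.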
Other packages similar to text-encoding
buffer
The buffer package provides the Node.js Buffer API for use in browsers. It includes methods for encoding and decoding text, similar to text-encoding, but also offers a broader range of functionalities for working with binary data.
iconv-lite
The iconv-lite package provides pure JavaScript character encoding conversion. It supports a wide range of encodings and is often used for converting text between different character sets. It offers more extensive encoding support compared to text-encoding.
utf-8-validate
The utf-8-validate package is used to validate UTF-8 encoded data. While it does not provide encoding and decoding functionalities, it is useful for ensuring that data is correctly encoded in UTF-8, which can complement the functionalities provided by text-encoding.
text-encoding
This is a polyfill for the Encoding Living Standard API for the Web,
allowing encoding and decoding of textual data to and from Typed Array
buffers for binary data in JavaScript.
By default it adheres to the spec and does not support encoding to
legacy encodings, only decoding. It is also implemented to match the
specification's algorithms, rather than for performance. The intended
use is within Web pages, so it has no dependency on server frameworks
or particular module schemes.
Basic examples and tests are included.
Install
There are a few ways you can get the text-encoding library.
Node
text-encoding is on npm. Simply run:
npm install text-encoding
Or add it to your package.json dependencies.
Bower
text-encoding is on bower as well. Install with bower like so:
bower install text-encoding
Or add it to your bower.json dependencies.
HTML Page Usage
<script src="encoding-indexes.js"></script>
<script src="encoding.js"></script>
API Overview
Basic Usage
var uint8array = new TextEncoder().encode(string);
var string = new TextDecoder(encoding).decode(uint8array);
Streaming Decode
var string = "", decoder = new TextDecoder(encoding), buffer;
while (buffer = next_chunk()) {
  // stream: true keeps an incomplete byte sequence buffered for the next call
  string += decoder.decode(buffer, {stream:true});
}
string += decoder.decode(); // finish the stream
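To illustrate why the stream option matters, here is a small sketch in which a multi-byte character is split across two chunks. It assumes the TextDecoder global is available (swap in the polyfill's export if it is not); the chunks array and the variable names are made up for the example and stand in for whatever next_chunk() would return.

```javascript
// Assumption: built-in global; with the polyfill, use
// const { TextDecoder } = require('text-encoding');
const { TextDecoder } = globalThis;

// The euro sign is U+20AC, encoded in UTF-8 as the bytes E2 82 AC.
// The chunk boundary deliberately falls inside that sequence.
const chunks = [
  new Uint8Array([0x31, 0x30, 0xE2]), // '1', '0', first byte of the euro sign
  new Uint8Array([0x82, 0xAC]),       // remaining two bytes of the euro sign
];

// Streaming decode: one decoder carries state across chunks.
const decoder = new TextDecoder('utf-8');
let streamed = '';
for (const chunk of chunks) {
  streamed += decoder.decode(chunk, { stream: true });
}
streamed += decoder.decode(); // flush any buffered bytes

// For comparison: decoding each chunk on its own loses the split character.
const naive = chunks
  .map((chunk) => new TextDecoder('utf-8').decode(chunk))
  .join('');

console.log(streamed === '10\u20AC'); // true
console.log(naive.includes('\uFFFD')); // true: replacement characters appear
```

Without the stream option, each incomplete sequence is replaced with U+FFFD, so the streaming form is the one to use whenever input arrives in pieces.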
Encodings
All encodings from the Encoding specification are supported:
utf-8 ibm866 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6
iso-8859-7 iso-8859-8 iso-8859-8-i iso-8859-10 iso-8859-13 iso-8859-14
iso-8859-15 iso-8859-16 koi8-r koi8-u macintosh windows-874
windows-1250 windows-1251 windows-1252 windows-1253 windows-1254
windows-1255 windows-1256 windows-1257 windows-1258 x-mac-cyrillic
gb18030 hz-gb-2312 big5 euc-jp iso-2022-jp shift_jis euc-kr
replacement utf-16be utf-16le x-user-defined
(Some encodings may be supported under other names, e.g. ascii,
iso-8859-1, etc. See Encoding for
additional labels for each encoding.)
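As a rough illustration of labels and of decoding something other than utf-8, the sketch below uses utf-16le, one of the encodings that does not need the index data. It assumes the TextDecoder global is available (the polyfill's export can be substituted); the byte values and variable names are chosen for the example.

```javascript
// Assumption: built-in global; with the polyfill, use
// const { TextDecoder } = require('text-encoding');
const { TextDecoder } = globalThis;

// Labels are matched case-insensitively and mapped to a canonical name,
// which the decoder reports through its encoding property.
const fromAlias = new TextDecoder('utf8').encoding;
const fromUpper = new TextDecoder('UTF-16LE').encoding;
console.log(fromAlias); // 'utf-8'
console.log(fromUpper); // 'utf-16le'

// 'hi' in UTF-16LE: each 16-bit code unit is stored low byte first.
const utf16Bytes = new Uint8Array([0x68, 0x00, 0x69, 0x00]);
const text = new TextDecoder('utf-16le').decode(utf16Bytes);
console.log(text); // 'hi'
```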
Encodings other than utf-8, utf-16le and utf-16be require an
additional encoding-indexes.js file to be included. It is rather
large (596kB uncompressed, 188kB gzipped); portions may be deleted if
support for some encodings is not required.
Non-Standard Behavior
As required by the specification, only encoding to utf-8 is
supported. If you want to experiment with encoding to legacy
encodings, you can force non-standard behavior by passing a label and
the NONSTANDARD_allowLegacyEncoding option to TextEncoder. For example:
var uint8array = new TextEncoder(
'windows-1252', { NONSTANDARD_allowLegacyEncoding: true }).encode(text);
But note that the above won't work if you're using the polyfill in a
browser that natively supports the TextEncoder API, since the
polyfill won't be used!
You can force the polyfill to be used by running this before the polyfill is loaded:
<script>
window.TextEncoder = window.TextDecoder = null;
</script>
To support the legacy encodings (which may be stateful), the
TextEncoder encode() method accepts an optional dictionary with a
stream option, e.g. encoder.encode(string, {stream: true});
This is not needed for standard encoding since the input is always in
complete code points.