# UTF8.js
A simple JavaScript library to encode/decode UTF8 strings.
## Encoding
A char:
UTF8.setBytesFromCharCode('é'.charCodeAt(0));
A string:
UTF8.setBytesFromString('1.3$ ~= 1€');
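
For readers unfamiliar with the byte layout, here is a minimal sketch of what UTF-8 encoding does to a code point. It is not the library's implementation: `encodeCodePoint` and `encodeString` are hypothetical helper names introduced only for illustration, and the sketch does no validation (lone surrogates and out-of-range values are not rejected).

```javascript
// Hypothetical helper, not part of UTF8.js: turns one Unicode code point
// into its UTF-8 byte sequence (1 to 4 bytes).
function encodeCodePoint(codePoint) {
  if (codePoint < 0x80) {
    return [codePoint]; // 0xxxxxxx
  }
  if (codePoint < 0x800) {
    // 110xxxxx 10xxxxxx
    return [0xC0 | (codePoint >> 6), 0x80 | (codePoint & 0x3F)];
  }
  if (codePoint < 0x10000) {
    // 1110xxxx 10xxxxxx 10xxxxxx
    return [
      0xE0 | (codePoint >> 12),
      0x80 | ((codePoint >> 6) & 0x3F),
      0x80 | (codePoint & 0x3F),
    ];
  }
  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xF0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3F),
    0x80 | ((codePoint >> 6) & 0x3F),
    0x80 | (codePoint & 0x3F),
  ];
}

// Hypothetical helper: encodes a whole string, one code point at a time.
function encodeString(string) {
  const bytes = [];
  for (const char of string) { // for...of iterates by code point
    bytes.push(...encodeCodePoint(char.codePointAt(0)));
  }
  return bytes;
}

encodeCodePoint('é'.codePointAt(0)); // [0xC3, 0xA9]
encodeString('1.3$ ~= 1€'); // [49, 46, 51, 36, 32, 126, 61, 32, 49, 226, 130, 172]
```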
## Decoding
A char:
String.fromCharCode(UTF8.getCharCode([0xC3, 0xA9]));
A string:
UTF8.getStringFromBytes([49, 46, 51, 36, 32, 126, 61, 32, 49, 226, 130, 172]);
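
The reverse direction can be sketched the same way. The helpers below (`decodeCodePoint`, `decodeBytes`) are hypothetical names, not functions of UTF8.js, and the check is deliberately loose: it verifies lead and continuation bytes but does not reject overlong forms. They accept plain arrays as well as `Uint8Array`, and an optional start offset.

```javascript
// Hypothetical helper, not part of UTF8.js: reads one code point starting
// at `offset` and reports how many bytes it used.
function decodeCodePoint(bytes, offset = 0) {
  const first = bytes[offset];
  if (first < 0x80) {
    return { codePoint: first, length: 1 };
  }
  let length;
  let codePoint;
  if ((first & 0xE0) === 0xC0) {
    length = 2;
    codePoint = first & 0x1F;
  } else if ((first & 0xF0) === 0xE0) {
    length = 3;
    codePoint = first & 0x0F;
  } else if ((first & 0xF8) === 0xF0) {
    length = 4;
    codePoint = first & 0x07;
  } else {
    throw new Error('Invalid lead byte at offset ' + offset);
  }
  for (let i = 1; i < length; i++) {
    const next = bytes[offset + i];
    // Continuation bytes must look like 10xxxxxx; a missing byte fails too.
    if ((next & 0xC0) !== 0x80) {
      throw new Error('Invalid continuation byte at offset ' + (offset + i));
    }
    codePoint = (codePoint << 6) | (next & 0x3F);
  }
  return { codePoint, length };
}

// Hypothetical helper: decodes every code point from `offset` to the end.
function decodeBytes(bytes, offset = 0) {
  let result = '';
  while (offset < bytes.length) {
    const { codePoint, length } = decodeCodePoint(bytes, offset);
    result += String.fromCodePoint(codePoint);
    offset += length;
  }
  return result;
}

String.fromCodePoint(decodeCodePoint([0xC3, 0xA9]).codePoint); // 'é'
decodeBytes([49, 46, 51, 36, 32, 126, 61, 32, 49, 226, 130, 172]); // '1.3$ ~= 1€'
```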
## TypedArrays are welcome
As inputs:
var bytes=new Uint8Array([0xC3, 0xA9, 49, 46, 51, 36, 32, 126, 61, 32, 49, 226, 130, 172]);
String.fromCharCode(UTF8.getCharCode(bytes));
UTF8.getStringFromBytes(bytes,2);
As well as outputs:
var bytes=new Uint8Array(14);
UTF8.setBytesFromCharCode('é'.charCodeAt(0), bytes);
UTF8.setBytesFromString('1.3$ ~= 1€', bytes, 2);
## UTF8 encoding detection
UTF8.isNotUTF8(bytes);
This function can prove that the given bytes do not contain UTF-8 text (or
contain badly encoded UTF-8). The converse does not hold: a `false` result does
not prove the text is UTF-8, especially for short strings, which often happen
to be valid UTF-8 by chance.
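
To illustrate why the check only works in one direction, here is a small sketch built on the standard `TextDecoder` in fatal mode rather than on UTF8.js. `cannotBeUTF8` is a hypothetical helper name, and it only accepts typed arrays or buffers, since that is what `TextDecoder.decode` takes.

```javascript
// Hypothetical helper built on the standard TextDecoder, not on UTF8.js.
function cannotBeUTF8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return false; // Decodes cleanly: may or may not really be UTF-8.
  } catch (error) {
    return true; // Malformed sequence: certainly not valid UTF-8.
  }
}

// "é" saved as ISO-8859-1 is the single byte 0xE9, a truncated UTF-8 sequence.
cannotBeUTF8(new Uint8Array([0xE9])); // true

// "Ã©" saved as ISO-8859-1 is 0xC3 0xA9, which is also valid UTF-8 for "é".
cannotBeUTF8(new Uint8Array([0xC3, 0xA9])); // false, yet the text was not UTF-8
```

The second call is the kind of short input for which a negative result says nothing about the real encoding.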
## Strict mode
If you try to encode a UTF-8 string into an ArrayBuffer too short to contain the
complete string, it will silently fail. To avoid this behavior, use strict
mode:
UTF8.setBytesFromString('1.3$ ~= 1€', bytes, 2, null, true);
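
The difference between a silent truncation and a strict failure can be sketched with the standard `TextEncoder.encodeInto`, which reports how much of the string it managed to write. `encodeIntoStrict` is a hypothetical helper, not the library's strict mode, and the error message is made up for the example.

```javascript
// Hypothetical helper, not part of UTF8.js: writes `string` as UTF-8 into
// `bytes` starting at `byteOffset`, and throws if it does not fit entirely.
function encodeIntoStrict(string, bytes, byteOffset = 0) {
  const target = bytes.subarray(byteOffset); // view on the same memory
  const { read, written } = new TextEncoder().encodeInto(string, target);
  // `read` counts the UTF-16 code units consumed from the string.
  if (read < string.length) {
    throw new Error(
      'Buffer too short: encoded ' + read + ' of ' + string.length + ' code units.'
    );
  }
  return written; // number of bytes written
}

const roomy = new Uint8Array(14);
encodeIntoStrict('1.3$ ~= 1€', roomy, 2); // 12

const cramped = new Uint8Array(10);
let failed = false;
try {
  encodeIntoStrict('1.3$ ~= 1€', cramped, 2);
} catch (error) {
  failed = true; // only 8 bytes were available, the string needs 12
}
```

Note that in this sketch the bytes that did fit are still written before the error is thrown.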
## NodeJS
Also available on NPM:
npm install utf-8
## Thanks
- The Debian project for its free (as in freedom) Russian/Japanese man pages,
used for real-world file tests!