encoding-japanese
Comparing version 2.0.0 to 2.1.0
{ | ||
"name": "encoding-japanese", | ||
"version": "2.0.0", | ||
"description": "Convert or detect character encoding in JavaScript", | ||
"version": "2.1.0", | ||
"description": "Convert and detect character encoding in JavaScript", | ||
"main": "src/index.js", | ||
@@ -9,3 +9,2 @@ "files": [ | ||
"encoding.min.js", | ||
"encoding.min.js.map", | ||
"src/*" | ||
@@ -16,4 +15,4 @@ ], | ||
"compile": "browserify src/index.js -o encoding.js -s Encoding -p [ bannerify --file src/banner.js ] --no-bundle-external --bare", | ||
"minify": "uglifyjs encoding.js -o encoding.min.js --source-map \"url='encoding.min.js.map'\" --comments -c -m -b ascii_only=true,beautify=false", | ||
"test": "./node_modules/.bin/eslint . && npm run build && mocha tests/test", | ||
"minify": "uglifyjs encoding.js -o encoding.min.js --comments -c -m -b ascii_only=true,beautify=false", | ||
"test": "eslint . && npm run build && mocha tests/test", | ||
"watch": "watchify src/index.js -o encoding.js -s Encoding -p [ bannerify --file src/banner.js ] --no-bundle-external --bare --poll=300 -v" | ||
@@ -59,7 +58,7 @@ }, | ||
"browserify": "^17.0.0", | ||
"eslint": "^8.12.0", | ||
"mocha": "^9.2.2", | ||
"eslint": "^8.57.0", | ||
"mocha": "^10.3.0", | ||
"package-json-versionify": "^1.0.4", | ||
"power-assert": "^1.6.1", | ||
"uglify-js": "^3.15.3", | ||
"uglify-js": "^3.17.4", | ||
"uglifyify": "^5.0.2", | ||
@@ -66,0 +65,0 @@ "watchify": "^4.0.0" |
README.md
@@ -5,8 +5,8 @@ encoding.js | ||
[![NPM Version](https://img.shields.io/npm/v/encoding-japanese.svg)](https://www.npmjs.com/package/encoding-japanese) | ||
[![Build Status](https://app.travis-ci.com/polygonplanet/encoding.js.svg?branch=master)](https://app.travis-ci.com/polygonplanet/encoding.js) | ||
[![GitHub Actions Build Status](https://github.com/polygonplanet/encoding.js/actions/workflows/ci.yml/badge.svg)](https://github.com/polygonplanet/encoding.js/actions) | ||
[![GitHub License](https://img.shields.io/github/license/polygonplanet/encoding.js.svg)](https://github.com/polygonplanet/encoding.js/blob/master/LICENSE) | ||
Convert or detect character encoding in JavaScript. | ||
Convert and detect character encoding in JavaScript. | ||
[**README (Japanese)**](README_ja.md) | ||
[**README (日本語)**](README_ja.md) | ||
@@ -16,7 +16,7 @@ ## Table of contents | ||
- [Features](#features) | ||
* [How to use character encoding in strings?](#how-to-use-character-encoding-in-strings) | ||
* [How to Use Character Encoding in Strings?](#how-to-use-character-encoding-in-strings) | ||
- [Installation](#installation) | ||
* [npm](#npm) | ||
+ [TypeScript](#typescript) | ||
* [browser (standalone)](#browser-standalone) | ||
* [Browser (standalone)](#browser-standalone) | ||
* [CDN](#cdn) | ||
@@ -28,14 +28,17 @@ - [Supported encodings](#supported-encodings) | ||
- [API](#api) | ||
* [Detect character encoding (detect)](#detect-character-encoding-detect) | ||
* [Convert character encoding (convert)](#convert-character-encoding-convert) | ||
+ [Specify conversion options to the argument `to_encoding` as an object](#specify-conversion-options-to-the-argument-to_encoding-as-an-object) | ||
* [detect : Detects character encoding](#encodingdetect-data-encodings) | ||
* [convert : Converts character encoding](#encodingconvert-data-to-from) | ||
+ [Specify conversion options to the argument `to` as an object](#specify-conversion-options-to-the-argument-to-as-an-object) | ||
+ [Specify the return type by the `type` option](#specify-the-return-type-by-the-type-option) | ||
+ [Replace to HTML entity (Numeric character reference) when cannot be represented](#replace-to-html-entity-numeric-character-reference-when-cannot-be-represented) | ||
+ [Replacing characters with HTML entities when they cannot be represented](#replacing-characters-with-html-entities-when-they-cannot-be-represented) | ||
+ [Specify BOM in UTF-16](#specify-bom-in-utf-16) | ||
* [URL Encode/Decode](#url-encodedecode) | ||
* [Base64 Encode/Decode](#base64-encodedecode) | ||
* [Code array to string conversion (codeToString/stringToCode)](#code-array-to-string-conversion-codetostringstringtocode) | ||
* [urlEncode : Encodes to percent-encoded string](#encodingurlencode-data) | ||
* [urlDecode : Decodes from percent-encoded string](#encodingurldecode-string) | ||
* [base64Encode : Encodes to Base64 formatted string](#encodingbase64encode-data) | ||
* [base64Decode : Decodes from Base64 formatted string](#encodingbase64decode-string) | ||
* [codeToString : Converts character code array to string](#encodingcodetostring-code) | ||
* [stringToCode : Converts string to character code array](#encodingstringtocode-string) | ||
* [Japanese Zenkaku/Hankaku conversion](#japanese-zenkakuhankaku-conversion) | ||
- [Other examples](#other-examples) | ||
* [Example using the XMLHttpRequest and Typed arrays (Uint8Array)](#example-using-the-xmlhttprequest-and-typed-arrays-uint8array) | ||
* [Example using the `fetch API` and Typed Arrays (Uint8Array)](#example-using-the-fetch-api-and-typed-arrays-uint8array) | ||
* [Convert encoding for file using the File APIs](#convert-encoding-for-file-using-the-file-apis) | ||
@@ -47,19 +50,23 @@ - [Contributing](#contributing) | ||
encoding.js is a JavaScript library for converting and detecting character encodings | ||
that support Japanese character encodings such as `Shift_JIS`, `EUC-JP`, `JIS`, and `Unicode` such as `UTF-8` and `UTF-16`. | ||
encoding.js is a JavaScript library for converting and detecting character encodings, | ||
supporting both Japanese character encodings (`Shift_JIS`, `EUC-JP`, `ISO-2022-JP`) and Unicode formats (`UTF-8`, `UTF-16`). | ||
Since JavaScript string values are internally encoded as UTF-16 code units ([ref: ECMAScript® 2019 Language Specification - 6.1.4 The String Type](https://www.ecma-international.org/ecma-262/10.0/index.html#sec-ecmascript-language-types-string-type)), | ||
they cannot properly handle other character encodings as they are, but encoding.js enables conversion by handling them as arrays instead of strings. | ||
Since JavaScript string values are internally encoded as UTF-16 code units | ||
([ref: ECMAScript® 2019 Language Specification - 6.1.4 The String Type](https://www.ecma-international.org/ecma-262/10.0/index.html#sec-ecmascript-language-types-string-type)), | ||
they cannot directly handle other character encodings as strings. However, encoding.js overcomes this limitation by treating these encodings as arrays instead of strings, | ||
enabling the conversion between different character sets. | ||
Each character encoding is handled as an array of numbers with character code values, for example `[130, 160]` ("あ" in UTF-8). | ||
Each character encoding is represented as an array of numbers corresponding to character code values, for example, `[130, 160]` represents "あ" in Shift_JIS. | ||
The array of character codes passed to each method of encoding.js can also be used with TypedArray such as `Uint8Array`, and `Buffer` in Node.js. | ||
The array of character codes used in its methods can also be utilized with TypedArray objects, such as `Uint8Array`, or with `Buffer` in Node.js. | ||
### How to use character encoding in strings? | ||
### How to Use Character Encoding in Strings? | ||
Numeric arrays of character codes can be converted to strings with methods such as [`Encoding.codeToString`](#code-array-to-string-conversion-codetostringstringtocode) , | ||
but because of the above JavaScript specifications, some character encodings cannot be handled properly when converted to strings. | ||
Numeric arrays of character codes can be converted to strings using methods such as [`Encoding.codeToString`](#encodingcodetostring-code). | ||
However, due to the JavaScript specifications mentioned above, some character encodings may not be handled properly when converted directly to strings. | ||
So if you want to use strings instead of arrays, convert it to percent-encoded strings like `'%82%A0'` by using [`Encoding.urlEncode`](#url-encodedecode) and [`Encoding.urlDecode`](#url-encodedecode) to passed to other resources. | ||
Or, [`Encoding.base64Encode`](#base64-encodedecode) and [`Encoding.base64Decode`](#base64-encodedecode) can be passed as strings in the same way. | ||
If you prefer to use strings instead of numeric arrays, you can convert them to percent-encoded strings, | ||
such as `'%82%A0'`, using [`Encoding.urlEncode`](#encodingurlencode-data) and [`Encoding.urlDecode`](#encodingurldecode-string) for passing to other resources. | ||
Similarly, [`Encoding.base64Encode`](#encodingbase64encode-data) and [`Encoding.base64Decode`](#encodingbase64decode-string) allow for encoding and decoding to and from base64, | ||
which can then be passed as strings. | ||
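As a rough illustration of the percent-encoded form mentioned above, the sketch below turns a byte array into a `%XX` string and back using only built-in JavaScript. The helper names `bytesToPercent` and `percentToBytes` are hypothetical and are not part of encoding.js. Unlike `Encoding.urlEncode`, this sketch escapes every byte, including ASCII letters and digits.

```javascript
// Hypothetical helpers for illustration only; not the encoding.js implementation.
// Every byte is escaped, including ASCII letters and digits.
function bytesToPercent(bytes) {
  return Array.from(bytes, (b) => '%' + b.toString(16).toUpperCase().padStart(2, '0')).join('');
}

// Expects a string made up solely of %XX triplets, as produced above.
function percentToBytes(str) {
  const matches = str.match(/%[0-9A-Fa-f]{2}/g) || [];
  return matches.map((triplet) => parseInt(triplet.slice(1), 16));
}

const sjisBytes = [130, 160]; // 'あ' in Shift_JIS
const encoded = bytesToPercent(sjisBytes);
console.log(encoded); // '%82%A0'
console.log(percentToBytes(encoded)); // [130, 160]
```

Because the result contains only ASCII characters, it can be stored or transmitted as an ordinary JavaScript string without losing the original byte values.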
@@ -73,6 +80,6 @@ ## Installation | ||
```bash | ||
$ npm install --save encoding-japanese | ||
npm install --save encoding-japanese | ||
``` | ||
#### using `import` | ||
#### Using ES6 `import` | ||
@@ -83,3 +90,3 @@ ```javascript | ||
#### using `require` | ||
#### Using CommonJS `require` | ||
@@ -92,56 +99,64 @@ ```javascript | ||
TypeScript type definitions for encoding.js are available at [@types/encoding-japanese](https://www.npmjs.com/package/@types/encoding-japanese) (thanks [@rhysd](https://github.com/rhysd)). | ||
TypeScript type definitions for encoding.js are available at [@types/encoding-japanese](https://www.npmjs.com/package/@types/encoding-japanese) (thanks to [@rhysd](https://github.com/rhysd)). | ||
```bash | ||
$ npm install --save-dev @types/encoding-japanese | ||
npm install --save-dev @types/encoding-japanese | ||
``` | ||
### browser (standalone) | ||
### Browser (standalone) | ||
Install from npm or download from the [release list](https://github.com/polygonplanet/encoding.js/tags) and use `encoding.js` or `encoding.min.js` in the package. | ||
\*Please note that if you `git clone`, even the *master* branch may be under development. | ||
To use encoding.js in a browser environment, you can either install it via npm or download it directly from the [release list](https://github.com/polygonplanet/encoding.js/tags). | ||
The package includes both `encoding.js` and `encoding.min.js`. | ||
Note: Cloning the repository via `git clone` might give you access to the *master* (or *main*) branch, which could still be in a development state. | ||
```html | ||
<!-- To include the full version --> | ||
<script src="encoding.js"></script> | ||
``` | ||
Or use the minified `encoding.min.js` | ||
```html | ||
<!-- Or, to include the minified version for production --> | ||
<script src="encoding.min.js"></script> | ||
``` | ||
When the script is loaded, the object `Encoding` is defined in the global scope (ie `window.Encoding`). | ||
When the script is loaded, the object `Encoding` is defined in the global scope (i.e., `window.Encoding`). | ||
### CDN | ||
You can use the encoding.js (package name: `encoding-japanese`) CDN on [cdnjs.com](https://cdnjs.com/libraries/encoding-japanese). | ||
You can use encoding.js (package name: `encoding-japanese`) directly from a CDN via a script tag: | ||
```html | ||
<script src="https://unpkg.com/encoding-japanese@2.1.0/encoding.min.js"></script> | ||
``` | ||
In this example we use [unpkg](https://unpkg.com/encoding-japanese/), but you can use any CDN that provides npm packages, | ||
for example [cdnjs](https://cdnjs.com/libraries/encoding-japanese) or [jsDelivr](https://www.jsdelivr.com/package/npm/encoding-japanese). | ||
## Supported encodings | ||
|Value in encoding.js|[`detect()`](#detect-character-encoding-detect)|[`convert()`](#convert-character-encoding-convert)|MIME Name (Note)| | ||
|Value in encoding.js|[`detect()`](#encodingdetect-data-encodings)|[`convert()`](#encodingconvert-data-to-from)|MIME Name (Note)| | ||
|:------:|:----:|:-----:|:---| | ||
|ASCII |✓ | |US-ASCII (Code point range: `0-127`)| | ||
|BINARY |✓ | |(Binary strings. Code point range: `0-255`)| | ||
|EUCJP |✓ |✓ |EUC-JP| | ||
|JIS |✓ |✓ |ISO-2022-JP| | ||
|SJIS |✓ |✓ |Shift_JIS| | ||
|UTF8 |✓ |✓ |UTF-8| | ||
|UTF16 |✓ |✓ |UTF-16| | ||
|UTF16BE |✓ |✓ |UTF-16BE (big-endian)| | ||
|UTF16LE |✓ |✓ |UTF-16LE (little-endian)| | ||
|UTF32 |✓ | |UTF-32| | ||
|UNICODE |✓ |✓ |(JavaScript's internal encoding. *See [About `UNICODE`](#about-unicode) below) | | ||
|ASCII |✓ | |US-ASCII (Code point range: `0-127`)| | ||
|BINARY |✓ | |(Binary string. Code point range: `0-255`)| | ||
|EUCJP |✓ |✓ |EUC-JP| | ||
|JIS |✓ |✓ |ISO-2022-JP| | ||
|SJIS |✓ |✓ |Shift_JIS| | ||
|UTF8 |✓ |✓ |UTF-8| | ||
|UTF16 |✓ |✓ |UTF-16| | ||
|UTF16BE |✓ |✓ |UTF-16BE (big-endian)| | ||
|UTF16LE |✓ |✓ |UTF-16LE (little-endian)| | ||
|UTF32 |✓ | |UTF-32| | ||
|UNICODE |✓ |✓ |(JavaScript string. *See [About `UNICODE`](#about-unicode) below) | | ||
### About `UNICODE` | ||
In encoding.js, the internal character encoding that can be handled in JavaScript is defined as `UNICODE`. | ||
In encoding.js, `UNICODE` is defined as the internal character encoding that JavaScript strings (JavaScript string objects) can handle directly. | ||
As mentioned above ([Features](#features)), JavaScript strings are internally encoded in UTF-16 code units, and other character encodings cannot be handled properly. | ||
Therefore, to convert to a character encoding properly represented in JavaScript, specify `UNICODE`. | ||
As mentioned in the [Features](#features) section, JavaScript strings are internally encoded using UTF-16 code units. | ||
This means that other character encodings cannot be directly handled without conversion. | ||
Therefore, when converting to a character encoding that is properly representable in JavaScript, you should specify `UNICODE`. | ||
(*Even if the HTML file encoding is UTF-8, specify `UNICODE` instead of `UTF8` when handling it in JavaScript.) | ||
(Note: Even if the HTML file's encoding is UTF-8, you should specify `UNICODE` instead of `UTF8` when processing the encoding in JavaScript.) | ||
The value of each character code array returned from `Encoding.convert` is a number of 0-255 if you specify a character code other than `UNICODE` such as `UTF8` or `SJIS`, | ||
or a number of `0-65535` (range of `String.prototype.charCodeAt()` values = Code Unit) if you specify `UNICODE`. | ||
When using [`Encoding.convert`](#encodingconvert-data-to-from), if you specify a character encoding other than `UNICODE` (such as `UTF8` or `SJIS`), the values in the returned character code array will range from `0-255`. | ||
However, if you specify `UNICODE`, the values will range from `0-65535`, which corresponds to the range of values returned by `String.prototype.charCodeAt()` (Code Units). | ||
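The two value ranges can be seen with built-in JavaScript alone. This sketch does not call encoding.js: it reads UTF-16 code units with `String.prototype.charCodeAt()` and obtains UTF-8 bytes with the standard `TextEncoder`, on the assumption that these mirror what a `UNICODE` array and a `UTF8` array contain.

```javascript
const text = 'あ🍣';

// UTF-16 code units, the 0-65535 range described for UNICODE
const codeUnits = [];
for (let i = 0; i < text.length; i++) {
  codeUnits.push(text.charCodeAt(i));
}
console.log(codeUnits); // [12354, 55356, 57187]

// UTF-8 bytes, each in the 0-255 range
const utf8Bytes = Array.from(new TextEncoder().encode(text));
console.log(utf8Bytes); // [227, 129, 130, 240, 159, 141, 163]
```

Note that '🍣' lies outside the Basic Multilingual Plane, so it occupies two code units (a surrogate pair) in the first array and four bytes in the second.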
@@ -165,11 +180,11 @@ ## Example usage | ||
```javascript | ||
var sjisArray = [ | ||
const sjisArray = [ | ||
130, 177, 130, 241, 130, 201, 130, 191, 130, 205 | ||
]; // 'こんにちは' array in SJIS | ||
var unicodeArray = Encoding.convert(sjisArray, { | ||
const unicodeArray = Encoding.convert(sjisArray, { | ||
to: 'UNICODE', | ||
from: 'SJIS' | ||
}); | ||
var str = Encoding.codeToString(unicodeArray); // Convert code array to string | ||
const str = Encoding.codeToString(unicodeArray); // Convert code array to string | ||
console.log(str); // 'こんにちは' | ||
@@ -181,8 +196,8 @@ ``` | ||
```javascript | ||
var data = [ | ||
const data = [ | ||
227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175 | ||
]; // 'こんにちは' array in UTF-8 | ||
var detectedEncoding = Encoding.detect(data); | ||
console.log('Character encoding is ' + detectedEncoding); // 'Character encoding is UTF8' | ||
const detectedEncoding = Encoding.detect(data); | ||
console.log(`Character encoding is ${detectedEncoding}`); // 'Character encoding is UTF8' | ||
``` | ||
@@ -206,4 +221,4 @@ | ||
* [Test for character encoding conversion (Demo)](http://polygonplanet.github.io/encoding.js/tests/encoding-test.html) | ||
* [Detect and Convert encoding from file (Demo)](http://polygonplanet.github.io/encoding.js/tests/detect-file-encoding.html) | ||
* [Test for character encoding conversion (Demo)](https://polygonplanet.github.io/encoding.js/tests/encoding-test.html) | ||
* [Detect and Convert encoding from file (Demo)](https://polygonplanet.github.io/encoding.js/tests/detect-file-encoding.html) | ||
@@ -214,67 +229,107 @@ ---- | ||
* [detect](#detect-character-encoding-detect) | ||
* [convert](#convert-character-encoding-convert) | ||
* [urlEncode / urlDecode](#url-encodedecode) | ||
* [base64Encode / base64Decode](#base64-encodedecode) | ||
* [codeToString / stringToCode](#code-array-to-string-conversion-codetostringstringtocode) | ||
* [Japanese Zenkaku / Hankaku conversion](#japanese-zenkakuhankaku-conversion) | ||
* [detect](#encodingdetect-data-encodings) | ||
* [convert](#encodingconvert-data-to-from) | ||
* [urlEncode](#encodingurlencode-data) | ||
* [urlDecode](#encodingurldecode-string) | ||
* [base64Encode](#encodingbase64encode-data) | ||
* [base64Decode](#encodingbase64decode-string) | ||
* [codeToString](#encodingcodetostring-code) | ||
* [stringToCode](#encodingstringtocode-string) | ||
* [Japanese Zenkaku/Hankaku conversion](#japanese-zenkakuhankaku-conversion) | ||
### Detect character encoding (detect) | ||
---- | ||
* {_string|boolean_} Encoding.**detect** ( data [, encodings ] ) | ||
Detect character encoding. | ||
@param {_Array|TypedArray|string_} _data_ Target data | ||
@param {_string|Array_} [_encodings_] (Optional) The encoding name that to specify the detection (value of [Supported encodings](#supported-encodings)) | ||
@return {_string|boolean_} Return the detected character encoding, or false. | ||
### Encoding.detect (data, [encodings]) | ||
The return value is one of the above "[Supported encodings](#supported-encodings)" or false if it cannot be detected. | ||
Detects the character encoding of the given data. | ||
#### Parameters | ||
* **data** *(Array\<number\>|TypedArray|Buffer|string)* : The code array or string whose character encoding is to be detected. | ||
* **\[encodings\]** *(string|Array\<string\>|Object)* : (Optional) Specifies a specific character encoding, | ||
or an array of encodings to limit the detection. Detects automatically if this argument is omitted or `AUTO` is specified. | ||
Supported encoding values can be found in the "[Supported encodings](#supported-encodings)" section. | ||
#### Return value | ||
*(string|boolean)*: Returns a string representing the detected encoding (e.g., `SJIS`, `UTF8`) listed in the "[Supported encodings](#supported-encodings)" section, or `false` if the encoding cannot be detected. | ||
If the `encodings` argument is provided, it returns the name of the detected encoding if the `data` matches any of the specified encodings, or `false` otherwise. | ||
#### Examples | ||
Example of detecting character encoding. | ||
```javascript | ||
var sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS | ||
var detectedEncoding = Encoding.detect(sjisArray); | ||
console.log('Encoding is ' + detectedEncoding); // 'Encoding is SJIS' | ||
const sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS | ||
const detectedEncoding = Encoding.detect(sjisArray); | ||
console.log(`Encoding is ${detectedEncoding}`); // 'Encoding is SJIS' | ||
``` | ||
Example of specifying the character encoding to be detected. | ||
If the second argument `encodings` is specified, returns true when it is the specified character encoding, false otherwise. | ||
Example of using the `encodings` argument to specify the character encoding to be detected. | ||
This returns the name of the detected encoding as a string if the data matches the specified encoding, or `false` otherwise: | ||
```javascript | ||
var sjisArray = [130, 168, 130, 205, 130, 230]; | ||
var isSJIS = Encoding.detect(sjisArray, 'SJIS'); | ||
if (isSJIS) { | ||
const sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS | ||
const detectedEncoding = Encoding.detect(sjisArray, 'SJIS'); | ||
if (detectedEncoding) { | ||
console.log('Encoding is SJIS'); | ||
} else { | ||
console.log('Encoding does not match SJIS'); | ||
} | ||
``` | ||
### Convert character encoding (convert) | ||
Example of specifying multiple encodings: | ||
* {_Array|TypedArray|string_} Encoding.**convert** ( data, to\_encoding [, from\_encoding ] ) | ||
Converts character encoding. | ||
@param {_Array|TypedArray|Buffer|string_} _data_ The target data. | ||
@param {_string|Object_} _to\_encoding_ The encoding name of conversion destination, or option to convert as an object. | ||
@param {_string|Array_} [_from\_encoding_] (Optional) The encoding name of the source or 'AUTO'. | ||
@return {_Array|TypedArray|string_} Return the converted array/string. | ||
```javascript | ||
const sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS | ||
const detectedEncoding = Encoding.detect(sjisArray, ['UTF8', 'SJIS']); | ||
if (detectedEncoding) { | ||
console.log(`Encoding is ${detectedEncoding}`); // 'Encoding is SJIS' | ||
} else { | ||
console.log('Encoding does not match UTF8 and SJIS'); | ||
} | ||
``` | ||
Example of converting a character code array to Shift_JIS from UTF-8. | ||
---- | ||
### Encoding.convert (data, to[, from]) | ||
Converts the character encoding of the given data. | ||
#### Parameters | ||
* **data** *(Array\<number\>|TypedArray|Buffer|string)* : The code array or string whose character encoding is to be converted. | ||
* **to** *(string|Object)* : The character encoding name of the conversion destination as a string, or conversion options as an object. | ||
* **\[from\]** *(string|Array\<string\>)* : (Optional) The character encoding name of the conversion source as a string, | ||
or an array of encoding names. Detects automatically if this argument is omitted or `AUTO` is specified. | ||
Supported encoding values can be found in the "[Supported encodings](#supported-encodings)" section. | ||
#### Return value | ||
*(Array\<number\>|TypedArray|string)* : Returns a numeric character code array of the converted character encoding if `data` is an array or a buffer, | ||
or returns the converted string if `data` is a string. | ||
#### Examples | ||
Example of converting a character code array to Shift_JIS from UTF-8: | ||
```javascript | ||
var utf8Array = [227, 129, 130]; // "あ" in UTF-8 | ||
var sjisArray = Encoding.convert(utf8Array, 'SJIS', 'UTF8'); | ||
console.log(sjisArray); // [130, 160] ("あ" in SJIS) | ||
const utf8Array = [227, 129, 130]; // 'あ' in UTF-8 | ||
const sjisArray = Encoding.convert(utf8Array, 'SJIS', 'UTF8'); | ||
console.log(sjisArray); // [130, 160] ('あ' in SJIS) | ||
``` | ||
TypedArray such as `Uint8Array`, and `Buffer` of Node.js can be converted in the same usage. | ||
Typed arrays such as `Uint8Array`, and `Buffer` in Node.js, can be converted in the same way: | ||
```javascript | ||
var utf8Array = new Uint8Array([227, 129, 130]); | ||
Encoding.convert(utf8Array, 'SJIS', 'UTF8'); | ||
const utf8Array = new Uint8Array([227, 129, 130]); | ||
const sjisArray = Encoding.convert(utf8Array, 'SJIS', 'UTF8'); | ||
``` | ||
Converts character encoding by auto-detecting the encoding name of the source. | ||
Converts character encoding by auto-detecting the encoding name of the source: | ||
```javascript | ||
// The character encoding is automatically detected when the from_encoding argument is omitted | ||
var utf8Array = [227, 129, 130]; | ||
var sjisArray = Encoding.convert(utf8Array, 'SJIS'); | ||
// The character encoding is automatically detected when the argument `from` is omitted | ||
const utf8Array = [227, 129, 130]; | ||
let sjisArray = Encoding.convert(utf8Array, 'SJIS'); | ||
// Or explicitly specify 'AUTO' to auto-detect | ||
@@ -284,10 +339,12 @@ sjisArray = Encoding.convert(utf8Array, 'SJIS', 'AUTO'); | ||
#### Specify conversion options to the argument `to_encoding` as an object | ||
#### Specify conversion options to the argument `to` as an object | ||
You can specify the second argument `to_encoding` as an object for improving readability. | ||
You can pass the second argument `to` as an object to improve readability. | ||
Also, the following options such as `type`, `fallback`, and `bom` must be specified with an object. | ||
```javascript | ||
var sjisArray = Encoding.convert(utf8Array, { | ||
to: 'SJIS', // to_encoding | ||
from: 'UTF8' // from_encoding | ||
const utf8Array = [227, 129, 130]; | ||
const sjisArray = Encoding.convert(utf8Array, { | ||
to: 'SJIS', | ||
from: 'UTF8' | ||
}); | ||
@@ -302,4 +359,4 @@ ``` | ||
```javascript | ||
var sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS | ||
var unicodeString = Encoding.convert(sjisArray, { | ||
const sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS | ||
const unicodeString = Encoding.convert(sjisArray, { | ||
to: 'UNICODE', | ||
@@ -312,23 +369,28 @@ from: 'SJIS', | ||
The following `type` options are supported | ||
The following `type` options are supported. | ||
* **string** : Return as a string | ||
* **arraybuffer** : Return as an ArrayBuffer (`Uint16Array`) | ||
* **array** : Return as an Array (*default*) | ||
* **string** : Return as a string. | ||
* **arraybuffer** : Return as an ArrayBuffer (Actually returns a `Uint16Array` due to historical reasons). | ||
* **array** : Return as an Array. (*default*) | ||
#### Replace to HTML entity (Numeric character reference) when cannot be represented | ||
`type: 'string'` can be used as a shorthand for converting a code array to a string, | ||
as performed by [`Encoding.codeToString`](#encodingcodetostring-code). | ||
Note: Specifying `type: 'string'` may not handle conversions properly, except when converting to `UNICODE`. | ||
Characters that cannot be represented in the target character set are replaced with '?' (U+003F) by default but can be replaced with HTML entities by specifying the `fallback` option. | ||
#### Replacing characters with HTML entities when they cannot be represented | ||
Characters that cannot be represented in the target character set are replaced with '?' (U+003F) by default, | ||
but by specifying the `fallback` option, you can replace them with HTML entities (Numeric character references), such as `&#127843;`. | ||
The `fallback` option supports the following values. | ||
* **html-entity** : Replace to HTML entity (decimal HTML numeric character reference) | ||
* **html-entity-hex** : Replace to HTML entity (hexadecimal HTML numeric character reference) | ||
* **html-entity** : Replace with an HTML entity (decimal HTML numeric character reference). | ||
* **html-entity-hex** : Replace with an HTML entity (hexadecimal HTML numeric character reference). | ||
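For reference, both forms of numeric character reference are derived from the character's Unicode code point. The helper `toNumericCharRef` below is hypothetical and only illustrates the notation; it is not how encoding.js performs the fallback internally, and the lowercase hexadecimal digits are a choice of this sketch.

```javascript
// Hypothetical helper: builds a numeric character reference for the first
// code point of `char`, in decimal by default or in hexadecimal on request.
function toNumericCharRef(char, hex = false) {
  const codePoint = char.codePointAt(0);
  return hex ? `&#x${codePoint.toString(16)};` : `&#${codePoint};`;
}

console.log(toNumericCharRef('🍣'));       // '&#127843;'
console.log(toNumericCharRef('🍣', true)); // '&#x1f363;'
```

`codePointAt(0)` is used rather than `charCodeAt(0)` so that characters stored as surrogate pairs, such as '🍣', yield a single reference.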
Example of specifying `{ fallback: 'html-entity' }` option | ||
Example of specifying `{ fallback: 'html-entity' }` option: | ||
```javascript | ||
var unicodeArray = Encoding.stringToCode('寿司🍣ビール🍺'); | ||
const unicodeArray = Encoding.stringToCode('寿司🍣ビール🍺'); | ||
// No fallback specified | ||
var sjisArray = Encoding.convert(unicodeArray, { | ||
let sjisArray = Encoding.convert(unicodeArray, { | ||
to: 'SJIS', | ||
@@ -348,7 +410,7 @@ from: 'UNICODE' | ||
Example of specifying `{ fallback: 'html-entity-hex' }` option | ||
Example of specifying `{ fallback: 'html-entity-hex' }` option: | ||
```javascript | ||
var unicodeArray = Encoding.stringToCode('ホッケの漢字は𩸽'); | ||
var sjisArray = Encoding.convert(unicodeArray, { | ||
const unicodeArray = Encoding.stringToCode('ホッケの漢字は𩸽'); | ||
const sjisArray = Encoding.convert(unicodeArray, { | ||
to: 'SJIS', | ||
@@ -367,3 +429,3 @@ from: 'UNICODE', | ||
```javascript | ||
var utf16Array = Encoding.convert(utf8Array, { | ||
const utf16Array = Encoding.convert(utf8Array, { | ||
to: 'UTF16', // to_encoding | ||
@@ -379,3 +441,3 @@ from: 'UTF8', // from_encoding | ||
```javascript | ||
var utf16leArray = Encoding.convert(utf8Array, { | ||
const utf16leArray = Encoding.convert(utf8Array, { | ||
to: 'UTF16', // to_encoding | ||
@@ -391,3 +453,3 @@ from: 'UTF8', // from_encoding | ||
```javascript | ||
var utf16beArray = Encoding.convert(utf8Array, { | ||
const utf16beArray = Encoding.convert(utf8Array, { | ||
to: 'UTF16BE', | ||
@@ -398,98 +460,271 @@ from: 'UTF8' | ||
### URL Encode/Decode | ||
---- | ||
* {_string_} Encoding.**urlEncode** ( data ) | ||
URL(percent) encode. | ||
@param {_Array_|_TypedArray_} _data_ Target data. | ||
@return {_string_} Return the encoded string. | ||
### Encoding.urlEncode (data) | ||
* {_Array_} Encoding.**urlDecode** ( string ) | ||
URL(percent) decode. | ||
@param {_string_} _string_ Target data. | ||
@return {_Array_} Return the decoded array. | ||
Encodes a numeric character code array into a percent-encoded string formatted as a URI component in `%xx` format. | ||
urlEncode escapes all characters except the following, just like [`encodeURIComponent()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent). | ||
``` | ||
A-Z a-z 0-9 - _ . ! ~ * ' ( ) | ||
``` | ||
#### Parameters | ||
* **data** *(Array\<number\>|TypedArray|Buffer|string)* : The numeric character code array or string that will be encoded into a percent-encoded URI component. | ||
#### Return value | ||
*(string)* : Returns a percent-encoded string formatted as a URI component in `%xx` format. | ||
#### Examples | ||
Example of URL encoding a Shift_JIS array: | ||
```javascript | ||
var sjisArray = [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]; | ||
var encoded = Encoding.urlEncode(sjisArray); | ||
console.log(encoded); // '%82%B1%82%F1%82%C9%82%BF%82%CD' | ||
const sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS | ||
const encoded = Encoding.urlEncode(sjisArray); | ||
console.log(encoded); // '%82%A8%82%CD%82%E6' | ||
``` | ||
var decoded = Encoding.urlDecode(encoded); | ||
console.log(decoded); // [130, 177, 130, 241, 130, 201, 130, 191, 130, 205] | ||
---- | ||
### Encoding.urlDecode (string) | ||
Decodes a percent-encoded string formatted as a URI component in `%xx` format to a numeric character code array. | ||
#### Parameters | ||
* **string** *(string)* : The string to decode. | ||
#### Return value | ||
*(Array\<number\>)* : Returns a numeric character code array. | ||
#### Examples | ||
Example of decoding a percent-encoded Shift_JIS string: | ||
```javascript | ||
const encoded = '%82%A8%82%CD%82%E6'; // 'おはよ' encoded as percent-encoded SJIS string | ||
const sjisArray = Encoding.urlDecode(encoded); | ||
console.log(sjisArray); // [130, 168, 130, 205, 130, 230] | ||
``` | ||
### Base64 Encode/Decode | ||
---- | ||
* {_string_} Encoding.**base64Encode** ( data ) | ||
Base64 encode. | ||
@param {_Array_|_TypedArray_} _data_ Target data. | ||
@return {_string_} Return the Base64 encoded string. | ||
### Encoding.base64Encode (data) | ||
* {_Array_} Encoding.**base64Decode** ( string ) | ||
Base64 decode. | ||
@param {_string_} _string_ Target data. | ||
@return {_Array_} Return the Base64 decoded array. | ||
Encodes a numeric character code array into a Base64 encoded string. | ||
#### Parameters | ||
* **data** *(Array\<number\>|TypedArray|Buffer|string)* : The numeric character code array or string to encode. | ||
#### Return value | ||
*(string)* : Returns a Base64 encoded string. | ||
#### Examples | ||
Example of Base64 encoding a Shift_JIS array: | ||
```javascript | ||
var sjisArray = [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]; | ||
var encoded = Encoding.base64Encode(sjisArray); | ||
console.log(encoded); // 'grGC8YLJgr+CzQ==' | ||
const sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS | ||
const encodedStr = Encoding.base64Encode(sjisArray); | ||
console.log(encodedStr); // 'gqiCzYLm' | ||
``` | ||
var decoded = Encoding.base64Decode(encoded); | ||
console.log(decoded); // [130, 177, 130, 241, 130, 201, 130, 191, 130, 205] | ||
---- | ||
### Encoding.base64Decode (string) | ||
Decodes a Base64 encoded string to a numeric character code array. | ||
#### Parameters | ||
* **string** *(string)* : The Base64 encoded string to decode. | ||
#### Return value | ||
*(Array\<number\>)* : Returns a Base64 decoded numeric character code array. | ||
#### Examples | ||
Example of `base64Encode` and `base64Decode`: | ||
```javascript | ||
const sjisArray = [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]; // 'こんにちは' array in SJIS | ||
const encodedStr = Encoding.base64Encode(sjisArray); | ||
console.log(encodedStr); // 'grGC8YLJgr+CzQ==' | ||
const decodedArray = Encoding.base64Decode(encodedStr); | ||
console.log(decodedArray); // [130, 177, 130, 241, 130, 201, 130, 191, 130, 205] | ||
``` | ||
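As a cross-check of the values above, the same round trip can be reproduced with Node.js's built-in `Buffer`. This sketch assumes a Node.js environment (`Buffer` is not available in browsers without a polyfill) and does not use encoding.js at all; the variable names are illustrative.

```javascript
// 'こんにちは' as Shift_JIS bytes (same array as in the example above)
const helloSjis = [130, 177, 130, 241, 130, 201, 130, 191, 130, 205];

// Encode the raw bytes as Base64 using Node's Buffer
const base64Text = Buffer.from(helloSjis).toString('base64');
console.log(base64Text); // 'grGC8YLJgr+CzQ=='

// Decode back to a plain array of byte values
const decodedBytes = Array.from(Buffer.from(base64Text, 'base64'));
console.log(decodedBytes); // [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]
```

Base64 operates on bytes only, so the character encoding of the data (Shift_JIS here) is preserved unchanged through the round trip.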
### Code array to string conversion (codeToString/stringToCode) | ||
---- | ||
* {_string_} Encoding.**codeToString** ( {_Array_|_TypedArray_} data ) | ||
Joins a character code array to string. | ||
### Encoding.codeToString (code) | ||
* {_Array_} Encoding.**stringToCode** ( {_string_} string ) | ||
Splits string to an array of character codes. | ||
Converts a numeric character code array to string. | ||
#### Parameters | ||
* **code** *(Array\<number\>|TypedArray|Buffer)* : The numeric character code array to convert. | ||
#### Return value | ||
*(string)* : Returns a converted string. | ||
#### Examples | ||
Example of converting a character code array to a string: | ||
```javascript | ||
const sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS | ||
const unicodeArray = Encoding.convert(sjisArray, { | ||
to: 'UNICODE', | ||
from: 'SJIS' | ||
}); | ||
const unicodeStr = Encoding.codeToString(unicodeArray); | ||
console.log(unicodeStr); // 'おはよ' | ||
``` | ||
---- | ||
### Encoding.stringToCode (string) | ||
Converts a string to a numeric character code array. | ||
#### Parameters | ||
* **string** *(string)* : The string to convert. | ||
#### Return value | ||
*(Array\<number\>)* : Returns a numeric character code array converted from the string. | ||
#### Examples | ||
Example of converting a string to a character code array: | ||
```javascript | ||
const unicodeArray = Encoding.stringToCode('おはよ'); | ||
console.log(unicodeArray); // [12362, 12399, 12424] | ||
``` | ||
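The values in these arrays are UTF-16 code units, the same numbers returned by `String.prototype.charCodeAt`. The sketch below reproduces that mapping with built-ins only. The helper names `stringToCodeUnits` and `codeUnitsToString` are hypothetical, and the sketch is an illustration of the code-unit representation, not the library's implementation.

```javascript
// Hypothetical helpers (not part of encoding.js) built on standard string APIs.
function stringToCodeUnits(str) {
  const codes = [];
  for (let i = 0; i < str.length; i++) {
    codes.push(str.charCodeAt(i));
  }
  return codes;
}

function codeUnitsToString(codes) {
  // Convert in chunks so very large arrays do not exceed argument limits
  const CHUNK = 8192;
  let result = '';
  for (let i = 0; i < codes.length; i += CHUNK) {
    const part = Array.prototype.slice.call(codes, i, i + CHUNK);
    result += String.fromCharCode(...part);
  }
  return result;
}

console.log(stringToCodeUnits('おはよ')); // [12362, 12399, 12424]
console.log(codeUnitsToString([12362, 12399, 12424])); // 'おはよ'

// Characters outside the Basic Multilingual Plane occupy two code units
console.log(stringToCodeUnits('\u{1F600}')); // [55357, 56832]
```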
---- | ||
### Japanese Zenkaku/Hankaku conversion | ||
* {_Array|string_} Encoding.**toHankakuCase** ( {_Array|string_} data ) | ||
Convert the ascii symbols and alphanumeric characters to the zenkaku symbols and alphanumeric characters. | ||
The following methods convert Japanese full-width (zenkaku) and half-width (hankaku) characters, | ||
suitable for use with `UNICODE` strings or numeric character code arrays of `UNICODE`. | ||
* {_Array|string_} Encoding.**toZenkakuCase** ( {_Array|string_} data ) | ||
Convert to the zenkaku symbols and alphanumeric characters from the ascii symbols and alphanumeric characters. | ||
Returns a converted string if the argument `data` is a string. | ||
Returns a numeric character code array if the argument `data` is a code array. | ||
* {_Array|string_} Encoding.**toHiraganaCase** ( {_Array|string_} data ) | ||
Convert to the zenkaku hiragana from the zenkaku katakana. | ||
- **Encoding.toHankakuCase (data)** : Converts full-width (zenkaku) symbols and alphanumeric characters to their half-width (hankaku) equivalents. | ||
- **Encoding.toZenkakuCase (data)** : Converts half-width (hankaku) symbols and alphanumeric characters to their full-width (zenkaku) equivalents. | ||
- **Encoding.toHiraganaCase (data)** : Converts full-width katakana to full-width hiragana. | ||
- **Encoding.toKatakanaCase (data)** : Converts full-width hiragana to full-width katakana. | ||
- **Encoding.toHankanaCase (data)** : Converts full-width katakana to half-width katakana. | ||
- **Encoding.toZenkanaCase (data)** : Converts half-width katakana to full-width katakana. | ||
- **Encoding.toHankakuSpace (data)** : Converts the ideographic (full-width) space (U+3000) to the ASCII space (U+0020). | ||
- **Encoding.toZenkakuSpace (data)** : Converts the ASCII space (U+0020) to the ideographic (full-width) space (U+3000). | ||
* {_Array|string_} Encoding.**toKatakanaCase** ( {_Array|string_} data ) | ||
Convert to the zenkaku katakana from the zenkaku hiragana. | ||
#### Parameters | ||
* {_Array|string_} Encoding.**toHankanaCase** ( {_Array|string_} data ) | ||
Convert to the hankaku katakana from the zenkaku katakana. | ||
- **data** *(Array\<number\>|TypedArray|Buffer|string)* : The string or numeric character code array to convert. | ||
* {_Array|string_} Encoding.**toZenkanaCase** ( {_Array|string_} data ) | ||
Convert to the zenkaku katakana from the hankaku katakana. | ||
#### Return value | ||
* {_Array|string_} Encoding.**toHankakuSpace** ({_Array|string_} data ) | ||
Convert the em space(U+3000) to the single space(U+0020). | ||
*(Array\<number\>|string)* : Returns a converted string or numeric character code array. | ||
* {_Array|string_} Encoding.**toZenkakuSpace** ( {_Array|string_} data ) | ||
Convert the single space(U+0020) to the em space(U+3000). | ||
#### Examples | ||
Example of converting zenkaku and hankaku strings: | ||
```javascript | ||
console.log(Encoding.toHankakuCase('ａｂｃＤＥＦ１２３＠！＃＊＝')); // 'abcDEF123@!#*=' | ||
console.log(Encoding.toZenkakuCase('abcDEF123@!#*=')); // 'ａｂｃＤＥＦ１２３＠！＃＊＝' | ||
console.log(Encoding.toHiraganaCase('アイウエオァィゥェォヴボポ')); // 'あいうえおぁぃぅぇぉゔぼぽ' | ||
console.log(Encoding.toKatakanaCase('あいうえおぁぃぅぇぉゔぼぽ')); // 'アイウエオァィゥェォヴボポ' | ||
console.log(Encoding.toHankanaCase('アイウエオァィゥェォヴボポ')); // 'ｱｲｳｴｵｧｨｩｪｫｳﾞﾎﾞﾎﾟ' | ||
console.log(Encoding.toZenkanaCase('ｱｲｳｴｵｧｨｩｪｫｳﾞﾎﾞﾎﾟ')); // 'アイウエオァィゥェォヴボポ' | ||
console.log(Encoding.toHankakuSpace('あいうえお　abc　123')); // 'あいうえお abc 123' | ||
console.log(Encoding.toZenkakuSpace('あいうえお abc 123')); // 'あいうえお　abc　123' | ||
``` | ||
Example of converting zenkaku and hankaku code arrays: | ||
```javascript | ||
const unicodeArray = Encoding.stringToCode('ａｂｃ１２３！＃　あいうアイウ　ABCｱｲｳ'); | ||
console.log(Encoding.codeToString(Encoding.toHankakuCase(unicodeArray))); | ||
// 'abc123!#　あいうアイウ　ABCｱｲｳ' | ||
console.log(Encoding.codeToString(Encoding.toZenkakuCase(unicodeArray))); | ||
// 'ａｂｃ１２３！＃　あいうアイウ　ＡＢＣｱｲｳ' | ||
console.log(Encoding.codeToString(Encoding.toHiraganaCase(unicodeArray))); | ||
// 'ａｂｃ１２３！＃　あいうあいう　ABCｱｲｳ' | ||
console.log(Encoding.codeToString(Encoding.toKatakanaCase(unicodeArray))); | ||
// 'ａｂｃ１２３！＃　アイウアイウ　ABCｱｲｳ' | ||
console.log(Encoding.codeToString(Encoding.toHankanaCase(unicodeArray))); | ||
// 'ａｂｃ１２３！＃　あいうｱｲｳ　ABCｱｲｳ' | ||
console.log(Encoding.codeToString(Encoding.toZenkanaCase(unicodeArray))); | ||
// 'ａｂｃ１２３！＃　あいうアイウ　ABCアイウ' | ||
console.log(Encoding.codeToString(Encoding.toHankakuSpace(unicodeArray))); | ||
// 'ａｂｃ１２３！＃ あいうアイウ ABCｱｲｳ' | ||
console.log(Encoding.codeToString(Encoding.toZenkakuSpace(unicodeArray))); | ||
// 'ａｂｃ１２３！＃　あいうアイウ　ABCｱｲｳ' | ||
``` | ||
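The full-width forms of printable ASCII characters (U+FF01 to U+FF5E) sit at a fixed offset of `0xFEE0` from their ASCII counterparts (U+0021 to U+007E), which is what makes the alphanumeric and symbol conversions a simple arithmetic shift. The sketch below illustrates that relationship with two hypothetical helpers, `toHalfWidthAscii` and `toFullWidthAscii`. They are not part of encoding.js and are not claimed to match its implementation. Kana conversion is not covered here, since it needs lookup tables rather than a constant offset.

```javascript
// Distance between an ASCII character and its full-width form
const FULLWIDTH_OFFSET = 0xFEE0;

// Hypothetical helper: full-width U+FF01..U+FF5E -> ASCII U+0021..U+007E
function toHalfWidthAscii(str) {
  return str.replace(/[\uFF01-\uFF5E]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - FULLWIDTH_OFFSET));
}

// Hypothetical helper: ASCII U+0021..U+007E -> full-width U+FF01..U+FF5E
function toFullWidthAscii(str) {
  return str.replace(/[\u0021-\u007E]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) + FULLWIDTH_OFFSET));
}

// '\uFF41\uFF42\uFF43\uFF11\uFF12\uFF13' is full-width 'abc123'
console.log(toHalfWidthAscii('\uFF41\uFF42\uFF43\uFF11\uFF12\uFF13')); // 'abc123'
console.log(toFullWidthAscii('abc123') === '\uFF41\uFF42\uFF43\uFF11\uFF12\uFF13'); // true
```

The space character is outside both ranges: the full-width space is U+3000, not U+0020 plus the offset, so it needs its own mapping.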
---- | ||
## Other examples | ||
### Example using the XMLHttpRequest and Typed arrays (Uint8Array) | ||
### Example using the `Fetch API` and Typed Arrays (Uint8Array) | ||
This sample reads the text file written in Shift_JIS as binary data, | ||
and displays a string that is converted to Unicode by Encoding.convert. | ||
This example reads a text file encoded in Shift_JIS as binary data, | ||
and displays it as a string after converting it to Unicode using [Encoding.convert](#encodingconvert-data-to-from). | ||
```javascript | ||
var req = new XMLHttpRequest(); | ||
req.open('GET', '/my-shift_jis.txt', true); | ||
(async () => { | ||
try { | ||
const response = await fetch('shift_jis.txt'); | ||
const buffer = await response.arrayBuffer(); | ||
// Code array with Shift_JIS file contents | ||
const sjisArray = new Uint8Array(buffer); | ||
// Convert encoding to UNICODE (JavaScript Code Units) from Shift_JIS | ||
const unicodeArray = Encoding.convert(sjisArray, { | ||
to: 'UNICODE', | ||
from: 'SJIS' | ||
}); | ||
// Convert to string from code array for display | ||
const unicodeString = Encoding.codeToString(unicodeArray); | ||
console.log(unicodeString); | ||
} catch (error) { | ||
console.error('Error loading the file:', error); | ||
} | ||
})(); | ||
``` | ||
<details> | ||
<summary>XMLHttpRequest version of this example</summary> | ||
```javascript | ||
const req = new XMLHttpRequest(); | ||
req.open('GET', 'shift_jis.txt', true); | ||
req.responseType = 'arraybuffer'; | ||
req.onload = function (event) { | ||
var buffer = req.response; | ||
req.onload = (event) => { | ||
const buffer = req.response; | ||
if (buffer) { | ||
// Shift_JIS Array | ||
var sjisArray = new Uint8Array(buffer); | ||
// Code array with Shift_JIS file contents | ||
const sjisArray = new Uint8Array(buffer); | ||
// Convert encoding to UNICODE (JavaScript Unicode Array). | ||
var unicodeArray = Encoding.convert(sjisArray, { | ||
// Convert encoding to UNICODE (JavaScript Code Units) from Shift_JIS | ||
const unicodeArray = Encoding.convert(sjisArray, { | ||
to: 'UNICODE', | ||
@@ -499,4 +734,4 @@ from: 'SJIS' | ||
// Join to string. | ||
var unicodeString = Encoding.codeToString(unicodeArray); | ||
// Convert to string from code array for display | ||
const unicodeString = Encoding.codeToString(unicodeArray); | ||
console.log(unicodeString); | ||
@@ -508,7 +743,9 @@ } | ||
``` | ||
</details> | ||
### Convert encoding for file using the File APIs | ||
Reads file using the File APIs. | ||
Detect file encoding and convert to Unicode, and display it. | ||
This example uses the File API to read the content of a selected file, detects its character encoding, | ||
and converts the file content to UNICODE from any character encoding such as `Shift_JIS` or `EUC-JP`. | ||
The converted content is then displayed in a textarea. | ||
@@ -518,21 +755,23 @@ ```html | ||
<div id="encoding"></div> | ||
<textarea id="result" rows="5" cols="80"></textarea> | ||
<textarea id="content" rows="5" cols="80"></textarea> | ||
<script> | ||
function onFileSelect(event) { | ||
var file = event.target.files[0]; | ||
const file = event.target.files[0]; | ||
var reader = new FileReader(); | ||
const reader = new FileReader(); | ||
reader.onload = function(e) { | ||
var codes = new Uint8Array(e.target.result); | ||
var encoding = Encoding.detect(codes); | ||
document.getElementById('encoding').textContent = encoding; | ||
const codes = new Uint8Array(e.target.result); | ||
// Convert encoding to unicode | ||
var unicodeString = Encoding.convert(codes, { | ||
to: 'unicode', | ||
from: encoding, | ||
const detectedEncoding = Encoding.detect(codes); | ||
const encoding = document.getElementById('encoding'); | ||
encoding.textContent = `Detected encoding: ${detectedEncoding}`; | ||
// Convert encoding to UNICODE | ||
const unicodeString = Encoding.convert(codes, { | ||
to: 'UNICODE', | ||
from: detectedEncoding, | ||
type: 'string' | ||
}); | ||
document.getElementById('result').value = unicodeString; | ||
document.getElementById('content').value = unicodeString; | ||
}; | ||
@@ -543,7 +782,7 @@ | ||
document.getElementById('file').addEventListener('change', onFileSelect, false); | ||
document.getElementById('file').addEventListener('change', onFileSelect); | ||
</script> | ||
``` | ||
[**Demo**](http://polygonplanet.github.io/encoding.js/tests/detect-file-encoding.html) | ||
[**Demo**](https://polygonplanet.github.io/encoding.js/tests/detect-file-encoding.html) | ||
@@ -557,10 +796,8 @@ ## Contributing | ||
Please run `$ npm run test` before the pull request to confirm there are no errors. | ||
We only accept requests without errors. | ||
Before submitting a pull request, please run `npm run test` to ensure there are no errors. | ||
We only accept pull requests that pass all tests. | ||
## License | ||
MIT | ||
This project is licensed under the terms of the MIT license. | ||
See the [LICENSE](LICENSE) file for details. |