
encoding-japanese


Convert or detect character encoding in JavaScript

2.0.0

Version published: 3 years ago

Maintainers: 1

Created: 10 years ago

What is encoding-japanese?

The encoding-japanese npm package provides functionalities for encoding and decoding Japanese text. It supports various character encodings such as UTF-8, Shift_JIS, EUC-JP, and ISO-2022-JP. The package is useful for converting between different encodings, detecting encoding types, and handling Japanese text in web applications.

What are encoding-japanese's main functionalities?

Encoding Conversion

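This feature converts text from one encoding to another. Because JavaScript strings are UTF-16 internally, the string is first split into a code array and then converted. In this example, the Japanese text 'こんにちは' is converted to a UTF-8 byte array.

const Encoding = require('encoding-japanese');
const unicodeArray = Encoding.stringToCode('こんにちは'); // Convert string to code array
const utf8Array = Encoding.convert(unicodeArray, { to: 'UTF8', from: 'UNICODE' });
console.log(utf8Array);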

Encoding Detection

This feature detects the encoding of a given byte array. In this example, a byte array holding 'あいう' in Shift_JIS is passed to detect, which returns the detected encoding name, or false if the encoding cannot be detected.

const Encoding = require('encoding-japanese');
const detectedEncoding = Encoding.detect(new Uint8Array([0x82, 0xa0, 0x82, 0xa2, 0x82, 0xa4]));
console.log(detectedEncoding);

String to Byte Array

This feature splits a string into an array of character codes (UTF-16 code units, not encoded bytes). In this example, the Japanese text 'こんにちは' is split into a code array.

const Encoding = require('encoding-japanese');
const codeArray = Encoding.stringToCode('こんにちは');
console.log(codeArray);

Byte Array to String

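This feature joins an array of character codes back into a string. Bytes in another encoding must be converted to UNICODE first. In this example, a Shift_JIS byte array is converted to UNICODE and then joined into the Japanese text 'こんにちは'.

const Encoding = require('encoding-japanese');
const sjisArray = [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]; // 'こんにちは' array in SJIS
const unicodeArray = Encoding.convert(sjisArray, { to: 'UNICODE', from: 'SJIS' });
const string = Encoding.codeToString(unicodeArray);
console.log(string); // 'こんにちは'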

Other packages similar to encoding-japanese

iconv-lite

jconv

encoding.js

Convert or detect character encoding in JavaScript.

README (Japanese)

Features

encoding.js is a JavaScript library for converting and detecting character encodings. It supports Japanese character encodings such as Shift_JIS, EUC-JP, and JIS, as well as Unicode encodings such as UTF-8 and UTF-16.

Since JavaScript string values are internally encoded as UTF-16 code units (ref: ECMAScript® 2019 Language Specification - 6.1.4 The String Type), they cannot properly handle other character encodings as they are, but encoding.js enables conversion by handling them as arrays instead of strings.

Each character encoding is handled as an array of numbers with character code values, for example [130, 160] ("あ" in Shift_JIS).

The array of character codes passed to each method of encoding.js can also be used with TypedArray such as Uint8Array, and Buffer in Node.js.
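As a small illustration of why these containers are interchangeable, the sketch below stores the same two byte values in a plain array, a Uint8Array, and a Node.js Buffer. It uses only built-in JavaScript and Node.js globals, not encoding.js itself, and the variable names are made up for this example.

```javascript
// The same byte values held in three different containers.
const plainArray = [130, 160]; // "あ" in Shift_JIS
const typedArray = Uint8Array.from(plainArray);
const nodeBuffer = Buffer.from(plainArray); // Node.js only

console.log(Array.from(typedArray)); // [130, 160]
console.log(Array.from(nodeBuffer)); // [130, 160]

// In Node.js, Buffer is a subclass of Uint8Array.
console.log(nodeBuffer instanceof Uint8Array); // true
```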

How to use character encoding in strings?

Numeric arrays of character codes can be converted to strings with methods such as Encoding.codeToString , but because of the above JavaScript specifications, some character encodings cannot be handled properly when converted to strings.

So if you want to use strings instead of arrays, convert the data to a percent-encoded string such as '%82%A0' with Encoding.urlEncode (and back with Encoding.urlDecode) before passing it to other resources. Encoding.base64Encode and Encoding.base64Decode can be used in the same way to pass the data as a string.

Installation

npm

encoding.js is published under the package name encoding-japanese on npm.

$ npm install --save encoding-japanese

using `import`

import Encoding from 'encoding-japanese';

using `require`

const Encoding = require('encoding-japanese');

TypeScript

TypeScript type definitions for encoding.js are available at @types/encoding-japanese (thanks @rhysd).

$ npm install --save-dev @types/encoding-japanese

browser (standalone)

Install from npm or download from the release list and use encoding.js or encoding.min.js in the package.
*Please note that if you git clone, even the master branch may be under development.

<script src="encoding.js"></script>

Or use the minified encoding.min.js

<script src="encoding.min.js"></script>

When the script is loaded, the object Encoding is defined in the global scope (i.e., window.Encoding).

CDN

You can use the encoding.js (package name: encoding-japanese) CDN on cdnjs.com.

Supported encodings

Value in encoding.js	`detect()`	`convert()`	MIME Name (Note)
ASCII	✓		US-ASCII (Code point range: `0-127`)
BINARY	✓		(Binary strings. Code point range: `0-255`)
EUCJP	✓	✓	EUC-JP
JIS	✓	✓	ISO-2022-JP
SJIS	✓	✓	Shift_JIS
UTF8	✓	✓	UTF-8
UTF16	✓	✓	UTF-16
UTF16BE	✓	✓	UTF-16BE (big-endian)
UTF16LE	✓	✓	UTF-16LE (little-endian)
UTF32	✓		UTF-32
UNICODE	✓	✓	(JavaScript's internal encoding. *See About `UNICODE` below)

About `UNICODE`

In encoding.js, the internal character encoding that can be handled in JavaScript is defined as UNICODE.

As mentioned above (Features), JavaScript strings are internally encoded in UTF-16 code units, and other character encodings cannot be handled properly. Therefore, to convert to a character encoding properly represented in JavaScript, specify UNICODE.

(*Even if the HTML file encoding is UTF-8, specify UNICODE instead of UTF8 when handling it in JavaScript.)

The value of each character code array returned from Encoding.convert is a number of 0-255 if you specify a character code other than UNICODE such as UTF8 or SJIS, or a number of 0-65535 (range of String.prototype.charCodeAt() values = Code Unit) if you specify UNICODE.
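The difference between the two value ranges can be seen without encoding.js at all. The sketch below, which uses only the built-in String.prototype.charCodeAt and TextEncoder, lists the UTF-16 code units of a string (the values the README calls UNICODE) next to the UTF-8 bytes of the same text.

```javascript
// Plain JavaScript only; encoding.js is not used in this sketch.
const text = 'こんにちは';

// UTF-16 code units: each value is in the range 0-65535.
const codeUnits = [];
for (let i = 0; i < text.length; i++) {
  codeUnits.push(text.charCodeAt(i));
}

// UTF-8 bytes of the same text: each value is in the range 0-255.
const utf8Bytes = Array.from(new TextEncoder().encode(text));

console.log(codeUnits); // [12371, 12435, 12395, 12385, 12399]
console.log(utf8Bytes);
// [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]
```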

Example usage

Convert character encoding from JavaScript string (UNICODE) to SJIS.

const unicodeArray = Encoding.stringToCode('こんにちは'); // Convert string to code array
const sjisArray = Encoding.convert(unicodeArray, {
  to: 'SJIS',
  from: 'UNICODE'
});
console.log(sjisArray);
// [130, 177, 130, 241, 130, 201, 130, 191, 130, 205] ('こんにちは' array in SJIS)

Convert character encoding from SJIS to UNICODE.

var sjisArray = [
  130, 177, 130, 241, 130, 201, 130, 191, 130, 205
]; // 'こんにちは' array in SJIS

var unicodeArray = Encoding.convert(sjisArray, {
  to: 'UNICODE',
  from: 'SJIS'
});
var str = Encoding.codeToString(unicodeArray); // Convert code array to string
console.log(str); // 'こんにちは'

Detect character encoding.

var data = [
  227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175
]; // 'こんにちは' array in UTF-8

var detectedEncoding = Encoding.detect(data);
console.log('Character encoding is ' + detectedEncoding); // 'Character encoding is UTF8'

(Node.js) Example of reading a text file written in SJIS.

const fs = require('fs');
const Encoding = require('encoding-japanese');

const sjisBuffer = fs.readFileSync('./sjis.txt');
const unicodeArray = Encoding.convert(sjisBuffer, {
  to: 'UNICODE',
  from: 'SJIS'
});
console.log(Encoding.codeToString(unicodeArray));

Demo

API

Detect character encoding (detect)

{string|boolean} Encoding.detect ( data [, encodings ] )
Detect character encoding.
@param {Array|TypedArray|string} data Target data
@param {string|Array} [encodings] (Optional) The encoding name(s) to restrict detection to (values from Supported encodings)
@return {string|boolean} Return the detected character encoding, or false.

The return value is one of the above "Supported encodings" or false if it cannot be detected.

var sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS
var detectedEncoding = Encoding.detect(sjisArray);
console.log('Encoding is ' + detectedEncoding); // 'Encoding is SJIS'

Example of specifying the character encoding to be detected. If the second argument encodings is specified, returns true when it is the specified character encoding, false otherwise.

var sjisArray = [130, 168, 130, 205, 130, 230];
var isSJIS = Encoding.detect(sjisArray, 'SJIS');
if (isSJIS) {
  console.log('Encoding is SJIS');
}
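To give a feel for what a single-encoding check involves, here is a minimal sketch that tests whether bytes are well-formed UTF-8 using the standard TextDecoder with the fatal option. This is not how encoding.js implements detect, and isValidUtf8 is a hypothetical helper name introduced only for this example.

```javascript
// Hypothetical helper, not part of encoding.js:
// returns true when the bytes form well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    // With { fatal: true }, decode() throws a TypeError on malformed input.
    new TextDecoder('utf-8', { fatal: true }).decode(Uint8Array.from(bytes));
    return true;
  } catch (err) {
    return false;
  }
}

// 'こんにちは' array in UTF-8
const utf8Data = [227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175];
// 'おはよ' array in SJIS
const sjisData = [130, 168, 130, 205, 130, 230];

console.log(isValidUtf8(utf8Data)); // true
console.log(isValidUtf8(sjisData)); // false
```

A check like this only answers a yes/no question for one encoding, which is similar in spirit to passing the second argument to detect.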

Convert character encoding (convert)

{Array|TypedArray|string} Encoding.convert ( data, to_encoding [, from_encoding ] )
Converts character encoding.
@param {Array|TypedArray|Buffer|string} data The target data.
@param {string|Object} to_encoding The encoding name of conversion destination, or option to convert as an object.
@param {string|Array} [from_encoding] (Optional) The encoding name of the source or 'AUTO'.
@return {Array|TypedArray|string} Return the converted array/string.

Example of converting a character code array to Shift_JIS from UTF-8.

var utf8Array = [227, 129, 130]; // "あ" in UTF-8
var sjisArray = Encoding.convert(utf8Array, 'SJIS', 'UTF8');
console.log(sjisArray); // [130, 160] ("あ" in SJIS)

A TypedArray such as Uint8Array, or a Node.js Buffer, can be converted in the same way.

var utf8Array = new Uint8Array([227, 129, 130]);
Encoding.convert(utf8Array, 'SJIS', 'UTF8');

Converts character encoding by auto-detecting the encoding name of the source.

// The character encoding is automatically detected when the from_encoding argument is omitted
var utf8Array = [227, 129, 130];
var sjisArray = Encoding.convert(utf8Array, 'SJIS');

// Or explicitly specify 'AUTO' to auto-detect
sjisArray = Encoding.convert(utf8Array, 'SJIS', 'AUTO');

Specify conversion options to the argument `to_encoding` as an object

You can specify the second argument to_encoding as an object for improving readability.

var sjisArray = Encoding.convert(utf8Array, {
  to: 'SJIS', // to_encoding
  from: 'UTF8' // from_encoding
});

Specify the return type by the `type` option

convert returns an array by default, but you can change the return type by specifying the type option. Also, if the argument data is passed as a string and the type option is not specified, then type: 'string' is assumed (the result is returned as a string).

var sjisArray = [130, 168, 130, 205, 130, 230]; // 'おはよ' array in SJIS
var unicodeString = Encoding.convert(sjisArray, {
  to: 'UNICODE',
  from: 'SJIS',
  type: 'string' // Specify 'string' to return as string
});
console.log(unicodeString); // 'おはよ'

The following type options are supported

string : Return as a string
arraybuffer : Return as an ArrayBuffer (Uint16Array)
array : Return as an Array (default)

Replace to HTML entity (Numeric character reference) when cannot be represented

Characters that cannot be represented in the target character set are replaced with '?' (U+003F) by default but can be replaced with HTML entities by specifying the fallback option.

The fallback option supports the following values.

html-entity : Replace with an HTML entity (decimal HTML numeric character reference)
html-entity-hex : Replace with an HTML entity (hexadecimal HTML numeric character reference)

Example of specifying { fallback: 'html-entity' } option

var unicodeArray = Encoding.stringToCode('寿司🍣ビール🍺');
// No fallback specified
var sjisArray = Encoding.convert(unicodeArray, {
  to: 'SJIS',
  from: 'UNICODE'
});
console.log(sjisArray); // Converted to a code array of '寿司?ビール?'

// Specify `fallback: html-entity`
sjisArray = Encoding.convert(unicodeArray, {
  to: 'SJIS',
  from: 'UNICODE',
  fallback: 'html-entity'
});
console.log(sjisArray); // Converted to a code array of '寿司&#127843;ビール&#127866;'

Example of specifying { fallback: 'html-entity-hex' } option

var unicodeArray = Encoding.stringToCode('ホッケの漢字は𩸽');
var sjisArray = Encoding.convert(unicodeArray, {
  to: 'SJIS',
  from: 'UNICODE',
  fallback: 'html-entity-hex'
});
console.log(sjisArray); // Converted to a code array of 'ホッケの漢字は&#x29e3d;'
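The shape of the two fallback formats can be sketched in plain JavaScript. The helper below is hypothetical and is not the library's implementation: it takes a caller-supplied predicate that decides which code points are representable, and for this example that predicate simply treats everything outside the Basic Multilingual Plane as unrepresentable, which is only a stand-in for a real Shift_JIS table.

```javascript
// Hypothetical helper: replaces characters rejected by `isRepresentable`
// with decimal or hexadecimal numeric character references.
function toNumericReferences(text, isRepresentable, hex = false) {
  let result = '';
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    if (isRepresentable(codePoint)) {
      result += ch;
    } else {
      result += hex ? `&#x${codePoint.toString(16)};` : `&#${codePoint};`;
    }
  }
  return result;
}

// Stand-in rule for this sketch only: anything outside the BMP is "unrepresentable".
const inBmp = (codePoint) => codePoint <= 0xffff;

console.log(toNumericReferences('寿司🍣ビール🍺', inBmp));
// '寿司&#127843;ビール&#127866;'
console.log(toNumericReferences('ホッケの漢字は𩸽', inBmp, true));
// 'ホッケの漢字は&#x29e3d;'
```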

Specify BOM in UTF-16

You can add a BOM (byte order mark) by specifying the bom option when converting to UTF16. The default is no BOM.

var utf16Array = Encoding.convert(utf8Array, {
  to: 'UTF16', // to_encoding
  from: 'UTF8', // from_encoding
  bom: true // Add BOM
});

UTF16 byte order is big-endian by default. If you want to convert as little-endian, specify the { bom: 'LE' } option.

var utf16leArray = Encoding.convert(utf8Array, {
  to: 'UTF16', // to_encoding
  from: 'UTF8', // from_encoding
  bom: 'LE' // With BOM (little-endian)
});

If you do not need BOM, use UTF16BE or UTF16LE. UTF16BE is big-endian, and UTF16LE is little-endian, and both have no BOM.

var utf16beArray = Encoding.convert(utf8Array, {
  to: 'UTF16BE',
  from: 'UTF8'
});
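For readers unfamiliar with what the BOM and byte order change in the output, the following sketch encodes a string to UTF-16 bytes by hand. encodeUtf16 and its option names are hypothetical and chosen for this illustration; they do not mirror the option names of Encoding.convert.

```javascript
// Hypothetical helper: encodes a string as UTF-16 bytes, optionally with a BOM.
function encodeUtf16(text, { littleEndian = false, bom = false } = {}) {
  const bytes = [];
  if (bom) {
    // U+FEFF serialized in the chosen byte order.
    bytes.push(...(littleEndian ? [0xff, 0xfe] : [0xfe, 0xff]));
  }
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i); // one UTF-16 code unit
    const high = unit >> 8;
    const low = unit & 0xff;
    bytes.push(...(littleEndian ? [low, high] : [high, low]));
  }
  return bytes;
}

console.log(encodeUtf16('あ')); // [48, 66] (big-endian, no BOM)
console.log(encodeUtf16('あ', { bom: true })); // [254, 255, 48, 66]
console.log(encodeUtf16('あ', { bom: true, littleEndian: true })); // [255, 254, 66, 48]
```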

URL Encode/Decode

{string} Encoding.urlEncode ( data )
URL(percent) encode.
@param {Array|TypedArray} data Target data.
@return {string} Return the encoded string.
{Array} Encoding.urlDecode ( string )
URL(percent) decode.
@param {string} string Target data.
@return {Array} Return the decoded array.

var sjisArray = [130, 177, 130, 241, 130, 201, 130, 191, 130, 205];
var encoded = Encoding.urlEncode(sjisArray);
console.log(encoded); // '%82%B1%82%F1%82%C9%82%BF%82%CD'

var decoded = Encoding.urlDecode(encoded);
console.log(decoded); // [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]
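The idea behind percent-encoding a byte array can be sketched as follows. These two helpers are hypothetical and written only for illustration; in particular, this sketch encodes every byte, which is an assumption of the example and may differ from how Encoding.urlEncode treats ASCII characters.

```javascript
// Hypothetical helper: writes every byte as %XX (uppercase hex).
function percentEncodeBytes(bytes) {
  return Array.from(bytes, (b) => '%' + b.toString(16).toUpperCase().padStart(2, '0')).join('');
}

// Hypothetical helper: turns %XX sequences back into byte values.
// Characters that are not part of a %XX sequence are kept as their char code.
function percentDecodeToBytes(encoded) {
  const bytes = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === '%') {
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits
    } else {
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return bytes;
}

const sjisBytes = [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]; // 'こんにちは' in SJIS
const percentEncoded = percentEncodeBytes(sjisBytes);
console.log(percentEncoded); // '%82%B1%82%F1%82%C9%82%BF%82%CD'
console.log(percentDecodeToBytes(percentEncoded)); // [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]
```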

Base64 Encode/Decode

{string} Encoding.base64Encode ( data )
Base64 encode.
@param {Array|TypedArray} data Target data.
@return {string} Return the Base64 encoded string.
{Array} Encoding.base64Decode ( string )
Base64 decode.
@param {string} string Target data.
@return {Array} Return the Base64 decoded array.

var sjisArray = [130, 177, 130, 241, 130, 201, 130, 191, 130, 205];
var encoded = Encoding.base64Encode(sjisArray);
console.log(encoded); // 'grGC8YLJgr+CzQ=='

var decoded = Encoding.base64Decode(encoded);
console.log(decoded); // [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]
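For comparison, in Node.js the same byte-array round trip can be done with the built-in Buffer, without encoding.js. This is only a sketch of an equivalent approach, not a description of how the library implements base64Encode.

```javascript
// Node.js only: Buffer is a built-in global.
const sjisBytes = [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]; // 'こんにちは' in SJIS

const base64Text = Buffer.from(sjisBytes).toString('base64');
console.log(base64Text); // 'grGC8YLJgr+CzQ=='

const roundTripped = Array.from(Buffer.from(base64Text, 'base64'));
console.log(roundTripped); // [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]
```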

Code array to string conversion (codeToString/stringToCode)

{string} Encoding.codeToString ( {Array|TypedArray} data )
Joins a character code array to string.
{Array} Encoding.stringToCode ( {string} string )
Splits string to an array of character codes.
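What "joining" and "splitting" mean here can be sketched in plain JavaScript. The two functions below are hypothetical stand-ins, not the library's code; they show that the array holds UTF-16 code units, so a character outside the Basic Multilingual Plane occupies two entries (a surrogate pair).

```javascript
// Hypothetical stand-in for splitting a string into UTF-16 code units.
function stringToCodeUnits(text) {
  const codes = [];
  for (let i = 0; i < text.length; i++) {
    codes.push(text.charCodeAt(i));
  }
  return codes;
}

// Hypothetical stand-in for joining UTF-16 code units into a string.
function codeUnitsToString(codes) {
  let text = '';
  for (const code of codes) {
    text += String.fromCharCode(code);
  }
  return text;
}

const codes = stringToCodeUnits('あ🍣');
console.log(codes); // [12354, 55356, 57187] (🍣 is a surrogate pair)
console.log(codeUnitsToString(codes)); // 'あ🍣'
```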

Japanese Zenkaku/Hankaku conversion

{Array|string} Encoding.toHankakuCase ( {Array|string} data )
Convert the zenkaku (full-width) symbols and alphanumeric characters to the hankaku (half-width, ASCII) symbols and alphanumeric characters.
{Array|string} Encoding.toZenkakuCase ( {Array|string} data )
Convert the hankaku (half-width, ASCII) symbols and alphanumeric characters to the zenkaku (full-width) symbols and alphanumeric characters.
{Array|string} Encoding.toHiraganaCase ( {Array|string} data )
Convert to the zenkaku hiragana from the zenkaku katakana.
{Array|string} Encoding.toKatakanaCase ( {Array|string} data )
Convert to the zenkaku katakana from the zenkaku hiragana.
{Array|string} Encoding.toHankanaCase ( {Array|string} data )
Convert to the hankaku katakana from the zenkaku katakana.
{Array|string} Encoding.toZenkanaCase ( {Array|string} data )
Convert to the zenkaku katakana from the hankaku katakana.
{Array|string} Encoding.toHankakuSpace ( {Array|string} data )
Convert the full-width (ideographic) space (U+3000) to the half-width space (U+0020).
{Array|string} Encoding.toZenkakuSpace ( {Array|string} data )
Convert the half-width space (U+0020) to the full-width (ideographic) space (U+3000).
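Several of these conversions come down to shifting code points by a fixed offset, because the relevant Unicode blocks are laid out in parallel. The sketch below illustrates that idea for string input with three hypothetical helpers; it is not the library's implementation and does not cover hankaku katakana, which needs a lookup table.

```javascript
// Full-width ASCII variants (U+FF01-U+FF5E) sit 0xFEE0 above ASCII (U+0021-U+007E).
function toHalfWidthAscii(text) {
  return text.replace(/[\uff01-\uff5e]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - 0xfee0));
}

function toFullWidthAscii(text) {
  return text.replace(/[\u0021-\u007e]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) + 0xfee0));
}

// Katakana (U+30A1-U+30F6) sits 0x60 above hiragana (U+3041-U+3096).
function hiraganaToKatakana(text) {
  return text.replace(/[\u3041-\u3096]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) + 0x60));
}

console.log(toHalfWidthAscii('ＡＢＣ１２３')); // 'ABC123'
console.log(toFullWidthAscii('abc!')); // 'ａｂｃ！'
console.log(hiraganaToKatakana('こんにちは')); // 'コンニチハ'
```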

Other examples

Example using the XMLHttpRequest and Typed arrays (Uint8Array)

This sample reads the text file written in Shift_JIS as binary data, and displays a string that is converted to Unicode by Encoding.convert.

var req = new XMLHttpRequest();
req.open('GET', '/my-shift_jis.txt', true);
req.responseType = 'arraybuffer';

req.onload = function (event) {
  var buffer = req.response;
  if (buffer) {
    // Shift_JIS Array
    var sjisArray = new Uint8Array(buffer);

    // Convert encoding to UNICODE (JavaScript Unicode Array).
    var unicodeArray = Encoding.convert(sjisArray, {
      to: 'UNICODE',
      from: 'SJIS'
    });

    // Join to string.
    var unicodeString = Encoding.codeToString(unicodeArray);
    console.log(unicodeString);
  }
};

req.send(null);

Convert encoding for file using the File APIs

This sample reads a file using the File APIs, detects its character encoding, converts it to Unicode, and displays the result.

<input type="file" id="file">
<div id="encoding"></div>
<textarea id="result" rows="5" cols="80"></textarea>

<script>
function onFileSelect(event) {
  var file = event.target.files[0];

  var reader = new FileReader();
  reader.onload = function(e) {
    var codes = new Uint8Array(e.target.result);
    var encoding = Encoding.detect(codes);
    document.getElementById('encoding').textContent = encoding;

    // Convert encoding to unicode
    var unicodeString = Encoding.convert(codes, {
      to: 'UNICODE',
      from: encoding,
      type: 'string'
    });
    document.getElementById('result').value = unicodeString;
  };

  reader.readAsArrayBuffer(file);
}

document.getElementById('file').addEventListener('change', onFileSelect, false);
</script>

Demo

Contributing

We welcome contributions from everyone. For bug reports and feature requests, please create an issue on GitHub.

Pull requests

Please run $ npm run test before opening a pull request to confirm there are no errors. We only accept pull requests that pass without errors.

License

MIT

2.0.0 (2022-03-29)

Features

Add fallback option to Encoding.convert. (5622bfa) Thanks #23 by @tohutohu, fallback types by @p-chan
Add Encoding.version. (bd3d6ef)

Bug Fixes

Fix deprecated Buffer constructor. (b8fda07)

Breaking Changes

Drop bower support. (981ea39)

Package last updated on 29 Mar 2022
