CJK Length
Returns the length of a string, counting wide characters as two.
In CJK (Chinese, Japanese and Korean) text, "wide" or "fullwidth" characters
are Unicode characters that occupy two columns instead of one when printed in
a fixed-width font. Examples include the Japanese kana (あいうえお),
fullwidth romaji (ＡＢＣＤＥ), and kanji/hanzi ideographs (一所懸命).
Because these characters occupy two columns but count as one, a string's length
is not an accurate measure of its width in a fixed-width text environment such
as the terminal: a string containing one fullwidth character appears one column
wider than its length value indicates. This breaks tabulated layouts, for
example.
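The misalignment is easy to reproduce with the built-in String.prototype.padEnd, which pads by length rather than by visual width. A minimal sketch (the row data is made up for illustration):

```javascript
// Two table rows; the second name consists of three wide characters.
const rows = [['apple', '3'], ['りんご', '12']]

// padEnd pads to a target .length, not to a target visual width.
const padded = rows.map(([name]) => name.padEnd(8))

rows.forEach(([, qty], i) => {
  console.log(`|${padded[i]}|${qty}|`)
})

// Both padded names report the same length, yet in a fixed-width font the
// second row's name column is three columns wider than the first.
console.log(padded.map((s) => s.length))
```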
This function scans a given string for occurrences of characters from the relevant
Unicode ranges to correctly determine the string's visual length.
For a full list of the character ranges used, see the characters.js source.
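The general idea can be sketched as follows. This is not the package's actual code: the name visualLength is hypothetical, and the ranges below are a small illustrative subset chosen for this example, not the list in characters.js.

```javascript
// Illustrative ranges only (assumption): Hangul Jamo, CJK punctuation, kana,
// common CJK ideographs, fullwidth forms, and fullwidth signs.
const WIDE_CHARS = /[\u1100-\u11F9\u3000-\u303F\u3040-\u30FF\u4E00-\u9FFF\uFF01-\uFF60\uFFE0-\uFFE6]/g

// Hypothetical helper: every wide character adds one extra column on top of
// the one it already contributes to String.prototype.length.
function visualLength(str) {
  const wide = str.match(WIDE_CHARS) // array of matches, or null if none
  return str.length + (wide ? wide.length : 0)
}

console.log(visualLength('abcde'))
console.log(visualLength('あいうえお'))
console.log(visualLength('abcdeＡＢＣＤＥ'))
```

Under these assumed ranges, a string of five kana measures ten columns, and five ASCII letters followed by five fullwidth letters measure fifteen.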
Usage
To use, replace property accesses such as myString.length with function calls to cjkLength(myString):
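const cjkLength = require('cjk-length').default
const myString = 'abcdeＡＢＣＤＥ' // five ASCII letters, then five fullwidth letters
console.log(myString.length) // 10
console.log(cjkLength(myString)) // 15
// Compare the printed widths of the next three lines in a fixed-width font:
console.log(`.${myString}.`)
console.log(`.${'a'.repeat(myString.length)}.`) // narrower than the line above
console.log(`.${'a'.repeat(cjkLength(myString))}.`) // same width as the first line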
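A common place to make this swap is when padding table cells. The sketch below uses a hypothetical helper, padEndVisual, that takes the measuring function as a parameter; in real code you would pass cjkLength. The stand-in measure function here covers only kana and fullwidth forms so the example runs on its own.

```javascript
// Hypothetical helper: append spaces until measure(str) reaches width.
function padEndVisual(str, width, measure) {
  const missing = Math.max(0, width - measure(str))
  return str + ' '.repeat(missing)
}

// Stand-in for cjkLength (assumption): counts kana and fullwidth forms as two.
const measure = (s) =>
  s.length + (s.match(/[\u3040-\u30FF\uFF01-\uFF60]/g) || []).length

for (const [name, qty] of [['apple', '3'], ['りんご', '12']]) {
  console.log(`|${padEndVisual(name, 8, measure)}|${qty}|`)
}
```

Passing the measuring function in keeps the helper independent of any particular width implementation.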
If you need to process a string's wide characters in some other way, you can import
the regular expression used to match them:
const { charsRegex } = require('cjk-length')
console.log(charsRegex instanceof RegExp)
Note: charsRegex is structured like new RegExp('[\u1100-\u11F9\u3000-\u303F .. etc. \uFFE0-\uFFE6]', 'g').
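Because the expression carries the g flag, it behaves like any other global RegExp in JavaScript. The sketch below uses a stand-in named wideRegex, built from a few illustrative ranges (an assumption, not the package's full list), to show extracting and replacing wide characters, plus the lastIndex behaviour to watch for when calling test() repeatedly.

```javascript
// Stand-in for the exported charsRegex; the ranges are illustrative only.
const wideRegex = /[\u3000-\u303F\u3040-\u30FF\u4E00-\u9FFF\uFF01-\uFF60]/g

const text = 'Tokyo 東京 ＯＫ'

// match() with a global regex returns every match, or null if there are none.
const wideChars = text.match(wideRegex) || []
console.log(wideChars)

// replace() with a global regex rewrites every occurrence.
const masked = text.replace(wideRegex, '?')
console.log(masked)

// test() on a global regex advances lastIndex, so reset it before reusing
// the same RegExp object on a different string.
wideRegex.lastIndex = 0
const first = wideRegex.test('東')
wideRegex.lastIndex = 0
const second = wideRegex.test('東')
console.log(first, second)
```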
License
MIT license