UTF-safe string operations for Javascript.
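Javascript strings work great for holding text in English and other Latin-based languages, but they fall short when it comes to characters in Unicode's "astral" planes (everything above U+FFFF), which include thousands of rarer Chinese characters as well as many emoji.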
Consider this Javascript code. What number does `len` contain?
```javascript
var str = '𤔣';
var len = str.length;
```
If you said `1`, you're clearly a hopeful idealist. In fact, `len` contains `2`. To explain why this is, we need to understand a few things about the Unicode standard.
Unicode isn't all that complicated. It's just a huge series of numbers, called "codepoints," one for each logical character. Unicode includes character sets for ideographic languages like Chinese, nearly 1,800 emojis, and characters for scripts like Cherokee, Amharic, Greek, and Georgian (just to name a few). The standard has room for 1,114,112 codepoints (U+0000 through U+10FFFF), and well over 100,000 of them are already assigned to characters.
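To make the idea of codepoints concrete, here is a small sketch that prints the number behind each character in the usual U+XXXX notation. It uses only standard ES2015 built-ins (`codePointAt`, `String.fromCodePoint`, `Array.from`), not UtfString.

```javascript
// A Latin letter, an accented letter, and an emoji (U+1F600 GRINNING FACE).
const sample = 'A\u00E9\u{1F600}';

// Array.from walks the string one codepoint at a time;
// codePointAt(0) returns each character's codepoint as a plain number.
const codepoints = Array.from(sample, (ch) =>
  'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
);

console.log(codepoints); // prints [ 'U+0041', 'U+00E9', 'U+1F600' ]

// String.fromCodePoint goes the other way, from numbers back to text.
console.log(String.fromCodePoint(0x41, 0xe9, 0x1f600) === sample); // prints true
```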
Encoding is the process of converting Unicode codepoints into binary data that can be written or transmitted by a computer system. Javascript strings are encoded in UTF-16, meaning they are stored as a sequence of 16-bit "code units," each 2 bytes wide (there are 8 bits per byte). The problem is that not every Unicode codepoint fits in a single code unit: 2^16 is only 65,536, not nearly enough space to represent every codepoint up to U+10FFFF.
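To mitigate this problem, Javascript (as well as other languages and platforms that use UTF-16 encoding) makes use of what are called "surrogate pairs." A surrogate pair is two 16-bit code units, a "high" surrogate (0xD800 through 0xDBFF) followed by a "low" surrogate (0xDC00 through 0xDFFF), that together represent a single logical character. Together they are 4 bytes wide. Each half carries 10 bits of the codepoint, so a pair can represent 2^20 = 1,048,576 codepoints (U+10000 through U+10FFFF), which covers every character that doesn't fit in a single code unit.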
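The arithmetic behind surrogate pairs is short enough to sketch. The function below is a hypothetical helper, not part of UtfString; it follows the standard UTF-16 scheme of subtracting 0x10000 and then splitting the remaining 20 bits between the two halves.

```javascript
// Hypothetical helper (not UtfString's API): split a codepoint above U+FFFF
// into its UTF-16 high and low surrogates.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError('Only codepoints U+10000 through U+10FFFF need a surrogate pair');
  }
  // Subtract 0x10000 so the remaining value fits in 20 bits.
  const offset = codePoint - 0x10000;
  // The top 10 bits go into the high surrogate, the bottom 10 bits into the low one.
  const high = 0xd800 + (offset >> 10);
  const low = 0xdc00 + (offset & 0x3ff);
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f600); // U+1F600 GRINNING FACE
console.log(high.toString(16), low.toString(16)); // prints "d83d de00"
console.log(String.fromCharCode(high, low) === '\u{1F600}'); // prints true
```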
Unfortunately, that's where the good news ends. Javascript's built-in string operations still count each 16-bit code unit as a character, meaning any character made up of a surrogate pair looks like two characters to Javascript instead of just one. That's why `len` contains `2` in the example above.
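Engines that support ES2015 do expose a few built-ins that understand surrogate pairs. This sketch uses only standard language features, not UtfString, to contrast `.length` with a codepoint count for the same string.

```javascript
// The string from the example above: one logical character above U+FFFF.
const str = '𤔣';

// .length counts UTF-16 code units, so the surrogate pair counts twice.
console.log(str.length); // prints 2

// The string iterator (used by spread and Array.from) walks codepoints,
// so it keeps the surrogate pair together.
console.log([...str].length); // prints 1

// codePointAt(0) reads the whole pair; charCodeAt(0) only sees the high surrogate.
const cp = str.codePointAt(0);
console.log(cp > 0xffff); // prints true
const firstUnit = str.charCodeAt(0);
console.log(firstUnit >= 0xd800 && firstUnit <= 0xdbff); // prints true
```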
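Javascript's inability to correctly count surrogate pairs means a bunch of its string operations aren't safe to perform on astral characters. This includes such favorites as `indexOf`, `slice`, and `substr`: their indices count code units rather than characters, so they can report unexpected positions or cut a surrogate pair in half.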
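Here is a short demonstration of both problems using plain built-in methods. The `toCodePointIndex` helper is hypothetical and only meant to show one way of translating a code-unit index back into a character index; it is not how UtfString is implemented.

```javascript
// An astral character (U+1F600) followed by plain ASCII.
const s = '\u{1F600}abc';

// indexOf reports a code-unit offset: 'a' is the second logical character,
// but it sits at code-unit index 2 because the emoji occupies indices 0 and 1.
console.log(s.indexOf('a')); // prints 2

// slice can split the surrogate pair, leaving a lone high surrogate.
const broken = s.slice(0, 1);
console.log(broken === '\uD83D'); // prints true

// Hypothetical helper: convert a code-unit index into a codepoint index
// by counting the codepoints that precede it.
function toCodePointIndex(str, unitIndex) {
  if (unitIndex < 0) return -1; // preserve indexOf's "not found" result
  return Array.from(str.slice(0, unitIndex)).length;
}

console.log(toCodePointIndex(s, s.indexOf('a'))); // prints 1
```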
This library contains a number of UTF-safe string operations, including the ones I just mentioned. These operations respect surrogate pairs to ensure you're not caught off guard.
UtfString is designed to be used in node.js or in the browser.
In node:
```javascript
var UtfString = require('utfstring');
```
In the browser, `UtfString` will be available on `window`.
UtfString currently supports the following string operations:
charAt(String str, Integer index)
- Returns the character at the given index.
charCodeAt(String str, Integer index)
- Returns the Unicode codepoint at the given index.
fromCharCode(Integer codepoint)
- Returns the string for the given Unicode codepoint.
indexOf(String str, String searchValue, [Integer start])
- Finds the first instance of the search value within the string. Starts at an optional offset.
lastIndexOf(String str, String searchValue, [Integer start])
- Finds the last instance of the search value within the string. Starts searching backwards at an optional offset, which can be negative.
slice(String str, Integer start, Integer finish)
- Returns the characters between the two given indices.
substr(String str, Integer start, Integer length)
- Returns the characters starting at the given start index up to the start index plus the given length. Also aliased as `substring`.
length(String str)
- Returns the number of logical characters in the given string.
stringToCodePoints(String str)
- Converts a string into an array of codepoints.
codePointsToString(Array arr)
- Converts an array of codepoints into a string.
stringToBytes(String str)
- Converts a string into an array of UTF-16 bytes.
bytesToString(Array arr)
- Converts an array of UTF-16 bytes into a string.
stringToCharArray(String str)
- Converts the given string into an array of individual logical characters. Note that each entry in the returned array may be more than one UTF-16 code unit long.
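To illustrate the general approach, here is a minimal sketch of several of the operations above, assuming that "logical character" means a single codepoint. It splits the string with `Array.from`, which iterates by codepoint, and then indexes the resulting array. `CodePointOps` is a hypothetical name; this is not UtfString's actual source, and its edge-case behavior may differ from the library's.

```javascript
// Hypothetical object mirroring a subset of the operations listed above.
const CodePointOps = {
  // Array.from iterates a string by codepoint, keeping surrogate pairs intact.
  stringToCharArray(str) {
    return Array.from(str);
  },

  length(str) {
    return Array.from(str).length;
  },

  // Returns '' for out-of-range indices, like String.prototype.charAt.
  charAt(str, index) {
    const chars = Array.from(str);
    return index >= 0 && index < chars.length ? chars[index] : '';
  },

  // Returns NaN for out-of-range indices, like String.prototype.charCodeAt,
  // but yields the full codepoint rather than a single code unit.
  charCodeAt(str, index) {
    const ch = CodePointOps.charAt(str, index);
    return ch === '' ? NaN : ch.codePointAt(0);
  },

  // Array.prototype.slice treats negative and omitted arguments
  // the same way String.prototype.slice does.
  slice(str, start, finish) {
    return Array.from(str).slice(start, finish).join('');
  },

  // A negative start counts back from the end, as with String.prototype.substr.
  substr(str, start, length) {
    const chars = Array.from(str);
    const from = start < 0 ? Math.max(chars.length + start, 0) : start;
    const to = length === undefined ? chars.length : from + length;
    return chars.slice(from, to).join('');
  },

  stringToCodePoints(str) {
    return Array.from(str, (ch) => ch.codePointAt(0));
  },

  codePointsToString(arr) {
    return String.fromCodePoint(...arr);
  },
};

const sample = 'a\u{1F600}b';
console.log(sample.length); // prints 4
console.log(CodePointOps.length(sample)); // prints 3
console.log(CodePointOps.charAt(sample, 1) === '\u{1F600}'); // prints true
console.log(CodePointOps.stringToCodePoints(sample)); // prints [ 97, 128512, 98 ]
```

Converting the whole string to an array on every call keeps the sketch short but costs O(n) time and memory per operation. `codePointsToString` also spreads its input into function arguments, so very large arrays could exceed an engine's argument limit.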
Tests are written in Jasmine and can be executed via jasmine-node:

```shell
npm install -g jasmine-node
jasmine-node spec
```
Written and maintained by Cameron C. Dutro (@camertron).
Copyright 2016 Cameron Dutro, licensed under the MIT license.
1.2.0
- `var UtfString = require('utfstring')` works instead of having to do `var UtfString = require('utfstring/utfstring.js').UtfString`.