unicode-dragon

Eats invalid Unicode for breakfast. Replaces unmatched surrogates with U+FFFD.

latest
Source
npmnpm
Version
0.1.3
Version published
Maintainers
1
Created
Source

Unicode dragon 🐉

There's a bug in V8 which means that in rare circumstances Node strings can contain invalid Unicode.

As this is an abhorrent situation, you should use unicode-dragon to make sure that any untrusted user input is valid Unicode before saving it to your database.

Installation

You can install unicode-dragon with npm. If you're really desperate, you can also copy-paste the function out of index.js into wherever you need it.

npm install unicode-dragon

Usage

var unicodeDragon = require('unicode-dragon');
// buffer: a Node Buffer holding untrusted UTF-8 input
var validString = unicodeDragon(buffer.toString('utf-8'));

Details

V8 strings are internally UCS-2, a legacy fixed-width 16-bit encoding from the Unicode 1.0 era that can only represent code points up to U+FFFF. In Unicode 2.0, the 2048 code points U+D800 to U+DFFF were set aside as surrogates: they are never valid characters on their own, and UTF-16 uses them in pairs to represent code points above U+FFFF.
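To make the surrogate-pair arithmetic concrete, here is a minimal sketch that splits a code point above U+FFFF into its two UTF-16 code units. The helper name `toSurrogatePair` is hypothetical and is not part of unicode-dragon. For the dragon emoji, U+1F409 becomes the pair U+D83D U+DC09.

```javascript
// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into a UTF-16 high (lead) surrogate and low (trail) surrogate.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('code point must be in U+10000..U+10FFFF');
  }
  const v = codePoint - 0x10000;      // 20-bit offset into the supplementary planes
  const high = 0xD800 + (v >> 10);    // top 10 bits    -> U+D800..U+DBFF
  const low = 0xDC00 + (v & 0x3FF);   // bottom 10 bits -> U+DC00..U+DFFF
  return [high, low];
}

const [hi, lo] = toSurrogatePair(0x1F409); // U+1F409 DRAGON
console.log(hi.toString(16), lo.toString(16)); // d83d dc09
console.log(String.fromCharCode(hi, lo) === '\uD83D\uDC09'); // true
```

Each surrogate carries 10 bits of the offset, which is why 1024 high surrogates times 1024 low surrogates covers exactly the 1,048,576 code points above U+FFFF.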

That said, V8's UTF-8 codecs can handle characters above U+FFFF as though the internal strings were UTF-16. This is awesome, but there's a small mistake: V8 will accidentally accept invalid UTF-8 that happens to encode one of the reserved surrogate code points (for example the bytes ED A0 80, which encode U+D800) and store that surrogate in the string.
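The three-byte UTF-8 bit pattern (1110xxxx 10xxxxxx 10xxxxxx) can be applied mechanically to any 16-bit value, surrogates included, and that is where the invalid byte sequences come from. The sketch below uses a hypothetical helper, `encodeThreeBytes`, to show that U+D800 comes out as ED A0 80. RFC 3629 forbids encoding surrogates, so a strict decoder must reject these bytes.

```javascript
// Hypothetical helper: apply the 3-byte UTF-8 bit pattern to a 16-bit value
// in U+0800..U+FFFF. It deliberately does NOT reject surrogates, so that it
// can show what the invalid byte sequences look like.
function encodeThreeBytes(unit) {
  return [
    0xE0 | (unit >> 12),          // 1110xxxx: top 4 bits
    0x80 | ((unit >> 6) & 0x3F),  // 10xxxxxx: middle 6 bits
    0x80 | (unit & 0x3F),         // 10xxxxxx: low 6 bits
  ];
}

// Format a byte array as space-separated uppercase hex.
const hex = (bytes) =>
  bytes.map((b) => b.toString(16).toUpperCase().padStart(2, '0')).join(' ');

// A lone high surrogate: forbidden in UTF-8.
console.log(hex(encodeThreeBytes(0xD800))); // ED A0 80

// The dragon's surrogate pair encoded one code unit at a time: this is CESU-8.
// Proper UTF-8 for U+1F409 is the four bytes F0 9F 90 89.
console.log(hex([...encodeThreeBytes(0xD83D), ...encodeThreeBytes(0xDC09)])); // ED A0 BD ED B0 89
```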

If the input contained a valid surrogate pair, this is fine: the UCS-2 buffer is now valid UTF-16, so the encoder will re-encode it as valid UTF-8. (UTF-8 that encodes the two surrogates of a pair separately is more properly known as CESU-8, and is a frowned-upon hack.) It is definitely not fine, however, when the surrogates appear on their own or incorrectly paired, because the UTF-8 encoder will then write the invalid surrogate straight back out.
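The repair can be sketched as a single pass over UTF-16 code units: keep a high surrogate only when a low surrogate immediately follows it, and replace every other surrogate with U+FFFD. This is an illustrative reimplementation based on the package description, not the package's actual index.js; the name `replaceLoneSurrogates` is hypothetical.

```javascript
// Hypothetical sketch (not unicode-dragon's index.js): replace every
// unpaired or out-of-order surrogate with U+FFFD REPLACEMENT CHARACTER.
function replaceLoneSurrogates(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    if (c >= 0xD800 && c <= 0xDBFF) {
      // High surrogate: valid only if immediately followed by a low surrogate.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : -1;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        out += str[i] + str[i + 1]; // keep the well-formed pair
        i++;                        // skip the low surrogate just copied
      } else {
        out += '\uFFFD';
      }
    } else if (c >= 0xDC00 && c <= 0xDFFF) {
      // Low surrogate with no high surrogate before it.
      out += '\uFFFD';
    } else {
      out += str[i];
    }
  }
  return out;
}

console.log(replaceLoneSurrogates('a\uD800b') === 'a\uFFFDb');         // true
console.log(replaceLoneSurrogates('\uD83D\uDC09') === '\uD83D\uDC09'); // true: valid pair kept
console.log(replaceLoneSurrogates('\uDC09\uD83D') === '\uFFFD\uFFFD'); // true: wrong order
```

Engines that implement ES2024 also provide `String.prototype.isWellFormed()` and `String.prototype.toWellFormed()`, which check for lone surrogates and replace them with U+FFFD in the same way.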

If that makes your brain hurt, stay safe and use unicode-dragon to guard against invalid user input.

Meta-fu

unicode-dragon is licensed under the MIT licence. Bug reports and contributions are welcome.


Package last updated on 09 Sep 2013