node-icu-charset-detector

Simple binding for ICU charset detector


ICU Character Set Detection for Node.js

Character set detection is the process of determining the character set, or encoding, of character data in an unknown format.

A simple binding of ICU character set detection (http://userguide.icu-project.org/conversion/detection) for Node.js.

Installation

First, install libicu on your system. On Debian, it is available through apt-get.

sudo apt-get install libicu-dev

After that, install node-icu-charset-detector from npm.

npm install node-icu-charset-detector

If you prefer to install the package by hand, try the following commands.

git clone https://github.com/mooz/node-icu-charset-detector.git
cd node-icu-charset-detector
node-waf configure
node-waf build
node-waf install

Usage

Simple usage

node-icu-charset-detector provides a class, CharsetMatch, whose constructor takes an instance of Buffer as its first argument. An instance of CharsetMatch has the three methods below.

  • CharsetMatch.prototype.getName()
    • returns the name of detected character set.
  • CharsetMatch.prototype.getLanguage()
    • returns the language for detected character set.
  • CharsetMatch.prototype.getConfidence()
    • returns the confidence of detection.

Here is a simple usage example.

var fs = require("fs");
var charsetDetector = require("node-icu-charset-detector");
var CharsetMatch = charsetDetector.CharsetMatch;

// Assumption: the path of the file to inspect is passed on the command line.
var path = process.argv[2];
// Without an encoding argument, readFileSync returns the raw bytes as a Buffer.
var byteArray = fs.readFileSync(path);
var charsetMatch = new CharsetMatch(byteArray);

var detectedCharsetName = charsetMatch.getName();
var detectedLanguage = charsetMatch.getLanguage();
var detectionConfidence = charsetMatch.getConfidence();
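
The three accessors can be combined into a single result object before deciding whether to trust a detection. The following is a minimal sketch, not part of node-icu-charset-detector: summarizeMatch and its default threshold of 50 are hypothetical, and it assumes getConfidence() returns ICU's integer score from 0 to 100. A stand-in object replaces a real CharsetMatch so the sketch runs without the native module.

```javascript
// Hypothetical helper, not part of node-icu-charset-detector.
// Accepts any object exposing getName(), getLanguage() and getConfidence().
function summarizeMatch(match, minConfidence) {
  // Assumption: confidence is ICU's integer score from 0 to 100.
  var threshold = (minConfidence === undefined) ? 50 : minConfidence;
  var confidence = match.getConfidence();
  return {
    charset: match.getName(),
    language: match.getLanguage() || null, // normalise empty or undefined to null
    confidence: confidence,
    trusted: confidence >= threshold
  };
}

// Stand-in for a real CharsetMatch instance, used only for illustration.
var fakeMatch = {
  getName: function () { return "Shift_JIS"; },
  getLanguage: function () { return "ja"; },
  getConfidence: function () { return 87; }
};

var summary = summarizeMatch(fakeMatch, 60);
console.log(summary.charset, summary.language, summary.trusted);
```

With a real detection, pass the CharsetMatch instance in place of fakeMatch and pick a threshold that suits how short or ambiguous your inputs tend to be.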

Leveraging node-iconv

This binding exposes only ICU's detection feature, not its converters, so to convert between character sets you may need node-iconv (https://github.com/bnoordhuis/node-iconv), which supports a wide range of encodings.

Here is a simple example that uses node-iconv to convert character sets that Node.js does not support natively.

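var Iconv = require("iconv").Iconv;

// Decode a Buffer to a string, falling back to iconv for encodings
// that Buffer#toString does not recognize.
function bufferToString(buffer, charset) {
  try {
    return buffer.toString(charset);
  } catch (x) {
    // Node.js rejected the encoding name: convert to UTF-8 with iconv first.
    var charsetConverter = new Iconv(charset, "utf8");
    return charsetConverter.convert(buffer).toString();
  }
}

// byteArray and CharsetMatch are defined as in the simple usage example.
var charsetMatch = new CharsetMatch(byteArray);
var bufferString = bufferToString(byteArray, charsetMatch.getName());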
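
As an alternative to relying on an exception, the native-versus-fallback decision can be made explicitly with Buffer.isEncoding. This is a sketch under stated assumptions rather than the package's own approach: decodeBuffer and its fallbackConvert parameter are hypothetical names, and the fallback used here is a stand-in so that the example runs without node-iconv installed.

```javascript
// Hypothetical helper: decode natively when Node.js knows the encoding,
// otherwise delegate to a caller-supplied converter (for example one built on node-iconv).
function decodeBuffer(buffer, charset, fallbackConvert) {
  if (Buffer.isEncoding(charset)) {
    return buffer.toString(charset);
  }
  if (typeof fallbackConvert !== "function") {
    throw new Error("No converter available for charset: " + charset);
  }
  return fallbackConvert(buffer, charset);
}

// The bytes of "café" encoded as ISO-8859-1.
var latin1Bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// "latin1" is a name Node.js recognizes, so no fallback is needed.
var nativeResult = decodeBuffer(latin1Bytes, "latin1");

// "ISO-8859-1" is not a Node.js encoding name, so the fallback is called.
var fallbackCalls = [];
var fallbackResult = decodeBuffer(latin1Bytes, "ISO-8859-1", function (buf, name) {
  fallbackCalls.push(name);
  return buf.toString("latin1"); // stand-in for a real iconv conversion
});

console.log(nativeResult, fallbackResult, fallbackCalls);
```

In real use, the fallback would wrap the Iconv conversion shown above, and the charset argument would come from charsetMatch.getName().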


Package last updated on 17 May 2012
