
node-icu-charset-detector
Character set detection is the process of determining the character set, or encoding, of character data in an unknown format.
A simple binding of ICU character set detection (http://userguide.icu-project.org/conversion/detection) for Node.js.
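To make the idea of detection concrete, here is a deliberately simplistic sketch that only inspects a byte order mark (BOM). It is not how ICU works (ICU analyzes the content of the data itself), and the function name `detectBom` is hypothetical and not part of this module.

```javascript
// Illustration only: guess an encoding from a leading byte order mark.
// UTF-32 BOMs are ignored here, so a UTF-32LE file would be misreported.
function detectBom(buffer) {
  if (buffer.length >= 3 &&
      buffer[0] === 0xef && buffer[1] === 0xbb && buffer[2] === 0xbf) {
    return "UTF-8";
  }
  if (buffer.length >= 2 && buffer[0] === 0xff && buffer[1] === 0xfe) {
    return "UTF-16LE";
  }
  if (buffer.length >= 2 && buffer[0] === 0xfe && buffer[1] === 0xff) {
    return "UTF-16BE";
  }
  return null; // no BOM: the content itself would have to be analyzed
}

console.log(detectBom(Buffer.from([0xef, 0xbb, 0xbf, 0x41])));
console.log(detectBom(Buffer.from("plain ascii")));
```

Most files carry no BOM at all, which is why a content-based detector such as ICU's is needed in practice.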
First, install libicu on your system (see the platform-specific commands below).
Then install node-icu-charset-detector from npm.
npm install node-icu-charset-detector
Debian (Ubuntu)
apt-get install libicu-dev
Gentoo
emerge icu
Fedora/CentOS
yum install libicu-devel
MacPorts
port install icu +devel
Homebrew
brew install icu4c
brew link icu4c --force
If Homebrew has trouble installing version 50.1 of icu4c, try the following:
brew search icu4c
brew tap homebrew/versions
brew versions icu4c
cd $(brew --prefix) && git pull --rebase
git checkout c25fd2f $(brew --prefix)/Library/Formula/icu4c.rb
brew install icu4c
curl -O http://download.icu-project.org/files/icu4c/52.1/icu4c-52_1-src.tgz
tar xzvf icu4c-52_1-src.tgz
cd icu/source
chmod +x runConfigureICU configure install-sh
./runConfigureICU MacOSX
make
sudo make install
xcode-select --install
node-icu-charset-detector provides a function detectCharset(buffer), where buffer is an instance of Buffer whose charset should be detected.
var fs = require("fs");
var charsetDetector = require("node-icu-charset-detector");

var buffer = fs.readFileSync("/path/to/the/file");
var charset = charsetDetector.detectCharset(buffer);

console.log("charset name: " + charset.toString());
console.log("language: " + charset.language);
console.log("detection confidence: " + charset.confidence);
detectCharset(buffer) returns the detected charset name for buffer. The returned charset name has two extra properties, language and confidence:
charset.language: the language detected for buffer
charset.confidence: the confidence of the detection of charset
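One way to use the confidence property is to fall back to a default charset when detection is uncertain. The sketch below assumes confidence is a number where higher means more certain (ICU itself reports values from 0 to 100; that this binding passes them through unchanged is an assumption). The helper `pickCharset` and the stand-in result object are hypothetical, so the example runs without the native module.

```javascript
// Hypothetical helper, not part of node-icu-charset-detector.
// `charset` is expected to look like the value described above:
// something with toString() plus `language` and `confidence` properties.
function pickCharset(charset, minConfidence, fallback) {
  if (!charset || typeof charset.confidence !== "number") {
    return fallback;
  }
  return charset.confidence >= minConfidence ? charset.toString() : fallback;
}

// Stand-in for a detection result (assumed shape, for illustration only).
var fakeResult = {
  language: "ja",
  confidence: 42,
  toString: function () { return "Shift_JIS"; }
};

console.log(pickCharset(fakeResult, 30, "UTF-8")); // threshold met
console.log(pickCharset(fakeResult, 80, "UTF-8")); // threshold not met
```

With the real module, the first argument would be the value returned by detectCharset(buffer); a suitable threshold depends on your data.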
Since this module only detects character sets and does not convert them, you may need to use node-iconv (https://github.com/bnoordhuis/node-iconv), which can convert between a wide range of character sets.
Here is a simple example that uses node-iconv to convert character sets not supported by Node itself.
var fs = require("fs");

function bufferToString(buffer) {
  var charsetDetector = require("node-icu-charset-detector");
  var charset = charsetDetector.detectCharset(buffer).toString();
  try {
    // Works when Node supports the detected charset natively.
    return buffer.toString(charset);
  } catch (x) {
    // Otherwise convert to UTF-8 with node-iconv.
    var Iconv = require("iconv").Iconv;
    var charsetConverter = new Iconv(charset, "UTF-8");
    return charsetConverter.convert(buffer).toString();
  }
}

var buffer = fs.readFileSync("/path/to/the/file");
var bufferString = bufferToString(buffer);
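As an alternative to relying on an exception, the decision between Node's built-in decoding and an external converter can be made explicitly with Buffer.isEncoding. The following is a minimal sketch under stated assumptions: `decodeBuffer` is a hypothetical helper, and `convert` is a caller-supplied function (for example, one wrapping node-iconv) that is assumed to return a UTF-8 Buffer. A stub converter is used here so the example runs without native dependencies.

```javascript
// Hypothetical helper, not part of node-icu-charset-detector.
// `convert(buffer, fromCharset)` is assumed to return a UTF-8 encoded Buffer.
function decodeBuffer(buffer, charsetName, convert) {
  if (Buffer.isEncoding(charsetName)) {
    // Node can decode this charset itself (e.g. UTF-8, UTF-16LE, latin1).
    return buffer.toString(charsetName);
  }
  // Anything else is handed to the external converter.
  return convert(buffer, charsetName).toString("utf8");
}

// Stub standing in for a real converter, for illustration only.
function stubConvert(buffer, fromCharset) {
  return Buffer.from("[converted from " + fromCharset + "]", "utf8");
}

console.log(decodeBuffer(Buffer.from("hello", "utf8"), "UTF-8", stubConvert));
console.log(decodeBuffer(Buffer.from([0x82, 0xa0]), "Shift_JIS", stubConvert));
```

Passing the converter in as a parameter keeps the iconv dependency optional and makes the decoding path easy to test.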