node-icu-charset-detector
Comparing version 0.0.5 to 0.0.6
{
"name" : "node-icu-charset-detector",
"version" : "0.0.5",
"main" : "./build/Release/node-icu-charset-detector",
"version" : "0.0.6",
"main" : "./node-icu-charset-detector.js",
"description" : "Simple binding for ICU charset detector",
@@ -6,0 +6,0 @@ "keywords" : ["charset-detection", "icu"],
@@ -17,10 +17,2 @@ # ICU Character Set Detection for Node.js
If you prefer to install the package by hand, try following commands.
git clone git://github.com/mooz/node-icu-charset-detector.git
cd node-icu-charset-detector
node-waf configure
node-waf build
node-waf install
## Usage
@@ -30,35 +22,34 @@
`node-icu-charset-detector` provides a class `CharsetMatch` which takes a instance of `Buffer` for the first argument of the constructor. A instance of `CharsetMatch` has three methods below.
`node-icu-charset-detector` provides a function `detectCharset(buffer)`, where `buffer` is an instance of `Buffer` whose charset should be detected.
- `CharsetMatch.prototype.getName()`
- returns the name of detected character set.
- `CharsetMatch.prototype.getLanguage()`
- returns the language for detected character set.
- `CharsetMatch.prototype.getConfidence()`
- returns the confidence of detection.
var charsetDetector = require("node-icu-charset-detector");
Here is a simple usage of `node-icu-charset-detector`.
var charsetDetector = require("node-icu-charset-detector");
var CharsetMatch = charsetDetector.CharsetMatch;
var buffer = fs.readFileSync("/path/to/the/file");
var charset = charsetDetector.detectCharset(buffer);
var byteArray = fs.readFileSync(path);
var charsetMatch = new CharsetMatch(byteArray);
var detectedCharsetName = charsetMatch.getName();
var detectedLanguage = charsetMatch.getLanguage();
var detectionConfidence = charsetMatch.getConfidence();
console.log("charset name: " + charset.toString());
console.log("language: " + charset.language);
console.log("detection confidence: " + charset.confidence);
`detectCharset(buffer)` returns the detected charset name for `buffer`, and the returned charset name has two extra properties `language` and `confidence`:
- `charset.language`
- language name for the detected character set.
- `charset.confidence`
- confidence of the charset detection for `charset`.
### Leveraging node-iconv
Since ICU itself does not have a feature to convert character sets, you may need to use `node-iconv` (https://github.com/bnoordhuis/node-iconv) which has a powerful character sets converting feature.
Since ICU itself does not have a feature to convert character sets, you may need to use `node-iconv` (https://github.com/bnoordhuis/node-iconv), which has a powerful character sets converting feature.
Here is a simple example to leverage `node-iconv` to convert character sets which is not supported by native Node.js.
Here is a simple example to leverage `node-iconv` to convert character sets not supported by Node itself.
var Iconv = require("iconv").Iconv;
function bufferToString(buffer, charset) {
function bufferToString(buffer) {
var charsetDetector = require("node-icu-charset-detector");
var charset = charsetDetector.detectCharset(buffer).toString();
try {
return buffer.toString(charset);
} catch (x) {
var Iconv = require("iconv").Iconv;
var charsetConverter = new Iconv(charset, "utf8");
@@ -68,4 +59,4 @@ return charsetConverter.convert(buffer).toString();
}
var charsetMatch = new CharsetMatch(byteArray);
var bufferString = bufferToString(byteArray, charsetMatch.getName());
var buffer = fs.readFileSync("/path/to/the/file");
var bufferString = bufferToString(buffer);
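Review note: the README hunks above replace the `CharsetMatch` class of 0.0.5 with a `detectCharset(buffer)` function in 0.0.6. For callers that still use the old accessors, a thin adapter along the following lines might ease the upgrade. This is only a sketch: `makeCharsetMatchClass` and `fakeDetectCharset` are hypothetical names that do not exist in the package, the stand-in detector returns made-up values, and it assumes the result is a string-like object with `language` and `confidence` attached, as the new README text describes.

```javascript
// Hypothetical adapter (not part of the package): rebuilds a 0.0.5-style
// CharsetMatch class on top of a 0.0.6-style detectCharset(buffer) function.
function makeCharsetMatchClass(detectCharset) {
  function CharsetMatch(buffer) {
    // Run detection once and keep the result for the accessors below.
    this._charset = detectCharset(buffer);
  }
  CharsetMatch.prototype.getName = function () {
    // toString() yields a plain string even if the result is a String object.
    return this._charset.toString();
  };
  CharsetMatch.prototype.getLanguage = function () {
    return this._charset.language;
  };
  CharsetMatch.prototype.getConfidence = function () {
    return this._charset.confidence;
  };
  return CharsetMatch;
}

// Stand-in detector (assumption): mimics the return shape described in the
// 0.0.6 README so the sketch runs without the native module installed.
// The values are invented for illustration.
function fakeDetectCharset(buffer) {
  var charset = new String("UTF-8");
  charset.language = "en";
  charset.confidence = 80;
  return charset;
}

var CharsetMatch = makeCharsetMatchClass(fakeDetectCharset);
var match = new CharsetMatch(Buffer.from("hello"));
console.log(match.getName(), match.getLanguage(), match.getConfidence());
```

With the real module installed, passing `require("node-icu-charset-detector").detectCharset` instead of the stand-in should give the same accessor surface, provided the return value behaves as the README states.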
Sorry, the diff of this file is not supported yet