chardet
Package description
The chardet npm package is a character encoding detector library, which allows you to determine the encoding of a given piece of text or a file. It is based on the character detection component of the ICU (International Components for Unicode) project and can be useful when dealing with text data that does not have encoding information.
Detecting encoding of a text buffer
This code reads a file and uses chardet to detect the encoding of its content. The 'detect' function takes a buffer and returns the name of the encoding it believes the text is in.
const chardet = require('chardet');
const fs = require('fs');
fs.readFile('/path/to/file', (err, data) => {
  if (err) throw err;
  const encoding = chardet.detect(data);
  console.log(encoding);
});
Detecting encoding with confidence
This code creates a buffer from a string and uses chardet's 'detectAll' function to get an array of possible encodings along with their confidence scores.
const chardet = require('chardet');
const buffer = Buffer.from('Some text with unknown encoding');
const result = chardet.detectAll(buffer);
console.log(result);
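Once a list of candidates is available, a caller usually has to settle on one of them. The following is a minimal sketch of that step, not part of chardet itself. It assumes each candidate is an object with a `name` string and a numeric `confidence` on a 0 to 100 scale; the helper `pickEncoding`, its threshold, and the sample data are hypothetical.

```javascript
// Hypothetical helper: choose the most confident candidate, or fall back.
// Assumes candidates look like { name: string, confidence: number }.
function pickEncoding(candidates, minConfidence = 50, fallback = 'UTF-8') {
  if (!Array.isArray(candidates) || candidates.length === 0) return fallback;
  // Copy before sorting so the caller's array is not mutated.
  const best = [...candidates].sort((a, b) => b.confidence - a.confidence)[0];
  return best.confidence >= minConfidence ? best.name : fallback;
}

// Sample data shaped like the assumed result; not real chardet output.
const sample = [
  { name: 'ISO-8859-1', confidence: 30 },
  { name: 'UTF-8', confidence: 80 },
];
console.log(pickEncoding(sample)); // UTF-8
```

Falling back to a fixed encoding when no candidate clears the threshold keeps the caller from acting on a low-confidence guess.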
Detecting encoding of a file stream
This code creates a read stream from a file and uses chardet's 'detectStream' function to detect the encoding of the streamed content asynchronously.
const chardet = require('chardet');
const fs = require('fs');
const stream = fs.createReadStream('/path/to/file');
chardet.detectStream(stream).then(encoding => {
  console.log(encoding);
});
iconv-lite is a character encoding conversion library. Unlike chardet, which detects the encoding, iconv-lite is used to convert from one encoding to another. It supports many encodings and is often used in conjunction with chardet to first detect the encoding and then convert the text.
jschardet is a port of the Python library chardet. It serves the same purpose as the chardet npm package: detecting the character encoding of text. The two differ mainly in their implementation and in the specific encodings each library supports.
The encoding npm package is another library for encoding and decoding text. It provides a simpler API for converting between encodings but does not have the detection capabilities of chardet. It's often used when the encoding is already known.
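The detect-then-convert workflow mentioned above can be sketched roughly as follows. To stay dependency-free, this sketch swaps in Node's built-in `TextDecoder` for a conversion library such as iconv-lite, and takes the detector as a parameter (for example `chardet.detect`). The helper `decodeWithDetection` and the stub detector are hypothetical, and it is an assumption that the detected name is a label `TextDecoder` accepts.

```javascript
// TextDecoder is a global in modern Node.js, so no import is needed.

// Hypothetical helper: `detect` is any function that maps a Buffer to an
// encoding name (or null), for example chardet.detect.
function decodeWithDetection(buffer, detect, fallback = 'utf-8') {
  const label = detect(buffer) || fallback;
  let decoder;
  try {
    decoder = new TextDecoder(label);
  } catch (err) {
    // The constructor throws for labels it does not support.
    decoder = new TextDecoder(fallback);
  }
  return decoder.decode(buffer);
}

// Stub detector used only for this demonstration.
const stubDetect = () => 'utf-16le';
const bytes = Buffer.from('héllo', 'utf16le');
console.log(decodeWithDetection(bytes, stubDetect)); // héllo
```

Passing the detector in keeps the decoding step independent of any one detection library, so chardet or jschardet could be substituted without changing the helper.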
Readme
Chardet is a character detection module for Node.js written in pure JavaScript. The module is based on the ICU project (http://site.icu-project.org/), which uses character occurrence analysis to determine the most probable encoding.
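To give a feel for what occurrence analysis means, here is a deliberately simplified toy. It is not chardet's or ICU's actual algorithm: it assumes English text, tries only two candidate encodings, and scores each by the share of decoded characters that are common English letters or spaces. The names `scoreDecoding` and `guessEncoding` are hypothetical.

```javascript
// Toy illustration only; this is NOT chardet's or ICU's actual algorithm.
// Characters that occur often in English text (an assumption of this sketch).
const COMMON = new Set('etaoinshrdlu ');

// Decode the bytes with one candidate encoding and return the fraction of
// characters that are in COMMON. Invalid byte sequences score 0.
function scoreDecoding(buffer, label) {
  let text;
  try {
    // fatal: true makes invalid input throw instead of yielding U+FFFD.
    text = new TextDecoder(label, { fatal: true }).decode(buffer);
  } catch (err) {
    return 0;
  }
  const chars = [...text.toLowerCase()];
  if (chars.length === 0) return 0;
  const hits = chars.filter((c) => COMMON.has(c)).length;
  return hits / chars.length;
}

// Return the candidate label with the highest score, or null if none score.
function guessEncoding(buffer, labels = ['utf-8', 'utf-16le']) {
  let best = { label: null, score: 0 };
  for (const label of labels) {
    const s = scoreDecoding(buffer, label);
    if (s > best.score) best = { label, score: s };
  }
  return best.label;
}

console.log(guessEncoding(Buffer.from('hello there', 'utf16le'))); // utf-16le
console.log(guessEncoding(Buffer.from('hello there', 'utf8'))); // utf-8
```

A real detector works from much richer statistics across many encodings and languages; the toy only shows why the decoding that yields the most plausible characters is treated as the most probable one.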
npm i chardet
var chardet = require('chardet');
// Buffer.from replaces the deprecated new Buffer() constructor
chardet.detect(Buffer.from('hello there!'));
// or
chardet.detectFile('/path/to/file', function(err, encoding) {});
// or
chardet.detectFileSync('/path/to/file');
Currently only these encodings are supported, more will be added soon.
FAQs
Character encoding detector
The npm package chardet receives a total of 16,201,880 weekly downloads. As such, its popularity is classified as popular.
We found that chardet demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.