Socket

chardet

Dependencies: 0
Maintainers: 1
Versions: 27

Character encoding detector


Weekly downloads: 18,779,197 (decreased by 9.08%)

Package description

What is chardet?

The chardet npm package is a character encoding detector library that estimates the most likely encoding of a given piece of text or a file. It is based on the character detection component of the ICU (International Components for Unicode) project and is useful when handling text data that carries no encoding information.

What are chardet's main functionalities?

Detecting encoding of a text buffer

This code reads a file and uses chardet to detect the encoding of its content. The 'detect' function takes a buffer and returns the name of the most likely encoding, or null if no candidate encoding matches.

const chardet = require('chardet');
const fs = require('fs');

fs.readFile('/path/to/file', (err, data) => {
  if (err) throw err;
  const encoding = chardet.detect(data);
  console.log(encoding);
});
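The detected name is usually only a step toward getting the text itself. A minimal sketch, assuming a runtime with the WHATWG TextDecoder (a global in browsers and in Node.js 11+): the hypothetical decodeDetected helper below turns the name returned by detect into a string, and returns null when nothing was detected or the runtime cannot decode that encoding.

```javascript
// Hypothetical helper (not part of chardet): decode bytes using the encoding
// name returned by chardet.detect().
function decodeDetected(bytes, encodingName) {
  if (!encodingName) return null; // detect() found no candidate

  let decoder;
  try {
    // TextDecoder labels are case-insensitive, so 'UTF-8' and 'utf-8' both work.
    decoder = new TextDecoder(encodingName);
  } catch (err) {
    // Unsupported label, e.g. UTF-32, which the WHATWG Encoding Standard omits.
    return null;
  }
  return decoder.decode(bytes);
}
```

For example, decodeDetected(data, chardet.detect(data)) inside the readFile callback above. Which legacy encodings (Shift_JIS, windows-1251, and so on) TextDecoder accepts depends on the runtime's ICU support, so a null result should be treated as "decode some other way", not as a detection failure.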

Detecting encoding with confidence

This code creates a buffer from a string and uses chardet's 'analyse' function to get an array of possible encodings along with their confidence scores.

const chardet = require('chardet');

const buffer = Buffer.from('Some text with unknown encoding');
const result = chardet.analyse(buffer); // [{ confidence, name, lang? }, ...], highest confidence first
console.log(result);

Detecting encoding of a file asynchronously

This code uses chardet's 'detectFile' function, which reads the file itself and returns a Promise that resolves to the detected encoding. The README below documents no stream-based function (such as 'detectStream'); 'detectFile' is the documented asynchronous option.

const chardet = require('chardet');

chardet
  .detectFile('/path/to/file')
  .then(encoding => console.log(encoding))
  .catch(err => console.error(err)); // e.g. the file does not exist
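chardet's README documents detection on a buffer (detect, analyse) or on a file path (detectFile, detectFileSync). When the input only arrives as a Node.js readable stream, one option is to buffer the first bytes yourself and hand them to chardet.detect. The collectSample helper below is hypothetical and assumes the stream emits Buffer chunks (no setEncoding call).

```javascript
// Hypothetical helper (not part of chardet): gather up to `limit` bytes from a
// readable stream that emits Buffer chunks, then stop reading.
function collectSample(stream, limit = 64 * 1024) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    let total = 0;

    stream.on('data', chunk => {
      chunks.push(chunk);
      total += chunk.length;
      if (total >= limit) {
        stream.destroy(); // enough bytes; release the underlying resource
        resolve(Buffer.concat(chunks).subarray(0, limit));
      }
    });
    // Streams shorter than `limit` are returned whole.
    stream.on('end', () => resolve(Buffer.concat(chunks)));
    stream.on('error', reject);
  });
}
```

Usage: collectSample(fs.createReadStream('/path/to/file')).then(sample => console.log(chardet.detect(sample))). Capping the sample trades accuracy for memory, in the same spirit as the sampleSize option described below.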

Readme


chardet

Chardet is a character encoding detection module written in pure JavaScript (TypeScript). The module uses occurrence analysis to determine the most probable encoding.

  • Packed size is only 22 KB
  • Works in all environments: Node / Browser / Native
  • Works on all platforms: Linux / Mac / Windows
  • No dependencies
  • No native code / bindings
  • 100% written in TypeScript
  • Extensive code coverage
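The README does not describe the detection algorithm beyond "occurrence analysis". As a rough illustration of one structural signal an encoding detector can use, the hypothetical function below counts well-formed and malformed UTF-8 multi-byte sequences. It is a simplified sketch (it ignores overlong forms and surrogate ranges), not chardet's actual recogniser.

```javascript
// Hypothetical, simplified UTF-8 check: count well-formed multi-byte sequences
// and malformed lead bytes. NOT chardet's implementation.
function countUtf8Sequences(bytes) {
  let valid = 0;   // well-formed multi-byte sequences
  let invalid = 0; // illegal lead bytes, or lead bytes without enough continuation bytes
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    let trail; // number of continuation bytes the lead byte announces
    if (b < 0x80) {
      i++; // ASCII is valid in UTF-8 and most single-byte charsets: no evidence either way
      continue;
    } else if ((b & 0xe0) === 0xc0) {
      trail = 1; // 110xxxxx
    } else if ((b & 0xf0) === 0xe0) {
      trail = 2; // 1110xxxx
    } else if ((b & 0xf8) === 0xf0) {
      trail = 3; // 11110xxx
    } else {
      invalid++; // stray continuation byte (10xxxxxx) or 0xF8-0xFF
      i++;
      continue;
    }
    let ok = i + trail < bytes.length; // the whole sequence must fit in the input
    for (let k = 1; ok && k <= trail; k++) {
      ok = (bytes[i + k] & 0xc0) === 0x80; // continuation bytes look like 10xxxxxx
    }
    if (ok) {
      valid++;
      i += trail + 1;
    } else {
      invalid++;
      i++;
    }
  }
  return { valid, invalid };
}
```

Many valid sequences and no invalid ones point strongly to UTF-8, while a lone windows-1252 'é' (byte 0xE9) followed by ASCII counts as invalid. Pure ASCII yields zero of both, which is why such input is ambiguous between UTF-8 and single-byte encodings.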

Installation

npm i chardet

Usage

To return the encoding with the highest confidence:

const chardet = require('chardet');

chardet.detect(Buffer.from('hello there!')); // synchronous, on a buffer
// or
chardet.detectFile('/path/to/file').then(encoding => console.log(encoding)); // asynchronous, returns a Promise
// or
chardet.detectFileSync('/path/to/file'); // synchronous, reads the file

To return the full list of possible encodings, use the analyse method:

const chardet = require('chardet');
chardet.analyse(Buffer.from('hello there!'));

The returned value is an array of objects sorted by confidence in descending order, for example:

[
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' }
];
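Callers often want a single answer but only when it is reasonably certain. A minimal sketch of a hypothetical pickEncoding helper that consumes an analyse-style array (objects with confidence, name and optional lang); the default threshold of 50 is an arbitrary assumption for illustration, not a value from chardet.

```javascript
// Hypothetical helper (not part of chardet): return the best candidate's name,
// or null if its confidence is below `minConfidence` (an arbitrary cut-off).
function pickEncoding(matches, minConfidence = 50) {
  // analyse() already sorts in descending order; sorting a copy keeps the
  // helper correct for unsorted input without mutating the caller's array.
  const best = [...matches].sort((a, b) => b.confidence - a.confidence)[0];
  return best && best.confidence >= minConfidence ? best.name : null;
}
```

Applied to the example array above, pickEncoding returns 'UTF-8' with the default threshold and null with a threshold of 95.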

Working with large data sets

When the data set is large and you want to optimize performance at the cost of some accuracy, you can sample only the first N bytes of the file:

chardet
  .detectFile('/path/to/file', { sampleSize: 32 })
  .then(encoding => console.log(encoding));
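To make the idea of sampling concrete, here is a hypothetical readSample helper that reads at most N bytes from the start of a file with Node's fs.readSync. It illustrates the concept behind the sampleSize option and is an assumption, not chardet's implementation; the resulting buffer can be passed to chardet.detect.

```javascript
const fs = require('fs');

// Hypothetical helper (not chardet's API): read at most `sampleSize` bytes
// from the beginning of a file.
function readSample(filePath, sampleSize) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const buffer = Buffer.alloc(sampleSize);
    // Read from file position 0 into the start of `buffer`.
    const bytesRead = fs.readSync(fd, buffer, 0, sampleSize, 0);
    return buffer.subarray(0, bytesRead); // files shorter than the sample yield fewer bytes
  } finally {
    fs.closeSync(fd); // always release the descriptor, even if the read throws
  }
}
```

A sample can end in the middle of a multi-byte character (for example a UTF-8 or Shift_JIS sequence), and text that is plain ASCII at the start may contain non-ASCII bytes later; both are reasons small samples lower accuracy.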

Supported Encodings:

  • UTF-8
  • UTF-16 LE
  • UTF-16 BE
  • UTF-32 LE
  • UTF-32 BE
  • ISO-2022-JP
  • ISO-2022-KR
  • ISO-2022-CN
  • Shift_JIS
  • Big5
  • EUC-JP
  • EUC-KR
  • GB18030
  • ISO-8859-1
  • ISO-8859-2
  • ISO-8859-5
  • ISO-8859-6
  • ISO-8859-7
  • ISO-8859-8
  • ISO-8859-9
  • windows-1250
  • windows-1251
  • windows-1252
  • windows-1253
  • windows-1254
  • windows-1255
  • windows-1256
  • KOI8-R

Currently only these encodings are supported.
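Several of the Unicode encodings in this list can announce themselves with a byte order mark (BOM) at the start of the data. As an illustration only (not chardet's code), a hypothetical sniffBom function could recognise these marks before any statistical analysis; the returned names are chosen by this sketch and may not match chardet's exact spelling.

```javascript
// Hypothetical helper (not part of chardet): identify a byte order mark, if any.
function sniffBom(bytes) {
  // True when the input begins with the given byte signature.
  const startsWith = (...sig) => sig.every((value, i) => bytes[i] === value);

  // UTF-32 LE must be tested before UTF-16 LE: both begin with FF FE.
  if (startsWith(0xff, 0xfe, 0x00, 0x00)) return 'UTF-32LE';
  if (startsWith(0x00, 0x00, 0xfe, 0xff)) return 'UTF-32BE';
  if (startsWith(0xef, 0xbb, 0xbf)) return 'UTF-8';
  if (startsWith(0xff, 0xfe)) return 'UTF-16LE';
  if (startsWith(0xfe, 0xff)) return 'UTF-16BE';
  return null; // no BOM: the encoding must be inferred some other way
}
```

Most files, especially UTF-8 ones, carry no BOM, so null means "unknown", not "not Unicode". A UTF-16 LE file whose first character is U+0000 looks exactly like a UTF-32 LE BOM, which is why the longer signature is checked first.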

TypeScript?

Yes. Type definitions are included.

References

  • ICU project http://site.icu-project.org/


Last updated on 19 Oct 2021

Made with ⚡️ by Socket Inc