The chardet npm package is a character encoding detector library, which allows you to determine the encoding of a given piece of text or a file. It is based on the character detection component of the ICU (International Components for Unicode) project and can be useful when dealing with text data that does not have encoding information.

What are chardet's main functionalities?

Detecting encoding of a text buffer

This code reads a file and uses chardet to detect the encoding of its content. The 'detect' function takes a buffer and returns the name of the encoding it believes the text is in.

const chardet = require('chardet');
const fs = require('fs');

fs.readFile('/path/to/file', (err, data) => {
  if (err) throw err;
  const encoding = chardet.detect(data);
  console.log(encoding);
});

Detecting encoding with confidence

This code creates a buffer from a string and uses chardet's 'analyse' function to get an array of possible encodings along with their confidence scores.

const chardet = require('chardet');

const buffer = Buffer.from('Some text with unknown encoding');
const result = chardet.analyse(buffer);
console.log(result);

Detecting encoding of a file

This code passes a file path to chardet's 'detectFile' function, which reads the file and asynchronously resolves with the detected encoding.

const chardet = require('chardet');

chardet.detectFile('/path/to/file').then(encoding => {
  console.log(encoding);
});

chardet

Chardet is a character detection module written in pure JavaScript (TypeScript). The module uses occurrence analysis to determine the most probable encoding.

Packed size is only 22 KB
Works in all environments: Node / Browser / Native
Works on all platforms: Linux / Mac / Windows
No dependencies
No native code / bindings
100% written in TypeScript
Extensive code coverage

Installation

npm i chardet

Usage

To return the encoding with the highest confidence:

import chardet from 'chardet';

const encoding = chardet.detect(Buffer.from('hello there!'));
// or
const encoding = await chardet.detectFile('/path/to/file');
// or
const encoding = chardet.detectFileSync('/path/to/file');

To return the full list of possible encodings, use the analyse method:

import chardet from 'chardet';
chardet.analyse(Buffer.from('hello there!'));

The returned value is an array of objects sorted by confidence in descending order:

[
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' },
];
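
As a minimal sketch of consuming a result in this shape, the helper below returns the top match only if it clears a confidence threshold. The `pickEncoding` name, the threshold of 50, and the `null` fallback are assumptions for illustration and are not part of chardet; the sample array mirrors the result shown above.

```javascript
// Hypothetical helper (not part of chardet): choose the best match from an
// analyse()-style result, or return a fallback when confidence is too low.
function pickEncoding(matches, minConfidence = 50, fallback = null) {
  // The list is sorted by confidence in descending order,
  // so the first entry is the best guess.
  const best = matches[0];
  return best && best.confidence >= minConfidence ? best.name : fallback;
}

// Sample data in the same shape as the result shown above.
const matches = [
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' },
];

console.log(pickEncoding(matches));          // prints "UTF-8"
console.log(pickEncoding(matches.slice(1))); // prints null (20 is below the threshold)
```

Returning a fallback instead of a low-confidence guess lets the caller decide what to do with ambiguous input.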

In the browser, you can use a Uint8Array instead of a Buffer:

import chardet from 'chardet';
chardet.analyse(new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f]));

Working with large data sets

When the data set is large and you want to improve performance (at the cost of some accuracy), you can sample only the first N bytes of the buffer:

const encoding = await chardet.detectFile('/path/to/file', { sampleSize: 32 });

You can also specify where to begin reading from in the buffer:

const encoding = await chardet.detectFile('/path/to/file', {
  sampleSize: 32,
  offset: 128,
});
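
The options above are shown for reading from a file. For data that is already in memory, a similar window can be taken by hand before detection. The sketch below assumes that sampling means "take `sampleSize` bytes starting at `offset`"; the `sampleBytes` helper is hypothetical and not a chardet function.

```javascript
// Hypothetical helper (not part of chardet): return a window of bytes that
// could then be passed to chardet.detect() or chardet.analyse().
function sampleBytes(bytes, { sampleSize = bytes.length, offset = 0 } = {}) {
  // subarray() returns a view over the same memory, so nothing is copied.
  return bytes.subarray(offset, offset + sampleSize);
}

const data = Buffer.from('0123456789abcdef');
const sample = sampleBytes(data, { sampleSize: 4, offset: 10 });

console.log(sample.toString('latin1')); // prints "abcd"
```

Because `subarray()` exists on both Buffer and Uint8Array, the same helper works in Node.js and in the browser.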

Working with strings

In both Node.js and browsers, all strings in memory are represented in UTF-16 encoding. This is a fundamental aspect of the JavaScript language specification. Therefore, you cannot use plain strings directly as input for chardet.analyse() or chardet.detect(). Instead, you need the original string data in the form of a Buffer or Uint8Array.

In other words, if you receive a piece of data over the network and want to detect its encoding, use the original data payload, not its string representation. By the time you convert data to a string, it will be in UTF-16 encoding.

Note on TextEncoder: By default, it returns a UTF-8 encoded buffer, which means the buffer will not be in the original encoding of the string.
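
The effect can be seen with Node.js built-ins alone. In this sketch, a short text is held as single-byte ISO-8859-1 data; once it is decoded to a string and re-encoded with TextEncoder, the bytes are UTF-8 and no longer match the original payload. The byte values are chosen for illustration only.

```javascript
// Original payload: "café" encoded as ISO-8859-1 (one byte per character).
const original = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// Decoding produces a JavaScript string; the original byte layout is gone.
const text = original.toString('latin1');

// TextEncoder always emits UTF-8, so "é" becomes two bytes (0xc3 0xa9).
const reencoded = new TextEncoder().encode(text);

console.log(Array.from(original));  // [ 99, 97, 102, 233 ]
console.log(Array.from(reencoded)); // [ 99, 97, 102, 195, 169 ]
```

Running detection on `reencoded` would describe the UTF-8 copy, not the data as it was received, which is why the original bytes should be kept for detection.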

Supported Encodings:

UTF-8
UTF-16 LE
UTF-16 BE
UTF-32 LE
UTF-32 BE
ISO-2022-JP
ISO-2022-KR
ISO-2022-CN
Shift_JIS
Big5
EUC-JP
EUC-KR
GB18030
ISO-8859-1
ISO-8859-2
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
windows-1250
windows-1251
windows-1252
windows-1253
windows-1254
windows-1255
windows-1256
KOI8-R

Currently only these encodings are supported.

TypeScript?

Yes. Type definitions are included.

References

ICU project http://site.icu-project.org/

Package last updated on 24 Feb 2025
