chardet

Character encoding detector

1.4.0 (latest)

Maintainers: 1
Weekly downloads: 16,539,517 (decreased by 2.91%)

Changelog

v1.4.0

1.4.0 (2021-10-19)

Features

  • Language detection improvements (d0e93bb)

Readme

chardet

Chardet is a character encoding detection module written in pure JavaScript (TypeScript). The module uses occurrence analysis to determine the most probable encoding.

  • Packed size is only 22 KB
  • Works in all environments: Node / Browser / Native
  • Works on all platforms: Linux / Mac / Windows
  • No dependencies
  • No native code / bindings
  • 100% written in TypeScript
  • Extensive code coverage

Installation

npm i chardet

Usage

To return the encoding with the highest confidence:

    const chardet = require('chardet');

    chardet.detect(Buffer.from('hello there!'));
    // or
    chardet.detectFile('/path/to/file').then(encoding => console.log(encoding));
    // or
    chardet.detectFileSync('/path/to/file');

To return the full list of possible encodings, use the analyse method:

    const chardet = require('chardet');

    chardet.analyse(Buffer.from('hello there!'));

The returned value is an array of objects sorted by confidence value in descending order:

    [
      { confidence: 90, name: 'UTF-8' },
      { confidence: 20, name: 'windows-1252', lang: 'fr' }
    ];
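A result list of this shape can be post-processed with ordinary array methods. The following is a minimal sketch, not part of chardet itself: `filterByConfidence` and its default threshold of 50 are hypothetical names and values chosen for illustration, and the sample data simply repeats the array shown above.

```javascript
// Hypothetical helper (not part of chardet): keep only the names of matches
// whose confidence is at or above a threshold.
// `matches` is assumed to have the shape shown above:
// [{ confidence: number, name: string, lang?: string }, ...]
function filterByConfidence(matches, minConfidence = 50) {
  return matches
    .filter((match) => match.confidence >= minConfidence)
    .map((match) => match.name);
}

// Sample data copied from the example result above.
const sample = [
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' },
];

console.log(filterByConfidence(sample)); // [ 'UTF-8' ]
```

In real use, the `sample` array would be replaced by the value returned from `chardet.analyse(...)`.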

Working with large data sets

When the data set is large and you want to optimize performance (at the cost of some accuracy), you can sample only the first N bytes of the buffer:

    chardet
      .detectFile('/path/to/file', { sampleSize: 32 })
      .then(encoding => console.log(encoding));
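To illustrate the idea behind sampling, the sketch below reads only the first N bytes of a file using Node's built-in `fs` module. It is an assumption about how one might do this by hand, not a description of how chardet implements `sampleSize` internally; `readSample` is a hypothetical helper name.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of chardet): read at most `sampleSize` bytes
// from the start of a file and return them as a Buffer.
function readSample(filePath, sampleSize) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const buffer = Buffer.alloc(sampleSize);
    // Read up to `sampleSize` bytes starting at file position 0.
    const bytesRead = fs.readSync(fd, buffer, 0, sampleSize, 0);
    // The file may be shorter than the requested sample, so trim the buffer.
    return buffer.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}
```

The resulting buffer could then be passed to `chardet.detect(...)` in place of the full file contents, for example `chardet.detect(readSample('/path/to/file', 32))`.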

Supported Encodings:

  • UTF-8
  • UTF-16 LE
  • UTF-16 BE
  • UTF-32 LE
  • UTF-32 BE
  • ISO-2022-JP
  • ISO-2022-KR
  • ISO-2022-CN
  • Shift_JIS
  • Big5
  • EUC-JP
  • EUC-KR
  • GB18030
  • ISO-8859-1
  • ISO-8859-2
  • ISO-8859-5
  • ISO-8859-6
  • ISO-8859-7
  • ISO-8859-8
  • ISO-8859-9
  • windows-1250
  • windows-1251
  • windows-1252
  • windows-1253
  • windows-1254
  • windows-1255
  • windows-1256
  • KOI8-R

Currently only these encodings are supported.
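Once an encoding name from the list above is known, a common next step is to decode the bytes into a string. The sketch below uses the standard `TextDecoder` global rather than anything from chardet. `decodeWith` and `LABEL_OVERRIDES` are hypothetical names, the label mapping is an assumption, and which labels `TextDecoder` accepts depends on the runtime (in Node, on its ICU build), so unsupported names return `null`.

```javascript
// Hypothetical mapping (an assumption, not part of chardet) from names in the
// list above to TextDecoder labels, for names that contain a space.
const LABEL_OVERRIDES = {
  'UTF-16 LE': 'utf-16le',
  'UTF-16 BE': 'utf-16be',
};

// Hypothetical helper: decode `buffer` using an encoding name from the list
// above. Returns null if the runtime's TextDecoder rejects the label.
function decodeWith(buffer, encodingName) {
  const label = LABEL_OVERRIDES[encodingName] || encodingName.toLowerCase();
  try {
    return new TextDecoder(label).decode(buffer);
  } catch (err) {
    // TextDecoder throws for labels it does not recognise or support.
    return null;
  }
}

console.log(decodeWith(Buffer.from('hello there!', 'utf8'), 'UTF-8')); // hello there!
```

Returning `null` instead of throwing keeps the caller in control of the fallback, for example trying the next candidate from `chardet.analyse(...)`.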

TypeScript?

Yes. Type definitions are included.

FAQs

What is chardet?

Character encoding detector

Is chardet popular?

The npm package chardet receives a total of 15,741,968 weekly downloads. As such, its popularity is classified as popular.

Is chardet well maintained?

We found that chardet demonstrated a healthy version release cadence and project activity. It has 1 open source maintainer collaborating on the project.

Last updated on 19 Oct 2021