
chardet

Character encoding detector

Version: 1.5.0 (latest)
Maintainers: 1
Weekly downloads: 15,633,272 (down 11.54%)
Changelog


v1.5.0

1.5.0 (2022-10-09)

Features

  • allow position offset as option (a169cc5)

Readme


Chardet is a character encoding detection module written in pure JavaScript (TypeScript). It uses occurrence analysis to determine the most probable encoding.

  • Packed size is only 22 KB
  • Works in all environments: Node / Browser / Native
  • Works on all platforms: Linux / Mac / Windows
  • No dependencies
  • No native code / bindings
  • 100% written in TypeScript
  • Extensive code coverage
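
As a rough illustration of what counting byte patterns to score an encoding can look like, here is a minimal sketch. It is a toy and not chardet's actual implementation: the function name `utf8Confidence` and the scoring values are assumptions made for this example only.

```typescript
// Toy illustration only: NOT chardet's actual implementation.
// Scores how plausible it is that `bytes` is UTF-8 by counting
// well-formed and malformed multi-byte sequences.
function utf8Confidence(bytes: Uint8Array): number {
  let valid = 0;
  let invalid = 0;
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    i += 1;
    // Number of continuation bytes implied by the lead byte.
    let trail: number;
    if ((b & 0x80) === 0) {
      continue; // ASCII byte: carries no evidence either way
    } else if ((b & 0xe0) === 0xc0) {
      trail = 1; // 110xxxxx
    } else if ((b & 0xf0) === 0xe0) {
      trail = 2; // 1110xxxx
    } else if ((b & 0xf8) === 0xf0) {
      trail = 3; // 11110xxx
    } else {
      invalid += 1; // stray continuation byte or impossible lead byte
      continue;
    }
    // Every continuation byte must look like 10xxxxxx.
    let ok = true;
    for (let k = 0; k < trail; k += 1) {
      if (i >= bytes.length || (bytes[i] & 0xc0) !== 0x80) {
        ok = false;
        break;
      }
      i += 1;
    }
    if (ok) valid += 1;
    else invalid += 1;
  }
  // Pure ASCII input: weak evidence, so return an arbitrary low score.
  if (valid === 0 && invalid === 0) return 10;
  if (invalid === 0) return 100;
  // Otherwise: share of well-formed sequences, as a percentage.
  return Math.round((100 * valid) / (valid + invalid));
}
```

A real detector would run one such scorer per candidate encoding and rank the results. This sketch covers a single encoding and ignores details such as overlong sequences.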

Installation

npm i chardet

Usage

To return the encoding with the highest confidence:

    import chardet from 'chardet';

    const encoding = chardet.detect(Buffer.from('hello there!'));
    // or
    const encoding = await chardet.detectFile('/path/to/file');
    // or
    const encoding = chardet.detectFileSync('/path/to/file');

To return the full list of possible encodings, use the analyse method.

    import chardet from 'chardet';

    chardet.analyse(Buffer.from('hello there!'));

The returned value is an array of objects sorted by confidence in descending order:

    [
      { confidence: 90, name: 'UTF-8' },
      { confidence: 20, name: 'windows-1252', lang: 'fr' }
    ];
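
Because the list is sorted, the first entry is the best guess. A minimal sketch of applying a confidence threshold to such a list follows. The `Match` interface is inferred from the sample output and may differ from the library's exported types. `pickEncoding`, its default threshold of 50, and its fallback value are hypothetical and not part of chardet.

```typescript
// Shape of one entry, inferred from the sample output;
// the library's own exported types may differ.
interface Match {
  confidence: number;
  name: string;
  lang?: string;
}

// Hypothetical helper (not part of chardet): returns the name of the top
// candidate if it meets `minConfidence`, otherwise returns `fallback`.
function pickEncoding(
  matches: Match[],
  minConfidence = 50,
  fallback = 'UTF-8',
): string {
  // The input is sorted by confidence, descending, so index 0 is the best guess.
  const best = matches[0];
  return best !== undefined && best.confidence >= minConfidence
    ? best.name
    : fallback;
}

// The sample result from the README, reused as test data.
const sample: Match[] = [
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' },
];

console.log(pickEncoding(sample));
```

In practice the `matches` argument would come from `chardet.analyse(...)`. The threshold is a judgment call that depends on how costly a wrong guess is for your data.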

Working with large data sets

When the data set is large and you want to trade some accuracy for performance, you can sample only the first N bytes of the buffer:

    chardet
      .detectFile('/path/to/file', { sampleSize: 32 })
      .then(encoding => console.log(encoding));

You can also specify where to begin reading from in the buffer:

    chardet
      .detectFile('/path/to/file', { sampleSize: 32, offset: 128 })
      .then(encoding => console.log(encoding));
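
Conceptually, sampleSize and offset select a window of bytes to inspect. The sketch below shows that windowing on an in-memory buffer. `sampleBytes` is a hypothetical helper written for illustration, not part of chardet, and how the library reads the window from a file is not shown here.

```typescript
// Hypothetical helper, not part of chardet: returns the window of `data`
// that would be inspected for a given sampleSize and offset.
function sampleBytes(
  data: Uint8Array,
  sampleSize?: number,
  offset = 0,
): Uint8Array {
  // Clamp the start so an out-of-range offset yields an empty window.
  const start = Math.min(Math.max(offset, 0), data.length);
  // Without a sampleSize, read to the end of the data.
  const end =
    sampleSize === undefined
      ? data.length
      : Math.min(start + sampleSize, data.length);
  // subarray returns a view over the same memory, so no bytes are copied.
  return data.subarray(start, end);
}

// 200 bytes whose value equals their index, to make the window easy to check.
const data = new Uint8Array(200);
for (let i = 0; i < data.length; i += 1) data[i] = i;

const windowed = sampleBytes(data, 32, 128);
console.log(windowed.length, windowed[0]);
```

A smaller window means less work per detection but also less evidence, which is the accuracy tradeoff mentioned above.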

Supported Encodings:

  • UTF-8
  • UTF-16 LE
  • UTF-16 BE
  • UTF-32 LE
  • UTF-32 BE
  • ISO-2022-JP
  • ISO-2022-KR
  • ISO-2022-CN
  • Shift_JIS
  • Big5
  • EUC-JP
  • EUC-KR
  • GB18030
  • ISO-8859-1
  • ISO-8859-2
  • ISO-8859-5
  • ISO-8859-6
  • ISO-8859-7
  • ISO-8859-8
  • ISO-8859-9
  • windows-1250
  • windows-1251
  • windows-1252
  • windows-1253
  • windows-1254
  • windows-1255
  • windows-1256
  • KOI8-R

Currently only these encodings are supported.

TypeScript?

Yes. Type definitions are included.


FAQs

What is chardet?

Character encoding detector

Is chardet popular?

The npm package chardet receives a total of 12,514,553 weekly downloads. As such, chardet is classified as popular.

Is chardet well maintained?

We found that chardet demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.

Last updated on 09 Oct 2022
