html-encoding-sniffer

Sniff the encoding from a HTML byte stream

    3.0.0 (latest)

Weekly downloads: 18,488,432 (decreased by 2.62%)

Changelog

3.0.0

Raised the minimum required Node.js version to v12.

Although it worked before by accident, as of this version any Uint8Array input is officially supported, not just Node.js Buffer objects.

Readme

Determine the Encoding of a HTML Byte Stream

This package implements the HTML Standard's encoding sniffing algorithm in all its glory. The most interesting part of this is how it pre-scans the first 1024 bytes in order to search for certain <meta charset>-related patterns.
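
To give a feel for what pre-scanning means, here is a deliberately naive sketch. It is not the package's implementation and not the HTML Standard's algorithm, which walks the bytes tag by tag and handles comments, attribute parsing, and other edge cases. The function name naivePrescan and the regular expression are assumptions made for illustration only. Note that it returns the raw label found in the markup, not a canonical encoding name.

```javascript
// Naive illustration only: look for a charset declaration inside a <meta> tag
// within the first 1024 bytes. The real algorithm is considerably stricter.
function naivePrescan(bytes) {
  // Decode as latin1 so every byte maps to exactly one character.
  const head = Buffer.from(bytes.subarray(0, 1024)).toString("latin1");
  const match = /<meta[^>]*?charset\s*=\s*["']?([a-zA-Z0-9_:.-]+)/i.exec(head);
  return match ? match[1].toLowerCase() : null;
}

const declared = naivePrescan(Buffer.from('<!doctype html><meta charset="ISO-8859-2">'));
const tooLate = naivePrescan(Buffer.from(" ".repeat(2000) + '<meta charset="utf-8">'));
console.log(declared, tooLate);
```

A declaration that starts after the first 1024 bytes is ignored by this sketch, which mirrors the byte limit described above.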

    const htmlEncodingSniffer = require("html-encoding-sniffer");
    const fs = require("fs");

    const htmlBytes = fs.readFileSync("./html-page.html");
    const sniffedEncoding = htmlEncodingSniffer(htmlBytes);

The passed bytes are given as a Uint8Array; the Node.js Buffer subclass of Uint8Array will also work, as shown above.
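
The relationship between the two input types can be checked with plain Node.js, without this package. The sample markup and variable names below are only illustrative.

```javascript
// A Node.js Buffer is a Uint8Array subclass, so both hold the same kind of byte data.
const fromBuffer = Buffer.from('<meta charset="utf-8">', "latin1");
const fromTypedArray = new Uint8Array(fromBuffer); // copies the bytes into a plain Uint8Array

const bufferIsUint8Array = fromBuffer instanceof Uint8Array;
const plainIsBuffer = Buffer.isBuffer(fromTypedArray);
const sameBytes = fromBuffer.every((byte, i) => byte === fromTypedArray[i]);

console.log({ bufferIsUint8Array, plainIsBuffer, sameBytes });
```

So bytes obtained outside Node.js file APIs (for example from a fetch response body) should not need converting to a Buffer first, as of version 3.0.0.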

The returned value will be a canonical encoding name (not a label). You might then combine this with the whatwg-encoding package to decode the result:

    const whatwgEncoding = require("whatwg-encoding");
    const htmlString = whatwgEncoding.decode(htmlBytes, sniffedEncoding);
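
The difference between an encoding label and a canonical name can be seen with the built-in TextDecoder, which resolves labels according to the WHATWG Encoding Standard. This sketch does not use html-encoding-sniffer or whatwg-encoding; the helper name is hypothetical, it assumes a Node.js build with full ICU support (the default in recent releases), and the letter casing TextDecoder reports may differ from what this package returns.

```javascript
// Resolve a label (as found in markup or headers) to the name TextDecoder reports.
// Returns null when the label is not recognized.
function labelToNameViaTextDecoder(label) {
  try {
    return new TextDecoder(label).encoding;
  } catch {
    return null; // TextDecoder throws a RangeError for unknown labels
  }
}

const fromLatin1 = labelToNameViaTextDecoder("latin1");
const fromUtf8 = labelToNameViaTextDecoder("utf8");
const fromUnknown = labelToNameViaTextDecoder("not-a-real-encoding");
console.log(fromLatin1, fromUtf8, fromUnknown);
```

Several labels can map to one name, which is why the sniffer's return value is safe to pass straight to a decoder while a raw label from a page may not be.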

Options

You can pass up to two options to htmlEncodingSniffer:

const sniffedEncoding = htmlEncodingSniffer(htmlBytes, { transportLayerEncodingLabel, defaultEncoding });

These represent two possible inputs into the encoding sniffing algorithm:

  • transportLayerEncodingLabel is an encoding label that is obtained from the "transport layer" (probably an HTTP Content-Type header), which overrides everything but a BOM.
  • defaultEncoding is the ultimate fallback encoding used if no valid encoding is supplied by the transport layer, and no encoding is sniffed from the bytes. It defaults to "windows-1252", as recommended by the algorithm's table of suggested defaults for "All other locales" (including the en locale).
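
The precedence described by these two options can be sketched as a simple fallback chain. This is an illustration of the ordering only, not the package's code: detectBom and pickEncoding are hypothetical names, and the sketch assumes the transport-layer and pre-scan values have already been validated and resolved to encoding names (or set to null when absent or invalid).

```javascript
// Recognize the byte order marks for UTF-8 and UTF-16.
function detectBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "UTF-8";
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) return "UTF-16BE";
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) return "UTF-16LE";
  return null;
}

// Order of precedence: BOM, then transport layer, then pre-scan result, then default.
function pickEncoding(bytes, options = {}) {
  const {
    transportLayerEncoding = null, // assumed already resolved from a label
    prescanEncoding = null,        // assumed result of a <meta> pre-scan
    defaultEncoding = "windows-1252",
  } = options;
  return detectBom(bytes) ?? transportLayerEncoding ?? prescanEncoding ?? defaultEncoding;
}

const withBom = pickEncoding(Uint8Array.of(0xef, 0xbb, 0xbf, 0x3c), { transportLayerEncoding: "ISO-8859-2" });
const withHeader = pickEncoding(Uint8Array.of(0x3c, 0x70, 0x3e), { transportLayerEncoding: "ISO-8859-2", prescanEncoding: "UTF-8" });
const fallback = pickEncoding(Uint8Array.of(0x3c, 0x70, 0x3e));
console.log(withBom, withHeader, fallback);
```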

Credits

This package was originally based on the excellent work of @nicolashenry, in jsdom. It has since been pulled out into this separate package.

FAQs

What is html-encoding-sniffer?

Sniff the encoding from a HTML byte stream

Is html-encoding-sniffer popular?

The npm package html-encoding-sniffer receives a total of 15,541,500 weekly downloads, so it is classified as popular.

Is html-encoding-sniffer well maintained?

We found that html-encoding-sniffer demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 6 open source maintainers collaborating on the project.

Last updated on 18 Sep 2021