
Security News
Next.js Patches Critical Middleware Vulnerability (CVE-2025-29927)
Next.js has patched a critical vulnerability (CVE-2025-29927) that allowed attackers to bypass middleware-based authorization checks in self-hosted apps.
html-encoding-sniffer
The html-encoding-sniffer npm package determines the character encoding of an HTML document. It examines the document's bytes for a byte order mark (BOM) or an encoding declaration in a meta tag, and it can also take into account an encoding label supplied by the caller, typically taken from an HTTP Content-Type header. This is useful for applications that need to correctly interpret or display HTML content from various sources.
Sniffing HTML encoding from HTTP headers
This feature lets you feed the encoding declared at the transport layer into the sniffing. The package does not read HTTP headers itself; you pass the encoding label from the response's Content-Type header through the 'transportLayerEncodingLabel' option.
"use strict";
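const fs = require('fs');
const htmlEncodingSniffer = require('html-encoding-sniffer');
// byteStream must be a Uint8Array (or Node.js Buffer) holding the raw HTML bytes.
const byteStream = fs.readFileSync('./html-page.html');
const encoding = htmlEncodingSniffer(byteStream, { transportLayerEncodingLabel: 'utf-8' });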
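In practice the label usually has to be extracted from a Content-Type header value before it can be passed as 'transportLayerEncodingLabel'. The following is a minimal sketch of that step; charsetFromContentType is a hypothetical helper written for illustration and is not part of html-encoding-sniffer.

```javascript
"use strict";

// Hypothetical helper (not part of html-encoding-sniffer): extract the
// charset parameter from a Content-Type header value, or return undefined.
function charsetFromContentType(headerValue) {
  if (typeof headerValue !== "string") {
    return undefined;
  }
  // Matches `; charset=utf-8` as well as the quoted form `; charset="utf-8"`.
  const match = /;\s*charset\s*=\s*("?)([^";\s]+)\1/i.exec(headerValue);
  return match ? match[2] : undefined;
}

const label = charsetFromContentType("text/html; charset=ISO-8859-1");
```

When the header carries no charset parameter the helper returns undefined, which is the value to pass along if no transport layer label is available.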
Sniffing HTML encoding from a meta tag
This feature detects the document's encoding by looking for a meta tag within the HTML that declares it. The 'defaultEncoding' option specifies the fallback encoding used when neither the transport layer nor the document declares one.
"use strict";
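const fs = require('fs');
const htmlEncodingSniffer = require('html-encoding-sniffer');
// byteStream must be a Uint8Array (or Node.js Buffer) holding the raw HTML bytes.
const byteStream = fs.readFileSync('./html-page.html');
const encoding = htmlEncodingSniffer(byteStream, { defaultEncoding: 'windows-1252' });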
iconv-lite is a package that provides encoding and decoding of text in various character sets. Unlike html-encoding-sniffer, which is specifically designed for sniffing HTML document encodings, iconv-lite supports a broader range of encodings and can be used for general text conversion purposes.
jschardet is a character encoding detector that offers functionality similar to html-encoding-sniffer. However, jschardet is based on the universalchardet library and can detect the encoding of any text, not just HTML documents, so it takes a more general approach to encoding detection.
This package implements the HTML Standard's encoding sniffing algorithm in all its glory. The most interesting part of this is how it pre-scans the first 1024 bytes in order to search for certain <meta charset>-related patterns.
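To give a feel for the pre-scan, here is a deliberately simplified sketch. It is not the package's implementation: the real algorithm is a byte-level state machine defined by the HTML Standard, whereas this version uses a regular expression, and roughMetaCharset is a hypothetical name. It also returns the raw label in lowercase rather than a canonical encoding name.

```javascript
"use strict";

// Simplified illustration only; the real prescan follows the HTML Standard's
// byte-level rules. roughMetaCharset is a hypothetical helper.
function roughMetaCharset(bytes) {
  // Look only at the first 1024 bytes, as the prescan does.
  const head = bytes.subarray(0, 1024);
  // Map each byte to one character so ASCII markup survives unchanged.
  const text = Array.from(head, (b) => String.fromCharCode(b)).join("");
  // Accepts both <meta charset="..."> and the http-equiv/content form.
  const match = /<meta[^>]*?charset\s*=\s*["']?([^"'\s/>;]+)/i.exec(text);
  return match ? match[1].toLowerCase() : null;
}

const sampleHtml = new TextEncoder().encode(
  '<!doctype html><meta charset="UTF-8"><title>x</title>'
);
const roughLabel = roughMetaCharset(sampleHtml);
```

A declaration that appears after the first 1024 bytes is ignored by this sketch, mirroring the limit described above.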
const htmlEncodingSniffer = require("html-encoding-sniffer");
const fs = require("fs");
const htmlBytes = fs.readFileSync("./html-page.html");
const sniffedEncoding = htmlEncodingSniffer(htmlBytes);
The passed bytes are given as a Uint8Array; the Node.js Buffer subclass of Uint8Array will also work, as shown above.
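The relationship between the two types can be checked directly in Node.js. This short sketch uses only built-in globals; the variable names are illustrative.

```javascript
"use strict";

// The same markup as a Node.js Buffer and as a plain Uint8Array.
const asBuffer = Buffer.from("<p>hi</p>", "utf8");
const asUint8Array = new TextEncoder().encode("<p>hi</p>");

// Buffer is a subclass of Uint8Array, so either form is a valid input.
const bufferIsUint8Array = asBuffer instanceof Uint8Array;

// Both hold identical bytes (Buffer.compare returns 0 for equal contents).
const sameBytes = Buffer.compare(asBuffer, asUint8Array) === 0;
```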
The returned value will be a canonical encoding name (not a label). You might then combine this with the whatwg-encoding package to decode the result:
const whatwgEncoding = require("whatwg-encoding");
const htmlString = whatwgEncoding.decode(htmlBytes, sniffedEncoding);
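If adding another dependency is not desirable, the built-in TextDecoder is one possible alternative for the decoding step. This is a sketch under assumptions rather than a drop-in replacement: decodeWithTextDecoder is a hypothetical wrapper, TextDecoder may treat edge cases differently from whatwg-encoding, and support for legacy encodings depends on how the Node.js binary was built.

```javascript
"use strict";

// Hypothetical wrapper around the built-in TextDecoder. `encodingName` is
// assumed to be a name such as the one returned by the sniffer.
function decodeWithTextDecoder(bytes, encodingName) {
  return new TextDecoder(encodingName).decode(bytes);
}

const sampleBytes = new TextEncoder().encode("<p>café</p>");
const decoded = decodeWithTextDecoder(sampleBytes, "UTF-8");
```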
You can pass two potential options to htmlEncodingSniffer:
const sniffedEncoding = htmlEncodingSniffer(htmlBytes, {
  transportLayerEncodingLabel,
  defaultEncoding
});
These represent two possible inputs into the encoding sniffing algorithm:
transportLayerEncodingLabel is an encoding label that is obtained from the "transport layer" (probably an HTTP Content-Type header), which overrides everything but a BOM.
defaultEncoding is the ultimate fallback encoding used if no valid encoding is supplied by the transport layer, and no encoding is sniffed from the bytes. It defaults to "windows-1252", as recommended by the algorithm's table of suggested defaults for "All other locales" (including the en locale).
This package was originally based on the excellent work of @nicolashenry, in jsdom. It has since been pulled out into this separate package.
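The precedence described for the two options (BOM first, then the transport layer, then whatever is sniffed from the bytes, then the default) can be sketched as follows. bomEncoding and pickEncoding are hypothetical names used for illustration, and the sketch assumes the transport layer and meta values have already been resolved to encoding names; it is not the package's source code.

```javascript
"use strict";

// Hypothetical BOM check covering the UTF-8, UTF-16BE and UTF-16LE marks.
function bomEncoding(bytes) {
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "UTF-8";
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return "UTF-16BE";
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return "UTF-16LE";
  return null;
}

// Hypothetical resolver: the first source that yields an encoding wins.
// transportLayerEncoding and metaEncoding are assumed to be already-resolved
// encoding names, or null/undefined when absent.
function pickEncoding(
  bytes,
  { transportLayerEncoding, metaEncoding, defaultEncoding = "windows-1252" } = {}
) {
  return (
    bomEncoding(bytes) ??
    transportLayerEncoding ??
    metaEncoding ??
    defaultEncoding
  );
}

// A UTF-8 BOM wins even when the transport layer says otherwise.
const withBom = Uint8Array.of(0xef, 0xbb, 0xbf, 0x3c);
const picked = pickEncoding(withBom, { transportLayerEncoding: "windows-1252" });
```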
FAQs
Sniff the encoding from a HTML byte stream
The npm package html-encoding-sniffer receives a total of 25,071,041 weekly downloads. As such, html-encoding-sniffer is classified as popular.
We found that html-encoding-sniffer has an unhealthy version release cadence and level of project activity, because the last version was released a year ago. It has 6 open source maintainers collaborating on the project.