
html5parser
A simple and fast HTML5 parser. Its output can be manipulated much like an ECMAScript ESTree, with particular attention to attributes.
Currently, the publicly available parsers, such as htmlparser2 and parse5,
cannot be used to manipulate attributes. For example, htmlparser2
has startIndex and endIndex for tags and texts, but no range information
for attribute names and values. This project solves that problem:
it adds ranges for tags, texts, and attribute names and values, and it also
records the quote type of each attribute value (unquoted, ' or ").
# via npm
npm install html5parser -S
# via yarn
yarn add html5parser
import * as html from 'html5parser'

const input = `
<!DOCTYPE html>
<html>
<body>
<h1 id="hello">Hello world</h1>
</body>
</html>
`

const ast = html.parse(input)

html.walk(ast, {
  enter: (node) => {
    if (node.type === html.SyntaxKind.Tag) {
      for (const attr of node.attributes) {
        if (attr.value !== void 0) {
          // Use the range to slice the raw attribute value out of the input.
          console.log(input.substring(attr.value.start, attr.value.end))
          // Or read the value directly:
          console.log(attr.value.value)
        }
      }
    }
  }
})

// Should output (the range includes the quotes, the value does not):
// "hello"
// hello
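Because every attribute value carries its own range, the source text can be edited in place without re-serializing the whole tree. The following is a minimal sketch of that idea, not part of the package: `Range`, `replaceRange`, and the hand-written offsets are assumptions made for illustration, chosen to match the half-open, quotes-included ranges described in this README.

```typescript
// Local stand-in for the range fields (start inclusive, end exclusive).
// Not imported from html5parser; defined here so the sketch runs on its own.
interface Range {
  start: number;
  end: number;
}

// Replace the text covered by `range` with `replacement`.
function replaceRange(source: string, range: Range, replacement: string): string {
  return source.slice(0, range.start) + replacement + source.slice(range.end);
}

const page = '<h1 id="hello">Hello world</h1>';

// Hand-written range of the id value, quotes included (assumed, not parsed).
const idValueRange: Range = { start: 7, end: 14 };

const updatedPage = replaceRange(page, idValueRange, '"greeting"');
console.log(updatedPage); // <h1 id="greeting">Hello world</h1>
```

In real use the range would come from `attr.value.start` and `attr.value.end`. When several edits are made to the same string, applying them from the last range to the first keeps the earlier offsets valid.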
// Top level API, parse html to ast tree
export function parse(input: string): INode[];
// Low level API, get tokens
export function tokenize(input: string): IToken[];
// Utils API, walk the ast tree
export function walk(ast: INode[], options: IWalkOptions): void;
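To illustrate what a tree walk with an enter callback does, here is a small depth-first traversal over a hand-built tree. It is only a sketch under stated assumptions: `NodeLike` and `walkSketch` are hypothetical names, the node shape is a simplified stand-in for the types listed below, and this is not the package's own `walk` implementation.

```typescript
// Simplified stand-in for the node types (hypothetical, not the package's types).
type NodeLike =
  | { type: 'Text'; value: string }
  | { type: 'Tag'; name: string; body: NodeLike[] | undefined | null };

// Visit every node depth-first, calling `enter` before descending into children.
function walkSketch(
  nodes: NodeLike[],
  enter: (node: NodeLike, depth: number) => void,
  depth = 0,
): void {
  for (const node of nodes) {
    enter(node, depth);
    // Only tags whose body is an array have children to visit.
    if (node.type === 'Tag' && Array.isArray(node.body)) {
      walkSketch(node.body, enter, depth + 1);
    }
  }
}

const tree: NodeLike[] = [
  {
    type: 'Tag',
    name: 'body',
    body: [
      { type: 'Tag', name: 'h1', body: [{ type: 'Text', value: 'Hello world' }] },
      { type: 'Tag', name: 'img', body: undefined },
    ],
  },
];

const visited: string[] = [];
walkSketch(tree, (node, depth) => {
  visited.push(`${depth}:${node.type === 'Tag' ? node.name : '#text'}`);
});
console.log(visited.join(' ')); // 0:body 1:h1 2:#text 1:img
```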
IBaseNode: the base struct for all the nodes:
export interface IBaseNode {
  start: number; // the start position of the node (inclusive)
  end: number; // the end position of the node (exclusive)
}
IText: The text node struct:
export interface IText extends IBaseNode {
  type: SyntaxKind.Text;
  value: string; // text value
}
ITag: the tag node struct:
export interface ITag extends IBaseNode {
  type: SyntaxKind.Tag;
  open: IText; // the open tag, e.g. <div>, <img/>
  name: string; // the tag name, e.g. div, img
  attributes: IAttribute[]; // the attributes
  body:
    | Array<ITag | IText> // with a close tag; an empty array if the body is empty, e.g. <div></div>
    | void // self-closed, e.g. <div/>, <img>
    | null; // EOF before the open tag ended, e.g. <div
  close:
    | IText // the close tag, e.g. </div>
    | void // self-closed, e.g. an open tag such as <div/> or <img>
    | null; // EOF before the open tag ended, or no close tag on a tag that is not self-closed
}
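The `body` and `close` fields together distinguish four situations. The sketch below spells them out on hand-built values. `TagLike`, `TagShape`, and `tagShape` are hypothetical names introduced for illustration, and the local types use `undefined` where the declaration above says `void`.

```typescript
// Minimal local stand-ins for the ITag fields above (not imported from the package).
interface TextLike {
  start: number;
  end: number;
  value: string;
}

interface TagLike {
  name: string;
  body: unknown[] | undefined | null;
  close: TextLike | undefined | null;
}

type TagShape = 'paired' | 'self-closed' | 'unterminated-open-tag' | 'missing-close-tag';

function tagShape(tag: TagLike): TagShape {
  if (tag.body === null) return 'unterminated-open-tag'; // EOF before the open tag ended
  if (tag.body === undefined) return 'self-closed'; // e.g. <img> or <div/>
  return tag.close ? 'paired' : 'missing-close-tag'; // body is an array here
}

// Hand-built examples, one per case.
const pairedTag: TagLike = { name: 'div', body: [], close: { start: 5, end: 11, value: '</div>' } };
const selfClosedTag: TagLike = { name: 'img', body: undefined, close: undefined };
const cutOffTag: TagLike = { name: 'div', body: null, close: null };
const unclosedTag: TagLike = { name: 'p', body: [], close: null };

console.log([pairedTag, selfClosedTag, cutOffTag, unclosedTag].map(tagShape));
```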
IAttribute: the attribute struct:
export interface IAttribute extends IBaseNode {
  name: IText; // the name of the attribute
  value: IAttributeValue | void; // the value of the attribute
}
IAttributeValue: the attribute value struct:
// NOTE: the range (start and end) includes the quotes.
export interface IAttributeValue extends IBaseNode {
  value: string; // the value text, excluding the leading and trailing `'` or `"`
  quote: '\'' | '"' | void; // the quote type
}
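Since the range includes the quotes while `value` does not, code that rewrites a value usually needs either the inner range or a way to reapply the original quote. A minimal sketch follows, assuming hand-written offsets. `AttributeValueLike`, `innerRange`, and `requote` are hypothetical helpers, not part of the package, and `undefined` stands in for `void`.

```typescript
// Local stand-in for IAttributeValue (not imported from the package).
interface AttributeValueLike {
  start: number;
  end: number;
  value: string;
  quote: "'" | '"' | undefined;
}

// Range of the value text only, with any surrounding quotes excluded.
function innerRange(v: AttributeValueLike): { start: number; end: number } {
  const pad = v.quote === undefined ? 0 : 1;
  return { start: v.start + pad, end: v.end - pad };
}

// Wrap new text in the same quote style the original value used.
function requote(v: AttributeValueLike, text: string): string {
  const q = v.quote ?? '';
  return q + text + q;
}

const markup = '<h1 id="hello" class=big>';

// Hand-written values as a parser would report them (assumed offsets).
const quotedValue: AttributeValueLike = { start: 7, end: 14, value: 'hello', quote: '"' };
const bareValue: AttributeValueLike = { start: 21, end: 24, value: 'big', quote: undefined };

const quotedInner = innerRange(quotedValue);
const bareInner = innerRange(bareValue);
console.log(markup.slice(quotedInner.start, quotedInner.end)); // hello
console.log(markup.slice(bareInner.start, bareInner.end)); // big
console.log(requote(quotedValue, 'greeting')); // "greeting"
console.log(requote(bareValue, 'small')); // small
```

Note that an unquoted value cannot safely contain whitespace, so a replacement with spaces would need quotes added rather than the original style preserved.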
INode: the exposed nodes:
export type INode = ITag | IText
This parser targets HTML5, which means:
<? ... ?> and <! ... > (except for <!doctype ...>, case insensitive)
are treated as comments, so a CDATASection is treated as a comment.
Special tag names:
- "!doctype" (case insensitive): the doctype declaration
- "!": short comment
- "!--": normal comment
- "" (empty string): short comment, for <? ... >; the leading ? is treated as comment content
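The special names listed above can be mapped to a coarse node category with a small helper. This is only a sketch: `NameKind` and `classifyName` are hypothetical and not exported by the package, and the helper assumes the tag name is available as a plain string.

```typescript
type NameKind = 'doctype' | 'comment' | 'element';

// Map a tag name to a category, following the special names listed above.
function classifyName(name: string): NameKind {
  if (name.toLowerCase() === '!doctype') return 'doctype'; // case insensitive
  if (name === '!' || name === '!--' || name === '') return 'comment';
  return 'element';
}

const sampleNames = ['!DOCTYPE', '!--', '!', '', 'div'];
const kinds = sampleNames.map(classifyName);
console.log(kinds.join(', ')); // doctype, comment, comment, comment, element
```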