
html5parser
html5parser is a super fast and tiny (5kb) HTML5 parser.
Package manager
npm i -S html5parser
# or via yarn
yarn add html5parser
CDN
<script src="https://unpkg.com/html5parser@latest/dist/html5parser.umd.js"></script>
import { parse, walk, SyntaxKind } from 'html5parser';

const ast = parse('<!DOCTYPE html><head><title>Hello html5parser!</title></head></html>');

walk(ast, {
  enter: (node) => {
    if (node.type === SyntaxKind.Tag && node.name === 'title' && Array.isArray(node.body)) {
      const text = node.body[0];
      if (text.type !== SyntaxKind.Text) {
        return;
      }
      const div = document.createElement('div');
      div.innerHTML = `The title of the input is <strong>${text.value}</strong>`;
      document.body.appendChild(div);
    }
  },
});
Low level API to parse string to tokens:
function tokenize(input: string): IToken[];
IToken
interface IToken {
  start: number;
  end: number;
  value: string;
  type: TokenKind;
}
TokenKind
const enum TokenKind {
  Literal,
  OpenTag, // trim leading '<'
  OpenTagEnd, // trim trailing '>', can only be '/' or ''
  CloseTag, // trim leading '</' and trailing '>'
  Whitespace, // the whitespace between attributes
  AttrValueEq,
  AttrValueNq,
  AttrValueSq,
  AttrValueDq,
}
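Because every token carries `start` and `end`, a token can be mapped back to a line and column in the input. The following is a minimal sketch, assuming `start` is a zero-based character offset into the input string. `TokenSpan`, `LineColumn`, and `offsetToLineColumn` are hypothetical names for this illustration and are not part of html5parser; the sample token is written by hand rather than produced by `tokenize`.

```typescript
// Local mirror of the two IToken fields used here (not imported from the library).
interface TokenSpan {
  start: number;
  end: number;
}

interface LineColumn {
  line: number; // 1-based
  column: number; // 1-based
}

// Convert a zero-based character offset into a 1-based line/column pair.
function offsetToLineColumn(input: string, offset: number): LineColumn {
  let line = 1;
  let lineStart = 0;
  const limit = Math.min(offset, input.length);
  for (let i = 0; i < limit; i++) {
    if (input.charCodeAt(i) === 10) {
      // '\n' starts a new line right after it
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: limit - lineStart + 1 };
}

// Hand-written offsets for the "<li" on the second line (assumption, not tokenize() output).
const sampleInput = '<ul>\n  <li>one</li>\n</ul>';
const sampleToken: TokenSpan = { start: 7, end: 11 };
const where = offsetToLineColumn(sampleInput, sampleToken.start);
console.log(`${where.line}:${where.column}`); // 2:3
```

In real use the spans would come from `tokenize(input)`, and the same helper would work for AST nodes, since they expose `start` and `end` as well.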
Core API to parse string to AST:
function parse(input: string, options?: ParseOptions): INode[];
ParseOptions
interface ParseOptions {
  // create tag's attributes map
  // if true, will set ITag.attributeMap property
  // as a `Record<string, IAttribute>`
  setAttributeMap: boolean;
}
INode
export type INode = IText | ITag;
ITag
export interface ITag extends IBaseNode {
  type: SyntaxKind.Tag;
  // original open tag, <Div id="id">
  open: IText;
  // lower case tag name, div
  name: string;
  // original case tag name, Div
  rawName: string;
  attributes: IAttribute[];
  // the attribute map, if `options.setAttributeMap` is `true`
  // this will be a Record, key is the attribute name literal,
  // value is the attribute itself.
  attributeMap: Record<string, IAttribute> | undefined;
  body:
    | Array<ITag | IText> // with close tag
    | undefined // self closed
    | null; // EOF before open tag end
  // original close tag, </DIV >
  close:
    | IText // with close tag
    | undefined // self closed
    | null; // EOF before end or without close tag
}
IAttribute
export interface IAttribute extends IBaseNode {
  name: IText;
  value: IAttributeValue | undefined;
}
IAttributeValue
export interface IAttributeValue extends IBaseNode {
  value: string;
  quote: "'" | '"' | undefined;
}
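The attribute types above allow two lookup paths: the `attributeMap` record when `setAttributeMap` was enabled, or a scan of the `attributes` array otherwise. Below is a minimal sketch of a lookup helper covering both. `TextLike`, `AttributeLike`, `TagLike`, and `getAttributeValue` are hypothetical names that mirror only the documented fields; the tags are built by hand instead of coming from `parse`, and the lookup assumes an exact, case-sensitive match on the attribute name.

```typescript
// Minimal local mirrors of the documented interfaces (only the fields used here).
interface TextLike {
  value: string;
}
interface AttributeLike {
  name: TextLike;
  value: { value: string } | undefined;
}
interface TagLike {
  attributes: AttributeLike[];
  attributeMap: Record<string, AttributeLike> | undefined;
}

// Returns the attribute's string value, '' for a valueless attribute,
// or undefined when the attribute is absent.
function getAttributeValue(tag: TagLike, attrName: string): string | undefined {
  let attr: AttributeLike | undefined;
  if (tag.attributeMap) {
    // Own-property check so names such as "constructor" do not hit Object.prototype.
    attr = Object.prototype.hasOwnProperty.call(tag.attributeMap, attrName)
      ? tag.attributeMap[attrName]
      : undefined;
  } else {
    attr = tag.attributes.find((a) => a.name.value === attrName);
  }
  if (!attr) return undefined;
  return attr.value ? attr.value.value : '';
}

// Hand-built tags for illustration; in real use they would come from parse().
const hrefAttr: AttributeLike = { name: { value: 'href' }, value: { value: '/docs' } };
const disabledAttr: AttributeLike = { name: { value: 'disabled' }, value: undefined };
const tagWithoutMap: TagLike = { attributes: [hrefAttr, disabledAttr], attributeMap: undefined };
const tagWithMap: TagLike = {
  attributes: [hrefAttr, disabledAttr],
  attributeMap: { href: hrefAttr, disabled: disabledAttr },
};

console.log(getAttributeValue(tagWithoutMap, 'href')); // "/docs"
console.log(getAttributeValue(tagWithMap, 'disabled')); // ""
```

Falling back to the array keeps the helper usable whether or not the caller passed `{ setAttributeMap: true }` to `parse`.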
IText
export interface IText extends IBaseNode {
  type: SyntaxKind.Text;
  value: string;
}
IBaseNode
export interface IBaseNode {
  start: number;
  end: number;
}
SyntaxKind
export enum SyntaxKind {
  Text = 'Text',
  Tag = 'Tag',
}
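Since `body` can be an array, `undefined`, or `null`, code that descends into the tree has to check it before recursing. Here is a minimal sketch that collects the text of a tree by hand-written recursion. `TextNode`, `TagNode`, `AstNode`, and `textContent` are hypothetical names mirroring the documented shapes (the `'Text'` and `'Tag'` literals match the `SyntaxKind` values above), and the sample tree is built by hand rather than returned by `parse`.

```typescript
// Local mirrors of the documented node shapes (only the fields used here).
type TextNode = { type: 'Text'; value: string };
type TagNode = {
  type: 'Tag';
  name: string;
  body: Array<TagNode | TextNode> | undefined | null;
};
type AstNode = TextNode | TagNode;

// Concatenate the values of all text nodes, depth first.
function textContent(nodes: AstNode[]): string {
  let out = '';
  for (const node of nodes) {
    if (node.type === 'Text') {
      out += node.value;
    } else if (Array.isArray(node.body)) {
      // body is undefined for self-closed tags and null when the input ended early
      out += textContent(node.body);
    }
  }
  return out;
}

// Hand-built tree standing in for the AST of <p>Hello <strong>world</strong><br /></p>.
const sampleTree: AstNode[] = [
  {
    type: 'Tag',
    name: 'p',
    body: [
      { type: 'Text', value: 'Hello ' },
      { type: 'Tag', name: 'strong', body: [{ type: 'Text', value: 'world' }] },
      { type: 'Tag', name: 'br', body: undefined },
    ],
  },
];

console.log(textContent(sampleTree)); // Hello world
```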
Visit all the nodes of the AST with specified callbacks:
function walk(ast: INode[], options: WalkOptions): void;
WalkOptions
export interface WalkOptions {
  enter?(node: INode, parent: INode | void, index: number): void;
  leave?(node: INode, parent: INode | void, index: number): void;
}
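To illustrate how `enter` and `leave` callbacks can pair up, the sketch below uses a stand-in depth-first traversal. It assumes `enter` fires before a node's children and `leave` after them; that ordering is an assumption for illustration, not taken from the library's source. `SketchNode`, `SketchWalkOptions`, and `walkSketch` are hypothetical names and are not html5parser exports.

```typescript
// Simplified node shape for the sketch; real nodes are the documented INode types.
type SketchNode = { name: string; body?: SketchNode[] | null };

interface SketchWalkOptions {
  enter?(node: SketchNode, parent: SketchNode | void, index: number): void;
  leave?(node: SketchNode, parent: SketchNode | void, index: number): void;
}

// Stand-in traversal: enter, then children, then leave (assumed ordering).
function walkSketch(nodes: SketchNode[], options: SketchWalkOptions, parent?: SketchNode): void {
  nodes.forEach((node, index) => {
    if (options.enter) options.enter(node, parent, index);
    if (Array.isArray(node.body)) {
      walkSketch(node.body, options, node);
    }
    if (options.leave) options.leave(node, parent, index);
  });
}

// Track nesting depth with a counter that enter increments and leave decrements.
const trace: string[] = [];
let level = 0;
walkSketch([{ name: 'ul', body: [{ name: 'li', body: [] }, { name: 'li', body: [] }] }], {
  enter: (node) => {
    trace.push(`${'  '.repeat(level)}<${node.name}>`);
    level++;
  },
  leave: (node) => {
    level--;
    trace.push(`${'  '.repeat(level)}</${node.name}>`);
  },
});

console.log(trace.join('\n'));
```

With the package installed, the same pair of callbacks would be passed to `walk(ast, { enter, leave })` instead of the stand-in.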
Parse the input to an AST, keep only the tags and attributes that are on the whitelists, and then print the result back to a string.
function safeHtml(input: string, options?: Partial<SafeHtmlOptions>): string;
SafeHtmlOptions
export interface SafeHtmlOptions {
  allowedTags: string[];
  allowedAttrs: string[];
  tagAllowedAttrs: Record<string, string[]>;
  allowedUrl: RegExp;
}
The default options of safeHtml. You can modify them, but the change
takes effect globally.
const safeHtmlDefaultOptions: SafeHtmlOptions;
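Because changes to the defaults are global, passing a per-call options object is the less invasive route. The following sketch shows what such an object might look like. `SafeHtmlOptionsShape` is a local copy of the documented interface so the snippet compiles on its own, `commentOptions` is a hypothetical name, and the allowlist values are illustrative rather than the library's defaults. How `allowedUrl` is applied is not specified above; the sketch assumes it is tested against URL-valued attributes.

```typescript
// Local copy of the documented option shape, so the sketch compiles on its own.
interface SafeHtmlOptionsShape {
  allowedTags: string[];
  allowedAttrs: string[];
  tagAllowedAttrs: Record<string, string[]>;
  allowedUrl: RegExp;
}

// Hypothetical allowlist for user comments; values are illustrative, not the defaults.
const commentOptions: Partial<SafeHtmlOptionsShape> = {
  allowedTags: ['p', 'a', 'strong', 'em', 'code'],
  allowedAttrs: ['title'],
  tagAllowedAttrs: { a: ['href'] },
  allowedUrl: /^https?:\/\//i,
};

// The URL pattern accepts http(s) links and rejects other schemes.
const urlPattern = commentOptions.allowedUrl as RegExp;
console.log(urlPattern.test('https://example.com/page')); // true
console.log(urlPattern.test('javascript:alert(1)')); // false

// With the package installed, the object would be passed as:
//   safeHtml(userInput, commentOptions)
```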
This parser targets HTML5, which means:
Tags of the form <? ... ?> and <! ... > (except for <!doctype ...>, case insensitive) are treated as Comment, so a CDATASection is also treated as a comment.
Special tag names:
"!doctype" (case insensitive): the doctype declaration
"!": short comment
"!--": normal comment
"" (empty string): short comment, for <? ... >; the leading ? is treated as comment content
Thanks to htmlparser-benchmark. I created a pull request at pulls/7, and its result on my MacBook Pro is:
$ npm test
> htmlparser-benchmark@1.1.3 test ~/htmlparser-benchmark
> node execute.js
gumbo-parser failed (exit code 1)
high5 failed (exit code 1)
html-parser : 28.6524 ms/file ± 21.4282
html5 : 130.423 ms/file ± 161.478
html5parser : 2.37975 ms/file ± 3.30717
htmlparser : 16.6576 ms/file ± 109.840
htmlparser2-dom : 3.45602 ms/file ± 5.05830
htmlparser2 : 2.61135 ms/file ± 4.33535
hubbub failed (exit code 1)
libxmljs failed (exit code 1)
neutron-html5parser: 2.89331 ms/file ± 2.94316
parse5 failed (exit code 1)
sax : 10.2110 ms/file ± 13.5204
The MIT License (MIT)
Copyright (c) 2020 acrazing
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.