What is html-dom-parser?
The html-dom-parser npm package parses HTML strings into DOM node objects, making it easier to traverse and manipulate HTML content programmatically in JavaScript environments. It is particularly useful for server-side rendering, web scraping, and building web crawlers or SEO tools.
What are html-dom-parser's main functionalities?
Parsing HTML string to DOM nodes
This feature allows you to convert an HTML string into DOM nodes, enabling programmatic manipulation of the resulting structure. It's useful for extracting information from HTML content or preparing it for further processing.
const parse = require('html-dom-parser').default;
const domNodes = parse('<div><p>Hello World</p></div>');
Converting DOM nodes back to HTML string
html-dom-parser only parses; it does not export a serializer of its own. Because the nodes it returns are domhandler nodes, they can be turned back into an HTML string (possibly after manipulation) with a separate package such as dom-serializer. This is useful for generating HTML dynamically or modifying existing HTML programmatically.
const parse = require('html-dom-parser').default;
const render = require('dom-serializer').default;
const htmlString = render(parse('<div><p>Hello World</p></div>'));
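To show what serialization involves at the node level, here is a minimal hand-rolled sketch. The `serializeNodes` helper is hypothetical and is not part of html-dom-parser; it assumes plain objects with `type`, `name`, `attribs`, `children`, and `data` fields, and it skips entity escaping, void elements, and comments.

```javascript
// Hypothetical helper, not part of html-dom-parser: turn a simplified
// node tree back into an HTML string (no escaping, no void elements).
function serializeNodes(nodes) {
  return nodes
    .map((node) => {
      if (node.type === 'text') {
        return node.data;
      }
      if (node.type === 'tag') {
        const attrs = Object.entries(node.attribs || {})
          .map(([key, value]) => ` ${key}="${value}"`)
          .join('');
        const inner = serializeNodes(node.children || []);
        return `<${node.name}${attrs}>${inner}</${node.name}>`;
      }
      return ''; // Other node types are ignored in this sketch.
    })
    .join('');
}

const serialized = serializeNodes([
  {
    type: 'tag',
    name: 'div',
    children: [
      { type: 'tag', name: 'p', children: [{ type: 'text', data: 'Hello World' }] },
    ],
  },
]);
console.log(serialized); // <div><p>Hello World</p></div>
```

For real documents, a maintained serializer is the safer choice, since it handles escaping and self-closing tags.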
Other packages similar to html-dom-parser
cheerio
Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server. It provides a simpler API for parsing, manipulating, and rendering DOM structures. Compared to html-dom-parser, Cheerio offers a more jQuery-like syntax and additional manipulation capabilities, making it more suitable for complex DOM manipulation tasks.
jsdom
jsdom is a pure-JavaScript implementation of many web standards, notably the WHATWG DOM and HTML Standards, for use with Node.js. It simulates a web browser's environment, allowing you to interact with the DOM as if you were in the browser. jsdom is more comprehensive than html-dom-parser: it provides a complete simulated browser environment, which makes it well suited to testing and to running web pages or applications in Node.js.
html-dom-parser
HTML to DOM parser that works on both the server (Node.js) and the client (browser):
HTMLDOMParser(string[, options])
The parser converts an HTML string to a JavaScript object that describes the DOM tree.
Example
import parse from 'html-dom-parser';
parse('<p>Hello, World!</p>');
Output:
[
  Element {
    type: 'tag',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    children: [
      Text {
        type: 'text',
        parent: [Circular],
        prev: null,
        next: null,
        startIndex: null,
        endIndex: null,
        data: 'Hello, World!'
      }
    ],
    name: 'p',
    attribs: {}
  }
]
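Since the result is an ordinary array of nested objects, it can be walked with plain recursion. The sketch below collects the text content of a tree; `collectText` is a hypothetical helper, and the `sampleNodes` literal is a simplified stand-in for the output above (only the fields the helper reads are included).

```javascript
// Simplified stand-in for the parser output shown above.
const sampleNodes = [
  {
    type: 'tag',
    name: 'p',
    attribs: {},
    children: [{ type: 'text', data: 'Hello, World!' }],
  },
];

// Hypothetical helper: concatenate the data of every text node, depth-first.
function collectText(list) {
  let text = '';
  for (const node of list) {
    if (node.type === 'text') {
      text += node.data;
    } else if (Array.isArray(node.children)) {
      text += collectText(node.children);
    }
  }
  return text;
}

console.log(collectText(sampleNodes)); // Hello, World!
```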
Install
NPM:
npm install html-dom-parser --save
Yarn:
yarn add html-dom-parser
CDN:
<script src="https://unpkg.com/html-dom-parser@latest/dist/html-dom-parser.min.js"></script>
<script>
window.HTMLDOMParser('<p>Hello, World!</p>');
</script>
Usage
Import with ES Modules:
import parse from 'html-dom-parser';
Require with CommonJS:
const parse = require('html-dom-parser').default;
Parse empty string:
parse('');
Output:
[]
Parse string:
parse('Hello, World!');
Output:
[
  Text {
    type: 'text',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    data: 'Hello, World!'
  }
]
Parse element with attributes:
parse('<p class="foo" style="color: #bada55">Hello, <em>world</em>!</p>');
Output:
[
  Element {
    type: 'tag',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    children: [ [Text], [Element], [Text] ],
    name: 'p',
    attribs: { class: 'foo', style: 'color: #bada55' }
  }
]
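Attributes arrive as a plain `attribs` object, so elements can be filtered with ordinary JavaScript. The following sketch searches a tree for elements matching a predicate; `findElements` is a hypothetical helper, and `sampleTree` is a hand-written approximation of the output above. It only considers nodes whose `type` is `'tag'`.

```javascript
// Hand-written approximation of the parsed <p class="foo" ...> example.
const sampleTree = [
  {
    type: 'tag',
    name: 'p',
    attribs: { class: 'foo', style: 'color: #bada55' },
    children: [
      { type: 'text', data: 'Hello, ' },
      { type: 'tag', name: 'em', attribs: {}, children: [{ type: 'text', data: 'world' }] },
      { type: 'text', data: '!' },
    ],
  },
];

// Hypothetical helper: depth-first search for elements matching a predicate.
function findElements(list, predicate, found = []) {
  for (const node of list) {
    if (node.type === 'tag') {
      if (predicate(node)) found.push(node);
      findElements(node.children || [], predicate, found);
    }
  }
  return found;
}

const hasClassFoo = (el) => (el.attribs.class || '').split(/\s+/).includes('foo');
const matches = findElements(sampleTree, hasClassFoo);
const tagNames = findElements(sampleTree, () => true).map((el) => el.name);

console.log(matches[0].attribs.style); // color: #bada55
console.log(tagNames); // [ 'p', 'em' ]
```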
The server parser is a wrapper of htmlparser2's parseDOM, but with the root parent node excluded. The next section shows the options available to the server parser.
The client parser mimics the server parser by using the DOM API to parse the HTML string.
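The "root parent node excluded" behaviour can be pictured as follows: the top-level nodes are returned with `parent` set to `null` rather than pointing at a wrapping root node. This is an illustrative sketch only; `detachFromRoot` and the object shapes are assumptions made for the example, not the library's actual source.

```javascript
// Hypothetical helper: clear the parent reference on top-level nodes so the
// wrapping root node is no longer reachable from the returned array.
function detachFromRoot(nodes) {
  for (const node of nodes) {
    node.parent = null;
  }
  return nodes;
}

// A made-up root node with one child, as a DOM handler might build it.
const rootNode = { type: 'root', children: [] };
const paragraph = { type: 'tag', name: 'p', parent: rootNode, children: [] };
rootNode.children.push(paragraph);

const topLevel = detachFromRoot(rootNode.children);
console.log(topLevel[0].parent); // null
```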
Options (server only)
Because the server parser is a wrapper of htmlparser2, which implements domhandler, you can alter how the server parser parses your code with the following options:
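import { Tokenizer } from 'htmlparser2';

// Defaults used when the options argument is omitted.
const options = {
  // domhandler options
  withStartIndices: false,
  withEndIndices: false,
  // shared by domhandler and htmlparser2
  xmlMode: false,
  // htmlparser2 options
  decodeEntities: true,
  lowerCaseTags: true, // !xmlMode by default
  lowerCaseAttributeNames: true, // !xmlMode by default
  recognizeCDATA: false, // xmlMode by default
  recognizeSelfClosing: false, // xmlMode by default
  Tokenizer: Tokenizer, // default tokenizer
};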
If you're parsing SVG, you can set lowerCaseTags to false without having to enable xmlMode. Tag names then keep the casing used in the source (for example, camelCase SVG elements such as linearGradient) instead of being lowercased as the HTML standard prescribes.
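Several of these defaults depend on `xmlMode`. The sketch below makes that relationship explicit; `resolveOptions` is a hypothetical helper written for illustration, and the xmlMode-derived defaults reflect htmlparser2's documented behaviour as assumed here. It does not call the parser itself.

```javascript
// Hypothetical helper: fill in defaults, deriving some from xmlMode.
// Explicitly passed options always win because they are spread last.
function resolveOptions(userOptions = {}) {
  const xmlMode = userOptions.xmlMode ?? false;
  return {
    withStartIndices: false,
    withEndIndices: false,
    decodeEntities: true,
    xmlMode,
    lowerCaseTags: !xmlMode,
    lowerCaseAttributeNames: !xmlMode,
    recognizeCDATA: xmlMode,
    recognizeSelfClosing: xmlMode,
    ...userOptions,
  };
}

const htmlDefaults = resolveOptions();
const xmlDefaults = resolveOptions({ xmlMode: true });
const xmlButLowercased = resolveOptions({ xmlMode: true, lowerCaseTags: true });

console.log(htmlDefaults.lowerCaseTags); // true
console.log(xmlDefaults.lowerCaseTags); // false
console.log(xmlDefaults.recognizeSelfClosing); // true
console.log(xmlButLowercased.lowerCaseTags); // true
```

The resolved object could then be passed as the second argument to the server parser.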
Note: If you're parsing code client-side (in-browser), you cannot control the parsing options. Client-side parsing automatically handles returning some HTML tags in camelCase, such as specific SVG elements, but returns all other tags lowercased according to the HTML standard.
Migration
v5
Migrated to TypeScript. CommonJS imports require the .default key:
const parse = require('html-dom-parser').default;
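Code that has to run against versions before and after this change could normalize the import shape. The following is a minimal sketch under that assumption; `resolveParse` is a hypothetical helper, and the two module shapes are stand-ins rather than the real package.

```javascript
// Hypothetical helper: accept either module shape and return the function.
function resolveParse(mod) {
  const candidate = mod && typeof mod.default === 'function' ? mod.default : mod;
  if (typeof candidate !== 'function') {
    throw new TypeError('Could not find a parse function on the module');
  }
  return candidate;
}

// Stand-ins for what require() might return in each case.
const fakeParse = (html) => (html ? [{ type: 'text', data: html }] : []);
const legacyShape = fakeParse; // module.exports = parse
const v5Shape = { default: fakeParse }; // exports.default = parse

console.log(resolveParse(legacyShape) === resolveParse(v5Shape)); // true

// Intended usage (not executed here):
// const parse = resolveParse(require('html-dom-parser'));
```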
v4
Upgraded htmlparser2 to v9.
v3
Upgraded domhandler to v5. Parser options like normalizeWhitespace have been removed.
v2
Removed Internet Explorer (IE11) support.
v1
Upgraded domhandler to v4 and htmlparser2 to v6.
Release
Release and publish are automated by Release Please.
License
MIT