html-dom-parser
HTML to DOM parser that works on both the server (Node.js) and the client (browser):
HTMLDOMParser(string[, options])
The parser converts an HTML string to a JavaScript object that describes the DOM tree.
Example
const parse = require('html-dom-parser');
parse('<p>Hello, World!</p>');
Output:
[
  Element {
    type: 'tag',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    children: [
      Text {
        type: 'text',
        parent: [Circular],
        prev: null,
        next: null,
        startIndex: null,
        endIndex: null,
        data: 'Hello, World!'
      }
    ],
    name: 'p',
    attribs: {}
  }
]
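To illustrate how a tree of this shape can be consumed, here is a minimal sketch that walks the nodes and concatenates their text. The `sample` array is a hand-written stand-in for the output above (with `parent`, `prev`, `next` and the index fields omitted), and `getText` is a hypothetical helper, not part of html-dom-parser.

```javascript
// Hand-written stand-in for the result of parse('<p>Hello, World!</p>').
// Only the fields used by the walker are included.
const sample = [
  {
    type: 'tag',
    name: 'p',
    attribs: {},
    children: [{ type: 'text', data: 'Hello, World!' }],
  },
];

// Hypothetical helper: recursively collect the text content of a node list.
function getText(nodes) {
  return nodes
    .map((node) => {
      if (node.type === 'text') return node.data;
      // Elements carry their descendants in `children`.
      return Array.isArray(node.children) ? getText(node.children) : '';
    })
    .join('');
}

console.log(getText(sample)); // Hello, World!
```

The same function should work on the real parser output, since it only relies on the `type`, `data` and `children` fields shown above.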
Replit | JSFiddle | Examples
Install
NPM:
npm install html-dom-parser --save
Yarn:
yarn add html-dom-parser
CDN:
<script src="https://unpkg.com/html-dom-parser@latest/dist/html-dom-parser.min.js"></script>
<script>
  window.HTMLDOMParser('<p>Hello, World!</p>');
</script>
Usage
Import or require the module:
// ES Modules
import parse from 'html-dom-parser';
// CommonJS
const parse = require('html-dom-parser');
Parse empty string:
parse('');
Output:
[]
Parse string:
parse('Hello, World!');
Output:
[
  Text {
    type: 'text',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    data: 'Hello, World!'
  }
]
Parse element with attributes:
parse('<p class="foo" style="color: #bada55">Hello, <em>world</em>!</p>');
Output:
[
  Element {
    type: 'tag',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    children: [ [Text], [Element], [Text] ],
    name: 'p',
    attribs: { class: 'foo', style: 'color: #bada55' }
  }
]
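As a rough sketch of how nested elements and their attributes can be reached, the snippet below searches a tree for elements with a given tag name. The `tree` array is a hand-written stand-in for the output above with the collapsed `children` written out, and `findByName` is a hypothetical helper, not part of html-dom-parser.

```javascript
// Hand-written stand-in for the result of parsing
// '<p class="foo" style="color: #bada55">Hello, <em>world</em>!</p>'.
const tree = [
  {
    type: 'tag',
    name: 'p',
    attribs: { class: 'foo', style: 'color: #bada55' },
    children: [
      { type: 'text', data: 'Hello, ' },
      {
        type: 'tag',
        name: 'em',
        attribs: {},
        children: [{ type: 'text', data: 'world' }],
      },
      { type: 'text', data: '!' },
    ],
  },
];

// Hypothetical helper: depth-first search for elements by tag name.
function findByName(nodes, name, found = []) {
  for (const node of nodes) {
    if (node.type === 'tag' && node.name === name) found.push(node);
    if (Array.isArray(node.children)) findByName(node.children, name, found);
  }
  return found;
}

console.log(findByName(tree, 'em').length); // 1
console.log(findByName(tree, 'p')[0].attribs.class); // foo
```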
The server parser is a wrapper of htmlparser2's parseDOM, but with the root parent node excluded. The next section shows the options you can use with the server parser.
The client parser mimics the server parser by using the DOM API to parse the HTML string.
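To make "root parent node excluded" concrete, here is a minimal sketch of the idea. It is not the library's actual implementation: the `root` object is a hypothetical stand-in for the document node a DOM handler might create, and `withoutRootParent` is an illustrative helper name.

```javascript
// Hypothetical stand-in for handler output: a root node that owns
// the top-level nodes, each of which points back to it via `parent`.
const root = { type: 'root', children: [] };
const paragraph = { type: 'tag', name: 'p', parent: root, children: [] };
const textNode = { type: 'text', data: 'Hello, World!', parent: paragraph };
paragraph.children.push(textNode);
root.children.push(paragraph);

// Hypothetical helper: detach the top-level nodes from the root so that
// their `parent` is null, while nested nodes keep their real parents.
function withoutRootParent(nodes) {
  for (const node of nodes) {
    node.parent = null;
  }
  return nodes;
}

const result = withoutRootParent(root.children);
console.log(result[0].parent); // null
console.log(result[0].children[0].parent === result[0]); // true
```

This matches the outputs shown earlier, where top-level nodes have `parent: null` and nested nodes show `parent: [Circular]`.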
Options (server only)
Because the server parser is a wrapper of htmlparser2, which implements domhandler, you can alter how the server parser parses your code with the following options:
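// Tokenizer is exported by htmlparser2; it is imported here so the snippet runs as written.
const { Tokenizer } = require('htmlparser2');

const options = {
  // domhandler options
  withStartIndices: false,
  withEndIndices: false,
  xmlMode: false,

  // htmlparser2 options (xmlMode above applies to both)
  decodeEntities: true,
  lowerCaseTags: true,
  lowerCaseAttributeNames: true,
  recognizeCDATA: false,
  recognizeSelfClosing: false,
  Tokenizer: Tokenizer
};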
If you are parsing HTML that contains SVG, you can set lowerCaseTags to false without having to enable xmlMode. Keep in mind that tag names are then returned with the case used in the source (for example, camel-cased SVG names) and not in the HTML standard of lowercase.
Note: If you are parsing code client-side (in-browser), you cannot control the parsing options. Client-side parsing automatically returns some HTML tags in camel-case, such as specific SVG elements, but returns all other tags lowercased according to the HTML standard.
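As an illustration of what the index options make possible, the sketch below recovers the source text of a node from its `startIndex` and `endIndex`. The `indexed` array is a hand-written stand-in for what the server parser would be expected to return with `withStartIndices` and `withEndIndices` enabled. The sketch assumes `endIndex` points at the node's last character (inclusive), and `sourceOf` is a hypothetical helper.

```javascript
const html = '<p>Hello, World!</p>';

// Hand-written stand-in for parser output with both index options enabled.
// The index values are assumptions worked out by hand from `html`.
const indexed = [
  {
    type: 'tag',
    name: 'p',
    startIndex: 0,
    endIndex: 19,
    children: [
      { type: 'text', data: 'Hello, World!', startIndex: 3, endIndex: 15 },
    ],
  },
];

// Hypothetical helper: slice the original string using a node's indices.
// Returns null when the indices were not recorded (the default options).
function sourceOf(source, node) {
  if (node.startIndex == null || node.endIndex == null) return null;
  // endIndex is treated as inclusive, hence the + 1.
  return source.slice(node.startIndex, node.endIndex + 1);
}

console.log(sourceOf(html, indexed[0])); // <p>Hello, World!</p>
console.log(sourceOf(html, indexed[0].children[0])); // Hello, World!
```

With the default options both fields are `null`, as in the outputs shown earlier, so the helper returns `null` instead of slicing.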
Testing
Run server and client tests:
npm test
Generate HTML coverage report for server tests:
npx nyc report --reporter=html
Lint files:
npm run lint
npm run lint:fix
Test TypeScript declaration file for style and correctness:
npm run lint:dts
Migration
v3.0.0
domhandler has been upgraded to v5, so some parser options, such as normalizeWhitespace, have been removed.
Security contact information
To report a security vulnerability, please use the Tidelift security contact. Tidelift will coordinate the fix and disclosure.
Release
Release and publish are automated by Release Please.
Special Thanks
License
MIT