
Research
TeamPCP Compromises Telnyx Python SDK to Deliver Credential-Stealing Malware
Malicious versions of the Telnyx Python SDK on PyPI delivered credential-stealing malware via a multi-stage supply chain attack.
html-dom-parser
Advanced tools
HTML to DOM parser that works on both the server (Node.js) and the client (browser):
HTMLDOMParser(string[, options])
The parser converts an HTML string to a JavaScript object that describes the DOM tree.
For example:
import parse from 'html-dom-parser';
parse('<p>Hello, World!</p>');
[
Element {
type: 'tag',
parent: null,
prev: null,
next: null,
startIndex: null,
endIndex: null,
children: [
Text {
type: 'text',
parent: [Circular],
prev: null,
next: null,
startIndex: null,
endIndex: null,
data: 'Hello, World!'
}
],
name: 'p',
attribs: {}
}
]
NPM:
npm install html-dom-parser --save
Yarn:
yarn add html-dom-parser
CDN:
<script src="https://unpkg.com/html-dom-parser@latest/dist/html-dom-parser.min.js"></script>
<script>
window.HTMLDOMParser(/* string */);
</script>
Import with ES Modules:
import parse from 'html-dom-parser';
Require with CommonJS:
const parse = require('html-dom-parser').default;
Parse empty string:
parse('');
Output:
[]
Parse string:
parse('Hello, World!');
[
Text {
type: 'text',
parent: null,
prev: null,
next: null,
startIndex: null,
endIndex: null,
data: 'Hello, World!'
}
]
Parse element with attributes:
parse('<p class="foo" style="color: #bada55">Hello, <em>world</em>!</p>');
[
Element {
type: 'tag',
parent: null,
prev: null,
next: null,
startIndex: null,
endIndex: null,
children: [ [Text], [Element], [Text] ],
name: 'p',
attribs: { class: 'foo', style: 'color: #bada55' }
}
]
The server parser is a wrapper of htmlparser2 parseDOM but with the root parent node excluded. The next section shows the available options you can use with the server parse.
The client parser mimics the server parser by using the DOM API to parse the HTML string.
Because the server parser is a wrapper of htmlparser2, which implements domhandler, you can alter how the server parser parses your code with the options:
export interface ParserOptions {
/**
* Indicates whether special tags (`<script>`, `<style>`, and `<title>`) should get special treatment
* and if "empty" tags (eg. `<br>`) can have children. If `false`, the content of special tags
* will be text only. For feeds and other XML content (documents that don't consist of HTML),
* set this to `true`.
*
* @default false
*/
xmlMode?: boolean;
/**
* Decode entities within the document.
*
* @default true
*/
decodeEntities?: boolean;
/**
* If set to true, all tags will be lowercased.
*
* @default !xmlMode
*/
lowerCaseTags?: boolean;
/**
* If set to `true`, all attribute names will be lowercased. This has noticeable impact on speed.
*
* @default !xmlMode
*/
lowerCaseAttributeNames?: boolean;
/**
* If set to true, CDATA sections will be recognized as text even if the xmlMode option is not enabled.
* NOTE: If xmlMode is set to `true` then CDATA sections will always be recognized as text.
*
* @default xmlMode
*/
recognizeCDATA?: boolean;
/**
* If set to `true`, self-closing tags will trigger the onclosetag event even if xmlMode is not set to `true`.
* NOTE: If xmlMode is set to `true` then self-closing tags will always be recognized.
*
* @default xmlMode
*/
recognizeSelfClosing?: boolean;
/**
* Allows the default tokenizer to be overwritten.
*/
Tokenizer?: typeof Tokenizer;
}
If you're parsing SVG, you can set lowerCaseTags to true without having to enable xmlMode. This will return all tag names in camelCase and not the HTML standard of lowercase.
[!NOTE] If you're parsing code client-side (in-browser), you cannot control the parsing options. Client-side parsing automatically handles returning some HTML tags in camelCase, such as specific SVG elements, but returns all other tags lowercased according to the HTML standard.
Migrated to TypeScript. CommonJS imports require the .default key:
const parse = require('html-dom-parser').default;
Upgraded htmlparser2 to v9.
Upgraded domhandler to v5. Parser options like normalizeWhitespace have been removed.
Removed Internet Explorer (IE11) support.
Upgraded domhandler to v4 and htmlparser2 to v6.
Release and publish are automated by Release Please.
Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server. It provides a simpler API for parsing, manipulating, and rendering DOM structures. Compared to html-dom-parser, Cheerio offers a more jQuery-like syntax and additional manipulation capabilities, making it more suitable for complex DOM manipulation tasks.
jsdom is a pure-JavaScript implementation of many web standards, notably the WHATWG DOM and HTML Standards, for use with Node.js. It simulates a web browser's environment, allowing you to interact with the DOM as if you were in the browser. jsdom is more comprehensive than html-dom-parser, providing a complete simulated browser environment, making it ideal for testing web pages and running web pages or applications in a Node.js environment.
FAQs
HTML to DOM parser.
The npm package html-dom-parser receives a total of 2,403,454 weekly downloads. As such, html-dom-parser popularity was classified as popular.
We found that html-dom-parser demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Research
Malicious versions of the Telnyx Python SDK on PyPI delivered credential-stealing malware via a multi-stage supply chain attack.

Security News
TeamPCP is partnering with ransomware group Vect to turn open source supply chain attacks on tools like Trivy and LiteLLM into large-scale ransomware operations.

Security News
/Research
Widespread GitHub phishing campaign uses fake Visual Studio Code security alerts in Discussions to trick developers into visiting malicious website.