
sanitize-dom
Recursive sanitizer/filter to manipulate live WHATWG DOMs rather than HTML, for the browser and Node.js.
Direct DOM manipulation has earned a bad reputation over the last decade of web development. From Ruby on Rails to React, the DOM was seen as something to gloriously destroy and re-render, from the server or even from the browser. Never mind that the browser had already expended a lot of effort parsing HTML and constructing this tree! Mind-numbingly complex regular expression tests and string manipulations had to deal with low-level details of HTML syntax to insert, delete and change elements, sometimes on every keystroke! In contrast, DOM functions like createElement, remove and insertBefore were largely unknown and unused, except perhaps in jQuery.
Processing HTML is destructive: the original DOM is destroyed and garbage collected after some delay. Attached event handlers are detached and garbage collected. A completely new DOM is created by parsing the new HTML set via .innerHTML =. Event listeners have to be re-attached from userland (this is not an issue when using on* HTML attributes, but those have disadvantages of their own).
It doesn't have to be this way. Do not eliminate, but manipulate!
sanitize-dom crawls a DOM subtree (beginning from a given node, all the way down to its descendant leaves) and filters and manipulates it non-destructively. This is very efficient: the browser doesn't have to re-render everything; it only re-renders what has changed (sound familiar from React?).
The benefits of direct DOM manipulation:
Map or WeakMap) stay alive.
sanitize-dom's further advantages:
Aside from the browser, sanitize-dom can also be used in Node.js by supplying WHATWG DOM implementations like jsdom.
The test file describes additional usage patterns and features.
For the usage examples below, I'll use sanitizeHtml just to be able to illustrate the HTML output.
By default, all tags are 'flattened', i.e. only their inner text is kept:
sanitizeHtml(document, '<div><p>abc <b>def</b></p></div>');
"abc def"
Selective joining of same-tag siblings:
// Joins the two I tags.
sanitizeHtml(document, '<i>Hello</i> <i>world!</i> <em>Goodbye</em> <em>world!</em>', {
  allow_tags_deep: { '.*': '.*' },
  join_siblings: ['I'],
});
"<i>Hello world!</i> <em>Goodbye</em> <em>world!</em>"
Removal of redundant nested nodes (ubiquitous when using a WYSIWYG contenteditable editor):
sanitizeHtml(document, '<i><i>H<i></i>ello</i> <i>world! <i>Good<i>bye</i></i> world!</i>', {
  allow_tags_deep: { '.*': '.*' },
  flatten_tags_deep: { i: 'i' },
});
"<i>Hello world! Goodbye world!</i>"
Remove redundant empty tags:
sanitizeHtml(document, 'H<i></i>ello world!', {
  allow_tags_deep: { '.*': '.*' },
  remove_empty: true,
});
"Hello world!"
By default, all classes and attributes are removed:
// Keep all nodes, but remove all of their attributes and classes:
sanitizeHtml(document, '<div><p>abc <b class="green" data-type="test">def</b></p></div>', {
  allow_tags_deep: { '.*': '.*' },
});
"<div><p>abc <b>def</b></p></div>"
Keep all nodes and all their attributes and classes:
sanitizeHtml(document, '<div><p class="red green">abc <b class="green" data-type="test">def</b></p></div>', {
  allow_tags_deep: { '.*': '.*' },
  allow_attributes_by_tag: { '.*': '.*' },
  allow_classes_by_tag: { '.*': '.*' },
});
'<div><p class="red green">abc <b class="green" data-type="test">def</b></p></div>'
White-listing of classes and attributes:
// Keep only data- attributes and 'green' classes
sanitizeHtml(document, '<div><p class="red green">abc <b class="green" data-type="test">def</b></p></div>', {
  allow_tags_deep: { '.*': '.*' },
  allow_attributes_by_tag: { '.*': 'data-.*' },
  allow_classes_by_tag: { '.*': 'green' },
});
'<div><p class="green">abc <b class="green" data-type="test">def</b></p></div>'
White-listing of node tags to keep:
// Keep only B tags anywhere in the document.
sanitizeHtml(document, '<i>abc</i> <b>def</b> <em>ghi</em>', {
  allow_tags_deep: { '.*': '^b$' },
});
"abc <b>def</b> ghi"
// Keep only DIV children of BODY and I children of DIV.
sanitizeHtml(document, '<div> <i>abc</i> <em>def</em></div> <i>ghi</i>', {
  allow_tags_direct: {
    body: 'div',
    div: '^i',
  },
});
"<div> <i>abc</i> def</div> ghi"
Selective flattening of nodes:
// Flatten only EM children of DIV.
sanitizeHtml(document, '<div> <i>abc</i> <em>def</em></div> <i>ghi</i>', {
  allow_tags_deep: { '.*': '.*' },
  flatten_tags_direct: {
    div: 'em',
  },
});
"<div> <i>abc</i> def</div> <i>ghi</i>"
// Flatten I tags anywhere in the document.
sanitizeHtml(document, '<div> <i>abc</i> <em>def</em></div> <i>ghi</i>', {
  allow_tags_deep: { '.*': '.*' },
  flatten_tags_deep: {
    '.*': '^i',
  },
});
"<div> abc <em>def</em></div> ghi"
Selective removal of tags:
// Remove I children of DIVs.
sanitizeHtml(document, '<div> <i>abc</i> <em>def</em></div> <i>ghi</i>', {
  allow_tags_deep: { '.*': '.*' },
  remove_tags_direct: {
    'div': 'i',
  },
});
"<div> <em>def</em></div> <i>ghi</i>"
Sometimes there is more than one way to accomplish the same result, as shown in this advanced example:
// Keep all tags except B, anywhere in the document. Two different solutions:
sanitizeHtml(document, '<div> <i>abc</i> <b>def</b> <em>ghi</em> </div>', {
  allow_tags_deep: { '.*': '.*' },
  flatten_tags_deep: { '.*': 'B' },
});
"<div> <i>abc</i> def <em>ghi</em> </div>"
sanitizeHtml(document, '<div> <i>abc</i> <b>def</b> <em>ghi</em> </div>', {
  allow_tags_deep: { '.*': '^((?!b).)*$' }
});
"<div> <i>abc</i> def <em>ghi</em> </div>"
And finally, filter functions allow ultimate flexibility:
// Change each B node to an EM node with contextual inner text; attach an event listener.
sanitizeHtml(document, '<p>abc <i><b>def</b> <b>ghi</b></i></p>', {
  allow_tags_direct: {
    '.*': '.*',
  },
  filters_by_tag: {
    B: [
      function changesToEm(node, { parentNodes, parentNodenames, siblingIndex }) {
        const em = document.createElement('em');
        const text = `${parentNodenames.join(', ')} - ${siblingIndex}`;
        em.innerHTML = text;
        em.addEventListener('click', () => alert(text));
        return em;
      },
    ],
  },
});
// In a browser, the EM tags would be clickable and an alert box would pop up.
"<p>abc <i><em>I, P, BODY - 0</em> <em>I, P, BODY - 2</em></i></p>"
Run in Node.js:
npm test
For the browser, run:
cd sanitize-dom
npm i -g jspm@2.0.0-beta.7 http-server
jspm install @jspm/core@1.1.0
http-server
Then, in a browser that supports <script type="importmap"></script> (e.g. Google Chrome version >= 81), browse to http://127.0.0.1:8080/test
sanitizeNode(doc, node, [opts], [nodePropertyMap])
Simple wrapper for sanitizeDom. Processes the node and its childNodes recursively.
Kind: global function
| Param | Type | Default | Description |
|---|---|---|---|
| doc | DomDocument | ||
| node | DomNode | ||
| [opts] | Object | {} | |
| [nodePropertyMap] | WeakMap.<DomNode, Object> | new WeakMap() | Additional node properties |
sanitizeChildNodes(doc, node, [opts], [nodePropertyMap])
Simple wrapper for sanitizeDom. Processes only the node's childNodes recursively, but not the node itself.
Kind: global function
| Param | Type | Default | Description |
|---|---|---|---|
| doc | DomDocument | ||
| node | DomNode | ||
| [opts] | Object | {} | |
| [nodePropertyMap] | WeakMap.<DomNode, Object> | new WeakMap() | Additional node properties |
sanitizeHtml(doc, html, [opts], [isDocument], [nodePropertyMap])
Simple wrapper for sanitizeDom. Instead of a DomNode, it takes an HTML string.
Kind: global function
Returns: String - The processed HTML
| Param | Type | Default | Description |
|---|---|---|---|
| doc | DomDocument | ||
| html | string | ||
| [opts] | Object | {} | |
| [isDocument] | Boolean | false | Set this to true if you are passing an entire HTML document (beginning with the <html> tag). The context node name will be HTML. If false, then the context node name will be BODY. |
| [nodePropertyMap] | WeakMap.<DomNode, Object> | new WeakMap() | Additional node properties |
sanitizeDom(doc, contextNode, [opts], [childrenOnly], [nodePropertyMap])
This function is not exported. Please use the wrapper functions instead:
sanitizeHtml, sanitizeNode, and sanitizeChildNodes.
Recursively processes a tree with node at the root.
In all descriptions, the term "flatten" means that a node is replaced with the node's childNodes.
For example, if the B node in <i>abc<b>def<u>ghi</u></b></i> is flattened, the result is
<i>abcdef<u>ghi</u></i>.
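To make the definition of "flatten" concrete, here is a minimal sketch that works on a plain-object tree rather than a real DOM. The node shape (`{ tag, children }`, with strings standing in for text nodes) and the helper names `flatten` and `serialize` are assumptions made for this illustration; they are not part of sanitize-dom.

```javascript
// Hypothetical stand-in for a DOM tree: elements are { tag, children },
// text nodes are plain strings. This is not sanitize-dom's data model.
const tree = {
  tag: 'i',
  children: ['abc', { tag: 'b', children: ['def', { tag: 'u', children: ['ghi'] }] }],
};

// Replace every direct child of `parent` whose tag equals `tag`
// with that child's own children.
function flatten(parent, tag) {
  parent.children = parent.children.flatMap((child) =>
    typeof child !== 'string' && child.tag === tag ? child.children : [child]
  );
  return parent;
}

// Serialize the toy tree back to an HTML-like string.
function serialize(node) {
  if (typeof node === 'string') return node;
  return `<${node.tag}>${node.children.map(serialize).join('')}</${node.tag}>`;
}

const flattened = serialize(flatten(tree, 'b'));
console.log(flattened); // <i>abcdef<u>ghi</u></i>
```

The B node disappears, while its text and its U child move up one level and keep their order.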
Each node is processed in the following sequence:
1. Filters matching the opts.filters_by_tag spec are called. If the filter returns null, the node is removed and processing stops (see filters).
2. If the opts.remove_tags_* spec matches, the node is removed and processing stops.
3. If the opts.flatten_tags_* spec matches, the node is flattened and processing stops.
4. If the opts.allow_tags_* spec matches:
   - Attributes not matching opts.allow_attributes_by_tag are removed.
   - Class names not matching opts.allow_classes_by_tag are removed.

Kind: global function
| Param | Type | Default | Description |
|---|---|---|---|
| doc | DomDocument | | The document |
| contextNode | DomNode | | The root node |
| [opts] | Object | {} | Options for processing. |
| [opts.filters_by_tag] | FilterSpec | {} | Matching filters are called with the node. |
| [opts.remove_tags_direct] | ParentChildSpec | {} | Matching nodes which are a direct child of the matching parent node are removed. |
| [opts.remove_tags_deep] | ParentChildSpec | {'.*': ['style','script','textarea','noscript']} | Matching nodes which are anywhere below the matching parent node are removed. |
| [opts.flatten_tags_direct] | ParentChildSpec | {} | Matching nodes which are a direct child of the matching parent node are flattened. |
| [opts.flatten_tags_deep] | ParentChildSpec | {} | Matching nodes which are anywhere below the matching parent node are flattened. |
| [opts.allow_tags_direct] | ParentChildSpec | {} | Matching nodes which are a direct child of the matching parent node are kept. |
| [opts.allow_tags_deep] | ParentChildSpec | {} | Matching nodes which are anywhere below the matching parent node are kept. |
| [opts.allow_attributes_by_tag] | TagAttributeNameSpec | {} | Matching attribute names of a matching node are kept. Other attributes are removed. |
| [opts.allow_classes_by_tag] | TagClassNameSpec | {} | Matching class names of a matching node are kept. Other class names are removed. If no class names are remaining, the class attribute is removed. |
| [opts.remove_empty] | boolean | false | Remove nodes which are completely empty |
| [opts.join_siblings] | Array.<Tagname> | [] | Join same-tag sibling nodes of given tag names, unless they are separated by non-whitespace textNodes. |
| [childrenOnly] | Bool | false | If false, then the node itself and its descendants are processed recursively. If true, then only the node's children and their descendants are processed recursively, but not the node itself (use when node is BODY or DocumentFragment). |
| [nodePropertyMap] | WeakMap.<DomNode, Object> | new WeakMap() | Additional properties for a DomNode can be stored in an object and will be looked up in this map. The properties of the object and their meaning: skip: If truthy, disables all processing for this node. skip_filters: If truthy, disables all filters for this node. skip_classes: If truthy, disables processing classes of this node. skip_attributes: If truthy, disables processing attributes of this node. See tests for usage details. |
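The per-node sequence described above can be read as a short chain of early exits. The following sketch is only an illustration of that ordering: the `decide` function and its `matches` argument are hypothetical, and the final fallback to flattening is an assumption based on the statement that, by default, all tags are flattened.

```javascript
// Hypothetical decision helper mirroring the documented order of checks.
// `matches` holds precomputed booleans; a real implementation would derive
// them by testing the node against the corresponding opts.* specs.
function decide({ filterResult, matches }) {
  if (filterResult === null) return 'remove'; // a filter returned null
  if (matches.remove) return 'remove';        // opts.remove_tags_* matched
  if (matches.flatten) return 'flatten';      // opts.flatten_tags_* matched
  if (matches.allow) return 'keep';           // opts.allow_tags_* matched; attributes and classes are filtered next
  return 'flatten';                           // assumed default: unmatched tags are flattened
}

console.log(decide({ filterResult: undefined, matches: { remove: true, flatten: true, allow: true } })); // remove
console.log(decide({ filterResult: undefined, matches: { flatten: true, allow: true } }));               // flatten
console.log(decide({ filterResult: undefined, matches: { allow: true } }));                              // keep
```

Because each step stops processing, a tag that matches both a remove spec and an allow spec is removed.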
DomDocument : Object
Implements the WHATWG DOM Document interface.
In the browser, this is window.document. In Node.js, this may for example be
new JSDOM().window.document.
Kind: global typedef
See: https://dom.spec.whatwg.org/#interface-document
DomNode : Object
Implements the WHATWG DOM Node interface.
Custom properties for each node can be stored in a WeakMap passed as option nodePropertyMap
to one of the sanitize functions.
Kind: global typedef
See: https://dom.spec.whatwg.org/#interface-node
Tagname : string
Node tag name.
Even though in the WHATWG DOM text nodes (nodeType 3) have a tag name #text,
these are referred to by the simpler string 'TEXT' for convenience.
Kind: global typedef
Example
'DIV'
'H1'
'TEXT'
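As a small illustration of this convention, the sketch below derives a Tagname from a node-like object. The helper name `tagnameOf` is hypothetical, and plain objects stand in for real DOM nodes so that the snippet runs without a DOM implementation.

```javascript
// Hypothetical helper: derive the Tagname string for a node-like object.
const TEXT_NODE = 3; // value of Node.TEXT_NODE in the WHATWG DOM

function tagnameOf(node) {
  // Text nodes report '#text' as their nodeName; use 'TEXT' instead.
  return node.nodeType === TEXT_NODE ? 'TEXT' : node.nodeName.toUpperCase();
}

console.log(tagnameOf({ nodeType: 1, nodeName: 'DIV' }));   // DIV
console.log(tagnameOf({ nodeType: 3, nodeName: '#text' })); // TEXT
```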
Regex : string
A string which is compiled to a case-insensitive regular expression new RegExp(regex, 'i').
The regular expression is used to match a Tagname.
Kind: global typedef
Example
'.*' // matches any tag
'DIV' // matches DIV
'(DIV|H[1-3])' // matches DIV, H1, H2 and H3
'P' // matches P and SPAN
'^P$' // matches P but not SPAN
'TEXT' // matches text nodes (nodeType 3)
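The matching behaviour in these examples follows directly from how JavaScript regular expressions work. The snippet below compiles a few of the strings with `new RegExp(regex, 'i')`, as stated above; the `compile` helper is just a local shorthand for this illustration.

```javascript
// Compile a Regex string as described: case-insensitive, not anchored.
const compile = (regex) => new RegExp(regex, 'i');

console.log(compile('P').test('SPAN'));          // true  (unanchored: "SPAN" contains "P")
console.log(compile('^P$').test('SPAN'));        // false
console.log(compile('^P$').test('p'));           // true  (case-insensitive)
console.log(compile('(DIV|H[1-3])').test('H2')); // true
console.log(compile('(DIV|H[1-3])').test('H4')); // false
console.log(compile('B').test('TABLE'));         // true  (another substring match)
```

Since patterns are not anchored implicitly, a short pattern such as 'B' also matches longer tag names like TABLE; wrap it in ^ and $ when an exact tag is intended.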
ParentChildSpec : Object.<Regex, Array.<Regex>>
Property names are matched against a (direct or ancestral) parent node's Tagname. Associated values are matched against the current node's Tagname.
Kind: global typedef
Example
{
'(DIV|SPAN)': ['H[1-3]', 'B'], // matches H1, H2, H3 and B within DIV or SPAN
'STRONG': ['.*'] // matches all tags within STRONG
}
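One way to picture how such a spec is evaluated is the sketch below. It is a hypothetical matcher, not sanitize-dom's implementation: the function name `matchesParentChildSpec`, the `deep` flag (standing in for the `*_deep` versus `*_direct` options) and the nearest-parent-first ordering of `parentNames` are assumptions for illustration.

```javascript
// Hypothetical matcher for a ParentChildSpec.
// `parentNames` lists the ancestors' tag names, nearest parent first.
function matchesParentChildSpec(spec, parentNames, tagname, deep) {
  // A "direct" spec only looks at the immediate parent.
  const candidates = deep ? parentNames : parentNames.slice(0, 1);
  return Object.entries(spec).some(([parentRegex, childRegexes]) => {
    const parentMatches = candidates.some((name) => new RegExp(parentRegex, 'i').test(name));
    // Accept a single string as well as an array, since the usage examples pass plain strings.
    const children = [].concat(childRegexes);
    return parentMatches && children.some((re) => new RegExp(re, 'i').test(tagname));
  });
}

const spec = { '(DIV|SPAN)': ['H[1-3]', 'B'], STRONG: ['.*'] };
console.log(matchesParentChildSpec(spec, ['P', 'DIV', 'BODY'], 'H2', true));  // true: DIV is an ancestor
console.log(matchesParentChildSpec(spec, ['P', 'DIV', 'BODY'], 'H2', false)); // false: the direct parent is P
console.log(matchesParentChildSpec(spec, ['STRONG', 'BODY'], 'EM', false));   // true: any tag within STRONG
```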
TagAttributeNameSpec : Object.<Regex, Array.<Regex>>
Property names are matched against the current node's Tagname. Associated values are used to match its attribute names.
Kind: global typedef
Example
{
'H[1-3]': ['id', 'class'], // matches 'id' and 'class' attributes of all H1, H2 and H3 nodes
'STRONG': ['data-.*'] // matches all 'data-.*' attributes of STRONG nodes.
}
TagClassNameSpec : Object.<Regex, Array.<Regex>>
Property names are matched against the current node's Tagname. Associated values are used to match its class names.
Kind: global typedef
Example
{
'DIV|SPAN': ['blue', 'red'] // matches 'blue' and 'red' class names of all DIV and SPAN nodes
}
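Attribute and class specs share the same shape, so a single helper can illustrate both. The sketch below is hypothetical (the name `allowedNames` is not part of sanitize-dom) and assumes that attribute and class names are matched with the same unanchored, case-insensitive regular expressions as tag names.

```javascript
// Hypothetical helper: keep only the names allowed for `tagname`
// by a TagAttributeNameSpec or TagClassNameSpec.
function allowedNames(spec, tagname, names) {
  const patterns = Object.entries(spec)
    .filter(([tagRegex]) => new RegExp(tagRegex, 'i').test(tagname))
    .flatMap(([, nameRegexes]) => [].concat(nameRegexes)) // accepts a string or an array of strings
    .map((re) => new RegExp(re, 'i'));
  return names.filter((name) => patterns.some((re) => re.test(name)));
}

const classSpec = { 'DIV|SPAN': ['blue', 'red'] };
console.log(allowedNames(classSpec, 'DIV', ['blue', 'green', 'red'])); // blue and red are kept
console.log(allowedNames(classSpec, 'P', ['blue']));                   // nothing is kept: P matches no key

const attrSpec = { STRONG: ['data-.*'] };
console.log(allowedNames(attrSpec, 'STRONG', ['data-type', 'id']));    // only data-type is kept
```

When the filtered class list comes back empty, the options table states that the class attribute itself is removed.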
FilterSpec : Object.<Regex, Array.<filter>>
Property names are matched against node Tagnames. Associated values are the filters which are run on the node.
filter : DomNode | Array.<DomNode> | null
Filter functions can either...
- return a DomNode or an Array of DomNodes, in which case node is replaced with the new node(s),
- return null, in which case node is removed.

Note that newly generated DomNode(s) are processed by running sanitizeDom on them, as if they had been part of the original tree. This has the following implication:
If a filter returns a newly generated DomNode with the same Tagname as node, the same filter would be called again, which may lead to an infinite loop if the filter always returns the same result (this would be a badly behaved filter). To protect against infinite loops, the author of the filter must acknowledge this circumstance by setting a boolean property called 'skip_filters' for the DomNode (in a WeakMap which the caller must provide to one of the sanitize functions as the argument nodePropertyMap). If 'skip_filters' is not set, an error is thrown. With well-behaved filters it is possible to continue subsequent processing of the returned node without causing an infinite loop.
Kind: global typedef
| Param | Type | Description |
|---|---|---|
| node | DomNode | Currently processed node |
| opts | Object | |
| opts.parents | Array.<DomNode> | The parent nodes of node. |
| opts.parentNodenames | Array.<Tagname> | The tag names of the parent nodes |
| opts.siblingIndex | Integer | The number of the current node amongst its siblings |
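The skip_filters handshake described for filters can be illustrated with a toy driver. Everything here is a sketch under stated assumptions: `applyFilter` and `wrapText` are hypothetical names, plain objects stand in for DOM nodes, and the error check only approximates the documented behaviour rather than reproducing sanitize-dom's code.

```javascript
// Toy driver illustrating the skip_filters handshake.
// Plain objects replace DOM nodes; `applyFilter` is hypothetical.
function applyFilter(filter, node, nodePropertyMap) {
  const result = filter(node, { parentNodenames: [], siblingIndex: 0 });
  if (result === null || result === node) return result;
  const props = nodePropertyMap.get(result) || {};
  // A new node with the same tag name would trigger the same filter again.
  if (result.nodeName === node.nodeName && !props.skip_filters) {
    throw new Error('Filter returned a new node with the same tag name without skip_filters');
  }
  return result;
}

const nodePropertyMap = new WeakMap();

// Well-behaved filter: replaces a B node with a new B node and marks the
// replacement so that filters are not run on it again.
function wrapText(node) {
  const replacement = { nodeName: 'B', textContent: `[${node.textContent}]` };
  nodePropertyMap.set(replacement, { skip_filters: true });
  return replacement;
}

const out = applyFilter(wrapText, { nodeName: 'B', textContent: 'def' }, nodePropertyMap);
console.log(out.textContent); // [def]
```

Because the properties live in a WeakMap keyed by node, they do not keep a node alive once it has been dropped from the tree.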