What is sanitize-html?
The sanitize-html npm package cleans up user-submitted HTML to help prevent XSS attacks. Developers specify a whitelist of allowed HTML tags and attributes, and the package strips every tag and attribute that is not explicitly allowed.
What are sanitize-html's main functionalities?
Sanitizing HTML
This feature allows you to remove any unwanted HTML tags and content that could lead to XSS attacks, leaving only the content that is deemed safe according to the specified rules.
const sanitizeHtml = require('sanitize-html');
const dirtyHtml = '<script>alert("XSS");</script><p>Valid content</p>';
const cleanHtml = sanitizeHtml(dirtyHtml);
console.log(cleanHtml); // Output: '<p>Valid content</p>'
Allowing a set of HTML tags
This feature lets you specify which HTML tags are allowed in the sanitized output, effectively filtering out all other tags that are not part of the whitelist.
const sanitizeHtml = require('sanitize-html');
const dirtyHtml = '<div><p>Some text</p><script>Bad script</script></div>';
const cleanHtml = sanitizeHtml(dirtyHtml, {
  allowedTags: ['div', 'p']
});
console.log(cleanHtml); // Output: '<div><p>Some text</p></div>'
Configuring allowed attributes for tags
This feature allows you to configure which attributes are allowed for specific tags, providing fine-grained control over the sanitization process.
const sanitizeHtml = require('sanitize-html');
const dirtyHtml = '<a href="http://example.com" onclick="stealCookies()">Link</a>';
const cleanHtml = sanitizeHtml(dirtyHtml, {
  allowedTags: ['a'],
  allowedAttributes: {
    'a': ['href']
  }
});
console.log(cleanHtml); // Output: '<a href="http://example.com">Link</a>'
Transforming tags and attributes
This feature enables you to transform certain tags into other tags, or modify their attributes during the sanitization process.
const sanitizeHtml = require('sanitize-html');
const dirtyHtml = '<b>bold text</b>';
const cleanHtml = sanitizeHtml(dirtyHtml, {
  transformTags: {
    'b': sanitizeHtml.simpleTransform('strong')
  }
});
console.log(cleanHtml); // Output: '<strong>bold text</strong>'
Other packages similar to sanitize-html
dompurify
DOMPurify is a DOM-only XSS sanitizer for HTML, MathML, and SVG. Like sanitize-html it removes dangerous markup, but it runs natively in the browser and needs a DOM implementation such as jsdom when used server-side. It is also known for its speed and extensive configuration options.
xss
The xss package is another HTML sanitizer that aims to filter input from users to prevent XSS attacks. It provides a range of options for customization and is similar to sanitize-html in its goals, but it has a different API and set of defaults.
sanitize-html
sanitize-html provides a simple HTML sanitizer with a clear API.
sanitize-html is tolerant. It is well suited for cleaning up HTML fragments such as those created by ckeditor and other rich text editors. It is especially handy for removing unwanted CSS when copying and pasting from Word.
sanitize-html allows you to specify the tags you want to permit, and the permitted attributes for each of those tags.
If a tag is not permitted, the contents of the tag are still kept, except for script tags.
The syntax of poorly closed p and img elements is cleaned up.
href attributes are validated to ensure they only contain http, https, ftp and mailto URLs. Relative URLs are also allowed. Ditto for src attributes.
HTML comments are not preserved.
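The URL rule for href and src attributes described above can be pictured with a small standalone sketch. This is only an illustration of the idea, not the library's actual implementation: isAllowedUrl and ALLOWED_SCHEMES are hypothetical names, and the whitespace stripping is an assumption about how such a check would usually be hardened.

```javascript
// Hypothetical helper and constant: not part of sanitize-html's API.
const ALLOWED_SCHEMES = ['http', 'https', 'ftp', 'mailto'];

function isAllowedUrl(value) {
  // Assumption: drop whitespace and control characters before checking,
  // so a value like "java\nscript:..." is judged as "javascript:...".
  const cleaned = value.replace(/[\x00-\x20]+/g, '');
  // A scheme is a letter followed by letters, digits, "+", "." or "-",
  // terminated by the first ":".
  const match = cleaned.match(/^([a-zA-Z][a-zA-Z0-9+.-]*):/);
  if (!match) {
    // No scheme at all: treat the value as a relative URL, which is allowed.
    return true;
  }
  return ALLOWED_SCHEMES.includes(match[1].toLowerCase());
}

console.log(isAllowedUrl('https://example.com/page')); // true
console.log(isAllowedUrl('/images/logo.png'));         // true
console.log(isAllowedUrl('javascript:alert(1)'));      // false
```

An allowlist of schemes is generally safer than trying to block known-bad ones, because any scheme not listed is rejected by default.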
Requirements
sanitize-html is intended for use with Node. That's pretty much it. All of its npm dependencies are pure JavaScript. sanitize-html is built on the excellent htmlparser2 module.
How to use
npm install sanitize-html
var sanitizeHtml = require('sanitize-html');
var dirty = 'some really tacky HTML';
var clean = sanitizeHtml(dirty);
That will allow our default list of allowed tags and attributes through. It's a nice set, but probably not quite what you want. So:
// Allow only a super restricted set of tags and attributes
clean = sanitizeHtml(dirty, {
  allowedTags: [ 'b', 'i', 'em', 'strong', 'a' ],
  allowedAttributes: {
    'a': [ 'href' ]
  }
});
Boom!
"I like your set but I want to add one more tag. Is there a convenient way?" Sure:
clean = sanitizeHtml(dirty, {
  allowedTags: sanitizeHtml.defaults.allowedTags.concat([ 'img' ])
});
If you do not specify allowedTags or allowedAttributes, our default list is applied. So if you really want an empty list, specify one.
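The difference between leaving an option out and passing an empty list can be sketched as follows. This is a simplified model of the fallback behavior described above, not the library's source: resolveOptions is a hypothetical helper, and the defaults object is abbreviated for illustration.

```javascript
// Hypothetical, abbreviated defaults used only for this illustration.
const defaults = {
  allowedTags: ['p', 'a', 'b', 'i', 'strong', 'em'],
  allowedAttributes: { a: ['href', 'name', 'target'] }
};

// Hypothetical helper: an option falls back to the default only when
// the caller did not supply it at all. An empty list is a real value.
function resolveOptions(options = {}) {
  return {
    allowedTags: options.allowedTags !== undefined
      ? options.allowedTags
      : defaults.allowedTags,
    allowedAttributes: options.allowedAttributes !== undefined
      ? options.allowedAttributes
      : defaults.allowedAttributes
  };
}

// Omitted option: the default list is used.
console.log(resolveOptions({}).allowedTags.length);                  // 6
// Explicit empty list: respected, so no tags are allowed.
console.log(resolveOptions({ allowedTags: [] }).allowedTags.length); // 0
```

Under this model, passing allowedTags: [] while omitting allowedAttributes would still pick up the default attribute list, so specify both if you want both empty.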
"What are the default options?"
allowedTags: [ 'h3', 'h4', 'h5', 'h6', 'blockquote',
  'p', 'a', 'ul', 'ol', 'nl', 'li', 'b', 'i', 'strong',
  'em', 'strike', 'code', 'hr', 'br', 'div',
  'table', 'thead', 'caption', 'tbody', 'tr', 'th', 'td',
  'pre' ],
allowedAttributes: {
  a: [ 'href', 'name', 'target' ],
  // We don't currently allow img itself by default, but this
  // would make sense if we did
  img: [ 'src' ]
},
// Lots of these won't come up by default because
// we don't allow them
selfClosing: [ 'img', 'br', 'hr', 'area', 'base',
  'basefont', 'input', 'link', 'meta' ]
Changelog
0.1.1: discard the text of script tags.
0.1.0: initial release.
About P'unk Avenue and Apostrophe
sanitize-html was created at P'unk Avenue for use in Apostrophe, an open-source content management system built on node.js. If you like sanitize-html you should definitely check out apostrophenow.org. Also be sure to visit us on github.
Support
Feel free to open issues on github.