sanitize-html

Clean up user-submitted HTML, preserving whitelisted elements and whitelisted attributes on a per-element basis

1.1.8

Version published: 10 years ago

Weekly downloads: 985K (decreased by 59.63%)

Maintainers: 10

Created: 11 years ago

What is sanitize-html?

The sanitize-html npm package cleans up user-submitted HTML to help prevent XSS attacks. Developers specify a whitelist of allowed HTML tags and attributes, and the package strips every tag and attribute that is not explicitly allowed.

What are sanitize-html's main functionalities?

Sanitizing HTML

Called with no options, sanitizeHtml applies its default lists of allowed tags and attributes. Markup that could lead to XSS attacks, such as script tags, is removed, and only content that passes those rules is kept.

const sanitizeHtml = require('sanitize-html');
const dirtyHtml = '<script>alert("XSS");</script><p>Valid content</p>';
const cleanHtml = sanitizeHtml(dirtyHtml);
console.log(cleanHtml); // Output: '<p>Valid content</p>'

Allowing a set of HTML tags

This feature lets you specify which HTML tags are allowed in the sanitized output, effectively filtering out all other tags that are not part of the whitelist.

const sanitizeHtml = require('sanitize-html');
const dirtyHtml = '<div><p>Some text</p><script>Bad script</script></div>';
const cleanHtml = sanitizeHtml(dirtyHtml, {
  allowedTags: ['div', 'p']
});
console.log(cleanHtml); // Output: '<div><p>Some text</p></div>'

Configuring allowed attributes for tags

This feature allows you to configure which attributes are allowed for specific tags, providing fine-grained control over the sanitization process.

const sanitizeHtml = require('sanitize-html');
const dirtyHtml = '<a href="http://example.com" onclick="stealCookies()">Link</a>';
const cleanHtml = sanitizeHtml(dirtyHtml, {
  allowedTags: ['a'],
  allowedAttributes: {
    'a': ['href']
  }
});
console.log(cleanHtml); // Output: '<a href="http://example.com">Link</a>'
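
To make the per-tag lookup concrete, here is a minimal sketch in plain JavaScript of how an allowedAttributes map can be applied to one tag's attributes. The filterAttributes helper is hypothetical and written only for illustration. It is not sanitize-html's own code, and it leaves out the URL scheme checks the package performs on href and src.

```javascript
// Hypothetical helper for illustration; not part of sanitize-html's API.
function filterAttributes(tagName, attribs, allowedAttributes) {
  // Tags without their own entry in the map keep no attributes at all.
  const hasEntry = Object.prototype.hasOwnProperty.call(allowedAttributes, tagName);
  const allowed = hasEntry ? allowedAttributes[tagName] : [];
  const kept = {};
  for (const name of Object.keys(attribs)) {
    if (allowed.includes(name)) {
      kept[name] = attribs[name];
    }
  }
  return kept;
}

const keptAttribs = filterAttributes(
  'a',
  { href: 'http://example.com', onclick: 'stealCookies()' },
  { a: ['href'] }
);
console.log(keptAttribs); // { href: 'http://example.com' }
```

The whitelist is keyed by tag name, so allowing href on a tags says nothing about any other tag.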

Transforming tags and attributes

This feature enables you to transform certain tags into other tags, or modify their attributes during the sanitization process.

const sanitizeHtml = require('sanitize-html');
const dirtyHtml = '<b>bold text</b>';
const cleanHtml = sanitizeHtml(dirtyHtml, {
  transformTags: {
    'b': sanitizeHtml.simpleTransform('strong')
  }
});
console.log(cleanHtml); // Output: '<strong>bold text</strong>'

sanitize-html

sanitize-html provides a simple HTML sanitizer with a clear API.

sanitize-html is tolerant. It is well suited for cleaning up HTML fragments such as those created by ckeditor and other rich text editors. It is especially handy for removing unwanted CSS when copying and pasting from Word.

sanitize-html allows you to specify the tags you want to permit, and the permitted attributes for each of those tags.

If a tag is not permitted, the contents of the tag are still kept, except for script and style tags.

The syntax of poorly closed p and img elements is cleaned up.

href attributes are validated to ensure they only contain http, https, ftp and mailto URLs. Relative URLs are also allowed. Ditto for src attributes.

HTML comments are not preserved.

Requirements

sanitize-html is intended for use with Node. That's pretty much it. All of its npm dependencies are pure JavaScript. sanitize-html is built on the excellent htmlparser2 module.

How to use

npm install sanitize-html

var sanitizeHtml = require('sanitize-html');

var dirty = 'some really tacky HTML';
var clean = sanitizeHtml(dirty);

That will allow our default list of allowed tags and attributes through. It's a nice set, but probably not quite what you want. So:

// Allow only a super restricted set of tags and attributes
clean = sanitizeHtml(dirty, {
  allowedTags: [ 'b', 'i', 'em', 'strong', 'a' ],
  allowedAttributes: {
    'a': [ 'href' ]
  }
});

Boom!

"I like your set but I want to add one more tag. Is there a convenient way?" Sure:

clean = sanitizeHtml(dirty, {
  allowedTags: sanitizeHtml.defaults.allowedTags.concat([ 'img' ])
});

If you do not specify allowedTags or allowedAttributes our default list is applied. So if you really want an empty list, specify one.
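
The fallback rule above can be sketched in plain JavaScript. This is an illustration under assumptions, not the package's own code: sketchDefaults is a shortened stand-in for the real default lists, and resolveOptions is a hypothetical helper name.

```javascript
// Shortened stand-in for the default options; the real lists are longer.
const sketchDefaults = {
  allowedTags: ['p', 'a', 'b', 'i'],
  allowedAttributes: { a: ['href', 'name', 'target'] },
  allowedSchemes: ['http', 'https', 'ftp', 'mailto']
};

// Hypothetical helper: a key the caller supplies replaces the default for that key,
// and a key the caller omits falls back to the default.
function resolveOptions(options) {
  return Object.assign({}, sketchDefaults, options);
}

const omitted = resolveOptions({});
const emptied = resolveOptions({ allowedTags: [] });

console.log(omitted.allowedTags.length);    // 4
console.log(emptied.allowedTags.length);    // 0
console.log(emptied.allowedSchemes.length); // 4
```

An explicitly empty array counts as "specified", which is why it is the way to ask for no tags at all.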

"What are the default options?"

allowedTags: [ 'h3', 'h4', 'h5', 'h6', 'blockquote', 'p', 'a', 'ul', 'ol', 'nl', 'li', 'b', 'i', 'strong', 'em', 'strike', 'code', 'hr', 'br', 'div', 'table', 'thead', 'caption', 'tbody', 'tr', 'th', 'td', 'pre' ],
allowedAttributes: {
  a: [ 'href', 'name', 'target' ],
  // We don't currently allow img itself by default, but this
  // would make sense if we did
  img: [ 'src' ]
},
// Lots of these won't come up by default because we don't allow them
selfClosing: [ 'img', 'br', 'hr', 'area', 'base', 'basefont', 'input', 'link', 'meta' ],
// URL schemes we permit
allowedSchemes: [ 'http', 'https', 'ftp', 'mailto' ]

Transformations

What if you want to add or change an attribute? What if you want to transform one tag to another? No problem, it's simple!

The easiest way (will change all ol tags to ul tags):

clean = sanitizeHtml(dirty, {
  transformTags: {
    'ol': 'ul',
  }
});

The most advanced usage:

clean = sanitizeHtml(dirty, {
  transformTags: {
    'ol': function(tagName, attribs) {
        // My own custom magic goes here

        return {
            tagName: 'ul',
            attribs: {
                class: 'foo'
            }
        };
    }
  }
});

There is also a helper method which should be enough for simple cases in which you want to change the tag and/or add some attributes:

clean = sanitizeHtml(dirty, {
  transformTags: {
    'ol': sanitizeHtml.simpleTransform('ul', {class: 'foo'}),
  }
});

The simpleTransform helper method has 3 parameters:

simpleTransform(newTag, newAttributes, shouldMerge)

The last parameter (shouldMerge) is set to true by default. When true, simpleTransform will merge the current attributes with the new ones (newAttributes). When false, all existing attributes are discarded.
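
As a rough sketch of the merge behavior just described, the function below mimics the shape of simpleTransform in plain JavaScript. It is a hypothetical stand-in named simpleTransformSketch, not the package's implementation, and it assumes that a new attribute wins when both sets define the same name.

```javascript
// Hypothetical stand-in with the same three parameters as simpleTransform.
function simpleTransformSketch(newTag, newAttributes, shouldMerge) {
  // shouldMerge defaults to true when it is not passed.
  const merge = shouldMerge === undefined ? true : shouldMerge;
  const extra = newAttributes || {};
  return function (tagName, attribs) {
    return {
      tagName: newTag,
      // Merge: existing attributes first, then the new ones on top.
      // No merge: existing attributes are discarded.
      attribs: merge ? Object.assign({}, attribs, extra) : Object.assign({}, extra)
    };
  };
}

const existing = { id: 'steps', class: 'old' };
const merged = simpleTransformSketch('ul', { class: 'foo' })('ol', existing);
const replaced = simpleTransformSketch('ul', { class: 'foo' }, false)('ol', existing);

console.log(JSON.stringify(merged.attribs));   // {"id":"steps","class":"foo"}
console.log(JSON.stringify(replaced.attribs)); // {"class":"foo"}
```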

Filters

You can provide a filter function to remove unwanted tags. Let's suppose we need to remove empty a tags like:

<a href="page.html"></a>

We can do that with the following filter:

sanitizeHtml(
    '<p>This is <a href="http://www.linux.org"></a><br/>Linux</p>',
    {
        exclusiveFilter: function(frame) {
            return frame.tag === 'a' && !frame.text.trim();
        }
    }
);
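
To see which elements a filter like this would drop, the predicate can be tried on its own. The frame objects below are hand-built stand-ins that model only the tag and text fields used above; the frames the package passes in may carry more than that.

```javascript
// The same predicate as in the filter above.
const isEmptyLink = function (frame) {
  return frame.tag === 'a' && !frame.text.trim();
};

// Hand-built stand-ins for frames; only `tag` and `text` are modeled.
const sampleFrames = [
  { tag: 'a', text: '' },
  { tag: 'a', text: '   ' },
  { tag: 'a', text: 'Linux' },
  { tag: 'p', text: '' }
];

// true means "exclude this element".
const verdicts = sampleFrames.map(isEmptyLink);
console.log(verdicts); // [ true, true, false, false ]
```

Whitespace-only links count as empty because of the trim call, while an empty p tag is left alone since the predicate only matches a tags.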

Allowed URL schemes

By default we allow the following URL schemes in cases where href, src, etc. are allowed:

[ 'http', 'https', 'ftp', 'mailto' ]

You can override this if you want to:

sanitizeHtml(
  // teeny-tiny valid transparent GIF in a data URL
  '<img src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" />',
  {
    allowedTags: [ 'img', 'p' ],
    allowedSchemes: [ 'data', 'http' ]
  }
);
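
For intuition, a scheme check along these lines might look like the sketch below. It is not the package's own URL check: isSchemeAllowed is a hypothetical helper, it treats any URL without a scheme as relative, and it skips the entity decoding that the changelog mentions.

```javascript
// Hypothetical helper for illustration; not part of sanitize-html's API.
function isSchemeAllowed(url, allowedSchemes) {
  // Drop characters with codes 0 to 32, which browsers may ignore inside URLs.
  const compact = Array.from(url)
    .filter((ch) => ch.charCodeAt(0) > 32)
    .join('');
  // A scheme is a letter, then letters, digits, "+", "-" or ".", then ":".
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(compact);
  if (!match) {
    return true; // no scheme found: treat as a relative URL
  }
  // Compare case-insensitively so "JaVaScRiPt:" cannot slip through.
  return allowedSchemes.includes(match[1].toLowerCase());
}

const defaultSchemes = ['http', 'https', 'ftp', 'mailto'];

console.log(isSchemeAllowed('http://example.com', defaultSchemes));        // true
console.log(isSchemeAllowed('/page.html', defaultSchemes));                // true
console.log(isSchemeAllowed('JaVaScRiPt:alert(1)', defaultSchemes));       // false
console.log(isSchemeAllowed('java\tscript:alert(1)', defaultSchemes));     // false
console.log(isSchemeAllowed('data:image/gif;base64,AAAA', defaultSchemes)); // false
console.log(isSchemeAllowed('data:image/gif;base64,AAAA', ['data', 'http'])); // true
```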

Changelog

1.1.6: allowedSchemes option for those who want to permit data URLs and such.

1.1.5: just a packaging thing.

1.1.4: custom exclusion filter.

1.1.3: moved to lodash. 1.1.2 pointed to the wrong version of lodash.

1.1.0: the transformTags option was added. Thanks to kl3ryk.

1.0.3: fixed several more javascript URL attack vectors after studying the XSS filter evasion cheat sheet to better understand my enemy. Whitespace characters (codes from 0 to 32), which browsers ignore in URLs in certain cases allowing the "javascript" scheme to be snuck in, are now stripped out when checking for naughty URLs. Thanks again to pinpickle.

1.0.2: fixed a javascript URL attack vector. naughtyHref must entity-decode URLs and also check for mixed-case scheme names. Thanks to pinpickle.

1.0.1: Doc tweaks.

1.0.0: If the style tag is disallowed, then its content should be dumped, so that it doesn't appear as text. We were already doing this for script tags, however in both cases the content is now preserved if the tag is explicitly allowed.

We're rocking our tests and have been working great in production for months, so: declared 1.0.0 stable.

0.1.3: do not double-escape entities in attributes or text. Turns out the "text" provided by htmlparser2 is already escaped.

0.1.2: packaging error meant it wouldn't install properly.

0.1.1: discard the text of script tags.

0.1.0: initial release.

About P'unk Avenue and Apostrophe

sanitize-html was created at P'unk Avenue for use in Apostrophe, an open-source content management system built on node.js. If you like sanitize-html you should definitely check out apostrophenow.org. Also be sure to visit us on github.

Support

Feel free to open issues on github.

Package last updated on 19 Jun 2014
