
Simple and safe HTML/SVG cleaner that minifies without changing the document's structure.
Since v3, the CLI has been split out into htmlclean-cli.
For example, runs of two or more whitespaces in a line (even when they are separated by tags) are reduced.
Before:
<p>The <strong> clean <span> <em> HTML is here. </em> </span> </strong> </p>
After:
<p>The <strong>clean <span><em>HTML is here.</em></span></strong></p>
The whitespace on the right side of <strong> was removed, and the one on its left side was kept. The whitespaces on both sides of <em> were removed.
As another example, unneeded whitespaces in the path data of SVG are reduced. For one sample SVG file, this removed 4,784 bytes without changing its structure.
htmlclean removes the following texts.
- Unneeded texts in the path data of SVG (the d attribute of the path element, the path attribute of the animateMotion element, etc.)

The following texts are protected (excluded from the Removing list).
- Texts in textarea, script and style elements, and text nodes in pre elements
- IE conditional comments (e.g. <!--[if lt IE 7]>)
- Texts between <!--[htmlclean-protect]--> and <!--[/htmlclean-protect]-->
- Texts matched by the protect option

cleanCode = htmlclean(sourceCode[, options])
require('htmlclean') returns a Function. This Function accepts HTML/SVG source code and returns the cleaned HTML/SVG source code. You can pass an options Object as the second argument (see Options).
var htmlclean = require('htmlclean');
html = htmlclean(html);
// Or
html = require('htmlclean')(html);
You can pass an options Object as the second argument. This Object can have the following properties.
protect
Type: RegExp or Array
Texts that match this RegExp are protected in addition to those in the Protecting list. Multiple RegExps can be specified via an Array.
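As a rough illustration of which spans a protect value would cover, the sketch below runs the patterns through the built-in String.prototype.match instead of htmlclean itself. The <!--%fooTemplate ... %--> pattern comes from the example later in this document; the sample markup and the second pattern are made up for this sketch.

```javascript
// Assumed sample markup, invented for this sketch.
var sample = '<div>  <!--%fooTemplate  a   b %-->  <span> text </span>  {{ keep   me }}</div>';

// The option accepts a single RegExp or an Array of RegExps.
var protect = [
  /<!--%fooTemplate\b.*?%-->/g,
  /\{\{[\s\S]*?\}\}/g // hypothetical extra pattern, for illustration only
];

// Collect every span that any of the patterns matches.
var matched = protect.reduce(function(found, re) {
  return found.concat(sample.match(re) || []);
}, []);

console.log(matched);
```

Checking a pattern this way before passing it as protect makes it easier to confirm that it covers the whole span you intend to keep untouched.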
unprotect
Type: RegExp or Array
Texts that match this RegExp are cleaned even if they are included in the Protecting list. Multiple RegExps can be specified via an Array.
For example, HTML source code used as a template in <script type="text/x-handlebars-template"> is cleaned via the following code:
html = htmlclean(html, {
unprotect: /<script [^>]*\btype="text\/x-handlebars-template"[\s\S]+?<\/script>/ig
});
The x-handlebars-template in the type attribute above is for the Handlebars template framework. Other frameworks use other values; e.g. AngularJS requires ng-template instead.
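For AngularJS, the same idea might look like the sketch below. It assumes the templates are embedded as <script type="text/ng-template">; the sample markup is invented, and the RegExp is only tested with String.prototype.match here rather than passed to htmlclean.

```javascript
// Assumption: templates are embedded as <script type="text/ng-template" ...>.
var unprotectNg = /<script [^>]*\btype="text\/ng-template"[\s\S]+?<\/script>/ig;

// Invented sample: one template script and one ordinary script.
var sample =
  '<script type="text/ng-template" id="a.html"><p>  Hi  </p></script>' +
  '<script>var keep = 1;</script>';

// Only the template script is matched; the ordinary script stays protected.
var targets = sample.match(unprotectNg);
console.log(targets.length);
console.log(targets[0]);
```

The resulting RegExp would then be given as the unprotect option in the same way as the Handlebars one.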
NOTE: The RegExp has to match the whole of a protected text, not just a part of it. For example, if the RegExp matches color: red; in a <style> element, that text is not cleaned, because all texts in the <style> element are protected and color: red; is only a part of the protected text. The RegExp has to match the entire <style> element, like /<style[\s\S]+?<\/style>/.
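The difference between the two kinds of pattern can be seen with plain String.prototype.match. This is only a sketch with an invented sample string; it shows what each RegExp matches, not what htmlclean does with it.

```javascript
var source = '<style>p { color: red; }</style><p>text</p>';

// Matches only a fragment inside the protected <style> element,
// so per the note above it would not cause any cleaning.
var partial = /color: red;/;

// Matches the whole protected text, from <style to </style>.
var whole = /<style[\s\S]+?<\/style>/;

console.log(source.match(partial)[0]);
console.log(source.match(whole)[0]);
```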
edit
Type: Function
This Function makes further edits to the HTML/SVG source code.
Protected texts are hidden from the HTML/SVG source code before it is passed to this Function, so the Function cannot break them. The protected texts are then restored into the HTML/SVG source code returned from this Function.
NOTE: Markers of the form \fID\x07 (\f is the "form feed" code \x0C, \x07 is "bell", and ID is a number) are inserted into the HTML/SVG source code in place of the protected texts. This Function can remove those markers, but can't add new ones. (Invalid markers are simply removed.)
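The hide, edit and restore flow described above can be pictured with the following standalone sketch. It is not htmlclean's actual implementation: hideProtected and restoreProtected are hypothetical helper names, and only the marker format \fID\x07 is taken from the note.

```javascript
// Illustrative sketch only: NOT htmlclean's actual implementation.
// hideProtected and restoreProtected are hypothetical helper names.
function hideProtected(html, protectRe) {
  var store = [];
  var hidden = html.replace(protectRe, function(match) {
    store.push(match);
    return '\f' + (store.length - 1) + '\x07'; // marker: \fID\x07
  });
  return {html: hidden, store: store};
}

function restoreProtected(html, store) {
  return html.replace(/\f(\d+)\x07/g, function(marker, id) {
    var text = store[Number(id)];
    return text === undefined ? '' : text; // unknown IDs are just removed
  });
}

var source = '<p>eggs</p><pre>eggs  stay</pre>';
var hidden = hideProtected(source, /<pre[\s\S]*?<\/pre>/g);

// The edit step only sees the marker, so it cannot touch the <pre> content.
var edited = hidden.html.replace(/\begg(s?)\b/ig, 'omelet$1');

var result = restoreProtected(edited, hidden.store);
console.log(result);
```

Because the edit step works on the marker rather than the protected text, a replacement that would otherwise hit the <pre> content leaves it alone.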
See the source HTML file and the resulting HTML files in the examples directory.
var htmlclean = require('htmlclean'),
fs = require('fs'),
htmlBefore = fs.readFileSync('./before.html', {encoding: 'utf8'});
var htmlAfter1 = htmlclean(htmlBefore);
fs.writeFileSync('./after1.html', htmlAfter1);
var htmlAfter2 = htmlclean(htmlBefore, {
protect: /<\!--%fooTemplate\b.*?%-->/g,
unprotect: /<script [^>]*\btype="text\/x-handlebars-template"[\s\S]+?<\/script>/ig,
edit: function(html) { return html.replace(/\begg(s?)\b/ig, 'omelet$1'); }
});
fs.writeFileSync('./after2.html', htmlAfter2);
htmlclean may not be able to parse malformed nested tags like <p>foo<pre>bar</p>baz</pre> precisely. The same applies to close tags in script code such as <script>var foo = '</script>';</script>, ?> in PHP code, etc.
Some language parsers are also confused by these, so they recommend writing code like '<' + '/script>'. This is better even if htmlclean is not used.
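To see why the split form helps, consider a simple scanner that looks for the first close tag after <script>. The RegExp below is an assumption standing in for such a scanner; it is not how htmlclean or any particular parser works.

```javascript
// Assumed stand-in for a simple tag scanner: from <script> to the first </script>.
var scriptRe = /<script>[\s\S]*?<\/script>/;

var risky = "<script>var foo = '</script>';</script>";
var safe = "<script>var foo = '<' + '/script>';</script>";

// The close tag inside the string literal ends the match too early.
var riskyMatch = risky.match(scriptRe)[0];
console.log(riskyMatch);

// With the tag split in two, the match covers the whole element.
var safeMatch = safe.match(scriptRe)[0];
console.log(safeMatch === safe);
```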
htmlclean removes HTML/SVG comments that include SSI tags like <!-- Info for admin - Foo:<?= expression ?> -->. This should not be a problem, because htmlclean is used to minify HTML. If the SSI tag includes code that is important for logic, use the protect option, or <!--[htmlclean-protect]--> and <!--[/htmlclean-protect]-->.
htmlclean never changes the structure of the document, even if elements or attributes look meaningless, because they might be used by your program, and restructuring is not htmlclean's job. It should never unexpectedly break the data you have worked hard on.
If you would like to enforce rules relating to code style, check out documents such as a code style guide.
Also, htmlclean assumes valid HTML code. Since htmlclean never checks the syntax, it might not work correctly when an invalid document is passed. (See also: Malformed Nested Tags, and Close Tags in Script)
If you want to control the details of editing, HtmlCompressor, HTMLMinifier and others are better choices.
Thanks for images: Wikimedia Commons
The npm package htmlclean receives a total of 4,705 weekly downloads. As such, htmlclean popularity was classified as popular.
We found that htmlclean has an unhealthy version release cadence and level of project activity, because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.