HTML cleaner and beautifier
Do you have crappy HTML? I do!
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td height="31"><b>Currently we have these articles available:</b>
<blockquote>
<p><a href="foo.html">The History of Foo</a><br />
An <span color="red">informative</span> piece of <FONT FACE="ARIAL">information</FONT>.</p>
<p><a href="bar.html">A Horse Walked Into a Bar</a><br/> The bartender said
"Why the long face?"</p>
</blockquote>
</td>
</tr>
</table>
Just look at those blank lines, random line breaks, trailing spaces, mixed tabs and deprecated tags: it's outrageous!
Let's clean it up...
$ npm install clean-html
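var cleaner = require('clean-html'),
    fs = require('fs'),
    file = process.argv[2]; // path to the HTML file to clean

fs.readFile(file, 'utf-8', function (err, data) {
    if (err) {
        console.error(err.message);
        process.exit(1);
    }
    // clean() passes the cleaned markup to its callback
    cleaner.clean(data, function (html) {
        console.log(html);
    });
});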
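If you prefer promises to callbacks, the callback-style clean() can be wrapped. This is a minimal sketch, assuming clean(html, options, callback) accepts an options object as its second argument and hands the cleaned string to the callback. cleanAsync is a hypothetical helper, not part of clean-html; it takes the clean function as a parameter so it can be tried without the package installed.

```javascript
// Hypothetical helper (not part of clean-html): adapts a callback-style
// clean(html, options, callback) function so that it returns a Promise.
function cleanAsync(clean, html, options) {
  return new Promise(function (resolve, reject) {
    try {
      // Assumption: the callback receives the cleaned HTML as its only argument.
      clean(html, options || {}, resolve);
    } catch (err) {
      // Synchronous errors thrown by clean() become a rejected Promise.
      reject(err);
    }
  });
}

// Usage with the real module (requires `npm install clean-html`):
// const cleaner = require('clean-html');
// cleanAsync(cleaner.clean, '<p>foo&nbsp;bar</p>', { 'replace-nbsp': true })
//   .then(function (html) { console.log(html); });
```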
Sanity restored!
<table>
  <tr>
    <td>
      <b>Currently we have these articles available:</b>
      <blockquote>
        <p>
          <a href="foo.html">The History of Foo</a><br>
          An <span>informative</span> piece of information.
        </p>
        <p>
          <a href="bar.html">A Horse Walked Into a Bar</a><br>
          The bartender said "Why the long face?"
        </p>
      </blockquote>
    </td>
  </tr>
</table>
Options
attr-to-remove
Attributes to remove from markup.
Type: Array
Default: ['align', 'bgcolor', 'border', 'cellpadding', 'cellspacing', 'color', 'disabled', 'height', 'target', 'valign', 'width']
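To make attr-to-remove concrete, here is a minimal sketch of the filtering step, assuming each element's attributes are available as a plain name-to-value object. removeAttributes is a hypothetical helper for illustration, not clean-html's internal code.

```javascript
// Default list copied from the attr-to-remove documentation above.
const DEFAULT_ATTR_TO_REMOVE = ['align', 'bgcolor', 'border', 'cellpadding',
  'cellspacing', 'color', 'disabled', 'height', 'target', 'valign', 'width'];

// Hypothetical helper: returns a copy of an element's attribute map without
// any attribute whose name appears in attrToRemove. Names are compared in
// lower case (an assumption; HTML attribute names are case-insensitive).
function removeAttributes(attribs, attrToRemove) {
  const kept = {};
  for (const [name, value] of Object.entries(attribs)) {
    if (!attrToRemove.includes(name.toLowerCase())) {
      kept[name] = value;
    }
  }
  return kept;
}
```

Applied to the sample table's attributes (width, border, cellspacing, cellpadding), it leaves nothing, which matches the bare <table> in the cleaned output.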
block-tags
Block-level element tags. Line breaks are added before and after these elements, and their nested content is indented.
Type: Array
Default: ['blockquote', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'p', 'table', 'td', 'tr']
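The effect of block tags, together with the indent option, can be sketched as a small recursive renderer. This assumes a simplified, hypothetical tree where a node is either a text string or { tag, children }, and it treats every element as a block tag; the real cleaner also has to handle inline tags, attributes and empty tags.

```javascript
// Hypothetical renderer: each element goes on its own line, and its
// children are indented one level deeper using the given indent string.
function renderBlocks(node, indent, depth) {
  depth = depth || 0;
  const pad = indent.repeat(depth);
  if (typeof node === 'string') {
    // Text is trimmed and placed on its own indented line.
    return pad + node.trim();
  }
  const lines = [pad + '<' + node.tag + '>'];
  for (const child of node.children) {
    lines.push(renderBlocks(child, indent, depth + 1));
  }
  lines.push(pad + '</' + node.tag + '>');
  return lines.join('\n');
}
```

With an indent of two spaces, { tag: 'blockquote', children: [{ tag: 'p', children: ['Hello'] }] } renders in the same nested shape as the blockquote in the cleaned output.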
Adds line breaks before and after comments.
Type: Boolean
Default: true
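Adding line breaks around comments can be sketched with a regular expression. This only illustrates the behavior described above and is not clean-html's implementation; breakAroundComments is a hypothetical name.

```javascript
// Hypothetical sketch: moves every <!-- ... --> comment onto its own line by
// replacing any whitespace around it with single line breaks. The non-greedy
// [\s\S]*? lets one comment span several lines without swallowing the next.
function breakAroundComments(html) {
  return html.replace(/\s*(<!--[\s\S]*?-->)\s*/g, '\n$1\n');
}
```

Note that a comment at the very start or end of the input gains a leading or trailing line break with this sketch.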
break-after-br
Adds line breaks after br tags.
Type: Boolean
Default: true
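A rough regex sketch of break-after-br follows. It also normalizes <br/> and <br /> to <br>, as seen in the cleaned output above. breakAfterBr is a hypothetical helper; it ignores br tags that carry attributes and is not how clean-html itself works.

```javascript
// Hypothetical sketch: normalizes <br>, <br/> and <br /> (any case) to <br>.
// When enabled, whitespace after each tag is replaced by one line break,
// so the following text starts cleanly on the next line.
function breakAfterBr(html, enabled) {
  if (!enabled) {
    return html.replace(/<br\s*\/?>/gi, '<br>');
  }
  return html.replace(/<br\s*\/?>\s*/gi, '<br>\n');
}
```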
empty-tags
Empty element tags.
Type: Array
Default: ['br', 'hr', 'img']
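Empty elements have no content and no closing tag. The sketch below shows how a serializer might treat the empty-tags list, writing <br> rather than <br></br> or <br />. serializeElement is a hypothetical helper, and it does not escape attribute values.

```javascript
const DEFAULT_EMPTY_TAGS = ['br', 'hr', 'img'];

// Hypothetical serializer: empty elements get only an opening tag; all
// other elements wrap their content in an opening and a closing tag.
// Attribute values are written as-is (no escaping of quotes).
function serializeElement(tag, attribs, content, emptyTags) {
  const attrs = Object.entries(attribs)
    .map(function ([name, value]) { return ' ' + name + '="' + value + '"'; })
    .join('');
  if (emptyTags.includes(tag)) {
    return '<' + tag + attrs + '>';
  }
  return '<' + tag + attrs + '>' + content + '</' + tag + '>';
}
```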
indent
The string to use for indentation, e.g., a tab character or one or more spaces.
Type: String
Default: '  ' (two spaces)
Removes comments.
Type: Boolean
Default: false
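Comment removal can be illustrated with a single non-greedy regular expression. removeComments is a hypothetical helper for illustration; a real cleaner working on parsed markup would simply drop comment nodes.

```javascript
// Hypothetical sketch: deletes every <!-- ... --> block, including comments
// that span several lines. The non-greedy match removes each comment
// separately instead of everything between the first and the last one.
function removeComments(html) {
  return html.replace(/<!--[\s\S]*?-->/g, '');
}
```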
remove-empty-paras
Removes empty paragraph tags.
Type: Boolean
Default: false
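A minimal regex sketch of remove-empty-paras, assuming "empty" means a <p> element (with or without attributes) that contains only whitespace. Whether clean-html also treats a paragraph holding only &nbsp; as empty is not stated here, so the sketch leaves such paragraphs alone. removeEmptyParas is a hypothetical helper.

```javascript
// Hypothetical sketch: removes <p></p>, <p> </p> and <p class="x"></p>.
// (\s[^>]*)? only accepts attributes after whitespace, so <pre> is untouched.
function removeEmptyParas(html) {
  return html.replace(/<p(\s[^>]*)?>\s*<\/p>/gi, '');
}
```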
replace-nbsp
Replaces non-breaking white space entities (&nbsp;) with regular spaces.
Type: Boolean
Default: false
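The replace-nbsp behavior amounts to a global string replacement. The sketch below handles only the named &nbsp; entity; numeric forms such as &#160; are out of scope here. replaceNbsp is a hypothetical helper.

```javascript
// Hypothetical sketch: replaces every &nbsp; entity with an ordinary space.
function replaceNbsp(html) {
  return html.replace(/&nbsp;/g, ' ');
}
```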
tags-to-remove
Tags to remove from markup.
Type: Array
Default: ['center', 'font']
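The sample output shows that removing a tag keeps its content: <FONT FACE="ARIAL">information</FONT> becomes plain "information". The sketch below reproduces that unwrapping on a hypothetical tree where a node is a text string or { tag, attribs, children }. removeTags is an illustrative helper, not clean-html's API.

```javascript
// Hypothetical sketch: elements whose tag is listed are unwrapped, meaning
// the element itself disappears but its (recursively cleaned) children
// take its place. Tag names are compared in lower case.
function removeTags(nodes, tagsToRemove) {
  const result = [];
  for (const node of nodes) {
    if (typeof node === 'string') {
      result.push(node);
    } else if (tagsToRemove.includes(node.tag.toLowerCase())) {
      result.push(...removeTags(node.children, tagsToRemove));
    } else {
      result.push({
        tag: node.tag,
        attribs: node.attribs,
        children: removeTags(node.children, tagsToRemove)
      });
    }
  }
  return result;
}
```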
Adding values to option lists
These convenience options append values to the corresponding lists above, so you can extend a list without retyping all of its defaults.
add-attr-to-remove
Additional attributes to remove from markup.
Type: Array
Default: null
add-block-tags
Additional block level element tags.
Type: Array
Default: null
add-empty-tags
Additional empty element tags.
Type: Array
Default: null
add-tags-to-remove
Additional tags to remove from markup.
Type: Array
Default: null
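The difference between replacing a list and adding to it can be sketched as follows, assuming an add-* value is appended to the base list (the default, or your own override) and that duplicates do not matter. effectiveList is a hypothetical helper, not part of clean-html.

```javascript
// Hypothetical sketch: the list the cleaner would effectively use when an
// add-* option is given. A null or missing addition leaves the base as-is;
// duplicates are dropped while keeping first-seen order.
function effectiveList(base, additions) {
  return Array.from(new Set(base.concat(additions || [])));
}
```

For example, under that assumption, passing { 'add-tags-to-remove': ['marquee'] } should strip <marquee> in addition to the default center and font, whereas { 'tags-to-remove': ['marquee'] } would strip only <marquee>.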