normalize-html-whitespace
Safely remove repeating whitespace from HTML text.
Using \s to normalize HTML whitespace will strip out characters that a web browser actually renders. That is a lossy change, and it produces a different visual result. This package collapses runs of whitespace down to a single space while leaving the following characters untouched:
\u00a0 (non-breaking space)
\ufeff (zero-width non-breaking space)
…as well as these lesser-known ones:
\u1680 (Ogham space mark)
\u180e (Mongolian vowel separator)
\u2000 (en quad)
\u2001 (em quad)
\u2002 (en space)
\u2003 (em space)
\u2004 (three-per-em space)
\u2005 (four-per-em space)
\u2006 (six-per-em space)
\u2007 (figure space)
\u2008 (punctuation space)
\u2009 (thin space)
\u200a (hair space)
\u2028 (line separator)
\u2029 (paragraph separator)
\u202f (narrow non-breaking space)
\u205f (medium mathematical space)
\u3000 (ideographic space)
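The claim that \s would strip these characters can be checked directly in JavaScript. The sketch below tests each listed character against a plain /\s/; the RENDERED table and the matchedByBackslashS helper are hypothetical names used only for this illustration. U+180E is left out of the table because Unicode 6.3 reclassified it from a space separator to a format character, so whether /\s/ matches it depends on the engine's Unicode version.

```javascript
// Characters this README lists as rendered by browsers (U+180E omitted,
// see the note above). Each entry is [character, name].
const RENDERED = [
  ['\u00a0', 'non-breaking space'],
  ['\ufeff', 'zero-width non-breaking space'],
  ['\u1680', 'Ogham space mark'],
  ['\u2000', 'en quad'],
  ['\u2001', 'em quad'],
  ['\u2002', 'en space'],
  ['\u2003', 'em space'],
  ['\u2004', 'three-per-em space'],
  ['\u2005', 'four-per-em space'],
  ['\u2006', 'six-per-em space'],
  ['\u2007', 'figure space'],
  ['\u2008', 'punctuation space'],
  ['\u2009', 'thin space'],
  ['\u200a', 'hair space'],
  ['\u2028', 'line separator'],
  ['\u2029', 'paragraph separator'],
  ['\u202f', 'narrow non-breaking space'],
  ['\u205f', 'medium mathematical space'],
  ['\u3000', 'ideographic space'],
];

// Return the names of the entries that /\s/ treats as whitespace,
// i.e. the characters a naive \s-based normalizer would rewrite.
function matchedByBackslashS(entries) {
  return entries.filter(([ch]) => /\s/.test(ch)).map(([, name]) => name);
}

const matched = matchedByBackslashS(RENDERED);
// prints "19 of 19 rendered characters match \s"
console.log(`${matched.length} of ${RENDERED.length} rendered characters match \\s`);
```

Every entry matches, which is exactly why a /\s+/ replacement changes what the browser displays.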
For the sake of completeness, the following character, which is not part of \s, will also not be affected:
\u200b (zero-width breaking space)
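To make the lossy/lossless distinction concrete, here is a minimal sketch contrasting a naive /\s+/ replacement with one that only folds ASCII whitespace. Both naiveCollapse and safeCollapse are hypothetical helpers written for this illustration, not this package's source code, and treating only tab, LF, FF, CR and space as collapsible is an assumption based on the HTML standard's definition of ASCII whitespace.

```javascript
// Hypothetical helpers for illustration only; NOT the
// normalize-html-whitespace implementation.

// Lossy: \s also matches \u00a0, \u2003, \u3000 and the other rendered
// characters listed above, so they are replaced by a plain space too.
function naiveCollapse(text) {
  return text.replace(/\s+/g, ' ');
}

// Assumption: only ASCII whitespace (tab, LF, FF, CR, space) is collapsible.
// Each run of one or more of these becomes a single space; every other
// character, including \u00a0 and \u200b, is left as-is.
const ASCII_WHITESPACE_RUN = /[\t\n\f\r ]+/g;

function safeCollapse(text) {
  return text.replace(ASCII_WHITESPACE_RUN, ' ');
}

// Two non-breaking spaces, a newline run, and a zero-width space.
const sample = 'price:\u00a0\u00a042\n\n  EUR\u200b!';

const lossy = naiveCollapse(sample);    // the two \u00a0 become one ' '
const lossless = safeCollapse(sample);  // the two \u00a0 survive
console.log(lossy === lossless);        // false
```

Note that \u200b passes through both versions unchanged, since \s never matches it.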
Note: this package does not contain an HTML parser. It is meant to be used on text nodes only.
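The note above says normalization belongs on text nodes, not on raw markup (a regex over raw HTML would also rewrite whitespace inside tags and attribute values). A minimal sketch of that division of labour follows, assuming a hypothetical, already-parsed node tree: the { type, value, children } shape, normalizeTextNodes and collapseAsciiWhitespace are illustrative names, not part of this package or of any specific parser.

```javascript
// Hypothetical node shape (not tied to any particular parser):
//   { type: 'text', value: string }
//   { type: 'element', name: string, children: Array<Node> }
// Any other node type (comments, doctypes, ...) is returned unchanged.

// Return a copy of `node` in which only text values have been normalized.
function normalizeTextNodes(node, normalize) {
  if (node.type === 'text') {
    return Object.assign({}, node, { value: normalize(node.value) });
  }
  if (node.type === 'element') {
    const children = node.children.map((child) => normalizeTextNodes(child, normalize));
    return Object.assign({}, node, { children });
  }
  return node;
}

// Stand-in normalizer for this sketch: collapses ASCII whitespace runs only.
// In real code, the function exported by normalize-html-whitespace could be
// passed here instead.
const collapseAsciiWhitespace = (text) => text.replace(/[\t\n\f\r ]+/g, ' ');

const tree = {
  type: 'element',
  name: 'p',
  children: [
    { type: 'text', value: 'Hello,\n   world' },
    {
      type: 'element',
      name: 'b',
      children: [{ type: 'text', value: '  bold\u00a0\u00a0text  ' }],
    },
  ],
};

const normalized = normalizeTextNodes(tree, collapseAsciiWhitespace);
console.log(JSON.stringify(normalized.children[0].value)); // "Hello, world"
```

Element names and the tree structure are never passed to the normalizer, and the original tree is left unmodified because each node is copied rather than mutated.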
Installation
Node.js >= 8 is required. Type this at the command line:
npm install normalize-html-whitespace
Usage
const normalizeWhitespace = require('normalize-html-whitespace');
normalizeWhitespace('   foo  bar  baz   ');  // -> ' foo bar baz '