htmljs-parser
An HTML parser that recognizes content and string placeholders and allows JavaScript expressions as attribute values.
HTML parsers written according to the HTML spec interpret all attribute values as strings, which makes it challenging to properly describe a value's type (boolean, string, number, array, etc.) or to provide a complex JavaScript expression as a value. The ability to describe JavaScript expressions within attributes is important for HTML-based template compilers.
For example, consider an HTML-based template that wishes to support a custom tag named <say-hello> that supports an attribute named message, which can be a string literal or a JavaScript expression.
Ideally, the template compiler should be able to handle any of the following:
<say-hello message="Hello world!" />
<say-hello message=("Hello " + personName + "!") />
<say-hello message="Hello ${personName}!" />
This parser extends the HTML grammar to add these important features:
<say-hello message=("Hello " + personName) count=2+2 large=true />
<div>
Hello ${personName}
</div>
<div data-message="Hello ${personName}!">
<div for(a in b) />
<div if(a === b) />
<for (a in b)>
<if (a in b)>
npm install htmljs-parser
var htmljs = require('htmljs-parser');
var parser = htmljs.createParser({
ontext: function(event) {
// text
},
oncontentplaceholder: function(event) {
// placeholder within content
},
onnestedcontentplaceholder: function(event) {
// placeholder within string that is within content placeholder
},
onattributeplaceholder: function(event) {
// placeholder within attribute
},
oncdata: function(event) {
// CDATA
},
onopentag: function(event) {
// open tag
},
onclosetag: function(event) {
// close tag
},
ondtd: function(event) {
// DTD (e.g. <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">)
},
ondeclaration: function(event) {
// Declaration (e.g. <?xml version="1.0" encoding="UTF-8" ?>)
},
oncomment: function(event) {
// Text within XML comment
},
onerror: function(event) {
// Error
}
});
parser.parse(str);
The parser, by default, will look for HTML tags within content. This behavior
might not be desirable for certain tags, so the parser allows the parsing mode
to be changed (usually in response to an onopentag event).
There are three content parsing modes:
HTML Content (DEFAULT): The parser will look for any HTML tag and content placeholders while in this mode and parse opening and closing tags accordingly.
Parsed Text Content: The parser will look for the closing tag that matches the current open tag as well as content placeholders but all other content will be interpreted as text.
Static Text Content: The parser will look for the closing tag that matches the current open tag but all other content will be interpreted as raw text.
var htmljs = require('htmljs-parser');
var parser = htmljs.createParser({
onopentag: function(event) {
// open tag
switch(event.tagName) {
case 'textarea':
//fall through
case 'script':
//fall through
case 'style':
// parse the content within these tags but only
// look for placeholders and the closing tag.
parser.enterParsedTextContentState();
break;
case 'dummy':
// treat content within <dummy>...</dummy> as raw
// text and ignore other tags and placeholders
parser.enterStaticTextContentState();
break;
default:
// The parser will switch to HTML content parsing mode
// if the parsing mode is not explicitly changed by
// "onopentag" function.
}
}
});
parser.parse(str);
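The tag-to-mode mapping in the switch statement above can also be kept in a lookup table, which may be easier to extend. The following is a minimal sketch and not part of htmljs-parser: CONTENT_MODES and applyContentMode are hypothetical names, and the only parser methods it relies on are the two shown above (enterParsedTextContentState and enterStaticTextContentState).

```javascript
// Hypothetical lookup table: tag name -> name of the parser method that
// switches the content parsing mode. Tags not listed stay in HTML mode.
var CONTENT_MODES = {
    textarea: 'enterParsedTextContentState',
    script: 'enterParsedTextContentState',
    style: 'enterParsedTextContentState',
    dummy: 'enterStaticTextContentState'
};

// Intended to be called from "onopentag". Returns the name of the method
// that was invoked, or null when the parser was left in the default
// HTML content mode.
function applyContentMode(parser, event) {
    var known = Object.prototype.hasOwnProperty.call(CONTENT_MODES, event.tagName);
    var method = known ? CONTENT_MODES[event.tagName] : null;
    if (method) {
        parser[method]();
    }
    return method;
}
```

Within the options passed to createParser, the onopentag function would then only need to call applyContentMode(parser, event).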
The htmljs-parser is an event-based parser, which means that it emits events as it parses the document. Events are delivered by calling the on<eventname> functions supplied as properties of the options object passed to require('htmljs-parser').createParser(options).
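When exploring which events a given input produces, it can help to record every event in order. The sketch below builds an options object with one on<eventname> function per event type; createRecordingOptions and EVENT_TYPES are hypothetical names, and the list of event names is taken from the handlers shown earlier. It does not call the parser itself.

```javascript
// Event names taken from the handler list shown earlier in this document.
var EVENT_TYPES = [
    'text', 'contentplaceholder', 'nestedcontentplaceholder',
    'attributeplaceholder', 'cdata', 'opentag', 'closetag',
    'dtd', 'declaration', 'comment', 'error'
];

// Hypothetical helper: builds an options object with one on<eventname>
// function per event type. Each function appends the event it receives
// to the shared "events" array.
function createRecordingOptions(events) {
    var options = {};
    EVENT_TYPES.forEach(function(type) {
        options['on' + type] = function(event) {
            events.push(event);
        };
    });
    return options;
}
```

The returned object is meant to be passed to createParser; once parsing has finished, the events array should contain the emitted events in order.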
The onopentag function will be called each time an opening tag is encountered.
EXAMPLE: Simple tag
INPUT:
<div>
OUTPUT EVENT:
{
type: 'opentag',
tagName: 'div',
attributes: []
}
EXAMPLE: Tag with literal attribute values
INPUT:
<div class="demo" disabled=false data-number=123>
OUTPUT EVENT:
{
type: 'opentag',
tagName: 'div',
attributes: [
{
name: 'class',
expression: '"demo"',
literalValue: 'demo'
},
{
name: 'disabled',
expression: 'false',
literalValue: false
},
{
name: 'data-number',
expression: '123',
literalValue: 123
}
]
}
EXAMPLE: Tag with expression attribute
INPUT:
<say-something message="Hello "+data.name>
OUTPUT EVENT:
{
type: 'opentag',
tagName: 'say-something',
attributes: [
{
name: 'message',
expression: '"Hello "+data.name'
}
]
}
EXAMPLE: Tag with an argument
INPUT:
<for(var i = 0; i < 10; i++)>
OUTPUT EVENT:
{
type: 'opentag',
tagName: 'for',
argument: 'var i = 0; i < 10; i++',
attributes: []
}
EXAMPLE: Attribute with an argument
INPUT:
<div if(x > y)>
OUTPUT EVENT:
{
type: 'opentag',
tagName: 'div',
attributes: [
{
name: 'if',
argument: 'x > y'
}
]
}
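The example events above show three shapes an attribute can take: a literal value (literalValue and expression), an expression only, or an argument. A template compiler usually needs to branch on these. The following is a sketch under the assumption that absent properties are simply undefined, as in the examples; describeAttributes and the 'bare' kind for an attribute without any value are hypothetical.

```javascript
// Hypothetical helper: summarizes each attribute of an "opentag" event.
// The property names (name, expression, literalValue, argument) are the
// ones shown in the example events above.
function describeAttributes(event) {
    return event.attributes.map(function(attr) {
        if (attr.literalValue !== undefined) {
            // The value was a literal, so it can be used without evaluation.
            return { name: attr.name, kind: 'literal', value: attr.literalValue };
        }
        if (attr.expression !== undefined) {
            // The value is JavaScript source that must be compiled or evaluated.
            return { name: attr.name, kind: 'expression', code: attr.expression };
        }
        if (attr.argument !== undefined) {
            // The attribute was written as name(argument).
            return { name: attr.name, kind: 'argument', code: attr.argument };
        }
        // Assumption: an attribute with no value carries none of the above.
        return { name: attr.name, kind: 'bare' };
    });
}
```

For the literal-values example above, this would classify class, disabled, and data-number as 'literal' with the values 'demo', false, and 123.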
The onclosetag function will be called each time a closing tag is encountered.
EXAMPLE: Simple close tag
INPUT:
</div>
OUTPUT EVENT:
{
type: 'closetag',
tagName: 'div'
}
The ontext function will be called each time textual data is encountered within an element.
NOTE: Text within <![CDATA[ ]]> will be emitted via a call to oncdata.
EXAMPLE
In the following example code, the TEXT sequences will be emitted as text events.
INPUT:
Simple text
OUTPUT EVENT:
{
type: 'text',
text: 'Simple text'
}
The oncdata function will be called when text within <![CDATA[ ]]> is encountered.
EXAMPLE:
INPUT:
<![CDATA[This is text]]>
OUTPUT EVENT:
{
type: 'cdata',
text: 'This is text'
}
The oncontentplaceholder function will be called each time a placeholder is encountered within parsed textual content within elements.
If the placeholder starts with the $!{ sequence then event.escape will be false.
If the placeholder starts with the ${ sequence then event.escape will be true.
Text within <![CDATA[ ]]> and <!-- --> will not be parsed, so you cannot use placeholders in these blocks.
EXAMPLE:
INPUT:
${"This is an escaped placeholder"}
$!{"This is a non-escaped placeholder"}
OUTPUT EVENTS
{
type: 'contentplaceholder',
expression: '"This is an escaped placeholder"',
escape: true
}
{
type: 'contentplaceholder',
expression: '"This is a non-escaped placeholder"',
escape: false
}
NOTE: The escape flag is merely informational. The application code is responsible for interpreting this flag to properly escape the expression.
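Because the escape flag is only informational, the application decides how escaping is applied. One possible approach, sketched here, is to collect text and contentplaceholder events into the source of a single JavaScript expression, wrapping escaped placeholders in a helper call. Both escapeXml and createExpressionCollector are assumptions for illustration; the parser does not provide them.

```javascript
// Minimal XML escaping helper (an assumption; not provided by the parser).
function escapeXml(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Hypothetical collector: turns "text" and "contentplaceholder" events
// into the source of a single JavaScript expression.
function createExpressionCollector() {
    var parts = [];
    return {
        ontext: function(event) {
            // Emit the text as a quoted JavaScript string literal.
            parts.push(JSON.stringify(event.text));
        },
        oncontentplaceholder: function(event) {
            // Honor the informational escape flag.
            parts.push(event.escape
                ? 'escapeXml(' + event.expression + ')'
                : '(' + event.expression + ')');
        },
        toExpression: function() {
            return parts.length ? parts.join(' + ') : '""';
        }
    };
}
```

The ontext and oncontentplaceholder functions of the collector could be supplied as the corresponding parser options, and the string returned by toExpression() could then be embedded in generated template code that has escapeXml in scope.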
The onnestedcontentplaceholder function will be called each time a placeholder is encountered within a string that is also within another content placeholder.
If the placeholder starts with the $!{ sequence then event.escape will be false.
If the placeholder starts with the ${ sequence then event.escape will be true unless the placeholder is nested within another placeholder that is already escaped.
The event.expression property can be changed, which will cause a corresponding change to the expression of the ancestor content placeholder.
Here's an example of modifying the expression based on the event.escape flag:
onnestedcontentplaceholder: function(event) {
if (event.escape) {
event.expression = 'escapeXml(' + event.expression + ')';
}
}
EXAMPLE:
INPUT:
${"Hello ${data.name}!"}
The ${data.name} sequence will trigger the call to onnestedcontentplaceholder.
OUTPUT EVENTS
{
type: 'nestedcontentplaceholder',
expression: 'data.name',
escape: true
}
{
type: 'contentplaceholder',
expression: '"Hello "+(data.name)+"!"',
escape: true
}
NOTE: The escape flag is merely informational. The application code is responsible for interpreting this flag to properly escape the expression.
The onattributeplaceholder function will be called each time a placeholder is encountered within an attribute string value. This event will be emitted before onopentag, so by changing the expression property of the event, the resultant attribute can be changed.
Here's an example of modifying the expression based on the event.escape flag:
onattributeplaceholder: function(event) {
if (event.escape) {
event.expression = 'escapeAttr(' + event.expression + ')';
}
}
If the placeholder starts with the $!{ sequence then event.escape will be false.
If the placeholder starts with the ${ sequence then event.escape will be true unless the placeholder is nested within another placeholder that is already escaped.
EXAMPLE:
INPUT:
<div class="${data.className}"></div>
OUTPUT EVENT:
{
type: 'attributeplaceholder',
expression: 'data.className',
escape: true
}
NOTE: The escape flag is merely informational. The application code is responsible for interpreting this flag to properly escape the expression. The expression property can be altered by the onattributeplaceholder function and the attribute information emitted via onopentag will reflect this change.
The ondtd function will be called when the document type declaration is encountered anywhere in the content.
EXAMPLE:
INPUT:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
OUTPUT EVENT:
{
type: 'dtd',
dtd: 'DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"'
}
The ondeclaration function will be called when an XML declaration is encountered anywhere in the content.
EXAMPLE:
INPUT:
<?xml version="1.0" encoding="UTF-8"?>
OUTPUT EVENT:
{
type: 'declaration',
declaration: 'xml version="1.0" encoding="UTF-8"'
}
The oncomment function will be called when text within <!-- --> is encountered.
EXAMPLE:
INPUT:
<!--This is a comment-->
OUTPUT EVENT:
{
type: 'comment',
text: 'This is a comment'
}
The onerror function will be called when malformed content is detected. The most common cause of an error is reaching the end of the input while still parsing an open tag, close tag, XML comment, CDATA section, DTD, XML declaration, or placeholder.
Possible errors:
ILLEGAL_ELEMENT_ARGUMENT: Element can only have one argument
ILLEGAL_ATTRIBUTE_ARGUMENT: Attribute can only have one argument
MALFORMED_OPEN_TAG: EOF reached while parsing open tag
MALFORMED_CLOSE_TAG: EOF reached while parsing closing element
MALFORMED_CDATA: EOF reached while parsing CDATA
MALFORMED_PLACEHOLDER: EOF reached while parsing placeholder
MALFORMED_DTD: EOF reached while parsing DTD
MALFORMED_DECLARATION: EOF reached while parsing declaration
MALFORMED_COMMENT: EOF reached while parsing comment
EXAMPLE:
INPUT:
<a href="
OUTPUT EVENT:
{
type: 'error',
code: 'MALFORMED_OPEN_TAG',
message: 'EOF reached while parsing open tag.',
lineNumber: 1,
startPos: 0,
endPos: 9
}
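An onerror function will often want to turn the error event into something readable or throwable. The sketch below uses only the properties shown in the example event above (code, message, lineNumber, startPos, endPos); formatParseError and toError are hypothetical helper names.

```javascript
// Hypothetical helper: formats an "error" event as a one-line message.
function formatParseError(event) {
    return event.code + ' at line ' + event.lineNumber +
        ' (positions ' + event.startPos + '-' + event.endPos + '): ' +
        event.message;
}

// Hypothetical helper: wraps an "error" event in an Error object and
// keeps the error code so that callers can branch on it.
function toError(event) {
    var err = new Error(formatParseError(event));
    err.code = event.code;
    return err;
}
```

For the example event above, formatParseError would return "MALFORMED_OPEN_TAG at line 1 (positions 0-9): EOF reached while parsing open tag."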