htmljs-parser

An HTML parser that recognizes content and string placeholders and allows JavaScript expressions as attribute values.



HTML parsers written according to the HTML spec interpret all attribute values as strings, which makes it challenging to properly describe a value's type (boolean, string, number, array, etc.) or to provide a complex JavaScript expression as a value. The ability to describe JavaScript expressions within attributes is important for HTML-based template compilers.

For example, consider an HTML-based template that wants to support a custom tag named <say-hello> with an attribute named message whose value can be either a string literal or a JavaScript expression.

Ideally, the template compiler should be able to handle any of the following:

<say-hello message="Hello world!" />
<say-hello message=("Hello " + personName + "!") />
<say-hello message="Hello ${personName}!" />

This parser extends the HTML grammar to add these important features:

  • JavaScript expressions as attribute values
<say-hello message=("Hello " + personName) count=2+2 large=true />
  • Placeholders in the content of an element
<div>
    Hello ${personName}
</div>
  • Placeholders within attribute value strings
<div data-message="Hello ${personName}!">
  • JavaScript flow-control statements within HTML elements
<div for(a in b) />
<div if(a === b) />
  • JavaScript flow-control statements as elements
<for (a in b)>
<if (a in b)>

Installation

npm install htmljs-parser

Usage

var htmljs = require('htmljs-parser');
var parser = htmljs.createParser({
    ontext: function(event) {
        // text
    },

    oncontentplaceholder: function(event) {
        // placeholder within content
    },

    onnestedcontentplaceholder: function(event) {
        // placeholder within string that is within content placeholder
    },

    onattributeplaceholder: function(event) {
        // placeholder within attribute
    },

    oncdata: function(event) {
        // CDATA
    },

    onopentag: function(event) {
        // open tag
    },

    onclosetag: function(event) {
        // close tag
    },

    ondtd: function(event) {
        // DTD (e.g. <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">)
    },

    ondeclaration: function(event) {
        // Declaration (e.g. <?xml version="1.0" encoding="UTF-8" ?>)
    },

    oncomment: function(event) {
        // Text within XML comment
    },

    onerror: function(event) {
        // Error
    }
});

parser.parse(str);
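When debugging a template it can help to see every event in the order it was emitted. The following is a minimal sketch, assuming only the on<eventname> handler names listed in the usage example above; createRecordingHandlers is a hypothetical helper, not part of htmljs-parser. It builds an options object whose handlers push each event into an array, and it can be exercised by hand without the parser installed.

```javascript
// Handler names taken from the usage example above.
const EVENT_NAMES = [
  'text', 'contentplaceholder', 'nestedcontentplaceholder',
  'attributeplaceholder', 'cdata', 'opentag', 'closetag',
  'dtd', 'declaration', 'comment', 'error'
];

// Hypothetical helper: returns { options, events } where every
// on<eventname> handler in `options` appends its event to `events`.
function createRecordingHandlers() {
  const events = [];
  const options = {};
  EVENT_NAMES.forEach(function (name) {
    options['on' + name] = function (event) {
      events.push(event);
    };
  });
  return { options: options, events: events };
}

// Simulate two events by hand. With the real library, `options` would be
// passed to require('htmljs-parser').createParser(options) instead.
const recorder = createRecordingHandlers();
recorder.options.onopentag({ type: 'opentag', tagName: 'div', attributes: [] });
recorder.options.onclosetag({ type: 'closetag', tagName: 'div' });
console.log(recorder.events.map(function (e) { return e.type; }));
// [ 'opentag', 'closetag' ]
```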

Content Parsing Modes

The parser, by default, will look for HTML tags within content. This behavior might not be desirable for certain tags, so the parser allows the parsing mode to be changed (usually in response to an onopentag event).

There are three content parsing modes:

  • HTML Content (DEFAULT): The parser will look for HTML tags and content placeholders while in this mode and parse opening and closing tags accordingly.

  • Parsed Text Content: The parser will look for the closing tag that matches the current open tag as well as content placeholders, but all other content will be interpreted as text.

  • Static Text Content: The parser will look for the closing tag that matches the current open tag, but all other content will be interpreted as raw text.

var htmljs = require('htmljs-parser');
var parser = htmljs.createParser({
    onopentag: function(event) {
        // open tag
        switch(event.tagName) {
            case 'textarea':
                //fall through
            case 'script':
                //fall through
            case 'style':
                // parse the content within these tags but only
                // look for placeholders and the closing tag.
                parser.enterParsedTextContentState();
                break;
            case 'dummy':
                // treat content within <dummy>...</dummy> as raw
                // text and ignore other tags and placeholders
                parser.enterStaticTextContentState();
                break;
            default:
                // The parser will switch to HTML content parsing mode
                // if the parsing mode is not explicitly changed by
                // "onopentag" function.
        }
    }
});

parser.parse(str);

Parsing Events

The htmljs-parser is an event-based parser, which means that it emits events as it parses the document. Events are delivered via calls to on<eventname> functions, which are supplied as properties of the options object passed to require('htmljs-parser').createParser(options).

onopentag

The onopentag function will be called each time an opening tag is encountered.

EXAMPLE: Simple tag

INPUT:

<div>

OUTPUT EVENT:

{
    type: 'opentag',
    tagName: 'div',
    attributes: []
}

EXAMPLE: Tag with literal attribute values

INPUT:

<div class="demo" disabled=false data-number=123>

OUTPUT EVENT:

{
    type: 'opentag',
    tagName: 'div',
    attributes: [
        {
            name: 'class',
            expression: '"demo"',
            literalValue: 'demo'
        },
        {
            name: 'disabled',
            expression: 'false',
            literalValue: false
        },
        {
            name: 'data-number',
            expression: '123',
            literalValue: 123
        }
    ]
}
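A template compiler typically has to turn this attributes array into code. A minimal sketch, assuming the compiler wants a JavaScript object literal: attributesToObjectSource is a hypothetical helper that prefers literalValue when it is present and otherwise falls back to the raw expression text.

```javascript
// Hypothetical helper: turns the `attributes` array of an opentag
// event into JavaScript object-literal source code. Attributes that
// carry a literalValue are serialized with JSON.stringify; all others
// fall back to their raw expression text.
function attributesToObjectSource(attributes) {
  const props = attributes.map(function (attr) {
    const value = Object.prototype.hasOwnProperty.call(attr, 'literalValue')
      ? JSON.stringify(attr.literalValue)
      : attr.expression;
    return JSON.stringify(attr.name) + ': ' + value;
  });
  return '{' + props.join(', ') + '}';
}

// The attributes from the example event above.
const attributes = [
  { name: 'class', expression: '"demo"', literalValue: 'demo' },
  { name: 'disabled', expression: 'false', literalValue: false },
  { name: 'data-number', expression: '123', literalValue: 123 }
];

console.log(attributesToObjectSource(attributes));
// {"class": "demo", "disabled": false, "data-number": 123}
```

Because literal values keep their JavaScript type (false and 123 rather than "false" and "123"), the generated source preserves the boolean and number types that a spec-compliant HTML parser would have flattened to strings.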

EXAMPLE: Tag with expression attribute

INPUT:

<say-something message="Hello "+data.name>

OUTPUT EVENT:

{
    type: 'opentag',
    tagName: 'say-something',
    attributes: [
        {
            name: 'message',
            expression: '"Hello "+data.name'
        }
    ]
}
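Note that expression is JavaScript source text, not an evaluated value. A minimal sketch of one way to compile it, assuming the template's input is exposed as a variable named data (as in the example above); compileAttribute is hypothetical and uses the standard Function constructor, which executes arbitrary code and is therefore only suitable for trusted templates.

```javascript
// Hypothetical helper: compiles an attribute's `expression` into a
// function of `data`. Assumes templates refer to their input as `data`.
// new Function executes arbitrary code: use only with trusted templates.
function compileAttribute(attr) {
  return new Function('data', 'return (' + attr.expression + ');');
}

const messageAttr = { name: 'message', expression: '"Hello "+data.name' };
const getMessage = compileAttribute(messageAttr);

console.log(getMessage({ name: 'World' })); // Hello World
```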

EXAMPLE: Tag with an argument

INPUT:

<for(var i = 0; i < 10; i++)>

OUTPUT EVENT:

{
    type: 'opentag',
    tagName: 'for',
    argument: 'var i = 0; i < 10; i++',
    attributes: []
}
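The argument is passed through as raw source, so a compiler can splice it directly into generated code. A minimal sketch, assuming a compiler that already has the compiled source of the tag's body; compileForTag is a hypothetical helper, not part of the parser.

```javascript
// Hypothetical code generator: wraps the compiled body of a <for> tag
// in a JavaScript for statement built from the tag's `argument`.
function compileForTag(event, bodySource) {
  if (event.tagName !== 'for' || !event.argument) {
    throw new Error('Expected a <for> tag with an argument');
  }
  return 'for (' + event.argument + ') {\n' + bodySource + '\n}';
}

const forEvent = {
  type: 'opentag',
  tagName: 'for',
  argument: 'var i = 0; i < 10; i++',
  attributes: []
};

const loopSource = compileForTag(forEvent, '  out.push(i);');
console.log(loopSource);
// for (var i = 0; i < 10; i++) {
//   out.push(i);
// }
```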

EXAMPLE: Attribute with an argument

INPUT:

<div if(x > y)>

OUTPUT EVENT:

{
    type: 'opentag',
    tagName: 'div',
    attributes: [
        {
            name: 'if',
            argument: 'x > y'
        }
    ]
}
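An attribute argument can be handled the same way. A minimal sketch, assuming the compiler treats an if attribute with an argument as a condition around the element; applyIfAttribute is hypothetical and only inspects the documented name and argument properties.

```javascript
// Hypothetical code generator: if the opentag event has an `if`
// attribute with an argument, wrap the element's rendering code in a
// JavaScript if statement; otherwise return the code unchanged.
function applyIfAttribute(event, renderSource) {
  const ifAttr = event.attributes.find(function (attr) {
    return attr.name === 'if' && attr.argument !== undefined;
  });
  return ifAttr
    ? 'if (' + ifAttr.argument + ') {\n' + renderSource + '\n}'
    : renderSource;
}

const divEvent = {
  type: 'opentag',
  tagName: 'div',
  attributes: [{ name: 'if', argument: 'x > y' }]
};

console.log(applyIfAttribute(divEvent, '  out += "<div>";'));
// if (x > y) {
//   out += "<div>";
// }
```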

onclosetag

The onclosetag function will be called each time a closing tag is encountered.

EXAMPLE: Simple close tag

INPUT:

</div>

OUTPUT EVENT:

{
    type: 'closetag',
    tagName: 'div'
}
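Because open and close tags arrive as separate events, checking that they nest correctly is left to the application. A minimal sketch using a stack; createTagMatcher is hypothetical, and it assumes every opentag is eventually followed by its own closetag (how self-closing or void elements are reported is not covered here).

```javascript
// Hypothetical helper: tracks open tags on a stack and records close
// tags that do not match the most recently opened tag.
function createTagMatcher() {
  const stack = [];
  const problems = [];
  return {
    problems: problems,
    onopentag: function (event) {
      stack.push(event.tagName);
    },
    onclosetag: function (event) {
      const expected = stack.pop();
      if (expected === undefined) {
        problems.push('Unexpected </' + event.tagName + '>');
      } else if (expected !== event.tagName) {
        problems.push('Expected </' + expected + '> but found </' + event.tagName + '>');
      }
    }
  };
}

const matcher = createTagMatcher();
matcher.onopentag({ type: 'opentag', tagName: 'div', attributes: [] });
matcher.onopentag({ type: 'opentag', tagName: 'span', attributes: [] });
matcher.onclosetag({ type: 'closetag', tagName: 'div' });
console.log(matcher.problems); // [ 'Expected </span> but found </div>' ]
```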

ontext

The ontext function will be called each time textual data is encountered within an element.

NOTE: Text within <![CDATA[ ]]> will be emitted via call to oncdata.

EXAMPLE

In the following example, the input will be emitted as a text event.

INPUT:

Simple text

OUTPUT EVENT:

{
    type: 'text',
    text: 'Simple text'
}

oncdata

The oncdata function will be called when text within <![CDATA[ ]]> is encountered.

EXAMPLE:

INPUT:

<![CDATA[This is text]]>

OUTPUT EVENT:

{
    type: 'cdata',
    text: 'This is text'
}

oncontentplaceholder

The oncontentplaceholder function will be called each time a placeholder is encountered within the parsed text content of an element.

If the placeholder starts with the $!{ sequence then event.escape will be false.

If the placeholder starts with the ${ sequence then event.escape will be true.

Text within <![CDATA[ ]]> and <!-- --> is not parsed, so placeholders cannot be used inside these blocks.

EXAMPLE:

INPUT:

${"This is an escaped placeholder"}
$!{"This is a non-escaped placeholder"}

OUTPUT EVENTS:

{
    type: 'contentplaceholder',
    expression: '"This is an escaped placeholder"',
    escape: true
}
{
    type: 'contentplaceholder',
    expression: '"This is a non-escaped placeholder"',
    escape: false
}

NOTE: The escape flag is merely informational. The application code is responsible for interpreting this flag to properly escape the expression.
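Since honoring the escape flag is up to the application, a compiler might emit different output code for the two cases. A minimal sketch, assuming generated code appends to a string named out; escapeXml and compileContentPlaceholder are hypothetical (escapeXml is the name used in the onnestedcontentplaceholder example, but htmljs-parser does not provide it).

```javascript
// Hypothetical escaping function (not part of htmljs-parser):
// replaces the five XML special characters. '&' must go first.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical code generator: emits a statement that appends the
// placeholder's value to `out`, escaping it only when event.escape is true.
function compileContentPlaceholder(event) {
  return event.escape
    ? 'out += escapeXml(' + event.expression + ');'
    : 'out += (' + event.expression + ');';
}

console.log(compileContentPlaceholder({
  type: 'contentplaceholder',
  expression: '"This is an escaped placeholder"',
  escape: true
}));
// out += escapeXml("This is an escaped placeholder");
```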

onnestedcontentplaceholder

The onnestedcontentplaceholder function will be called each time a placeholder is encountered within a string that is also within another content placeholder.

If the placeholder starts with the $!{ sequence then event.escape will be false.

If the placeholder starts with the ${ sequence then event.escape will be true unless the placeholder is nested within another placeholder that is already escaped.

The event.expression property can be changed, and the change will be reflected in the expression of the enclosing (ancestor) content placeholder.

Here's an example of modifying the expression based on the event.escape flag:

onnestedcontentplaceholder: function(event) {
    if (event.escape) {
        event.expression = 'escapeXml(' + event.expression + ')';
    }
}

EXAMPLE:

INPUT:

${"Hello ${data.name}!"}

The ${data.name} sequence will trigger the call to onnestedcontentplaceholder.

OUTPUT EVENTS:

{
    type: 'nestedcontentplaceholder',
    expression: 'data.name',
    escape: true
}
{
    type: 'contentplaceholder',
    expression: '"Hello "+(data.name)+"!"',
    escape: true
}

NOTE: The escape flag is merely informational. The application code is responsible for interpreting this flag to properly escape the expression.

onattributeplaceholder

The onattributeplaceholder function will be called each time a placeholder is encountered within an attribute string value. This event is emitted before onopentag, so changing the expression property of the event changes the resulting attribute.

Here's an example of modifying the expression based on the event.escape flag:

onattributeplaceholder: function(event) {
    if (event.escape) {
        event.expression = 'escapeAttr(' + event.expression + ')';
    }
}

If the placeholder starts with the $!{ sequence then event.escape will be false.

If the placeholder starts with the ${ sequence then event.escape will be true unless the placeholder is nested within another placeholder that is already escaped.

EXAMPLE:

INPUT:

<div class="${data.className}"></div>

OUTPUT EVENT:

{
    type: 'attributeplaceholder',
    expression: 'data.className',
    escape: true
}

NOTE: The escape flag is merely informational. The application code is responsible for interpreting this flag to properly escape the expression. The expression property can be altered by the onattributeplaceholder function and the attribute information emitted via onopentag will reflect this change.
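The handler shown earlier only rewrites the expression; the escaping function itself must come from the application. A minimal sketch, assuming attribute values are emitted inside double quotes; escapeAttr is hypothetical and not part of htmljs-parser.

```javascript
// Hypothetical attribute escaper (not part of htmljs-parser): escapes
// the characters that matter inside a double-quoted attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;');
}

// The handler from the example above, applied to the documented event.
const handlers = {
  onattributeplaceholder: function (event) {
    if (event.escape) {
      event.expression = 'escapeAttr(' + event.expression + ')';
    }
  }
};

const placeholderEvent = {
  type: 'attributeplaceholder',
  expression: 'data.className',
  escape: true
};
handlers.onattributeplaceholder(placeholderEvent);
console.log(placeholderEvent.expression); // escapeAttr(data.className)
```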

ondtd

The ondtd function will be called when the document type declaration is encountered anywhere in the content.

EXAMPLE:

INPUT:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">

OUTPUT EVENT:

{
    type: 'dtd',
    dtd: 'DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN"'
}

ondeclaration

The ondeclaration function will be called when an XML declaration is encountered anywhere in the content.

EXAMPLE:

INPUT:

<?xml version="1.0" encoding="UTF-8"?>

OUTPUT EVENT:

{
    type: 'declaration',
    declaration: 'xml version="1.0" encoding="UTF-8"'
}

oncomment

The oncomment function will be called when text within <!-- --> is encountered.

EXAMPLE:

INPUT:

<!--This is a comment-->

OUTPUT EVENT:

{
    type: 'comment',
    text: 'This is a comment'
}
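Applications that want to pass DTDs, declarations, comments and CDATA sections through to their output unchanged need to re-wrap the event text in its delimiters. A minimal sketch, assuming the event shapes shown in the examples above; serializeEvent is a hypothetical helper.

```javascript
// Hypothetical helper: rebuilds markup from the dtd, declaration,
// comment and cdata events documented above.
function serializeEvent(event) {
  switch (event.type) {
    case 'dtd':
      return '<!' + event.dtd + '>';
    case 'declaration':
      return '<?' + event.declaration + '?>';
    case 'comment':
      return '<!--' + event.text + '-->';
    case 'cdata':
      return '<![CDATA[' + event.text + ']]>';
    default:
      throw new Error('Unsupported event type: ' + event.type);
  }
}

console.log(serializeEvent({ type: 'comment', text: 'This is a comment' }));
// <!--This is a comment-->
```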

onerror

The onerror function will be called when malformed content is detected. The most common cause of an error is reaching the end of the input while still parsing an open tag, close tag, XML comment, CDATA section, DTD, XML declaration, or placeholder.

Possible errors:

  • ILLEGAL_ELEMENT_ARGUMENT: Element can only have one argument
  • ILLEGAL_ATTRIBUTE_ARGUMENT: Attribute can only have one argument
  • MALFORMED_OPEN_TAG: EOF reached while parsing open tag
  • MALFORMED_CLOSE_TAG: EOF reached while parsing closing element
  • MALFORMED_CDATA: EOF reached while parsing CDATA
  • MALFORMED_PLACEHOLDER: EOF reached while parsing placeholder
  • MALFORMED_DTD: EOF reached while parsing DTD
  • MALFORMED_DECLARATION: EOF reached while parsing declaration
  • MALFORMED_COMMENT: EOF reached while parsing comment

EXAMPLE:

INPUT:

<a href="

OUTPUT EVENT:

{
    type: 'error',
    code: 'MALFORMED_OPEN_TAG',
    message: 'EOF reached while parsing open tag.',
    lineNumber: 1,
    startPos: 0,
    endPos: 9
}
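The error event is a plain object, so an application that prefers exceptions has to convert it. A minimal sketch, assuming the fields shown in the example above; toParseError is hypothetical. Throwing its result from an onerror handler would propagate out of that handler, while pushing it onto an array instead lets an application collect every reported error.

```javascript
// Hypothetical helper: converts an error event into a JavaScript Error
// that keeps the documented code and position fields.
function toParseError(event) {
  const err = new Error(
    event.code + ' (line ' + event.lineNumber + ', chars ' +
    event.startPos + '-' + event.endPos + '): ' + event.message
  );
  err.code = event.code;
  err.lineNumber = event.lineNumber;
  err.startPos = event.startPos;
  err.endPos = event.endPos;
  return err;
}

const errorEvent = {
  type: 'error',
  code: 'MALFORMED_OPEN_TAG',
  message: 'EOF reached while parsing open tag.',
  lineNumber: 1,
  startPos: 0,
  endPos: 9
};

console.log(toParseError(errorEvent).message);
// MALFORMED_OPEN_TAG (line 1, chars 0-9): EOF reached while parsing open tag.
```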

Package last updated on 31 Dec 2015
