simple-html-tokenizer

Simple HTML Tokenizer is a lightweight JavaScript library that can be used to tokenize the kind of HTML normally found in templates.

Version: 0.5.10

Version published: 4 years ago

Weekly downloads: 71K (decreased by 79.79%)

Maintainers: 6

Created: 10 years ago

What is simple-html-tokenizer?

The simple-html-tokenizer npm package is a lightweight library designed to tokenize HTML strings. It breaks down HTML content into a stream of tokens, which can be useful for parsing, analyzing, or transforming HTML documents.

What are simple-html-tokenizer's main functionalities?

Tokenizing HTML

This feature allows you to tokenize an HTML string into a series of tokens. The `tokenize` function takes an HTML string as input and returns an array of tokens representing the different parts of the HTML.

const { tokenize } = require('simple-html-tokenizer');
const html = '<div>Hello, <span>world!</span></div>';
const tokens = tokenize(html);
console.log(tokens);
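Because `tokenize` returns a flat array, recovering the nesting of elements is left to the caller. The following is a minimal sketch of one way to do that with a stack, not part of the library: `buildTree` is a hypothetical helper, and the `tokens` array is written by hand using the token types and fields described on this page rather than produced by `tokenize`. It assumes well-nested input and does not special-case void elements such as `<br>` unless they are marked `selfClosing`.

```javascript
// Hypothetical helper: turns the flat token list into a nested tree.
// Assumes tokens shaped like the ones described on this page and well-nested tags.
function buildTree(tokens) {
  const root = { tagName: null, children: [] };
  const stack = [root];

  for (const token of tokens) {
    const parent = stack[stack.length - 1];
    switch (token.type) {
      case 'StartTag': {
        const node = { tagName: token.tagName, attributes: token.attributes, children: [] };
        parent.children.push(node);
        // Only open a new scope for elements that will be closed later.
        if (!token.selfClosing) stack.push(node);
        break;
      }
      case 'EndTag': {
        // Pop back to the nearest open element with this name; stray end tags are ignored.
        const index = stack.map((node) => node.tagName).lastIndexOf(token.tagName);
        if (index > 0) stack.length = index;
        break;
      }
      case 'Chars':
        parent.children.push(token.chars);
        break;
      default:
        // Comments and any other token types are skipped in this sketch.
        break;
    }
  }
  return root.children;
}

// Hand-written tokens for '<div>Hello, <span>world!</span></div>'.
const tokens = [
  { type: 'StartTag', tagName: 'div', attributes: [], selfClosing: false },
  { type: 'Chars', chars: 'Hello, ' },
  { type: 'StartTag', tagName: 'span', attributes: [], selfClosing: false },
  { type: 'Chars', chars: 'world!' },
  { type: 'EndTag', tagName: 'span' },
  { type: 'EndTag', tagName: 'div' },
];

console.log(JSON.stringify(buildTree(tokens)));
```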

Handling different token types

Each token has a `type` property. The main types are 'StartTag', 'EndTag', 'Chars' (text content) and 'Comment'. This sample switches on `token.type` and reads the fields relevant to each type: `tagName` for tags and `chars` for text.

const { tokenize } = require('simple-html-tokenizer');
const html = '<div>Hello, <span>world!</span></div>';
const tokens = tokenize(html);
tokens.forEach(token => {
  switch (token.type) {
    case 'StartTag':
      console.log('Start tag:', token.tagName);
      break;
    case 'EndTag':
      console.log('End tag:', token.tagName);
      break;
    case 'Chars':
      console.log('Text:', token.chars);
      break;
    default:
      console.log('Other token:', token);
  }
});
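The same `switch` on `token.type` can drive a transformation. As a sketch, assuming attributes are `[name, value]` entries as shown in the Usage section and that a comment token keeps its text in `chars` (an assumption), a hypothetical `serialize` helper can turn tokens back into an HTML string. It emits text exactly as stored in each token and does not re-escape it, so it only suits tokens whose text is already safe to output.

```javascript
// Hypothetical helper: serializes tokens back into an HTML string.
// Reads only name and value from each attribute entry; extra elements are ignored.
function serialize(tokens) {
  return tokens
    .map((token) => {
      switch (token.type) {
        case 'StartTag': {
          // Double quotes inside attribute values are escaped as &quot;.
          const attrs = token.attributes
            .map(([name, value]) => ` ${name}="${String(value).replace(/"/g, '&quot;')}"`)
            .join('');
          return `<${token.tagName}${attrs}${token.selfClosing ? ' /' : ''}>`;
        }
        case 'EndTag':
          return `</${token.tagName}>`;
        case 'Chars':
          // Text is emitted exactly as stored in the token.
          return token.chars;
        case 'Comment':
          // Assumes the comment body is stored in `chars`.
          return `<!--${token.chars}-->`;
        default:
          return '';
      }
    })
    .join('');
}

const tokens = [
  { type: 'StartTag', tagName: 'a', attributes: [['href', '/home'], ['class', 'nav']], selfClosing: false },
  { type: 'Chars', chars: 'Home' },
  { type: 'EndTag', tagName: 'a' },
  { type: 'StartTag', tagName: 'br', attributes: [], selfClosing: true },
];

console.log(serialize(tokens)); // <a href="/home" class="nav">Home</a><br />
```

A transform step, such as renaming a tag or dropping an attribute, would map over the tokens before calling `serialize`.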

Other packages similar to simple-html-tokenizer: htmlparser2, parse5, html-tokenize

Simple HTML Tokenizer

Simple HTML Tokenizer is a lightweight JavaScript library that can be used to tokenize the kind of HTML normally found in templates. It can be used to preprocess templates to change the behavior of some template element depending upon whether the template element was found in an attribute or text.
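Text and attribute values end up in different places in the token stream: text in `Chars` tokens, attribute values in the `attributes` of a `StartTag` token. A preprocessor can use that to tell the two contexts apart. The sketch below assumes a hypothetical `{{name}}` placeholder syntax and works on a hand-written token array; `classifyPlaceholders` is illustrative and not part of the library.

```javascript
// Hypothetical placeholder syntax: {{name}}. The global flag is required by matchAll.
const PLACEHOLDER = /\{\{\s*(\w+)\s*\}\}/g;

// Records each placeholder together with the context it was found in.
function classifyPlaceholders(tokens) {
  const found = [];
  for (const token of tokens) {
    if (token.type === 'Chars') {
      for (const match of token.chars.matchAll(PLACEHOLDER)) {
        found.push({ name: match[1], context: 'text' });
      }
    } else if (token.type === 'StartTag') {
      for (const [attribute, value] of token.attributes) {
        for (const match of String(value).matchAll(PLACEHOLDER)) {
          found.push({ name: match[1], context: 'attribute', attribute, tagName: token.tagName });
        }
      }
    }
  }
  return found;
}

// Hand-written tokens for '<a href="/users/{{id}}">Hello, {{name}}</a>'.
const tokens = [
  { type: 'StartTag', tagName: 'a', attributes: [['href', '/users/{{id}}']], selfClosing: false },
  { type: 'Chars', chars: 'Hello, {{name}}' },
  { type: 'EndTag', tagName: 'a' },
];

console.log(classifyPlaceholders(tokens));
```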

It is not a full HTML5 tokenizer. It focuses on the kind of HTML that is used in templates: content designed to be inserted into the <body> that contains no <script> tags.

In particular, Simple HTML Tokenizer does not handle many states from the HTML5 Tokenizer Specification:

Any states involving CDATA or RCDATA
Any states involving <script>
Any states involving <DOCTYPE>
The bogus comment state
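Because these constructs are not handled, a caller may want to reject templates that contain them before tokenizing. The sketch below is a coarse, hypothetical pre-check using regular expressions, not part of the library. It covers only CDATA sections, `<script>` elements and DOCTYPE declarations (not RCDATA elements or bogus comments), and a plain text scan can report false positives, for example when the text `<script` appears inside an attribute value.

```javascript
// Hypothetical pre-check for constructs the tokenizer does not handle.
// A regex scan is a heuristic, not an HTML-aware check.
const UNSUPPORTED = [
  { name: 'CDATA section', pattern: /<!\[CDATA\[/i },
  { name: '<script> element', pattern: /<script[\s>\/]/i },
  { name: 'DOCTYPE', pattern: /<!doctype/i },
];

// Returns the names of all unsupported constructs found in the template.
function findUnsupported(template) {
  return UNSUPPORTED.filter(({ pattern }) => pattern.test(template)).map(({ name }) => name);
}

console.log(findUnsupported('<div>{{title}}</div>')); // []
console.log(findUnsupported('<!DOCTYPE html><script src="x.js"></script>'));
```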

It also passes through character references, instead of trying to tokenize and process them, because the preprocessed templates will ultimately be parsed by a real browser context.

At the moment, there are some error states specified by the tokenizer spec that are not handled by Simple HTML Tokenizer. Ultimately, I plan to support all error states, as well as provide information about tokenizer errors in debug mode.

Usage

You can tokenize HTML:

var tokens = HTML5Tokenizer.tokenize("<div id='foo' href=bar class=\"bat\">");

var token = tokens[0];
token.tagName     //=> "div"
token.attributes  //=> [["id", "foo"], ["href", "bar"], ["class", "bat"]]
token.selfClosing //=> false
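Since `attributes` is a list of entries rather than an object, looking an attribute up by name means searching the list. Below is a minimal sketch of a hypothetical `attributesToObject` helper; it reads only the first two elements (name and value) of each entry, and when a name repeats, the last value wins. The token is written by hand to match the Usage example.

```javascript
// Hypothetical helper: converts a StartTag token's attribute list into a plain object.
// Only the name and value of each entry are read; later duplicates overwrite earlier ones.
function attributesToObject(token) {
  const result = {};
  for (const [name, value] of token.attributes) {
    result[name] = value;
  }
  return result;
}

// Hand-written token matching the Usage example.
const token = {
  type: 'StartTag',
  tagName: 'div',
  attributes: [['id', 'foo'], ['href', 'bar'], ['class', 'bat']],
  selfClosing: false,
};

console.log(attributesToObject(token)); // { id: 'foo', href: 'bar', class: 'bat' }
```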

Building and running the tests

npm install
npm test

v0.5.10 (2020-10-14)

:bug: Bug Fix

#90 Add codemod mode to support <pre>\nhi</pre> stability in codemods. (@rwjblue)

:house: Internal

#92 Remove TravisCI setup. (@rwjblue)
#91 Add basic release automation. (@rwjblue)

Committers: 1

Robert Jackson (@rwjblue)


Package last updated on 14 Oct 2020
