Security News
GitHub Removes Malicious Pull Requests Targeting Open Source Repositories
GitHub removed 27 malicious pull requests attempting to inject harmful code across multiple open source repositories, in another round of low-effort attacks.
simple-html-tokenizer
Simple HTML Tokenizer is a lightweight JavaScript library that can be used to tokenize the kind of HTML normally found in templates.
The simple-html-tokenizer npm package is a lightweight library designed to tokenize HTML strings. It breaks down HTML content into a stream of tokens, which can be useful for parsing, analyzing, or transforming HTML documents.
Tokenizing HTML
The `tokenize` function takes an HTML string as input and returns an array of tokens representing the different parts of the HTML.
const { tokenize } = require('simple-html-tokenizer');
const html = '<div>Hello, <span>world!</span></div>';
const tokens = tokenize(html);
console.log(tokens);
Handling different token types
The tokenizer produces tokens of several types, such as 'StartTag', 'EndTag', and 'Chars'. This code sample shows how to process each type accordingly.
const { tokenize } = require('simple-html-tokenizer');
const html = '<div>Hello, <span>world!</span></div>';
const tokens = tokenize(html);
tokens.forEach(token => {
  switch (token.type) {
    case 'StartTag':
      console.log('Start tag:', token.tagName);
      break;
    case 'EndTag':
      console.log('End tag:', token.tagName);
      break;
    case 'Chars':
      console.log('Text:', token.chars);
      break;
    default:
      console.log('Other token:', token);
  }
});
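Once tokens are split by type, small helpers can summarize them. The sketch below is a minimal illustration, not part of the library: `extractText` and `countByType` are hypothetical helper names, and `sampleTokens` is hand-written data shaped like the tokens in the switch above (objects with a `type` field plus `tagName` or `chars`), so it runs without the package installed.

```javascript
// Hand-written sample shaped like the tokens handled in the switch above.
// This is illustrative data, not captured tokenizer output.
const sampleTokens = [
  { type: 'StartTag', tagName: 'div' },
  { type: 'Chars', chars: 'Hello, ' },
  { type: 'StartTag', tagName: 'span' },
  { type: 'Chars', chars: 'world!' },
  { type: 'EndTag', tagName: 'span' },
  { type: 'EndTag', tagName: 'div' },
];

// Hypothetical helper: join the text of every 'Chars' token.
function extractText(tokens) {
  return tokens
    .filter(token => token.type === 'Chars')
    .map(token => token.chars)
    .join('');
}

// Hypothetical helper: count how many tokens of each type appear.
function countByType(tokens) {
  const counts = {};
  for (const token of tokens) {
    counts[token.type] = (counts[token.type] || 0) + 1;
  }
  return counts;
}

console.log(extractText(sampleTokens));
console.log(countByType(sampleTokens));
```

In real use, the array returned by `tokenize` would take the place of `sampleTokens`, assuming its tokens carry the same fields.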
htmlparser2 is a fast and forgiving HTML/XML parser. It is more feature-rich than simple-html-tokenizer, offering a complete DOM structure and event-based parsing, which makes it suitable for more complex parsing tasks.
parse5 is a highly compliant HTML parser that produces a DOM tree. It is designed to be fully compatible with the HTML5 specification, making it more robust than simple-html-tokenizer for handling modern web content.
html-tokenize is another library for tokenizing HTML. It is similar in functionality to simple-html-tokenizer but offers a different API and may have different performance characteristics.
Simple HTML Tokenizer is a lightweight JavaScript library that can be used to tokenize the kind of HTML normally found in templates. It can be used to preprocess templates to change the behavior of some template element depending upon whether the template element was found in an attribute or text.
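To make the attribute-versus-text distinction concrete, here is a minimal sketch of a preprocessing step that reports where each template placeholder was found. Everything in it is an assumption for illustration: the `{{name}}` placeholder syntax, the `findPlaceholders` helper, and the hand-written `templateTokens` array, which imitates tokens with `type`, `tagName`, `attributes`, and `chars` fields and stores attributes as `[name, value]` arrays.

```javascript
// Hypothetical placeholder syntax for this sketch: {{name}}
const PLACEHOLDER = /\{\{\s*([\w.]+)\s*\}\}/g;

// Hypothetical helper: list each placeholder with the context it appears in.
// Assumes attributes are arrays whose first two items are name and value.
function findPlaceholders(tokens) {
  const found = [];
  for (const token of tokens) {
    if (token.type === 'StartTag') {
      for (const [attrName, attrValue] of token.attributes || []) {
        for (const match of String(attrValue).matchAll(PLACEHOLDER)) {
          found.push({
            name: match[1],
            context: 'attribute',
            tagName: token.tagName,
            attrName,
          });
        }
      }
    } else if (token.type === 'Chars') {
      for (const match of token.chars.matchAll(PLACEHOLDER)) {
        found.push({ name: match[1], context: 'text' });
      }
    }
  }
  return found;
}

// Hand-written tokens for the template <a href="{{url}}">{{label}}</a>.
// Illustrative data, not captured tokenizer output.
const templateTokens = [
  { type: 'StartTag', tagName: 'a', attributes: [['href', '{{url}}']], selfClosing: false },
  { type: 'Chars', chars: '{{label}}' },
  { type: 'EndTag', tagName: 'a' },
];

console.log(findPlaceholders(templateTokens));
```

A preprocessor could then branch on `context`, for example escaping a value differently when it lands inside an attribute than when it lands in text.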
It is not a full HTML5 tokenizer. It focuses on the kind of HTML that is used in templates: content designed to be inserted into the <body> and without <script> tags.
In particular, Simple HTML Tokenizer does not handle many states from the HTML5 Tokenizer Specification:
CDATA or RCDATA
<script>
<DOCTYPE>
It also passes through character references, instead of trying to tokenize and process them, because the preprocessed templates will ultimately be parsed by a real browser context.
At the moment, there are some error states specified by the tokenizer spec that are not handled by Simple HTML Tokenizer. Ultimately, I plan to support all error states, as well as provide information about tokenizer errors in debug mode.
You can tokenize HTML:
var tokens = HTML5Tokenizer.tokenize("<div id='foo' href=bar class=\"bat\">");
var token = tokens[0];
token.tagName //=> "div"
token.attributes //=> [["id", "foo"], ["href", "bar"], ["class", "bat"]]
token.selfClosing //=> false
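Because `token.attributes` is shown above as an array of `[name, value]` pairs, looking up one attribute means scanning the array. The sketch below converts the pairs into a plain object for direct lookup. `attributesToObject` is a hypothetical helper, and `sampleStartTag` is a hand-written object that copies the field values from the example above rather than real tokenizer output.

```javascript
// Hand-written token mirroring the fields shown above; not tokenizer output.
const sampleStartTag = {
  tagName: 'div',
  attributes: [['id', 'foo'], ['href', 'bar'], ['class', 'bat']],
  selfClosing: false,
};

// Hypothetical helper: turn attribute pairs into a plain object.
// Only the first two items of each attribute array are read, and
// a repeated attribute name keeps the last value seen.
function attributesToObject(startTag) {
  const result = {};
  for (const [name, value] of startTag.attributes) {
    result[name] = value;
  }
  return result;
}

const attrs = attributesToObject(sampleStartTag);
console.log(attrs.id);
console.log(attrs.class);
```

If a given version of the library adds further elements to each attribute array, the destructuring still reads only the name and value.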
npm install
npm test
FAQs
The npm package simple-html-tokenizer receives a total of 0 weekly downloads. As such, its popularity is classified as not popular.
We found that simple-html-tokenizer has an unhealthy version release cadence and level of project activity because the last version was released a year ago. It has 6 open source maintainers collaborating on the project.