tiny-html-lexer
A tiny, standards-compliant HTML5 lexer/chunker. Its small size should make it ideal for client-side usage.
The chunker preserves all input characters, so it is also suitable as the base for a syntax highlighter or an HTML editor, if you like.
It is lazy/on-demand, so it does not unnecessarily buffer chunks.
I would love for someone to build a tiny template language with it. Feel free to contact me with any questions.
There is a single top-level function, chunks, that returns an iterator:
let tinyhtml = require ('tiny-html-lexer')
let stream = tinyhtml.chunks ('<span>Hello, world</span>')
for (let chunk of stream)
console.log (chunk)
Alternatively, without for..of
(using var instead of let, this works just fine in ES5 environments):
var stream = tinyhtml.chunks ('<span>Hello, world</span>') .next ()
while (!stream.done) {
  console.log (stream.value)
  stream.next ()
}
Each call to next () mutates and returns the iterator object itself, rather than the usual fresh { value, done } object. Creating a new wrapper object for every chunk seemed superfluous, so I went with this instead.
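The pattern can be sketched as follows. This is a toy stand-in, not the library's actual implementation, but it shows why the trick is safe: the iterator carries its own value and done properties, so the object returned by next () is still a valid iterator result.

```javascript
// Sketch of a self-mutating iterator: next () updates `value` and `done`
// on the iterator itself and returns it, instead of allocating a fresh
// { value, done } wrapper per item. for..of and spread still work,
// because they only read `done` and `value` off the returned object.
function selfIterator (items) {
  let i = 0
  return {
    value: undefined,
    done: false,
    next () {
      if (i < items.length) this.value = items[i++]
      else { this.done = true; this.value = undefined }
      return this
    },
    [Symbol.iterator] () { return this }
  }
}

let it = selfIterator (['a', 'b'])
console.log ([...it]) // → [ 'a', 'b' ]
```

Note the trade-off: because every call returns the same object, you cannot hold on to an earlier result and inspect it later, since its value will have been overwritten by subsequent calls.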
Tokens are tuples (arrays) [type, chunk], where type is one of:
"attribute-name"
"attribute-equals"
"attribute-value-start"
"attribute-value-data"
"attribute-value-end"
"comment-start"
"comment-start-bogus"
"comment-data"
"comment-end"
"comment-end-bogus"
"startTag-start"
"endTag-start"
"tag-end"
"tag-end-autoclose"
"charRef-decimal"
"charRef-hex"
"charRef-named"
"unescaped"
"space"
"data"
"rcdata"
"rawtext"
"plaintext"
Doctypes are preserved in the output, but they are parsed as bogus comments rather than as proper doctype tokens.
CDATA sections (used only in SVG/foreign content) are likewise parsed as bogus comments.
The idea is that the lexical grammar can be expressed very compactly by a state machine whose transitions are labeled with regular expressions rather than individual characters.
I use regular expressions without capture groups for the transitions. For each state, every outgoing transition is wrapped in parentheses to create a capture group, and the transitions are then joined together as alternates into a single regular expression per state. After executing this regular expression, one can tell which transition was taken by checking which capture group is present in the result of regex.exec.
MIT.
Enjoy!