What is @web/parse5-utils?
@web/parse5-utils is a utility library for working with the parse5 HTML parser. It provides a set of helper functions to manipulate and query the parse5 AST (Abstract Syntax Tree) more easily.
What are @web/parse5-utils's main functionalities?
Querying Elements
This feature allows you to find elements in the parse5 AST using a predicate function. In the example, it finds all <p> elements in the parsed HTML.
const { parse } = require('parse5');
const { findElements, getTagName } = require('@web/parse5-utils');
const html = '<div><p>Hello</p><p>World</p></div>';
const document = parse(html);
// findElements walks the tree and returns every element matching the predicate
const paragraphs = findElements(document, e => getTagName(e) === 'p');
console.log(paragraphs.length); // 2
Manipulating Elements
This feature allows you to manipulate elements within the parse5 AST. In the example, it appends a new <p> element with text 'Hello World' to an existing <div>.
const { parse, serialize } = require('parse5');
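const { appendChild, createElement, findElement, getTagName, setTextContent } = require('@web/parse5-utils');
const html = '<div></div>';
const document = parse(html);
// Look the <div> up by tag name instead of relying on child indexes
const div = findElement(document, e => getTagName(e) === 'div');
const newElement = createElement('p', {});
setTextContent(newElement, 'Hello World');
appendChild(div, newElement);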
console.log(serialize(document)); // <html><head></head><body><div><p>Hello World</p></div></body></html>
Removing Elements
This feature allows you to remove elements from the parse5 AST. In the example, it removes the first <p> element from the parsed HTML.
const { parse, serialize } = require('parse5');
const { findElement, getTagName, remove } = require('@web/parse5-utils');
const html = '<div><p>Hello</p><p>World</p></div>';
const document = parse(html);
// findElement returns the first match, here the first <p>
const paragraph = findElement(document, e => getTagName(e) === 'p');
remove(paragraph);
console.log(serialize(document)); // <html><head></head><body><div><p>World</p></div></body></html>
Other packages similar to @web/parse5-utils
cheerio
Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server. It parses HTML and XML and provides a jQuery-like API for manipulating the resulting DOM. Compared to @web/parse5-utils, Cheerio offers a more familiar API for those used to jQuery, but it wraps the document in its own jQuery-style objects rather than working directly on the parse5 AST.
jsdom
jsdom is a JavaScript implementation of the WHATWG DOM and HTML standards, primarily intended for use with Node.js. It provides a full-featured DOM environment, including support for HTML parsing and manipulation. Compared to @web/parse5-utils, jsdom offers a more comprehensive and standards-compliant environment but can be heavier and more complex to use.
htmlparser2
htmlparser2 is a fast and forgiving HTML/XML parser. It provides a SAX-style parser and a DOM handler, allowing for flexible parsing and manipulation of HTML. Compared to @web/parse5-utils, htmlparser2 is more focused on performance and low-level parsing, while @web/parse5-utils provides higher-level utilities for working with the parse5 AST.
parse5-utils
Utilities for working with parse5.
Usage
Examples:
import { parse } from 'parse5';
import { createElement, getTagName, appendChild, findElement } from '@web/parse5-utils';
const doc = parse(`
<html>
<body>
<my-element></my-element>
<div id="foo"></div>
</body>
</html>`);
const body = findElement(doc, e => getTagName(e) === 'body');
const script = createElement('script', { src: './foo.js' });
appendChild(body, script);
import { parse } from 'parse5';
import { getTagName, getAttribute, findElements } from '@web/parse5-utils';
const doc = parse(`
<html>
<body>
<script src="./a.js"></script>
<script type="module" src="./b.js"></script>
<script type="module" src="./c.js"></script>
</body>
</html>`);
const allModuleScripts = findElements(
doc,
e => getTagName(e) === 'script' && getAttribute(e, 'type') === 'module',
);
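To make the predicate-based lookup above more concrete, here is a minimal sketch of how such a search could work. It is not the library's source: `findElementsSketch` is a hypothetical name, the tree is hand-built in the shape parse5's default tree adapter produces, and the content of `<template>` elements is ignored for brevity.

```javascript
// Hypothetical re-implementation for illustration; not the library's source.
function findElementsSketch(node, predicate, found = []) {
  // In parse5's default tree adapter, element nodes carry a tagName string.
  if (typeof node.tagName === 'string' && predicate(node)) {
    found.push(node);
  }
  // Depth-first walk over the children, if the node has any.
  for (const child of node.childNodes || []) {
    findElementsSketch(child, predicate, found);
  }
  return found;
}

// Hand-built tree shaped like parse5 output (assumption: default tree adapter).
const tree = {
  nodeName: '#document',
  childNodes: [
    {
      nodeName: 'body',
      tagName: 'body',
      attrs: [],
      childNodes: [
        {
          nodeName: 'script',
          tagName: 'script',
          attrs: [{ name: 'src', value: './a.js' }],
          childNodes: [],
        },
        {
          nodeName: 'script',
          tagName: 'script',
          attrs: [
            { name: 'type', value: 'module' },
            { name: 'src', value: './b.js' },
          ],
          childNodes: [],
        },
      ],
    },
  ],
};

const isModuleScript = el =>
  el.tagName === 'script' &&
  el.attrs.some(a => a.name === 'type' && a.value === 'module');

const moduleScripts = findElementsSketch(tree, isModuleScript);
console.log(moduleScripts.length); // 1
```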
appendToDocument and prependToDocument will inject a snippet of HTML into the page, making sure it is executed last or first respectively.
It tries to avoid changing the formatting of the original file, using parse5 to find out the location of the body and head tags and string concatenation in the original code to do the actual injection. In case of incomplete or invalid HTML it may fall back to parse5 to generate a valid document and inject using AST manipulation.
import { prependToDocument, appendToDocument } from '@web/parse5-utils';
const html = '<html><body></body></html>';
const htmlWithInjectedScript = appendToDocument(
html,
'<script type="module" src="./injected-script.js"></script>',
);
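The string-concatenation idea can be illustrated with a simplified sketch. This is not how the library is implemented: `appendBeforeBodyEnd` is a hypothetical helper, and a regular-expression search for the closing body tag stands in for the source locations that parse5 would provide.

```javascript
// Hypothetical helper, not part of @web/parse5-utils.
// Inserts a snippet right before the last closing </body> tag while leaving
// the rest of the original string untouched.
function appendBeforeBodyEnd(html, snippet) {
  // Assumption: a regex search replaces parse5's location information.
  const matches = [...html.matchAll(/<\/body\s*>/gi)];
  if (matches.length === 0) {
    // No closing body tag found: fall back to appending at the very end.
    return html + snippet;
  }
  const index = matches[matches.length - 1].index;
  return html.slice(0, index) + snippet + html.slice(index);
}

const page = '<html><body><p>Hi</p></body></html>';
const result = appendBeforeBodyEnd(
  page,
  '<script type="module" src="./injected-script.js"></script>',
);
console.log(result);
```

Because only one position in the string changes, whitespace and formatting elsewhere in the file are preserved, which is the motivation the paragraph above gives for avoiding a full parse and re-serialize.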
Available functions
- createDocument
- createDocumentFragment
- createElement
- createScript
- createCommentNode
- appendChild
- insertBefore
- setTemplateContent
- getTemplateContent
- setDocumentType
- setDocumentMode
- getDocumentMode
- detachNode
- insertText
- insertTextBefore
- adoptAttributes
- getFirstChild
- getChildNodes
- getParentNode
- getAttrList
- getTagName
- getNamespaceURI
- getTextNodeContent
- getCommentNodeContent
- getDocumentTypeNodeName
- getDocumentTypeNodePublicId
- getDocumentTypeNodeSystemId
- isTextNode
- isCommentNode
- isDocumentTypeNode
- isElementNode
- setNodeSourceCodeLocation
- getNodeSourceCodeLocation
- updateNodeSourceCodeLocation
- isHtmlFragment
- hasAttribute
- getAttribute
- getAttributes
- setAttribute
- setAttributes
- setTextContent
- getTextContent
- removeAttribute
- remove
- findNode
- findNodes
- findElement
- findElements
- prependToDocument
- appendToDocument
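As a rough illustration of what the attribute helpers in this list operate on, the sketch below works on a hand-built element. The function names ending in Sketch are hypothetical and this is not the library's source; it assumes parse5's default tree adapter, where attributes are stored in an attrs array of { name, value } pairs. The real helpers may behave differently, for example in what they return for a missing attribute.

```javascript
// Hypothetical helpers for illustration only; not the library's source code.
// Assumption: attributes live in an `attrs` array of { name, value } pairs.
function getAttributeSketch(element, name) {
  const attr = element.attrs.find(a => a.name === name);
  return attr ? attr.value : undefined;
}

function setAttributeSketch(element, name, value) {
  const attr = element.attrs.find(a => a.name === name);
  if (attr) {
    attr.value = value; // overwrite an existing attribute
  } else {
    element.attrs.push({ name, value }); // or add a new one
  }
}

function removeAttributeSketch(element, name) {
  element.attrs = element.attrs.filter(a => a.name !== name);
}

// Hand-built element in the shape parse5 produces for <a href="/old"></a>.
const link = {
  nodeName: 'a',
  tagName: 'a',
  attrs: [{ name: 'href', value: '/old' }],
  childNodes: [],
};

setAttributeSketch(link, 'href', '/new');
setAttributeSketch(link, 'target', '_blank');
removeAttributeSketch(link, 'target');

console.log(getAttributeSketch(link, 'href')); // /new
console.log(link.attrs.length); // 1
```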