Two input formats are currently supported: HTML and XML fragments. By default we use a very relaxed custom parser
that will try to make sense of whatever tag soup you hand it. In HTML mode, all tag and attribute names are lowercased,
so selectors need to be lowercase as well.
// HTML
const dom = new DOM('<p>Hello World!</p>');
// XML
const dom = new DOM('<rss><link>http://example.com</link></rss>', {xml: true});
Nodes and Elements
When we parse an HTML/XML document or fragment, it gets turned into a tree of nodes.
While nodes such as #document and #fragment can be represented by DOM objects, features like dom.attr and
dom.tag will not work for them.
CSS Selectors
All CSS selectors that make sense for a standalone parser are supported.
| Pattern | Represents |
| --- | --- |
| `*` | any element |
| `E` | an element of type E |
| `E:not(s1, s2, …)` | an E element that does not match either compound selector s1 or compound selector s2 |
| `E:is(s1, s2, …)` | an E element that matches compound selector s1 and/or compound selector s2 |
| `E.warning` | an E element belonging to the class warning |
| `E#myid` | an E element with ID equal to myid |
| `E[foo]` | an E element with a foo attribute |
| `E[foo="bar"]` | an E element whose foo attribute value is exactly equal to bar |
| `E[foo="bar" i]` | an E element whose foo attribute value is exactly equal to any (ASCII-range) case-permutation of bar |
| `E[foo="bar" s]` | an E element whose foo attribute value is exactly and case-sensitively equal to bar |
| `E[foo~="bar"]` | an E element whose foo attribute value is a list of whitespace-separated values, one of which is exactly equal to bar |
| `E[foo^="bar"]` | an E element whose foo attribute value begins exactly with the string bar |
| `E[foo$="bar"]` | an E element whose foo attribute value ends exactly with the string bar |
| `E[foo*="bar"]` | an E element whose foo attribute value contains the substring bar |
| `E:any-link` | an E element being the source anchor of a hyperlink |
| `E:link` | an E element being the source anchor of a hyperlink of which the target is not yet visited |
| `E:visited` | an E element being the source anchor of a hyperlink of which the target is already visited |
| `E:checked` | a user interface element E that is checked/selected (for instance a radio-button or checkbox) |
| `E:root` | an E element, root of the document |
| `E:empty` | an E element that has no children (neither elements nor text) except perhaps white space |
| `E:nth-child(n [of S]?)` | an E element, the n-th child of its parent matching S |
| `E:nth-last-child(n [of S]?)` | an E element, the n-th child of its parent matching S, counting from the last one |
| `E:first-child` | an E element, first child of its parent |
| `E:last-child` | an E element, last child of its parent |
| `E:only-child` | an E element, only child of its parent |
| `E:nth-of-type(n)` | an E element, the n-th sibling of its type |
| `E:nth-last-of-type(n)` | an E element, the n-th sibling of its type, counting from the last one |
| `E:first-of-type` | an E element, first sibling of its type |
| `E:last-of-type` | an E element, last sibling of its type |
| `E:only-of-type` | an E element, only sibling of its type |
| `E:text(string)` | an E element containing text content that substring matches the given string case-insensitively |
| `E:text(/pattern/i)` | an E element containing text content that regex matches the given pattern |
| `E F` | an F element descendant of an E element |
| `E > F` | an F element child of an E element |
| `E + F` | an F element immediately preceded by an E element |
| `E ~ F` | an F element preceded by an E element |
All supported CSS4 selectors are considered experimental and might change as the spec evolves.
API
Everything you need to extract information from HTML/XML documents and make changes to the DOM tree.
// Parse HTML
const dom = new DOM('<div class="greeting">Hello World!</div>');
// Render `DOM` object to HTML
const html = dom.toString();
// Create a new `DOM` object with one HTML tag
const div = DOM.newTag('div', {class: 'greeting'}, 'Hello World!');
Navigate the DOM tree with and without CSS selectors.
// Find one element matching the CSS selector and return it as `DOM` object
const div = dom.at('div > p');
// Find all elements matching the CSS selector and return them as `DOM` objects
const divs = dom.find('div > p');
// Get root element as `DOM` object (document or fragment node)
const root = dom.root();
// Get parent element as `DOM` object
const parent = dom.parent();
// Get all ancestor elements as `DOM` objects
const ancestors = dom.ancestors();
const ancestors = dom.ancestors('div > p');
// Get all child elements as `DOM` objects
const children = dom.children();
const children = dom.children('div > p');
// Get all sibling elements before this element as `DOM` objects
const preceding = dom.preceding();
const preceding = dom.preceding('div > p');
// Get all sibling elements after this element as `DOM` objects
const following = dom.following();
const following = dom.following('div > p');
// Get sibling element before this element as `DOM` object
const previous = dom.previous();
// Get sibling element after this element as `DOM` object
const next = dom.next();
Extract information and manipulate elements.
// Check if element matches the given CSS selector
const isDiv = dom.matches('div > p');
// Extract text content from element
const greeting = dom.text();
const greeting = dom.text({recursive: true});
// Get element tag
const tag = dom.tag;
// Set element tag
dom.tag = 'div';
// Get element attribute value (`class` is a reserved word, so the variable needs another name)
const className = dom.attr.class;
// Set element attribute value
dom.attr.class = 'whatever';
// Remove element attribute
delete dom.attr.class;
// Get element attribute names
const names = Object.keys(dom.attr);
// Get element's rendered content
const content = dom.content();
// Get form value
const formValue = dom.at('input').val();
const formValue = dom.at('option').val();
const formValue = dom.at('select').val();
const formValue = dom.at('textarea').val();
const formValue = dom.at('button').val();
// Find this element's namespace
const namespace = dom.namespace();
// Get a unique CSS selector for this element
const selector = dom.selector();
// Remove element and its children
dom.remove();
// Remove element but preserve its children
dom.strip();
// Replace element and its children
dom.replace('<p>Hello World!</p>');
// Replace this element's content
dom.replaceContent('<p>Hello World!</p>');
// Append HTML/XML fragment after this element
dom.append('<p>Hello World!</p>');
// Append HTML/XML fragment to this element's content
dom.appendContent('<p>Hello World!</p>');
// Prepend HTML/XML fragment before this element
dom.prepend('<p>Hello World!</p>');
// Prepend HTML/XML fragment to this element's content
dom.prependContent('<p>Hello World!</p>');
// Wrap HTML/XML fragment around this element
dom.wrap('<div></div>');
// Wrap HTML/XML fragment around the content of this element
dom.wrapContent('<div></div>');
There is also a node-level API that you can use, for example, to extend the DOM class. It is, however, still in flux and
therefore not fully documented yet.
// Remove comment nodes that are children of this element
dom.currentNode.childNodes
  .filter(node => node.nodeType === '#comment')
  .forEach(node => node.detach());
// Extract text surrounding this element
const text = dom.currentNode.parentNode.childNodes
  .filter(node => node.nodeType === '#text')
  .map(node => node.value)
  .join('');
Custom Parsers
Additional input formats, such as fully spec-compliant HTML5 documents, can be supported with custom parsers. There is
a parse5-based example included in this distribution that we use for testing.
import DOM, {DocumentNode, FragmentNode} from '@mojojs/dom';
// Minimal custom HTML/XML parser that only creates the document/fragment objects
class Parser {
  parse(text) {
    return new DocumentNode();
  }
  parseFragment(text) {
    return new FragmentNode();
  }
}
// Parse HTML with a custom parser
const dom = new DOM('<p>Hello World!</p>', {parser: new Parser()});
Installation
All you need is Node.js 16.0.0 (or newer).
$ npm install @mojojs/dom
Support
If you have any questions the documentation might not yet answer, don't hesitate to ask in the Forum, on Matrix, or
on IRC.
A fast and minimalistic HTML/XML DOM parser with CSS selectors