Beautiful-dom is a lightweight library that mirrors the parts of the HTML DOM API needed for parsing crawled HTML/XML pages. It models the methods and properties of HTML nodes that are relevant for extracting data from them. It is written in TypeScript and can be used as a CommonJS library.
What you get
- The ability to parse HTML documents as if you were working with them in a live browser.
- Fast queries that return essential data from HTML nodes.
- HTML nodes that keep their original order after searching and parsing.
- Complex queries with CSS selectors.
How to use
npm install --save beautiful-dom
const BeautifulDom = require('beautiful-dom');
const document = `
<p class="paragraph highlighted-text" >
My name is <b> Ajah, C.S. </b> and I am a <span class="work"> software developer </span>
</p>
<div class = "container" id="container" >
<b> What is the name of this module </b>
<p> What is the name of this library </p>
<a class="myWebsite" href="" > My website </a>
</div>
<label for="name"> What's your name? </label>
<input type="text" id="name" name="name" />
`;
const dom = new BeautifulDom(document);
Methods on the document object
- document.getElementsByTagName()
- document.getElementsByClassName()
- document.getElementsByName()
- document.getElementById()
- document.querySelectorAll()
- document.querySelector()
Methods on the HTML node object
- node.getElementsByClassName()
- node.getElementsByTagName()
- node.querySelector()
- node.querySelectorAll()
- node.getAttribute()
Properties of the HTML node object
- node.outerHTML
- node.innerHTML
- node.textContent
- node.innerText
These methods and properties are used just as they would be on an actual HTML DOM, with the same parameters.
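As a rough illustration of the four node properties listed above, the sketch below gathers them into a plain object. `describeNode` is a hypothetical helper written for this README, not part of beautiful-dom; it only assumes the node exposes `outerHTML`, `innerHTML`, `textContent` and `innerText`.

```javascript
// Hypothetical helper (not part of beautiful-dom): copies the four documented
// node properties into a plain object, e.g. for logging or serialising.
function describeNode(node) {
  return {
    outerHTML: node.outerHTML,
    innerHTML: node.innerHTML,
    textContent: node.textContent,
    innerText: node.innerText,
  };
}

// Possible usage with the `dom` object from "How to use"
// (assumes beautiful-dom is installed and the query matches a node):
// const firstBold = dom.querySelector('b');
// console.log(describeNode(firstBold));
```

Copying the values into a plain object keeps the result independent of the node it came from, which can be convenient when storing crawl results.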
Examples for document object
let paragraphNodes = dom.getElementsByTagName('p');
let nodesWithSpecificClass = dom.getElementsByClassName('work');
let nodeWithSpecificId = dom.getElementById('container');
let complexQueryNodes = dom.querySelectorAll('p.paragraph b');
let nodesWithSpecificName = dom.getElementsByName('name');
// 'myWebsite' is a class in the sample document, so it is selected with '.' rather than '#'
let linkNode = dom.querySelector('a.myWebsite');
let linkHref = linkNode.getAttribute('href');
let linkInnerHTML = linkNode.innerHTML;
let linkTextContent = linkNode.textContent;
let linkInnerText = linkNode.innerText;
let linkOuterHTML = linkNode.outerHTML;
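The examples above call `getAttribute()` directly on the result of `querySelector()`. When crawling pages whose markup is not guaranteed, a selector may match nothing. The sketch below is one way to guard against that; `attributeOf` is a hypothetical helper, and it assumes that `querySelector()` returns `null` for no match, as it does in the browser DOM.

```javascript
// Hypothetical helper (not part of beautiful-dom): reads an attribute from the
// first node matching `selector`, returning `fallback` when nothing is found.
// Assumes root.querySelector() returns null for no match, as in a browser.
function attributeOf(root, selector, attribute, fallback = null) {
  const node = root.querySelector(selector);
  if (!node) {
    return fallback; // no node matched the selector
  }
  const value = node.getAttribute(attribute);
  // Treat a missing attribute (null or undefined) the same as a missing node.
  return value == null ? fallback : value;
}

// Possible usage with the `dom` object from "How to use":
// const href = attributeOf(dom, 'a', 'href', '(no link)');
```

Because the helper only relies on `querySelector()` and `getAttribute()`, it should work with either the document object or an individual node as `root`.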
Examples for a node object
let paragraphNodes = dom.getElementsByTagName('p');
let nodesWithSpecificClass = paragraphNodes[0].getElementsByClassName('work');
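// Any valid CSS selector can be passed here; 'span.work' matches the sample document
let complexQueryNodes = paragraphNodes[0].querySelectorAll('span.work');
// 'myWebsite' is a class in the sample document, so it is selected with '.' rather than '#'
let linkNode = dom.querySelector('a.myWebsite');
let linkHref = linkNode.getAttribute('href');
let linkInnerHTML = linkNode.innerHTML;
let linkTextContent = linkNode.textContent;
let linkInnerText = linkNode.innerText;
let linkOuterHTML = linkNode.outerHTML;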
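Queries made on a node are scoped to that node's descendants, which is handy for pulling several values out of one container. The sketch below collects the trimmed text of every matching descendant; `textsByTag` is a hypothetical helper, and it assumes `getElementsByTagName()` returns an array or array-like collection of nodes.

```javascript
// Hypothetical helper (not part of beautiful-dom): returns the trimmed text of
// every descendant of `parent` with the given tag name.
function textsByTag(parent, tagName) {
  // Array.from accepts both real arrays and array-like collections.
  return Array.from(parent.getElementsByTagName(tagName), (node) =>
    (node.textContent || '').trim()
  );
}

// Possible usage with the `dom` object from "How to use":
// const container = dom.getElementById('container');
// const boldTexts = textsByTag(container, 'b');
```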
If you have ideas, features you would like to see included, or bug fixes, you can send a PR.
(Requires Node v6 or above)
git clone