HD HTML Parser
A fast and simple HTML parser that returns a Document object with various methods to manipulate and query the HTML elements.
Features
Installation
To install the hd-html-parser package, run the following command:
npm i hd-html-parser
Usage
To use the hd-html-parser package, import its default export (named HDHtmlParser in the example below) and call it with an HTML string as the argument. It returns a Promise that resolves to a Document object, or to null if the HTML string is invalid. For example:
import HDHtmlParser from "hd-html-parser";
const html = `
<html>
<head>
<title>Example</title>
</head>
<body>
<h1>Hello, world!</h1>
<p>This is a paragraph.</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
</ul>
</body>
</html>
`;
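HDHtmlParser(html).then((document) => {
  // The promise resolves to null when the HTML string is invalid.
  if (!document) return;
  console.log(document.getHtml());
  // querySelector returns null when nothing matches.
  console.log(document.querySelector("h1")?.getText());
  console.log(document.querySelectorAll("li").length);
});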
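Because the parser may resolve to null and querySelector returns null when nothing matches, it can help to wrap lookups in a small guard. The sketch below is one possible way to do that. The name textOrDefault is hypothetical and not part of the package; the function only assumes an object that exposes the querySelector and getText methods described in this README.

```javascript
// Hypothetical helper (not part of hd-html-parser): read the text of the
// first match, falling back when anything along the way is null.
function textOrDefault(doc, selector, fallback = "") {
  // The parser may resolve to null for an invalid HTML string.
  if (doc == null) return fallback;
  const el = doc.querySelector(selector);
  // querySelector returns null when nothing matches.
  if (el == null) return fallback;
  const text = el.getText();
  // getText returns null when the element is not valid.
  return text == null ? fallback : text;
}
```

With the document from the example above, textOrDefault(document, "h1", "(no title)") would return whatever getText() reports for the heading, while a selector that matches nothing would return the fallback.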
Document Methods
The Document object returned by the hd-html-parser package has the following methods:
querySelector(selector: string): Document | null
- Returns the first element that matches the given CSS selector, or null if none is found.
querySelectorAll(selector: string): (Document | null)[]
- Returns an array of all elements that match the given CSS selector, or an empty array if none is found.
getHtml(): string | null
- Returns the HTML string of the element, or null if the element is not valid.
getAttribute(name: string): string | null | undefined
- Returns the value of the attribute with the given name, or null if the attribute does not exist, or undefined if the element is not valid.
getText(): string | null
- Returns the text content of the element, or null if the element is not valid.
getParent(): Document
- Returns the parent element of the element, or the element itself if it has no parent.
getChildren(): Array<Document>
- Returns an array of the child elements of the element, or an empty array if the element has no children.
getOuterHTML(): string | null
- Returns the HTML string of the element including its opening and closing tags, or null if the element is not valid.
getInnerHTML(): string | null
- Returns the HTML string of the element excluding its opening and closing tags, or null if the element is not valid.
getNext(): Document | null
- Returns the next sibling element of the element, or null if the element has no next sibling.
getPrev(): Document | null
- Returns the previous sibling element of the element, or null if the element has no previous sibling.
getListData(selector: string, itemsSelector: any, meta?: any): Array<object>
- Returns an array of objects that represent the data of the list elements that match the given selector. The itemsSelector parameter is an object that maps the keys of the data objects to the CSS selectors of the list items. The optional meta parameter is an object that maps the keys of the data objects to the values of the meta attributes of the list elements.
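The exact output shape of getListData is not shown in this README, so the following is only a rough hand-rolled equivalent built from querySelectorAll, querySelector, and getText. It assumes that each selector in itemsSelector is resolved inside every element matched by selector, and that meta values are copied onto each row unchanged. Both are assumptions about the intent rather than documented behavior, and collectListData is a hypothetical name.

```javascript
// Hypothetical stand-in for getListData, built only from methods listed above.
// Assumption: itemsSelector maps output keys to CSS selectors that are
// evaluated inside each element matched by `selector`.
// Assumption: `meta` key/value pairs are copied onto every row as-is.
function collectListData(doc, selector, itemsSelector, meta = {}) {
  const rows = [];
  for (const item of doc.querySelectorAll(selector)) {
    // querySelectorAll is typed as (Document | null)[], so skip null entries.
    if (item == null) continue;
    const row = { ...meta };
    for (const [key, childSelector] of Object.entries(itemsSelector)) {
      const child = item.querySelector(childSelector);
      // A missing child is recorded as null; getText may also return null.
      row[key] = child == null ? null : child.getText();
    }
    rows.push(row);
  }
  return rows;
}
```

For a list whose items each contain an element with the class title, a call such as collectListData(document, "li", { title: ".title" }, { source: "example" }) would yield one object per li with title and source keys.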
License
This project is licensed under the terms of the MIT license. See the LICENSE file for details.