EssenceParser

Package Overview

Dependencies

Maintainers

Alerts

File Explorer

Advanced tools

License

Install Socket

Detect and block malicious and high-risk dependencies

Install

Package was removed

Sorry, it seems this package was removed from the registry

EssenceParser

EssenceHtmlParser is a small C# library for parsing, traversing, and transforming HTML documents into structured, strongly typed content trees. It provides an object-oriented API for working with HTML as a tree of ContentNode objects, allowing transformation, filtering, serialization, and semantic analysis of HTML content.

1.0.0

unpublished

Source

NuGet

Maintainers: 1

Source

EssenceParser

EssenceParser is a small C# library for parsing, traversing, and transforming HTML documents into structured, strongly-typed content trees. It provides an object-oriented API to work with HTML as a tree of ContentNode objects, enabling transformation, filtering, serialization, and semantic analysis of HTML content.

Features

Convert HTML to a tree structure of semantic content nodes ContentNodeTree
Traverse and manipulate nodes via predicates or transformation functions
Serialize nodes back to HTML or plain text
Configure parsing behavior through ParserOptions
Clone, filter, and transform HTML content using LINQ-like operations

Overview

The main class for parsing is EssenceHtmlParser. It contains key methods for working with HTML:

ReadFromString() - Loads HTML content from a string for subsequent parsing.
ReadFromFileAsync() - Asynchronously loads HTML content from a file for parsing.
ParseAsync() - Parses the previously loaded HTML into a ContentNodeTree structure.

As a result of parsing we get ContentNodeTree, which is a hierarchical tree of content nodes, typically parsed from an HTML document. It acts as the root container for a set of top-level ContentNode instances. The ContentNodeTree class, like ContentNode, provides a number of methods for working with an object-oriented HTML tree:

MaxDepth() - Calculates the maximum depth across all root nodes in the tree.
GetNodeCount() - Computes the total number of nodes in the tree, including all descendants.
FindNodes() - Searches all nodes in the tree and retains only those that match the provided predicate. This operation modifies the tree in place by flattening it to only matching nodes.
Replace() - Applies a transformation to each root node using the specified function.
Purge() - Removes all nodes from the tree that match the specified predicate.
Clone() - Creates a deep copy of the tree, including all nodes and their attributes.
ToPlainText() - Extracts and concatenates plain text from all nodes in the tree. HTML tags and structure are omitted.
ToHtmlString() - Serializes the tree into an HTML-formatted string. This reflects the structure and content of the nodes, including indentation.

Usage

Create an instance of ContentNodeTree and specify the options in the constructor:

var parser = new EssenceHtmlParser(new ParsingOptions());

Load your HTML from a file or pass a string directly:

string html = "<html><body><p>Hello World</p></body></html>";
parser.ReadFromString(html);
// OR
await parser.ReadFromFileAsync("your-path-to-file.html");

Now you can parse your HTML in ContentNodeTree:

var tree = await parser.ParseAsync();

You can perform operations on the received ContentNodeTree:

var scriptNodes = tree.FindNodes(n => n.Tag == HtmlTag.Script);

Keywords

FAQs

What is essenceparser?

Is essenceparser well maintained?

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

EssenceParser

EssenceParser

Features

Overview

Usage

Keywords

Related posts

Opengrep Adds Apex Support and New Rule Controls in Latest Updates

npm Adopts OIDC for Trusted Publishing in CI/CD Workflows