
EssenceHtmlParser is a small C# library for parsing, traversing, and transforming HTML documents into structured, strongly typed content trees. It provides an object-oriented API for working with HTML as a tree of ContentNode objects, allowing transformation, filtering, serialization, and semantic analysis of HTML content.
The main class for parsing is EssenceHtmlParser. It contains key methods for working with HTML:
ReadFromString() - Loads HTML content from a string for subsequent parsing.
ReadFromFileAsync() - Asynchronously loads HTML content from a file for parsing.
ParseAsync() - Parses the previously loaded HTML into a ContentNodeTree structure.
The result of parsing is a ContentNodeTree, which is a hierarchical tree of content nodes, typically parsed from an HTML document. It acts as the root container for a set of top-level ContentNode instances.
The ContentNodeTree class, like ContentNode, provides a number of methods for working with an object-oriented HTML tree:
MaxDepth() - Calculates the maximum depth across all root nodes in the tree.
GetNodeCount() - Computes the total number of nodes in the tree, including all descendants.
FindNodes() - Searches all nodes in the tree and retains only those that match the provided predicate. This operation modifies the tree in place by flattening it to only matching nodes.
Replace() - Applies a transformation to each root node using the specified function.
Purge() - Removes all nodes from the tree that match the specified predicate.
Clone() - Creates a deep copy of the tree, including all nodes and their attributes.
ToPlainText() - Extracts and concatenates plain text from all nodes in the tree. HTML tags and structure are omitted.
ToHtmlString() - Serializes the tree into an HTML-formatted string. This reflects the structure and content of the nodes, including indentation.
Create an instance of EssenceHtmlParser and specify the options in the constructor:
var parser = new EssenceHtmlParser(new ParsingOptions());
Load your HTML from a file or pass a string directly:
string html = "<html><body><p>Hello World</p></body></html>";
parser.ReadFromString(html);
// OR
await parser.ReadFromFileAsync("your-path-to-file.html");
Now you can parse your HTML into a ContentNodeTree:
var tree = await parser.ParseAsync();
You can then perform operations on the resulting ContentNodeTree:
var scriptNodes = tree.FindNodes(n => n.Tag == HtmlTag.Script);
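To make the semantics of the tree operations described above more concrete, here is a minimal, self-contained sketch of how MaxDepth(), GetNodeCount(), Purge(), and the in-place, flattening FindNodes() could behave. It does not use the library: SketchNode, SketchTree, and SketchDemo are hypothetical stand-ins invented for illustration, and the depth convention (a node without children has depth 1) and the choice to let matched nodes keep their subtrees are assumptions, not documented behavior of ContentNodeTree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a content node; NOT the library's ContentNode.
public sealed class SketchNode
{
    public string Tag { get; }
    public List<SketchNode> Children { get; } = new List<SketchNode>();

    public SketchNode(string tag, params SketchNode[] children)
    {
        Tag = tag;
        Children.AddRange(children);
    }
}

// Hypothetical stand-in for a tree of top-level nodes; NOT ContentNodeTree.
public sealed class SketchTree
{
    public List<SketchNode> Roots { get; private set; }

    public SketchTree(params SketchNode[] roots)
    {
        Roots = roots.ToList();
    }

    // Maximum depth across all roots. Assumption: a leaf has depth 1.
    public int MaxDepth() => Roots.Count == 0 ? 0 : Roots.Max(r => Depth(r));

    // Total number of nodes, including all descendants.
    public int GetNodeCount() => Roots.Sum(r => Count(r));

    // Removes every matching node (together with its subtree) at any level.
    public void Purge(Func<SketchNode, bool> predicate)
    {
        Roots.RemoveAll(n => predicate(n));
        foreach (var root in Roots) PurgeChildren(root, predicate);
    }

    // Replaces the roots with a flat list of matching nodes, in document order.
    // Assumption: matched nodes keep their own children.
    public void FindNodes(Func<SketchNode, bool> predicate)
    {
        Roots = Roots.SelectMany(r => Flatten(r)).Where(predicate).ToList();
    }

    private static int Depth(SketchNode n) =>
        1 + (n.Children.Count == 0 ? 0 : n.Children.Max(c => Depth(c)));

    private static int Count(SketchNode n) => 1 + n.Children.Sum(c => Count(c));

    private static IEnumerable<SketchNode> Flatten(SketchNode n) =>
        new[] { n }.Concat(n.Children.SelectMany(c => Flatten(c)));

    private static void PurgeChildren(SketchNode n, Func<SketchNode, bool> predicate)
    {
        n.Children.RemoveAll(c => predicate(c));
        foreach (var child in n.Children) PurgeChildren(child, predicate);
    }
}

public static class SketchDemo
{
    public static void Main()
    {
        // Models <html><body><p/><script/></body></html>.
        var tree = new SketchTree(
            new SketchNode("html",
                new SketchNode("body",
                    new SketchNode("p"),
                    new SketchNode("script"))));

        Console.WriteLine(tree.MaxDepth());      // 3
        Console.WriteLine(tree.GetNodeCount());  // 4

        tree.Purge(n => n.Tag == "script");
        Console.WriteLine(tree.GetNodeCount());  // 3

        tree.FindNodes(n => n.Tag == "p");
        Console.WriteLine(tree.GetNodeCount());  // 1
    }
}
```

Because FindNodes() is described as modifying the tree in place, it may be worth calling Clone() first when the original structure is still needed afterwards.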
We found that essenceparser demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.