SimpleHTMLParser

Version: 1.0.1
Source: NuGet
Maintainers: 1

A simple, full-featured HTML parser in C#

What is SimpleHTMLParser?

SimpleHTMLParser is a simple-to-use, efficient, full-featured HTML document parser. It parses HTML text and lets you retrieve any element, or set of elements, from it.

Where can I get it?

First, install NuGet. Then install SimpleHTMLParser from the Package Manager Console:

PM> Install-Package SimpleHTMLParser

Or from the .NET CLI:

dotnet add package SimpleHTMLParser

How do I use it?

Parse an HTML string, then look up elements by id, class name, or tag:

var htmlText =
    @"<!DOCTYPE html>
<html lang=""en"">
<head>
    <meta charset=""UTF-8"">
    <meta name=""viewport"" content=""width=device-width, initial-scale=1.0"">
    <title>Document</title>
</head>
<body>

    <p id=""paragraph1"">Lorem ipsum, dolor sit amet consectetur adipisicing elit. Odit veritatis, assumenda quibusdam et deserunt architecto nulla eligendi quod recusandae vitae doloremque dicta quam? Asperiores, aut? Autem doloribus voluptatum itaque maiores?</p>
    <p id=""paragraph2"">Lorem ipsum, dolor sit amet consectetur adipisicing elit. Odit veritatis, assumenda quibusdam et deseru?</p>
    <p id=""paragraph3"">Lorem ipsum, dolor sit amet consectetur adipisicing elit. Odit veritatis, assumenda quibusdam et deserunt aaque maiores?</p>
    <p id=""paragraph4"">Lorem ipsum, dolor sit amet consectetur adipisicing elit. Odit veritatis, assumenda quibusdam et deserunt architecto nulla eligendi quod recusandae vitae doloremque dicta quam? Asperiores, aut? Autem doloribus voluptatum itaque maiores?</p>
    <p>My name is Faith</p>
    <h2 class=""Header2"">Welcome to HTMLParser C# by propenster</h2>
    <p>
        This is another paragraph... 
        Loaded 'C:\MinGW\bin\libgcc_s_dw2-1.dll'. Symbols loaded.
        Loaded 'C:\MinGW\bin\libstdc++-6.dll'. Symbols loaded.

    </p>

    <div id=""myDiv"">
       <p>Paragraph under div</p>
        <p>Another paragraph under div </p>
        <span class=""mySpan"">This is a span under this DIV</span>
    </div>

</body>
</html>";

var htmlParser = new HTMLParser();
var htmlDocument = htmlParser.Parse(htmlText);

// Find a <p> element by its id attribute.
IHtmlElement paragraph1 = htmlDocument.FindElement(By.Id("paragraph1"));
Console.WriteLine("Paragraph1 Text >>> {0}", paragraph1?.Text ?? string.Empty);

// Find the header by its class name.
IHtmlElement headerElement = htmlDocument.FindElement(By.ClassName("Header2"));
Console.WriteLine("Header2 Text >>> {0}", headerElement?.Text);

// Find the div, then search only within it.
IHtmlElement myDiv = htmlDocument.FindElement(By.Id("myDiv"));
var paragraphsUnderMyDiv = myDiv.FindElements(By.ElementTag("p"));
var spansUnderMyDiv = myDiv.FindElements(By.ClassName("mySpan"));

// FindElements yields multiple matches (note the Count() call below), so take
// the first one before reading Text. Count() and FirstOrDefault() need `using System.Linq;`.
Console.WriteLine("This is the text from our Span under the DIV >>> {0}", spansUnderMyDiv.FirstOrDefault()?.Text);
Console.WriteLine("There are {0} paragraph tags under DIV myDiv", paragraphsUnderMyDiv.Count());

Check the examples folder for more examples.

Keywords

html

FAQs

Package last updated on 22 Aug 2023
