
maven

HTML Parser

net.sourceforge.nekohtml:nekohtml

An HTML parser and tag balancer.

1.9.22 • 10 years ago

net.sourceforge.htmlcleaner:htmlcleaner

HtmlCleaner is an HTML parser written in Java. It transforms dirty HTML into well-formed XML, following the same rules that most web browsers use.

2.6.1 • 12 years ago

org.htmlparser:htmlparser

HTML Parser is a high-level syntactic analyzer.

2.1 • 14 years ago

net.sf.jtidy:jtidy

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, so you can effectively use JTidy as a DOM parser for real-world HTML.

r938 • 15 years ago

net.htmlparser.jericho:jericho-html

Jericho HTML Parser is a Java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.

3.4 • 10 years ago

jtidy:jtidy

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM parser for real-world HTML.

4aug2000r7-dev • 18 years ago

nu.validator.htmlparser:htmlparser

The Validator.nu HTML Parser is an implementation of the HTML5 parsing algorithm in Java for applications. The parser is designed to work as a drop-in replacement for the XML parser in applications that already support XHTML 1.x content with an XML parser and use SAX, DOM or XOM to interface with the parser.

1.4 • 13 years ago

org.ccil.cowan.tagsoup:tagsoup

TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.

1.2.1 • 14 years ago
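Because parsers such as TagSoup and the Validator.nu HTML Parser expose a SAX interface, handler code written against `org.xml.sax` does not need to know which parser produced the events. The following is a minimal sketch of that idea, not code from either project: the class name `SaxLinkExtractor` and its `extractLinks` method are hypothetical, and the JDK's built-in SAX parser stands in for the HTML parser so the example runs without extra dependencies. The stand-in only accepts well-formed input; the assumption is that an HTML-tolerant `XMLReader` (for example TagSoup's `org.ccil.cowan.tagsoup.Parser`) could be passed in its place.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxLinkExtractor {

    /**
     * Collects the href value of every <a> element reported by the given
     * SAX XMLReader. The method is hypothetical and parser-agnostic: it
     * only relies on the standard org.xml.sax callbacks.
     */
    static List<String> extractLinks(XMLReader reader, String markup) throws Exception {
        List<String> links = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // qName is used because localName may be empty when the
                // reader is not namespace-aware.
                if ("a".equalsIgnoreCase(qName)) {
                    String href = atts.getValue("href");
                    if (href != null) {
                        links.add(href);
                    }
                }
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return links;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader: the JDK's SAX parser, which requires well-formed
        // markup. Assumption: a SAX-compliant HTML parser could be
        // substituted here to handle malformed HTML.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String xhtml = "<html><body>"
                + "<a href=\"/docs\">Docs</a>"
                + "<a href=\"/faq\">FAQ</a>"
                + "</body></html>";
        System.out.println(extractLinks(reader, xhtml));
    }
}
```

Keeping the reader as a parameter is the design point: the handler stays unchanged whether the events come from a strict XML parser or from an HTML parser that repairs the markup first.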

org.hibernate:jtidy

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM parser for real-world HTML.

r8-20060801 • 18 years ago

org.apache.tika:tika-parser-html-module

Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

3.2.1 • last month

com.vladsch.flexmark:flexmark-html-parser

flexmark-java extension to convert HTML to Markdown

0.50.50 • 6 years ago

org.apache.tika:tika-parser-html-commons

Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

2.9.4 • 3 months ago

com.itextpdf:styled-xml-parser

Styled XML parser is used by iText modules to parse HTML and XML

9.2.0 • 2 months ago

com.mikesamuel:json-sanitizer

Given JSON-like content, converts it to valid JSON. This can be attached at either end of a data pipeline to help satisfy Postel's principle: be conservative in what you do, be liberal in what you accept from others. Applied to JSON-like content from others, it will produce well-formed JSON that should satisfy any parser you use. Applied to your output before you send it, it will coerce minor mistakes in encoding and make it easier to embed your JSON in HTML and XML.

1.2.3 • 4 years ago

net.sf.cssbox:pdf2dom

Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The resulting DOM tree may then be serialized to an HTML file or processed further. The inline CSS definitions contained in the resulting document are used to make the HTML page as similar as possible to the PDF input. A command-line utility for converting PDF documents to HTML is included in the distribution package. Pdf2Dom may also be used as an independent Java library with a standard DOM interface for your DOM-based applications, or as an alternative parser for the CSSBox rendering engine in order to add PDF processing capability to CSSBox.

2.0.3 • 3 years ago

net.htmlparser:jericho-html

Jericho HTML Parser is a simple but powerful Java library allowing analysis and manipulation of parts of an HTML document, including some common server-side tags, while reproducing verbatim any unrecognised or invalid HTML. It also provides high-level HTML form manipulation functions.

1.5-dev1 • 19 years ago

com.liferay:com.liferay.portal.html.parser.impl

Liferay Portal Html Parser Implementation

1.0.5 • 2 years ago

org.attoparser:attoparser

Powerful, fast and easy-to-use HTML and XML parser for Java

2.0.7.RELEASE • 2 years ago

org.xwiki.rendering:xwiki-rendering-syntax-html

Parser for the HTML 4.01 syntax

17.5.0 • 4 weeks ago

net.liftweb:lift-textile

Parser that converts Textile (a wiki syntax) to HTML

1.0.3 • 15 years ago

it.skrape:skrapeit-html-parser

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. First and foremost it aims to be a testing lib, but it can also be used to scrape websites in a convenient fashion.

1.2.2 • 3 years ago

pl.droidsonroids:jspoon

Annotation-based HTML to Java parser

1.3.3 • 2 years ago

it.unimi.di.law:jericho-html-dev

The development version of the Jericho HTML parser.

20131217 • 12 years ago

ru.noties:markwon-html-parser-api

Markwon

2.0.2 • 6 years ago

org.jvnet.hudson:jtidy

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM parser for real-world HTML. Hudson modifications: removed SAX APIs.

4aug2000r7-dev-hudson-1 • 17 years ago

com.github.jtidy:jtidy

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, so you can effectively use JTidy as a DOM parser for real-world HTML.

1.0.5 • 2 years ago

com.google.code.maven-play-plugin.org.allcolor.shanidom:shani-parser

ShaniXmlParser is an XML/HTML DOM/SAX parser. It can parse XML/HTML files that are not well-formed, including files with inverted tags and badly escaped &, <, > and ". It expands all XHTML entities by default. It is well suited to parsing HTML files, and is fast with low memory usage. It is compliant with the JAXP/W3C DOM 1/2/3 interfaces.

1.4.17 • 12 years ago

com.vaadin.external.jsoup:jsoup-case-sensitive

jsoup HTML parser

1.9.2 • 9 years ago

cat.inspiracio:html-parser

HTML-parser provides a parser for HTML 5 that produces an HTML 5 document object model. It aims to be a Java implementation of http://www.w3.org/TR/html5/. It is intended for use on the server and does not implement features that are relevant only in the client, such as event handling. It is meant to be used from JavaScript, via Java's scripting library.

0.0.6 • last year

ru.noties:markwon-html-parser-impl

Markwon

2.0.2 • 6 years ago

com.google.code.maven-play-plugin.net.sf.jtidy:jtidy

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, so you can effectively use JTidy as a DOM parser for real-world HTML.

r938 • 12 years ago

org.codelibs:nekohtml

An HTML parser and tag balancer.

2.1.3 • last year

net.tislib.websiteparser:annotations

Website HTML parser based on Java beans and annotations

0.1.2 • 6 years ago

org.eclipse.mylyn.docs:org.eclipse.mylyn.wikitext

Mylyn WikiText provides an extensible framework and tools for parsing, editing and presenting lightweight markup. WikiText has parsers for AsciiDoc, CommonMark, Markdown, MediaWiki, Textile, Confluence, Creole, HTML, TracWiki and TWiki markup, and can be extended to support other languages. WikiText provides Ant tasks for converting lightweight markup to HTML, Eclipse Help, DocBook, DITA and XSL-FO. WikiText also provides an editor for editing such markup within Eclipse, and integrates with the Mylyn task editor, making it markup-aware. WikiText provides an API for integrating wiki markup capabilities into Eclipse, RCP, stand-alone and server-side applications.

3.0.43 • 3 years ago

net.sf.jmatchparser:jMatchParser-util

A Java-based parser for parsing/grabbing web sites and other text or XML documents, based on a nondeterministic parser language, creating XML output. Also contains a few utility classes for HTML, CSV and text parsing, and additional character sets. The jMatchParser-util module contains the utility classes for parsing.

0.1 • 14 years ago

net.sf.jmatchparser:jMatchParser-juniversalchardet

A Java-based parser for parsing/grabbing web sites and other text or XML documents, based on a nondeterministic parser language, creating XML output. Also contains a few utility classes for HTML, CSV and text parsing, and additional character sets. The jMatchParser-juniversalchardet module contains a charset provider for a character set that uses juniversalchardet to detect the charset automatically.

0.1 • 14 years ago

net.sf.jmatchparser:jMatchParser-charset

A Java-based parser for parsing/grabbing web sites and other text or XML documents, based on a nondeterministic parser language, creating XML output. Also contains a few utility classes for HTML, CSV and text parsing, and additional character sets. The jMatchParser-charset module contains the character sets.

0.1 • 14 years ago

org.eclipse.mylyn.docs:org.eclipse.mylyn.wikitext.textile

Mylyn WikiText provides an extensible framework and tools for parsing, editing and presenting lightweight markup. WikiText has parsers for AsciiDoc, CommonMark, Markdown, MediaWiki, Textile, Confluence, Creole, HTML, TracWiki and TWiki markup, and can be extended to support other languages. WikiText provides Ant tasks for converting lightweight markup to HTML, Eclipse Help, DocBook, DITA and XSL-FO. WikiText also provides an editor for editing such markup within Eclipse, and integrates with the Mylyn task editor, making it markup-aware. WikiText provides an API for integrating wiki markup capabilities into Eclipse, RCP, stand-alone and server-side applications.

3.0.43 • 3 years ago

org.eclipse.mylyn.docs:org.eclipse.mylyn.wikitext.markdown

Mylyn WikiText provides an extensible framework and tools for parsing, editing and presenting lightweight markup. WikiText has parsers for AsciiDoc, CommonMark, Markdown, MediaWiki, Textile, Confluence, Creole, HTML, TracWiki and TWiki markup, and can be extended to support other languages. WikiText provides Ant tasks for converting lightweight markup to HTML, Eclipse Help, DocBook, DITA and XSL-FO. WikiText also provides an editor for editing such markup within Eclipse, and integrates with the Mylyn task editor, making it markup-aware. WikiText provides an API for integrating wiki markup capabilities into Eclipse, RCP, stand-alone and server-side applications.

3.0.43 • 3 years ago

org.eclipse.mylyn.docs:org.eclipse.mylyn.wikitext.mediawiki

Mylyn WikiText provides an extensible framework and tools for parsing, editing and presenting lightweight markup. WikiText has parsers for AsciiDoc, CommonMark, Markdown, MediaWiki, Textile, Confluence, Creole, HTML, TracWiki and TWiki markup, and can be extended to support other languages. WikiText provides Ant tasks for converting lightweight markup to HTML, Eclipse Help, DocBook, DITA and XSL-FO. WikiText also provides an editor for editing such markup within Eclipse, and integrates with the Mylyn task editor, making it markup-aware. WikiText provides an API for integrating wiki markup capabilities into Eclipse, RCP, stand-alone and server-side applications.

3.0.43 • 3 years ago

org.eclipse.mylyn.docs:org.eclipse.mylyn.wikitext.confluence

Mylyn WikiText provides an extensible framework and tools for parsing, editing and presenting lightweight markup. WikiText has parsers for AsciiDoc, CommonMark, Markdown, MediaWiki, Textile, Confluence, Creole, HTML, TracWiki and TWiki markup, and can be extended to support other languages. WikiText provides Ant tasks for converting lightweight markup to HTML, Eclipse Help, DocBook, DITA and XSL-FO. WikiText also provides an editor for editing such markup within Eclipse, and integrates with the Mylyn task editor, making it markup-aware. WikiText provides an API for integrating wiki markup capabilities into Eclipse, RCP, stand-alone and server-side applications.

3.0.43 • 3 years ago

org.webjars.npm:html-react-parser

WebJar for html-react-parser

5.2.5 • 2 weeks ago

com.github.irshulx:android-wysiwyg-editor

Android WYSIWYG is a text editor written for Android using the native components in the content tree. The library can be used as both Editor and Renderer. The HTML parser makes it easier to integrate with specific web WYSIWY

0.3.4 • 9 years ago

org.eclipse.mylyn.docs:org.eclipse.mylyn.wikitext.twiki

Mylyn WikiText provides an extensible framework and tools for parsing, editing and presenting lightweight markup. WikiText has parsers for AsciiDoc, CommonMark, Markdown, MediaWiki, Textile, Confluence, Creole, HTML, TracWiki and TWiki markup, and can be extended to support other languages. WikiText provides Ant tasks for converting lightweight markup to HTML, Eclipse Help, DocBook, DITA and XSL-FO. WikiText also provides an editor for editing such markup within Eclipse, and integrates with the Mylyn task editor, making it markup-aware. WikiText provides an API for integrating wiki markup capabilities into Eclipse, RCP, stand-alone and server-side applications.

3.0.43 • 3 years ago

org.eclipse.mylyn.docs:org.eclipse.mylyn.wikitext.tracwiki

Mylyn WikiText provides an extensible framework and tools for parsing, editing and presenting lightweight markup. WikiText has parsers for AsciiDoc, CommonMark, Markdown, MediaWiki, Textile, Confluence, Creole, HTML, TracWiki and TWiki markup, and can be extended to support other languages. WikiText provides Ant tasks for converting lightweight markup to HTML, Eclipse Help, DocBook, DITA and XSL-FO. WikiText also provides an editor for editing such markup within Eclipse, and integrates with the Mylyn task editor, making it markup-aware. WikiText provides an API for integrating wiki markup capabilities into Eclipse, RCP, stand-alone and server-side applications.

3.0.43 • 3 years ago

net.sf.jmatchparser:jMatchParser-jchardet

A Java-based parser for parsing/grabbing web sites and other text or XML documents, based on a nondeterministic parser language, creating XML output. Also contains a few utility classes for HTML, CSV and text parsing, and additional character sets. The jMatchParser-jchardet module contains a charset provider for a character set that uses jchardet to detect the charset automatically.

0.1 • 14 years ago

org.apache.sling:org.apache.sling.extensions.apt.servlet

Servlet for *.apt requests, uses the apt-parser to convert *.apt files to HTML.

2.0.2-incubator • 16 years ago

net.liftweb:lift-textile_2.7.7

Parser that converts Textile (a wiki syntax) to HTML

2.2 • 15 years ago

org.eclipse.mylyn.docs:org.eclipse.mylyn.wikitext.html

Mylyn WikiText provides an extensible framework and tools for parsing, editing and presenting lightweight markup. WikiText has parsers for AsciiDoc, CommonMark, Markdown, MediaWiki, Textile, Confluence, Creole, HTML, TracWiki and TWiki markup, and can be extended to support other languages. WikiText provides Ant tasks for converting lightweight markup to HTML, Eclipse Help, DocBook, DITA and XSL-FO. WikiText also provides an editor for editing such markup within Eclipse, and integrates with the Mylyn task editor, making it markup-aware. WikiText provides an API for integrating wiki markup capabilities into Eclipse, RCP, stand-alone and server-side applications.

3.0.43 • 3 years ago

com.github.fancyerii:nekohtmlparser

neko html parser wrapper

1.0 • 9 years ago
