
ch.digitalfondue.jfiveparse:jfiveparse
jfiveparse is a compact HTML5 parser with zero dependencies. It passes all the non-scripted tests for the tokenizer and tree construction from the html5lib-tests suite.
It provides both fragment and full document parsing. It can parse directly from a String or by streaming through a Reader (note: the encoding must be known; the parser does not currently implement encoding autodetection).
Requires Java 11.
As far as I know, there is no pure Java HTML5 parser that currently passes the html5lib-tests suite (well, the more relevant tests :D; note: this project was published in October 2015).
Additionally, I wanted a library with a reduced footprint (and no dependencies). Currently, the jar weighs around 150 KB. The target is to keep it under 200 KB.
Performance should be competitive with other Java parsers.
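Since the parser does not detect the encoding, the caller has to pick the charset when turning bytes into a Reader. The following is a minimal sketch using only the JDK; the class and method names (KnownEncodingReader, openReader, readAll) are made up for illustration and are not part of jfiveparse. It shows what happens when the same bytes are decoded with the right and with the wrong charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class KnownEncodingReader {

    // Wraps raw bytes in a Reader that decodes them with the charset chosen by the caller.
    static Reader openReader(byte[] rawHtml, Charset charset) {
        return new InputStreamReader(new ByteArrayInputStream(rawHtml), charset);
    }

    // Drains a Reader into a String so the decoded text can be inspected.
    static String readAll(Reader reader) {
        try (Reader r = reader) {
            StringWriter out = new StringWriter();
            r.transferTo(out);
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] rawHtml = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);

        // Correct charset: the text round-trips unchanged.
        String decoded = readAll(openReader(rawHtml, StandardCharsets.UTF_8));
        System.out.println(decoded.equals("<p>caf\u00e9</p>")); // true

        // Wrong charset: the two UTF-8 bytes of the accented letter become two separate characters.
        String garbled = readAll(openReader(rawHtml, StandardCharsets.ISO_8859_1));
        System.out.println(garbled.equals("<p>caf\u00c3\u00a9</p>")); // true
    }
}
```

In real use, the Reader returned by openReader would be passed to JFiveParse.parse(reader) instead of being drained into a String.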
jfiveparse is licensed under the Apache License Version 2.0.
maven:
<dependency>
    <groupId>ch.digitalfondue.jfiveparse</groupId>
    <artifactId>jfiveparse</artifactId>
    <version>1.1.3</version>
</dependency>
gradle:
implementation 'ch.digitalfondue.jfiveparse:jfiveparse:1.1.3'
If you use it as a module, remember to add requires ch.digitalfondue.jfiveparse; in your module-info.
import ch.digitalfondue.jfiveparse.Document;
import ch.digitalfondue.jfiveparse.JFiveParse;
import ch.digitalfondue.jfiveparse.Node;

import java.io.StringReader;
import java.util.List;

public class Example {

    public static void main(String[] args) {
        // directly from String
        Document doc = JFiveParse.parse("<html><body>Hello world!</body></html>");
        System.out.println(JFiveParse.serialize(doc));

        // from reader
        Document doc2 = JFiveParse.parse(new StringReader("<html><body>Hello world!</body></html>"));
        System.out.println(JFiveParse.serialize(doc2));

        // parse fragment
        List<Node> fragment = JFiveParse.parseFragment("<p><span>Hello world</span></p>");
        System.out.println(JFiveParse.serialize(fragment.get(0)));

        // parse fragment from reader
        List<Node> fragment2 = JFiveParse.parseFragment(new StringReader("<p><span>Hello world</span></p>"));
        System.out.println(JFiveParse.serialize(fragment2.get(0)));
    }
}
It will print:
<html><head></head><body>Hello world!</body></html>
<html><head></head><body>Hello world!</body></html>
<p><span>Hello world</span></p>
<p><span>Hello world</span></p>
See directory: https://github.com/digitalfondue/jfiveparse/tree/master/src/test/java/ch/digitalfondue/jfiveparse/example
package ch.digitalfondue.jfiveparse.example;

import ch.digitalfondue.jfiveparse.Element;
import ch.digitalfondue.jfiveparse.JFiveParse;
import ch.digitalfondue.jfiveparse.NodeMatcher;
import ch.digitalfondue.jfiveparse.Selector;

import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class LoadHNTitle {

    public static void main(String[] args) throws IOException {
        try (Reader reader = new InputStreamReader(new URL("https://news.ycombinator.com/").openStream(), StandardCharsets.UTF_8)) {
            // select td.title > span.titleline > a
            NodeMatcher matcher = Selector.select()
                    .element("td").hasClass("title")
                    .withChild().element("span").hasClass("titleline")
                    .withChild().element("a").toMatcher();
            JFiveParse.parse(reader).getAllNodesMatching(matcher).stream()
                    .map(Element.class::cast)
                    .forEach(a -> System.out.printf("%s [%s]\n", a.getTextContent(), a.getAttribute("href")));
        }
    }
}
If you need to generate an org.w3c.dom.Document from the ch.digitalfondue.jfiveparse.Document representation, there is a static method in the helper class: W3CDom.toW3CDocument.
The template element is a "normal" element, so the child nodes are not placed inside a documentFragment. This will be fixed.
The parser can be customized to allow some non-standard behaviour; see the following tests: https://github.com/digitalfondue/jfiveparse/blob/master/src/test/java/ch/digitalfondue/jfiveparse/OptionParseTest.java
The &ntities; are by default (and by specification) parsed and interpreted. This behavior can be disabled by:
By default, when parsing/serializing, the following transformations will be applied:
Currently, jfiveparse can preserve the entities, the attribute quoting type, the attribute name case and the tag name case.
If you need to preserve as much of the document as possible when serializing back to a string, pass the following parameters:
Note: this is a deviation from the specification in terms of how the tokenizer is implemented, but the end result is correct, as the attribute and tag names are still converted to lower case.
In the tokenizer, instead of applying the toLowerCase function to each character, the transformation is done in a single call in the TreeConstructor (see setTagName). This makes it possible to keep the original case of the attribute and tag names.
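The idea of deferring the lower-casing can be sketched as follows. This is an illustration only, not the library's code: the class DeferredLowerCase, the holder TagName and the helper methods are hypothetical, and the ASCII-only lower-casing is an assumption about how the normalization could be done.

```java
public class DeferredLowerCase {

    // Hypothetical holder keeping both forms of a tag name.
    static final class TagName {
        final String original;   // as written in the input
        final String normalized; // ASCII lower-cased

        TagName(String original, String normalized) {
            this.original = original;
            this.normalized = normalized;
        }
    }

    // Lower-cases only A-Z; every other character is left untouched.
    static String asciiLowerCase(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(c >= 'A' && c <= 'Z' ? (char) (c + 32) : c);
        }
        return sb.toString();
    }

    // Tokenizer side: collect the characters exactly as they appear,
    // without lower-casing them one by one.
    static String collectRawTagName(String input, int start) {
        int end = start;
        while (end < input.length()) {
            char c = input.charAt(end);
            if (Character.isWhitespace(c) || c == '/' || c == '>') {
                break;
            }
            end++;
        }
        return input.substring(start, end);
    }

    // Tree-construction side: normalize once and keep the original alongside.
    static TagName toTagName(String raw) {
        return new TagName(raw, asciiLowerCase(raw));
    }

    public static void main(String[] args) {
        TagName tag = toTagName(collectRawTagName("<DIV class=x>", 1));
        System.out.println(tag.original + " -> " + tag.normalized); // DIV -> div
    }
}
```

Keeping the raw string around costs one extra reference per tag, which is what allows a serializer to print the name either normalized or exactly as it was written.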
mvn clean test jacoco:report