Readable Regex
With this library, you can create regular expressions in a readable way!
Table of contents
- About the library
- User guide
- Contributing
- Local development
About the library
This library uses the builder pattern to create regular expressions. Building your expression with descriptively named
methods should make it more readable and therefore easier to maintain!
Regular expression engine
This library uses the engine implemented in the JDK. All the details and specifics of the engine can be found in the
Javadoc of the class Pattern.
Replacement of JavaVerbalExpressions
JavaVerbalExpressions is another Java library for
constructing regular expressions using a builder pattern. I liked this library, but there were a few caveats:
- It seems that it is not maintained anymore.
- It lacks some functionality (for example, lookahead).
- It is not written with Java in mind (the idea is ported to all languages).
This library was created to be a better version of JavaVerbalExpressions.
Readability over performance
This library focuses fully on readability and correctness. The performance of a regular expression is often not
important. However, in some cases (especially with large inputs or with catastrophic backtracking) it can become a real
problem. There is a lot of information online on how to make your regular expressions as fast as possible. However,
restructuring the builder calls to produce a fast expression may make the code less readable. If you rely on
high-performance expressions, this library may not be the best choice.
User guide
Note: Hamcrest is used in all the examples to show the expected outcome. If you want the examples
to compile in your own project, you should add Hamcrest as a dependency.
Examples
Let's try a basic URL matching pattern:
ReadableRegexPattern pattern =
regex() // Always start with the regex method to start the builder.
.literal("http") // Literals are escaped automatically, no need to do this yourself.
.literal("s").optional() // You can follow up with optional to make the "s" optional.
.literal("://")
.anyCharacterExcept(" ").zeroOrMore() // This comes down to [^ ]*.
.build(); // Create the pattern with the final method.
// matchesTextExactly returns a boolean indicating whether we have an *exact* match or not!
assertThat(pattern.matchesTextExactly("https://www.github.com"), equalTo(true));
// The toString() method returns the underlying pattern. Not really readable though, that is why we have this library!
assertThat(pattern.toString(), equalTo("(?:\\Qhttp\\E)(?:\\Qs\\E)?(?:\\Q://\\E)[^ ]*"));
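For comparison, here is a minimal sketch of the same expression written directly against the JDK, using the string that toString() reports above. The class name UrlPatternSketch and its helper methods are hypothetical and only serve the illustration. Whether the library calls Pattern.quote internally is an assumption; the sketch only shows that Pattern.quote produces the same \Q...\E form.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the library.
final class UrlPatternSketch {
    // The expression reported by toString() in the example above, written by hand.
    static final Pattern URL = Pattern.compile("(?:\\Qhttp\\E)(?:\\Qs\\E)?(?:\\Q://\\E)[^ ]*");

    // Pattern.quote wraps a literal in \Q...\E so meta characters lose their meaning.
    static String quoted(String literal) {
        return Pattern.quote(literal);
    }

    // Matcher.matches() only succeeds when the whole input matches the pattern.
    static boolean matchesExactly(String text) {
        return URL.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesExactly("https://www.github.com"));     // true
        System.out.println(matchesExactly("see https://www.github.com")); // false
        System.out.println(quoted("://"));                                // \Q://\E
    }
}
```

The hand-written string works, but every literal has to be escaped manually, which is exactly the noise the builder hides.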
With the library, you can create the JDK Matcher object when matching a text. Using this object, you can do the usual
things like getting the value of groups.
ReadableRegexPattern pattern = regex()
.startGroup() // You can use this method to start capturing the expression inside a group.
.word()
.endGroup() // This ends the last group.
.whitespace()
.startGroup("secondWord") // You can also give names to your group.
.word()
.endGroup()
.build();
Matcher matcher = pattern.matches("abc def");
assertThat(matcher.matches(), equalTo(true));
// Groups can always be found based on the order they are used.
assertThat(matcher.group(1), equalTo("abc"));
assertThat(matcher.group(2), equalTo("def"));
// If you have given the group a name, you can also find it based on the name.
assertThat(matcher.group("secondWord"), equalTo("def"));
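The group handling above is plain JDK behaviour, so it can be sketched without the library. The snippet below assumes that word() corresponds to \w+ and whitespace() to \s; that mapping is a guess based on the example, and the class name NamedGroupSketch is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the library.
final class NamedGroupSketch {
    // Assumed hand-written equivalent: one unnamed group, whitespace, one named group.
    static final Pattern TWO_WORDS = Pattern.compile("(\\w+)\\s(?<secondWord>\\w+)");

    // Returns a matcher that has already matched the full text, or throws if it does not match.
    static Matcher match(String text) {
        Matcher matcher = TWO_WORDS.matcher(text);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("No match for: " + text);
        }
        return matcher;
    }

    public static void main(String[] args) {
        Matcher matcher = match("abc def");
        System.out.println(matcher.group(1));            // abc
        System.out.println(matcher.group(2));            // def
        System.out.println(matcher.group("secondWord")); // def
    }
}
```

Note that a named group in the JDK still receives a number, which is why group(2) and group("secondWord") return the same value.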
The useful thing about this library is that you can include patterns inside other patterns!
// It does not matter if you have already built the pattern, you can include it anyway.
ReadableRegex<?> digits = regex().startGroup().digit().oneOrMore().endGroup().whitespace();
ReadableRegexPattern word = regex().startGroup().word().endGroup().whitespace().build();
ReadableRegexPattern pattern = regex()
.add(digits)
.add(digits)
.add(word)
.add(digits)
.literal("END")
.build();
Matcher matcher = pattern.matches("12\t11\thello\t0000\tEND");
assertThat(matcher.matches(), equalTo(true));
// Note that captures are always a String!
assertThat(matcher.group(1), equalTo("12"));
assertThat(matcher.group(2), equalTo("11"));
assertThat(matcher.group(3), equalTo("hello"));
assertThat(matcher.group(4), equalTo("0000"));
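To see why the group numbers come out this way, the composition can be sketched with plain string fragments and the JDK. This is only an approximation under assumptions: digit().oneOrMore() is taken to be \d+, word() to be \w+, and whitespace() to be \s. The class name ComposedPatternSketch is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the library.
final class ComposedPatternSketch {
    // Assumed hand-written fragments, each with one capturing group.
    static final String DIGITS = "(\\d+)\\s";
    static final String WORD = "(\\w+)\\s";

    // Every reuse of a fragment adds a new capturing group, numbered from left to right.
    static final Pattern LINE =
            Pattern.compile(DIGITS + DIGITS + WORD + DIGITS + Pattern.quote("END"));

    // Returns the captured groups in order, or throws if the text does not match.
    static String[] captures(String text) {
        Matcher matcher = LINE.matcher(text);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("No match for: " + text);
        }
        String[] groups = new String[matcher.groupCount()];
        for (int i = 1; i <= matcher.groupCount(); i++) {
            groups[i - 1] = matcher.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Prints 12, 11, hello and 0000 on separate lines.
        for (String capture : captures("12\t11\thello\t0000\tEND")) {
            System.out.println(capture);
        }
    }
}
```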
Some random stuff you can do:
ReadableRegexPattern pattern = regex()
.oneOf(regex().literal("abc"), regex().digit()) // The oneOf method represents "or".
.whitespace()
// If we want to add a quantifier over a larger expression, we can encapsulate it with the add method,
// which encloses the expression in an unnamed group.
.add(regex().literal("a").digit()).exactlyNTimes(3)
.whitespace()
// Alternatively, you can use the startUnnamedGroup() for this to avoid nested structures.
.startUnnamedGroup().literal("b").digit().endGroup().atMostNTimes(2)
.build();
assertThat(pattern.matchesTextExactly("abc a1a2a3 b2"), equalTo(true));
assertThat(pattern.matchesTextExactly("1 a3a6a9 "), equalTo(true));
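As a rough point of reference, the builder calls above could correspond to a hand-written expression like the one below. The exact string the library generates is not shown in this guide, so the expression is an assumption; only the JDK syntax (alternation with |, non-capturing groups, {3} and {0,2}) is certain. The class name AlternationSketch is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the library.
final class AlternationSketch {
    // Assumed equivalent: ("abc" or a digit), whitespace, ("a" + digit) three times,
    // whitespace, ("b" + digit) at most two times.
    static final Pattern PATTERN =
            Pattern.compile("(?:\\Qabc\\E|\\d)\\s(?:a\\d){3}\\s(?:b\\d){0,2}");

    // Matcher.matches() only succeeds when the whole input matches the pattern.
    static boolean matchesExactly(String text) {
        return PATTERN.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesExactly("abc a1a2a3 b2"));     // true
        System.out.println(matchesExactly("1 a3a6a9 "));         // true
        System.out.println(matchesExactly("abc a1a2 b2"));       // false, only two "a" + digit pairs
        System.out.println(matchesExactly("abc a1a2a3 b2b3b4")); // false, three "b" + digit pairs
    }
}
```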
Quantifiers
All the quantifiers are greedy by default. If you want to make them reluctant or possessive, you can use the methods
reluctant() and possessive() after the quantifier.
If you want to know the differences between these types of quantifiers, read about it in this post.
ReadableRegexPattern greedyPattern = regex().anything().literal("foo").build();
ReadableRegexPattern reluctantPattern = regex().anything().reluctant().literal("foo").build();
ReadableRegexPattern possessivePattern = regex().anything().possessive().literal("foo").build();
String text = "xfooxxxxxxfoo";
assertThat(greedyPattern.matchesText(text), equalTo(true));
Matcher matcher = reluctantPattern.matches(text);
assertThat(matcher.find(), equalTo(true));
assertThat(matcher.group(), equalTo("xfoo"));
assertThat(matcher.find(), equalTo(true));
assertThat(matcher.group(), equalTo("xxxxxxfoo"));
matcher = possessivePattern.matches(text);
assertThat(matcher.find(), equalTo(false));
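The three quantifier types are a feature of the JDK engine itself, so the behaviour can be reproduced with raw expressions. The sketch below assumes that anything() corresponds to .* (an assumption, the generated string is not shown here) and uses the JDK suffixes ? for reluctant and + for possessive. The class name QuantifierSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the library.
final class QuantifierSketch {
    // Collects every match that repeated calls to Matcher.find() produce.
    static List<String> findAll(String regex, String text) {
        List<String> results = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            results.add(matcher.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "xfooxxxxxxfoo";
        // Greedy: .* takes as much as possible and backtracks to the last "foo".
        System.out.println(findAll(".*foo", text));  // [xfooxxxxxxfoo]
        // Reluctant: .*? takes as little as possible, so each "foo" ends a match.
        System.out.println(findAll(".*?foo", text)); // [xfoo, xxxxxxfoo]
        // Possessive: .*+ consumes everything and never gives characters back.
        System.out.println(findAll(".*+foo", text)); // []
    }
}
```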
Working around the limits of the library
Not everything will be supported by the library. Sometimes you may want something very specific. There are a few methods
to help you with that.
The following example creates the regular expression [a-z&&[^p]], which matches all lower-case letters except p.
ReadableRegexPattern pattern1 = regex()
.regexFromString("[a-z&&[^p]]") // With this method, you can add any kind of expression.
.build();
// Or you could use the overloaded variant of the regex method, which is the same:
ReadableRegexPattern pattern2 = regex("[a-z&&[^p]]").build();
assertThat(pattern1.matchesTextExactly("p"), equalTo(false));
assertThat(pattern1.matchesTextExactly("c"), equalTo(true));
assertThat(pattern2.matchesTextExactly("p"), equalTo(false));
assertThat(pattern2.matchesTextExactly("c"), equalTo(true));
Note that the ReadableRegexPattern class is basically a wrapper of a JDK Pattern object. So if you need specific methods,
you can use the underlying object:
ReadableRegexPattern pattern = regex().literal(".").build(); // Don't forget that literal escapes any meta character like dot!
Pattern jdkPattern = pattern.getUnderlyingPattern();
assertThat(jdkPattern.split("a.b.c"), equalTo(new String[]{"a", "b", "c"}));
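Once you hold the JDK Pattern, everything in its API is available. The sketch below shows two such methods, splitAsStream and asPredicate, on a pattern built directly with Pattern.quote as a stand-in for the object returned by getUnderlyingPattern(). The class name UnderlyingPatternSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the library.
final class UnderlyingPatternSketch {
    // Stand-in for the underlying pattern of regex().literal("."): an escaped literal dot.
    static final Pattern DOT = Pattern.compile(Pattern.quote("."));

    // splitAsStream is the streaming variant of Pattern.split.
    static List<String> parts(String text) {
        return DOT.splitAsStream(text).collect(Collectors.toList());
    }

    // asPredicate tests whether the pattern can be found anywhere in a string.
    static List<String> containingDot(List<String> candidates) {
        return candidates.stream().filter(DOT.asPredicate()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parts("a.b.c"));                                  // [a, b, c]
        System.out.println(containingDot(Arrays.asList("a.b", "ab", ".."))); // [a.b, ..]
    }
}
```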
Extending the builder
You can fully customize the builder to your own needs! It is possible to add new methods and override existing methods.
The code below is an example of how to create your own extension:
// You have to extend from ExtendableReadableRegex, where you fill in your own class as generic type.
public class TestExtension extends ExtendableReadableRegex<TestExtension> {
    // It is highly advised to create your own static method "regex()". This way you can easily instantiate
    // your class and in your existing code you only have to change your import statement.
    public static TestExtension regex() {
        return new TestExtension();
    }

    // In your own extension you can add any method you like.
    public TestExtension digitWhitespaceDigit() {
        // For the implementation of your extension, you can only use the publicly available methods. All variables
        // and other methods are made private in the instance.
        // If you want to add arbitrary expressions, you can always use the method "regexFromString(...)".
        return digit().whitespace().digit();
    }

    // You can also override existing methods! To make sure that the code doesn't break, please always end
    // with calling the super method.
    @Override
    public ReadableRegexPattern buildWithFlags(PatternFlag... patternFlags) {
        return super.buildWithFlags(PatternFlag.DOT_ALL);
    }
}
This can now be used:
ReadableRegexPattern pattern = TestExtension.regex().digitWhitespaceDigit().build();
assertThat(pattern.matchesTextExactly("1 3"), equalTo(true));
assertThat(pattern.enabledFlags(), contains(PatternFlag.DOT_ALL));
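The generic type parameter on ExtendableReadableRegex suggests a self-typed builder, a common Java technique in which the base class returns the subclass type from every method so that chained calls keep access to your own methods. The sketch below illustrates that technique in isolation. It is not the library's implementation, and every name in it (SelfTypedBuilderSketch, BaseBuilder, MyBuilder) is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of a self-typed builder, not the library's real classes.
final class SelfTypedBuilderSketch {
    // S is the concrete subclass, so every chained call returns the subclass type.
    abstract static class BaseBuilder<S extends BaseBuilder<S>> {
        private final StringBuilder expression = new StringBuilder();

        @SuppressWarnings("unchecked")
        private S self() {
            return (S) this;
        }

        public S digit() {
            expression.append("\\d");
            return self();
        }

        public S whitespace() {
            expression.append("\\s");
            return self();
        }

        public Pattern build() {
            return Pattern.compile(expression.toString());
        }
    }

    // The extension passes itself as the type argument and adds its own method.
    static final class MyBuilder extends BaseBuilder<MyBuilder> {
        public MyBuilder digitWhitespaceDigit() {
            return digit().whitespace().digit();
        }
    }

    public static void main(String[] args) {
        // digit() returns MyBuilder, so the custom method is still reachable afterwards.
        Pattern pattern = new MyBuilder().digit().digitWhitespaceDigit().build();
        System.out.println(pattern.pattern());               // \d\d\s\d
        System.out.println(pattern.matcher("12 3").matches()); // true
    }
}
```

Without the type parameter, digit() would return the base type and the call to digitWhitespaceDigit() after it would not compile.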
Javadoc
If you are looking for in-depth information about all the available methods, take a look at the Javadoc.
You can find the latest version here.
Contributing
If you have any suggestions, submit an issue right here in the GitHub project! Any bugs, features or random thoughts
are appreciated :)
Local development
Checks
All additional plugins that check the code base run when you call the following Gradle command:
gradle checks
This will trigger SpotBugs, Checkstyle, JaCoCo and Pitest.
If the checks succeed, the test coverage is printed. Of course, the test coverage should be 100%.
If you want to run one of the components individually (because this could be faster), you can use:
gradle spotbugs
gradle checkstyle
gradle jacoco
gradle pitest
gradle printTestPercentages
The reports are available in HTML form and are located in build/reports.
Publishing new releases
Every release should correspond to a tag in git. This tag should be manually added.
Uploading new releases to Maven Central can be done using the following command:
gradle publish -PcustomVersion=X
If no version is supplied, the default head-SNAPSHOT is used.
Note that at this time, only the creator of this library (Rico Apon) can upload new releases.