Query Parser
A powerful and flexible query parsing library for Java that transforms search queries into structured Abstract Syntax Trees (AST).
Features
- Advanced Query Syntax: Support for boolean operators (AND, OR, NOT), phrases, wildcards, fuzzy search, field queries, and range queries
- Fluent Builder API: Modern, type-safe API for parser configuration
- AST-based: Generates a traversable Abstract Syntax Tree for advanced query manipulation
- Query Optimization: Built-in optimizers for query simplification and performance
- Query Validation: Comprehensive validation with detailed error reporting
- Extensible: Custom field parsers and node visitors for domain-specific requirements
- Zero Dependencies: No external runtime dependencies (test dependencies only)
Requirements
Installation
Add the following dependency to your pom.xml:
<dependency>
<groupId>am.ik.query</groupId>
<artifactId>query-parser</artifactId>
<version>0.2.0-SNAPSHOT</version>
</dependency>
Quick Start
Basic Usage
import am.ik.query.Query;
import am.ik.query.parser.QueryParser;
Query query = QueryParser.create().parse("java AND (spring OR boot)");
List<String> keywords = query.extractKeywords();
Using the Parser Builder
import am.ik.query.parser.QueryParser;
import am.ik.query.parser.QueryParser.BooleanOperator;
QueryParser parser = QueryParser.builder()
    .defaultOperator(BooleanOperator.AND)
    .validateAfterParse(true)
    .throwOnValidationError(true)
    .build();
Query query = parser.parse("java spring boot");
Builder Configuration Options
Method | Description | Default |
defaultOperator(BooleanOperator) | Operator used between terms when no explicit operator is given | BooleanOperator.AND |
validateAfterParse(boolean) | Whether to validate the query after parsing | false |
throwOnValidationError(boolean) | Whether to throw exceptions on validation errors | false |
allowedTokenTypes(TokenType...) | Token types allowed during validation | All TokenType values |
fieldParser(String, Function) | Custom parsing logic for specific field names | No custom parsers |
lexer(QueryLexer) | Custom lexer for tokenization | QueryLexer.defaultLexer() |
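To make the effect of defaultOperator concrete, here is a minimal, library-independent sketch that spells out the implicit operator in a whitespace-separated list of terms. ImplicitOperatorDemo and withDefaultOperator are hypothetical names used only for illustration; the sketch ignores phrases, parentheses and prefixes, and it is not how the parser itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ImplicitOperatorDemo {

    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    // Inserts defaultOp between two adjacent terms that have no explicit operator between them.
    static List<String> withDefaultOperator(String query, String defaultOp) {
        List<String> out = new ArrayList<>();
        String previous = null;
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for a blank query
            }
            boolean bothTerms = previous != null
                    && !OPERATORS.contains(previous)
                    && !OPERATORS.contains(token);
            if (bothTerms) {
                out.add(defaultOp);
            }
            out.add(token);
            previous = token;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withDefaultOperator("java spring boot", "AND"));
        System.out.println(withDefaultOperator("java OR kotlin", "AND"));
    }
}
```

With "AND" as the default, "java spring boot" becomes java AND spring AND boot, while a query that already contains explicit operators is left untouched.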
Query Syntax
Boolean Operators
QueryParser.create().parse("java AND spring");
QueryParser.create().parse("java OR kotlin");
QueryParser.create().parse("java spring");
QueryParser.create().parse("(java OR kotlin) AND (spring OR boot)");
Negation: NOT vs - (Exclusion)
There are two ways to exclude terms, with important differences:
QueryParser.create().parse("java NOT android");
QueryParser.create().parse("java NOT (android OR ios)");
QueryParser.create().parse("java NOT \"mobile development\"");
QueryParser.create().parse("java -android");
QueryParser.create().parse("java -android -ios");
QueryParser.create().parse("java -\"mobile development\"");
Phrases
QueryParser.create().parse("\"hello world\"");
QueryParser.create().parse("\"Spring Boot\" AND \"Josh Long\"");
Wildcards
QueryParser.create().parse("spr?ng");
QueryParser.create().parse("spring*");
QueryParser.create().parse("*boot*");
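The parser only produces a wildcard node; matching it against data is up to the consumer. As an illustration, the sketch below assumes the usual glob convention, where ? matches exactly one character and * matches zero or more (this section does not spell that out), and translates a pattern into a java.util.regex.Pattern. WildcardDemo is a hypothetical helper, not part of the library.

```java
import java.util.regex.Pattern;

final class WildcardDemo {

    // Translates a wildcard pattern to a regex: '?' -> any single char, '*' -> any run of chars.
    // Everything else is quoted so regex metacharacters in the term are matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?' || c == '*') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '?' ? "." : ".*");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    // True when the whole text matches the wildcard pattern.
    static boolean matches(String wildcard, String text) {
        return toRegex(wildcard).matcher(text).matches();
    }
}
```

Under these assumptions, matches("spr?ng", "spring") and matches("*boot*", "springbootapp") are true, while matches("spr?ng", "sprng") is false.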
Fuzzy Search
QueryParser.create().parse("spring~");
QueryParser.create().parse("spring~1");
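The number after ~ presumably bounds how far a candidate may differ from the term. Assuming it is a maximum Levenshtein edit distance, as in Lucene-style query syntax (an assumption, not something this section confirms), a consumer of a fuzzy node could evaluate it along these lines. FuzzyDemo and its methods are hypothetical names.

```java
final class FuzzyDemo {

    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True when candidate is within maxEdits insertions, deletions or substitutions of term.
    static boolean fuzzyMatches(String term, String candidate, int maxEdits) {
        return editDistance(term, candidate) <= maxEdits;
    }
}
```

Under that reading, spring~1 would accept "sprint" (one substitution) but not "strong" (two substitutions).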
Field Queries
QueryParser.create().parse("title:spring");
QueryParser.create().parse("author:\"John Doe\"");
QueryParser.create().parse("date:2024 AND status:published");
Range Queries
QueryParser.create().parse("[1 TO 10]");
QueryParser.create().parse("{1 TO 10}");
QueryParser.create().parse("[1 TO 10}");
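The three forms differ only in their brackets. Assuming the Lucene-style convention that a square bracket makes a bound inclusive and a curly brace makes it exclusive (an assumption; the section does not state it), the semantics can be sketched as follows. RangeDemo.Range is a hypothetical type for integer bounds only, unrelated to the library's RangeNode.

```java
final class RangeDemo {

    // A numeric range whose ends can independently be inclusive or exclusive.
    record Range(int start, int end, boolean includeStart, boolean includeEnd) {

        // Parses strings shaped like "[1 TO 10]", "{1 TO 10}" or "[1 TO 10}".
        static Range parse(String text) {
            String s = text.trim();
            boolean includeStart = s.charAt(0) == '[';
            boolean includeEnd = s.charAt(s.length() - 1) == ']';
            String[] bounds = s.substring(1, s.length() - 1).trim().split("\\s+TO\\s+");
            if (bounds.length != 2) {
                throw new IllegalArgumentException("Expected 'start TO end': " + text);
            }
            return new Range(Integer.parseInt(bounds[0]), Integer.parseInt(bounds[1]),
                    includeStart, includeEnd);
        }

        // Applies the inclusive/exclusive rule to each bound separately.
        boolean contains(int value) {
            boolean aboveStart = includeStart ? value >= start : value > start;
            boolean belowEnd = includeEnd ? value <= end : value < end;
            return aboveStart && belowEnd;
        }
    }
}
```

With this reading, [1 TO 10] contains both 1 and 10, {1 TO 10} contains neither, and [1 TO 10} contains 1 but not 10.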
Exclusions (Additional Examples)
QueryParser.create().parse("java -android");
QueryParser.create().parse("spring -legacy -deprecated");
QueryParser.create().parse("java NOT (android OR mobile)");
QueryParser.create().parse("spring NOT \"legacy code\"");
Advanced Features
Query Traversal
Query query = QueryParser.create().parse("java AND (spring OR boot)");
query.walk(node -> {
    System.out.println(node.getClass().getSimpleName() + ": " + node.value());
});

String result = query.accept(new NodeVisitor<String>() {
    @Override
    public String visitAnd(AndNode node) {
        return "AND(" + node.children().stream()
            .map(child -> child.accept(this))
            .collect(Collectors.joining(", ")) + ")";
    }

    @Override
    public String visitToken(TokenNode node) {
        return node.value();
    }
});
Query Transformation
Query normalized = query.transform(QueryNormalizer.defaultNormalizer());
Query optimized = query.transform(QueryOptimizer.defaultOptimizer());
Query transformed = query
    .transform(QueryNormalizer.toLowerCase())
    .transform(QueryOptimizer.removeDuplicates())
    .transform(QueryOptimizer.simplifyBooleans());
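The chain above lower-cases before it removes duplicates, and the order matters. The following library-independent sketch uses plain lists of terms as a stand-in for queries to show why; TransformChainDemo and its two functions are hypothetical and do not reflect how QueryNormalizer or QueryOptimizer are implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

final class TransformChainDemo {

    // Lower-cases every term (a stand-in for a normalizer step).
    static final Function<List<String>, List<String>> LOWER_CASE =
            terms -> terms.stream().map(t -> t.toLowerCase(Locale.ROOT)).toList();

    // Drops repeated terms while keeping first-seen order (a stand-in for an optimizer step).
    static final Function<List<String>, List<String>> REMOVE_DUPLICATES =
            terms -> new ArrayList<>(new LinkedHashSet<>(terms));

    static List<String> normalizeThenDedupe(List<String> terms) {
        return LOWER_CASE.andThen(REMOVE_DUPLICATES).apply(terms);
    }

    static List<String> dedupeThenNormalize(List<String> terms) {
        return REMOVE_DUPLICATES.andThen(LOWER_CASE).apply(terms);
    }
}
```

For the terms Java, java, Spring, normalizing first collapses the two spellings into one term, whereas deduplicating first leaves both because they still differ in case at that point.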
Query Validation
The query parser provides flexible validation options to suit different use cases. You can control when validation occurs and how errors are handled.
Validation Configuration Options
QueryParser strictParser = QueryParser.builder()
    .validateAfterParse(true)
    .throwOnValidationError(true)
    .allowedTokenTypes(TokenType.KEYWORD, TokenType.PHRASE, TokenType.OR, TokenType.AND)
    .build();

QueryParser testParser = QueryParser.builder()
    .validateAfterParse(true)
    .throwOnValidationError(false)
    .allowedTokenTypes(TokenType.KEYWORD, TokenType.PHRASE)
    .build();

QueryParser performanceParser = QueryParser.builder()
    .validateAfterParse(false)
    .build();
Validation Behavior Matrix
validateAfterParse | throwOnValidationError | Behavior |
true | true | Strict mode: Validate and throw exception on errors |
true | false | Manual mode: Validate but allow error inspection |
false | false | Performance mode: No validation during parsing |
false | true | ❌ Invalid combination (ignored) |
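Read as a decision rule, the matrix says that throwOnValidationError only matters once validation is enabled. The sketch below encodes that reading; it assumes that "ignored" in the last row means the parser behaves as if validation were simply off, which is an interpretation rather than documented behavior. ValidationModeDemo and Mode are hypothetical names.

```java
final class ValidationModeDemo {

    enum Mode { STRICT, MANUAL, PERFORMANCE }

    // Maps the two builder flags to the behavior described in the matrix.
    static Mode resolve(boolean validateAfterParse, boolean throwOnValidationError) {
        if (!validateAfterParse) {
            // Assumption: with validation off, throwOnValidationError has nothing to act on.
            return Mode.PERFORMANCE;
        }
        return throwOnValidationError ? Mode.STRICT : Mode.MANUAL;
    }
}
```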
Manual Validation
QueryParser parser = QueryParser.builder().validateAfterParse(false).build();
Query query = parser.parse("hello");
ValidationResult result = QueryValidator.validate(query);
if (!result.isValid()) {
    result.errors().forEach(error -> {
        System.err.println(error.message());
        if (error.field() != null) {
            System.err.println("Field: " + error.field());
        }
        if (error.invalidValue() != null) {
            System.err.println("Invalid value: " + error.invalidValue());
        }
    });
} else {
    System.out.println("Query is valid");
}
Testing Advanced Features Rejection
QueryParser legacyParser = QueryParser.builder()
    .allowedTokenTypes(TokenType.KEYWORD, TokenType.PHRASE, TokenType.OR, TokenType.AND)
    .validateAfterParse(true)
    .throwOnValidationError(false)
    .build();

Query query = legacyParser.parse("title:spring");
ValidationResult result = QueryValidator.validate(query,
    Set.of(TokenType.KEYWORD, TokenType.PHRASE, TokenType.OR, TokenType.AND));
assertThat(result.isValid()).isFalse();
assertThat(result.errors().get(0).message()).contains("FIELD");
Use Cases
- Production systems: Use validateAfterParse(true) + throwOnValidationError(true) for immediate error detection
- Testing environments: Use validateAfterParse(true) + throwOnValidationError(false) to inspect validation errors
- High-performance scenarios: Use validateAfterParse(false) when validation overhead is not acceptable
- Legacy compatibility: Combine with allowedTokenTypes() to restrict the parser to specific feature sets
Custom Field Parsers
QueryParser parser = QueryParser.builder()
    .fieldParser("date", value -> {
        LocalDate date = LocalDate.parse(value);
        return new TokenNode(TokenType.KEYWORD, date.toString());
    })
    .fieldParser("price", value -> {
        BigDecimal price = new BigDecimal(value);
        return new TokenNode(TokenType.KEYWORD, price.toString());
    })
    .build();
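Conceptually, a field parser is a function looked up by field name and applied to the raw value. The following self-contained sketch shows that dispatch with a plain map; FieldParserRegistryDemo is a hypothetical class that returns strings instead of AST nodes, and it makes no claim about how the builder stores or invokes the functions you register.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class FieldParserRegistryDemo {

    private final Map<String, Function<String, String>> parsers = new HashMap<>();

    // Registers (or replaces) the parser for one field name.
    FieldParserRegistryDemo register(String field, Function<String, String> parser) {
        parsers.put(field, parser);
        return this;
    }

    // Applies the parser registered for the field, or returns the raw value unchanged.
    String parse(String field, String rawValue) {
        return parsers.getOrDefault(field, Function.identity()).apply(rawValue);
    }

    static FieldParserRegistryDemo example() {
        return new FieldParserRegistryDemo()
                // LocalDate.parse throws DateTimeParseException for anything but ISO dates
                .register("date", v -> LocalDate.parse(v).toString())
                // Canonicalizes "10.50" to "10.5"; throws NumberFormatException for non-numbers
                .register("price", v -> new BigDecimal(v).stripTrailingZeros().toPlainString());
    }
}
```

Both example parsers fail fast on malformed input. How the library surfaces an exception thrown inside a field parser is not covered here, so it is worth testing that path in your own setup.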
Token Type Restrictions
QueryParser parser = QueryParser.builder()
    .allowedTokenTypes(TokenType.KEYWORD, TokenType.PHRASE, TokenType.AND)
    .validateAfterParse(true)
    .throwOnValidationError(true)
    .build();
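An allow list of this kind boils down to a set-membership check per token. The sketch below illustrates the idea with a made-up enum; AllowListDemo and Kind are hypothetical, and the error text is invented for the example rather than copied from the library's validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class AllowListDemo {

    // Hypothetical token kinds for this sketch; not the library's TokenType enum.
    enum Kind { KEYWORD, PHRASE, AND, OR, FIELD, WILDCARD }

    // Returns one message per token whose kind is not in the allow list.
    static List<String> validate(List<Kind> tokens, Set<Kind> allowed) {
        List<String> errors = new ArrayList<>();
        for (Kind kind : tokens) {
            if (!allowed.contains(kind)) {
                errors.add("Token type not allowed: " + kind);
            }
        }
        return errors;
    }
}
```

Passing an EnumSet as the allowed set keeps the membership check cheap, and an empty result means every token passed.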
Practical Example: SQL Converter
Here's a simple example of converting queries to parameterized SQL WHERE clauses for content search:
import am.ik.query.Query;
import am.ik.query.ast.*;
import am.ik.query.lexer.TokenType;
import am.ik.query.visitor.NodeVisitor;
import java.util.*;
import java.util.stream.Collectors;

public class SimpleContentSqlConverter implements NodeVisitor<String> {

    public record SqlResult(String whereClause, Map<String, Object> parameters) {}

    // Mutable per-conversion state, so a converter instance is not thread-safe
    private final Map<String, Object> parameters = new HashMap<>();
    private int paramCounter = 1;

    public SqlResult convertToSql(Query query) {
        parameters.clear();
        paramCounter = 1;
        if (query.isEmpty()) {
            return new SqlResult("1=1", Map.of());
        }
        String sql = query.accept(this);
        // Copy the map so that earlier results survive the clear() of later calls
        return new SqlResult(sql, Map.copyOf(parameters));
    }

    @Override
    public String visitRoot(RootNode node) {
        return node.children().stream()
            .map(child -> child.accept(this))
            .filter(sql -> !sql.isEmpty())
            .collect(Collectors.joining(" AND "));
    }

    @Override
    public String visitAnd(AndNode node) {
        String result = node.children().stream()
            .map(child -> child.accept(this))
            .filter(sql -> !sql.isEmpty())
            .collect(Collectors.joining(" AND "));
        return node.children().size() > 1 ? "(" + result + ")" : result;
    }

    @Override
    public String visitOr(OrNode node) {
        String result = node.children().stream()
            .map(child -> child.accept(this))
            .filter(sql -> !sql.isEmpty())
            .collect(Collectors.joining(" OR "));
        return "(" + result + ")";
    }

    @Override
    public String visitNot(NotNode node) {
        // A negated keyword becomes a single NOT LIKE instead of NOT (... LIKE ...)
        if (node.child() instanceof TokenNode tokenNode && tokenNode.type() == TokenType.KEYWORD) {
            return createLikeClause("content", tokenNode.value(), true);
        }
        String childSql = node.child().accept(this);
        return childSql.isEmpty() ? "" : "NOT " + childSql;
    }

    @Override
    public String visitToken(TokenNode node) {
        return switch (node.type()) {
            case KEYWORD -> createLikeClause("content", node.value(), false);
            case EXCLUDE -> createLikeClause("content", node.value(), true);
            default -> "";
        };
    }

    @Override
    public String visitPhrase(PhraseNode node) {
        return createLikeClause("content", node.phrase(), false);
    }

    // Registers the value as a named parameter and returns the matching LIKE fragment
    private String createLikeClause(String column, String value, boolean negated) {
        String paramName = "param" + paramCounter++;
        parameters.put(paramName, "%" + value + "%");
        String operator = negated ? "NOT LIKE" : "LIKE";
        return column + " " + operator + " :" + paramName;
    }

    // Node types this simple converter does not support contribute no SQL
    @Override public String visitField(FieldNode node) { return ""; }
    @Override public String visitWildcard(WildcardNode node) { return ""; }
    @Override public String visitFuzzy(FuzzyNode node) { return ""; }
    @Override public String visitRange(RangeNode node) { return ""; }
}

QueryParser parser = QueryParser.create();
SimpleContentSqlConverter converter = new SimpleContentSqlConverter();

Query query1 = parser.parse("java spring");
SimpleContentSqlConverter.SqlResult result1 = converter.convertToSql(query1);

Query query2 = parser.parse("(java OR kotlin) -deprecated");
SimpleContentSqlConverter.SqlResult result2 = converter.convertToSql(query2);

Query query3 = parser.parse("\"Spring Boot\"");
SimpleContentSqlConverter.SqlResult result3 = converter.convertToSql(query3);
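The converter emits named placeholders such as :param1, which plain JDBC does not understand. If you are not using a library that supports named parameters, one option is to rewrite the clause to positional ? markers. The helper below is a minimal sketch of that step; NamedParameterDemo and Positional are hypothetical names, and the regex assumes the clause contains no colons other than placeholders, which holds for the output of the converter above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedParameterDemo {

    record Positional(String sql, List<Object> values) {}

    private static final Pattern NAMED = Pattern.compile(":(\\w+)");

    // Replaces each :name with '?' and collects the values in order of appearance.
    static Positional toPositional(String whereClause, Map<String, Object> parameters) {
        List<Object> values = new ArrayList<>();
        Matcher matcher = NAMED.matcher(whereClause);
        StringBuilder sql = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            if (!parameters.containsKey(name)) {
                throw new IllegalArgumentException("No value for parameter: " + name);
            }
            values.add(parameters.get(name));
            matcher.appendReplacement(sql, "?");
        }
        matcher.appendTail(sql);
        return new Positional(sql.toString(), values);
    }
}
```

The resulting values can then be bound in order with PreparedStatement.setObject(index, value), remembering that JDBC parameter indexes start at 1.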
Query Analysis
Query query = QueryParser.create().parse("title:spring AND (java OR kotlin) -deprecated author:john*");
List<String> keywords = query.extractKeywords();
List<String> phrases = query.extractPhrases();
List<String> wildcards = query.extractWildcards();
List<String> exclusions = query.extractExclusions();
Map<String, List<String>> fields = query.extractFields();
AST Visualization with QueryPrinter
The QueryPrinter utility provides a convenient way to visualize the Abstract Syntax Tree (AST) structure of parsed queries:
import am.ik.query.Query;
import am.ik.query.parser.QueryParser;
import am.ik.query.util.QueryPrinter;
QueryParser parser = QueryParser.create();
Query simpleQuery = parser.parse("java spring");
System.out.println(QueryPrinter.toPrettyString(simpleQuery));
Output:
Query: java spring
AST:
└─ AndNode (2 children)
  └─ TokenNode[KEYWORD]: "java"
  └─ TokenNode[KEYWORD]: "spring"
Complex AST Example
Query complexQuery = parser.parse("(\"Spring Boot\" OR java*) AND -deprecated AND title:framework NOT (legacy OR old)");
System.out.println(QueryPrinter.toPrettyString(complexQuery));
Output:
Query: ("Spring Boot" OR java*) AND -deprecated AND title:framework NOT (legacy OR old)
AST:
└─ AndNode (2 children)
  └─ AndNode (3 children)
    └─ OrNode (2 children)
      └─ PhraseNode: "Spring Boot"
      └─ WildcardNode: "java*"
    └─ NotNode (1 children)
      └─ TokenNode[KEYWORD]: "deprecated"
    └─ FieldNode: title="framework"
  └─ NotNode (1 children)
    └─ OrNode (2 children)
      └─ TokenNode[KEYWORD]: "legacy"
      └─ TokenNode[KEYWORD]: "old"
Advanced Features AST
Query advancedQuery = parser.parse("spring~2 AND [1 TO 10] AND author:john");
System.out.println(QueryPrinter.toPrettyString(advancedQuery));
Output:
Query: spring~2 AND [1 TO 10] AND author:john
AST:
└─ AndNode (3 children)
  └─ FuzzyNode: "spring" ~2
  └─ RangeNode: [1 TO 10]
  └─ FieldNode: author="john"
Node Types in AST Output
The QueryPrinter displays different node types with specific formatting:
- TokenNode[TYPE]: Basic keywords with their token type
- PhraseNode: Quoted phrases
- WildcardNode: Patterns with * or ? wildcards
- FuzzyNode: Terms with fuzzy matching (~)
- FieldNode: Field-specific queries (field:value)
- RangeNode: Range queries with [start TO end] syntax
- AndNode/OrNode: Boolean operations with child count
- NotNode: Negation operations
This visualization is particularly useful for:
- Debugging complex queries and understanding parse results
- Learning the query syntax by seeing how different inputs are structured
- Developing custom visitors by understanding the AST hierarchy
- Query optimization by identifying redundant or complex structures
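If you want a similar dump for your own structures while developing a custom visitor, an indented tree view takes only a short recursive method. The sketch below uses a hypothetical Node record (a label plus children) and a two-space indent per level; it is an illustration of the technique, not QueryPrinter's implementation.

```java
import java.util.List;

final class TreePrinterDemo {

    // Hypothetical node type for this sketch: a label plus child nodes.
    record Node(String label, List<Node> children) {
        static Node leaf(String label) {
            return new Node(label, List.of());
        }
    }

    // Renders the tree depth-first, one node per line.
    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        append(root, 0, out);
        return out.toString();
    }

    // Indents two spaces per level, then recurses into the children in order.
    private static void append(Node node, int depth, StringBuilder out) {
        out.append("  ".repeat(depth)).append("└─ ").append(node.label()).append('\n');
        for (Node child : node.children()) {
            append(child, depth + 1, out);
        }
    }
}
```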
Performance Considerations
- The parser is designed to handle complex queries efficiently
- Query optimization can significantly reduce the complexity of boolean expressions
- Use token type restrictions to improve parsing performance for specific use cases
- The AST structure allows for efficient query analysis and transformation
Thread Safety
- QueryParser instances are thread-safe and can be reused
- Query objects are immutable and thread-safe
- Transformation operations never modify the original Query; they return a Query instance as their result
  - Note: A transformation may return the same instance if it makes no changes
  - Example: The normalizer only changes case and whitespace, so "original" stays "original"
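The "same instance when nothing changes" behavior is a common optimization for immutable types, and it is safe precisely because the object can never be modified. The sketch below shows the pattern on a hypothetical Text record standing in for a query; it does not use the library.

```java
import java.util.Locale;

final class ImmutableTransformDemo {

    // Hypothetical immutable value type standing in for a parsed query.
    record Text(String value) {

        // Returns a new instance only when lower-casing actually changes the value.
        Text toLowerCase() {
            String lowered = value.toLowerCase(Locale.ROOT);
            return lowered.equals(value) ? this : new Text(lowered);
        }
    }
}
```

A practical consequence is that code should not rely on reference inequality to detect whether a transformation did anything; comparing the results with equals or inspecting their content is the more robust check.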
Development
Building from Source
git clone https://github.com/making/query-parser.git
cd query-parser
./mvnw clean install
Running Tests
./mvnw test
Code Formatting
The project uses Spring Java Format. Format code before committing:
./mvnw spring-javaformat:apply
v0.1 Compatibility Mode
If you need to restrict the parser to the same feature set as v0.1 (legacy version), you can configure it to only allow basic query syntax:
Configuring Legacy-Compatible Parser
QueryParser legacyCompatibleParser = QueryParser.builder()
    .allowedTokenTypes(
        TokenType.KEYWORD,
        TokenType.PHRASE,
        TokenType.EXCLUDE,
        TokenType.OR,
        TokenType.AND,
        TokenType.NOT,
        TokenType.LPAREN,
        TokenType.RPAREN,
        TokenType.WHITESPACE,
        TokenType.EOF
    )
    .validateAfterParse(true)
    .throwOnValidationError(true)
    .build();
Supported Features in v0.1 Mode
The legacy-compatible parser supports only these features:
legacyCompatibleParser.parse("hello world");
legacyCompatibleParser.parse("java spring boot");
legacyCompatibleParser.parse("\"Spring Boot\"");
legacyCompatibleParser.parse("java AND spring");
legacyCompatibleParser.parse("java OR kotlin");
legacyCompatibleParser.parse("java NOT android");
legacyCompatibleParser.parse("spring -deprecated");
legacyCompatibleParser.parse("(java OR kotlin) AND spring");
Rejected Features in v0.1 Mode
These advanced features will throw QueryValidationException:
legacyCompatibleParser.parse("title:hello");
legacyCompatibleParser.parse("spring*");
legacyCompatibleParser.parse("hello~2");
legacyCompatibleParser.parse("[1 TO 10]");
legacyCompatibleParser.parse("important^2");
legacyCompatibleParser.parse("+required");
Use Cases
Use v0.1 compatibility mode when you are:
- Migrating from the legacy query parser
- Restricting users to basic search syntax
- Implementing a simplified search interface where users expect space-separated terms to be AND'ed together
- Building content-only search systems such as blog article search
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Author
Toshiaki Maki (@making)
License
Licensed under the Apache License, Version 2.0. See LICENSE file for details.