path-expression-matcher
Efficient path tracking and pattern matching for XML, JSON, YAML or any other parsers.
Purpose
path-expression-matcher provides three core classes for tracking and matching paths:
Expression: Parses and stores pattern expressions (e.g., "root.users.user[id]")
Matcher: Tracks current path during parsing and matches against expressions
MatcherView: A lightweight read-only view of a Matcher, safe to pass to callbacks
Compatible with fast-xml-parser and similar tools.
Installation
npm install path-expression-matcher
Quick Start
import { Expression, Matcher } from 'path-expression-matcher';
const expr = new Expression("root.users.user");
const matcher = new Matcher();
matcher.push("root");
matcher.push("users");
matcher.push("user", { id: "123" });
if (matcher.matches(expr)) {
console.log("Match found!");
console.log("Current path:", matcher.toString());
}
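// Namespace support: clear the previous path before tracking a new one
matcher.reset();
const nsExpr = new Expression("soap::Envelope.soap::Body..ns::UserId");
matcher.push("Envelope", null, "soap");
matcher.push("Body", null, "soap");
matcher.push("UserId", null, "ns");
console.log(matcher.toString());
console.log("Namespace match:", matcher.matches(nsExpr));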
Pattern Syntax
Basic Paths
"root.users.user"
"*.users.user"
"root.*.user"
"root.users.*"
Deep Wildcard
"..user"
"root..user"
"..users..user"
Attribute Matching
"user[id]"
"user[type=admin]"
"root[lang]..user"
Position Selectors
"user:first"
"user:nth(2)"
"user:odd"
"user:even"
"root.users.user:first"
Note: Position selectors use the counter (occurrence count of the tag name), not the position (child index). For example, in <root><a/><b/><a/></root>, the second <a/> has position=2 but counter=1.
Namespaces
"ns::user"
"soap::Envelope"
"ns::user[id]"
"ns::user:first"
"*::user"
"..ns::item"
"soap::Envelope.soap::Body"
"ns::first"
Namespace syntax:
- Use double colon (::) for namespace:
ns::tag
- Use single colon (:) for position:
tag:first
- Combined:
ns::tag:first (namespace + tag + position)
Namespace matching rules:
- Pattern
ns::user matches only nodes with namespace "ns" and tag "user"
- Pattern
user (no namespace) matches nodes with tag "user" regardless of namespace
- Pattern
*::user matches tag "user" with any namespace (wildcard namespace)
- Namespaces are tracked separately for counter/position (e.g.,
ns1::item and ns2::item have independent counters)
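The syntax and matching rules above can be sketched in a few lines. This is a standalone illustration, not the library's parser: the helper names `parseNamespacedSegment` and `namespaceRuleHolds` are hypothetical, attribute conditions are ignored, and `*::user` is assumed to accept any namespace value.

```javascript
// Illustrative sketch only: hypothetical helpers, not part of path-expression-matcher.

// Split "ns::tag:position" into its parts. "::" is looked up first, so
// "ns::first" reads as namespace "ns" + tag "first", not as a position selector.
function parseNamespacedSegment(segment) {
  let namespace;
  let rest = segment;
  const nsEnd = segment.indexOf("::");
  if (nsEnd !== -1) {
    namespace = segment.slice(0, nsEnd);
    rest = segment.slice(nsEnd + 2);
  }
  const colon = rest.indexOf(":");
  const tag = colon === -1 ? rest : rest.slice(0, colon);
  const position = colon === -1 ? undefined : rest.slice(colon + 1);
  return { namespace, tag, position };
}

// Apply the namespace matching rules to a node of shape { tag, namespace }.
function namespaceRuleHolds(seg, node) {
  if (seg.tag !== "*" && seg.tag !== node.tag) return false;
  if (seg.namespace === undefined) return true; // "user": namespace is ignored
  if (seg.namespace === "*") return true;       // "*::user": assumed to accept any namespace
  return seg.namespace === node.namespace;      // "ns::user": exact namespace
}

console.log(parseNamespacedSegment("ns::user:first"));
// { namespace: 'ns', tag: 'user', position: 'first' }
console.log(parseNamespacedSegment("ns::first"));
// { namespace: 'ns', tag: 'first', position: undefined }
console.log(namespaceRuleHolds(parseNamespacedSegment("ns::user"), { tag: "user", namespace: "other" })); // false
console.log(namespaceRuleHolds(parseNamespacedSegment("user"), { tag: "user", namespace: "other" }));     // true
```

Checking for `::` before `:` is what keeps a tag literally named `first` from being mistaken for a position selector.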
Wildcard Differences
Single wildcard (*) - Matches exactly ONE level:
"*.fix1" matches root.fix1 (2 levels)
"*.fix1" does NOT match root.another.fix1 (3 levels)
- Path depth MUST equal pattern depth
Deep wildcard (..) - Matches ZERO or MORE levels:
"..fix1" matches root.fix1
"..fix1" matches root.another.fix1
"..fix1" matches a.b.c.d.fix1
- Works at any depth
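The difference between the two wildcards can be illustrated with a small standalone matcher over plain tag names. This is a sketch under simplifying assumptions (no attributes, namespaces, or position selectors) and not the library's algorithm; `tokenize`, `matchFrom`, and `pathMatches` are hypothetical names.

```javascript
// Illustrative sketch only: plain tag names, "." as the separator.

// Split a pattern into segments, keeping ".." as its own deep-wildcard token.
function tokenize(pattern) {
  return pattern.match(/\.\.|[^.]+/g) ?? [];
}

// Return true when segments[si..] match path[pi..] exactly.
function matchFrom(segments, si, path, pi) {
  if (si === segments.length) return pi === path.length;
  const seg = segments[si];
  if (seg === "..") {
    // Deep wildcard: try skipping zero, one, two, ... path levels.
    for (let next = pi; next <= path.length; next++) {
      if (matchFrom(segments, si + 1, path, next)) return true;
    }
    return false;
  }
  if (pi === path.length) return false;              // pattern is longer than the path
  if (seg !== "*" && seg !== path[pi]) return false; // "*" consumes exactly one level
  return matchFrom(segments, si + 1, path, pi + 1);
}

function pathMatches(pattern, path) {
  return matchFrom(tokenize(pattern), 0, path, 0);
}

console.log(pathMatches("*.fix1", ["root", "fix1"]));             // true
console.log(pathMatches("*.fix1", ["root", "another", "fix1"]));  // false
console.log(pathMatches("..fix1", ["a", "b", "c", "d", "fix1"])); // true
```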
Combined Patterns
"..user[id]:first"
"root..user[type=admin]"
"ns::user[id]:first"
"soap::Envelope..ns::UserId"
API Reference
Expression
Constructor
new Expression(pattern, options = {}, data)
Parameters:
pattern (string): Pattern to parse
options.separator (string): Path separator (default: '.')
data (optional): Arbitrary user data stored with the expression and exposed as expr.data
Example:
const expr1 = new Expression("root.users.user");
const expr2 = new Expression("root/users/user", { separator: '/' });
const expr3 = new Expression("root/users/user", { separator: '/' }, { extra: "data" });
console.log(expr3.data);
Methods
hasDeepWildcard() → boolean
hasAttributeCondition() → boolean
hasPositionSelector() → boolean
toString() → string
Matcher
Constructor
new Matcher(options)
Parameters:
options.separator (string): Default path separator (default: '.')
Path Tracking Methods
push(tagName, attrValues, namespace)
Add a tag to the current path. Position and counter are automatically calculated.
Parameters:
tagName (string): Tag name
attrValues (object, optional): Attribute key-value pairs (current node only)
namespace (string, optional): Namespace for the tag
Example:
matcher.push("user", { id: "123", type: "admin" });
matcher.push("item");
matcher.push("Envelope", null, "soap");
matcher.push("Body", { version: "1.1" }, "soap");
Position vs Counter:
- Position: The child index in the parent (0, 1, 2, 3...)
- Counter: How many times this tag name appeared at this level (0, 1, 2...)
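Example: below, the first <a/> has position 0 and counter 0, <b/> has position 1 and counter 0, and the second <a/> has position 2 and counter 1.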
<root>
<a/>
<b/>
<a/>
</root>
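To make the distinction concrete, here is a minimal standalone sketch that computes both values for a list of sibling tags. It is not the library's code; `annotateSiblings` is a hypothetical helper, and it keys the counter on the tag name only (the library is described as also tracking namespaces separately).

```javascript
// Illustrative sketch only: compute position and counter for each child of one parent.
function annotateSiblings(childTags) {
  const seen = new Map(); // tag name -> occurrences so far
  return childTags.map((tag, position) => {
    const counter = seen.get(tag) ?? 0;
    seen.set(tag, counter + 1);
    return { tag, position, counter };
  });
}

// Children of <root> in the XML example: <a/>, <b/>, <a/>
console.log(annotateSiblings(["a", "b", "a"]));
// [
//   { tag: 'a', position: 0, counter: 0 },
//   { tag: 'b', position: 1, counter: 0 },
//   { tag: 'a', position: 2, counter: 1 }
// ]
```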
pop()
Remove the last tag from the path.
matcher.pop();
updateCurrent(attrValues)
Update current node's attributes (useful when attributes are parsed after push).
matcher.push("user");
matcher.updateCurrent({ id: "123" });
reset()
Clear the entire path.
matcher.reset();
Query Methods
matches(expression)
Check if current path matches an Expression.
const expr = new Expression("root.users.user");
if (matcher.matches(expr)) {
  // Current path matches the expression
}
matchesAny(exprSet) → boolean
Check if the current path matches any expression in an ExpressionSet. See the ExpressionSet API section for details.
const matcher = new Matcher();
const exprSet = new ExpressionSet();
exprSet.add(new Expression("root.users.user"));
exprSet.add(new Expression("root.config.*"));
exprSet.seal();
if (matcher.matchesAny(exprSet)) {
  // Current path matches at least one expression in the set
}
getCurrentTag()
Get current tag name.
const tag = matcher.getCurrentTag();
getCurrentNamespace()
Get current namespace.
const ns = matcher.getCurrentNamespace();
getAttrValue(attrName)
Get attribute value of current node.
const id = matcher.getAttrValue("id");
hasAttr(attrName)
Check if current node has an attribute.
if (matcher.hasAttr("id")) {
  // Current node has an "id" attribute
}
getPosition()
Get sibling position of current node (child index in parent).
const position = matcher.getPosition();
getCounter()
Get repeat counter of current node (occurrence count of this tag name).
const counter = matcher.getCounter();
getIndex() (deprecated)
Alias for getPosition(). Use getPosition() or getCounter() instead for clarity.
const index = matcher.getIndex();
getDepth()
Get current path depth.
const depth = matcher.getDepth();
toString(separator?, includeNamespace?)
Get path as string.
Parameters:
separator (string, optional): Path separator (uses default if not provided)
includeNamespace (boolean, optional): Whether to include namespaces (default: true)
const path = matcher.toString();
const path2 = matcher.toString('/');
const path3 = matcher.toString('.', false);
toArray()
Get path as array.
const arr = matcher.toArray();
State Management
snapshot()
Create a snapshot of current state.
const snapshot = matcher.snapshot();
restore(snapshot)
Restore from a snapshot.
matcher.restore(snapshot);
Read-Only Access
readOnly()
Returns a MatcherView: a lightweight, live read-only view of the matcher. All query and inspection methods work normally and always reflect the current state of the underlying matcher. Mutation methods (push, pop, reset, updateCurrent, restore) do not exist on MatcherView, so misuse is caught at compile time by TypeScript rather than at runtime.
The same instance is returned on every call, so no allocation occurs per invocation. This is the recommended way to share the matcher with callbacks, plugins, or any external code that only needs to inspect the current path.
const view = matcher.readOnly();
view === matcher.readOnly(); // true: the same instance is returned on every call
What works on the view:
view.matches(expr)
view.getCurrentTag()
view.getCurrentNamespace()
view.getAttrValue("id")
view.hasAttr("id")
view.getPosition()
view.getCounter()
view.getDepth()
view.toString()
view.toArray()
What doesn't exist (compile-time error in TypeScript):
view.push("child", {})
view.pop()
view.reset()
view.updateCurrent({})
view.restore(snapshot)
The view is live: it always reflects the current state of the underlying matcher.
const matcher = new Matcher();
const view = matcher.readOnly();
matcher.push("root");
view.getDepth(); // 1
matcher.push("users");
view.getDepth(); // 2
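The "live, cached, query-only view" idea can be pictured with a small standalone sketch. It is not how Matcher or MatcherView are implemented; `PathStack` and `PathStackView` are hypothetical stand-ins that expose only two query methods.

```javascript
// Illustrative sketch only: hypothetical stand-ins for a mutable tracker and its view.
class PathStackView {
  #target;
  constructor(target) {
    this.#target = target;
  }
  // Only query methods are defined; they delegate to the live target.
  getDepth() {
    return this.#target.getDepth();
  }
  toString(separator) {
    return this.#target.toString(separator);
  }
}

class PathStack {
  #tags = [];
  #view = null;
  push(tag) {
    this.#tags.push(tag);
  }
  pop() {
    return this.#tags.pop();
  }
  getDepth() {
    return this.#tags.length;
  }
  toString(separator = ".") {
    return this.#tags.join(separator);
  }
  // Create the view lazily and hand out the same instance on every call.
  readOnly() {
    if (this.#view === null) this.#view = new PathStackView(this);
    return this.#view;
  }
}

const stack = new PathStack();
const stackView = stack.readOnly();
stack.push("root");
stack.push("users");
console.log(stackView.getDepth());           // 2
console.log(stackView.toString());           // "root.users"
console.log(typeof stackView.push);          // "undefined"
console.log(stackView === stack.readOnly()); // true
```

Because the view holds a reference instead of a copy, it never goes stale, and because mutators are simply absent, a consumer cannot change the path through it.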
Usage Examples
Example 1: XML Parser with stopNodes
import { XMLParser } from 'fast-xml-parser';
import { Expression, Matcher } from 'path-expression-matcher';
class MyParser {
  constructor() {
    this.matcher = new Matcher();
    // Pre-compile stop node patterns once
    this.stopNodeExpressions = [
      new Expression("html.body.script"),
      new Expression("html.body.style"),
      new Expression("..svg"),
    ];
  }
  parseTag(tagName, attrs) {
    this.matcher.push(tagName, attrs);
    try {
      for (const expr of this.stopNodeExpressions) {
        if (this.matcher.matches(expr)) {
          // Stop node: keep the content raw instead of parsing children
          return this.readRawContent();
        }
      }
      this.parseChildren();
    } finally {
      // Always pop, including on the early return above
      this.matcher.pop();
    }
  }
}
Example 2: Conditional Processing
const matcher = new Matcher();
const userExpr = new Expression("..user[type=admin]");
const firstItemExpr = new Expression("..item:first");
function processTag(tagName, value, attrs) {
matcher.push(tagName, attrs);
if (matcher.matches(userExpr)) {
value = enhanceAdminUser(value);
}
if (matcher.matches(firstItemExpr)) {
value = markAsFirst(value);
}
matcher.pop();
return value;
}
Example 3: Path-based Filtering
const patterns = [
new Expression("data.users.user"),
new Expression("data.posts.post"),
new Expression("..comment[approved=true]"),
];
function shouldInclude(matcher) {
return patterns.some(expr => matcher.matches(expr));
}
Example 4: Custom Separator
const matcher = new Matcher({ separator: '/' });
const expr = new Expression("root/config/database", { separator: '/' });
matcher.push("root");
matcher.push("config");
matcher.push("database");
console.log(matcher.toString());
console.log(matcher.matches(expr));
Example 5: Attribute Checking
const matcher = new Matcher();
matcher.push("root");
matcher.push("user", { id: "123", type: "admin", status: "active" });
console.log(matcher.hasAttr("id"));
console.log(matcher.hasAttr("email"));
console.log(matcher.getAttrValue("type"));
const expr1 = new Expression("user[id]");
console.log(matcher.matches(expr1));
const expr2 = new Expression("user[type=admin]");
console.log(matcher.matches(expr2));
Example 6: Position vs Counter
const matcher = new Matcher();
matcher.push("root");
matcher.push("item");
matcher.pop();
matcher.push("div");
matcher.pop();
matcher.push("item");
console.log(matcher.getPosition()); // 2 (third child of root)
console.log(matcher.getCounter()); // 1 (second "item" under root)
const expr = new Expression("root.item:first");
console.log(matcher.matches(expr)); // false (:first requires counter 0)
Example 8: Passing a Read-Only View to External Consumers
When passing the matcher into callbacks, plugins, or other code you don't control, use readOnly() to get a MatcherView: it can inspect but never mutate parser state.
import { Expression, Matcher } from 'path-expression-matcher';
const matcher = new Matcher();
const adminExpr = new Expression("..user[type=admin]");
function parseTag(tagName, attrs, onTag) {
matcher.push(tagName, attrs);
onTag(matcher.readOnly());
matcher.pop();
}
function myPlugin(view) {
if (view.matches(adminExpr)) {
console.log("Admin at path:", view.toString());
console.log("Depth:", view.getDepth());
console.log("ID:", view.getAttrValue("id"));
}
}
parseTag("user", { id: "1", type: "admin" }, myPlugin);
Example 9: Namespace-Aware Matching
const matcher = new Matcher();
const soapExpr = new Expression("soap::Envelope.soap::Body..ns::UserId");
matcher.push("Envelope", { xmlns: "..." }, "soap");
matcher.push("Body", null, "soap");
matcher.push("GetUserRequest", null, "ns");
matcher.push("UserId", null, "ns");
if (matcher.matches(soapExpr)) {
console.log("Found UserId in SOAP body");
console.log(matcher.toString());
}
matcher.reset();
matcher.push("root");
matcher.push("item", null, "ns1");
matcher.pop();
matcher.push("item", null, "ns2");
matcher.pop();
matcher.push("item", null, "ns1");
const firstNs1Item = new Expression("root.ns1::item:first");
console.log(matcher.matches(firstNs1Item));
const secondNs1Item = new Expression("root.ns1::item:nth(1)");
console.log(matcher.matches(secondNs1Item));
matcher.reset();
matcher.push("root");
matcher.push("first", null, "ns");
const expr = new Expression("root.ns::first");
console.log(matcher.matches(expr));
Architecture
Data Storage Strategy
Ancestor nodes: Store only tag name, position, and counter (minimal memory)
Current node: Store tag name, position, counter, and attribute values
This design minimizes memory usage:
- No attribute names stored (derived from values object when needed)
- Attribute values only for current node, not ancestors
- Attribute checking for ancestors is not supported (acceptable trade-off)
- For 1M nodes with 3 attributes each, saves ~50MB vs storing attribute names
Matching Strategy
Matching is performed bottom-to-top (from current node toward root):
- Start at current node
- Match segments from pattern end to start
- Attribute checking only works for current node (ancestors have no attribute data)
- Position selectors use counter (occurrence count), not position (child index)
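The strategy above can be sketched as a loop that walks the pattern and the path from their last elements toward the root. This is a simplified standalone illustration, not the library's matcher: it handles fixed-depth patterns only (no deep wildcard), supports just `first` and `nth(n)` selectors with `n` assumed to be 0-based like the counter, and treats an attribute condition on an ancestor as a non-match, which is an assumption. The name `matchBottomUp` and the segment and node shapes are hypothetical.

```javascript
// Illustrative sketch only: hypothetical data shapes, fixed-depth patterns.

// Position selectors are compared against the counter, not the child index.
function positionHolds(selector, counter) {
  if (selector === undefined) return true;
  if (selector === "first") return counter === 0;
  const nth = /^nth\((\d+)\)$/.exec(selector);
  return nth !== null && Number(nth[1]) === counter; // assumed 0-based
}

// segments: [{ tag, position?, attr? }], path: [{ tag, counter }]
// currentAttrs: attribute values of the current (last) node only.
function matchBottomUp(segments, path, currentAttrs) {
  if (segments.length !== path.length) return false;
  const attrs = currentAttrs ?? {};
  for (let i = segments.length - 1; i >= 0; i--) {
    const seg = segments[i];
    const node = path[i];
    if (seg.tag !== "*" && seg.tag !== node.tag) return false;
    if (!positionHolds(seg.position, node.counter)) return false;
    if (seg.attr !== undefined) {
      // Ancestors keep no attribute data, so only the current node can pass.
      const isCurrent = i === path.length - 1;
      if (!isCurrent || !(seg.attr in attrs)) return false;
    }
  }
  return true;
}

const samplePath = [
  { tag: "root", counter: 0 },
  { tag: "item", counter: 1 }, // second "item" under root
];
console.log(matchBottomUp([{ tag: "root" }, { tag: "item", position: "first" }], samplePath));     // false
console.log(matchBottomUp([{ tag: "root" }, { tag: "item", position: "nth(1)" }], samplePath));    // true
console.log(matchBottomUp([{ tag: "root" }, { tag: "item", attr: "id" }], samplePath, { id: "7" })); // true
```

Starting from the current node means the most specific segment is checked first, so most non-matching paths are rejected after a single comparison.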
Performance
- Expression parsing: One-time cost when Expression is created
- Expression analysis: Cached (hasDeepWildcard, hasAttributeCondition, hasPositionSelector)
- Path tracking: O(1) for push/pop operations
- Pattern matching: O(n*m) where n = path depth, m = pattern segments
- Memory per ancestor node: ~40-60 bytes (tag, position, counter only)
- Memory per current node: ~80-120 bytes (adds attribute values)
Design Patterns
Pre-compile Patterns (Recommended)
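// Good: parse the pattern once, then reuse it for every check
const expr = new Expression("..user[id]");
for (let i = 0; i < 1000; i++) {
  if (matcher.matches(expr)) {
    // ...
  }
}
// Avoid: re-parses the pattern on every iteration
for (let i = 0; i < 1000; i++) {
  if (matcher.matches(new Expression("..user[id]"))) {
    // ...
  }
}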
Batch Pattern Checking with ExpressionSet (Recommended)
For checking multiple patterns on every tag, use ExpressionSet instead of a manual loop.
It pre-indexes expressions at build time so each call to matchesAny() does an O(1) bucket
lookup rather than a full O(N) scan:
import { Expression, ExpressionSet, Matcher } from 'path-expression-matcher';
const stopNodes = new ExpressionSet();
stopNodes
.add(new Expression('root.users.user'))
.add(new Expression('root.config.*'))
.add(new Expression('..script'))
.seal();
if (stopNodes.matchesAny(matcher)) {
  // Current path is a stop node
}
This replaces the manual loop pattern:
// Before: manual scan over every expression
function isStopNode(expressions, matcher) {
  for (let i = 0; i < expressions.length; i++) {
    if (matcher.matches(expressions[i])) return true;
  }
  return false;
}
// After: build the set once, then query it per tag
const stopNodes = new ExpressionSet();
stopNodes.addAll(expressions);
stopNodes.matchesAny(matcher);
ExpressionSet API
ExpressionSet is an indexed collection of Expression objects designed for efficient
bulk matching. Build it once from your config, then call matchesAny() on every tag.
Constructor
const set = new ExpressionSet();
add(expression) → this
Add a single Expression. Duplicate patterns (same pattern string) are silently ignored.
Returns this for chaining. Throws TypeError if the set is sealed.
set.add(new Expression('root.users.user'));
set.add(new Expression('..script'));
addAll(expressions) → this
Add an array of Expression objects at once. Returns this for chaining.
set.addAll(config.stopNodes.map(p => new Expression(p)));
has(expression) → boolean
Check whether an expression with the same pattern is already present.
set.has(new Expression('root.users.user'));
seal() → this
Prevent further additions. Any subsequent call to add() or addAll() throws a TypeError.
Useful to guard against accidental mutation once parsing has started.
const stopNodes = new ExpressionSet();
stopNodes.addAll(patterns).seal();
stopNodes.add(new Expression('root.extra')); // throws TypeError: the set is sealed
size → number
Number of distinct expressions in the set.
set.size;
isSealed → boolean
Whether seal() has been called.
matchesAny(matcher) → boolean
Returns true if the matcher's current path matches any expression in the set.
Accepts both a Matcher instance and a MatcherView.
if (stopNodes.matchesAny(matcher)) { }
if (stopNodes.matchesAny(matcher.readOnly())) { }
How indexing works: expressions are bucketed at add() time, not at match time.
| Expression type | Index structure | Lookup cost |
| --- | --- | --- |
| Fixed path, concrete tag (root.users.user) | depth:tag map | O(1) |
| Fixed path, wildcard tag (root.config.*) | depth map | O(1) |
| Deep wildcard (..script) | flat list | O(D), always scanned |
In practice, deep-wildcard expressions are rare in configs, so the list stays small.
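The bucketing scheme described above can be sketched with plain string patterns and tag-name paths. This is a simplified standalone illustration of the idea, not ExpressionSet's actual implementation: `TinyExpressionIndex`, `splitPattern`, `segmentsMatch`, and `addToBucket` are hypothetical names, and attributes, namespaces, position selectors, and sealing are left out.

```javascript
// Illustrative sketch only: a simplified index over plain string patterns.

function splitPattern(pattern) {
  return pattern.match(/\.\.|[^.]+/g) ?? []; // ".." stays a single token
}

// Full verification of one candidate (same idea as a plain pattern match).
function segmentsMatch(segs, si, path, pi) {
  if (si === segs.length) return pi === path.length;
  if (segs[si] === "..") {
    for (let next = pi; next <= path.length; next++) {
      if (segmentsMatch(segs, si + 1, path, next)) return true;
    }
    return false;
  }
  if (pi === path.length) return false;
  if (segs[si] !== "*" && segs[si] !== path[pi]) return false;
  return segmentsMatch(segs, si + 1, path, pi + 1);
}

function addToBucket(map, key, segs) {
  const bucket = map.get(key);
  if (bucket) bucket.push(segs);
  else map.set(key, [segs]);
}

class TinyExpressionIndex {
  constructor() {
    this.byDepthAndTag = new Map(); // "3:user" -> fixed paths ending in a concrete tag
    this.byDepth = new Map();       // 3 -> fixed paths ending in "*"
    this.deep = [];                 // patterns containing "..", always scanned
  }
  // Bucketing happens here, at add time.
  add(pattern) {
    const segs = splitPattern(pattern);
    const last = segs[segs.length - 1];
    if (segs.includes("..")) this.deep.push(segs);
    else if (last === "*") addToBucket(this.byDepth, segs.length, segs);
    else addToBucket(this.byDepthAndTag, `${segs.length}:${last}`, segs);
    return this;
  }
  // At match time only the two relevant buckets plus the deep list are checked.
  matchesAny(path) {
    const tag = path[path.length - 1];
    const candidates = [
      ...(this.byDepthAndTag.get(`${path.length}:${tag}`) ?? []),
      ...(this.byDepth.get(path.length) ?? []),
      ...this.deep,
    ];
    return candidates.some((segs) => segmentsMatch(segs, 0, path, 0));
  }
}

const tinyIndex = new TinyExpressionIndex()
  .add("root.users.user")
  .add("root.config.*")
  .add("..script");
console.log(tinyIndex.matchesAny(["root", "users", "user"]));  // true
console.log(tinyIndex.matchesAny(["root", "config", "db"]));   // true
console.log(tinyIndex.matchesAny(["html", "body", "script"])); // true
console.log(tinyIndex.matchesAny(["root", "users", "admin"])); // false
```

The bucket lookup only narrows the candidates; each candidate is still verified in full, so a pattern such as `*.users.user` that shares the `3:user` bucket is handled correctly.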
findMatch(matcher) → Expression
Returns the Expression instance that matched the current path. Accepts both a Matcher instance and a MatcherView.
const node = stopNodes.findMatch(matcher);
Example 7: ExpressionSet in a real parser loop
import { XMLParser } from 'fast-xml-parser';
import { Expression, ExpressionSet, Matcher } from 'path-expression-matcher';
const stopNodes = new ExpressionSet();
stopNodes
.addAll(['script', 'style'].map(t => new Expression(`..${t}`)))
.seal();
const matcher = new Matcher();
const parser = new XMLParser({
onOpenTag(tagName, attrs) {
matcher.push(tagName, attrs);
if (stopNodes.matchesAny(matcher)) {
  // Handle the stop node
}
},
onCloseTag() {
matcher.pop();
},
});
Integration with fast-xml-parser
Basic integration:
import { XMLParser } from 'fast-xml-parser';
import { Expression, Matcher } from 'path-expression-matcher';
// Pre-compile once instead of parsing the pattern on every callback
const adminExpr = new Expression("..user[type=admin]");
const parser = new XMLParser({
  stopNodes: ["script", "style"].map(tag => new Expression(`..${tag}`)),
  tagValueProcessor: (tagName, value, jPath, hasAttrs, isLeaf, matcher) => {
    if (matcher.matches(adminExpr)) {
      return enhanceValue(value); // enhanceValue: your own transformation
    }
    return value;
  }
});
License
MIT
Contributing
Issues and PRs are welcome! This package is designed for XML/JSON parsers such as fast-xml-parser, but it can be used with a parser for any format.