path-expression-matcher
Efficient path tracking and pattern matching for XML, JSON, YAML, or any other parser.
🎯 Purpose
path-expression-matcher provides two core classes for tracking and matching paths:
Expression: Parses and stores pattern expressions (e.g., "root.users.user[id]")
Matcher: Tracks current path during parsing and matches against expressions
Compatible with fast-xml-parser and similar tools.
📦 Installation
npm install path-expression-matcher
🚀 Quick Start
import { Expression, Matcher } from 'path-expression-matcher';
const expr = new Expression("root.users.user");
const matcher = new Matcher();
matcher.push("root");
matcher.push("users");
matcher.push("user", { id: "123" });
if (matcher.matches(expr)) {
  console.log("Match found!");
  console.log("Current path:", matcher.toString());
}
// Namespace support: clear the path built above before tracking a new one
const nsExpr = new Expression("soap::Envelope.soap::Body..ns::UserId");
matcher.reset();
matcher.push("Envelope", null, "soap");
matcher.push("Body", null, "soap");
matcher.push("UserId", null, "ns");
console.log(matcher.toString());
📖 Pattern Syntax
Basic Paths
"root.users.user"
"*.users.user"
"root.*.user"
"root.users.*"
Deep Wildcard
"..user"
"root..user"
"..users..user"
Attribute Matching
"user[id]"
"user[type=admin]"
"root[lang]..user"
Position Selectors
"user:first"
"user:nth(2)"
"user:odd"
"user:even"
"root.users.user:first"
Note: Position selectors use the counter (occurrence count of the tag name), not the position (child index). For example, in <root><a/><b/><a/></root>, the second <a/> has position=2 but counter=1.
Namespaces
"ns::user"
"soap::Envelope"
"ns::user[id]"
"ns::user:first"
"*::user"
"..ns::item"
"soap::Envelope.soap::Body"
"ns::first"
Namespace syntax:
- Use double colon (::) for namespace: ns::tag
- Use single colon (:) for position: tag:first
- Combined: ns::tag:first (namespace + tag + position)
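The separator rules above can be sketched as a small segment parser. This is only an illustration of the syntax, not the library's parser: `parseSegment` and the shape of the object it returns are hypothetical, and the sketch assumes an attribute condition always precedes a position selector, as in the combined patterns shown in this document.

```javascript
// Illustrative sketch only: parseSegment is a hypothetical helper,
// not part of path-expression-matcher.
function parseSegment(segment) {
  const result = { namespace: null, tag: null, attrName: null, attrValue: null, position: null };
  let rest = segment;

  // 1. Namespace: everything before the first "::".
  const nsIndex = rest.indexOf('::');
  if (nsIndex !== -1) {
    result.namespace = rest.slice(0, nsIndex);
    rest = rest.slice(nsIndex + 2);
  }

  // 2. Position selector: a trailing ":first", ":odd", ":even" or ":nth(n)".
  //    A bare tag such as "first" has no leading colon, so it is left alone.
  const posMatch = rest.match(/:(first|odd|even|nth\(\d+\))$/);
  if (posMatch) {
    result.position = posMatch[1];
    rest = rest.slice(0, posMatch.index);
  }

  // 3. Attribute condition: trailing "[name]" or "[name=value]".
  const attrMatch = rest.match(/\[([^\]=]+)(?:=([^\]]*))?\]$/);
  if (attrMatch) {
    result.attrName = attrMatch[1];
    result.attrValue = attrMatch[2] !== undefined ? attrMatch[2] : null;
    rest = rest.slice(0, attrMatch.index);
  }

  // 4. Whatever remains is the tag name.
  result.tag = rest;
  return result;
}

console.log(parseSegment("ns::tag:first")); // namespace "ns", tag "tag", position "first"
console.log(parseSegment("ns::first"));     // namespace "ns", tag "first", no position
```

Handling the namespace first is what keeps `ns::first` unambiguous: once the `::` prefix is removed, the remaining text contains no single colon, so it is read as a tag name.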
Namespace matching rules:
- Pattern ns::user matches only nodes with namespace "ns" and tag "user"
- Pattern user (no namespace) matches nodes with tag "user" regardless of namespace
- Pattern *::user matches tag "user" with any namespace (wildcard namespace)
- Namespaces are tracked separately for counter/position (e.g., ns1::item and ns2::item have independent counters)
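The first three rules can be expressed as a single predicate. The sketch below is a hypothetical helper, not the library's code. The rules above do not say whether `*::user` also matches a node with no namespace at all; the sketch assumes it does.

```javascript
// Hypothetical helper illustrating the namespace rules; not part of the library.
// patternNs: namespace from the pattern segment (null when the segment has none).
// nodeNs: namespace of the node being tested (null when the node has none).
function namespaceMatches(patternNs, nodeNs) {
  if (patternNs === null) return true; // "user": the node's namespace is ignored
  if (patternNs === '*') return true;  // "*::user": any namespace (assumption: including none)
  return patternNs === nodeNs;         // "ns::user": exact namespace only
}

console.log(namespaceMatches(null, 'soap'));   // true
console.log(namespaceMatches('ns', 'ns'));     // true
console.log(namespaceMatches('ns', 'other'));  // false
console.log(namespaceMatches('*', 'other'));   // true
```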
Wildcard Differences
Single wildcard (*) - Matches exactly ONE level:
"*.fix1" matches root.fix1 (2 levels) ✅
"*.fix1" does NOT match root.another.fix1 (3 levels) ❌
- Path depth MUST equal pattern depth
Deep wildcard (..) - Matches ZERO or MORE levels:
"..fix1" matches root.fix1 ✅
"..fix1" matches root.another.fix1 ✅
"..fix1" matches a.b.c.d.fix1 ✅
- Works at any depth
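The difference between the two wildcards can be shown with a small recursive matcher. This is a sketch under simplifying assumptions, not the library's algorithm: patterns are already split into arrays, the token `'**'` stands in for the deep wildcard `..`, and attributes, positions, and namespaces are ignored. `matchPath` is a hypothetical name.

```javascript
// Sketch only: matches an array of pattern tokens against an array of tag names.
// '*' consumes exactly one level; '**' (standing in for "..") consumes zero or more.
function matchPath(pattern, path) {
  if (pattern.length === 0) return path.length === 0;
  const [head, ...rest] = pattern;

  if (head === '**') {
    // Deep wildcard: try skipping 0, 1, 2, ... leading levels of the path.
    for (let skip = 0; skip <= path.length; skip++) {
      if (matchPath(rest, path.slice(skip))) return true;
    }
    return false;
  }

  if (path.length === 0) return false;
  // Single wildcard accepts any one tag; otherwise the tags must be equal.
  if (head !== '*' && head !== path[0]) return false;
  return matchPath(rest, path.slice(1));
}

console.log(matchPath(['*', 'fix1'], ['root', 'fix1']));             // true
console.log(matchPath(['*', 'fix1'], ['root', 'another', 'fix1']));  // false
console.log(matchPath(['**', 'fix1'], ['root', 'another', 'fix1'])); // true
console.log(matchPath(['root', '**', 'user'], ['root', 'user']));    // true (zero levels skipped)
```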
Combined Patterns
"..user[id]:first"
"root..user[type=admin]"
"ns::user[id]:first"
"soap::Envelope..ns::UserId"
🔧 API Reference
Expression
Constructor
new Expression(pattern, options)
Parameters:
pattern (string): Pattern to parse
options.separator (string): Path separator (default: '.')
Example:
const expr1 = new Expression("root.users.user");
const expr2 = new Expression("root/users/user", { separator: '/' });
Methods
hasDeepWildcard() → boolean
hasAttributeCondition() → boolean
hasPositionSelector() → boolean
toString() → string
Matcher
Constructor
new Matcher(options)
Parameters:
options.separator (string): Default path separator (default: '.')
Path Tracking Methods
push(tagName, attrValues, namespace)
Add a tag to the current path. Position and counter are automatically calculated.
Parameters:
tagName (string): Tag name
attrValues (object, optional): Attribute key-value pairs (current node only)
namespace (string, optional): Namespace for the tag
Example:
matcher.push("user", { id: "123", type: "admin" });
matcher.push("item");
matcher.push("Envelope", null, "soap");
matcher.push("Body", { version: "1.1" }, "soap");
Position vs Counter:
- Position: The child index in the parent (0, 1, 2, 3...)
- Counter: How many times this tag name appeared at this level (0, 1, 2...)
Example:
<root>
  <a/>  <!-- position=0, counter=0 -->
  <b/>  <!-- position=1, counter=0 -->
  <a/>  <!-- position=2, counter=1 -->
</root>
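To make the two numbers concrete, the sketch below computes them for a list of sibling tags using the definitions above (both are zero-based). `indexSiblings` is a hypothetical helper, not a library function. It keys counters by namespace plus tag name, which follows the rule that namespaces are counted independently; treating a tag without a namespace as its own key is an assumption.

```javascript
// Hypothetical helper: assigns position and counter to each sibling in order.
function indexSiblings(siblings) {
  const seen = new Map(); // "namespace::tag" (or just "tag") -> occurrences so far
  return siblings.map(({ tag, namespace = null }, position) => {
    const key = namespace ? `${namespace}::${tag}` : tag;
    const counter = seen.get(key) || 0;
    seen.set(key, counter + 1);
    return { tag, namespace, position, counter };
  });
}

// The children of <root> from the example above: <a/>, <b/>, <a/>
const indexed = indexSiblings([{ tag: 'a' }, { tag: 'b' }, { tag: 'a' }]);
for (const node of indexed) {
  console.log(`${node.tag}: position=${node.position}, counter=${node.counter}`);
}
// a: position=0, counter=0
// b: position=1, counter=0
// a: position=2, counter=1
```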
pop()
Remove the last tag from the path.
matcher.pop();
updateCurrent(attrValues)
Update current node's attributes (useful when attributes are parsed after push).
matcher.push("user");
matcher.updateCurrent({ id: "123" });
reset()
Clear the entire path.
matcher.reset();
Query Methods
matches(expression)
Check if current path matches an Expression.
const expr = new Expression("root.users.user");
if (matcher.matches(expr)) {
  // Current path matches the pattern
}
getCurrentTag()
Get current tag name.
const tag = matcher.getCurrentTag();
getCurrentNamespace()
Get current namespace.
const ns = matcher.getCurrentNamespace();
getAttrValue(attrName)
Get attribute value of current node.
const id = matcher.getAttrValue("id");
hasAttr(attrName)
Check if current node has an attribute.
if (matcher.hasAttr("id")) {
  // Current node has an "id" attribute
}
getPosition()
Get sibling position of current node (child index in parent).
const position = matcher.getPosition();
getCounter()
Get repeat counter of current node (occurrence count of this tag name).
const counter = matcher.getCounter();
getIndex() (deprecated)
Alias for getPosition(). Use getPosition() or getCounter() instead for clarity.
const index = matcher.getIndex();
getDepth()
Get current path depth.
const depth = matcher.getDepth();
toString(separator?, includeNamespace?)
Get path as string.
Parameters:
separator (string, optional): Path separator (uses default if not provided)
includeNamespace (boolean, optional): Whether to include namespaces (default: true)
const path = matcher.toString();
const path2 = matcher.toString('/');
const path3 = matcher.toString('.', false);
toArray()
Get path as array.
const arr = matcher.toArray();
State Management
snapshot()
Create a snapshot of current state.
const snapshot = matcher.snapshot();
restore(snapshot)
Restore from a snapshot.
matcher.restore(snapshot);
💡 Usage Examples
Example 1: XML Parser with stopNodes
import { XMLParser } from 'fast-xml-parser';
import { Expression, Matcher } from 'path-expression-matcher';
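class MyParser {
  constructor() {
    this.matcher = new Matcher();
    // Pre-compile the stop-node patterns once
    this.stopNodeExpressions = [
      new Expression("html.body.script"),
      new Expression("html.body.style"),
      new Expression("..svg"),
    ];
  }

  // readRawContent() and parseChildren() are parser-specific and not shown here
  parseTag(tagName, attrs) {
    this.matcher.push(tagName, attrs);
    try {
      for (const expr of this.stopNodeExpressions) {
        if (this.matcher.matches(expr)) {
          // Stop node: return its content unparsed
          return this.readRawContent();
        }
      }
      this.parseChildren();
    } finally {
      // Always pop, including on the early return above, so the path stays balanced
      this.matcher.pop();
    }
  }
}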
Example 2: Conditional Processing
const matcher = new Matcher();
const userExpr = new Expression("..user[type=admin]");
const firstItemExpr = new Expression("..item:first");
function processTag(tagName, value, attrs) {
  matcher.push(tagName, attrs);
  if (matcher.matches(userExpr)) {
    value = enhanceAdminUser(value);
  }
  if (matcher.matches(firstItemExpr)) {
    value = markAsFirst(value);
  }
  matcher.pop();
  return value;
}
Example 3: Path-based Filtering
const patterns = [
  new Expression("data.users.user"),
  new Expression("data.posts.post"),
  new Expression("..comment[approved=true]"),
];
function shouldInclude(matcher) {
  return patterns.some(expr => matcher.matches(expr));
}
Example 4: Custom Separator
const matcher = new Matcher({ separator: '/' });
const expr = new Expression("root/config/database", { separator: '/' });
matcher.push("root");
matcher.push("config");
matcher.push("database");
console.log(matcher.toString());
console.log(matcher.matches(expr));
Example 5: Attribute Checking
const matcher = new Matcher();
matcher.push("root");
matcher.push("user", { id: "123", type: "admin", status: "active" });
console.log(matcher.hasAttr("id"));
console.log(matcher.hasAttr("email"));
console.log(matcher.getAttrValue("type"));
const expr1 = new Expression("user[id]");
console.log(matcher.matches(expr1));
const expr2 = new Expression("user[type=admin]");
console.log(matcher.matches(expr2));
Example 6: Position vs Counter
const matcher = new Matcher();
matcher.push("root");
matcher.push("item");
matcher.pop();
matcher.push("div");
matcher.pop();
matcher.push("item");
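console.log(matcher.getPosition()); // 2 (third child of root)
console.log(matcher.getCounter());  // 1 (second "item" at this level)
// :first tests the counter, so the second "item" does not match
const expr = new Expression("root.item:first");
console.log(matcher.matches(expr)); // false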
Example 7: Namespace Support (XML/SOAP)
const matcher = new Matcher();
const soapExpr = new Expression("soap::Envelope.soap::Body..ns::UserId");
matcher.push("Envelope", { xmlns: "..." }, "soap");
matcher.push("Body", null, "soap");
matcher.push("GetUserRequest", null, "ns");
matcher.push("UserId", null, "ns");
if (matcher.matches(soapExpr)) {
  console.log("Found UserId in SOAP body");
  console.log(matcher.toString());
}
// Counters are tracked per namespace: ns1::item and ns2::item are counted independently
matcher.reset();
matcher.push("root");
matcher.push("item", null, "ns1");
matcher.pop();
matcher.push("item", null, "ns2");
matcher.pop();
matcher.push("item", null, "ns1");
const firstNs1Item = new Expression("root.ns1::item:first");
console.log(matcher.matches(firstNs1Item));
const secondNs1Item = new Expression("root.ns1::item:nth(1)");
console.log(matcher.matches(secondNs1Item));
// A tag literally named "first" in namespace "ns" (a tag name, not a position selector)
matcher.reset();
matcher.push("root");
matcher.push("first", null, "ns");
const expr = new Expression("root.ns::first");
console.log(matcher.matches(expr));
🏗️ Architecture
Data Storage Strategy
Ancestor nodes: Store only tag name, position, and counter (minimal memory)
Current node: Store tag name, position, counter, and attribute values
This design minimizes memory usage:
- No attribute names stored (derived from values object when needed)
- Attribute values only for current node, not ancestors
- Attribute checking for ancestors is not supported (acceptable trade-off)
- For 1M nodes with 3 attributes each, saves ~50MB vs storing attribute names
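The storage idea can be sketched as a stack that drops a node's attribute values as soon as a child is pushed on top of it. `PathStack` is a hypothetical class written for illustration, not the library's Matcher; position and counter bookkeeping is omitted for brevity, and the behaviour after `pop()` (the parent does not get its attributes back) is an assumption of this sketch.

```javascript
// Sketch of the storage strategy only; not the library's implementation.
class PathStack {
  constructor() {
    this.nodes = [];
  }

  push(tag, attrValues = null) {
    // The previous current node becomes an ancestor: drop its attribute values.
    const previous = this.nodes[this.nodes.length - 1];
    if (previous) delete previous.values;

    const node = { tag };
    if (attrValues) node.values = attrValues;
    this.nodes.push(node);
  }

  pop() {
    return this.nodes.pop();
  }

  hasAttr(name) {
    const current = this.nodes[this.nodes.length - 1];
    // Attribute names are derived from the values object, not stored separately.
    return Boolean(current && current.values && name in current.values);
  }
}

const stack = new PathStack();
stack.push('root', { lang: 'en' });
stack.push('user', { id: '123' });
console.log(stack.hasAttr('id'));   // true: "user" is the current node
console.log(stack.nodes[0].values); // undefined: "root" is now an ancestor
```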
Matching Strategy
Matching is performed bottom-to-top (from current node toward root):
- Start at current node
- Match segments from pattern end to start
- Attribute checking only works for current node (ancestors have no attribute data)
- Position selectors use counter (occurrence count), not position (child index)
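A minimal sketch of the bottom-to-top walk follows, restricted to exact tag names (no wildcards) so the direction of the loop stays visible. The segment and node shapes and the name `matchesBottomUp` are hypothetical. The sketch assumes `:first` means counter 0, and that an attribute condition on an ancestor segment simply fails to match; the library may handle that case differently.

```javascript
// Sketch only: compares pattern segments with path nodes from the end to the start.
function matchesBottomUp(segments, nodes) {
  if (segments.length !== nodes.length) return false;
  const last = nodes.length - 1;

  for (let i = last; i >= 0; i--) {
    const seg = segments[i];
    const node = nodes[i];

    if (seg.tag !== node.tag) return false; // mismatches near the current node exit early
    // Position selectors test the counter, not the child index.
    if (seg.position === 'first' && node.counter !== 0) return false;

    if (seg.attrName !== undefined) {
      // Only the current (last) node carries attribute values.
      if (i !== last) return false; // assumption: ancestor conditions cannot be satisfied
      const values = node.values || {};
      if (!(seg.attrName in values)) return false;
      if (seg.attrValue !== undefined && values[seg.attrName] !== seg.attrValue) return false;
    }
  }
  return true;
}

const path = [
  { tag: 'root', counter: 0 },
  { tag: 'user', counter: 1, values: { type: 'admin' } },
];
console.log(matchesBottomUp([{ tag: 'root' }, { tag: 'user', attrName: 'type', attrValue: 'admin' }], path)); // true
console.log(matchesBottomUp([{ tag: 'root' }, { tag: 'user', position: 'first' }], path));                    // false
```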
Performance
- Expression parsing: One-time cost when Expression is created
- Expression analysis: Cached (hasDeepWildcard, hasAttributeCondition, hasPositionSelector)
- Path tracking: O(1) for push/pop operations
- Pattern matching: O(n*m) where n = path depth, m = pattern segments
- Memory per ancestor node: ~40-60 bytes (tag, position, counter only)
- Memory per current node: ~80-120 bytes (adds attribute values)
🎓 Design Patterns
Pre-compile Patterns (Recommended)
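// Good: parse the pattern once and reuse it
const expr = new Expression("..user[id]");
for (let i = 0; i < 1000; i++) {
  if (matcher.matches(expr)) {
    // ...
  }
}
// Avoid: re-parsing the same pattern on every iteration
for (let i = 0; i < 1000; i++) {
  if (matcher.matches(new Expression("..user[id]"))) {
    // ...
  }
}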
Batch Pattern Checking
const patterns = [
  new Expression("..user"),
  new Expression("..post"),
  new Expression("..comment"),
];
function matchesAny(matcher, patterns) {
  return patterns.some(expr => matcher.matches(expr));
}
🔗 Integration with fast-xml-parser
Basic integration:
import { XMLParser } from 'fast-xml-parser';
import { Expression, Matcher } from 'path-expression-matcher';
// Compile the pattern once, outside the callback, so it is not re-parsed for every tag
const adminUserExpr = new Expression("..user[type=admin]");
const parser = new XMLParser({
  stopNodes: ["script", "style"].map(tag => new Expression(`..${tag}`)),
  tagValueProcessor: (tagName, value, jPath, hasAttrs, isLeaf, matcher) => {
    if (matcher.matches(adminUserExpr)) {
      return enhanceValue(value);
    }
    return value;
  }
});
🧪 Testing
npm test
The suite contains 77 tests covering:
- Pattern parsing (exact, wildcards, attributes, position)
- Path tracking (push, pop, update)
- Pattern matching (all combinations)
- Edge cases and error conditions
📄 License
MIT
🤝 Contributing
Issues and PRs welcome! This package is designed to be used by XML/JSON parsers like fast-xml-parser.