
regexp-tree
Regular expressions processor in JavaScript
TL;DR: RegExp Tree is a regular expressions processor, which includes parser, traversal, transformer, and optimizer APIs.
You can get an overview of the tool in this article.
The parser can be installed as an npm module:
npm install -g regexp-tree
regexp-tree --help
You can also try it online using AST Explorer.
The regexp-tree parser is implemented as an automatic LR parser using the Syntax tool. The parser module is generated from the regexp grammar, which is based on the regular expressions grammar used in ECMAScript.
For development from the GitHub repository, run the build command to generate the parser module:
git clone https://github.com/<your-github-account>/regexp-tree.git
cd regexp-tree
npm install
npm run build
./bin/regexp-tree --help
NOTE: You need to run the build command every time you change the grammar file.
Check the options available from CLI:
regexp-tree --help
Usage: regexp-tree [options]
Options:
-e, --expression A regular expression to be parsed
-l, --loc Whether to capture AST node locations
To parse a regular expression, pass -e option:
regexp-tree -e '/a|b/i'
Which produces an AST node corresponding to this regular expression:
{
type: 'RegExp',
body: {
type: 'Disjunction',
left: {
type: 'Char',
value: 'a',
kind: 'simple'
},
right: {
type: 'Char',
value: 'b',
kind: 'simple'
}
},
flags: 'i',
}
NOTE: the format of a regexp is / Body / OptionalFlags.
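To make the / Body / OptionalFlags format concrete, here is a minimal sketch that splits a regexp string into its two parts. The `splitRegExpLiteral` helper is hypothetical and written only for this illustration; it is not part of regexp-tree, whose parser is generated from a grammar rather than built on string slicing.

```javascript
// Hypothetical helper (not a regexp-tree API): splits "/body/flags".
function splitRegExpLiteral(source) {
  const lastSlash = source.lastIndexOf('/');
  // A valid literal starts with "/" and has a second, closing "/".
  if (!source.startsWith('/') || lastSlash === 0) {
    throw new SyntaxError(`Expected "/ Body / OptionalFlags", got: ${source}`);
  }
  return {
    body: source.slice(1, lastSlash),
    flags: source.slice(lastSlash + 1),
  };
}

console.log(splitRegExpLiteral('/a|b/i')); // { body: 'a|b', flags: 'i' }
console.log(splitRegExpLiteral('/ab/'));   // { body: 'ab', flags: '' }
```

The sketch assumes the flags never contain a slash, which holds for the flag letters JavaScript accepts.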
The parser can also be used as a Node module:
const regexpTree = require('regexp-tree');
console.log(regexpTree.parse(/a|b/i)); // RegExp AST
Note that regexp-tree supports parsing regexes from strings and from actual RegExp objects (in general, from any object which can be coerced to a string). If a feature is not yet implemented in the JavaScript RegExp of your engine, the regexp should be passed as a string:
// Pass an actual JS RegExp object.
regexpTree.parse(/a|b/i);
// Pass a string, since `s` flag may not be supported in older versions.
regexpTree.parse('/./s');
Also note that in string mode a backslash must be escaped, i.e. written as two backslashes \\, per JavaScript string rules:
// As an actual regexp.
regexpTree.parse(/\n/);
// As a string.
regexpTree.parse('/\\n/');
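The equivalence of the two forms can be checked with plain JavaScript, without regexp-tree: coercing a RegExp object to a string yields the same characters as the double-escaped string literal. This is only an illustration of the escaping rule, and the variable names are ours.

```javascript
// Coercing a RegExp object to a string gives its source form.
const fromRegExp = String(/\n/);

// The same four characters written as a string literal: / \ n /
const fromString = '/\\n/';

// With a single backslash the string holds a real newline instead.
const unescaped = '/\n/';

console.log(fromRegExp === fromString); // true
console.log(fromString.length);         // 4
console.log(unescaped.length);          // 3
```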
For source code transformation tools it might be useful also to capture locations of the AST nodes. From the command line it's controlled via the -l option:
regexp-tree -e '/ab/' -l
This attaches a loc object to each AST node:
{
type: 'RegExp',
body: {
type: 'Alternative',
expressions: [
{
type: 'Char',
value: 'a',
kind: 'simple',
loc: {
start: 1,
end: 2
}
},
{
type: 'Char',
value: 'b',
kind: 'simple',
loc: {
start: 2,
end: 3
}
}
],
loc: {
start: 1,
end: 3
}
},
flags: '',
loc: {
start: 0,
end: 4
}
}
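As a small sketch of how a source transformation tool might use these locations, the snippet below maps nodes back to the text they were parsed from. The AST is copied from the output above; `sourceOf` is a hypothetical helper, and it assumes that `start` is an inclusive offset and `end` an exclusive one, which is consistent with the numbers shown.

```javascript
const source = '/ab/';

// AST copied from the output above (only `loc` matters here).
const ast = {
  type: 'RegExp',
  body: {
    type: 'Alternative',
    expressions: [
      {type: 'Char', value: 'a', kind: 'simple', loc: {start: 1, end: 2}},
      {type: 'Char', value: 'b', kind: 'simple', loc: {start: 2, end: 3}},
    ],
    loc: {start: 1, end: 3},
  },
  flags: '',
  loc: {start: 0, end: 4},
};

// Hypothetical helper: returns the source text a node was parsed from,
// assuming `start` is inclusive and `end` is exclusive.
const sourceOf = (node) => source.slice(node.loc.start, node.loc.end);

console.log(sourceOf(ast));                     // prints: /ab/
console.log(sourceOf(ast.body));                // prints: ab
console.log(sourceOf(ast.body.expressions[1])); // prints: b
```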
From Node it's controlled via setOptions method exposed on the parser:
const regexpTree = require('regexp-tree');
const parsed = regexpTree
.parser
.setOptions({captureLocations: true})
.parse(/a|b/);
The setOptions method sets global options, which are preserved between calls. It is also possible to provide options for a single parse call, which is often preferable:
const regexpTree = require('regexp-tree');
const parsed = regexpTree.parse(/a|b/, {
captureLocations: true,
});
The traverse module allows handling the needed AST nodes using the visitor pattern. In Node the module is exposed as the regexpTree.traverse method. Handlers receive an instance of the NodePath class, which encapsulates the node itself, its parent node, the property, and the index (if the node is part of a collection).
Example:
const regexpTree = require('regexp-tree');
// Get AST.
const ast = regexpTree.parse('/[a-z]{1,}/');
// Handle nodes.
regexpTree.traverse(ast, {
// Handle "Quantifier" node type.
Quantifier({node}) {
...
},
});
// Generate the regexp.
const re = regexpTree.generate(ast);
console.log(re); // '/[a-z]+/'
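To illustrate the visitor pattern itself, here is a simplified, self-contained traversal over a hand-written AST. It is a sketch, not the regexp-tree implementation: `simpleTraverse` and `isNode` are hypothetical names, and the object passed to handlers only mimics the node, parent, property, and index fields described above.

```javascript
// Hypothetical, simplified traversal; NOT the regexp-tree implementation.
const isNode = (value) =>
  value !== null && typeof value === 'object' && typeof value.type === 'string';

function simpleTraverse(root, handlers) {
  function visit(node, parent, property, index) {
    // Call the handler registered for this node type, if any.
    const handler = handlers[node.type];
    if (handler) {
      handler({node, parent, property, index});
    }
    // Descend into child nodes and into arrays of child nodes.
    for (const [key, child] of Object.entries(node)) {
      if (Array.isArray(child)) {
        child.forEach((item, i) => isNode(item) && visit(item, node, key, i));
      } else if (isNode(child)) {
        visit(child, node, key, null);
      }
    }
  }
  visit(root, null, null, null);
}

// Hand-written AST for /[a-z]{1,}/, using the node shapes from this document.
const ast = {
  type: 'RegExp',
  body: {
    type: 'Repetition',
    expression: {
      type: 'CharacterClass',
      expressions: [{
        type: 'ClassRange',
        from: {type: 'Char', value: 'a', kind: 'simple'},
        to: {type: 'Char', value: 'z', kind: 'simple'},
      }],
    },
    quantifier: {type: 'Quantifier', kind: 'Range', from: 1, greedy: true},
  },
  flags: '',
};

const visited = [];
simpleTraverse(ast, {
  Quantifier({node, parent, property}) {
    visited.push(`${parent.type}.${property}: ${node.kind}`);
  },
  Char({node, property}) {
    visited.push(`${property}: ${node.value}`);
  },
});

console.log(visited);
// [ 'from: a', 'to: z', 'Repetition.quantifier: Range' ]
```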
While the traverse module provides a basic traversal API, which can be used for any kind of AST handling, the transform module focuses mainly on transforming regular expressions.
It accepts a regular expression in different formats (a string, an actual RegExp object, or an AST), applies a set of transformations, and returns an instance of TransformResult. Handlers receive as a parameter the same NodePath object used in traverse.
Example:
const regexpTree = require('regexp-tree');
// Handle nodes.
const re = regexpTree.transform('/[a-z]{1,}/i', {
/**
* Handle "Quantifier" node type,
* transforming `{1,}` quantifier to `+`.
*/
Quantifier(path) {
const {node} = path;
// {1,} -> +
if (
node.kind === 'Range' &&
node.from === 1 &&
!node.to
) {
path.replace({
type: 'Quantifier',
kind: '+',
greedy: node.greedy,
});
}
},
});
console.log(re.toString()); // '/[a-z]+/i'
console.log(re.toRegExp()); // /[a-z]+/i
console.log(re.getAST()); // AST for /[a-z]+/i
The generator module generates regular expressions from corresponding AST nodes. In Node the module is exposed as regexpTree.generate method.
Example:
const regexpTree = require('regexp-tree');
const re = regexpTree.generate({
type: 'RegExp',
body: {
type: 'Char',
value: 'a',
kind: 'simple',
},
flags: 'i',
});
console.log(re); // '/a/i'
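The idea behind generation can be sketched as one small printer function per node type. The code below is a simplified illustration, not the regexp-tree generator: `printers` and `printNode` are hypothetical names, and only a handful of the node types described in this document are covered.

```javascript
// Hypothetical, simplified generator; NOT the regexp-tree implementation.
// Covers only a few of the node types described in this document.
const printers = {
  RegExp: (node) => `/${printNode(node.body)}/${node.flags}`,
  Alternative: (node) => node.expressions.map((e) => printNode(e)).join(''),
  Disjunction: (node) => `${printNode(node.left)}|${printNode(node.right)}`,
  Group: (node) => {
    const prefix = !node.capturing ? '?:' : node.name ? `?<${node.name}>` : '';
    return `(${prefix}${printNode(node.expression)})`;
  },
  // Meta chars such as '\\n' already carry their backslash in `value`.
  Char: (node) => (node.escaped ? '\\' : '') + node.value,
};

function printNode(node) {
  const printer = printers[node.type];
  if (!printer) {
    throw new TypeError(`Unsupported node type: ${node.type}`);
  }
  return printer(node);
}

const ast = {
  type: 'RegExp',
  body: {
    type: 'Alternative',
    expressions: [
      {
        type: 'Group',
        capturing: false,
        expression: {
          type: 'Disjunction',
          left: {type: 'Char', value: 'a', kind: 'simple'},
          right: {type: 'Char', value: '*', kind: 'simple', escaped: true},
        },
      },
      {type: 'Char', value: 'c', kind: 'simple'},
    ],
  },
  flags: 'i',
};

console.log(printNode(ast)); // prints: /(?:a|\*)c/i
```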
Optimizer transforms your regexp into an optimized version, replacing some sub-expressions with their idiomatic patterns. This might be good for different kinds of minifiers, as well as for regexp machines.
Example:
const regexpTree = require('regexp-tree');
const originalRe = /[a-zA-Z_0-9][A-Z_\da-z]*\e{1,}/;
const optimizedRe = regexpTree
.optimize(originalRe)
.toRegExp();
console.log(optimizedRe); // /\w+e+/
To create an actual RegExp JavaScript object, we can use regexpTree.toRegExp method:
const regexpTree = require('regexp-tree');
const re = regexpTree.toRegExp('/[a-z]/i');
console.log(
re.test('a'), // true
re.test('Z'), // true
);
Below are the AST node types for different regular expressions patterns:
A basic building block, single character. Can be escaped, and be of different kinds.
Basic non-escaped char in a regexp:
z
Node:
{
type: 'Char',
value: 'z',
kind: 'simple'
}
NOTE: to test this from CLI, the char should be in an actual regexp -- /z/.
\z
The same value, escaped flag is added:
{
type: 'Char',
value: 'z',
kind: 'simple',
escaped: true
}
Escaping is mostly used with meta symbols:
// Syntax error
*
\*
OK, node:
{
type: 'Char',
value: '*',
kind: 'simple',
escaped: true
}
A meta character should not be confused with an escaped char.
Example:
\n
Node:
{
type: 'Char',
value: '\\n',
kind: 'meta',
}
Other meta characters include: \f, \r, \n, \t, \v, \0, [\b] (backspace char), \s, \S, \w, \W, \d, \D.
NOTE: \b and \B are parsed as Assertion node type, not Char.
A char preceded with \c, e.g. \cx, which stands for CTRL+x:
\cx
Node:
{
type: 'Char',
value: '\\cx',
kind: 'control',
}
A char preceded with \x, followed by a HEX-code, e.g. \x3B (symbol ;):
\x3B
Node:
{
type: 'Char',
value: '\\x3B',
kind: 'hex',
}
Char-code:
\42
Node:
{
type: 'Char',
value: '\\42',
kind: 'decimal',
}
Char-code started with \0, followed by an octal number:
\073
Node:
{
type: 'Char',
value: '\\073',
kind: 'oct',
}
Unicode char started with \u, followed by a hex number:
\u003B
\u{003B}
Node:
{
type: 'Char',
value: '\\u003B',
kind: 'unicode',
}
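To show how the different kinds relate to actual characters, the sketch below decodes the `value` of a Char node into a code point. `charNodeToCodePoint` is a hypothetical helper, not a regexp-tree API; it assumes the `value` formats shown above and deliberately leaves out the meta and decimal kinds.

```javascript
// Hypothetical helper (not a regexp-tree API): decodes a Char node's value.
function charNodeToCodePoint(node) {
  switch (node.kind) {
    case 'simple':
      return node.value.codePointAt(0);
    case 'hex': // '\\x3B' -> '3B'
      return parseInt(node.value.slice(2), 16);
    case 'oct': // '\\073' -> '073'
      return parseInt(node.value.slice(1), 8);
    case 'unicode': // '\\u003B' or '\\u{003B}' -> '003B'
      return parseInt(node.value.replace(/^\\u\{?|\}$/g, ''), 16);
    case 'control': // '\\cx' -> CTRL+x, i.e. the letter's code modulo 32
      return node.value.charCodeAt(2) % 32;
    default:
      throw new Error(`Unsupported kind: ${node.kind}`);
  }
}

const nodes = [
  {type: 'Char', value: '\\x3B', kind: 'hex'},
  {type: 'Char', value: '\\073', kind: 'oct'},
  {type: 'Char', value: '\\u003B', kind: 'unicode'},
  {type: 'Char', value: '\\u{003B}', kind: 'unicode'},
];

// All four notations denote the same symbol.
console.log(nodes.map((n) => String.fromCodePoint(charNodeToCodePoint(n))));
// [ ';', ';', ';', ';' ]

console.log(charNodeToCodePoint({type: 'Char', value: '\\cx', kind: 'control'})); // 24
```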
Character classes define a set of characters. A set may include simple characters as well as character ranges. A class can be positive (any of the characters in the class matches), or negative (any character except those in the class matches).
A positive character class is defined between [ and ] brackets:
[a*]
A node:
{
type: 'CharacterClass',
expressions: [
{
type: 'Char',
value: 'a',
kind: 'simple'
},
{
type: 'Char',
value: '*',
kind: 'simple'
}
]
}
NOTE: some meta symbols are treated as normal characters in a character class. E.g. * is not a repetition quantifier, but a simple char.
A negative character class is defined between [^ and ] brackets:
[^ab]
The AST node is the same, except that the negative property is added:
{
type: 'CharacterClass',
negative: true,
expressions: [
{
type: 'Char',
value: 'a',
kind: 'simple'
},
{
type: 'Char',
value: 'b',
kind: 'simple'
}
]
}
As mentioned, a character class may also contain ranges of symbols:
[a-z]
A node:
{
type: 'CharacterClass',
expressions: [
{
type: 'ClassRange',
from: {
type: 'Char',
value: 'a',
kind: 'simple'
},
to: {
type: 'Char',
value: 'z',
kind: 'simple'
}
}
]
}
NOTE: it is a syntax error if the to value is less than the from value: /[z-a]/.
The range value can be the same for from and to, and the special range - character is treated as a simple character when it stands in a char position:
// from: 'a', to: 'a'
[a-a]
// from: '-', to: '-'
[---]
// simple '-' char:
[-]
// 3 ranges:
[a-zA-Z0-9]+
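The semantics of positive and negative classes and of ranges can be sketched as a small matcher over CharacterClass nodes. `classMatches` is a hypothetical helper written for this illustration, not a regexp-tree API, and it only handles chars of kind simple and ranges between them.

```javascript
// Hypothetical helper (not a regexp-tree API). Handles only chars of
// kind 'simple' and ranges between them.
function classMatches(classNode, char) {
  const code = char.codePointAt(0);
  const found = classNode.expressions.some((expr) => {
    if (expr.type === 'ClassRange') {
      const from = expr.from.value.codePointAt(0);
      const to = expr.to.value.codePointAt(0);
      return code >= from && code <= to;
    }
    return expr.value === char;
  });
  // A negative class inverts the result.
  return classNode.negative ? !found : found;
}

// Node for [a-z]
const lowercase = {
  type: 'CharacterClass',
  expressions: [{
    type: 'ClassRange',
    from: {type: 'Char', value: 'a', kind: 'simple'},
    to: {type: 'Char', value: 'z', kind: 'simple'},
  }],
};

// Node for [^ab]
const notAb = {
  type: 'CharacterClass',
  negative: true,
  expressions: [
    {type: 'Char', value: 'a', kind: 'simple'},
    {type: 'Char', value: 'b', kind: 'simple'},
  ],
};

console.log(classMatches(lowercase, 'm')); // true
console.log(classMatches(lowercase, 'M')); // false
console.log(classMatches(notAb, 'a'));     // false
console.log(classMatches(notAb, 'c'));     // true
```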
An alternative (or concatenation) defines a chain of patterns followed one after another:
abc
A node:
{
type: 'Alternative',
expressions: [
{
type: 'Char',
value: 'a',
kind: 'simple'
},
{
type: 'Char',
value: 'b',
kind: 'simple'
},
{
type: 'Char',
value: 'c',
kind: 'simple'
}
]
}
Other examples:
// 'a' with a quantifier, followed by 'b'
a?b
// A group followed by a class:
(ab)[a-z]
The disjunction defines "OR" operation for regexp patterns. It's a binary operation, having left, and right nodes.
Matches a or b:
a|b
A node:
{
type: 'Disjunction',
left: {
type: 'Char',
value: 'a',
kind: 'simple'
},
right: {
type: 'Char',
value: 'b',
kind: 'simple'
}
}
Groups play two roles: they define grouping precedence, and, in the case of a capturing group, allow capturing the needed sub-expressions.
"Capturing" means the matched string can be referred later by a user, including in the pattern itself -- by using backreferences.
The chars a and b are grouped, followed by the c char:
(ab)c
A node:
{
type: 'Alternative',
expressions: [
{
type: 'Group',
capturing: true,
expression: {
type: 'Alternative',
expressions: [
{
type: 'Char',
value: 'a',
kind: 'simple'
},
{
type: 'Char',
value: 'b',
kind: 'simple'
}
]
}
},
{
type: 'Char',
value: 'c',
kind: 'simple'
}
]
}
Another example:
// A grouped disjunction of a symbol, and a character class:
(5|[a-z])
NOTE: Named capturing groups were standardized in ECMAScript 2018. Older JavaScript engines may not support them, in which case the regexp should be passed to the parser as a string.
A capturing group can be given a name using the (?<name>...) syntax, for any identifier name.
For example, a regular expression for a date:
/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u
For the group:
(?<foo>x)
We have the following node (the name property with value foo is added):
{
type: 'Group',
capturing: true,
name: 'foo',
expression: {
type: 'Char',
value: 'x',
kind: 'simple'
}
}
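For context, this is how the date pattern above behaves in plain JavaScript, independent of regexp-tree. The snippet assumes an engine that supports named capturing groups (ECMAScript 2018 or later); the sample date is ours.

```javascript
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;
const match = dateRe.exec('2017-04-25');

// Named groups are exposed on `groups`, and are still available by number.
console.log(match.groups.year);  // prints: 2017
console.log(match.groups.month); // prints: 04
console.log(match[3]);           // prints: 25
```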
Sometimes we don't need to actually capture the matched string from a group. In this case we can use a non-capturing group:
The chars a and b are grouped, but not captured, followed by the c char:
(?:ab)c
The same node, the capturing flag is false:
{
type: 'Alternative',
expressions: [
{
type: 'Group',
capturing: false,
expression: {
type: 'Alternative',
expressions: [
{
type: 'Char',
value: 'a',
kind: 'simple'
},
{
type: 'Char',
value: 'b',
kind: 'simple'
}
]
}
},
{
type: 'Char',
value: 'c',
kind: 'simple'
}
]
}
A capturing group can be referenced in the pattern using notation of an escaped group number.
Matches abab string:
(ab)\1
A node:
{
type: 'Alternative',
expressions: [
{
type: 'Group',
capturing: true,
expression: {
type: 'Alternative',
expressions: [
{
type: 'Char',
value: 'a',
kind: 'simple'
},
{
type: 'Char',
value: 'b',
kind: 'simple'
}
]
}
},
{
type: 'Backreference',
kind: 'number',
number: 1,
reference: 1,
}
]
}
A named capturing group can be accessed using \k<name> pattern, and also using a numbered reference.
Matches www:
(?<foo>w)\k<foo>\1
A node:
{
type: 'Alternative',
expressions: [
{
type: 'Group',
capturing: true,
name: 'foo',
expression: {
type: 'Char',
value: 'w',
kind: 'simple'
}
},
{
type: 'Backreference',
kind: 'name',
number: 1,
reference: 'foo'
},
{
type: 'Backreference',
kind: 'number',
number: 1,
reference: 1
}
]
}
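The matching behavior of both backreference forms can be verified with native JavaScript regexps. This check does not use regexp-tree and assumes an engine with named group support for the second pattern.

```javascript
// Numbered backreference: the group's match must repeat.
const numbered = /(ab)\1/;
console.log(numbered.test('abab')); // true
console.log(numbered.test('abba')); // false

// Named and numbered references to the same group.
const named = /(?<foo>w)\k<foo>\1/;
console.log(named.test('www')); // true
console.log(named.test('ww'));  // false
```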
Quantifiers specify repetition of a regular expression (or of its part). Below are the quantifiers which wrap a parsed expression into a Repetition node. The quantifier itself can be of different kinds, and has Quantifier node type.
The ? quantifier is short for {0,1}.
a?
Node:
{
type: 'Repetition',
expression: {
type: 'Char',
value: 'a',
kind: 'simple'
},
quantifier: {
type: 'Quantifier',
kind: '?',
greedy: true
}
}
The * quantifier is short for {0,}.
a*
Node:
{
type: 'Repetition',
expression: {
type: 'Char',
value: 'a',
kind: 'simple'
},
quantifier: {
type: 'Quantifier',
kind: '*',
greedy: true
}
}
The + quantifier is short for {1,}.
// Same as `aa*`, or `a{1,}`
a+
Node:
{
type: 'Repetition',
expression: {
type: 'Char',
value: 'a',
kind: 'simple'
},
quantifier: {
type: 'Quantifier',
kind: '+',
greedy: true
}
}
Explicit range-based quantifiers are parsed as follows:
a{3}
The kind of the quantifier is Range, and the from and to properties have the same value:
{
type: 'Repetition',
expression: {
type: 'Char',
value: 'a',
kind: 'simple'
},
quantifier: {
type: 'Quantifier',
kind: 'Range',
from: 3,
to: 3,
greedy: true
}
}
An open range doesn't have a max value (meaning "or more", i.e. an Infinity upper bound):
a{3,}
The AST node for such a range doesn't contain the to property:
{
type: 'Repetition',
expression: {
type: 'Char',
value: 'a',
kind: 'simple'
},
quantifier: {
type: 'Quantifier',
kind: 'Range',
from: 3,
greedy: true
}
}
A closed range has an explicit max value (which syntactically can be the same as the min value):
a{3,5}
// Same as a{3}
a{3,3}
An AST node for a closed range:
{
type: 'Repetition',
expression: {
type: 'Char',
value: 'a',
kind: 'simple'
},
quantifier: {
type: 'Quantifier',
kind: 'Range',
from: 3,
to: 5,
greedy: true
}
}
NOTE: it is a syntax error if the max value is less than min value:
/a{3,2}/
If any quantifier is followed by the ?, the quantifier becomes non-greedy.
Example:
a+?
Node:
{
type: 'Repetition',
expression: {
type: 'Char',
value: 'a',
kind: 'simple'
},
quantifier: {
type: 'Quantifier',
kind: '+',
greedy: false
}
}
Other examples:
a??
a*?
a{1}?
a{1,}?
a{1,3}?
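The mapping from Quantifier nodes back to their textual form can be summarized in a few lines. `quantifierToString` is a hypothetical helper for illustration, not a regexp-tree API; it relies only on the kind, from, to, and greedy properties shown above.

```javascript
// Hypothetical helper (not a regexp-tree API): prints a Quantifier node.
function quantifierToString(quantifier) {
  let text;
  if (quantifier.kind !== 'Range') {
    text = quantifier.kind; // '?', '*', or '+'
  } else if (quantifier.to === undefined) {
    text = `{${quantifier.from},}`; // open range
  } else if (quantifier.to === quantifier.from) {
    text = `{${quantifier.from}}`; // exact count
  } else {
    text = `{${quantifier.from},${quantifier.to}}`; // closed range
  }
  // A trailing `?` marks a non-greedy quantifier.
  return quantifier.greedy ? text : `${text}?`;
}

console.log(quantifierToString({type: 'Quantifier', kind: '+', greedy: false}));
// prints: +?
console.log(quantifierToString({type: 'Quantifier', kind: 'Range', from: 3, greedy: true}));
// prints: {3,}
console.log(quantifierToString({type: 'Quantifier', kind: 'Range', from: 1, to: 3, greedy: false}));
// prints: {1,3}?
```

Note that this sketch prints a{3,3} in its shorter, equivalent form {3}.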
Assertions appear as separate AST nodes. However, instead of operating on the characters themselves, they assert certain conditions of a matching string. Examples: ^ -- beginning of a string (or a line in multiline mode), $ -- end of a string, etc.
The ^ assertion checks whether a scanner is at the beginning of a string (or a line in multiline mode).
In the example below ^ is not a property of the a symbol, but a separate AST node for the assertion. The parsed node is actually an Alternative with two nodes:
^a
The node:
{
type: 'Alternative',
expressions: [
{
type: 'Assertion',
kind: '^'
},
{
type: 'Char',
value: 'a',
kind: 'simple'
}
]
}
Since assertion is a separate node, it may appear anywhere in the matching string. The following regexp is completely valid, and asserts beginning of the string; it'll match an empty string:
^^^^^
The $ assertion is similar to ^, but asserts the end of a string (or a line in a multiline mode):
a$
A node:
{
type: 'Alternative',
expressions: [
{
type: 'Char',
value: 'a',
kind: 'simple'
},
{
type: 'Assertion',
kind: '$'
}
]
}
And again, this is a completely valid regexp, and matches an empty string:
^^^^$$$$$
// valid too:
$^
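These claims can be checked directly with native JavaScript regexps, without regexp-tree. The snippet below is only a quick verification of the matching behavior described above.

```javascript
// Repeated assertions are valid and match the empty string.
console.log(/^^^^^/.test(''));     // true
console.log(/^^^^$$$$$/.test('')); // true
console.log(/$^/.test(''));        // true

// In a non-empty string, no position is both the end and the beginning.
console.log(/$^/.test('a'));       // false
```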
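The \b assertion checks for a word boundary, i.e. a position between a word character and a non-word character (such as a space), or between a word character and the start or end of the string.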
Matches x in x y, but not in xy:
x\b
A node:
{
type: 'Alternative',
expressions: [
{
type: 'Char',
value: 'x',
kind: 'simple'
},
{
type: 'Assertion',
kind: '\\b'
}
]
}
The \B assertion is the opposite: it checks for a non-word boundary. The following example matches x in xy, but not in x y:
x\B
The node is the same, except for the kind of the assertion:
{
type: 'Alternative',
expressions: [
{
type: 'Char',
value: 'x',
kind: 'simple'
},
{
type: 'Assertion',
kind: '\\B'
}
]
}
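The behavior of both boundary assertions can be confirmed with native JavaScript regexps. This is a plain verification of the examples above and does not involve regexp-tree.

```javascript
// \b: x must be followed by a word boundary.
console.log(/x\b/.test('x y')); // true
console.log(/x\b/.test('xy'));  // false

// \B: x must NOT be followed by a word boundary.
console.log(/x\B/.test('xy'));  // true
console.log(/x\B/.test('x y')); // false

// The end of the string also counts as a boundary after a word char.
console.log(/x\b/.test('x'));   // true
```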
These assertions check whether a pattern is followed (or not followed for the negative assertion) by another pattern.
Matches a only if it's followed by b:
a(?=b)
A node:
{
type: 'Alternative',
expressions: [
{
type: 'Char',
value: 'a',
kind: 'simple'
},
{
type: 'Assertion',
kind: 'Lookahead',
assertion: {
type: 'Char',
value: 'b',
kind: 'simple'
}
}
]
}
Matches a only if it's not followed by b:
a(?!b)
A node is similar, just negative flag is added:
{
type: 'Alternative',
expressions: [
{
type: 'Char',
value: 'a',
kind: 'simple'
},
{
type: 'Assertion',
kind: 'Lookahead',
negative: true,
assertion: {
type: 'Char',
value: 'b',
kind: 'simple'
}
}
]
}
NOTE: Lookbehind assertions were standardized in ECMAScript 2018. Older JavaScript engines may not support them, in which case the regexp should be passed to the parser as a string.
These assertions check whether a pattern is preceded (or not preceded for the negative assertion) by another pattern.
Matches b only if it's preceded by a:
(?<=a)b
A node:
{
type: 'Alternative',
expressions: [
{
type: 'Assertion',
kind: 'Lookbehind',
assertion: {
type: 'Char',
value: 'a',
kind: 'simple'
}
},
{
type: 'Char',
value: 'b',
kind: 'simple'
},
]
}
Matches b only if it's not preceded by a:
(?<!a)b
A node:
{
type: 'Alternative',
expressions: [
{
type: 'Assertion',
kind: 'Lookbehind',
negative: true,
assertion: {
type: 'Char',
value: 'a',
kind: 'simple'
}
},
{
type: 'Char',
value: 'b',
kind: 'simple'
},
]
}
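The four lookaround assertions can be compared side by side with native JavaScript regexps. The snippet does not use regexp-tree, and the lookbehind lines assume an engine that supports lookbehind (ECMAScript 2018 or later).

```javascript
// Lookahead: the asserted pattern is checked but not consumed.
console.log('ab'.match(/a(?=b)/)[0]); // prints: a
console.log(/a(?=b)/.test('ac'));     // false

// Negative lookahead.
console.log(/a(?!b)/.test('ab'));     // false
console.log(/a(?!b)/.test('ac'));     // true

// Lookbehind.
console.log(/(?<=a)b/.test('ab'));    // true
console.log(/(?<=a)b/.test('cb'));    // false

// Negative lookbehind.
console.log(/(?<!a)b/.test('ab'));    // false
console.log(/(?<!a)b/.test('cb'));    // true
```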
regexpp is a regular expression parser for ECMAScript. It provides functionalities to parse regular expressions into an AST, similar to regexp-tree. However, it does not offer transformation or optimization features.
regexp-parser is another package for parsing regular expressions into an AST. It is simpler and more lightweight compared to regexp-tree, but it lacks the transformation and optimization capabilities.
regexgen generates regular expressions from a set of strings. While it does not parse or transform existing regex patterns, it focuses on creating new regex patterns that match a given set of strings, which is a different use case compared to regexp-tree.
FAQs
Regular Expressions parser in JavaScript
The npm package regexp-tree receives a total of 7,401,927 weekly downloads. As such, regexp-tree popularity was classified as popular.
We found that regexp-tree has an unhealthy version release cadence and project activity, because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.