remix
regular expression alternation for tokenizers
What is this?
ReMix joins regular expressions into a single alternation, mainly for the
purpose of constructing tokenizers.
Matching
This library emulates, to some degree, the new sticky /y flag. The sticky flag
has 3 basic effects on a regular expression.
- Implicit /g
- Not looking past lastIndex
- Treating the string as if it starts at lastIndex.
The first one is obvious: we simply default to /g. The second one is a little
more complex, but we manage: if a match looks forward (that is, it would start
past lastIndex), the exec() method will reject the match.
Only the last one is not supported. It would require making a partial copy
of the string per match, which would add significant overhead for little
or no gain. Only when the /y flag is widely adopted will this be possible.
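The second effect can be illustrated with plain RegExp objects. The following is only a minimal sketch of the idea, not ReMix's actual implementation; stickyExec is a hypothetical helper name, and the sketch assumes the regular expression was created with the g flag.

```javascript
// Hypothetical helper, not part of ReMix: emulate the "do not look past
// lastIndex" behaviour of /y using a regular expression with the g flag.
function stickyExec(re, str, pos) {
    // With the g flag set, exec() starts scanning at lastIndex.
    re.lastIndex = pos;
    var match = re.exec(str);
    if (match === null) {
        return null;
    }
    if (match.index !== pos) {
        // The engine scanned ahead to find a match; a sticky regular
        // expression would have failed here, so reject it and rewind.
        re.lastIndex = pos;
        return null;
    }
    return match;
}

var word = /[a-z]+/g;
var atStart = stickyExec(word, "foo bar", 0); // matches "foo" at index 0
var atSpace = stickyExec(word, "foo bar", 3); // null: "bar" starts at 4, not 3
var atBar = stickyExec(word, "foo bar", 4);   // matches "bar" at index 4
```

The third effect is deliberately left out of the sketch: anchors inside the pattern still see the whole string, not a string that starts at lastIndex.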
Add Specification
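var re = new ReMix();
// A specification can be a plain regular expression...
re.add(/regexp/);
// ...or a function returning a named specification.
re.add(function () {
    return {
        name: "foo",
        spec: [
            {
                baz: 'foo{eol}$'
            },
            function () {
                return /bar/;
            }
        ]
    };
});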
Combining
Regular expressions are combined if their flags, excluding those listed in
defaultFlags, are the same. In other words, defaultFlags are ignored when
flags are compared.
var re = new Re([/foo/, /bar/g]);
console.log(re.compile());
var re = new Re([/foo/, /bar/i]);
console.log(re.compile());
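The grouping rule can be sketched with plain RegExp objects. This is an illustration under stated assumptions, not ReMix's code: combineByFlags is a hypothetical function, and defaultFlags is assumed to be the string "g".

```javascript
// Hypothetical sketch, not ReMix's implementation: group regular
// expressions by the flags that are NOT in defaultFlags, then join each
// group into one alternation.
function combineByFlags(regexes, defaultFlags) {
    var groups = {}; // key: significant flags, value: array of sources
    var order = [];  // first-seen order of the keys
    regexes.forEach(function (re) {
        // Drop the flags listed in defaultFlags before comparing.
        var key = re.flags.split("").filter(function (flag) {
            return defaultFlags.indexOf(flag) === -1;
        }).sort().join("");
        if (!groups.hasOwnProperty(key)) {
            groups[key] = [];
            order.push(key);
        }
        groups[key].push("(?:" + re.source + ")");
    });
    return order.map(function (key) {
        return new RegExp(groups[key].join("|"), defaultFlags + key);
    });
}

// /foo/ and /bar/g differ only by a default flag: one combined expression.
var same = combineByFlags([/foo/, /bar/g], "g");
// /foo/ and /bar/i differ by the i flag: two separate expressions.
var different = combineByFlags([/foo/, /bar/i], "g");
```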
Naming
Each regular expression can be named and can have sub-named expressions,
which are tracked by namespaces with configurable delimiters. A ReMix
object can also compose other ReMix objects, which are likewise namespaced
into the current level.
Namespaces are returned as the second element in the match array returned by
exec().
var re = new ReMix('foobar', /foo(bar)/, { bar: /bar/, baz: { boo: /boo/ } });
var str = "foobarboobarfoobar", match;
match = re.exec(str);
match = re.exec(str);
match = re.exec(str);
match = re.exec(str);
match = re.exec(str);
var a1 = new ReMix('a1'),
a2 = new ReMix('a2'),
str = "foobar";
a1.options({nsDelimiter: '/'});
a1.add({foo: /foo/});
a2.add({bar: /bar/});
a1.add(a2);
console.log(a1.exec(str));
console.log(a1.exec(str));
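For readers curious how a name can be recovered from a combined expression, here is a minimal sketch using plain capture groups. It is not how ReMix is implemented and does not reproduce its match format; buildTokenizer is a hypothetical helper, and the sketch assumes the named patterns contain no capture groups of their own.

```javascript
// Hypothetical sketch, not ReMix's implementation: wrap each named pattern
// in a capture group and report the name of the group that matched.
function buildTokenizer(specs) {
    var names = Object.keys(specs);
    var source = names.map(function (name) {
        return "(" + specs[name].source + ")";
    }).join("|");
    var re = new RegExp(source, "g");
    return function tokenize(str) {
        var tokens = [], match, i;
        re.lastIndex = 0;
        while ((match = re.exec(str)) !== null) {
            if (match[0] === "") {
                re.lastIndex++; // avoid looping forever on empty matches
                continue;
            }
            // Group i + 1 belongs to names[i].
            for (i = 0; i < names.length; i++) {
                if (match[i + 1] !== undefined) {
                    tokens.push([match[0], names[i]]);
                    break;
                }
            }
        }
        return tokens;
    };
}

var tokenize = buildTokenizer({ word: /[a-z]+/, number: /[0-9]+/, space: /\s+/ });
var tokens = tokenize("abc 42");
// tokens is [["abc", "word"], [" ", "space"], ["42", "number"]]
```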
Templates
Templates are a very simple way to compose regular expressions. Templates
can be used in place of regular expressions. You can register new templates
or use the existing predefined templates. Here is a list:
- hspace - Horizontal space, including Unicode horizontal space code points
- noHspace - hspace negated
- vspace - Vertical space character class
- noVspace - Negated vspace character class
- number - General number match, includes all JavaScript-supported number formats
- space - /\s/
- noSpace - /\S/
- word - /\w/
- noWord - /\W/
- any - Match any one character, doesn't depend on RegExp flags
- eol - Match EOL on any platform
- notEol - Not EOL character class
- end - End of string or line depending on RegExp flags
- begin - Beginning of line or string depending on RegExp flags
Registering a template has two formats:
ReMix.register('eol', /(?:\r\n?|\n|\f)/);
ReMix.register('emptyLines', '{eol}{eol+}');
ReMix.register({
cMultiComment: /\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\//,
cLineComment: "//{notEol*}{eol}"
});
Example of how templates integrate with ReMix.
ReMix.register('matchFoo', /foo/);
var remix = new ReMix('foo', 'matchFoo');
var remix = new ReMix('bar', '{matchFoo*}|bar');
ReMix.register('matchFooOrBar', '{matchFoo}|bar');
var remix = new ReMix('bax', '\\s*{matchFooOrBar+}\\s*');
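The expansion of `{name}` references can be pictured with a small stand-alone sketch. This is an assumption about the general technique, not ReMix's implementation; the templates table and the register and expand functions below are hypothetical stand-ins written for illustration.

```javascript
// Hypothetical sketch, not ReMix's implementation: expand "{name}" and
// "{name*}", "{name+}", "{name?}" references into non-capturing groups.
var templates = {};

function register(name, value) {
    // Store regular expressions as source strings; string templates may
    // themselves reference other templates.
    templates[name] = value instanceof RegExp ? value.source : value;
}

function expand(str) {
    return str.replace(/\{(\w+)([*+?]?)\}/g, function (whole, name, quantifier) {
        if (!templates.hasOwnProperty(name)) {
            return whole; // not a known template, e.g. the {2} in /a{2}/
        }
        // Expand recursively, then apply the quantifier to the whole group.
        return "(?:" + expand(templates[name]) + ")" + quantifier;
    });
}

register('matchFoo', /foo/);
register('matchFooOrBar', '{matchFoo}|bar');

var expanded = expand('\\s*{matchFooOrBar+}\\s*');
// expanded is the string: \s*(?:(?:foo)|bar)+\s*
var matchesAll = new RegExp('^' + expanded + '$').test(' foobarfoo ');
```

Wrapping each expansion in a non-capturing group keeps a trailing quantifier applied to the whole template instead of only its last atom.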
BUGS
This is BETA software. Please report bugs here or submit PRs.
Although I know of no bugs, the current test coverage is not complete.
TODO
- Complete test coverage.
- Add browser tests.
- Add travis.
- Add badges.
- Better, less contrived examples.
- More information in this README.md.
SEE ALSO
ReMix Class API Documentation
LICENSE
Copyright (C) 2014 Scott Beck, all rights reserved
Licensed under the MIT license