Bindings for RE2: fast, safe alternative to backtracking regular expression engines.
The re2 npm package is a wrapper around Google's RE2 regular expression library, which is designed to be fast and safe. It provides a way to perform regular expression operations without the risk of catastrophic backtracking, which can occur with some other regular expression engines.
Basic Matching
This feature allows you to perform basic matching operations using regular expressions. The code sample demonstrates how to create a new RE2 instance with a pattern and test a string against it.
const RE2 = require('re2');
const re = new RE2('hello');
console.log(re.test('hello world')); // true
Capturing Groups
This feature allows you to use capturing groups in your regular expressions. The code sample shows how to capture groups of digits separated by a hyphen and access the captured groups.
const RE2 = require('re2');
const re = new RE2('(\\d+)-(\\d+)'); // backslashes must be doubled inside a string literal
const match = re.exec('123-456');
console.log(match[1]); // 123
console.log(match[2]); // 456
Global Matching
This feature allows you to perform global matching to find all occurrences of a pattern in a string. The code sample demonstrates how to find all sequences of digits in a string.
const RE2 = require('re2');
const re = new RE2('\\d+', 'g'); // backslashes must be doubled inside a string literal
const matches = '123 456 789'.match(re);
console.log(matches); // ['123', '456', '789']
Replacing
This feature allows you to replace parts of a string that match a pattern with a replacement string. The code sample shows how to replace 'world' with 'RE2' in a given string.
const RE2 = require('re2');
const re = new RE2('world');
const result = 'hello world'.replace(re, 'RE2');
console.log(result); // 'hello RE2'
The 'regexp' package provides a simple interface for working with regular expressions in JavaScript. It is similar to re2 but does not offer the same level of performance and safety guarantees against catastrophic backtracking.
The 'xregexp' package extends JavaScript's native RegExp with additional features and syntax. It offers more functionality than re2 but may not be as performant or safe in terms of avoiding catastrophic backtracking.
The 'pcre-to-regexp' package allows you to convert Perl-compatible regular expressions (PCRE) to JavaScript RegExp objects. While it provides compatibility with PCRE syntax, it does not offer the same performance and safety benefits as re2.
This project provides bindings for RE2: fast, safe alternative to backtracking regular expression engines written by Russ Cox. To learn more about RE2, start with an overview Regular Expression Matching in the Wild. More resources can be found at his Implementing Regular Expressions page.
RE2's regular expression language is almost a superset of what is provided by RegExp (see Syntax), but it lacks two features: backreferences and lookahead assertions. See below for more details.
RE2 always works in the Unicode mode, which means that all matches that use character codes are interpreted as Unicode code points, not as binary values of UTF-16. See RE2.unicodeWarningLevel below for more details.
The RE2 object emulates the standard RegExp, making it a practical drop-in replacement in most cases. RE2 is extended to provide String-based regular expression methods as well. To help convert RegExp objects to RE2, its constructor can take a RegExp directly, honoring all properties.
It can work with node.js buffers directly reducing overhead on recoding and copying characters, and making processing/parsing long files fast.
All documentation can be found in this README and in the wiki.
The built-in Node.js regular expression engine can run in exponential time with a special combination:
This can lead to what is known as a Regular Expression Denial of Service (ReDoS). To tell if your regular expressions are vulnerable, you might try one of these projects:
However, neither project is perfect.
node-re2 can protect your Node.js application from ReDoS.
node-re2 makes vulnerable regular expression patterns safe by evaluating them in RE2 instead of the built-in Node.js regex engine.
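To make the exponential behavior concrete, the sketch below times the built-in engine on a classic nested-quantifier pattern with inputs that almost match. The pattern `/^(a+)+$/`, the input sizes, and the helper name `timeBuiltIn` are illustrative assumptions, not taken from the project; exact timings will vary by machine and Node.js version.

```javascript
// Illustrative only: the pattern and inputs are assumptions chosen to show
// backtracking cost; they are not taken from the re2 project.
const vulnerable = /^(a+)+$/; // nested quantifiers invite catastrophic backtracking

function timeBuiltIn(n) {
  const evilInput = "a".repeat(n) + "!"; // almost matches, then fails at the end
  const start = process.hrtime.bigint();
  const matched = vulnerable.test(evilInput);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { n, matched, elapsedMs };
}

// Keep n small: each extra character roughly doubles the work.
const timings = [16, 18, 20, 22].map(timeBuiltIn);
for (const { n, matched, elapsedMs } of timings) {
  console.log(`n=${n} matched=${matched} took ${elapsedMs.toFixed(1)} ms`);
}
```

None of the inputs match, yet the time to reach that answer should grow steeply with `n`, which is the property an attacker exploits.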
An RE2 object can be created just like a RegExp:
Supported properties:
re2.lastIndex
re2.global
re2.ignoreCase
re2.multiline
re2.dotAll (since 1.17.6)
re2.unicode (the RE2 engine always works in the Unicode mode; see details below)
re2.sticky (since 1.7.0)
re2.hasIndices (since 1.19.0)
re2.source
re2.flags
Supported methods:
Starting with 1.6.0, the following well-known symbol-based methods are supported (see Symbols):
re2[Symbol.match](str)
re2[Symbol.matchAll](str) (since 1.17.5)
re2[Symbol.search](str)
re2[Symbol.replace](str, newSubStr|function)
re2[Symbol.split](str[, limit])
This allows RE2 instances to be used on strings directly, just like RegExp instances:
var re = new RE2("1");
"213".match(re); // [ '1', index: 1, input: '213' ]
"213".search(re); // 1
"213".replace(re, "+"); // 2+3
"213".split(re); // [ '2', '3' ]
Array.from("2131".matchAll(new RE2("1", "g"))); // matchAll() returns an iterator and requires the g flag
// [['1', index: 1, input: '2131'], ['1', index: 3, input: '2131']]
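The mechanism behind this is the well-known symbol protocol: String methods delegate to whatever object they are given if it defines the matching symbol method. The sketch below shows that delegation with a toy class; `LiteralMatcher` is hypothetical and only does literal substring matching, so it says nothing about how RE2 implements these methods internally.

```javascript
// LiteralMatcher is a hypothetical stand-in used to show the protocol only.
class LiteralMatcher {
  constructor(needle) {
    this.needle = needle;
  }
  // Called by String.prototype.match
  [Symbol.match](str) {
    const index = str.indexOf(this.needle);
    if (index < 0) return null;
    const result = [this.needle];
    result.index = index;
    result.input = str;
    return result;
  }
  // Called by String.prototype.search
  [Symbol.search](str) {
    return str.indexOf(this.needle);
  }
  // Called by String.prototype.replace (string replacements only in this sketch)
  [Symbol.replace](str, replacement) {
    const index = str.indexOf(this.needle);
    if (index < 0) return str;
    return str.slice(0, index) + replacement + str.slice(index + this.needle.length);
  }
  // Called by String.prototype.split
  [Symbol.split](str, limit) {
    return str.split(this.needle, limit);
  }
}

const one = new LiteralMatcher("1");
console.log("213".match(one).index); // 1
console.log("213".search(one)); // 1
console.log("213".replace(one, "+")); // 2+3
console.log("213".split(one)); // [ '2', '3' ]
```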
Starting with 1.8.0 named groups are supported.
An RE2 object can be created from a regular expression:
var re1 = new RE2(/ab*/ig); // from a RegExp object
var re2 = new RE2(re1); // from another RE2 object
String methods
Standard String defines four more methods that can use regular expressions. RE2 provides them as methods, exchanging the positions of a string and a regular expression:
re2.match(str)
re2.replace(str, newSubStr|function)
re2.search(str)
re2.split(str[, limit])
Starting with 1.6.0, these methods were also added as well-known symbol-based methods, so they can be used transparently with ES6 string/regex machinery.
Buffer support
In order to support Buffer directly, most methods can accept buffers instead of strings. It speeds up all operations.
The following signatures are supported:
re2.exec(buf)
re2.test(buf)
re2.match(buf)
re2.search(buf)
re2.split(buf[, limit])
re2.replace(buf, replacer)
Differences with their string-based versions:
Instead of strings, they return Buffer objects, even in composite objects. A buffer can be converted to a string with buf.toString().
When re2.replace() is used with a replacer function, the replacer can return a buffer or a string. But all arguments (except for an input object) will be strings, and an offset will be in characters. If you prefer to deal with buffers and byte offsets in a replacer function, set a property useBuffers to true on the function:
function strReplacer(match, offset, input) {
// typeof match == "string"
return "<= " + offset + " characters|";
}
RE2("б").replace("абв", strReplacer);
// "а<= 1 characters|в"
function bufReplacer(match, offset, input) {
// match is a Buffer here because useBuffers is set
return "<= " + offset + " bytes|";
}
bufReplacer.useBuffers = true;
RE2("б").replace("абв", bufReplacer);
// "а<= 2 bytes|в"
This feature works for string and buffer inputs. If a buffer was used as an input, its output will be returned as a buffer too, otherwise a string will be returned.
Two functions to calculate string sizes between UTF-8 and UTF-16 are exposed on RE2:
RE2.getUtf8Length(str): calculates a buffer size in bytes to encode a UTF-16 string as a UTF-8 buffer.
RE2.getUtf16Length(buf): calculates a string size in characters to encode a UTF-8 buffer as a UTF-16 string.
JavaScript supports UCS-2 strings with 16-bit characters, while node.js 0.11 supports full UTF-16 as a default string.
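As a rough illustration of what these sizes mean, the sketch below computes the same kind of quantities with Node's own Buffer API. It does not call RE2 at all; whether RE2.getUtf8Length and RE2.getUtf16Length agree with these built-ins for every input (for example, lone surrogates) is an assumption.

```javascript
const text = "абв"; // three Cyrillic letters, two UTF-8 bytes each

// Bytes needed to store the string as UTF-8.
const utf8Bytes = Buffer.byteLength(text, "utf8");

// UTF-16 code units in the string decoded from a UTF-8 buffer.
const buf = Buffer.from(text, "utf8");
const utf16Units = buf.toString("utf8").length;

// Converting a character offset into a byte offset, as in the replacer example.
const charOffset = text.indexOf("б");
const byteOffset = Buffer.byteLength(text.slice(0, charOffset), "utf8");

console.log(utf8Bytes, utf16Units); // 6 3
console.log(charOffset, byteOffset); // 1 2

// Characters outside the BMP take 4 UTF-8 bytes and 2 UTF-16 code units.
const emoji = "😀";
console.log(Buffer.byteLength(emoji, "utf8"), emoji.length); // 4 2
```

This is why the string replacer above reports an offset of 1 character while the buffer replacer reports 2 bytes for the same match.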
internalSource
Starting with 1.8.0, the source property emulates the same property of RegExp, meaning that it can be used to create an identical RE2 or RegExp instance. Sometimes, for troubleshooting purposes, a user wants to inspect the source as translated by RE2. It is available as a read-only property called internalSource.
The RE2 engine always works in the Unicode mode. In most cases either there is no difference or the Unicode mode is actually preferred. But sometimes a user wants tight control over their regular expressions. For those cases, there is a static string property RE2.unicodeWarningLevel.
Regular expressions in the Unicode mode work as usual. But if a regular expression lacks the Unicode flag, it is always added silently.
const x = /./;
x.flags; // ''
const y = new RE2(x);
y.flags; // 'u'
In the latter case RE2 can do the following actions depending on RE2.unicodeWarningLevel:
'nothing' (the default): no warnings or notifications of any kind, a regular expression will be created with 'u' flag.
'warnOnce': warns exactly once the very first time, a regular expression will be created with 'u' flag.
RE2 will warn once again.
'warn': warns every time, a regular expression will be created with 'u' flag.
'throw': throws a SyntaxError every time.
Warnings and exceptions help to audit an application for stray non-Unicode regular expressions.
Installation:
npm install --save re2
While the project is known to work with other package managers, it is not guaranteed nor tested. For example, yarn is known to fail in some scenarios (see this Wiki article).
When installing re2, the install script attempts to download a prebuilt artifact for your system from the GitHub releases. The download location can be overridden by setting the RE2_DOWNLOAD_MIRROR environment variable as seen in the install script.
If all attempts to download the prebuilt artifact for your system fail, the script attempts to build re2 locally on your machine using node-gyp.
It is used just like a RegExp object.
var RE2 = require("re2");
// with default flags
var re = new RE2("a(b*)");
var result = re.exec("abbc");
console.log(result[0]); // "abb"
console.log(result[1]); // "bb"
result = re.exec("aBbC");
console.log(result[0]); // "a"
console.log(result[1]); // ""
// with explicit flags
re = new RE2("a(b*)", "i");
result = re.exec("aBbC");
console.log(result[0]); // "aBb"
console.log(result[1]); // "Bb"
// from regular expression object
var regexp = new RegExp("a(b*)", "i");
re = new RE2(regexp);
result = re.exec("aBbC");
console.log(result[0]); // "aBb"
console.log(result[1]); // "Bb"
// from regular expression literal
re = new RE2(/a(b*)/i);
result = re.exec("aBbC");
console.log(result[0]); // "aBb"
console.log(result[1]); // "Bb"
// from another RE2 object
var rex = new RE2(re);
result = rex.exec("aBbC");
console.log(result[0]); // "aBb"
console.log(result[1]); // "Bb"
// shortcut
result = new RE2("ab*").exec("abba");
// factory
result = RE2("ab*").exec("abba");
RE2 consciously avoids any regular expression features that require worst-case exponential time to evaluate. These features are essentially those that describe a Context-Free Language (CFL) rather than a Regular Expression, and are extensions to the traditional regular expression language because some people don't know when enough is enough.
The most noteworthy missing features are backreferences and lookahead assertions.
If your application uses these features, you should continue to use RegExp. But since these features are fundamentally vulnerable to ReDoS, you should strongly consider replacing them.
RE2 will throw a SyntaxError if you try to declare a regular expression using these features. If you are evaluating an externally-provided regular expression, wrap your RE2 declarations in a try-catch block. This allows falling back to RegExp when RE2 lacks a feature:
var re = /(a)+(b)*/;
try {
re = new RE2(re);
// use RE2 as a drop-in replacement
} catch (e) {
// suppress an error, and use
// the original RegExp
}
var result = re.exec(sample);
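The same fallback can be wrapped in a small helper that also reports which engine ended up being used. This is a minimal sketch: `compileWithFallback` is a hypothetical name, and the `Engine` parameter is assumed to behave like the RE2 constructor described above (it accepts a pattern and flags, and throws a SyntaxError for unsupported features).

```javascript
// compileWithFallback is a hypothetical helper, not part of re2.
// Engine is assumed to behave like the RE2 constructor: it accepts
// (pattern, flags) and throws a SyntaxError for unsupported features.
function compileWithFallback(pattern, flags, Engine) {
  try {
    return { regex: new Engine(pattern, flags), engine: "re2" };
  } catch (e) {
    if (!(e instanceof SyntaxError)) throw e; // only swallow syntax problems
    // Fall back to the built-in engine; this may itself throw if the
    // pattern is invalid for RegExp as well.
    return { regex: new RegExp(pattern, flags), engine: "builtin" };
  }
}

// Intended usage, assuming re2 is installed:
//   const RE2 = require("re2");
//   const { regex, engine } = compileWithFallback(userPattern, "i", RE2);
//   if (engine === "builtin") { /* log it: this pattern is not ReDoS-safe */ }
```

Keep in mind that the fallback path runs on the backtracking engine again, so for untrusted patterns it may be safer to reject the pattern than to fall back.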
In addition to these missing features, RE2 also behaves somewhat differently from the built-in regular expression engine in corner cases.
RE2 doesn't support backreferences, which are numbered references to previously matched groups, like so: \1, \2, and so on. Example of backreferences:
/(cat|dog)\1/.test("catcat"); // true
/(cat|dog)\1/.test("dogdog"); // true
/(cat|dog)\1/.test("catdog"); // false
/(cat|dog)\1/.test("dogcat"); // false
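When the referenced group has a small, fixed set of alternatives, one way to replace a backreference is to spell out each repetition explicitly. The sketch below checks, using only the built-in engine, that the rewritten pattern agrees with the backreference version on the four inputs above. The rewritten pattern uses plain alternation only, so it should not trip the backreference restriction, though that is not verified against RE2 here.

```javascript
// Backreference version (built-in engine only).
const withBackref = /(cat|dog)\1/;
// Equivalent pattern without a backreference: spell out each repetition.
const withoutBackref = /catcat|dogdog/;

const samples = ["catcat", "dogdog", "catdog", "dogcat"];
const agreement = samples.map(
  (s) => withBackref.test(s) === withoutBackref.test(s)
);
console.log(agreement); // [ true, true, true, true ]
```

This rewrite does not scale to open-ended groups such as `(\w+)\1`; those need the comparison done in application code after matching.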
RE2 doesn't support lookahead assertions, which allow a match to depend on subsequent contents.
/abc(?=def)/; // match abc only if it is followed by def
/abc(?!def)/; // match abc only if it is not followed by def
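Simple lookaheads can often be replaced by matching the context explicitly and moving the rest of the check into code. The sketch below is one possible approach, shown with the built-in engine so it runs anywhere; the function names are hypothetical, and the two lookahead-free patterns use only features that RE2 is expected to accept.

```javascript
// Replacement for /abc(?=def)/: match the context too, keep only group 1.
function findAbcBeforeDef(input) {
  const m = /(abc)def/.exec(input);
  return m ? { match: m[1], index: m.index } : null;
}

// Replacement for /abc(?!def)/: find each "abc", then check what follows in code.
function findAbcNotBeforeDef(input) {
  const re = /abc/g;
  let m;
  while ((m = re.exec(input)) !== null) {
    const end = m.index + m[0].length;
    if (!input.startsWith("def", end)) {
      return { match: m[0], index: m.index };
    }
  }
  return null;
}

console.log(findAbcBeforeDef("xabcdef")); // { match: 'abc', index: 1 }
console.log(findAbcNotBeforeDef("abcdef abcxyz")); // { match: 'abc', index: 7 }
```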
RE2 and the built-in regex engines disagree a bit. Before you switch to RE2, verify that your regular expressions continue to work as expected. They should do so in the vast majority of cases.
Here is an example of a case where they may not:
var RE2 = require("re2");
var pattern = '(?:(a)|(b)|(c))+';
var built_in = new RegExp(pattern);
var re2 = new RE2(pattern);
var input = 'abc';
var bi_res = built_in.exec(input);
var re2_res = re2.exec(input);
console.log('bi_res: ' + bi_res); // prints: bi_res: abc,,,c
console.log('re2_res : ' + re2_res); // prints: re2_res : abc,a,b,c
RE2 always works in the Unicode mode. See RE2.unicodeWarningLevel above for more details on how to control warnings about this feature.
abseil-cpp files that manifested itself on ARM Alpine. Thx, Laura Hausmann.
node-gyp.
abseil-cpp and required the adaptation work. Thx, Stefano Rivera.
d flag when lastIndex is non zero. Bugfix: the match result. Thx, teebu.
hasIndices AKA the d flag. Thx, teebu.
dotAll. Thx Michael Kriese.
matchAll() (thx, ThePendulum and David Sichau).
lastIndex for U+10000 - U+10FFFF UTF characters. Thx, omg.
node2nix-related problem (thx malte-v).
RE2_DOWNLOAD_MIRROR environment variable for precompiled artifact download during installation.
linux-musl target for precompiled images (thx Uzlopak).
toString() uses source now, updated deps.
The rest can be consulted in the project's wiki Release history.
BSD
FAQs
Bindings for RE2: fast, safe alternative to backtracking regular expression engines.
The npm package re2 receives a total of 654,633 weekly downloads. As such, re2 popularity was classified as popular.
We found that re2 demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.