union-replacer
One-pass String.prototype.replace-like processor with multiple regexps and replacements
UnionReplacer provides one-pass global search and replace functionality
using multiple regular expressions and corresponding replacements.
Otherwise, the behavior matches String.prototype.replace(regexp, newSubstr|function).
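The difference between one-pass processing and chained String.prototype.replace calls appears whenever one replacement produces text that a later pattern would match. Here is a plain-JavaScript sketch with no library involved. The word-swapping task is a made-up illustration:

```javascript
// Swap "cat" and "dog" in a sentence.
const text = 'the cat chased the dog';

// Chained replace: the second call also rewrites the "dog" produced by the first.
const chained = text.replace(/cat/g, 'dog').replace(/dog/g, 'cat');

// One pass: a single alternation, so every match is taken from the original input.
const swaps = { cat: 'dog', dog: 'cat' };
const onePass = text.replace(/cat|dog/g, (match) => swaps[match]);

console.log(chained); // the cat chased the cat
console.log(onePass); // the dog chased the cat
```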
In browsers:
<script src="https://unpkg.com/union-replacer/dist/union-replacer.umd.js"></script>
Using npm:
npm install union-replacer
In Node.js:
const UnionReplacer = require('union-replacer');
With TypeScript:
// with esModuleInterop enabled in tsconfig (recommended):
import UnionReplacer from 'union-replacer';
// without esModuleInterop enabled in tsconfig:
import * as UnionReplacer from 'union-replacer';
// regardless of the esModuleInterop setting:
import UnionReplacer = require('union-replacer');
replacer = new UnionReplacer(replace_pairs, [flags])
newStr = replacer.replace(str)
replace_pairs: array of [regexp, replacement] arrays, where
  regexp: a particular regexp element in the unioned regexp. Any flags it has are ignored.
  replacement: corresponds with the second argument of String.prototype.replace:
    function: see Specifying a function as a parameter. As of 1.1.0, the function can be called with extended arguments; see JSDoc for more info.
    newSubstr: see Specifying a string as a parameter.
flags: regular expression flags to be set on the main underlying regexp; defaults to gm.
There is also an .addReplacement() method; see #4 for details.
HTML escaping:
const htmlEscapes = [
[/</, '&lt;'],
[/>/, '&gt;'],
[/"/, '&quot;'],
// not affected by the previous replacements producing '&'
[/&/, '&amp;']
];
const htmlEscaper = new UnionReplacer(htmlEscapes);
const toBeHtmlEscaped = '<script>alert("inject & control")</script>';
console.log(htmlEscaper.replace(toBeHtmlEscaped));
Output:
&lt;script&gt;alert(&quot;inject &amp; control&quot;)&lt;/script&gt;
Highlighting Markdown special characters while preserving code blocks and spans. Only a subset of Markdown syntax is supported for simplicity.
const mdHighlighter = new UnionReplacer([
// opening fence = at least three backticks
// closing fence = opening fence or longer
// regexp backreferences are ideal to match this
// an unclosed block runs to the end of input; (?![\s\S]) matches there, as JS has no \Z
[/^(`{3,}).*\n([\s\S]*?)(^\1`*\s*?$|(?![\s\S]))/, (match, fence1, pre, fence2) => {
let block = `<b>${fence1}</b><br />\n`
block += `<pre>${htmlEscaper.replace(pre)}</pre><br />\n`
block += `<b>${fence2}</b>`
return block;
}],
// Code spans are delimited by two same-length backtick strings.
// Note that backreferences within the regexp are numbered as usual,
// i.e. \1 still means first capturing group.
// Union replacer renumbers them when composing the final internal regexp.
[/(^|[^`])(`+)(?!`)(.*?[^`]\2)(?!`)/, (match, lead, delim, code) => {
return `${htmlEscaper.replace(lead)}<code>${htmlEscaper.replace(code)}</code>`
}],
// Subsequent replaces are performed only outside code blocks and spans.
[/[*~=+_-`]+/, '<b>$&</b>'],
[/\n/, '<br />\n']
// HTML entity-like strings would be interpreted too
].concat(htmlEscapes));
const toBeMarkdownHighlighted = '\
**Markdown** code to be "highlighted"\n\
with special care to fenced code blocks:\n\
````\n\
_Markdown_ within fenced code blocks is not *processed*:\n\
```\n\
Even embedded "fence strings" work well with **UnionEscaper**\n\
```\n\
````\n\
*CommonMark is sweet & cool.*';
console.log(mdHighlighter.replace(toBeMarkdownHighlighted));
Produces:
<b>**</b>Markdown<b>**</b> code to be &quot;highlighted&quot;<br />
with special care to fenced code blocks:<br />
<b>````</b><br />
<pre>_Markdown_ within fenced code blocks is not *processed*:
```
Even embedded &quot;fence strings&quot; work well with **UnionEscaper**
```
</pre><br />
<b>````</b><br />
<b>*</b>CommonMark is sweet &amp; cool.<b>*</b>
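The code comments above say that backreferences keep their per-regexp numbering and are renumbered when the final regexp is composed. The following simplified sketch illustrates that idea. It is not the library's implementation: countGroups, shiftBackrefs and composeUnion are hypothetical helpers, and shiftBackrefs ignores character classes and legacy octal escapes, which a real implementation would have to handle.

```javascript
// Hypothetical helpers, not the library's implementation.

// Count capturing groups: the trailing empty alternative makes exec('') succeed
// and the result array has one slot per group.
function countGroups(re) {
  return new RegExp(`(?:${re.source})|`).exec('').length - 1;
}

// Shift numeric backreferences (\1, \2, ...) in a pattern source by `offset`.
// Escape pairs are consumed left to right, so an escaped backslash followed
// by a digit is left alone. Simplified: character classes and legacy octal
// escapes are not handled.
function shiftBackrefs(source, offset) {
  return source.replace(/\\(?:(\d+)|[\s\S])/g, (match, num) =>
    num === undefined ? match : `\\${Number(num) + offset}`);
}

// Wrap each pattern in its own capturing group and join them with "|".
// Every wrapping group shifts the numbers of the groups that follow it.
function composeUnion(patterns, flags = 'gm') {
  let groupTotal = 0;
  const parts = patterns.map((re) => {
    groupTotal += 1; // the wrapping group comes first
    const part = `(${shiftBackrefs(re.source, groupTotal)})`;
    groupTotal += countGroups(re);
    return part;
  });
  return new RegExp(parts.join('|'), flags);
}

// Both patterns use \1 for "their own" first group.
const backrefUnion = composeUnion([/(a)\1/, /(b)\1/]);
console.log(backrefUnion.source); // ((a)\2)|((b)\4)
console.log('aabb ab'.replace(backrefUnion, '<$&>')); // <aa><bb> ab
```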
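The code below escapes text so that special Markdown sequences are protected from being interpreted. As the code comments show, two considerations apply: URLs are kept untouched, and some sequences are escaped only where Markdown would interpret them, for example at the start of a line.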
const mdEscaper = new UnionReplacer([
// Keep urls untouched (simplified for demonstration purposes).
// The same should apply for GFM email autolinks.
[/\bhttps?:\/\/(?!\.)(?:\.?[\w-]+)+(?:[^\s<]*?)(?=[?!.,:*~]*(?:\s|$))/, '$&'],
// global backslash escapes
[/[\\*_[\]`&<>]/, '\\$&'],
// backslash-escape at line start
[/^(?:~~~|=+)/, '\\$&'],
// strike-through w/o lookbehinds
[/~+/, m => m.length == 2 ? `\\${m}` : m],
// backslash-escape at line start if followed by space
[/^(?:[-+]|#{1,6})(?=\s)/, '\\$&'],
// backslash-escape the dot to suppress an ordered list
[/^(\d+)\.(?=\s)/, '$1\\.']
]);
const toBeMarkdownEscaped = '\
A five-*starred* escaper:\n\
1. Would preserve _underscored_ in the http://example.com/_underscored_/ URL.\n\
2. Would also preserve backslashes (\\) in http://example.com/\\_underscored\\_/.';
console.log(mdEscaper.replace(toBeMarkdownEscaped));
Produces:
A five-\*starred\* escaper:
1\. Would preserve \_underscored\_ in the http://example.com/_underscored_/ URL.
2\. Would also preserve backslashes (\\) in http://example.com/\_underscored\_/.
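The strike-through rule in mdEscaper matches a whole run of tildes with /~+/ and decides based on its length. This means a run of three or more tildes is never split into an escaped pair plus a leftover tilde, which is how the rule avoids lookbehinds. Below is the same callback on its own, as a plain-JavaScript sketch. The name escapeStrike is hypothetical:

```javascript
// Escape only runs of exactly two tildes, mirroring the mdEscaper rule;
// runs of any other length are returned unchanged.
const escapeStrike = (text) =>
  text.replace(/~+/g, (run) => (run.length === 2 ? `\\${run}` : run));

console.log(escapeStrike('a ~~b~~ ~~~c ~d')); // a \~~b\~~ ~~~c ~d
```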
The library was created to support complex text processing in situations where some configurability is desired. The initial need occurred when using the Turndown project. It is an excellent and flexible tool, but we faced several hard-to-solve difficulties with escaping special sequences.
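When text processing with several patterns is required, there are two approaches: iterative processing (chained String.prototype.replace calls) and a single regexp with alternations.
// Iterative processing, without UnionReplacer
return unsafe
.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(/"/g, '&quot;')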
The issue is not only the performance. Since the subsequent replacements are
performed on a partially-processed result, the developer has to ensure that
no intermediate steps affect the processing. E.g.:
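// Iterative processing, without UnionReplacer
return 'a "tricky" task'
.replace(/"/g, '&quot;')
.replace(/&/g, '&amp;')
// desired: 'a &quot;tricky&quot; task'
// actual: 'a &amp;quot;tricky&amp;quot; task'
So 'a "tricky" task' became 'a &amp;quot;tricky&amp;quot; task' instead of 'a &quot;tricky&quot; task'.
This particular task can be handled by carefully choosing the processing order.
But when the processing is context-dependent, iterative processing becomes
impossible, and a single regexp with alternations is needed:
// Regexp with alternations, without UnionReplacer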
const mdHighlightRe = /(^(`{3,}).*\n([\s\S]*?)(^\2`*\s*?$|(?![\s\S])))|((^|[^`])(`+)(?!`)(.*?[^`]\7)(?!`))|([*~=+_-`]+)|(\n)|(<)|(>)|(")|(&)/gm
// myHtmlEscape: an HTML-escaping function, e.g. the iterative one above
return md.replace(mdHighlightRe,
(match, fenced, fence1, pre, fence2, codespan, lead, delim, code, special, nl, lt, gt, quot, amp) => {
if (fenced) {
let block = `<b>${fence1}</b><br />\n`
block += `<pre>${myHtmlEscape(pre)}</pre><br />\n`
block += `<b>${fence2}</b>`
return block;
} else if (codespan) {
return `${myHtmlEscape(lead)}<code>${myHtmlEscape(code)}</code>`
} else if (special) {
return `<b>${special}</b>`
} else if (nl) {
return '<br />\n'
} // else etc.
});
Iterative processing is simple and readable, but it is very limited, and developers often pay for that simplicity with bugs.
While a regexp with alternations is the way to go, we wanted to provide an easy way to build it, use it, and even compose it dynamically at runtime.
Instead of using a single long regexp, developers can use an array
of individual smaller regexps, which are merged together by the
UnionReplacer class. Its usage is as simple as in the iterative processing
approach. The order of the pairs matters in the same way as the order of
alternatives in a regexp passed to String.prototype.replace(), namely:
// The order of replaces is important
const replacer1 = new UnionReplacer([
[/foo/, '(FOO)'], // when foo is matched, subsequent parts are not examined
[/.+/, '(nonfoo)'] // no matter that this also matches foo
]);
// replacer1 still eats the rest of the input when foo is not matched
const replacer2 = new UnionReplacer([
[/foo/, '(FOO)'],
[/.+?(?=foo|$)/, '(nonfoo)'] // non-greedy match up to next foo or line end
]);
const text = 'foobarfoobaz'
replacer1.replace(text); // (FOO)(nonfoo)
replacer2.replace(text); // (FOO)(nonfoo)(FOO)(nonfoo)
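UnionReplacer follows String.prototype.replace semantics, so the same first-alternative-wins behavior can be observed with a native regexp. This plain-JavaScript sketch uses hand-written alternations that are equivalent to replacer1 and replacer2. No library is involved:

```javascript
const input = 'foobarfoobaz';

// Report which alternative matched: the "foo" group is undefined otherwise.
const tag = (foo) => (foo !== undefined ? '(FOO)' : '(nonfoo)');

// Greedy second alternative: once "foo" fails, ".+" consumes the rest of the line.
const greedy = input.replace(/(foo)|.+/g, (match, foo) => tag(foo));

// Non-greedy second alternative stops before the next "foo" or the end.
const lazy = input.replace(/(foo)|.+?(?=foo|$)/g, (match, foo) => tag(foo));

console.log(greedy); // (FOO)(nonfoo)
console.log(lazy);   // (FOO)(nonfoo)(FOO)(nonfoo)
```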
Most importantly, the code was written with performance in mind.
At runtime, UnionReplacer performs one-pass processing driven by
a single native regexp.
The replacements are always done through an arrow function internally, even for
string replacements. Any performance impact of this is likely
engine-dependent.
Feel free to benchmark the library and please share the results.
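The core idea, one native regexp with every replacement applied through a function, can be sketched in a few dozen lines. The following is a simplified, hypothetical makeUnionReplacer and not the library's code. It assumes the individual patterns contain no backreferences, so no renumbering is needed. String replacements only support $$, $& and single-digit $1 to $9, and function replacements receive only the match and the pair's own groups.

```javascript
// Hypothetical sketch, not the library's implementation.

// Count capturing groups: the trailing empty alternative makes exec('') succeed.
function capturingGroups(re) {
  return new RegExp(`(?:${re.source})|`).exec('').length - 1;
}

// Turn a replacement string into a function of (match, ...ownGroups).
// Simplified: only $$, $& and single-digit $1..$9 are expanded.
function toReplacerFn(replacement) {
  if (typeof replacement === 'function') return replacement;
  return (match, ...groups) =>
    replacement.replace(/\$([$&1-9])/g, (token, key) => {
      if (key === '$') return '$';
      if (key === '&') return match;
      const value = groups[Number(key) - 1];
      return value === undefined ? '' : value;
    });
}

function makeUnionReplacer(pairs, flags = 'gm') {
  const entries = []; // position of each pair's wrapping group in the union
  let groupTotal = 0;
  const parts = pairs.map(([re, replacement]) => {
    const own = capturingGroups(re);
    entries.push({ index: groupTotal, own, fn: toReplacerFn(replacement) });
    groupTotal += 1 + own;
    return `(${re.source})`; // per-pattern flags are dropped here
  });
  const union = new RegExp(parts.join('|'), flags);

  return (str) => str.replace(union, (match, ...args) => {
    const groups = args.slice(0, groupTotal);
    // The wrapping group that participated identifies the matching pair.
    const entry = entries.find((e) => groups[e.index] !== undefined);
    const ownGroups = groups.slice(entry.index + 1, entry.index + 1 + entry.own);
    return entry.fn(match, ...ownGroups);
  });
}

const escapeHtml = makeUnionReplacer([
  [/</, '&lt;'],
  [/>/, '&gt;'],
  [/"/, '&quot;'],
  [/&/, '&amp;'],
]);
console.log(escapeHtml('a "tricky" task')); // a &quot;tricky&quot; task
```

Because the union is a single alternation, the '&' produced by the earlier replacements is never seen by the '&' rule, which matches the htmlEscapes example.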
ES2018 named capture groups work, with some limitations.
Octal escapes are not supported. Their syntax is the same as that of backreferences (\1), and
even in native regexps their interpretation depends on how many capturing groups the pattern has.
It is better to avoid them completely and use hex escapes (\xNN) instead.
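The clash between backreferences and legacy octal escapes can be reproduced with native regexps in non-Unicode mode (no u flag). The same \1 means U+0001 in a pattern without capturing groups, but it is a backreference in a pattern with one. Composing patterns into a union adds groups, so the meaning could change silently, while a hex escape stays unambiguous. Here is a plain-JavaScript sketch:

```javascript
// No capturing groups: "\1" is a legacy octal escape for U+0001.
const octal = /\1/;
// One capturing group: the same "\1" is a backreference to it.
const backref = /(a)\1/;
// Hex escape: always U+0001, whatever the group count.
const hex = /\x01/;

console.log(octal.test('\u0001')); // true
console.log(backref.test('aa'));   // true
console.log(hex.test('\u0001'));   // true
```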
Any flags in particular search regexps are ignored.
The resulting regexp always has the flags from the constructor call,
which default to global (g) and multiline (m).
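Assume the union is assembled from each pattern's source text. This is an assumption, though it is consistent with per-pattern flags being ignored. Under that assumption, only the flags given to the final RegExp can apply. Here is a plain-JavaScript sketch of that effect:

```javascript
const patterns = [/todo/i, /fixme/];
// Only `source` is used, so the `i` flag of /todo/i is lost.
const combined = new RegExp(patterns.map((re) => `(${re.source})`).join('|'), 'gm');

console.log(combined.flags);                          // gm
console.log('TODO fixme'.replace(combined, '<$&>')); // TODO <fixme>
```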
The npm package union-replacer receives a total of 23,099 weekly downloads, which classifies it as popular.
Its version release cadence and project activity were rated as unhealthy because the last version was released a year ago. It has 4 open-source maintainers collaborating on the project.