Research
Engineering
Bradley Meck Farias
February 17, 2023
The JavaScript RegExp class has a problem. Well, it has a few, but we're just focusing on one specific problem in this post. At Socket, we are currently experiencing a significant issue with RegExp that we would love to help fix.
The Socket code analysis engine spends a lot of time scanning files, and we try to do it quickly so that our customers get timely results. Regular expressions can often be quite fast and expressive, but there is a dark side to their design in JS. In particular, when the input arrives in pieces, it is hard to know whether a match is making progress or has stopped making progress.
Consider an overly naive check for eval in some file:
const eval_pattern = /\beval\b/g
// file contains: 'not_eval_at_all()'
for await (const chunk of file) {
eval_pattern.exec(chunk)
}
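This actually has a bug and a few problems.
- The bug: chunk boundaries can fall anywhere, so the file may arrive as "not_", "eval", "_at_all()". The start and end of the middle chunk count as word boundaries, so the pattern reports eval even though the file never calls it. Fixing this means buffering the input.
- Buffering is wasteful: once scanned, a prefix such as "not_" would never need to be searched again.
- Because of the g flag, the pattern carries state from one call to the next, eval_pattern.lastIndex in particular.
- The result of \b may change depending on what occurs after the match, like /a\b/.test('a') compared to /a\b/.test('ab'). So if a match ends at the end of a chunk, you still have to keep buffering past it; otherwise the code may falsely report a full match at a chunk boundary.

Additionally, it is possible to have other kinds of bugs, like false negatives, with different kinds of patterns when using code similar to the above: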
let safe_str = ''
for (const chunk of ['BAD_', 'WORD']) {
  // each chunk is cleaned in isolation, so a match spanning two chunks is never seen
  safe_str += chunk.replaceAll(/BAD_WORD/g, '')
}
// safe_str includes BAD_WORD, oops!
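One way to work around the BAD_WORD false negative today is to hold back the tail of each chunk until the next one arrives. The sketch below does this for a fixed literal only, not a general RegExp; stripLiteral is a hypothetical helper name, not something from our codebase or the proposal. It keeps at most needle.length - 1 characters in memory between chunks, which is only possible because the length of a literal is known up front.

```javascript
// Hypothetical helper: removes every occurrence of a literal `needle` from
// chunked input, including occurrences that span chunk boundaries.
function stripLiteral(chunks, needle) {
  if (needle.length === 0) throw new RangeError('needle must not be empty')
  const keep = needle.length - 1 // longest tail that could still begin a match
  let out = ''
  let carry = '' // unscanned tail held back from the previous chunk
  for (const chunk of chunks) {
    const buffered = carry + chunk
    let from = 0
    let at
    // Drop every complete match found in the buffered text.
    while ((at = buffered.indexOf(needle, from)) !== -1) {
      out += buffered.slice(from, at)
      from = at + needle.length
    }
    // Anything in the last `keep` characters may be the start of a match
    // that finishes in the next chunk, so hold it back.
    const safeEnd = Math.max(from, buffered.length - keep)
    out += buffered.slice(from, safeEnd)
    carry = buffered.slice(safeEnd)
  }
  // No more input: the held-back tail cannot complete a match anymore.
  return out + carry
}

console.log(JSON.stringify(stripLiteral(['BAD_', 'WORD'], 'BAD_WORD'))) // prints ""
```

For a general pattern with quantifiers, lookahead, or \b, there is no such fixed bound on how much to hold back, so the buffer can grow without limit and the same text gets rescanned on every chunk.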
That is why we are proposing a feature to TC39, and looking for a champion, that would let RegExp matching report incremental progress. You can look at our proposal and see that there are a variety of scenarios to account for. In particular, lookbehind, lookahead, and aggregation of quantifiers are important.
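To make "incremental progress" concrete, here is a minimal sketch of a hand-written matcher for a fixed literal that carries its progress from one chunk to the next. It is not the API from our proposal and not a built-in: createLiteralMatcher, feed, and state are hypothetical names, and the matching uses the classic Knuth-Morris-Pratt failure table rather than RegExp. It also ignores \b, lookaround, and quantifiers, which are exactly the hard cases.

```javascript
// Hypothetical sketch: a literal matcher whose progress survives across chunks.
function createLiteralMatcher(needle, initial = { matched: 0, consumed: 0 }) {
  if (needle.length === 0) throw new RangeError('needle must not be empty')
  // failure[i] = length of the longest proper prefix of needle.slice(0, i + 1)
  // that is also a suffix of it (standard KMP preprocessing).
  const failure = new Array(needle.length).fill(0)
  for (let i = 1, k = 0; i < needle.length; i++) {
    while (k > 0 && needle[i] !== needle[k]) k = failure[k - 1]
    if (needle[i] === needle[k]) k++
    failure[i] = k
  }
  let matched = initial.matched   // characters of needle matched so far
  let consumed = initial.consumed // total characters fed so far
  return {
    // Scans one chunk and returns the absolute start offsets of matches
    // that were completed inside it. Nothing already scanned is kept.
    feed(chunk) {
      const starts = []
      for (let i = 0; i < chunk.length; i++) {
        while (matched > 0 && chunk[i] !== needle[matched]) {
          matched = failure[matched - 1]
        }
        if (chunk[i] === needle[matched]) matched++
        consumed++
        if (matched === needle.length) {
          starts.push(consumed - needle.length)
          matched = failure[matched - 1] // keep going, overlaps allowed
        }
      }
      return starts
    },
    // The whole progress is two numbers, so it can be stored and resumed.
    state() {
      return { matched, consumed }
    },
  }
}

const matcher = createLiteralMatcher('eval')
matcher.feed('not_')                     // no match yet
matcher.feed('ev')                       // partial progress: 2 characters matched
console.log(matcher.feed('al_at_all()')) // prints [ 4 ]
```

No input is buffered and no character is scanned twice; the only thing kept between chunks is how far the match has progressed. Because that state is plain data, it could be serialized and passed back in as the second argument to resume later.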
Our hope is that meeting the goals listed in the proposal will mean less idle time wasted on I/O, less duplicated scanning of strings, and reduced memory pressure. If done right, it might even be possible to persist progress even if the JS VM is spun down! This is a very exciting thing that may greatly improve text processing in JS.