Research
Security News
Malicious npm Package Targets Solana Developers and Hijacks Funds
A malicious npm package targets Solana developers, rerouting funds in 2% of transactions to a hardcoded address.
Grammar2Regex converts context-free grammars to regular expressions.
Left- and right-linear grammars are always converted precisely.
Furthermore, some grammars with a regular language but a more general shape can also be converted.
For example, a rule <N> ::= a<A><N>
is not right-linear due to the occurrence of <A>
; but if <A>
does not reach <N>
and everything else is regular, we obtain a precise regular expression by computing the regular expression for <A>
and inserting it into the rule for <N>
.
For grammars with a non-regular language or a suboptimal shape, we "unwind" problematic expansion elements. This step loses information, but enables us to compute under-approximating regular expressions for arbitrary context-free grammars. Grammar2Regex returns expressions either in a custom ADT format suitable for further processing or as ReRef objects of the z3 SMT solver.
Consider the following grammar of XML attribute or tag name identifiers:
import string
from typing import Dict, List
from fuzzingbook.Grammars import srange
xml_id_grammar: Dict[str, List[str]] = {
"<start>": ["<id>"],
"<id>": [
"<id-no-prefix>",
"<id-with-prefix>"
],
"<id-no-prefix>": [
"<id-start-char>",
"<id-start-char><id-chars>",
],
"<id-with-prefix>": ["<id-no-prefix>:<id-no-prefix>"],
"<id-start-char>": srange("_" + string.ascii_letters),
"<id-chars>": ["<id-char>", "<id-char><id-chars>"],
"<id-char>": ["<id-start-char>"] + srange("-." + string.digits),
}
Our chosen grammar format is the one from the Fuzzing Book.
This grammar can be precisely converted into a regular expression:
from grammar_to_regex.cfg2regex import RegexConverter
converter = RegexConverter(xml_id_grammar, compress_unions=True)
print(converter.to_regex('<start>', convert_to_z3=False))
which yields an object of type grammar_to_regex.regex.Regex
, in pretty print:
((["-" .. "."] | ["0" .. "9"] | ["A" .. "Z"] | "_" | ["a" .. "z"])* • (["-" .. "."] | ["0" .. "9"] | ["A" .. "Z"] | "_" | ["a" .. "z"]))
You can also obtain an output in the regular expression format of the z3 SMT solver
if you need this for further processing (simply omit the convert_to_z3=False
or set that parameter to
True
).
Grammar2Regex requires Python 3.10.
To install Grammar2Regex, run
pip install grammar_to_regex
To install it locally for testing, check this repository out and run
pip install -e .[test]
This project is licensed under the GNU GENERAL PUBLIC LICENSE v3. It is maintained by Dominic Steinhöfel.
FAQs
Conversion of a (sub) grammar to (approximated) regular expressions.
We found that grammar-to-regex demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
Security News
A malicious npm package targets Solana developers, rerouting funds in 2% of transactions to a hardcoded address.
Security News
Research
Socket researchers have discovered malicious npm packages targeting crypto developers, stealing credentials and wallet data using spyware delivered through typosquats of popular cryptographic libraries.
Security News
Socket's package search now displays weekly downloads for npm packages, helping developers quickly assess popularity and make more informed decisions.