
mo-parsing
A fork of pyparsing for faster parsing
This is a PyPI package:
pip install mo-parsing
This module allows you to define a PEG parser using predefined patterns and Python operators. Here is an example:
>>> from mo_parsing import Word
>>> from mo_parsing.utils import alphas
>>>
>>> greet = Word(alphas)("greeting") + "," + Word(alphas)("person") + "!"
>>> result = greet.parse_string("Hello, World!")
The result can be accessed as a nested list
>>> list(result)
['Hello', ',', 'World', '!']
The result can also be accessed as a dictionary
>>> dict(result)
{'greeting': 'Hello', 'person': 'World'}
Read the pyparsing documentation for more
Whitespace Context

The mo_parsing.whitespaces.CURRENT is used during parser creation: it effectively defines what "whitespace" to skip during parsing, with additional features to simplify the language definition. You declare a "standard" Whitespace like so:
with Whitespace() as whitespace:
# PUT YOUR LANGUAGE DEFINITION HERE (space, tab and CR are "whitespace")
If you are declaring a large language, and you want to minimize indentation, and you are careful, you may also use this pattern:
whitespace = Whitespace().use()
# PUT YOUR LANGUAGE DEFINITION HERE
whitespace.release()
The whitespace can be used to set global parsing parameters, like:
- set_whitespace() - set the ignored characters (default: "\t\n ")
- add_ignore() - include whole patterns that are ignored (like comments)
- set_literal() - set the definition for what Literal() means
- set_keyword_chars() - for the default Keyword() (important for defining word boundaries)

The results of parsing are in ParseResults, and are in the form of an n-ary tree, with the children found in ParseResults.tokens. Each ParseResult.type points to the ParserElement that made it. In general, if you want to get fancy with post-processing (or in a parse_action), you will be required to navigate the raw tokens to generate a final result.
There are some convenience methods:

- __iter__() - allows you to iterate through the parse results in depth-first search. Empty results are skipped, and Grouped results are treated as atoms (which can be further iterated if required)
- name - a convenient property for ParseResults.type.token_name
- __getitem__() - allows you to jump into the parse tree to the given name. This is blocked by any names found inside Grouped results (because groups are considered atoms)

Parse actions are methods that run after a ParserElement finds a match.
- Parameters must be in (tokens, index, string) order (the opposite of pyparsing)
- If the parse action returns None, then the result is the original tokens
- If it returns some other value, the value is packaged in a ParseResult with the same type as tokens
- If it returns a ParseResult, then it is accepted, even if it belongs to some other pattern

For example:

integer = Word("0123456789").add_parse_action(lambda t, i, s: int(t[0]))
result = integer.parse_string("42")
assert (result[0] == 42)
For a slightly shorter specification, you may use the / operator and pass only the parameters you need:
integer = Word("0123456789") / (lambda t: int(t[0]))
result = integer.parse_string("42")
assert (result[0] == 42)
The PEG style of mo-parsing (inherited from pyparsing) makes for a very expressive and readable specification, but debugging a parser is still hard. To look deeper into what the parser is doing, use the Debugger:
with Debugger():
expr.parse_string("my new language")
The debugger will print out details of what is happening.
This should help you isolate the exact position where your grammar is failing.
mo-parsing can parse and generate regular expressions. ParserElement has a __regex__() function that returns the regular expression for the given grammar; which works up to a limit, and is used internally to accelerate parsing. The Regex class parses regular expressions into a grammar; it is used to optimize parsing, and you may find it useful to decompose regular expressions that look like line noise.
This fork was originally created to support faster parsing for mo-sql-parsing. Since then it has deviated sufficiently to be its own collection of parser specification functions. Here are the differences:
- Whitespace, which controls parsing context and whitespace. It replaces the whitespace-modifying methods of pyparsing
- A trailing asterisk ("*") could be used in pyparsing to indicate multi-values are expected; this is not allowed in mo-parsing: all values are multi-values
- expr.add_parse_action(action) creates a new ParserElement, so it must be assigned to a variable or it is lost. This is the biggest source of bugs when converting from pyparsing
- Faster parsing
If you plan to extend or enhance this code, please see the README in the tests directory