Security News
Node.js EOL Versions CVE Dubbed the "Worst CVE of the Year" by Security Experts
Critics call the Node.js EOL CVE a misuse of the system, sparking debate over CVE standards and the growing noise in vulnerability databases.
mini-parse
Advanced tools
**MiniParse** is a small TypeScript parser combinator library with an efficient regex based lexer.
MiniParse is a small TypeScript parser combinator library with an efficient regex based lexer.
or()
, repeat()
, and seq()
.
It's just TypeScript so it's easy to mix with your existing code,
IDE, test frameworks, etc.Typically you can write a parser by mixing combinator functions from the library.
Here's a parser for a simple sum expression:
export const simpleSum = seq(num, or("+", "-"), num);
Here's a parser for nested block comments:
export const blockComment: Parser<void> = seq(
"/*",
repeat(or(() => blockComment, anyNot("*/"))),
req("*/")
);
The example above uses the combinators:
seq()
, repeat()
, or()
, anyNot()
, and req()
.
More combinators are available and documented in ParserCombinator.ts
To any combinator that accepts a parser as an argument, you can pass:
Parsing relies on a lexer to divide the source string into parts, called tokens. (You could produce a token per character, but that would be comparatively slow)
Here's an example token matcher to identify three kinds of tokens for parsing simple arithmetic expressions.
export const simpleTokens = tokenMatcher({
number: /\d+/,
symbol: matchOneOf("( ) ^ + - * /"),
ws: /\s+/,
});
To create a lexer that walks over a source text producing tokens, use:
const text = "3 + 4";
const lexer = matchingLexer(text, simpleTokens);
By default, the lexer will skip over whitespace tokens that you designate with
the name 'ws'. You can configure whether and which tokens to skip statically by
passing the set of names skip when you construct the matchingLexer
, or
dynamically while parsing by using the tokensIgnore()
combinator.
To run a parser, just give it a lexer attached to your source text.
const result = simpleSum.parse({ lexer });
The result will contain the combined results of the parsers, in this case
["3", "+", "4"]
.
Typically, it's convenient to use .map()
to select the relevant parts from a
successful parse and do a bit of format conversion.
This parser will return a number rather than a string:
const int = num.map((r) => parseInt(r.value));
Here's an example that even does some computation, and returns a numeric sum or
difference of the whole expression. It parses the same text as simpleSum
above, but converts to numbers and then adds or subtracts.
// return the numeric sum, rather than a sequence of strings
export const sumResults = seq(int, or("+", "-"), int).map((r) => {
const [a, op, b] = r.value;
return op === "+" ? a + b : a - b;
});
Note that .map()
is only called on successful parses of the mapped expression,
if the expression fails, the parser will backtrack and try any alternatives in
the grammar and .map()
will not be called on the failed part of the parse.
For larger parsers, you'll typically convert the parsed text into an intermediate form, sometimes called an Abstract Syntax Tree.
.parse()
allows you to pass an application specific data structure that's
visible in .map()
for convenience in building the abstract syntax tree with
each successfully parsed element.
type ASTElem = BinOpElem;
interface BinOpElem {
kind: "binOp";
left: number | BinOpElem;
right: number | BinOpElem;
op: "+" | "-";
}
export const sumElem = seq(int, or("+", "-"), int).map((r) => {
const [a, op, b] = r.value;
const binOpElem: BinOpElem = {
kind: "binOp",
left: a,
right: b,
op: op as "+" | "-",
};
r.app.state.push(binOpElem);
});
For complicated nested or repeating parsers,
it's convenient to assign names to particular results of interest.
You can use a Symbol
or a string to name a result using the .named()
method on all parsers.
Multiple results with the same name are accumulated into an array.
const op = or("+", "-");
export const namedSum = seq(
int,
repeat(seq(op, int).named("opRights")) // accumulate an array of [op, int] pairs
).map((r) => {
const { opRights } = r.named;
const left = r.value[0];
if (!opRights) return left;
return opRights.reduce((acc, opRight) => {
const [op, right] = opRight;
return op === "+" ? acc + right : acc - right;
}, left);
});
For debugging your grammar, it's useful to debug your grammar in pieces. One of the nice features of parser combinators is that every part of the grammar is independently testable.
To print out the progress of parsing:
enableTracing()
to turn on the tracing facility (normally off and
removed from prod builds).trace(opts?)
on any Parser. See TraceOptions
for options
controlling trace levels..traceName()
or
setTraceName()
.
.traceName()
on any parser to set the trace name for debugging.setTraceName()
protected by a tracing
global
and the javascript bundler will remove the code in production builds to
save a few bytes.if (tracing) {
const names: Record<string, Parser<unknown>> = {
fnCall,
fnParam,
fnParamList,
fnDecl,
};
Object.entries(names).forEach(([name, parser]) => {
setTraceName(parser, name);
});
Calculator - classic PEG style parser in MiniParse syntax.
Calculator with Results calculator example parser with inline calculation of results.
WGSL-D parsing selected parts of the WebGPU
WGSL shader language
along with #import
and #export
extensions.
Sometimes it's nice to let the grammar choose a different tokenizer to parse different sections of the source text. For example, to parse a programming language with quotes, you'll probably want a different tokenizer for the text inside of quotes:
const quote = seq('"', tokens(quoteTokens, repeat(nonQuote)), '"');
In unusual cases, it can be handy to choose the next parser based
on information outside the grammar itself.
For example, to handle an #ifdef
style preprocessor, the parsing
proceeds differently depending on whether the #ifdef
is true or false,
and information outside the source text may be required to decide.
If you want to check a runtime dictionary to decide which parser to
use for the next tokens, than .toParser
is of use.
If the language you're parsing has some elements that can appear almost anywhere, it'd be awkward to mention those elements at every possible position in the grammar. Examples include nested block comments, comments containing semantic info, etc.
To handle pervasive elements, MiniParse offers an unusual feature called preparsing that allows you to stack parsers. First the pre-parser will run, and if it fails to match at the current position, then the main parser will run.
const p = preParse(blockComments, mainParser);
Multiple preparsers can be attached. Preparsing can also be temporarily disabled in the grammar, e.g. to disable comment skipping inside quotes.
Save preparsing for special situations. If the pervasive elements are easy to find and can be skipped, then adding a few token types to skip in the lexer is simpler and faster. That's typically the approach for white space.
There are two application specific objects that are passed to every parser:
state
and context
.
app.state
, as mentioned above, is handy for accumulating application
results of successful parses.
app.context
is useful to store ephemeral application state discovered
during parsing.
Like app.state
, app.context
is just for applications -
MiniParse doesn't use it
and applications can read and write it using the .map()
method on any parser.
But unlike app.state
MiniParse will reset app.context
when a sub-parser fails and backtracks.
app.context
is passed to child parsers, but doesn't accumulate to parent parsers.
An example of using app.context
is for parsing nested #ifdef
#endif
clauses.
app.context
is a good place to store
the stack of active/inactive states discovered while
parsing.
Left recursive rules are typically disallowed in top down parsers, including MiniParse. In the parser combinator setting, it's obvious why - a function calling itself in its first statement is going to recurse forever. Best to write the grammar so that recursion is in the middle or at the end. See the block comment example or the calculator example.
PEG parsers like MiniParse can be sped up using a memoization algorithm called packrat parsing.
Tratt describes a technique to allow some left recursive rules, based on Warth's proposal for left recursion with packrat parsing. Rossum also has pursued this approach for Python. But per Tratt, note that the resulting parse order is not as predictable, and there are issues with rules that are simultaneously left and right recursive.
Allowing a regex as a parser argument would be convenient to avoid the need for a separate lexer in some cases.
Is MiniParse right for your project? Consider the alternatives:
Full Custom Parser - maximum speed and ultimate malleability, lots of work.
For maximum speed and control, write a dedicated parser directly in Typescript. This is the most effort, but if you're writing a production compiler and need to squeeze every millisecond, it's worth it. Otherwise use a parser generator tool suite or a parser combinator library.
Parser Generator - high speed, some work to adopt.
Parser generators statically analyze and precompile a grammar description language. These mature tools can be a bit big, but there's lots of documentation, rich ecosystems of example code and support tools.
Worthy examples include: Nearley, Lezer, Antlr, Chevrotain, Tree-sitter, or Ohm. Each parser generator has its own textual format to describe the grammar. The library compiles into an execution format before parsing. Each of the above libraries uses a different base algorithm (Earley, GLR, LL, Packrat) with different tradeoffs, but all have evolved robust features to classic parsing problems of error recovery, left recursion, producing parse results, tracing, etc.
Parser generators are typically more complicated to adopt than parser combinator libraries, less flexible, require a separate build step, and they're more code than typical parser combinators. But for demanding parsing jobs, the complexity of a parser generator tool is easily worth the investment.
Parser Combinators - lower speed, most flexibility, lightweight adoption.
Parser combinators define a grammar by mixing simple TypeScript functions provided by the library or written by the user (aka combinator functions). Execution of the grammar normally means simply running these functions. The simplicity makes parser combinators flexible and easy to adopt - you're using TypesScript for everything.
Parser combinators typically interpret rather than compiling the grammar in advance, so they're slower to run. But they're plenty fast enough for most purposes.
In the Parser Combinator category, MiniParse has a few interesting features and is notably lightweight. As you're parser shopping, also consider other worthy and more mature libraries in the TypeScript parser combinator category like: ts-parsec and ParJS.
FAQs
**MiniParse** is a small TypeScript parser combinator library with an efficient regex based lexer.
We found that mini-parse demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 0 open source maintainers collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Critics call the Node.js EOL CVE a misuse of the system, sparking debate over CVE standards and the growing noise in vulnerability databases.
Security News
cURL and Go security teams are publicly rejecting CVSS as flawed for assessing vulnerabilities and are calling for more accurate, context-aware approaches.
Security News
Bun 1.2 enhances its JavaScript runtime with 90% Node.js compatibility, built-in S3 and Postgres support, HTML Imports, and faster, cloud-first performance.