Security News
Node.js EOL Versions CVE Dubbed the "Worst CVE of the Year" by Security Experts
Critics call the Node.js EOL CVE a misuse of the system, sparking debate over CVE standards and the growing noise in vulnerability databases.
bnf-parser
Advanced tools
BNF-Parser is a simple library for generating syntax parsers based on deterministic BNF syntax descriptions. It includes a few changes from standard BNF forms to help produce cleaner syntax tree outputs.
First, provide the BNF representation of your language and parse it into a syntax tree. Then, compile the tree into a representation that is ready to parse syntax trees for the compiled language.
import { BNF, Compile } from "bnf-parser";
let result = BNF.parse(LANGUAGE_BNF);
let tree = Compile(result);
let syntax = tree.parse(FILE);
You can save a compiled BNF as a JSON file and reload it later:
// Save the syntax tree
fs.writeFileSync(path, JSON.stringify(tree.serialize()));
// Load the compiled syntax tree
let tree = new Parser(
JSON.parse( fs.readFileSync(path, 'utf8') )
);
program ::= %w* ( def %w* )+ ;
# Consumes a single wild character
any ::= !"" ;
# Whitespace
w ::= comment | " " | "\t" | "\n" | "\r" ;
comment ::= "#" !"\n"* "\n" ;
name ::= ...( letter | digit | "_" )+ ;
letter ::= "a"->"z" | "A"->"Z" ;
digit ::= "0"->"9" ;
constant ::= single | double ;
double ::= %"\"" ( ( "\\" ...any ) | !"\""+ )* %"\"" ;
single ::= %"\'" ( ( "\\" ...any ) | !"\'"+ )* %"\'" ;
def ::= ...name %w+ %"::=" %w* expr %w* %";" ;
expr ::= expr_arg %w* ( ...expr_infix? %w* expr_arg %w* )* ;
expr_arg ::= expr_prefix ( constant | expr_brackets | ...name ) ...expr_suffix? ;
expr_prefix ::= "%"? "..."? "!"? ;
expr_infix ::= "->" | "|" ;
expr_suffix ::= "*" | "?" | "+" ;
expr_brackets ::= %"(" %w* expr %w* %")" ;
Code | Result |
---|---|
\b | Backspace |
\f | Form Feed |
\n | New Line |
\r | Carriage Return |
\t | Horizontal Tab |
\v | Vertical Tab |
- | Unrecognised escapes will result in just the character after the slash |
?
, +
, *
Only one repetition mark should exist per argument.
term # once
term? # one or zero
term+ # at least once
term* # zero or more
%
%term
The omit operator is placed in front of a single term and must be the front-most operator, preceding any not
or gather
operators. It causes the syntax under this operator to be removed from the final syntax tree but still remain part of syntax validation.
!
!term
The not operator is used to consume a single token as long as the term it precedes does not match. It can also be used with repetition markers to consume multiple tokens that do not match the specified term.
!"a" # Matches any single character except "a"
!"a"* # Matches any sequence of characters excluding "a"
!"a"+ # Matches any sequence of characters with at least one character, excluding "a"
This can be very powerful when used in conjunction with other operations such as select
non_vowel ::= !(
"a" | "e" | "i" | "o" | "u" |
"A" | "E" | "I" | "O" | "U"
) ; # this will match any non-vowel, which includes non-letter characters
->
term -> term
The range operator is used to define a range between two single-length constants, allowing the parser to match any character within the specified range. This is useful for simplifying character ranges in BNF descriptions.
"a"->"z" # Matches a single character within the range "a" to "z"
"a"->"z"* # Matches a sequence of characters within the range "a" to "z"
The range operator can be combined with repetition markers to control the number of characters consumed within the specified range.
This is a pre-initialised BNF parser, which can be given a BNF string input.
const BNF: Parser;
The Parser
class is used to parse input based on the syntax tree generated by the BNF
class.
class Parser {
constructor(blob: any)
// Attempts to parse a language into a syntax tree
parse(
input : string, // The text to be parsed
partial : boolean = false, // Whether the entire string needs to be consucanmed
entry : string = "program" // Where parsing should start from in the BNF definition
): SyntaxNode | ParseError
setVerbose(mode: boolean) { }
}
The Compile
function is given a SyntaxNode
tree generated from the pre-initialized BNF
parser - from which it can generate a new parser.
function Compile(tree: SyntaxNode): Parser
The SyntaxNode
class represents a node in the generated syntax tree.
The type value of a SyntaxNode
is typically the name of the term being matched - i.e. the root node will be program
by default.
However it can be a generated name in the case of brackets (...)
and repetition markers such as +
(...)+
.
class SyntaxNode {
type : string;
value : SyntaxValue;
ref : ReferenceRange;
constructor(type: string, value: SyntaxValue, ref: ReferenceRange) {};
// Merges all of it's child syntax node values into a single string
flat(): string {};
}
type SyntaxValue = SyntaxNode[] | string;
The ParseError
class represents an error that occurred during parsing of the input.
class ParseError {
stack : string[]
msg : string
ref : ReferenceRange
constructor(msg: string, ref: ReferenceRange) { }
// Adds a string to the top of the call stack
// (for internal use)
add_stack(elm: string) { }
// If this error contains a pass stack
// (for internal use)
hasStack(): boolean { }
// Stringifies itself for printing/debug
toString() { }
}
The Reference
class is a cursor to describe a certain point in a string input.
class Reference {
index : number;
line : number;
col : number;
constructor(line: number, col: number, index: number) { }
// Will shift the reference one position forwards
advance(newline: boolean = false) { }
// Returns a deep copy of itself
clone(): Reference { }
// Stringifies itself for printing/debug
toString(): string { }
}
The ReferenceRange
class uses two Reference
s to describe a range within the text input.
class ReferenceRange {
start : Reference;
end : Reference;
constructor(from: Reference, to: Reference) { }
// Alters itself so the rang supplied now fits within the range of itself
// Basically takes the min from, and the max to references and applies them to itself
span(other: ReferenceRange) { }
// Returns a deep copy of itself
clone(): ReferenceRange { }
// Stringifies itself for printing/debug
toString(): string { }
}
There are two primary abstractions in generating syntax trees from BNF: sequences and selections.
A sequence is an ordered list of elements that form a match. A top-level sequence (right side of the ::=
) will resolve with the SyntaxNode.type
of the matching name (the name on the left of the ::=
). Sub-sequences within parentheses ()
will appear as a syntax node with the name (...)
and will be evaluated similarly to the top level.
If a repetition marker like name+ is used, additional nodes with the type (...)+
will be added, and their children will represent the number of times the pattern matched.
A selection will resolve as the syntax tree of the first matching option. For example, if you have the selection statement variable | number
, and the parser matches a variable
, it would be the same as having a variable
in that position in the sequence.
The selection statement will always consume the first valid option, so you should order your selection statements accordingly. For example: i.e.
program ::= "a" | "aa" ;
In this case, providing the input "aa" will fail, as it will consume the single "a", and since there is no repetition, the program will end, leaving the second "a" unconsumed. As the syntax did not parse the whole string, this is considered an error. See Parser for information on allowing partial matches.
Omit statements within a sequence will be removed, and they will not be present in the output syntax tree. However, they are still crucial for a successful match. If they are within a selection, they will be visible with SyntaxNode.type
of omit
and no child nodes.
Gather does not change the output syntax tree structure relative to the sequence or selection it is within. However, it combines all of its child nodes into a single string. Note that this reflects the effects of any omit operations within the child nodes.
The SyntaxNode.values
will be a single string containing all characters consumed until it matched the target expression of the repetition limit is reached.
Ranges will appear with the SyntaxNode.type
of range
with SyntaxNode.value
being a single string with the characters consumed by this expression, accounting for any repetition markers (so a range with +
will be a string of length at least one).
A repetition marker creates its own node in the syntax tree, with its children representing the value of each repetition. The SyntaxNode.type
value of this node will be the (...)
followed by the repetition marker used, such as: (...)+
, (...)*
, or (...)?
.
Literals will appear with the SyntaxNode.type
of literal
and SyntaxNode.value
as an exact copy of the literal as a string.
FAQs
Deterministic BNF compiler/parser
The npm package bnf-parser receives a total of 10,143 weekly downloads. As such, bnf-parser popularity was classified as popular.
We found that bnf-parser demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 0 open source maintainers collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Critics call the Node.js EOL CVE a misuse of the system, sparking debate over CVE standards and the growing noise in vulnerability databases.
Security News
cURL and Go security teams are publicly rejecting CVSS as flawed for assessing vulnerabilities and are calling for more accurate, context-aware approaches.
Security News
Bun 1.2 enhances its JavaScript runtime with 90% Node.js compatibility, built-in S3 and Postgres support, HTML Imports, and faster, cloud-first performance.