New Case Study:See how Anthropic automated 95% of dependency reviews with Socket.Learn More →

bnf-parser

Package Overview

Dependencies

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

bnf-parser

Deterministic BNF compiler/parser

3.1.5
Source
npm

Version published: 2 years ago

Weekly downloads: 11K; decreased by-11.97%

Maintainers: 1

Weekly downloads

Created: 5 years ago

Source

BNF-Parser

BNF-Parser is a simple library for generating syntax parsers based on deterministic BNF syntax descriptions. It includes a few changes from standard BNF forms to help produce cleaner syntax tree outputs.

Example

First, provide the BNF representation of your language and parse it into a syntax tree. Then, compile the tree into a representation that is ready to parse syntax trees for the compiled language.

import { BNF, Compile } from "bnf-parser";

let result = BNF.parse(LANGUAGE_BNF);
let tree = Compile(result);

let syntax = tree.parse(FILE);

You can save a compiled BNF as a JSON file and reload it later:

// Save the syntax tree
fs.writeFileSync(path, JSON.stringify(tree.serialize()));

// Load the compiled syntax tree
let tree = new Parser(
  JSON.parse( fs.readFileSync(path, 'utf8') )
);

API

BNF Syntax

program ::= %w* ( def %w* )+ ;

# Consumes a single wild character
any ::= !"" ;

# Whitespace
w ::= comment | " " | "\t" | "\n" | "\r" ;
	comment ::= "#" !"\n"* "\n" ;

name ::= ...( letter | digit | "_" )+ ;
	letter ::= "a"->"z" | "A"->"Z" ;
	digit ::= "0"->"9" ;

constant ::= single | double ;
	double ::= %"\"" ( ( "\\" ...any ) | !"\""+ )* %"\"" ;
	single ::= %"\'" ( ( "\\" ...any ) | !"\'"+ )* %"\'" ;

def ::= ...name %w+ %"::=" %w* expr %w* %";" ;

expr ::= expr_arg %w* ( ...expr_infix? %w* expr_arg %w* )* ;
	expr_arg ::= expr_prefix ( constant | expr_brackets | ...name ) ...expr_suffix? ;
	expr_prefix ::= "%"? "..."? "!"? ;
	expr_infix  ::= "->" | "|" ;
	expr_suffix ::= "*" | "?" | "+" ;
	expr_brackets ::= %"(" %w* expr %w* %")" ;

Escape Codes

Code	Result
`\b`	Backspace
`\f`	Form Feed
`\n`	New Line
`\r`	Carriage Return
`\t`	Horizontal Tab
`\v`	Vertical Tab
-	Unrecognised escapes will result in just the character after the slash

Repetition Operator `?`, `+`, `*`

Only one repetition mark should exist per argument.

term # once
term? # one or zero
term+ # at least once
term* # zero or more

Resulting syntax layout

Omit Operator `%`

%term

The omit operator is placed in front of a single term and must be the front-most operator, preceding any not or gather operators. It causes the syntax under this operator to be removed from the final syntax tree but still remain part of syntax validation.

Resulting syntax layout

Not Operator `!`

!term

The not operator is used to consume a single token as long as the term it precedes does not match. It can also be used with repetition markers to consume multiple tokens that do not match the specified term.

!"a"   # Matches any single character except "a"
!"a"*  # Matches any sequence of characters excluding "a"
!"a"+  # Matches any sequence of characters with at least one character, excluding "a"

This can be very powerful when used in conjunction with other operations such as select

non_vowel ::= !(
  "a" | "e" | "i" | "o" | "u" |
  "A" | "E" | "I" | "O" | "U"
) ; # this will match any non-vowel, which includes non-letter characters

Resulting syntax layout

Range Operator `->`

term -> term

The range operator is used to define a range between two single-length constants, allowing the parser to match any character within the specified range. This is useful for simplifying character ranges in BNF descriptions.

"a"->"z"  # Matches a single character within the range "a" to "z"
"a"->"z"* # Matches a sequence of characters within the range "a" to "z"

The range operator can be combined with repetition markers to control the number of characters consumed within the specified range.

Resulting syntax layout

Imports

BNF

This is a pre-initialised BNF parser, which can be given a BNF string input.

const BNF: Parser;

Parser

The Parser class is used to parse input based on the syntax tree generated by the BNF class.

class Parser {
  constructor(blob: any)

  // Attempts to parse a language into a syntax tree
  parse(
    input   : string,             // The text to be parsed
    partial : boolean = false,    // Whether the entire string needs to be consucanmed
    entry   : string  = "program" // Where parsing should start from in the BNF definition
  ): SyntaxNode | ParseError
  setVerbose(mode: boolean) { }
}

Compile

The Compile function is given a SyntaxNode tree generated from the pre-initialized BNF parser - from which it can generate a new parser.

function Compile(tree: SyntaxNode): Parser

SyntaxNode

The SyntaxNode class represents a node in the generated syntax tree.

The type value of a SyntaxNode is typically the name of the term being matched - i.e. the root node will be program by default. However it can be a generated name in the case of brackets (...) and repetition markers such as + (...)+.

class SyntaxNode {
  type  : string;
  value : SyntaxValue;
  ref   : ReferenceRange;

  constructor(type: string, value: SyntaxValue, ref: ReferenceRange) {};

  // Merges all of it's child syntax node values into a single string
  flat(): string {};
}

type SyntaxValue = SyntaxNode[] | string;

ParseError

The ParseError class represents an error that occurred during parsing of the input.

class ParseError {
  stack : string[]
  msg   : string
  ref   : ReferenceRange

  constructor(msg: string, ref: ReferenceRange) { }

  // Adds a string to the top of the call stack
  //   (for internal use)
  add_stack(elm: string) { }

  // If this error contains a pass stack
  //   (for internal use)
  hasStack(): boolean { }

  // Stringifies itself for printing/debug
  toString() { }
}

Reference

The Reference class is a cursor to describe a certain point in a string input.

class Reference {
  index : number;
  line  : number;
  col   : number;

  constructor(line: number, col: number, index: number) { }

  // Will shift the reference one position forwards
  advance(newline: boolean = false) { }

  // Returns a deep copy of itself
  clone(): Reference { }

  // Stringifies itself for printing/debug
  toString(): string { }
}

Reference Range

The ReferenceRange class uses two References to describe a range within the text input.

class ReferenceRange {
  start : Reference;
  end   : Reference;

  constructor(from: Reference, to: Reference) { }

  // Alters itself so the rang supplied now fits within the range of itself
  //  Basically takes the min from, and the max to references and applies them to itself
  span(other: ReferenceRange) { }

  // Returns a deep copy of itself
  clone(): ReferenceRange { }

  // Stringifies itself for printing/debug
  toString(): string { }
}

Syntax Tree

There are two primary abstractions in generating syntax trees from BNF: sequences and selections.

Sequences

A sequence is an ordered list of elements that form a match. A top-level sequence (right side of the ::=) will resolve with the SyntaxNode.type of the matching name (the name on the left of the ::=). Sub-sequences within parentheses () will appear as a syntax node with the name (...) and will be evaluated similarly to the top level.

If a repetition marker like name+ is used, additional nodes with the type (...)+ will be added, and their children will represent the number of times the pattern matched.

Select

A selection will resolve as the syntax tree of the first matching option. For example, if you have the selection statement variable | number, and the parser matches a variable, it would be the same as having a variable in that position in the sequence.

The selection statement will always consume the first valid option, so you should order your selection statements accordingly. For example: i.e.
program ::= "a" | "aa" ;
In this case, providing the input "aa" will fail, as it will consume the single "a", and since there is no repetition, the program will end, leaving the second "a" unconsumed. As the syntax did not parse the whole string, this is considered an error. See Parser for information on allowing partial matches.

Omit

Omit statements within a sequence will be removed, and they will not be present in the output syntax tree. However, they are still crucial for a successful match. If they are within a selection, they will be visible with SyntaxNode.type of omit and no child nodes.

Gather

Gather does not change the output syntax tree structure relative to the sequence or selection it is within. However, it combines all of its child nodes into a single string. Note that this reflects the effects of any omit operations within the child nodes.

Not

The SyntaxNode.values will be a single string containing all characters consumed until it matched the target expression of the repetition limit is reached.

Range

Ranges will appear with the SyntaxNode.type of range with SyntaxNode.value being a single string with the characters consumed by this expression, accounting for any repetition markers (so a range with + will be a string of length at least one).

Repetition

A repetition marker creates its own node in the syntax tree, with its children representing the value of each repetition. The SyntaxNode.type value of this node will be the (...) followed by the repetition marker used, such as: (...)+, (...)*, or (...)?.

Literal

Literals will appear with the SyntaxNode.type of literal and SyntaxNode.value as an exact copy of the literal as a string.

Keywords

FAQs

What is bnf-parser?

Is bnf-parser popular?

Is bnf-parser well maintained?

Package last updated on 23 Jun 2023

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

bnf-parser

BNF-Parser

Example

API

BNF Syntax

Escape Codes

Repetition Operator ?, +, *

Omit Operator %

Not Operator !

Range Operator ->

Imports

BNF

Parser

Compile

SyntaxNode

ParseError

Reference

Reference Range

Syntax Tree

Sequences

Select

Omit

Gather

Not

Range

Repetition

Literal

Keywords

Related posts

cURL Project and Go Security Teams Reject CVSS as Broken

Bun 1.2 Released with Improved Node.js Compatibility and Built-in S3 Object Support

Repetition Operator `?`, `+`, `*`

Omit Operator `%`

Not Operator `!`

Range Operator `->`