bnf-parser
Advanced tools
Comparing version 3.1.3 to 3.1.4
@@ -70,2 +70,5 @@ "use strict"; | ||
} | ||
if (this.value.length == 0) { | ||
return null; | ||
} | ||
return this.value[this.value.length - 1].getReach(); | ||
@@ -72,0 +75,0 @@ } |
# Changelog | ||
## Version 3.1.4 | ||
### Tweaks: | ||
- [x] Added extra test case `Sequalize` | ||
- [x] Clarified readme documentation further | ||
- [x] Synced up some documentation inconsistencies between NPM and Github | ||
- [x] Fixed typo in previous version number's changelog | ||
- [x] Got ChatGPT-4 to read my homework (read over the readme and helped clarify a few things) | ||
## Version 3.1.3 | ||
### Fixes: | ||
- [x] Fixed missing bound check on getting reach of a SyntaxNode | ||
### Tweaks: | ||
- [x] Clarified readme documentation | ||
## Version 3.1.2 | ||
@@ -4,0 +21,0 @@ ### Fixes: |
{ | ||
"name": "bnf-parser", | ||
"version": "3.1.3", | ||
"version": "3.1.4", | ||
"description": "Deterministic BNF compiler/parser", | ||
"main": "./bin/index.js", | ||
"scripts": { | ||
"test": "node ./test", | ||
"test": "npm run build & node ./test", | ||
"build": "run-s build:*", | ||
@@ -9,0 +9,0 @@ "build:ts": "tsc", |
177
readme.md
# BNF-Parser <!-- no toc --> | ||
[![Test via NPM](https://github.com/AjaniBilby/BNF-parser/actions/workflows/npm-load-check.yml/badge.svg?branch=master)](https://github.com/AjaniBilby/BNF-parser/actions/workflows/npm-load-check.yml) | ||
[![Test](https://github.com/AjaniBilby/BNF-parser/actions/workflows/test.yml/badge.svg?branch=master)](https://github.com/AjaniBilby/BNF-parser/actions/workflows/test.yml) | ||
BNF-Parser is a simple library for generating syntax parsers based on deterministic BNF syntax descriptions. It includes a few changes from standard BNF forms to help produce cleaner syntax tree outputs. | ||
- [BNF-Parser ](#bnf-parser-) | ||
@@ -10,6 +13,6 @@ - [Example](#example) | ||
- [Escape Codes](#escape-codes) | ||
- [Repetition `?`, `+`, `*`](#repetition---) | ||
- [Omit `%`](#omit-) | ||
- [Not `!`](#not-) | ||
- [Range `->`](#range--) | ||
- [Repetition Operator `?`, `+`, `*`](#repetition-operator---) | ||
- [Omit Operator `%`](#omit-operator-) | ||
- [Not Operator `!`](#not-operator-) | ||
- [Range Operator `->`](#range-operator--) | ||
- [Imports](#imports) | ||
@@ -30,13 +33,11 @@ - [BNF](#bnf) | ||
- [Range](#range) | ||
- [Repetition](#repetition) | ||
- [Literal](#literal) | ||
A simple library for generate syntax pasers based BNF syntax descriptions. | ||
There are a few changes from standard BNF forms to help produce cleaner syntax tree outputs that BNFs normally provide. | ||
# Example | ||
First of all provide the BNF representation of you language, and parse that into a syntax tree. This tree can then be compiled down into a representation ready to parse syntax trees for the compiled language. | ||
First, provide the BNF representation of your language and parse it into a syntax tree. Then, compile the tree into a representation that is ready to parse syntax trees for the compiled language. | ||
```ts | ||
import { BNF, Parse, Compile } from "bnf-parser"; | ||
import { BNF, Compile } from "bnf-parser"; | ||
@@ -49,4 +50,3 @@ let result = BNF.parse(LANGUAGE_BNF); | ||
A compiled BNF can be saved as a JSON file and reloaded later | ||
You can save a compiled BNF as a JSON file and reload it later: | ||
```ts | ||
@@ -81,4 +81,4 @@ // Save the syntax tree | ||
constant ::= single | double ; | ||
double ::= %"\"" ( ( "\\" ...any ) | !"\""+ )* %"\"" ; | ||
single ::= %"\'" ( ( "\\" ...any ) | !"\'"+ )* %"\'" ; | ||
double ::= %"\"" ( ( "\\" ...any ) | !"\""+ )* %"\"" ; | ||
single ::= %"\'" ( ( "\\" ...any ) | !"\'"+ )* %"\'" ; | ||
@@ -107,3 +107,3 @@ def ::= ...name %w+ %"::=" %w* expr %w* %";" ; | ||
### Repetition `?`, `+`, `*` | ||
### Repetition Operator `?`, `+`, `*` | ||
@@ -118,4 +118,6 @@ Only one repetition mark should exist per argument. | ||
### Omit `%` | ||
[Resulting syntax layout](#repetition) | ||
### Omit Operator `%` | ||
```bnf | ||
@@ -125,7 +127,7 @@ %term | ||
This operator will lead to the syntax under this operator being removed from the final syntax tree, however still remain as part of syntax validation. For instance in the BNF syntax above... | ||
The omit operator is placed in front of a single term and must be the front-most operator, preceding any `not` or `gather` operators. It causes the syntax under this operator to be removed from the final syntax tree but still remain part of syntax validation. | ||
The omit character goes in front af a single term, and must be the front most operator placing it in from of any `not` or `gather` operators. | ||
[Resulting syntax layout](#omit) | ||
### Not `!` | ||
### Not Operator `!` | ||
@@ -136,14 +138,37 @@ ```bnf | ||
This operator must be between two single length constants, this will accept all characters within the range of the two bounds (inclusive). | ||
The not operator is used to consume a single token as long as the term it precedes does not match. It can also be used with repetition markers to consume multiple tokens that do not match the specified term. | ||
### Range `->` | ||
```bnf | ||
!"a" # Matches any single character except "a" | ||
!"a"* # Matches any sequence of characters excluding "a" | ||
!"a"+ # Matches any sequence of characters with at least one character, excluding "a" | ||
``` | ||
This can be very powerful when used in conjunction with other operations such as select | ||
```bnf | ||
"a"->"z" # will consume a single character | ||
"a"->"z"* # will consume as many characters as are in the range | ||
non_vowel ::= !( | ||
"a" | "e" | "i" | "o" | "u" | | ||
"A" | "E" | "I" | "O" | "U" | ||
) ; # this will match any non-vowel, which includes non-letter characters | ||
``` | ||
This operator must be between two single length constants, this will accept all characters within the range of the two bounds (inclusive). Until the repetition count is reached. | ||
The first operand must have no repetitions, however the repetition markers on the last operand will apply to the whole group. | ||
[Resulting syntax layout](#not) | ||
### Range Operator `->` | ||
```bnf | ||
term -> term | ||
``` | ||
The range operator is used to define a range between two single-length constants, allowing the parser to match any character within the specified range. This is useful for simplifying character ranges in BNF descriptions. | ||
```bnf | ||
"a"->"z" # Matches a single character within the range "a" to "z" | ||
"a"->"z"* # Matches a sequence of characters within the range "a" to "z" | ||
``` | ||
The range operator can be combined with repetition markers to control the number of characters consumed within the specified range. | ||
[Resulting syntax layout](#range) | ||
## Imports | ||
@@ -161,3 +186,3 @@ | ||
Is initialised with a built syntax tree. Once initialized it can be given input strings to generate output syntax trees for a given language. | ||
The `Parser` class is used to parse input based on the syntax tree generated by the `BNF` class. | ||
@@ -170,5 +195,5 @@ ```ts | ||
parse( | ||
input: string, // The text to be parsed | ||
partial = false, // Whether the entire string needs to be consucanmed | ||
entry = "program" // Where parsing should start from in the BNF definition | ||
input : string, // The text to be parsed | ||
partial : boolean = false, // Whether the entire string needs to be consucanmed | ||
entry : string = "program" // Where parsing should start from in the BNF definition | ||
): SyntaxNode | ParseError | ||
@@ -181,3 +206,3 @@ setVerbose(mode: boolean) { } | ||
Given a `SyntaxNode` tree generated from the BNF pre-initialized parser it can generate a new parser. | ||
The `Compile` function is given a `SyntaxNode` tree generated from the pre-initialized `BNF` parser - from which it can generate a new parser. | ||
@@ -188,41 +213,14 @@ ```ts | ||
```ts | ||
class Reference { | ||
### SyntaxNode | ||
// Returns a deep copy of itself | ||
clone(): Reference | ||
The `SyntaxNode` class represents a node in the generated syntax tree. | ||
// Stringifies itself for printing/debug | ||
toString(): string | ||
} | ||
``` | ||
The type value of a `SyntaxNode` is typically the name of the term being matched - i.e. the root node will be `program` by default. | ||
However it can be a generated name in the case of brackets `(...)` and repetition markers such as `+` `(...)+`. | ||
```ts | ||
class ReferenceRange { | ||
constructor(from: Reference, to: Reference) | ||
// Returns a deep copy of itself | ||
clone(): ReferenceRange | ||
// Stringifies itself for printing/debug | ||
toString(): string | ||
} | ||
``` | ||
```ts | ||
class ParseError { | ||
constructor(msg: string, ref: ReferenceRange) | ||
// Stringifies itself for printing/debug | ||
toString(): string | ||
} | ||
``` | ||
### SyntaxNode | ||
```ts | ||
class SyntaxNode { | ||
type: string; | ||
value: SyntaxValue; | ||
ref: ReferenceRange; | ||
type : string; | ||
value : SyntaxValue; | ||
ref : ReferenceRange; | ||
@@ -234,2 +232,4 @@ constructor(type: string, value: SyntaxValue, ref: ReferenceRange) {}; | ||
} | ||
type SyntaxValue = SyntaxNode[] | string; | ||
``` | ||
@@ -239,7 +239,9 @@ | ||
The `ParseError` class represents an error that occurred during parsing of the input. | ||
```ts | ||
class ParseError { | ||
stack: string[] | ||
msg: string | ||
ref: ReferenceRange | ||
stack : string[] | ||
msg : string | ||
ref : ReferenceRange | ||
@@ -263,7 +265,9 @@ constructor(msg: string, ref: ReferenceRange) { } | ||
The `Reference` class is a cursor to describe a certain point in a string input. | ||
```ts | ||
class Reference { | ||
line: number; | ||
col: number; | ||
index: number; | ||
index : number; | ||
line : number; | ||
col : number; | ||
@@ -285,6 +289,8 @@ constructor(line: number, col: number, index: number) { } | ||
The `ReferenceRange` class uses two `Reference`s to describe a range within the text input. | ||
```ts | ||
class ReferenceRange { | ||
start: Reference; | ||
end: Reference; | ||
start : Reference; | ||
end : Reference; | ||
@@ -307,15 +313,15 @@ constructor(from: Reference, to: Reference) { } | ||
There are two main core abstractions for how the syntax trees are generated from a BNF, sequences and selects. | ||
There are two primary abstractions in generating syntax trees from BNF: sequences and selections. | ||
## Sequences | ||
A sequence is a linear list of elements that make up a match. A top level sequence (right side of the `::=`) will resolve with the `.type` of the matching name (the name on the left of the `::=`), any sub-sequences `()` will appear as a syntax node with the name `(...)` with subsequent values being evaluated the same as the top level. | ||
A sequence is an ordered list of elements that form a match. A top-level sequence (right side of the `::=`) will resolve with the `SyntaxNode.type` of the matching name (the name on the left of the `::=`). Sub-sequences within parentheses `()` will appear as a syntax node with the name `(...)` and will be evaluated similarly to the top level. | ||
If there is a repetition marker such as `name+` there will be an extra noded added with the type `(...)+` of whom's children will be the number of times the pattern was matched. | ||
If a repetition marker like name+ is used, additional nodes with the type `(...)+` will be added, and their children will represent the number of times the pattern matched. | ||
## Select | ||
Will resolve as the syntax tree of the first matching option. For instance if you have the select statement `variable | number`, if the parser matches a variable it would be the same as having a `variable` at that point in the sequence. | ||
A selection will resolve as the syntax tree of the first matching option. For example, if you have the selection statement `variable | number`, and the parser matches a `variable`, it would be the same as having a `variable` in that position in the sequence. | ||
> The select statement will always consume the first valid option, and your should order your select statements accordingly | ||
> The selection statement will always consume the first valid option, so you should order your selection statements accordingly. For example: | ||
> i.e. | ||
@@ -325,22 +331,27 @@ > ```bnf | ||
> ``` | ||
> Giving this program "aa" will fail, as it will consume the single "a", then since there is no repetition the program will end leaving the second "a" unconsumed. As the syntax did not parse the whole string, this is considered an error. | ||
> See [Parser](#parser) for information on how to allow partial matches | ||
> In this case, providing the input "aa" will fail, as it will consume the single "a", and since there is no repetition, the program will end, leaving the second "a" unconsumed. As the syntax did not parse the whole string, this is considered an error. | ||
> See [Parser](#parser) for information on allowing partial matches. | ||
## Omit | ||
Any omit statement within a sequence will be removed, and then looking at the outputted syntax tree it is like they never existed, however they are still critical to a successful match. In the case that they are within a select, they will still be visible with `.type` of `omit`, with no child nodes. | ||
Omit statements within a sequence will be removed, and they will not be present in the output syntax tree. However, they are still crucial for a successful match. If they are within a selection, they will be visible with `SyntaxNode.type` of `omit` and no child nodes. | ||
## Gather | ||
This does not alter the outputted syntax tree form in relation to the sequence or select it is within, however it will squash all of it's child nodes back down into a single string. Node that this will reflect the affects of any omit operations which occurred within the child nodes. | ||
Gather does not change the output syntax tree structure relative to the sequence or selection it is within. However, it combines all of its child nodes into a single string. Note that this reflects the effects of any omit operations within the child nodes. | ||
## Not | ||
It's `.values` will be a single string of all characters it could consume until it matched with the target expression. | ||
The `SyntaxNode.values` will be a single string containing all characters consumed until it matched the target expression of the repetition limit is reached. | ||
## Range | ||
Ranges will appear with the `.type` of `range` with `.value` being a single string with the characters consumed by this expression, inclusing any repetition markers (so a range with `+` will be a string of length at least one). | ||
Ranges will appear with the `SyntaxNode.type` of `range` with `SyntaxNode.value` being a single string with the characters consumed by this expression, accounting for any repetition markers (so a range with `+` will be a string of length at least one). | ||
## Repetition | ||
A repetition marker creates its own node in the syntax tree, with its children representing the value of each repetition. The `SyntaxNode.type` value of this node will be the `(...)` followed by the repetition marker used, such as: `(...)+`, `(...)*`, or `(...)?`. | ||
## Literal | ||
Ranges will appear with the `.type` of `literal` with `.value` being a copy of the exact literal as a string. | ||
Literals will appear with the `SyntaxNode.type` of `literal` and `SyntaxNode.value` as an exact copy of the literal as a string. |
46680
339
14
892