Comparing version 2.10.3 to 2.10.4
# Accessing the internal parse table | ||
The `Parser` constructor takes an optional last parameter, `options`, | ||
which is an object with the following possible keys: | ||
If you are familiar with the Earley parsing algorithm, you can access the | ||
internal parse table using `Parser.table` (this, for example, is how | ||
`nearley-test` works). One caveat, however: you must pass the `keepHistory` | ||
option to nearley to prevent it from garbage-collecting inaccessible columns of | ||
the table. | ||
- `keepHistory` (boolean, default `false`) - whether to preserve and expose the internal state | ||
- `lexer` (object) - custom lexer, overrides `@lexer` in the grammar | ||
If you are familiar with the Earley parsing algorithm and are planning to do something exciting with the parse table, set `keepHistory`: | ||
```js | ||
@@ -15,8 +13,10 @@ const nearley = require("nearley"); | ||
const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar), { keepHistory: true }); | ||
const parser = new nearley.Parser( | ||
nearley.Grammar.fromCompiled(grammar), | ||
{ keepHistory: true } | ||
); | ||
// ... | ||
// After feeding data: | ||
parser.feed(...); | ||
console.log(parser.table); | ||
``` |
# Custom tokens and lexers | ||
## Adding custom token matchers | ||
## Custom token matchers | ||
Sometimes you might want a more flexible way of matching tokens, whether you're using `@lexer` or not. | ||
Aside from the lexer infrastructure, nearley provides a lightweight way to | ||
parse arbitrary streams. | ||
Custom matchers can be defined in two ways: literal tokens and testable tokens. A | ||
literal token matches exactly, while a testable token runs a function to test | ||
whether it is a match or not. | ||
Custom matchers can be defined in two ways: *literal* tokens and *testable* | ||
tokens. A literal token matches a JS value exactly (with `===`), while a | ||
testable token runs a predicate that tests whether or not the value matches. | ||
Note that in this case, you would feed a `Parser` instance an *array* of | ||
objects rather than a string! Here is a simple example: | ||
```coffeescript | ||
@@ -17,24 +21,22 @@ @{% | ||
# Matches ["print", 12] if the input is an array with those elements. | ||
main -> %tokenPrint %tokenNumber | ||
main -> %tokenPrint %tokenNumber ";;" | ||
# parser.feed(["print", 12, ";;"]); | ||
``` | ||
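To make the literal/testable distinction concrete, here is a minimal sketch of how the two kinds of matcher behave. The `{ literal: ... }` and `{ test: ... }` object shapes are an assumption, since the hunk above elides the definitions. The `matches` helper is hypothetical and only illustrates the semantics; it is not nearley's internal code.

```js
// A literal token: matches a JS value exactly, using ===.
const tokenPrint = { literal: "print" };

// A testable token: matches any value for which the predicate returns true.
const tokenNumber = { test: (x) => Number.isInteger(x) };

// Hypothetical helper (not part of nearley) that mimics the matching rules
// described above for a single input value.
function matches(matcher, value) {
  if (typeof matcher.test === "function") {
    return Boolean(matcher.test(value));
  }
  return matcher.literal === value;
}

// The input is an array of values rather than a string.
const input = ["print", 12, ";;"];
console.log(matches(tokenPrint, input[0]));  // true
console.log(matches(tokenNumber, input[1])); // true
console.log(matches(tokenNumber, "12"));     // false: "12" is a string
```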
## Writing a custom lexer | ||
## Custom lexers | ||
If you don't want to use [Moo](https://github.com/tjvr/moo), our recommended lexer/tokenizer, you can define your own. Either pass it using `@lexer myLexer` in the grammar, or in options to `Parser`: | ||
nearley recommends using a [moo](https://github.com/tjvr/moo)-based lexer. | ||
However, you can use any lexer that conforms to the following interface: | ||
```js | ||
const nearley = require("nearley"); | ||
const grammar = require("./grammar"); | ||
const myLexer = require("./lexer"); | ||
const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar), { lexer: myLexer }); | ||
``` | ||
You lexer must have the following interface: | ||
- `next() -> Token` return e.g. `{type, value, line, col, …}`. Only the `value` attribute is required. | ||
- `save() -> Info` -> return an object describing the current line/col etc. This allows us to preserve this information between `feed()` calls, and also to support `Parser#rewind()`. The exact structure is lexer-specific; nearley doesn't care what's in it. | ||
- `reset(chunk, Info)`: set the internal buffer to `chunk`, and restore line/col/state info taken from `save()`. | ||
- `formatError(token)` -> return a string with an error message describing the line/col of the offending token. You might like to include a preview of the line in question. | ||
- `has(tokenType)` -> return true if the lexer can emit tokens with that name. Used to resolve `%`-specifiers in compiled nearley grammars. | ||
- `next()` returns a token object, which could have fields for line number, | ||
etc. Importantly, a token object *must* have a `value` attribute. | ||
- `save()` returns an info object that describes the current state of the | ||
lexer. nearley places no restrictions on this object. | ||
- `reset(chunk, info)` sets the internal buffer of the lexer to `chunk`, and | ||
restores its state to a state returned by `save()`. | ||
- `formatError(token)` returns a string with an error message describing a | ||
parse error at that token (for example, the string might contain the line and | ||
column where the error was found). | ||
- `has(name)` returns true if the lexer can emit tokens with that name. This is | ||
used to resolve `%`-specifiers in compiled nearley grammars. |
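As an illustration of the interface above, here is a minimal sketch of a hand-written lexer that splits its input on whitespace. The `WordLexer` class, its single `word` token type, and the choice to return `undefined` at the end of the buffer are assumptions made for this example; only the five method names come from the list above.

```js
// Illustrative whitespace-delimited lexer. Limitation: a word that is split
// across two chunks is emitted as two separate tokens.
class WordLexer {
  constructor() {
    this.reset("", undefined);
  }

  // Set the internal buffer to `chunk` and restore state taken from save().
  reset(chunk, info) {
    this.buffer = chunk;
    this.index = 0;
    this.line = info ? info.line : 1;
    this.col = info ? info.col : 1;
  }

  // Return the next token, or undefined once the buffer is exhausted
  // (the end-of-input convention is an assumption of this sketch).
  next() {
    // Skip whitespace while keeping line/col up to date.
    while (this.index < this.buffer.length && /\s/.test(this.buffer[this.index])) {
      if (this.buffer[this.index] === "\n") {
        this.line += 1;
        this.col = 1;
      } else {
        this.col += 1;
      }
      this.index += 1;
    }
    if (this.index >= this.buffer.length) return undefined;

    const start = this.index;
    const token = { type: "word", value: "", line: this.line, col: this.col };
    while (this.index < this.buffer.length && !/\s/.test(this.buffer[this.index])) {
      this.index += 1;
      this.col += 1;
    }
    token.value = this.buffer.slice(start, this.index); // `value` is required
    return token;
  }

  // Describe the current state so it can be restored on the next reset().
  save() {
    return { line: this.line, col: this.col };
  }

  // Build a human-readable message for a token that caused a parse error.
  formatError(token) {
    return `Unexpected ${JSON.stringify(token.value)} at line ${token.line} col ${token.col}`;
  }

  // Report whether this lexer can emit tokens with the given name.
  has(name) {
    return name === "word";
  }
}

const lexer = new WordLexer();
lexer.reset("print 12\nprint 7");
const tokens = [];
let tok;
while ((tok = lexer.next()) !== undefined) tokens.push(tok);
console.log(tokens.map((t) => t.value)); // the four words: print, 12, print, 7
```

An instance like this could then be passed as the `lexer` option shown earlier, or named with `@lexer` in the grammar.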
@@ -19,4 +19,4 @@ Glossary | ||
**production rule**: a set of *strings* specified as a sequence of *symbols*, | ||
such that the rule matches a string if it is a concatenation of strings matched | ||
by the respective symbols | ||
such that the rule matches a string if it is a *concatenation* of strings | ||
matched by the respective symbols | ||
@@ -80,7 +80,9 @@ **symbol**: a generic term for a member of a *production rule*, either a | ||
**epsilon**: the empty production rule, matching only the empty string | ||
**epsilon**: the empty *production rule*, matching only the empty string | ||
**nullable rule**: a production rule that matches the empty string | ||
**nullable rule**: a *production rule* that matches the empty string, even | ||
though it is not necessarily equal to the *epsilon* rule (for example, the | ||
*concatenation* of epsilon with epsilon) | ||
**nearley**: a *parser* that parses *context-free languages*, along with | ||
several additional utilities for building languages |
@@ -1,9 +0,15 @@ | ||
# Using nearley in browsers | ||
# Using the nearley compiler in browsers | ||
Use a tool like [Webpack](https://webpack.js.org/) or [Rollup](https://rollupjs.org/) to include the `nearley` NPM package in your browser code. | ||
Both the nearley parser and compiled grammars work in browsers; simply include | ||
`nearley.js` and your compiled `grammar.js` file in `<script>` tags and use | ||
nearley as usual. However, the nearley *compiler* is not designed for the | ||
browser -- you should precompile your grammars and only serve the generated JS | ||
files to browsers. | ||
The runtime part works fine in browsers, but there's no concise way to compile a grammar and pass it to the `Parser` constructor. If you have a single static grammar, just precompile it with `nearleyc` and include the compiled JS file in your frontend code. | ||
If you absolutely have to compile a grammar in a browser (for example, to | ||
implement a nearley IDE) then you can use a tool like | ||
[Webpack](https://webpack.js.org/) or [Rollup](https://rollupjs.org/) to | ||
include the `nearley` NPM package in your browser code. Then, you can utilize | ||
the `nearleyc` internals to compile grammars dynamically. | ||
If you absolutely have to compile a grammar in a browser, e.g. the user enters it into a textarea, then here's an example for you: | ||
```js | ||
@@ -20,16 +26,18 @@ const nearley = require("nearley"); | ||
function compileGrammar(sourceCode) { | ||
// Oh boy, here we go. We're gonna do what `nearleyc` does. | ||
// Parse the custom grammar into AST as a nearley grammar. | ||
const grammarParser = new nearley.Parser(nearleyGrammar.ParserRules, nearleyGrammar.ParserStart); | ||
// Parse the grammar source into an AST | ||
const grammarParser = new nearley.Parser( | ||
nearleyGrammar.ParserRules, | ||
nearleyGrammar.ParserStart | ||
); | ||
grammarParser.feed(sourceCode); | ||
const grammarAst = grammarParser.results[0]; | ||
const grammarAst = grammarParser.results[0]; // TODO check for errors | ||
// Compile the custom grammar into JS. | ||
const grammarInfoObject = compile(grammarAst, {}); // Returns an object with rules, etc. | ||
const grammarJs = generate(grammarInfoObject, "grammar"); // Stringifies that object into JS. | ||
// Compile the AST into a set of rules | ||
const grammarInfoObject = compile(grammarAst, {}); | ||
// Generate JavaScript code from the rules | ||
const grammarJs = generate(grammarInfoObject, "grammar"); | ||
// `nearleyc` would save JS to a file and you'd require it, but in a browser we can only eval. | ||
const module = { exports: {} }; // Pretend this is a CommonJS environment to catch exports from the grammar. | ||
eval(grammarJs); // Evaluated code sees everything in the lexical scope, it can see `module`. | ||
// Pretend this is a CommonJS environment to catch exports from the grammar. | ||
const module = { exports: {} }; | ||
eval(grammarJs); | ||
@@ -36,0 +44,0 @@ return module.exports; |
{ | ||
"name": "nearley", | ||
"version": "2.10.3", | ||
"version": "2.10.4", | ||
"description": "Simple, fast, powerful parser toolkit for JavaScript.", | ||
@@ -5,0 +5,0 @@ "main": "lib/nearley.js", |
README.md
@@ -1,4 +0,2 @@ | ||
![](www/logo/nearley-purple.png) | ||
# [nearley](http://nearley.js.org) | ||
# [nearley](http://nearley.js.org) ↗️ | ||
[![JS.ORG](https://img.shields.io/badge/js.org-nearley-ffb400.svg?style=flat-square)](http://js.org) | ||
@@ -32,3 +30,3 @@ [![npm version](https://badge.fury.io/js/nearley.svg)](https://badge.fury.io/js/nearley) | ||
- [compilers for real programming languages](https://github.com/sizigi/lp5562); | ||
- and nearley itself! The nearley compiler is written in *itself*. | ||
- and nearley itself! The nearley compiler is bootstrapped. | ||
@@ -48,7 +46,6 @@ nearley is an npm [staff | ||
- [Getting started: nearley in 3 steps](#getting-started-nearley-in-3-steps) | ||
- [Writing a parser](#writing-a-parser) | ||
- [Terminals, nonterminals, rules](#terminals-nonterminals-rules) | ||
- [Writing a parser: the nearley grammar language](#writing-a-parser-the-nearley-grammar-language) | ||
- [Vocabulary](#vocabulary) | ||
- [Postprocessors](#postprocessors) | ||
- [Target languages](#target-languages) | ||
- [Catching errors](#catching-errors) | ||
- [More syntax: tips and tricks](#more-syntax-tips-and-tricks) | ||
@@ -62,3 +59,6 @@ - [Comments](#comments) | ||
- [Importing other grammars](#importing-other-grammars) | ||
- [Tokenizers](#tokenizers) | ||
- [Using a parser: the nearley API](#using-a-parser-the-nearley-api) | ||
- [A note on ambiguity](#a-note-on-ambiguity) | ||
- [Catching errors](#catching-errors) | ||
- [Tokenizers](#tokenizers) | ||
- [Tools](#tools) | ||
@@ -74,3 +74,3 @@ - [nearley-test: Exploring a parser interactively](#nearley-test-exploring-a-parser-interactively) | ||
- [Recipes](#recipes) | ||
- [Details](#details) | ||
- [Blog posts](#blog-posts) | ||
@@ -157,7 +157,10 @@ <!-- END doctoc generated TOC please keep comment here to allow auto update --> | ||
## Writing a parser | ||
## Writing a parser: the nearley grammar language | ||
Let's explore the building blocks of a nearley parser. | ||
This section describes the nearley grammar language, in which you can describe | ||
grammars for nearley to parse. Grammars are conventionally kept in `.ne` files. | ||
You can then use `nearleyc` to compile your `.ne` grammars to JavaScript | ||
modules. | ||
### Terminals, nonterminals, rules | ||
### Vocabulary | ||
@@ -170,14 +173,19 @@ - A *terminal* is a string or a token. For example, the keyword `"if"` is a | ||
- A *rule* (or production rule) is a definition of a nonterminal. For example, | ||
`"if" condition "then" statement "endif"` is the rule according to which the | ||
if statement nonterminal is parsed. | ||
`ifStatement -> "if" condition "then" statement "endif"` is the rule | ||
according to which the if statement nonterminal is parsed. | ||
The first nonterminal of the grammar is the one the whole input must match. | ||
With the following grammar, nearley will try to parse text as `expression`. | ||
By default, nearley attempts to parse the first nonterminal defined in the | ||
grammar. In the following grammar, nearley will try to parse input text as an | ||
`expression`. | ||
```js | ||
expression -> number "+" number | ||
expression -> number "-" number | ||
expression -> number "*" number | ||
expression -> number "/" number | ||
number -> [0-9]:+ | ||
``` | ||
Use the pipe character `|` to separate alternative rules for a nonterminal. | ||
You can use the pipe character `|` to separate alternative rules for a | ||
nonterminal. In the example below, `expression` has four different rules. | ||
@@ -197,9 +205,5 @@ ```js | ||
```js | ||
a -> null | ||
| a "cow" | ||
a -> null | a "cow" | ||
``` | ||
Keep in mind that nearley syntax is not sensitive to formatting. You're welcome | ||
to keep rules on the same line: `foo -> bar | qux`. | ||
### Postprocessors | ||
@@ -291,35 +295,2 @@ | ||
### Catching errors | ||
nearley is a *streaming* parser: you can keep feeding it more strings. This | ||
means that there are two error scenarios in nearley. | ||
Consider the simple parser below for the examples to follow. | ||
```js | ||
main -> "Cow goes moo." {% function(d) {return "yay!"; } %} | ||
``` | ||
If there are no possible parsings given the current input, but in the *future* | ||
there *might* be results if you feed it more strings, then nearley will | ||
temporarily set the `results` array to the empty array, `[]`. | ||
```js | ||
parser.feed("Cow "); // parser.results is [] | ||
parser.feed("goes "); // parser.results is [] | ||
parser.feed("moo."); // parser.results is ["yay!"] | ||
``` | ||
If there are no possible parsings, and there is no way to "recover" by feeding | ||
more data, then nearley will throw an error whose `offset` property is the | ||
index of the offending token. | ||
```js | ||
try { | ||
parser.feed("Cow goes% moo."); | ||
} catch(parseError) { | ||
console.log("Error at character " + parseError.offset); // "Error at character 9" | ||
} | ||
``` | ||
### More syntax: tips and tricks | ||
@@ -435,9 +406,101 @@ | ||
See the [`builtin/`](builtin) directory for more details. Contributions are | ||
welcome here! | ||
welcome! | ||
Including a file imports *all* of the nonterminals defined in it, as well as | ||
any JS, macros, and config options defined there. | ||
any JS, macros, and configuration options defined there. | ||
## Tokenizers | ||
## Using a parser: the nearley API | ||
Once you have compiled a `grammar.ne` file to a `grammar.js` module, you can | ||
then use nearley to instantiate a `Parser` object. | ||
First, import nearley and your grammar. | ||
```js | ||
const nearley = require("nearley"); | ||
const grammar = require("./grammar.js"); | ||
``` | ||
Note that if you are parsing in the browser, you can simply include | ||
`nearley.js` and `grammar.js` in `<script>` tags. | ||
Next, use the grammar to create a new `nearley.Parser` object. | ||
```js | ||
// Create a Parser object from our grammar. | ||
const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar)); | ||
``` | ||
Once you have a `Parser`, you can `.feed` it a string to parse. Since nearley | ||
is a *streaming* parser, you can feed strings more than once. For example, a | ||
REPL might feed the parser lines of code as the user enters them: | ||
```js | ||
// Parse something! | ||
parser.feed("if (true) {"); | ||
parser.feed("x = 1"); | ||
parser.feed("}"); | ||
// or, parser.feed("if (true) {x=1}"); | ||
``` | ||
Finally, you can query the `.results` property of the parser. | ||
```js | ||
// parser.results is an array of possible parsings. | ||
console.log(parser.results); | ||
// [{'type': 'if', 'condition': ..., 'body': ...}] | ||
``` | ||
### A note on ambiguity | ||
Why is `parser.results` an array? Sometimes, a grammar can parse a particular | ||
string in multiple different ways. For example, the following grammar parses | ||
the string `"xyz"` in two different ways. | ||
```js | ||
x -> "xy" "z" | ||
| "x" "yz" | ||
``` | ||
Such grammars are *ambiguous*. nearley provides you with *all* the parsings. In | ||
most cases, however, your grammars should not be ambiguous (parsing ambiguous | ||
grammars is inefficient!). Thus, the most common usage is to simply query | ||
`parser.results[0]`. | ||
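Silently taking `parser.results[0]` can hide an accidental ambiguity. The following is a minimal sketch of a stricter accessor; `uniqueResult` is a hypothetical helper, not part of nearley, and it relies only on `results` being an array of parsings as described above.

```js
// Hypothetical helper: return the single parsing, or throw if the input is
// incomplete (no parsings yet) or the grammar is ambiguous for this input.
function uniqueResult(parser) {
  const results = parser.results;
  if (results.length === 0) {
    throw new Error("No parsing yet: the input may be incomplete");
  }
  if (results.length > 1) {
    throw new Error(`Ambiguous parse: ${results.length} possible parsings`);
  }
  return results[0];
}

// Stand-in objects with the same `results` shape as a Parser, for illustration.
console.log(uniqueResult({ results: [{ type: "if" }] }).type); // if
```

Failing loudly during development makes it easier to notice when a grammar change introduces a second parsing.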
### Catching errors | ||
nearley is a *streaming* parser: you can keep feeding it more strings. This | ||
means that there are two error scenarios in nearley. | ||
Consider the simple parser below for the examples to follow. | ||
```js | ||
main -> "Cow goes moo." {% function(d) {return "yay!"; } %} | ||
``` | ||
If there are no possible parsings given the current input, but in the *future* | ||
there *might* be results if you feed it more strings, then nearley will | ||
temporarily set the `results` array to the empty array, `[]`. | ||
```js | ||
parser.feed("Cow "); // parser.results is [] | ||
parser.feed("goes "); // parser.results is [] | ||
parser.feed("moo."); // parser.results is ["yay!"] | ||
``` | ||
If there are no possible parsings, and there is no way to "recover" by feeding | ||
more data, then nearley will throw an error whose `offset` property is the | ||
index of the offending token. | ||
```js | ||
try { | ||
parser.feed("Cow goes% moo."); | ||
} catch(parseError) { | ||
console.log("Error at character " + parseError.offset); // "Error at character 9" | ||
} | ||
``` | ||
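The two scenarios above can be folded into one helper that reports what happened instead of throwing. This is a sketch under stated assumptions: `tryFeed` is a hypothetical helper, and `makeStubParser` is a stand-in for a real `Parser` so that the example runs without dependencies. The stub only mimics the `feed`/`results`/`offset` behavior described in this section.

```js
// Hypothetical helper: feed a chunk and classify the outcome.
function tryFeed(parser, chunk) {
  try {
    parser.feed(chunk);
  } catch (parseError) {
    // Unrecoverable: no amount of further input can fix the parse.
    return { status: "error", offset: parseError.offset, message: parseError.message };
  }
  // An empty results array means "nothing yet, but more input might help".
  return parser.results.length > 0
    ? { status: "complete", results: parser.results }
    : { status: "incomplete" };
}

// Stand-in for a nearley Parser (NOT nearley code): accepts exactly `expected`,
// fed in any number of chunks, and throws on the first mismatching character.
function makeStubParser(expected) {
  let seen = "";
  return {
    results: [],
    feed(chunk) {
      for (const ch of chunk) {
        if (expected[seen.length] !== ch) {
          const err = new Error(`Unexpected ${JSON.stringify(ch)}`);
          err.offset = seen.length;
          throw err;
        }
        seen += ch;
      }
      this.results = seen === expected ? ["yay!"] : [];
    },
  };
}

const parser = makeStubParser("Cow goes moo.");
console.log(tryFeed(parser, "Cow ").status);      // incomplete
console.log(tryFeed(parser, "goes moo.").status); // complete
console.log(tryFeed(makeStubParser("Cow goes moo."), "Cow goes% moo.").status); // error
```

With a real parser, a caller such as a REPL could keep prompting for input while the status is `incomplete` and report the error otherwise.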
### Tokenizers | ||
By default, nearley splits the input into a stream of characters. This is | ||
@@ -580,2 +643,6 @@ called *scannerless* parsing. | ||
Node users can programmatically access the unparser using | ||
[nearley-there](https://github.com/stolksdorf/nearley-there) by Scott | ||
Tolksdorf. | ||
Browser users can use | ||
@@ -605,16 +672,9 @@ [nearley-playground](https://omrelli.ug/nearley-playground/) by Guillermo | ||
Tests live in `test/` and can be called with `npm test`. Please run the | ||
benchmarks before and after your changes: parsing is tricky, and small changes | ||
can kill efficiency. We learned this the hard way! | ||
Please read [this document](.github/CONTRIBUTING.md) *before* working on | ||
nearley. If you are interested in contributing but unsure where to start, take | ||
a look at the issues labeled "up for grabs" on the issue tracker, or message a | ||
maintainer. | ||
If you're looking for something to do, here's a short list of things that would | ||
make me happy: | ||
nearley is MIT licensed. | ||
- Optimize. There are still plenty of optimizations that an enterprising | ||
JS-savant could implement. | ||
- Help build the builtins library by PRing in your favorite primitives. | ||
- Solutions to issues labeled "up for grabs" on the issue tracker. | ||
Nearley is MIT licensed. | ||
A big thanks to Nathan Dinsmore for teaching me how to Earley, Aria Stewart for | ||
@@ -639,7 +699,5 @@ helping structure nearley into a mature module, and Robin Windels for | ||
- [Transforming parse trees](docs/generating-cst-ast.md) | ||
- [Writing an indentation-aware (Python-like) lexer](https://gist.github.com/nathan/d8d1adea38a1ef3a6d6a06552da641aa) | ||
- [Making a REPL for your language](docs/making-a-repl.md) | ||
### Details | ||
### Blog posts | ||
@@ -653,2 +711,1 @@ - Read my [blog post](http://hardmath123.github.io/earley.html) to learn more | ||
written by @gajus. | ||