nearley - npm Package Compare versions

Comparing version 2.10.3 to 2.10.4

docs/accessing-parse-table.md

		# Accessing the internal parse table

		The `Parser` constructor takes an optional last parameter, `options`,
		which is an object with the following possible keys:

		- `keepHistory` (boolean, default `false`): whether to preserve and expose the internal state
		- `lexer` (object): a custom lexer, which overrides `@lexer` in the grammar

		If you are familiar with the Earley parsing algorithm, you can access the
		internal parse table through the `table` property of a `Parser` instance
		(this, for example, is how `nearley-test` works). One caveat, however: you
		must pass the `keepHistory` option to nearley to prevent it from
		garbage-collecting inaccessible columns of the table.

		```js
		const nearley = require("nearley");
		const grammar = require("./grammar");

		const parser = new nearley.Parser(
		    nearley.Grammar.fromCompiled(grammar),
		    { keepHistory: true }
		);

		// ...

		// After feeding data (`input` is a hypothetical string you want to parse):
		parser.feed(input);
		console.log(parser.table);
		```
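If the Earley terminology is new, the sketch below may help make "parse table" and "column" concrete. It is a minimal Earley recognizer written for illustration, not nearley's implementation. The rule format (`{ name, symbols }` with `{ literal }` terminals), the state shape (`{ rule, dot, origin }`), and the names `earleyTable` and `accepts` are all assumptions. It also does not support empty (epsilon) rules. It builds one column per input position, and its complete step reads back into the column where a state started, which is why this sketch keeps every column.

```js
// Minimal Earley recognizer that keeps every column of its parse table.
// Illustration only, not nearley's implementation. Hypothetical formats:
//   rule:  { name, symbols }, where each symbol is a nonterminal name
//          (string) or a terminal { literal: "c" } matching one character
//   state: { rule, dot, origin }
// Empty (epsilon) rules are not supported, which keeps completion simple.
function earleyTable(rules, start, input) {
  const table = [[]];

  // Add a state to a column unless an identical one is already there.
  function addState(column, state) {
    const exists = column.some(
      (s) => s.rule === state.rule && s.dot === state.dot && s.origin === state.origin
    );
    if (!exists) column.push(state);
  }

  for (const rule of rules) {
    if (rule.name === start) addState(table[0], { rule, dot: 0, origin: 0 });
  }

  for (let i = 0; i <= input.length; i++) {
    const column = table[i];
    if (i < input.length) table.push([]);
    // The column grows while it is processed, so iterate by index.
    for (let j = 0; j < column.length; j++) {
      const state = column[j];
      const next = state.rule.symbols[state.dot];
      if (next === undefined) {
        // Complete: advance states in the origin column that wait for this rule.
        for (const waiting of table[state.origin]) {
          if (waiting.rule.symbols[waiting.dot] === state.rule.name) {
            addState(column, { rule: waiting.rule, dot: waiting.dot + 1, origin: waiting.origin });
          }
        }
      } else if (typeof next === "string") {
        // Predict: start every rule for the expected nonterminal here.
        for (const rule of rules) {
          if (rule.name === next) addState(column, { rule, dot: 0, origin: i });
        }
      } else if (i < input.length && next.literal === input[i]) {
        // Scan: the terminal matches, so the state moves into the next column.
        addState(table[i + 1], { rule: state.rule, dot: state.dot + 1, origin: state.origin });
      }
    }
  }
  return table;
}

// Accept if the last column holds a finished start rule that began at 0.
function accepts(rules, start, input) {
  const last = earleyTable(rules, start, input)[input.length];
  return last.some(
    (s) => s.rule.name === start && s.origin === 0 && s.dot === s.rule.symbols.length
  );
}

// sum -> sum "+" digit | digit ;  digit -> "1" | "2"
const rules = [
  { name: "sum", symbols: ["sum", { literal: "+" }, "digit"] },
  { name: "sum", symbols: ["digit"] },
  { name: "digit", symbols: [{ literal: "1" }] },
  { name: "digit", symbols: [{ literal: "2" }] },
];
console.log(accepts(rules, "sum", "1+2+1")); // true
console.log(accepts(rules, "sum", "1+")); // false
console.log(earleyTable(rules, "sum", "1+2").length); // 4 (one column per position)
```

Each entry of the returned array is one column: the set of partially matched rules after consuming that many characters. nearley's `parser.table` is also an array of columns, but its entries are richer objects, so don't expect the same shape.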

docs/custom-tokens-and-lexers.md

		# Custom tokens and lexers

		## Custom token matchers

		Aside from the lexer infrastructure, nearley provides a lightweight way to
		parse arbitrary streams.

		Custom matchers can be defined in two ways: literal tokens and testable
		tokens. A literal token matches a JS value exactly (with `===`), while a
		testable token runs a predicate that tests whether or not the value matches.

		Note that in this case, you feed the `Parser` instance an array of values
		rather than a string. Here is a simple example:

		```coffeescript
		@{%
		// ...
		%}

		# Matches ["print", 12, ";;"] if the input is an array with those elements.
		main -> %tokenPrint %tokenNumber ";;"

		# parser.feed(["print", 12, ";;"]);
		```
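The definitions of `tokenPrint` and `tokenNumber` are elided above. The sketch below assumes literal tokens are objects of the form `{ literal: value }` and testable tokens are `{ test: predicate }`. Under that assumption, the matching rule just described (`===` for literals, a predicate call for testable tokens) can be modelled with a hypothetical `matches` helper:

```js
// Hypothetical matcher objects of the two kinds described above.
const tokenPrint = { literal: "print" };                  // literal token
const tokenNumber = { test: (x) => Number.isInteger(x) }; // testable token

// Model of the matching rule: a testable token calls its predicate,
// a literal token compares with ===.
function matches(matcher, value) {
  if (typeof matcher.test === "function") {
    return matcher.test(value);
  }
  return matcher.literal === value;
}

console.log(matches(tokenPrint, "print")); // true
console.log(matches(tokenNumber, 12));     // true
console.log(matches(tokenNumber, "12"));   // false: "12" is a string
```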

		## Custom lexers

		nearley recommends using a [moo](https://github.com/tjvr/moo)-based lexer.
		However, you can use any lexer that conforms to the interface described below.
		Either pass it using `@lexer myLexer` in the grammar, or in the options to `Parser`:

		```js
		const nearley = require("nearley");
		const grammar = require("./grammar");
		const myLexer = require("./lexer");

		const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar), { lexer: myLexer });
		```

		Your lexer must have the following interface:

		- `next()` returns a token object, which could have fields for line number,
		etc. Importantly, a token object must have a `value` attribute.
		- `save()` returns an info object that describes the current state of the
		lexer. nearley places no restrictions on this object; it uses it to preserve
		the lexer's state (such as line and column) between `feed()` calls.
		- `reset(chunk, info)` sets the internal buffer of the lexer to `chunk`, and
		restores its state to a state returned by `save()`.
		- `formatError(token)` returns a string with an error message describing a
		parse error at that token (for example, the string might contain the line and
		column where the error was found).
		- `has(name)` returns true if the lexer can emit tokens with that name. This is
		used to resolve `%`-specifiers in compiled nearley grammars.
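Here is a minimal sketch of a lexer that follows this interface. `WordLexer` is hypothetical. It splits its input on whitespace and emits `number` or `word` tokens, and it assumes (as moo does) that `next()` signals the end of the current chunk by returning `undefined`. The object returned by `save()` only records the current line, which is enough to keep line numbers correct across chunks.

```js
// Hypothetical lexer following the interface above: it splits input on
// whitespace and emits "number" or "word" tokens.
class WordLexer {
  constructor() {
    this.buffer = "";
    this.index = 0;
    this.line = 1;
  }

  // reset(chunk, info): load a new chunk and restore state saved by save().
  reset(chunk, info) {
    this.buffer = chunk;
    this.index = 0;
    this.line = info ? info.line : 1;
  }

  // save(): an opaque description of the current state (here, just the line).
  save() {
    return { line: this.line };
  }

  // next(): the next token, or undefined once the chunk is exhausted.
  next() {
    // Skip whitespace, counting newlines so tokens know their line.
    while (this.index < this.buffer.length && /\s/.test(this.buffer[this.index])) {
      if (this.buffer[this.index] === "\n") this.line++;
      this.index++;
    }
    if (this.index >= this.buffer.length) return undefined;

    const start = this.index;
    while (this.index < this.buffer.length && !/\s/.test(this.buffer[this.index])) {
      this.index++;
    }
    const value = this.buffer.slice(start, this.index);
    const type = /^[0-9]+$/.test(value) ? "number" : "word";
    return { type, value, line: this.line };
  }

  // formatError(token): a message pointing at the offending token.
  formatError(token) {
    return `Unexpected "${token.value}" on line ${token.line}`;
  }

  // has(name): the token types this lexer can emit.
  has(name) {
    return name === "number" || name === "word";
  }
}

const lexer = new WordLexer();
lexer.reset("print 12\nfoo");
console.log(lexer.next()); // { type: 'word', value: 'print', line: 1 }
```

An instance could be passed in the same way as `myLexer` above, e.g. `{ lexer: new WordLexer() }`. This sketch has not been tested against nearley itself.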

docs/glossary.md

		@@ -19,4 +19,4 @@ Glossary
		production rule: a set of strings specified as a sequence of symbols,
		such that the rule matches a string if it is a concatenation of strings
		matched by the respective symbols

		@@ -80,7 +80,9 @@ symbol: a generic term for a member of a production rule, either a

		epsilon: the empty production rule, matching only the empty string

		nullable rule: a production rule that matches the empty string, even
		though it is not necessarily equal to the epsilon rule (for example, the
		concatenation of epsilon with epsilon)

		nearley: a parser that parses context-free languages, along with
		several additional utilities for building languages
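To make the difference between *epsilon* and *nullable* concrete, the sketch below computes which nonterminals are nullable using the standard fixed-point iteration. The grammar format (a map from nonterminal names to lists of alternatives, with `[]` standing for epsilon) and the function name `nullableSet` are assumptions for illustration. This is not code from nearley.

```js
// Hypothetical grammar format: nonterminal name -> list of alternatives,
// each alternative an array of symbol names. Names that are not keys are
// terminals, and an empty alternative [] is the epsilon rule.
function nullableSet(grammar) {
  const nullable = new Set();
  let changed = true;
  // Repeat until nothing changes: a nonterminal is nullable if one of its
  // alternatives consists only of nullable nonterminals. This holds
  // vacuously for [], since every() on an empty array returns true.
  while (changed) {
    changed = false;
    for (const [name, alternatives] of Object.entries(grammar)) {
      if (nullable.has(name)) continue;
      if (alternatives.some((alt) => alt.every((sym) => nullable.has(sym)))) {
        nullable.add(name);
        changed = true;
      }
    }
  }
  return nullable;
}

const grammar = {
  empty: [[]],                      // the epsilon rule
  twoEmpties: [["empty", "empty"]], // epsilon followed by epsilon
  cows: [[], ["cows", "cow"]],      // zero or more "cow"s
  moo: [["cow"]],                   // always consumes a "cow"
};
console.log([...nullableSet(grammar)].sort()); // [ 'cows', 'empty', 'twoEmpties' ]
```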

docs/using-in-frontend.md

		@@ -1,9 +0,15 @@
		# Using the nearley compiler in browsers

		Both the nearley parser and compiled grammars work in browsers; simply include
		`nearley.js` and your compiled `grammar.js` file in `<script>` tags and use
		nearley as usual. However, the nearley compiler is not designed for the
		browser, so you should precompile your grammars and serve only the generated
		JS files to browsers.

		If you absolutely have to compile a grammar in a browser (for example, to
		implement a nearley IDE), you can use a tool like
		[Webpack](https://webpack.js.org/) or [Rollup](https://rollupjs.org/) to
		include the `nearley` NPM package in your browser code. Then, you can use
		the `nearleyc` internals to compile grammars dynamically, for instance when
		the user types a grammar into a textarea:

		```js
		const nearley = require("nearley");
		// ... (`compile`, `generate` and `nearleyGrammar` are also required here)

		function compileGrammar(sourceCode) {
		    // Parse the grammar source into an AST
		    const grammarParser = new nearley.Parser(
		        nearleyGrammar.ParserRules,
		        nearleyGrammar.ParserStart
		    );
		    grammarParser.feed(sourceCode);
		    const grammarAst = grammarParser.results[0]; // TODO check for errors

		    // Compile the AST into a set of rules
		    const grammarInfoObject = compile(grammarAst, {});
		    // Generate JavaScript code from the rules
		    const grammarJs = generate(grammarInfoObject, "grammar");

		    // `nearleyc` would write this JS to a file; in a browser we eval it instead.
		    // Pretend this is a CommonJS environment to catch exports from the grammar:
		    // the eval'd code can see `module` in this scope.
		    const module = { exports: {} };
		    eval(grammarJs);

		    return module.exports;
		}
		```

package.json

		{
		"name": "nearley",
		"version": "2.10.4",
		"description": "Simple, fast, powerful parser toolkit for JavaScript.",
		@@ -5,0 +5,0 @@ "main": "lib/nearley.js",


README.md

		@@ -1,4 +0,2 @@
		![](www/logo/nearley-purple.png)

		# [nearley](http://nearley.js.org) ↗️
		[![JS.ORG](https://img.shields.io/badge/js.org-nearley-ffb400.svg?style=flat-square)](http://js.org)
		@@ -32,3 +30,3 @@ [![npm version](https://badge.fury.io/js/nearley.svg)](https://badge.fury.io/js/nearley)
		- [compilers for real programming languages](https://github.com/sizigi/lp5562);
		- and nearley itself! The nearley compiler is bootstrapped.

		@@ -48,7 +46,6 @@ nearley is an npm [staff
		- [Getting started: nearley in 3 steps](#getting-started-nearley-in-3-steps)
		- [Writing a parser: the nearley grammar language](#writing-a-parser-the-nearley-grammar-language)
		- [Vocabulary](#vocabulary)
		- [Postprocessors](#postprocessors)
		- [Target languages](#target-languages)
		- [More syntax: tips and tricks](#more-syntax-tips-and-tricks)
		@@ -62,3 +59,6 @@ - [Comments](#comments)
		- [Importing other grammars](#importing-other-grammars)
		- [Using a parser: the nearley API](#using-a-parser-the-nearley-api)
		- [A note on ambiguity](#a-note-on-ambiguity)
		- [Catching errors](#catching-errors)
		- [Tokenizers](#tokenizers)
		- [Tools](#tools)
		@@ -74,3 +74,3 @@ - [nearley-test: Exploring a parser interactively](#nearley-test-exploring-a-parser-interactively)
		- [Recipes](#recipes)
		- [Blog posts](#blog-posts)

		@@ -157,7 +157,10 @@ <!-- END doctoc generated TOC please keep comment here to allow auto update -->

		## Writing a parser: the nearley grammar language

		This section describes the nearley grammar language, in which you can describe
		grammars for nearley to parse. Grammars are conventionally kept in `.ne` files.
		You can then use `nearleyc` to compile your `.ne` grammars to JavaScript
		modules.

		### Vocabulary

		@@ -170,14 +173,19 @@ - A terminal is a string or a token. For example, the keyword `"if"` is a
		- A rule (or production rule) is a definition of a nonterminal. For example,
		`ifStatement -> "if" condition "then" statement "endif"` is the rule
		according to which the if statement nonterminal is parsed.

		By default, nearley attempts to parse the first nonterminal defined in the
		grammar. In the following grammar, nearley will try to parse input text as an
		`expression`.

		```js
		expression -> number "+" number
		expression -> number "-" number
		expression -> number "*" number
		expression -> number "/" number
		number -> [0-9]:+
		```
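As a rough cross-check of what this grammar accepts (not of how nearley parses it), the same language can be described by a regular expression: one or more digits, one operator, one or more digits, with no whitespace, since the grammar never mentions any. `isExpression` is a hypothetical helper:

```js
// The language of the `expression` grammar above: digits, one operator,
// digits. No whitespace, because the grammar never mentions any.
const expressionPattern = /^[0-9]+[-+*\/][0-9]+$/;

function isExpression(input) {
  return expressionPattern.test(input);
}

console.log(isExpression("12+34"));   // true
console.log(isExpression("12 + 34")); // false
console.log(isExpression("1+2+3"));   // false
```

The whole input has to match, which is why `"1+2+3"` is rejected: `expression` allows exactly one operator.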

		You can use the pipe character `|` to separate alternative rules for a
		nonterminal. In the example below, `expression` has four different rules.

		@@ -197,9 +205,5 @@ ```js
		```js
		a -> null | a "cow"
		```

		Keep in mind that nearley syntax is not sensitive to formatting. You're welcome
		to keep rules on the same line: `foo -> bar | qux`.

		### Postprocessors
		@@ -291,35 +295,2 @@

		### More syntax: tips and tricks
		@@ -435,9 +406,101 @@
		See the [`builtin/`](builtin) directory for more details. Contributions are
		welcome!

		Including a file imports all of the nonterminals defined in it, as well as
		any JS, macros, and configuration options defined there.

		## Using a parser: the nearley API

		Once you have compiled a `grammar.ne` file to a `grammar.js` module, you can
		then use nearley to instantiate a `Parser` object.

		First, import nearley and your grammar.

		```js
		const nearley = require("nearley");
		const grammar = require("./grammar.js");
		```

		Note that if you are parsing in the browser, you can simply include
		`nearley.js` and `grammar.js` in `<script>` tags.

		Next, use the grammar to create a new `nearley.Parser` object.

		```js
		// Create a Parser object from our grammar.
		const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));
		```

		Once you have a `Parser`, you can `.feed` it a string to parse. Since nearley
		is a streaming parser, you can feed strings more than once. For example, a
		REPL might feed the parser lines of code as the user enters them:

		```js
		// Parse something!
		parser.feed("if (true) {");
		parser.feed("x = 1");
		parser.feed("}");
		// or, all at once: parser.feed("if (true) {x = 1}");
		```

		Finally, you can query the `.results` property of the parser.

		```js
		// parser.results is an array of possible parsings.
		console.log(parser.results);
		// [{'type': 'if', 'condition': ..., 'body': ...}]
		```

		### A note on ambiguity

		Why is `parser.results` an array? Sometimes, a grammar can parse a particular
		string in multiple different ways. For example, the following grammar parses
		the string `"xyz"` in two different ways.

		```js
		x -> "xy" "z"
		   | "x" "yz"
		```

		Such grammars are ambiguous. nearley provides you with all the parsings. In
		most cases, however, your grammars should not be ambiguous (parsing ambiguous
		grammars is inefficient!). Thus, the most common usage is to simply query
		`parser.results[0]`.
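Because the common case is to read `parser.results[0]`, it can be worth guarding that access. The hypothetical helper below returns the single parse. It throws if there are none (for example, because the input is incomplete) or more than one (a sign that the grammar is ambiguous):

```js
// Hypothetical helper: return the only parse, or explain why there isn't one.
function expectSingleParse(results) {
  if (results.length === 0) {
    throw new Error("No complete parse: the input may be unfinished.");
  }
  if (results.length > 1) {
    throw new Error(`Ambiguous input: ${results.length} parses found.`);
  }
  return results[0];
}

console.log(expectSingleParse([{ type: "if" }])); // { type: 'if' }
```

After you have finished feeding input, call it as `expectSingleParse(parser.results)`.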

		### Catching errors

		nearley is a streaming parser: you can keep feeding it more strings. This
		means that there are two error scenarios in nearley.

		Consider the simple parser below for the examples to follow.

		```js
		main -> "Cow goes moo." {% function(d) {return "yay!"; } %}
		```

		If there are no possible parsings given the current input, but in the future
		there might be results if you feed it more strings, then nearley will
		temporarily set the `results` array to the empty array, `[]`.

		```js
		parser.feed("Cow "); // parser.results is []
		parser.feed("goes "); // parser.results is []
		parser.feed("moo."); // parser.results is ["yay!"]
		```

		If there are no possible parsings, and there is no way to "recover" by feeding
		more data, then nearley will throw an error whose `offset` property is the
		index of the offending token.

		```js
		try {
		    parser.feed("Cow goes% moo.");
		} catch (parseError) {
		    console.log("Error at character " + parseError.offset); // "Error at character 9"
		}
		```
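The two scenarios can be mimicked with a toy streaming parser for the single rule above. `ToyParser` is purely illustrative and does no real parsing: it only checks the input against one fixed string. Like nearley, it leaves `results` empty while the input so far can still be completed, and it throws an error carrying an `offset` property once it cannot. In this toy, `offset` is the zero-based index of the first character that could not be consumed.

```js
// Toy model of nearley's streaming behaviour for one fixed rule:
//   main -> "Cow goes moo." {% function(d) {return "yay!"; } %}
// It only compares characters against the target string.
class ToyParser {
  constructor(target, value) {
    this.target = target; // the only sentence the "grammar" accepts
    this.value = value;   // what the postprocessor would return
    this.consumed = 0;    // characters matched so far, across all feeds
    this.results = [];
  }

  feed(chunk) {
    for (const ch of chunk) {
      const expected = this.target[this.consumed];
      if (ch !== expected) {
        // No continuation can recover: report where the input went wrong.
        const err = new Error(`Unexpected "${ch}" at offset ${this.consumed}`);
        err.offset = this.consumed;
        throw err;
      }
      this.consumed++;
    }
    // A viable but unfinished prefix leaves results empty.
    this.results = this.consumed === this.target.length ? [this.value] : [];
    return this;
  }
}

const parser = new ToyParser("Cow goes moo.", "yay!");
parser.feed("Cow ");
console.log(parser.results); // []
parser.feed("goes ").feed("moo.");
console.log(parser.results); // [ 'yay!' ]

try {
  new ToyParser("Cow goes moo.", "yay!").feed("Cow goes% moo.");
} catch (parseError) {
  console.log(parseError.offset); // 8
}
```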


		### Tokenizers

		By default, nearley splits the input into a stream of characters. This is
		@@ -580,2 +643,6 @@ called scannerless parsing.

		Node users can programmatically access the unparser using
		[nearley-there](https://github.com/stolksdorf/nearley-there) by Scott
		Tolksdorf.

		Browser users can use
		@@ -605,16 +672,9 @@ [nearley-playground](https://omrelli.ug/nearley-playground/) by Guillermo

		Please read [this document](.github/CONTRIBUTING.md) before working on
		nearley. If you are interested in contributing but unsure where to start, take
		a look at the issues labeled "up for grabs" on the issue tracker, or message a
		maintainer.

		nearley is MIT licensed.

		A big thanks to Nathan Dinsmore for teaching me how to Earley, Aria Stewart for
		@@ -639,7 +699,5 @@ helping structure nearley into a mature module, and Robin Windels for

		- [Transforming parse trees](docs/generating-cst-ast.md)
		- [Writing an indentation-aware (Python-like) lexer](https://gist.github.com/nathan/d8d1adea38a1ef3a6d6a06552da641aa)
		- [Making a REPL for your language](docs/making-a-repl.md)

		### Blog posts

		@@ -653,2 +711,1 @@ - Read my [blog post](http://hardmath123.github.io/earley.html) to learn more
		written by @gajus.

docs/generating-cst-ast.md

docs/making-a-repl.md

index.html
