nearley - npm Package Compare versions

Comparing version 2.10.3 to 2.10.4


docs/accessing-parse-table.md
# Accessing the internal parse table
The `Parser` constructor takes an optional last parameter, `options`,
which is an object with the following possible keys:
If you are familiar with the Earley parsing algorithm, you can access the
internal parse table using `Parser.table` (this, for example, is how
`nearley-test` works). One caveat, however: you must pass the `keepHistory`
option to nearley to prevent it from garbage-collecting inaccessible columns of
the table.
- `keepHistory` (boolean, default `false`) - whether to preserve and expose the internal state
- `lexer` (object) - custom lexer, overrides `@lexer` in the grammar
If you are familiar with the Earley parsing algorithm and are planning to do something exciting with the parse table, set `keepHistory`:
```js

@@ -15,8 +13,10 @@ const nearley = require("nearley");

const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar), { keepHistory: true });
const parser = new nearley.Parser(
    nearley.Grammar.fromCompiled(grammar),
    { keepHistory: true }
);
// ...
// After feeding data:
parser.feed(...);
console.log(parser.table);
```
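As a rough illustration of what can be done with the table once `keepHistory` is set, the sketch below summarizes it column by column. It assumes that `parser.table` is an array of columns and that each column exposes a `states` array; that is an internal detail of nearley and may differ between versions. `summarizeTable` and `fakeTable` are hypothetical names, not part of the nearley API.

```js
// Hypothetical helper: counts the Earley states stored in each column.
// Assumes each column has a `states` array (nearley internal, may change).
function summarizeTable(table) {
  return table.map((column, index) => ({
    index,
    stateCount: Array.isArray(column.states) ? column.states.length : 0,
  }));
}

// Stand-in data shaped like the assumed table. With a real parser you
// would pass `parser.table` after calling `parser.feed(...)`.
const fakeTable = [{ states: ["S0", "S1"] }, { states: ["S2"] }, { states: [] }];
console.log(summarizeTable(fakeTable));
```

Without `keepHistory`, earlier columns may already have been discarded, so a summary like this would only be meaningful for the columns nearley still holds.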
# Custom tokens and lexers
## Adding custom token matchers
## Custom token matchers
Sometimes you might want a more flexible way of matching tokens, whether you're using `@lexer` or not.
Aside from the lexer infrastructure, nearley provides a lightweight way to
parse arbitrary streams.
Custom matchers can be defined in two ways: literal tokens and testable tokens. A
literal token matches exactly, while a testable token runs a function to test
whether it is a match or not.
Custom matchers can be defined in two ways: *literal* tokens and *testable*
tokens. A literal token matches a JS value exactly (with `===`), while a
testable token runs a predicate that tests whether or not the value matches.
Note that in this case, you would feed a `Parser` instance an *array* of
objects rather than a string! Here is a simple example:
```coffeescript

@@ -17,24 +21,22 @@ @{%

# Matches ["print", 12] if the input is an array with those elements.
main -> %tokenPrint %tokenNumber
main -> %tokenPrint %tokenNumber ";;"
# parser.feed(["print", 12, ";;"]);
```
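The difference between the two kinds of matcher can be sketched in plain JavaScript. The definitions of `tokenPrint` and `tokenNumber` are not visible in this diff, so the `literal` and `test` field names below are an assumption about their shape, and `matchesToken` is a hypothetical helper that only mimics the matching rule described above; nearley performs this check internally.

```js
// Assumed shapes: a literal token and a testable token.
const tokenPrint = { literal: "print" };
const tokenNumber = { test: (x) => Number.isInteger(x) };

// Hypothetical helper mirroring the described rule: a testable token runs
// its predicate, a literal token is compared with ===.
function matchesToken(matcher, value) {
  if (typeof matcher.test === "function") {
    return Boolean(matcher.test(value));
  }
  return matcher.literal === value;
}

// Check each element of an input array against the expected matcher.
const input = ["print", 12];
console.log(matchesToken(tokenPrint, input[0]));
console.log(matchesToken(tokenNumber, input[1]));
console.log(matchesToken(tokenNumber, "12"));
```

Because literal matching uses `===`, the string `"12"` does not match a matcher that tests for integers, and the number `12` would not match `{ literal: "12" }`.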
## Writing a custom lexer
## Custom lexers
If you don't want to use [Moo](https://github.com/tjvr/moo), our recommended lexer/tokenizer, you can define your own. Either pass it using `@lexer myLexer` in the grammar, or in options to `Parser`:
nearley recommends using a [moo](https://github.com/tjvr/moo)-based lexer.
However, you can use any lexer that conforms to the following interface:
```js
const nearley = require("nearley");
const grammar = require("./grammar");
const myLexer = require("./lexer");
const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar), { lexer: myLexer });
```
Your lexer must have the following interface:
- `next() -> Token` return e.g. `{type, value, line, col, …}`. Only the `value` attribute is required.
- `save() -> Info` -> return an object describing the current line/col etc. This allows us to preserve this information between `feed()` calls, and also to support `Parser#rewind()`. The exact structure is lexer-specific; nearley doesn't care what's in it.
- `reset(chunk, Info)`: set the internal buffer to `chunk`, and restore line/col/state info taken from `save()`.
- `formatError(token)` -> return a string with an error message describing the line/col of the offending token. You might like to include a preview of the line in question.
- `has(tokenType)` -> return true if the lexer can emit tokens with that name. Used to resolve `%`-specifiers in compiled nearley grammars.
- `next()` returns a token object, which could have fields for line number,
etc. Importantly, a token object *must* have a `value` attribute.
- `save()` returns an info object that describes the current state of the
lexer. nearley places no restrictions on this object.
- `reset(chunk, info)` sets the internal buffer of the lexer to `chunk`, and
restores its state to a state returned by `save()`.
- `formatError(token)` returns a string with an error message describing a
parse error at that token (for example, the string might contain the line and
column where the error was found).
- `has(name)` returns true if the lexer can emit tokens with that name. This is
used to resolve `%`-specifiers in compiled nearley grammars.
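The interface above can be sketched as a small whitespace-splitting lexer. `WordLexer` is a hypothetical name, the `"word"` token type is made up for illustration, and returning `undefined` from `next()` at the end of the buffer is an assumption modeled on how moo behaves. The sketch does not handle a word split across two chunks.

```js
// Minimal sketch of a lexer with the five methods listed above.
class WordLexer {
  constructor() {
    this.reset("", undefined);
  }

  // Set the buffer to `chunk` and restore state produced by save().
  reset(chunk, info) {
    this.buffer = chunk;
    this.index = 0;
    this.line = info ? info.line : 1;
    this.col = info ? info.col : 1;
  }

  // Move one character forward, keeping line/col up to date.
  advance() {
    if (this.buffer[this.index] === "\n") {
      this.line += 1;
      this.col = 1;
    } else {
      this.col += 1;
    }
    this.index += 1;
  }

  // Return the next token, or undefined when the buffer is exhausted.
  next() {
    while (this.index < this.buffer.length && /\s/.test(this.buffer[this.index])) {
      this.advance();
    }
    if (this.index >= this.buffer.length) return undefined;
    const token = { type: "word", value: "", line: this.line, col: this.col };
    while (this.index < this.buffer.length && !/\s/.test(this.buffer[this.index])) {
      token.value += this.buffer[this.index];
      this.advance();
    }
    return token;
  }

  // Describe the current position so it survives between chunks.
  save() {
    return { line: this.line, col: this.col };
  }

  formatError(token) {
    return `Unexpected "${token.value}" at line ${token.line} col ${token.col}`;
  }

  has(name) {
    return name === "word";
  }
}

const lexer = new WordLexer();
lexer.reset("print 12\nhalt");
let token;
while ((token = lexer.next())) {
  console.log(`${token.value} at ${token.line}:${token.col}`);
}
```

An instance of such a class could then be passed as the `lexer` option, in the same way as `myLexer` in the example above.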

@@ -19,4 +19,4 @@ Glossary

**production rule**: a set of *strings* specified as a sequence of *symbols*,
such that the rule matches a string if it is a concatenation of strings matched
by the respective symbols
such that the rule matches a string if it is a *concatenation* of strings
matched by the respective symbols

@@ -80,7 +80,9 @@ **symbol**: a generic term for a member of a *production rule*, either a

**epsilon**: the empty production rule, matching only the empty string
**epsilon**: the empty *production rule*, matching only the empty string
**nullable rule**: a production rule that matches the empty string
**nullable rule**: a *production rule* that matches the empty string, even
though it is not necessarily equal to the *epsilon* rule (for example, the
*concatenation* of epsilon with epsilon)
**nearley**: a *parser* that parses *context-free languages*, along with
several additional utilities for building languages

@@ -1,9 +0,15 @@

# Using nearley in browsers
# Using the nearley compiler in browsers
Use a tool like [Webpack](https://webpack.js.org/) or [Rollup](https://rollupjs.org/) to include the `nearley` NPM package in your browser code.
Both the nearley parser and compiled grammars work in browsers; simply include
`nearley.js` and your compiled `grammar.js` file in `<script>` tags and use
nearley as usual. However, the nearley *compiler* is not designed for the
browser -- you should precompile your grammars and only serve the generated JS
files to browsers.
The runtime part works fine in browsers, but there's no concise way to compile a grammar and pass it to the `Parser` constructor. If you have a single static grammar, just precompile it with `nearleyc` and include the compiled JS file in your frontend code.
If you absolutely have to compile a grammar in a browser (for example, to
implement a nearley IDE) then you can use a tool like
[Webpack](https://webpack.js.org/) or [Rollup](https://rollupjs.org/) to
include the `nearley` NPM package in your browser code. Then, you can utilize
the `nearleyc` internals to compile grammars dynamically.
If you absolutely have to compile a grammar in a browser, e.g. the user enters it into a textarea, then here's an example for you:
```js

@@ -20,16 +26,18 @@ const nearley = require("nearley");

function compileGrammar(sourceCode) {
    // Oh boy, here we go. We're gonna do what `nearleyc` does.
    // Parse the custom grammar into AST as a nearley grammar.
    const grammarParser = new nearley.Parser(nearleyGrammar.ParserRules, nearleyGrammar.ParserStart);
    // Parse the grammar source into an AST
    const grammarParser = new nearley.Parser(
        nearleyGrammar.ParserRules,
        nearleyGrammar.ParserStart
    );
    grammarParser.feed(sourceCode);
    const grammarAst = grammarParser.results[0];
    const grammarAst = grammarParser.results[0]; // TODO check for errors
    // Compile the custom grammar into JS.
    const grammarInfoObject = compile(grammarAst, {}); // Returns an object with rules, etc.
    const grammarJs = generate(grammarInfoObject, "grammar"); // Stringifies that object into JS.
    // Compile the AST into a set of rules
    const grammarInfoObject = compile(grammarAst, {});
    // Generate JavaScript code from the rules
    const grammarJs = generate(grammarInfoObject, "grammar");
    // `nearleyc` would save JS to a file and you'd require it, but in a browser we can only eval.
    const module = { exports: {} }; // Pretend this is a CommonJS environment to catch exports from the grammar.
    eval(grammarJs); // Evaluated code sees everything in the lexical scope, it can see `module`.
    // Pretend this is a CommonJS environment to catch exports from the grammar.
    const module = { exports: {} };
    eval(grammarJs);

@@ -36,0 +44,0 @@ return module.exports;

{
"name": "nearley",
"version": "2.10.3",
"version": "2.10.4",
"description": "Simple, fast, powerful parser toolkit for JavaScript.",

@@ -5,0 +5,0 @@ "main": "lib/nearley.js",

@@ -1,4 +0,2 @@

![](www/logo/nearley-purple.png)
# [nearley](http://nearley.js.org)
# [nearley](http://nearley.js.org) ↗️
[![JS.ORG](https://img.shields.io/badge/js.org-nearley-ffb400.svg?style=flat-square)](http://js.org)

@@ -32,3 +30,3 @@ [![npm version](https://badge.fury.io/js/nearley.svg)](https://badge.fury.io/js/nearley)

- [compilers for real programming languages](https://github.com/sizigi/lp5562);
- and nearley itself! The nearley compiler is written in *itself*.
- and nearley itself! The nearley compiler is bootstrapped.

@@ -48,7 +46,6 @@ nearley is an npm [staff

- [Getting started: nearley in 3 steps](#getting-started-nearley-in-3-steps)
- [Writing a parser](#writing-a-parser)
- [Terminals, nonterminals, rules](#terminals-nonterminals-rules)
- [Writing a parser: the nearley grammar language](#writing-a-parser-the-nearley-grammar-language)
- [Vocabulary](#vocabulary)
- [Postprocessors](#postprocessors)
- [Target languages](#target-languages)
- [Catching errors](#catching-errors)
- [More syntax: tips and tricks](#more-syntax-tips-and-tricks)

@@ -62,3 +59,6 @@ - [Comments](#comments)

- [Importing other grammars](#importing-other-grammars)
- [Tokenizers](#tokenizers)
- [Using a parser: the nearley API](#using-a-parser-the-nearley-api)
- [A note on ambiguity](#a-note-on-ambiguity)
- [Catching errors](#catching-errors)
- [Tokenizers](#tokenizers)
- [Tools](#tools)

@@ -74,3 +74,3 @@ - [nearley-test: Exploring a parser interactively](#nearley-test-exploring-a-parser-interactively)

- [Recipes](#recipes)
- [Details](#details)
- [Blog posts](#blog-posts)

@@ -157,7 +157,10 @@ <!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Writing a parser
## Writing a parser: the nearley grammar language
Let's explore the building blocks of a nearley parser.
This section describes the nearley grammar language, in which you can describe
grammars for nearley to parse. Grammars are conventionally kept in `.ne` files.
You can then use `nearleyc` to compile your `.ne` grammars to JavaScript
modules.
### Terminals, nonterminals, rules
### Vocabulary

@@ -170,14 +173,19 @@ - A *terminal* is a string or a token. For example, the keyword `"if"` is a

- A *rule* (or production rule) is a definition of a nonterminal. For example,
`"if" condition "then" statement "endif"` is the rule according to which the
if statement nonterminal is parsed.
`ifStatement -> "if" condition "then" statement "endif"` is the rule
according to which the if statement nonterminal is parsed.
The first nonterminal of the grammar is the one the whole input must match.
With the following grammar, nearley will try to parse text as `expression`.
By default, nearley attempts to parse the first nonterminal defined in the
grammar. In the following grammar, nearley will try to parse input text as an
`expression`.
```js
expression -> number "+" number
expression -> number "-" number
expression -> number "*" number
expression -> number "/" number
number -> [0-9]:+
```
Use the pipe character `|` to separate alternative rules for a nonterminal.
You can use the pipe character `|` to separate alternative rules for a
nonterminal. In the example below, `expression` has four different rules.

@@ -197,9 +205,5 @@ ```js

```js
a -> null
| a "cow"
a -> null | a "cow"
```
Keep in mind that nearley syntax is not sensitive to formatting. You're welcome
to keep rules on the same line: `foo -> bar | qux`.
### Postprocessors

@@ -291,35 +295,2 @@

### Catching errors
nearley is a *streaming* parser: you can keep feeding it more strings. This
means that there are two error scenarios in nearley.
Consider the simple parser below for the examples to follow.
```js
main -> "Cow goes moo." {% function(d) {return "yay!"; } %}
```
If there are no possible parsings given the current input, but in the *future*
there *might* be results if you feed it more strings, then nearley will
temporarily set the `results` array to the empty array, `[]`.
```js
parser.feed("Cow "); // parser.results is []
parser.feed("goes "); // parser.results is []
parser.feed("moo."); // parser.results is ["yay!"]
```
If there are no possible parsings, and there is no way to "recover" by feeding
more data, then nearley will throw an error whose `offset` property is the
index of the offending token.
```js
try {
    parser.feed("Cow goes% moo.");
} catch(parseError) {
    console.log("Error at character " + parseError.offset); // "Error at character 9"
}
```
### More syntax: tips and tricks

@@ -435,9 +406,101 @@

See the [`builtin/`](builtin) directory for more details. Contributions are
welcome here!
welcome!
Including a file imports *all* of the nonterminals defined in it, as well as
any JS, macros, and config options defined there.
any JS, macros, and configuration options defined there.
## Tokenizers
## Using a parser: the nearley API
Once you have compiled a `grammar.ne` file to a `grammar.js` module, you can
then use nearley to instantiate a `Parser` object.
First, import nearley and your grammar.
```js
const nearley = require("nearley");
const grammar = require("./grammar.js");
```
Note that if you are parsing in the browser, you can simply include
`nearley.js` and `grammar.js` in `<script>` tags.
Next, use the grammar to create a new `nearley.Parser` object.
```js
// Create a Parser object from our grammar.
const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));
```
Once you have a `Parser`, you can `.feed` it a string to parse. Since nearley
is a *streaming* parser, you can feed strings more than once. For example, a
REPL might feed the parser lines of code as the user enters them:
```js
// Parse something!
parser.feed("if (true) {");
parser.feed("x = 1");
parser.feed("}");
// or, parser.feed("if (true) {x=1}");
```
Finally, you can query the `.results` property of the parser.
```js
// parser.results is an array of possible parsings.
console.log(parser.results);
// [{'type': 'if', 'condition': ..., 'body': ...}]
```
### A note on ambiguity
Why is `parser.results` an array? Sometimes, a grammar can parse a particular
string in multiple different ways. For example, the following grammar parses
the string `"xyz"` in two different ways.
```js
x -> "xy" "z"
| "x" "yz"
```
Such grammars are *ambiguous*. nearley provides you with *all* the parsings. In
most cases, however, your grammars should not be ambiguous (parsing ambiguous
grammars is inefficient!). Thus, the most common usage is to simply query
`parser.results[0]`.
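When relying on `parser.results[0]`, it can help to notice the empty and ambiguous cases explicitly. The following is a minimal sketch; `pickResult` is a hypothetical helper, not part of nearley, and the arrays passed to it stand in for `parser.results`.

```js
// Hypothetical helper: pick a single parse from a results array.
function pickResult(results) {
  if (results.length === 0) {
    // No complete parse yet; more input may still be needed.
    throw new Error("No parse found for the input so far");
  }
  if (results.length > 1) {
    // The grammar is ambiguous for this input.
    console.warn(`Ambiguous parse: ${results.length} results, using the first`);
  }
  return results[0];
}

// Stand-in arrays; with a real parser you would pass `parser.results`.
console.log(pickResult([{ type: "if" }]));
console.log(pickResult(["first parse", "second parse"]));
```

Logging a warning rather than throwing on ambiguity is a design choice here; a stricter tool might treat more than one result as an error in the grammar.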
### Catching errors
nearley is a *streaming* parser: you can keep feeding it more strings. This
means that there are two error scenarios in nearley.
Consider the simple parser below for the examples to follow.
```js
main -> "Cow goes moo." {% function(d) {return "yay!"; } %}
```
If there are no possible parsings given the current input, but in the *future*
there *might* be results if you feed it more strings, then nearley will
temporarily set the `results` array to the empty array, `[]`.
```js
parser.feed("Cow "); // parser.results is []
parser.feed("goes "); // parser.results is []
parser.feed("moo."); // parser.results is ["yay!"]
```
If there are no possible parsings, and there is no way to "recover" by feeding
more data, then nearley will throw an error whose `offset` property is the
index of the offending token.
```js
try {
    parser.feed("Cow goes% moo.");
} catch(parseError) {
    console.log("Error at character " + parseError.offset); // "Error at character 9"
}
```
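The two scenarios can be folded into one wrapper that reports whether the input is complete, incomplete, or invalid. This is only a sketch: `feedSafely` is a hypothetical helper, and `makeStubParser` is a stand-in that imitates the `feed`, `results`, and `offset` behavior described above so that the example runs without a compiled grammar.

```js
// Hypothetical wrapper around parser.feed() that reports both scenarios.
function feedSafely(parser, chunk) {
  try {
    parser.feed(chunk);
  } catch (parseError) {
    return { status: "error", offset: parseError.offset };
  }
  return parser.results.length > 0
    ? { status: "complete", results: parser.results }
    : { status: "incomplete" };
}

// Stand-in parser (not nearley): accepts exactly the string `expected`.
function makeStubParser(expected) {
  let seen = "";
  return {
    results: [],
    feed(chunk) {
      for (const ch of chunk) {
        if (expected[seen.length] !== ch) {
          const err = new Error("Unexpected input");
          err.offset = seen.length; // position in the stub's input stream
          throw err;
        }
        seen += ch;
      }
      this.results = seen === expected ? ["yay!"] : [];
    },
  };
}

const good = makeStubParser("moo");
console.log(feedSafely(good, "mo").status);
console.log(feedSafely(good, "o").status);

const bad = makeStubParser("moo");
console.log(feedSafely(bad, "mx"));
```

With a real nearley parser, `feedSafely(parser, chunk)` would be called in place of `parser.feed(chunk)`.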
### Tokenizers
By default, nearley splits the input into a stream of characters. This is

@@ -580,2 +643,6 @@ called *scannerless* parsing.

Node users can programmatically access the unparser using
[nearley-there](https://github.com/stolksdorf/nearley-there) by Scott
Tolksdorf.
Browser users can use

@@ -605,16 +672,9 @@ [nearley-playground](https://omrelli.ug/nearley-playground/) by Guillermo

Tests live in `test/` and can be called with `npm test`. Please run the
benchmarks before and after your changes: parsing is tricky, and small changes
can kill efficiency. We learned this the hard way!
Please read [this document](.github/CONTRIBUTING.md) *before* working on
nearley. If you are interested in contributing but unsure where to start, take
a look at the issues labeled "up for grabs" on the issue tracker, or message a
maintainer.
If you're looking for something to do, here's a short list of things that would
make me happy:
nearley is MIT licensed.
- Optimize. There are still plenty of optimizations that an enterprising
JS-savant could implement.
- Help build the builtins library by PRing in your favorite primitives.
- Solutions to issues labeled "up for grabs" on the issue tracker.
Nearley is MIT licensed.
A big thanks to Nathan Dinsmore for teaching me how to Earley, Aria Stewart for

@@ -639,7 +699,5 @@ helping structure nearley into a mature module, and Robin Windels for

- [Transforming parse trees](docs/generating-cst-ast.md)
- [Writing an indentation-aware (Python-like) lexer](https://gist.github.com/nathan/d8d1adea38a1ef3a6d6a06552da641aa)
- [Making a REPL for your language](docs/making-a-repl.md)
### Details
### Blog posts

@@ -653,2 +711,1 @@ - Read my [blog post](http://hardmath123.github.io/earley.html) to learn more

written by @gajus.

Sorry, the diff of this file is not supported yet
