What is nearley?
The nearley npm package is a fast, feature-rich, and modern parser toolkit for JavaScript. It is based on Earley's algorithm and can be used to create parsers for complex, context-free grammars. nearley is designed to be simple to use and extend, making it a good choice for building compilers, interpreters, and other language-related tools.
What are nearley's main functionalities?
Grammar Definition
This feature allows you to define a grammar for your language. The grammar is written in a BNF-like syntax in a .ne file and compiled into a JavaScript module with the nearleyc command.
main -> "hello " world
world -> "world"
Parsing Input
Once you have defined a grammar, you can create a parser and feed it input to parse. The parser will output a parse tree or a list of possible parse trees if the input is ambiguous.
const nearley = require('nearley');
const grammar = require('./your-grammar.js');
const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));
parser.feed('hello world');
const results = parser.results;
console.log(results);
Error Reporting
nearley provides error reporting features that help you understand where and why a parse failed, which is useful for debugging grammars and providing feedback to users.
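const nearley = require('nearley');
const grammar = require('./your-grammar.js');
const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));
try {
  // feed() throws as soon as it reaches a character the grammar cannot accept.
  parser.feed('hello wxrld');
} catch (error) {
  console.error(error.message);
}
// Input that merely stops early (such as 'hello wor') does not throw; parser.results is simply empty.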
Other packages similar to nearley
pegjs
PEG.js is a simple parser generator for JavaScript that produces fast parsers with excellent error reporting. It uses Parsing Expression Grammars (PEG) as the input. Compared to nearley, PEG.js grammars are arguably easier to read and write but are less powerful in terms of expressing certain types of grammars.
chevrotain
Chevrotain is a high-performance, self-optimizing parser building toolkit for JavaScript. Unlike nearley, which uses Earley's algorithm, Chevrotain is based on parsing techniques that do not require a separate parser generation step. It provides a rich feature set and is particularly well-suited for building complex parsers.
antlr4
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator that supports multiple target languages, including JavaScript. ANTLR is more complex than nearley but offers a very rich set of features for building sophisticated language processors. ANTLR 4 uses adaptive LL(*) parsing, also written ALL(*), which is different from nearley's Earley-based approach.
jison
Jison is an npm package that generates bottom-up parsers in JavaScript. Inspired by Bison, it is capable of handling LR and LALR grammars. Jison can be considered more traditional compared to nearley's modern approach, and it might be more familiar to those with experience in classic parser generators.
nearley.js
Simple parsing for JavaScript.
What?
nearley.js uses the Earley parsing algorithm to parse complex data structures easily.
Why?
nearley.js lets you define grammars in a simple format. Unlike Jison's tokenizer-and-parser approach, nearley.js uses a single set of definitions. Unlike PEG.js, it handles left recursion gracefully and warns you if your grammar is ambiguous (ambiguous grammars are slower and use more memory). Finally, nearley.js generates tiny files, which won't affect performance even if they are unminified.
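To make the left-recursion point concrete, here is a minimal sketch of an Earley recognizer in plain JavaScript. It is not nearley's implementation: the function name `recognize` and the grammar-as-object shape are made up for illustration, literals are single characters, and the sketch assumes the grammar has no empty (epsilon) alternatives.

```javascript
// Minimal Earley recognizer (illustration only, not nearley's code).
// A grammar maps each nonterminal to a list of alternatives; each alternative
// is an array of symbols. A symbol that is a key of the grammar is a
// nonterminal; anything else is a single literal character.
// ASSUMPTION: no alternative is empty (no epsilon rules).
function recognize(grammar, start, input) {
  // columns[i] holds the states that are alive after reading i characters.
  const columns = Array.from({ length: input.length + 1 }, () => []);
  const add = (col, state) => {
    const dup = columns[col].some(
      (s) => s.name === state.name && s.rhs === state.rhs &&
             s.dot === state.dot && s.origin === state.origin
    );
    if (!dup) columns[col].push(state);
  };

  for (const rhs of grammar[start]) add(0, { name: start, rhs, dot: 0, origin: 0 });

  for (let i = 0; i <= input.length; i++) {
    // The column may grow while we iterate, so use an index-based loop.
    for (let j = 0; j < columns[i].length; j++) {
      const state = columns[i][j];
      const next = state.rhs[state.dot];
      if (next === undefined) {
        // Completer: advance every state that was waiting for this nonterminal.
        for (const waiting of columns[state.origin]) {
          if (waiting.rhs[waiting.dot] === state.name) {
            add(i, { ...waiting, dot: waiting.dot + 1 });
          }
        }
      } else if (Object.prototype.hasOwnProperty.call(grammar, next)) {
        // Predictor: expect every alternative of the nonterminal, starting here.
        for (const rhs of grammar[next]) add(i, { name: next, rhs, dot: 0, origin: i });
      } else if (i < input.length && input[i] === next) {
        // Scanner: the literal matches the next input character.
        add(i + 1, { ...state, dot: state.dot + 1 });
      }
    }
  }

  // Accept if some alternative of the start symbol spans the whole input.
  return columns[input.length].some(
    (s) => s.name === start && s.dot === s.rhs.length && s.origin === 0
  );
}

// Left-recursive grammar: sum -> sum "+" digit | digit
const grammar = {
  sum: [["sum", "+", "digit"], ["digit"]],
  digit: [["1"], ["2"], ["3"]],
};

console.log(recognize(grammar, "sum", "1+2+3")); // true
console.log(recognize(grammar, "sum", "1+"));    // false
```

The predictor adds each alternative to a column at most once, which is why the left-recursive `sum` rule does not loop forever the way a naive recursive-descent parser would.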
How?
To compile a parser, use the nearleyc command:
npm install -g nearley
nearleyc parser.ne
A parser consists of several nonterminals, which are just various constructions. A nonterminal is made up of a series of either nonterminals or strings (enclose strings in "double quotes", and use backslash escaping like in JSON). The following grammar matches a number, a plus sign, and another number:
expression -> number "+" number
The first nonterminal you define is the one that the parser tries to parse.
A nonterminal can have multiple meanings, separated by pipes (|):
expression -> number "+" number | number "-" number
Finally, each meaning (called a production rule) can have a postprocessing function that formats the matched data however you like:
expression -> number "+" number {%
    function (data) {
        return data[0] + data[2]; // the sum of the two numbers
    }
%}
data is an array with one element per symbol in the rule, in order. In the rule above, data[0] and data[2] are the two numbers and data[1] is the "+" string.
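As a rough illustration of the data argument, the snippet below calls a postprocessor by hand, without any parser. The helper name `addPostprocessor` is made up, and the arrays passed in are an assumption about what the rule would hand over for the input "1+1", depending on whether `number` yields JavaScript numbers or strings.

```javascript
// Hypothetical stand-in for the postprocessor attached to
//   expression -> number "+" number
function addPostprocessor(data) {
  // data[0] is the first number, data[1] is the "+" string,
  // data[2] is the second number.
  return data[0] + data[2];
}

// Assumed shape of `data` once each `number` has been converted to a
// JavaScript number by its own postprocessor.
console.log(addPostprocessor([1, "+", 1])); // 2

// If the operands were still strings, `+` would concatenate instead of add,
// giving the string "11".
console.log(addPostprocessor(["1", "+", "1"]));
```

The second call is the reason a rule such as `number` usually needs its own postprocessor before arithmetic on its result makes sense.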
To use the generated parser, use:
var parse = require("./parser.js"); // the "./" makes Node load the local file
console.log(parse("1+1")); // 2
console.log(parse("cow")); // throws error: "nearley parse error"
The epsilon rule is the empty rule: it matches the empty string and consumes no input. The constant null is the epsilon rule, so:
a -> null
| a "cow"
will match zero or more "cow"s in a row.
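The recursion in that rule can be mirrored by hand. The following is only a sketch of what the two alternatives mean, written as a plain JavaScript function with a made-up name, `matchesA`; nearley derives the equivalent behaviour from the grammar itself.

```javascript
// Hand-written check that mirrors the two alternatives of
//   a -> null
//      | a "cow"
// (illustration only, not generated by nearley).
function matchesA(input) {
  // Alternative 1: the epsilon rule matches the empty string.
  if (input === "") return true;
  // Alternative 2: an `a`, followed by the literal "cow".
  return input.endsWith("cow") && matchesA(input.slice(0, -3));
}

console.log(matchesA(""));          // true  (zero cows)
console.log(matchesA("cowcowcow")); // true  (three cows)
console.log(matchesA("cowco"));     // false
```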
The following constants are also defined:
| Constant | Meaning | Regex Equivalent |
|---|---|---|
| _char | Any character | /./ |
| _az | Any lowercase letter | /[a-z]/ |
| _AZ | Any uppercase letter | /[A-Z]/ |
| _09 | Any digit | /[0-9]/ |
| _s | A whitespace character | /\s/ |
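To see how the character classes overlap, the regex equivalents from the table can be tried directly in JavaScript. The `charsets` object and the `constantsMatching` helper below are illustrative names only and are not part of nearley.

```javascript
// Regex equivalents copied from the table above; the object itself is
// just an illustration and is not part of nearley's API.
const charsets = {
  _char: /./,
  _az: /[a-z]/,
  _AZ: /[A-Z]/,
  _09: /[0-9]/,
  _s: /\s/,
};

// Returns the names of every predefined constant that would accept `ch`.
function constantsMatching(ch) {
  return Object.keys(charsets).filter((name) => charsets[name].test(ch));
}

console.log(constantsMatching("q")); // [ '_char', '_az' ]
console.log(constantsMatching("7")); // [ '_char', '_09' ]
console.log(constantsMatching(" ")); // [ '_char', '_s' ]
```

Because _char overlaps with every other class, a grammar that offers both _char and a narrower class at the same position can easily become ambiguous.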
Errors
A parse error will throw the string "nearley parse error".
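Because the thrown value is a plain string rather than an Error object, a catch block cannot rely on properties such as message. The sketch below shows one way to handle that. The `parse` function here is only a stand-in that imitates the documented behaviour so the snippet runs on its own, and `tryParse` is a made-up helper name.

```javascript
// Stand-in for the generated parser, so the snippet runs on its own.
// ASSUMPTION: it only mimics the documented behaviour (throwing the
// string "nearley parse error"); a real parser comes from nearleyc.
function parse(input) {
  if (!/^[0-9]+\+[0-9]+$/.test(input)) throw "nearley parse error";
  const [left, right] = input.split("+").map(Number);
  return left + right;
}

// Wraps a parse call so that callers get a value instead of an exception.
function tryParse(input) {
  try {
    return { ok: true, value: parse(input) };
  } catch (err) {
    // The thrown value is a plain string, so there is no err.message or stack.
    const message = typeof err === "string" ? err : String(err);
    return { ok: false, error: message };
  }
}

console.log(tryParse("1+1")); // { ok: true, value: 2 }
console.log(tryParse("cow")); // { ok: false, error: 'nearley parse error' }
```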
You may get a warning saying your grammar is ambiguous. This means that there are multiple ways to parse the given input with the given grammar.
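For a feel of what "multiple ways to parse" means, consider the ambiguous grammar e -> e "-" e | "1". The function below is a hand-written counter for that one grammar, not something nearley provides; the name `countParses` is made up for this illustration.

```javascript
// Counts the distinct parse trees of `input` under the ambiguous grammar
//   e -> e "-" e | "1"
// by trying every "-" as the top-level operator.
function countParses(input) {
  if (input === "1") return 1;
  let total = 0;
  for (let i = 0; i < input.length; i++) {
    if (input[i] === "-") {
      total += countParses(input.slice(0, i)) * countParses(input.slice(i + 1));
    }
  }
  return total;
}

console.log(countParses("1-1"));     // 1
console.log(countParses("1-1-1"));   // 2: (1-1)-1 and 1-(1-1)
console.log(countParses("1-1-1-1")); // 5
```

Rewriting the rule as e -> e "-" "1" | "1" leaves exactly one tree per input, which removes the ambiguity and fixes the grouping as left-associative.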
nearley.js does not support detailed error messages yet.
Past changes
- 0.0.1: Initial release
- 0.0.2: Null rule
- 0.0.3: Predefined charsets