What is chevrotain?
Chevrotain is a fast and feature-rich parser building toolkit for JavaScript. It can be used to build parsers for DSLs, programming languages, data formats, and more. It provides a set of APIs for defining grammar rules and constructing a parser based on those rules.
What are chevrotain's main functionalities?
Defining Token Types
This code sample demonstrates how to define token types using Chevrotain. Tokens are the basic building blocks of the syntax for a language or format. In this example, we define tokens for integers and the plus and minus symbols, along with a whitespace token that the lexer skips so that input such as '1 + 2' tokenizes without errors.
const { createToken, Lexer } = require('chevrotain');
const Integer = createToken({ name: 'Integer', pattern: /\d+/ });
const Plus = createToken({ name: 'Plus', pattern: /\+/ });
const Minus = createToken({ name: 'Minus', pattern: /-/ });
// Whitespace is matched but not emitted as a token.
const WhiteSpace = createToken({ name: 'WhiteSpace', pattern: /\s+/, group: Lexer.SKIPPED });
const allTokens = [WhiteSpace, Plus, Minus, Integer];
const MyLexer = new Lexer(allTokens);
Building a Parser
This code sample shows how to build a parser using Chevrotain. The parser is defined as a class that extends `CstParser` and uses rules to define the grammar of the language. In this example, we define a simple grammar for addition expressions.
const { CstParser } = require('chevrotain');
class MyParser extends CstParser {
  constructor() {
    super(allTokens);
    this.RULE('expression', () => {
      this.SUBRULE(this.additionExpression);
    });
    this.RULE('additionExpression', () => {
      this.CONSUME(Integer);
      this.MANY(() => {
        // Each occurrence of the same token inside one rule needs a unique
        // numeric suffix, hence CONSUME2 and CONSUME3 for the two Integers.
        this.OR([
          { ALT: () => { this.CONSUME(Plus); this.CONSUME2(Integer); } },
          { ALT: () => { this.CONSUME(Minus); this.CONSUME3(Integer); } }
        ]);
      });
    });
    // Must be called once, after all rules have been defined.
    this.performSelfAnalysis();
  }
}
const parser = new MyParser();
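For intuition, the `additionExpression` rule corresponds roughly to the hand-written recursive descent loop below. This is a plain JavaScript sketch, not Chevrotain's API: the token shape (`{ type, image }`) and the function name `parseAddition` are assumptions made for illustration only.

```javascript
// Hand-written equivalent of the rule:
//   Integer (('+' | '-') Integer)*
// Tokens are plain objects: { type: 'Integer' | 'Plus' | 'Minus', image: string }.
function parseAddition(tokens) {
  let pos = 0;

  // Consume one token of the expected type or throw.
  function consume(type) {
    const tok = tokens[pos];
    if (!tok || tok.type !== type) {
      throw new Error(`Expected ${type} at token index ${pos}`);
    }
    pos++;
    return tok;
  }

  const operands = [consume('Integer').image];
  const operators = [];

  // MANY: repeat while the next token can start another iteration.
  while (pos < tokens.length &&
         (tokens[pos].type === 'Plus' || tokens[pos].type === 'Minus')) {
    // OR: choose the alternative using one token of lookahead.
    operators.push(consume(tokens[pos].type).image);
    operands.push(consume('Integer').image);
  }

  if (pos !== tokens.length) {
    throw new Error(`Unexpected token at index ${pos}`);
  }
  return { operands, operators };
}

const sampleTokens = [
  { type: 'Integer', image: '1' },
  { type: 'Plus', image: '+' },
  { type: 'Integer', image: '2' },
  { type: 'Minus', image: '-' },
  { type: 'Integer', image: '3' },
];
const parsed = parseAddition(sampleTokens);
// parsed.operands  -> ['1', '2', '3']
// parsed.operators -> ['+', '-']
```

The `while` loop plays the role of `MANY`, and the check on the next token's type plays the role of the lookahead that picks an `OR` alternative.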
Parsing Text
This code sample illustrates how to parse text using a lexer and parser defined with Chevrotain. The text is tokenized by the lexer, and then the tokens are fed into the parser to produce a Concrete Syntax Tree (CST), which can be used for further processing such as interpretation or transformation into an Abstract Syntax Tree (AST).
const text = '1 + 2 - 3';
const lexingResult = MyLexer.tokenize(text);
if (lexingResult.errors.length === 0) {
  // Feed the token vector to the parser, then invoke the top-level rule.
  parser.input = lexingResult.tokens;
  const cst = parser.expression();
  if (parser.errors.length === 0) {
    // cst can now be used to create an AST or for interpretation.
  } else {
    // parser errors are present
  }
} else {
  // lexing errors are present
}
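As one possible form of "further processing", the sketch below evaluates an `additionExpression` CST node. It assumes the node exposes a `children` dictionary keyed by token name, where each entry is an array of tokens carrying `image` and `startOffset`. A hand-built object stands in for the parser's output so the snippet runs without Chevrotain; `evalAddition` and `tok` are hypothetical helper names.

```javascript
// Evaluate an additionExpression CST node (assumed shape, see lead-in).
function evalAddition(node) {
  const ints = node.children.Integer || [];
  // Plus and Minus tokens live in separate arrays, so merge them and
  // restore source order using their offsets.
  const ops = [
    ...(node.children.Plus || []),
    ...(node.children.Minus || []),
  ].sort((a, b) => a.startOffset - b.startOffset);

  let result = parseInt(ints[0].image, 10);
  ops.forEach((op, i) => {
    const rhs = parseInt(ints[i + 1].image, 10);
    result = op.image === '+' ? result + rhs : result - rhs;
  });
  return result;
}

// Hand-built stand-in for the CST of '1 + 2 - 3'.
const tok = (image, startOffset) => ({ image, startOffset });
const additionNode = {
  name: 'additionExpression',
  children: {
    Integer: [tok('1', 0), tok('2', 4), tok('3', 8)],
    Plus: [tok('+', 2)],
    Minus: [tok('-', 6)],
  },
};
const value = evalAddition(additionNode); // 1 + 2 - 3 = 0
```

With the grammar above, the node passed to `evalAddition` would be the `additionExpression` child of the CST returned by `parser.expression()`.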
Other packages similar to chevrotain
pegjs
PEG.js is a simple parser generator for JavaScript that produces fast parsers with excellent error reporting. It uses Parsing Expression Grammars (PEG) as the input. Compared to Chevrotain, PEG.js has a different approach to defining grammars (PEG vs. Chevrotain's API) and does not require manual token definition.
antlr4
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator that can be used to read, process, execute, or translate structured text or binary files. It's widely used to build languages, tools, and frameworks. ANTLR4 has a Java-based toolchain with targets for multiple languages including JavaScript. It is more complex than Chevrotain and has a steeper learning curve, but it is also more feature-rich.
jison
Jison is an API for creating parsers in JavaScript that works similarly to yacc. It takes a context-free grammar as input and outputs a JavaScript file capable of parsing the language described by that grammar. Jison handles both lexical and syntactical analysis, which means it combines the features of both lexer and parser generators. It is less modular than Chevrotain but can be easier to use for those familiar with yacc or bison.
nearley
Nearley is a simple, fast, and powerful parsing toolkit for JavaScript. It is based on Earley's algorithm, which can parse any context-free grammar, including left-recursive and ambiguous ones. Nearley is designed to be more user-friendly and flexible than traditional parser generators. It is comparable to Chevrotain in terms of ease of use but uses a different underlying algorithm for parsing.
Chevrotain
Chevrotain is a high-performance, fault-tolerant JavaScript parsing DSL for building recursive descent parsers.
Chevrotain is NOT a parser generator. It solves the same kind of problems as a parser generator, just without
the code generation phase.
Features
- Lexer engine based on RegExps.
  - Supports Token location tracking.
  - Supports Token skipping (whitespace/comments/...).
  - Allows prioritising shorter matches (Keywords vs Identifiers).
  - No code generation: the Lexer does not require any code generation phase.
- Parsing DSL for creating the parsing rules.
  - No code generation: the DSL is just JavaScript, not a new external language. What is written is what will be run, which speeds up development, makes debugging trivial, and provides great flexibility for inserting custom actions into the grammar.
  - Strong error recovery capabilities based on ANTLR3's algorithms.
  - Automatic lookahead calculation for LL(1) grammars.
  - In addition, custom lookahead logic can be provided explicitly.
  - Backtracking support.
- High performance; see: performance comparison
- Grammar introspection: the grammar's structure is known and exposed. This can be used to implement features such as automatically generated syntax diagrams or syntactic error recovery.
- Well tested with ~100% code coverage.
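The lexer features listed above (RegExp patterns, location tracking, skipping, keyword priority) can be illustrated with a small order-based tokenizer. This is a minimal plain JavaScript sketch, not Chevrotain's implementation; `tokenDefs`, `tokenize` and the `skip` flag are names invented for this example.

```javascript
// Definitions are tried in order; the first pattern that matches at the
// current offset wins, so the keyword is listed before the identifier.
const tokenDefs = [
  { name: 'WhiteSpace', pattern: /\s+/y, skip: true },
  { name: 'While', pattern: /while\b/y },
  { name: 'Identifier', pattern: /[A-Za-z_]\w*/y },
  { name: 'Integer', pattern: /\d+/y },
];

function tokenize(text) {
  const tokens = [];
  let offset = 0;
  while (offset < text.length) {
    let matched = false;
    for (const def of tokenDefs) {
      // Sticky flag (y): the pattern may only match exactly at lastIndex.
      def.pattern.lastIndex = offset;
      const m = def.pattern.exec(text);
      if (m) {
        if (!def.skip) {
          // Location tracking: remember where the token starts.
          tokens.push({ name: def.name, image: m[0], startOffset: offset });
        }
        offset += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new Error(`Unexpected character at offset ${offset}`);
    }
  }
  return tokens;
}

const lexed = tokenize('while whilex 42');
// lexed.map(t => t.name) -> ['While', 'Identifier', 'Integer']
```

Listing `While` before `Identifier` is what gives the keyword priority; the `\b` in its pattern keeps it from claiming the prefix of `whilex`.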
Installation
- npm:
npm install chevrotain
- Bower:
bower install chevrotain
- or download directly from GitHub releases:
- the 'chevrotain-binaries-...' files contain the compiled JavaScript code.
Usage example: JSON Parser
- The following example uses several features of ES6 (fat arrow/classes), along with TypeScript-style class properties (`static`/`public` members).
These are not mandatory for using Chevrotain; they just make the example clearer.
The example is also provided in ES5 syntax.
step 1: define your Tokens:
var chevrotain = require("chevrotain")
var Token = chevrotain.Token

// Abstract category: Lexer.NA marks a token that is never matched directly.
class Keyword extends Token { static PATTERN = chevrotain.Lexer.NA }
class True extends Keyword { static PATTERN = /true/ }
class False extends Keyword { static PATTERN = /false/ }
class Null extends Keyword { static PATTERN = /null/ }
class LCurly extends Token { static PATTERN = /{/ }
class RCurly extends Token { static PATTERN = /}/ }
class LSquare extends Token { static PATTERN = /\[/ }
class RSquare extends Token { static PATTERN = /]/ }
class Comma extends Token { static PATTERN = /,/ }
class Colon extends Token { static PATTERN = /:/ }
class StringLiteral extends Token { static PATTERN = /"(?:[^\\"]+|\\(?:[bfnrtv"\\/]|u[0-9a-fA-F]{4}))*"/ }
class NumberLiteral extends Token { static PATTERN = /-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?/ }
class WhiteSpace extends Token {
    static PATTERN = /\s+/
    // Matched by the lexer but left out of the token output.
    static GROUP = chevrotain.Lexer.SKIPPED
}
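The two literal patterns carry most of the lexical detail, so it can help to try them in isolation. The sketch below uses anchored copies (`^...$`), written with non-capturing groups `(?:...)`, so each test must match the whole input. The variable names and sample inputs are chosen for illustration.

```javascript
// Anchored copies of the StringLiteral and NumberLiteral patterns.
const stringLiteral = /^"(?:[^\\"]+|\\(?:[bfnrtv"\\/]|u[0-9a-fA-F]{4}))*"$/;
const numberLiteral = /^-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?$/;

const checks = {
  plainString: stringLiteral.test('"hello"'),          // true
  escapedQuote: stringLiteral.test('"say \\"hi\\""'),  // true: \" is a valid escape
  unicodeEscape: stringLiteral.test('"\\u00e9"'),      // true: \u plus four hex digits
  badEscape: stringLiteral.test('"\\x"'),              // false: \x is not in the escape set
  integer: numberLiteral.test('-12'),                  // true
  exponent: numberLiteral.test('1.5e+10'),             // true
  leadingZero: numberLiteral.test('012'),              // false: no leading zeros
};
```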
step 2: create a lexer from the Token definitions:
var Lexer = require("chevrotain").Lexer
var JsonLexer = new Lexer([WhiteSpace, NumberLiteral, StringLiteral,
    RCurly, LCurly, LSquare, RSquare, Comma, Colon, True, False, Null])
step 3: define the parsing rules:
var Parser = require("chevrotain").Parser
class JsonParser extends Parser {
    constructor(input) {
        // A derived class must call super() before `this` can be used.
        // Assumption: the base constructor takes the token vector and the
        // Token classes used by the grammar.
        super(input, [WhiteSpace, NumberLiteral, StringLiteral, RCurly, LCurly,
            LSquare, RSquare, Comma, Colon, True, False, Null])
        Parser.performSelfAnalysis(this)
    }

    public object = this.RULE("object", () => {
        this.CONSUME(LCurly)
        this.OPTION(() => {
            this.SUBRULE(this.objectItem)
            this.MANY(() => {
                this.CONSUME(Comma)
                this.SUBRULE2(this.objectItem)
            })
        })
        this.CONSUME(RCurly)
    })

    public objectItem = this.RULE("objectItem", () => {
        this.CONSUME(StringLiteral)
        this.CONSUME(Colon)
        this.SUBRULE(this.value)
    })

    public array = this.RULE("array", () => {
        this.CONSUME(LSquare)
        this.OPTION(() => {
            this.SUBRULE(this.value)
            this.MANY(() => {
                this.CONSUME(Comma)
                this.SUBRULE2(this.value)
            })
        })
        this.CONSUME(RSquare)
    })

    public value = this.RULE("value", () => {
        this.OR([
            {ALT: () => {this.CONSUME(StringLiteral)}},
            {ALT: () => {this.CONSUME(NumberLiteral)}},
            {ALT: () => {this.SUBRULE(this.object)}},
            {ALT: () => {this.SUBRULE(this.array)}},
            {ALT: () => {this.CONSUME(True)}},
            {ALT: () => {this.CONSUME(False)}},
            {ALT: () => {this.CONSUME(Null)}}
        ], "a value")
    })
}
step 4: add custom actions to the grammar defined in step 3
- this shows the modification for just two grammar rules.
public object = this.RULE("object", () => {
    var items = []
    this.CONSUME(LCurly)
    this.OPTION(() => {
        items.push(this.SUBRULE(this.objectItem))
        this.MANY(() => {
            this.CONSUME(Comma)
            items.push(this.SUBRULE2(this.objectItem))
        })
    })
    this.CONSUME(RCurly)
    var obj = {}
    items.forEach((item) => {
        obj[item.itemName] = item.itemValue
    })
    return obj
})

public objectItem = this.RULE("objectItem", () => {
    var nameToken = this.CONSUME(StringLiteral)
    this.CONSUME(Colon)
    var value = this.SUBRULE(this.value)
    var itemNameString = nameToken.image
    var itemName = itemNameString.substr(1, itemNameString.length - 2)
    return {itemName:itemName, itemValue:value}
})
...
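The `substr` call in `objectItem` strips the surrounding quotes from the token image but leaves escape sequences undecoded. One alternative, shown here as a sketch and not as part of the original example, is to decode the image with the built-in `JSON.parse`. The sample `image` value is made up for illustration.

```javascript
// The image of a StringLiteral token includes the surrounding quotes.
// Raw text here: "first\nname" (with a literal backslash followed by 'n').
const image = '"first\\nname"';

// Slicing removes only the quotes; the escape stays as two characters.
const sliced = image.slice(1, -1);

// JSON.parse removes the quotes and decodes the escape into a newline.
const decoded = JSON.parse(image);

// sliced.length  -> 11 (backslash and 'n' are separate characters)
// decoded.length -> 10 (a single newline character)
```

One caveat: the `StringLiteral` pattern above also accepts `\v`, which is not a valid JSON escape, so `JSON.parse` would throw on such an image.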
step 5: wrap it all together
function lexAndParse(text) {
    var lexResult = JsonLexer.tokenize(text)
    var parser = new JsonParser(lexResult.tokens)
    return parser.object()
}
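The wrapper above returns whatever the parser produced, even when lexing or parsing reported errors. A more defensive variant might look like the sketch below. It assumes the lexing result and the parser each expose an `errors` array, as in the "Parsing Text" sample earlier on this page. The lexer and parser are passed in as arguments so the snippet runs with stubs; `lexAndParseChecked`, `stubLexer` and `stubParserFactory` are hypothetical names.

```javascript
// lexer: object with tokenize(text) -> { tokens, errors }
// createParser: function(tokens) -> parser with an object() rule and an errors array
function lexAndParseChecked(text, lexer, createParser) {
  const lexResult = lexer.tokenize(text);
  if (lexResult.errors.length > 0) {
    throw new Error('Lexing failed with ' + lexResult.errors.length + ' error(s)');
  }
  const parser = createParser(lexResult.tokens);
  const value = parser.object();
  if (parser.errors.length > 0) {
    throw new Error('Parsing failed with ' + parser.errors.length + ' error(s)');
  }
  return value;
}

// Stubs standing in for JsonLexer and JsonParser so the sketch is runnable.
const stubLexer = {
  tokenize: (text) => ({
    tokens: [text],
    errors: text.includes('#') ? ['unexpected character'] : [],
  }),
};
const stubParserFactory = (tokens) => ({
  errors: [],
  object: () => ({ tokenCount: tokens.length }),
});

const ok = lexAndParseChecked('{}', stubLexer, stubParserFactory);
```

With the real classes, the call would presumably be `lexAndParseChecked(text, JsonLexer, (tokens) => new JsonParser(tokens))`.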
Getting Started
The best way to start is by looking at some runnable (and debuggable) examples:
Documentation
No HTML docs (yet...); use either:
- Annotated source code:
- The aggregated TypeScript definitions chevrotain.d.ts
- Also packaged in both the GitHub and npm releases.
Dependencies
Only a single dependency: lodash.
Compatibility
The generated artifact (chevrotain.js) should run on any modern JavaScript ES5.1 runtime.
- The CI build runs the tests under Node.js.
- Additionally, local testing is done on the latest versions of Chrome/Firefox/IE.
- The dependency on lodash is imported via UMD,
in order to make chevrotain.js portable to multiple environments.