Masala Parser: Javascript Parser Combinators
Masala Parser is inspired by the paper titled:
Direct Style Monadic Parser Combinators For The Real World.
Masala Parser is a Javascript implementation of the Haskell Parsec.
It is plain Javascript that works in the browser, is tested with more than 450 unit tests, covering 100% of code lines.
Use cases
- It can create a full parser from scratch
- It can extract data from a big text and replace complex regexp
- It works in any browser
- There is a good typescript type declaration
- It can validate complete structure with variations
- It's a great starting point for parser education. It's way simpler than Lex & Yacc.
- It's designed to be written in other languages (Python, Java, Rust) with the same interface
Masala Parser keywords are simplicity, variations and maintainability. You won't
need theoretical bases on languages for extraction or validation use cases.
Masala Parser has relatively good performances, however Javascript is obviously not the fastest machine.
Usage
With Node Js or modern build
npm install -S @masala/parser
Or in the browser
Check the Change Log if you can from a previous version.
Reference
You will find an Masala Parser online reference, generated from typescript interface.
Quick Examples
Hello World
const helloParser = C.string('hello');
const white = C.char(' ');
const worldParser = C.char('world');
const combinator = helloParser.then(white.rep()).then(worldParser);
Floor notation
const {Streams, N, C}= require('@masala/parser');
const stream = Stream.ofString('|4.6|');
const floorCombinator = C.char('|').drop()
.then(N.number())
.then(C.char('|').drop())
.single()
.map(x =>Math.floor(x));
const parsing = floorCombinator.parse(stream);
assertEquals( 4, parsing.value, 'Floor parsing');
Explanations
According to Wikipedia "in functional programming, a parser combinator is a
higher-order function that accepts several parsers as input and returns a new
parser as its output."
The Parser
Let's say we have a document :
The James Bond series, by writer Ian Fleming, focuses on a fictional British Secret Service agent created in 1953, who featured him in twelve novels and two short-story collections. Since Fleming's death in 1964, eight other authors have written authorised Bond novels or novelizations: Kingsley Amis, Christopher Wood, John Gardner, Raymond Benson, Sebastian Faulks, Jeffery Deaver, William Boyd and Anthony Horowitz.
The parser could fetch every names, ie two consecutive words starting with uppercase.
The parser will read through the document and aggregate a Response,
which contains a value and the current offset in the text.
This value will evolve when the parser will meet new characters,
but also with some function calls, such as the map()
function.
The Response
By definition, a Parser takes text as an input, and the Response is a structure that represents your problem.
After parsing, there are two subtypes of Response
:
Accept
when it found something.Reject
if it could not.
let response = C.char('a').rep().parse(Streams.ofString('aaaa'));
assertEquals(response.value.join(''), 'aaaa' );
assertEquals(response.offset, 4 );
assertTrue(response.isAccepted());
assertTrue(response.isConsumed());
response = C.char('a').rep().parse(Streams.ofString('aabb'));
assertEquals(response.value.join(''), 'aa' );
assertEquals(response.offset, 2 );
assertTrue(response.isAccepted());
assertFalse(response.isConsumed());
Building the Parser, and execution
Like a language, the parser is built then executed. With Masala, we build using other parsers.
const helloParser = C.string('hello');
const white = C.char(' ');
const worldParser = C.char('world');
const combinator = helloParser.then(white.rep()).then(worldParser);
There is a compiling time when you combine your parser, and an execution time when the parser
runs its parse(stream)
function. You will have the Response
after parsing.
So after building, the parser is executed against a stream of token.
For simplicity, we will use a stream of characters, which is a text :)
Hello Gandhi
The goal is check that we have Hello 'someone', then to grab that name
const {Streams, C}= require('@masala/parser');
var helloParser = C.string("Hello")
.then(C.char(' ').rep())
.then(C.letters())
.last();
var value = helloParser.val("Hello Gandhi");
assertEquals('Gandhi', value);
Parser Combinations
Let's use a real example. We combine many functions that returns a new Parser. And each new Parser
is a combination of Parsers given by the standard bundles or previous functions.
import {Streams, N,C, F} from '@masala/parser';
const blanks = ()=>C.char(' ').optrep();
function operator(symbol) {
return blanks().drop()
.then(C.char(symbol))
.then(blanks().drop())
.single();
}
function sum() {
return N.integer()
.then(operator('+').drop())
.then(N.integer())
.map(tuple => tuple.at(0) + tuple.at(1));
}
function multiplication() {
return N.integer()
.then(operator('*').drop())
.then(N.integer())
.array()
.map( ([left,right])=> left * right);
}
function scalar() {
return N.integer();
}
function combinator() {
return F.try(sum())
.or(F.try(multiplication()))
.or(scalar());
}
function parseOperation(line) {
return combinator().parse(Streams.ofString(line));
}
assertEquals(4, parseOperation('2 +2').value, 'sum: ');
assertEquals(6, parseOperation('2 * 3').value, 'multiplication: ');
assertEquals(8, parseOperation('8').value, 'scalar: ');
A curry paste is an higher order ingredient made from a good combination of spices.
Precedence
Precedence is a technical term for priority. Using:
function combinator() {
return F.try(sum())
.or(F.try(multiplication()))
.or(scalar());
}
console.info('sum: ',parseOperation('2+2').value);
We will give priority to sum, then multiplication, then scalar. If we had put scalar()
first, we would have first
accepted 2
, then what could we do with +2
alone ? It's not a valid sum ! Moreover +2
and -2
are acceptable scalars.
try(x).or(y)
or()
will often be used with try()
, that makes backtracking
: it saves the current offset, then tries an option. And as soon that it's not satisfied, it goes back to the original
offset and use the parser inside the .or(P)
expression.`.
Like Haskell's Parsec, Masala Parser can parse infinite look-ahead grammars but
performs best on predictive (LL[1]) grammars.
Let see how with try()
, we can look a bit ahead of next characters, then go back:
F.try(sum()).or(F.try(multiplication())).or(scalar())
// try(sum()) parser in action
2 *2
..ok..ok ↑oups: go back and try multiplication. Should be OK.
Suppose we do not try()
but use or()
directly:
sum().or(multiplication()).or(scalar())
// testing sum()
2 *2
..ok..ok ↑oups: cursor is NOT going back. So now we must test '*2' ;
Is it (multiplication())? No ;
or(scalar()) ? neither
Recursion
Masala-Parser (like Parsec) is a top-down parser and doesn't like Left Recursion.
However, it is a resolved problem for this kind of parsers, with a lot of documentation. You can read more on recursion
with Masala, and checkout examples on our Github repository
( simple recursion,
or calculous expressions ).
Simple documentation of Core bundles
Core Parser Functions
Here is a link for Core functions documentation.
It will explain then()
, drop()
, map()
, rep()
, opt()
and other core functions of the Parser
with code examples.
The Chars Bundle
Example:
C.char('-')
.then(C.letters())
.then(C.char('-'))
General use
letter()
: accept a european letter (and moves the cursor)letters()
: accepts many letters and returns a stringletterAs(symbol)
: accepts a european(default), ascii, or utf8 Letter. More herelettersAs(symbol)
: accepts many letters and returns a stringemoji()
: accept any emoji sequence. Opened Issue.notChar(x)
: accept if next input is not x
char(x)
: accept if next input is x
charIn('xyz')
: accept if next input is x
, y
or z
charNotIn('xyz')
: accept if next input is not x
, y
or z
subString(length)
: accept any next length characters and returns the equivalent stringstring(word)
: accept if next input is the given word
stringIn(words)
: accept if next input is the given words
More herenotString(word)
: accept if next input is not the given word
charLiteral()
: single quoted char element in C/Java : 'a'
is acceptedstringLiteral()
: double quoted string element in java/json: "hello world"
is acceptedlowerCase()
: accept any next lower case inputsupperCase()
: accept any next uppercase inputs
Other example:
C.string('Hello')
.then(C.char(' '))
.then(C.lowerCase().rep().join(''))
The Numbers Bundle
number()
: accept any float number, such as -2.3E+24, and returns a floatdigit()
: accept any single digit, and returns a numberdigits()
: accept many digits, and returns a number. Warning: it does not accept +- signs symbols.integer()
: accept any positive or negative integer
The Flow Bundle
The flow bundle will mix ingredients together.
For example if you have a Parser p
, F.not(p)
will accept anything
that does not satisfy p
All of these functions will return a brand new Parser that you can combine with others.
Most important:
F.try(parser).or(otherParser)
: Try a parser and come back to otherParser
if failedF.any()
: Accept any character (and so moves the cursor)F.not(parser)
: Accept anything that is not a parser. Often used to accept until a given stopF.eos()
: Accepted if the Parser has reached the End Of StreamF.moveUntil(string|stopParser)
: Alternative for regex. Will traverse the document until the stop parser
- returns
undefined
if stop is not found - returns all characters if stop is found, and set the cursor at the spot of the stop
F.dropTo(string|stopParser)
: Will traverse the document including the stop parser
Others:
F.lazy(parser, ?params)
: Makes a lazy evaluation. May be used for Left recursion (difficult)F.parse(parserFunction)
: Create a new Parser from a function. Usually, you won't start here.F.subStream(length)
: accept any next charactersF.returns(value)
: forces a returned valueF.error()
: returns an error. Parser will never be acceptedF.satisfy(predicate)
: check if condition is satisfiedF.startsWith(value)
: create a no-op parser with initial value
License
Copyright (C)2016-2019 D. Plaindoux.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as
published by the Free Software Foundation; either version 2, or (at
your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this program; see the file COPYING. If not, write
to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
USA.