github.com/hydralang/ptk

Package ptk is a toolkit for creating language parsers using a top-down recursive descent technique first described by Vaughan Pratt. This technique is based on the concept of creating the functions around token types, rather than around productions, and yields a more efficient recursive descent parser that doesn't require recursing as deeply. The Pratt technique is particularly effective for parsing expressions, where operator precedence is important; instead of a different function for each operator class, the Pratt technique enables handling the situation with a single function. Creating a parser using ptk involves three main components--a scanner, a lexer, and a parser--each represented by a different subpackage. The scanner takes a source representation and breaks it down into single characters that are paired their location within the source. The lexer then takes these characters and, using user-provided classifiers and recognizers, groups them into tokens; a token is comprised of a token type, a location, and an optional semantic meaning (for instance, the characters "12345" would likely have a semantic meaning consisting of the numerical value represented by that character sequence). Finally, the parser takes a sequence of tokens and, with a user-provided parse table, constructs an abstract syntax tree. This AST is the final result of the parsing, and the caller may process it in any necessary fashion. The scanner subpackage contains two main scanners--a FileScanner, which actually takes an io.Reader instance, and an argument scanner, which takes a list of character strings and treats them as a single source separated by an optional separator that defaults to whitespace. In addition, there are several composite scanners, such as the "memoizing" scanner and the chaining scanner; the argument scanner is actually built by composing these composite scanners around the arguments. A ListScanner is also provided to simplify testing code. This subpackage also contains the Location interface and types, which represent a range of the source containing a single character or group of characters. The lexer subpackage contains a Lexer type, which is initialized by passing in a user-created scanner and a lexer.State; the state is used by the lexer to find a Classifier. The lexer calls the Classifier.Classify method to get a list of Recognizer instances that may be able to handle the input, which the lexer then calls in turn until one of the recognizers successfully recognizes the input. Recognizers call the Lexer.Push method to register any tokens they extract from the input; but a recognizer need not push any tokens. (For instance, in most languages, whitespace is simply discarded, so a recognizer for whitespace would not produce any tokens.) A lexer.BaseState type is provided that has the basic functionality, but applications are free to extend or reimplement the functionality. The parser subpackage contains a Parser type, which is initialized by passing in a user-created lexer and a parser.State; the state is used by the parser to find a Table, which maps token types produced by the lexer to an Entry, which itself contains an integer binding Power, along with three callback functions related to three contexts in which a token may appear. As with the lexer, a default simple implementation of parser.State is provided, named parser.BaseState; applications may extend or reimplement the functionality as needed. The parser subpackage also contains some utilities for building an AST; specifically, the UnaryOperator and BinaryOperator types are implementations of parser.Node, and the Prefix, Infix, and InfixR functions can be used to construct the callback functions for some common casses, while Literal is a parser.ExprFirst function for literals. A fourth subpackage is provided: the parser/visualize subpackage contains the Visualize function, which allows rendering an abstract syntax tree into a visual representation. This may be used while developing a parser to ensure that it is producing the correct AST for a given input.

v0.5.0

Source

Version published: 4 years ago

Readme

Source

Parser Tool Kit

This repository contains a framework for building parsers using the Pratt technique. The Pratt technique is a recursive descent parsing technique specifically geared toward enabling more efficient parsing of grammars involving operator precedence, such as expression grammars. The repository also contains other functionality to assist in the creation of lexical analyzers and other code useful for parsers.

FAQs

What is github.com/hydralang/ptk?

Last updated on 31 Oct 2020

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

github.com/hydralang/ptk

Parser Tool Kit

Related posts

TC39 Advances Key Proposals: Deferred Import Evaluation, Error.isError(), RegExp Escaping, Promise.try

New Research Shows Teams of LLM Agents Can Autonomously Exploit Zero-Day Vulnerabilities with 53% Success Rate