yy
Command yy processes yacc source code and produces three output files:
- A Go file containing definitions of AST nodes.
- A Go file containing documentation examples[0] of productions defined by the yacc grammar.
- A new yacc file with automatic actions instantiating the AST nodes.
Installation
To install yy
$ go get [-u] modernc.org/yy
Online documentation
godoc.org/modernc.org/yy
Usage
Invocation
$ yy [options] <input.y>
Options
Flags handled by the yy command:
-ast string
Output AST nodes definitions. (default "ast.go")
-astExamples string
Output AST examples. (default "ast_test.go")
-astImport string
Optional AST file imports.
-exampleAST string
Function to call to produce example ASTs. (default "exampleAST")
-f
Try to continue on errors. May crash.
-forceOptPos
Assume all nodes are optional when resolving positions.
-kind string
Default node kind (rule case) field name. (default "kind")
-node string
Default non terminal yacc type. (default "node")
-noListKind
Do not generate kind (rule case) field for lists.
-noPrivateHelpers
Do not generate unexported helper methods.
-o string
Output yacc file. (default "parser.y")
-pkg string
Package name of generated Go files. Extract from input when blank.
-position
Assume node implements token.Position instead of token.Pos.
-prettyString string
Function to stringify things nicely. (default "prettyString")
-token string
Default terminal yacc type. (default "Token")
-tokenSep string
AST examples token separator string. (default " ")
-v string
Create grammar report. (default "y.output")
-yylex string
Type of yacc's yylex. (default "*lexer")
Changelog
2019-02-27: Added -position flag.
2017-10-23: Added the case directive.
Examples
A partial example: see the testdata directory and files
input: in.y
output: ast.go
output: ast_test.go
output: out.y
The three output files were generated by
yy -o testdata/out.y -ast testdata/ast.go -astExamples testdata/ast_test.go testdata/in.y
A more complete, working project using yy can be found at http://godoc.org/modernc.org/pl0
Concepts
Every rule is turned into a definition of a struct type in ast.go (adjust using the -ast flag). The fields of the type are a sum of all productions (cases) of the rule.
Rule:
Foo Bar // Case 0
| Foo Baz // Case 1
The generated type will be something like
type Rule struct {
Case int // In [0, 1].
Bar *Bar
Baz *Baz
Foo *Foo
}
In the above, the Foo and Bar fields will be non nil when Case is 0 and the Foo and Baz fields will be non nil when Case is 1.
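As a minimal sketch of how code consuming such a node might branch on Case: the Foo, Bar and Baz types and their Name field below are placeholders invented for illustration, as is the describe helper; only the shape of Rule follows the text above.

```go
package main

import "fmt"

// Placeholder node types; real ones would be generated by yy.
type Foo struct{ Name string }
type Bar struct{ Name string }
type Baz struct{ Name string }

// Rule mirrors the generated type shown above.
type Rule struct {
	Case int // In [0, 1].
	Bar  *Bar
	Baz  *Baz
	Foo  *Foo
}

// describe is a hypothetical helper that reads only the fields
// that are non nil for the given Case.
func describe(r *Rule) string {
	switch r.Case {
	case 0: // Foo Bar
		return fmt.Sprintf("Foo(%s) Bar(%s)", r.Foo.Name, r.Bar.Name)
	case 1: // Foo Baz
		return fmt.Sprintf("Foo(%s) Baz(%s)", r.Foo.Name, r.Baz.Name)
	default:
		panic(fmt.Sprintf("unexpected Case %d", r.Case))
	}
}

func main() {
	r := &Rule{Case: 1, Foo: &Foo{Name: "f"}, Baz: &Baz{Name: "z"}}
	fmt.Println(describe(r)) // Foo(f) Baz(z)
}
```

Switching on Case before touching the fields avoids dereferencing a field that is nil for the production that was actually reduced.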
The above holds when Foo, Bar and Baz are all non terminal symbols. If the productions also contain terminal symbols, those symbols are turned into fields named Token, with a numeric suffix added when more than one terminal appears in any of the productions.
Rule:
Foo '+' Bar
| Foo '[' NUMBER ']' Bar
The generated type will be like
type Rule struct {
Case int // In [0, 1].
Bar *Bar
Foo *Foo
Token MyTokenType
Token2 MyTokenType
Token3 MyTokenType
}
In the above, Token will capture '+' when Case is 0. For Case 1, Token will capture '[', Token2 NUMBER and Token3 ']'.
MyTokenType is the type defined in the yacc %union as in
%union {
node MyNodeType
Token MyTokenType
}
It is assumed that the lexer passed as an argument to yyParse instantiates the lval.Token field with additional token information, like the lexeme value, the starting position in the file, etc.
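The following is a rough sketch of a lexer that fills in lval.Token on every call. Everything in it is an assumption made for illustration: the Token fields (Rune, Val, Pos), the hand-written yySymType standing in for the type goyacc generates from %union, the NUMBER constant value and the toy scanning rules. Only the Lex(lval *yySymType) int and Error(string) method set follows the goyacc lexer interface.

```go
package main

import "fmt"

// Token is a hypothetical token type carrying the lexeme and its position.
type Token struct {
	Rune rune   // token kind as returned to the parser
	Val  string // lexeme text
	Pos  int    // byte offset of the lexeme in the source
}

// yySymType stands in for the type goyacc generates from %union.
type yySymType struct {
	node  interface{}
	Token Token
}

// NUMBER is a made-up value; goyacc assigns the real token numbers.
const NUMBER = 57346

type lexer struct {
	src string
	pos int
}

// Lex skips spaces, scans one token and records its details in lval.Token.
func (lx *lexer) Lex(lval *yySymType) int {
	for lx.pos < len(lx.src) && lx.src[lx.pos] == ' ' {
		lx.pos++
	}
	if lx.pos >= len(lx.src) {
		return 0 // end of input
	}
	start := lx.pos
	c := lx.src[lx.pos]
	lx.pos++
	kind := int(c) // single character tokens such as '[' or '+'
	if c >= '0' && c <= '9' {
		for lx.pos < len(lx.src) && lx.src[lx.pos] >= '0' && lx.src[lx.pos] <= '9' {
			lx.pos++
		}
		kind = NUMBER
	}
	lval.Token = Token{Rune: rune(kind), Val: lx.src[start:lx.pos], Pos: start}
	return kind
}

// Error is required by the parser; here it only prints the message.
func (lx *lexer) Error(s string) { fmt.Println("error:", s) }

func main() {
	lx := &lexer{src: "[ 42 ]"}
	var lval yySymType
	for tok := lx.Lex(&lval); tok != 0; tok = lx.Lex(&lval) {
		fmt.Printf("%d %q at %d\n", tok, lval.Token.Val, lval.Token.Pos)
	}
}
```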
Generated actions
There's a direct mapping, though not in the same order, between the yacc pseudo variables $1, $2, ... and the fields of the generated node types. For every production not disabled by the yy:ignore directive, yy injects code for instantiating the AST node when the production is reduced. For example, this rule from input.y
File:
Prologue TopLevelDeclList
having no semantic action is turned into
File:
Prologue TopLevelDeclList
{
$$ = &File{
Prologue: $1.(*Prologue),
TopLevelDeclList: $2.(*TopLevelDeclList).reverse(),
}
}
in output.y. The default yacc type of AST nodes is 'node' and can be changed using the -node flag.
Conventions
Option-like rules, for example as in
BlockOpt:
| Block
are converted into
BlockOpt:
/* empty */
{
$$ = (*BlockOpt)(nil)
}
| Block
{
$$ = &BlockOpt{
Block: $1.(*Block),
}
}
in output.y, i.e. the empty case does not produce a &RuleOpt{}, but nil instead to conserve space.
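One consequence worth illustrating: a typed nil pointer stored in an interface value does not compare equal to nil, so code walking the AST would check the pointer after the type assertion. The sketch below assumes the yacc node type is an interface (here a stand-in named node), and the Stmts field and stmtCount method are hypothetical.

```go
package main

import "fmt"

// node stands in for the default yacc type of AST nodes,
// assumed here to be an interface.
type node interface{}

type Block struct{ Stmts []string }

type BlockOpt struct{ Block *Block }

// stmtCount is a hypothetical nil-safe method: it may be called on the
// nil *BlockOpt produced by the empty case.
func (n *BlockOpt) stmtCount() int {
	if n == nil {
		return 0
	}
	return len(n.Block.Stmts)
}

func main() {
	var empty node = (*BlockOpt)(nil) // what the empty case assigns to $$
	var full node = &BlockOpt{Block: &Block{Stmts: []string{"a", "b"}}}

	fmt.Println(empty == nil)             // false: the interface holds a typed nil
	fmt.Println(empty.(*BlockOpt) == nil) // true: the pointer itself is nil
	fmt.Println(empty.(*BlockOpt).stmtCount(), full.(*BlockOpt).stmtCount()) // 0 2
}
```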
Generated examples depend on a user-supplied function, by default named exampleAST, with the signature
exampleAST(rule int, src string) interface{}
This function is called with the production number, as assigned by goyacc, and an example string generated by yy. exampleAST should parse the example string and return the AST created when production rule is reduced.
When the project's parser is not yet working, a dummy exampleAST function that always returns nil is a workaround.
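A minimal sketch of such a dummy function follows. The signature is the one given above; the main function and its arguments are only there to make the snippet runnable.

```go
package main

import "fmt"

// exampleAST is a placeholder to use until the parser works.
// It ignores both arguments and always returns nil.
func exampleAST(rule int, src string) interface{} {
	return nil
}

func main() {
	fmt.Println(exampleAST(1, "foo * bar") == nil) // true
}
```

In a real project the function would live in a _test.go file next to the generated examples and would later be replaced by one that actually runs the parser.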
Magic names
yy inspects rule actions found in the input file. If the action code mentions the identifier lx, yy assumes it refers to the yyLexer passed to yyParse. In that case, code like
lx := yylex.(*lexer)
is injected near the beginning of the semantic action. The specific type into which the yylex parameter is type asserted is adjustable using the -yylex flag. Similarly, when the identifier lhs is mentioned, a short variable definition of variable lhs, like
lhs := &Foo{...}
$$ = lhs
is injected into the output.y action, replacing the default generated action (see "Concepts").
For example, an action in input.y
| IdentifierList Type '=' ExpressionList
{
lhs.declare(lx.scope)
}
produces
{
lx := yylex.(*lexer)
lhs := &VarSpec{
Case: 2,
IdentifierList: $1.(*IdentifierList).reverse(),
Type: $2.(*Type),
Token: $3,
ExpressionList: $4.(*ExpressionList).reverse(),
}
$$ = lhs
lhs.declare(lx.scope)
}
in output.y.
The AST examples generator depends on the presence of the yy:token directive for all non constant terminal symbols, or on the presence of the constant token value, as in this example
%token /*yy:token "%c" */ IDENTIFIER "identifier"
%token BREAK "break"
Using fe
The AST examples yy generates must be post processed by using the fe command (http://godoc.org/modernc.org/fe), for example
$ go test -run ^Example[^_] | fe
One of the reasons why this is not done automatically by yy is that the above command will succeed only after your project has a working scanner/parser combination. That's not the case in the early stages.
Directives
yy recognizes specially formatted comments within the input as directives. All directives have the format
//yy:command argument
or
/*yy:command argument */
Note that the directive must immediately follow the comment opening. There must be no empty lines between the directive and the production it applies to.
Directive example
For example
//yy:example "foo * bar"
Rule:
Foo '*' Bar
//yy:example "foo / bar"
| Foo '/' Bar
The argument of the example directive is a double-quoted Go string. The string is used instead of an automatically generated example.
Directive field
For example
//yy:field count int
//yy:field flag bool
Rule: Foo Bar
The argument of the field directive is the text up to the end of the comment. The argument is added to the automatically generated fields of the node type of Rule.
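As an illustration, the node type resulting from the two field directives above might look roughly like the sketch below. The empty Foo and Bar types are placeholders, and the exact field order yy emits is an assumption.

```go
package main

import "fmt"

// Placeholder node types; real ones would be generated by yy.
type Foo struct{}
type Bar struct{}

// Rule shows the automatically generated fields plus the two fields
// added by the yy:field directives.
type Rule struct {
	Bar   *Bar
	Foo   *Foo
	count int  // from //yy:field count int
	flag  bool // from //yy:field flag bool
}

func main() {
	r := &Rule{Foo: &Foo{}, Bar: &Bar{}}
	// Extra fields are typically filled in by a semantic action.
	r.count = 3
	r.flag = true
	fmt.Println(r.count, r.flag) // 3 true
}
```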
Directive ignore
For example
//yy:ignore
Rule: Foo Bar
The ignore directive has no arguments. The directive disables generating of the node type of Rule as well as generating code instantiating such node.
Directive list
For example
//yy:list
Rule:
Item
| Rule ',' Item
The list directive has no arguments. By default, yy detects all left recursive rules. When the name of such a rule has the suffix 'List', yy automatically generates proper reversing of the rule items. The list directive enables the same behavior for a left recursive rule whose name does not have the suffix 'List'.
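To see why reversing is needed, consider how a left recursive rule is reduced: the first item is reduced first and every later reduction wraps the list built so far, so the newest item ends up at the head. The sketch below simulates this with a hypothetical ItemList node; the field names and the body of reverse are assumptions for illustration, not the code yy generates.

```go
package main

import "fmt"

type Item struct{ Name string }

// ItemList is a hypothetical list node: each node points to the
// list that was built before it.
type ItemList struct {
	Item     *Item
	ItemList *ItemList
}

// reverse relinks the nodes in place and returns the new head.
func (n *ItemList) reverse() *ItemList {
	var prev *ItemList
	for n != nil {
		next := n.ItemList
		n.ItemList = prev
		prev, n = n, next
	}
	return prev
}

// names collects item names from head to tail.
func names(n *ItemList) []string {
	var out []string
	for ; n != nil; n = n.ItemList {
		out = append(out, n.Item.Name)
	}
	return out
}

func main() {
	// Simulate the reductions for the input "a , b , c".
	var list *ItemList
	for _, s := range []string{"a", "b", "c"} {
		list = &ItemList{Item: &Item{Name: s}, ItemList: list}
	}
	fmt.Println(names(list))           // [c b a]
	fmt.Println(names(list.reverse())) // [a b c]
}
```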
Directive token
For example
/*yy:token "%c"*/ IDENT
/*yy:token "%d"*/ NUMBER
The argument of the token directive is a double-quoted Go string. The string is passed to a fmt.Sprintf call with a numeric argument chosen by yy that falls within the range of lowercase ASCII letters. The resulting string is used to generate textual token values in examples.
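The effect of the two formats can be sketched with plain fmt.Sprintf calls. The helper name exampleToken and the choice of 'a' as the argument are assumptions; yy picks the actual value.

```go
package main

import "fmt"

// exampleToken is a hypothetical helper that formats a token value
// the way the token directive is described to work.
func exampleToken(format string, arg int) string {
	return fmt.Sprintf(format, arg)
}

func main() {
	// 'a' is 97, a value in the range of lowercase ASCII letters.
	fmt.Println(exampleToken("%c", 'a')) // a
	fmt.Println(exampleToken("%d", 'a')) // 97
}
```

With "%c" the number is rendered as a letter, which suits identifier-like tokens, while "%d" renders the same number as decimal digits, which suits numeric tokens.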
Directive case
For example
//yy:case Foo
/*yy:case Bar */ NUMBER
The argument of the case directive is an identifier, which is appended to the rule name to produce a symbolic and typed case number value. The type name is Case.
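A rough sketch of what symbolic, typed case values could look like in Go follows. The type name RuleCase, the numbering starting at zero and the String method are assumptions made for illustration; the names and values yy actually emits may differ.

```go
package main

import "fmt"

// RuleCase is a hypothetical typed case number for a rule named Rule.
type RuleCase int

// The identifiers from the case directives appended to the rule name.
const (
	RuleFoo RuleCase = iota
	RuleBar
)

// String makes case values readable when printed.
func (c RuleCase) String() string {
	switch c {
	case RuleFoo:
		return "RuleFoo"
	case RuleBar:
		return "RuleBar"
	}
	return fmt.Sprintf("RuleCase(%d)", int(c))
}

type Rule struct {
	Case RuleCase
}

func main() {
	r := &Rule{Case: RuleBar}
	switch r.Case {
	case RuleFoo:
		fmt.Println("first production")
	case RuleBar:
		fmt.Println("second production:", r.Case)
	}
}
```

Compared with bare integers, named constants keep switch statements readable and let the compiler reject values of an unrelated type.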
Links
Referenced from elsewhere:
[0]: https://golang.org/pkg/testing/#hdr-Examples