
github.com/go-sqlparser/participle
The goal of this package is to provide a simple, idiomatic and elegant way of defining parsers in Go.
Participle's method of defining grammars should be familiar to any Go
programmer who has used the encoding/json package: struct field tags define
what and how input is mapped to those same fields. This is not unusual for Go
encoders, but is unusual for a parser.
Participle grammars are LL(k). Among other things, this means that they do not support left recursion.
The default value of k is 1, but this can be changed with participle.UseLookahead(k).
Left recursion must be eliminated by restructuring your grammar.
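To illustrate the restructuring, here is a minimal sketch that does not use Participle at all. It takes a hypothetical left-recursive rule, Expr = Expr "-" Number | Number, rewrites it as Expr = Number ("-" Number)*, and implements the rewritten rule by hand. The function name evalExpr and the whitespace-separated input format are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Left-recursive form (a top-down parser would recurse forever on it):
//   Expr = Expr "-" Number | Number .
// Restructured form, using repetition instead of recursion:
//   Expr = Number ("-" Number)* .
//
// evalExpr is a hand-written parser for the restructured rule. Folding each
// operand into the running result keeps "-" left-associative.
func evalExpr(src string) (int, error) {
	tokens := strings.Fields(src)
	if len(tokens) == 0 {
		return 0, fmt.Errorf("empty input")
	}
	result, err := strconv.Atoi(tokens[0])
	if err != nil {
		return 0, err
	}
	rest := tokens[1:]
	for len(rest) > 0 {
		// Each iteration consumes one ("-" Number) group.
		if rest[0] != "-" || len(rest) < 2 {
			return 0, fmt.Errorf("unexpected token %q", rest[0])
		}
		n, err := strconv.Atoi(rest[1])
		if err != nil {
			return 0, err
		}
		result -= n
		rest = rest[2:]
	}
	return result, nil
}

func main() {
	v, err := evalExpr("10 - 3 - 2")
	fmt.Println(v, err) // 5 <nil>
}
```

In a Participle grammar the same idea would typically be expressed as a repetition in the struct tag, rather than a struct whose first field refers back to the struct itself.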
A tutorial is available, walking through the creation of an .ini parser.
A grammar is an annotated Go structure that both defines the parser grammar and serves as the AST output by the parser. As an example, the following is the final INI parser from the tutorial.
type INI struct {
	Properties []*Property `{ @@ }`
	Sections   []*Section  `{ @@ }`
}

type Section struct {
	Identifier string      `"[" @Ident "]"`
	Properties []*Property `{ @@ }`
}

type Property struct {
	Key   string `@Ident "="`
	Value *Value `@@`
}

type Value struct {
	String *string  `  @String`
	Number *float64 `| @Float`
}
Note: Participle also supports named struct tags (eg.
Hello string `parser:"@Ident"`).
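Named tags follow the standard Go key:"value" convention, so a grammar fragment can sit next to tags for other packages. The sketch below only shows how such a tag is looked up with the reflect package; it is not a reproduction of Participle's internals, and the Greeting type and tagValue helper are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"reflect"
)

// Greeting is a hypothetical grammar node. The "parser" key holds the
// grammar fragment, leaving room for other keys such as "json".
type Greeting struct {
	Hello string `parser:"@Ident" json:"hello"`
}

// tagValue returns the value stored under key in the named field's tag,
// or "" if v is not a struct or the field or key does not exist.
func tagValue(v interface{}, field, key string) string {
	t := reflect.TypeOf(v)
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return ""
	}
	f, ok := t.FieldByName(field)
	if !ok {
		return ""
	}
	return f.Tag.Get(key)
}

func main() {
	fmt.Println(tagValue(&Greeting{}, "Hello", "parser")) // @Ident
	fmt.Println(tagValue(&Greeting{}, "Hello", "json"))   // hello
}
```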
A parser is constructed from a grammar and a lexer:
parser, err := participle.Build(&INI{})
Once constructed, the parser is applied to input to produce an AST:
ast := &INI{}
err := parser.ParseString("size = 10", ast)
// ast == &INI{
//   Properties: []*Property{
//     {Key: "size", Value: &Value{Number: &10}},
//   },
// }
The grammar syntax supports the following expressions:

- @<expr> Capture expression into the field.
- @@ Recursively capture using the field's own type.
- <identifier> Match named lexer token.
- ( ... ) Group.
- "..." Match the literal (note that the lexer must emit tokens matching this literal exactly).
- "...":<identifier> Match the literal, specifying the exact lexer token type to match.
- <expr> <expr> ... Match expressions.
- <expr> | <expr> Match one of the alternatives.
- !<expr> Match any token that is not the start of the expression (eg: @!";" matches anything but the ; character into the field).

The following modifiers can be used after any expression:

- * Expression can match zero or more times.
- + Expression must match one or more times.
- ? Expression can match zero or once.
- ! Require a non-empty match (this is useful with a sequence of optional matches eg. ("a"? "b"? "c"?)!).

Supported but deprecated:

- { ... } Match 0 or more times (DEPRECATED - prefer ( ... )*).
- [ ... ] Optional (DEPRECATED - prefer ( ... )?).

Notes:

- @<expr> is the mechanism for capturing matches into the field.

Prefixing any expression in the grammar with @ will capture matching values
for that expression into the corresponding field.
For example:
// The grammar definition.
type Grammar struct {
	Hello string `@Ident`
}

// The source text to parse.
source := "world"

// After parsing, the resulting AST.
result == &Grammar{
	Hello: "world",
}
For slice and string fields, each instance of @ will accumulate into the
field (including repeated patterns). Accumulation into other types is not
supported.
A successful capture match into a boolean field will set the field to true.
For integer and floating point types, a successful capture will be parsed
with strconv.ParseInt() and strconv.ParseFloat() respectively.
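As a rough illustration of the conversion step for numeric fields, the sketch below turns token text into an int64 with strconv.ParseInt and into a float64 with strconv.ParseFloat. The Node type and the setInt and setFloat helpers are hypothetical, and base 10 with 64-bit sizes is an assumption; Participle's own conversion may differ in those details.

```go
package main

import (
	"fmt"
	"strconv"
)

// Node is a hypothetical AST node with one integer and one float field.
type Node struct {
	Count int64
	Ratio float64
}

// setInt stores the integer value of token in dst.
// Base 10 is an assumption made for this sketch.
func setInt(dst *int64, token string) error {
	n, err := strconv.ParseInt(token, 10, 64)
	if err != nil {
		return err
	}
	*dst = n
	return nil
}

// setFloat stores the floating point value of token in dst.
func setFloat(dst *float64, token string) error {
	f, err := strconv.ParseFloat(token, 64)
	if err != nil {
		return err
	}
	*dst = f
	return nil
}

func main() {
	var n Node
	if err := setInt(&n.Count, "42"); err != nil {
		fmt.Println("int capture failed:", err)
	}
	if err := setFloat(&n.Ratio, "1.5"); err != nil {
		fmt.Println("float capture failed:", err)
	}
	fmt.Printf("%+v\n", n) // {Count:42 Ratio:1.5}

	// A token that is not a valid integer yields an error instead of a value.
	fmt.Println(setInt(&n.Count, "4x2") != nil) // true
}
```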
Custom control of how values are captured into fields can be achieved by a
field type implementing the Capture interface (Capture(values []string) error).
Additionally, any field implementing the encoding.TextUnmarshaler interface
will be capturable too. One caveat is that UnmarshalText() will be called once
for each captured token, so eg. @(Ident Ident Ident) will be called three times.
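The two hooks can be sketched as follows. Path has a method with the Capture signature quoted above, and Upper implements encoding.TextUnmarshaler. Both types are hypothetical, and the loop in main only simulates how the methods might be invoked for three captured tokens; it is not Participle's actual call sequence.

```go
package main

import (
	"encoding"
	"fmt"
	"strings"
)

// Path accumulates captured token values.
type Path []string

// Capture has the signature of the Capture interface described above.
func (p *Path) Capture(values []string) error {
	*p = append(*p, values...)
	return nil
}

// Upper stores an upper-cased copy of every token it receives.
type Upper struct {
	Parts []string
}

// UnmarshalText sees one token per call, so it appends rather than
// overwrites in order to keep earlier tokens.
func (u *Upper) UnmarshalText(text []byte) error {
	u.Parts = append(u.Parts, strings.ToUpper(string(text)))
	return nil
}

// Compile-time check that *Upper satisfies encoding.TextUnmarshaler.
var _ encoding.TextUnmarshaler = (*Upper)(nil)

func main() {
	tokens := []string{"a", "b", "c"}

	var p Path
	if err := p.Capture(tokens); err != nil {
		fmt.Println("capture failed:", err)
	}

	var u Upper
	for _, t := range tokens { // simulated: one UnmarshalText call per token
		if err := u.UnmarshalText([]byte(t)); err != nil {
			fmt.Println("unmarshal failed:", err)
		}
	}

	fmt.Println(p, u.Parts) // [a b c] [A B C]
}
```

Because UnmarshalText runs once per token, an implementation that simply assigns its input would keep only the last token of a multi-token capture.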
Participle supports streaming parsing. Simply pass a channel of your grammar into
Parse*(). The grammar will be repeatedly parsed and sent to the channel. Note that
the Parse*() call will not return until parsing completes, so it should generally be
started in a goroutine.
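type token struct {
	Str string `  @Ident`
	Num int    `| @Int`
}

parser, err := participle.Build(&token{})
if err != nil {
	panic(err)
}
tokens := make(chan *token, 128)
// ParseString does not return until parsing completes, so run it in a goroutine.
go func() {
	if err := parser.ParseString(`hello 10 11 12 world`, tokens); err != nil {
		log.Println(err)
	}
}()
for token := range tokens {
	fmt.Printf("%#v\n", token)
}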
Participle operates on tokens and thus relies on a lexer to convert character streams to tokens.
Four lexers are provided, varying in speed and flexibility. Configure your parser with a lexer
via participle.Lexer().
The best combination of speed, flexibility and usability is lexer/regex.New().
Ordered by speed they are:
1. lexer.DefaultDefinition is based on the text/scanner package and only allows tokens provided by that package. This is the default lexer.
2. lexer.Regexp() (legacy) maps regular expression named subgroups to lexer symbols.
3. lexer/regex.New() is a more readable regex lexer, with each rule in the form <name> = <regex>.
4. lexer/ebnf.New() is a lexer based on the Go EBNF package. It has a large potential for optimisation through code generation, but that is not implemented yet.

To use your own Lexer you will need to implement two interfaces: Definition and Lexer.
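Since the default lexer is based on text/scanner, it can help to see which token classes that package produces for a given input. The sketch below uses only the standard library and the scanner's default mode; the Token type and tokenize function are hypothetical helpers, and the output is what text/scanner reports, not necessarily the exact symbol set Participle exposes.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// Token pairs a text/scanner token class with its source text.
type Token struct {
	Type string
	Text string
}

// tokenize scans src with text/scanner's default settings and returns
// every token up to end of input.
func tokenize(src string) []Token {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	var out []Token
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		out = append(out, Token{
			Type: scanner.TokenString(tok), // e.g. Ident, Int, Float, String
			Text: s.TokenText(),
		})
	}
	return out
}

func main() {
	for _, t := range tokenize(`size = 10 ratio = 1.5 name = "demo"`) {
		fmt.Printf("%-8s %s\n", t.Type, t.Text)
	}
}
```

Names such as Ident, Int, Float and String used in the grammar examples correspond to these token classes.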
The Parser's behaviour can be configured via Options.
There are several examples included:
| Example | Description |
|---|---|
| BASIC | A lexer, parser and interpreter for a rudimentary dialect of BASIC. |
| EBNF | Parser for the form of EBNF used by Go. |
| Expr | A basic mathematical expression parser and evaluator. |
| GraphQL | Lexer+parser for GraphQL schemas. |
| HCL | A parser for the HashiCorp Configuration Language. |
| INI | An INI file parser. |
| Protobuf | A full Protobuf version 2 and 3 parser. |
| SQL | A very rudimentary SQL SELECT parser. |
| Thrift | A full Thrift parser. |
| TOML | A TOML parser. |
Included below is a full GraphQL lexer and parser:
package main

import (
	"os"

	"github.com/alecthomas/kong"
	"github.com/alecthomas/repr"

	"github.com/alecthomas/participle"
	"github.com/alecthomas/participle/lexer"
	"github.com/alecthomas/participle/lexer/ebnf"
)

type File struct {
	Entries []*Entry `@@*`
}

type Entry struct {
	Type   *Type   `  @@`
	Schema *Schema `| @@`
	Enum   *Enum   `| @@`
	Scalar string  `| "scalar" @Ident`
}

type Enum struct {
	Name  string   `"enum" @Ident`
	Cases []string `"{" @Ident* "}"`
}

type Schema struct {
	Fields []*Field `"schema" "{" @@* "}"`
}

type Type struct {
	Name       string   `"type" @Ident`
	Implements string   `("implements" @Ident)?`
	Fields     []*Field `"{" @@* "}"`
}

type Field struct {
	Name       string      `@Ident`
	Arguments  []*Argument `("(" (@@ ("," @@)*)? ")")?`
	Type       *TypeRef    `":" @@`
	Annotation string      `("@" @Ident)?`
}

type Argument struct {
	Name    string   `@Ident`
	Type    *TypeRef `":" @@`
	Default *Value   `("=" @@)?`
}

type TypeRef struct {
	Array       *TypeRef `(   "[" @@ "]"`
	Type        string   `  | @Ident )`
	NonNullable bool     `@"!"?`
}

type Value struct {
	Symbol string `@Ident`
}

var (
	graphQLLexer = lexer.Must(ebnf.New(`
    Comment = ("#" | "//") { "\u0000"…"\uffff"-"\n" } .
    Ident = (alpha | "_") { "_" | alpha | digit } .
    Number = ("." | digit) {"." | digit} .
    Whitespace = " " | "\t" | "\n" | "\r" .
    Punct = "!"…"/" | ":"…"@" | "["…`+"\"`\""+` | "{"…"~" .
    alpha = "a"…"z" | "A"…"Z" .
    digit = "0"…"9" .
`))

	parser = participle.MustBuild(&File{},
		participle.Lexer(graphQLLexer),
		participle.Elide("Comment", "Whitespace"),
	)

	cli struct {
		Files []string `arg:"" type:"existingfile" required:"" help:"GraphQL schema files to parse."`
	}
)

func main() {
	ctx := kong.Parse(&cli)
	for _, file := range cli.Files {
		ast := &File{}
		r, err := os.Open(file)
		ctx.FatalIfErrorf(err)
		err = parser.Parse(r, ast)
		r.Close()
		repr.Println(ast)
		ctx.FatalIfErrorf(err)
	}
}
One of the included examples is a complete Thrift parser (shell-style comments are not supported). This gives a convenient baseline for comparing to the PEG based pigeon, which is the parser used by go-thrift. Additionally, the pigeon parser is utilising a generated parser, while the participle parser is built at run time.
You can run the benchmarks yourself, but here's the output on my machine:
BenchmarkParticipleThrift-4 10000 221818 ns/op 48880 B/op 1240 allocs/op
BenchmarkGoThriftParser-4 2000 804709 ns/op 170301 B/op 3086 allocs/op
On a real-life codebase of 47K lines of Thrift, Participle takes 200ms and go-thrift takes 630ms, which aligns quite closely with the benchmarks.
A compiled Parser instance can be used concurrently. A LexerDefinition can be used concurrently. A Lexer instance cannot be used concurrently.
There are a few areas where Participle can provide useful feedback to users of your parser.
- If the source io.Reader includes a Name() string method (as os.File does), the filename will be included in errors.
- Any AST node with a field Pos lexer.Position or Tok lexer.Token will have that field automatically populated from the nearest matching token.
- Any AST node with a field EndPos lexer.Position or EndTok lexer.Token will have that field automatically populated with the token at the end of the node.

These related pieces of information can be combined to provide fairly comprehensive error reporting.
Participle supports outputting an EBNF grammar from a Participle parser. Once
the parser is constructed simply call String().
For example, the GraphQL example produces the following EBNF:
File = Entry* .
Entry = Type | Schema | Enum | "scalar" ident .
Type = "type" ident ("implements" ident)? "{" Field* "}" .
Field = ident ("(" (Argument ("," Argument)*)? ")")? ":" TypeRef ("@" ident)? .
Argument = ident ":" TypeRef ("=" Value)? .
TypeRef = "[" TypeRef "]" | ident "!"? .
Value = ident .
Schema = "schema" "{" Field* "}" .
Enum = "enum" ident "{" ident* "}" .