Package unfurlist implements a service that unfurls URLs and provides more information about them. The current version supports Open Graph and oEmbed formats, Twitter card format is also planned. If the URL does not support common formats, unfurlist falls back to looking at common HTML tags such as <title> and <meta name="description">. The endpoint accepts GET and POST requests with `content` as the main argument. It then returns a JSON encoded list of URLs that were parsed. If an URL lacks an attribute (e.g. `image`) then this attribute will be omitted from the result. Example: Will return: If handler was configured with FetchImageSize=true in its config, each hash may have additional fields `image_width` and `image_height` specifying dimensions of image provided by `image` attribute. Additionally you can supply `callback` to wrap the result in a JavaScript callback (JSONP), the type of this response would be "application/x-javascript" If an optional `markdown` boolean argument is set (markdown=true), then provided content is parsed as markdown formatted text and links are extracted in context-aware mode — i.e. preformatted text blocks are skipped. Care should be taken when running this inside internal network since it may disclose internal endpoints. It is a good idea to run the service on a separate host in an isolated subnet. Alternatively access to internal resources may be limited with firewall rules, i.e. if service is running as 'unfurlist' user on linux box, the following iptables rules can reduce chances of it connecting to internal endpoints (note this example is for ipv4 only!):
Package pm is a process manager with an HTTP monitoring/control interface. To this package, processes or tasks are user-defined routines within a running program. This model is particularly suitable to server programs, with independent client-generated requests for action. The library would keep track of all such tasks and make the information available through HTTP, thus providing valuable insight into the running application. The client can also ask for a particular task to be canceled. Using pm starts by calling ListenAndServe() in a separate routine, like so: Note that pm's ListenAndServe() will not return by itself. Nevertheless, it will if the underlying net/http's ListenAndServe() does. So it's probably a good idea to wrap that call with some error checking and retrying for production code. Tasks to be tracked must be declared with Start(); that's when the identifier gets linked to them. An optional set of (arbitrary) attributes may be provided, that will get attached to the running task and be reported to the HTTP client tool as well. (Think of host, method and URI for a web server, for instance.) This could go like this: Note the deferred pm.Done() call. That's essential to mark the task as finished and release resources, and has to be called no matter if the task succeeded or not. That's why it's a good idea to use Go's defer mechanism. In fact, the protection from cancel-related panics (see StopCancelPanic below) does NOT work if Done() is called in-line (i.e., not deferred) due to properties of recover(). An HTTP client issuing a GET call to /procs/ would receive a JSON response with a snapshot of all currently running tasks. Each one will include the time when it was started (as per the server clock) as well as the complete set of attributes provided to Start(). Note that to avoid issues with clock skews among servers, the current time for the server is returned as well. The reply to the /procs/ GET also includes a status. When tasks start they are set to "init", but they may change their status using pm's Status() function. Each call will record the change and the HTTP client will receive the last status available, together with the time it was set. Furthermore, the client may GET /procs/<id>/history instead (where <id> is the task identifier) to have a complete history for the given task, including all status changes with their time information. However, note that task information is completely recycled as soon as a task is Done(). Hence, client applications should be prepared to receive a not found reply even if they've just seen the task in a /procs/ GET result. Given the lack of a statement in Go to kill a routine, cancellations are implemented as panics. A DELETE call to /procs/<id> will mark the task with the given identifier as cancel-pending. Nevertheless, the task will never notice until it reaches another Status() call, which is by definition a cancellation point. Calls to Status() either return having set the new status, or panic with an error of type CancelErr. Needless to say, the application should handle this gracefully or will otherwise crash. Programs serving multiple requests will probably be already protected, with a recover() call at the level where the routine was started. But if that's not the case, or if you simply want the panic to be handled transparently, you may use this: When the StopCancelPanic option is set (which is NOT the default) Done() will recover a panic due to a cancel operation. In such a case, the routine running that code will jump to the next statement after the invocation to the function that called Start(). (Read it again.) In other words, stack unfolding stops at the level where Done() was deferred. Notice, though, that this behavior is specifically tailored to panics raising from pending cancellation requests. Should any other panic arise for any reason, it will continue past Done() as usual. So will panics due to cancel requests if StopCancelPanic is not set. Options set with pm.SetOptions() work on a global scope for the process list. Alternatively, you may provide a specific set for some task by using the options argument in the Start() call. If none is given, pm will grab the current global values at the time Start() was invoked. Task-specific options may come handy when, for instance, some tasks should be resilient to cancellation. Setting the ForbidCancel option makes the task effectively immune to cancel requests. Attempts by clients to cancel one of such tasks will inevitably yield an error message with no other effects. Set that with the global SetOptions() function and none of your tasks will cancel, ever. HTTP clients can learn about pending cancellation requests. Furthermore, if a client request happens to be handled between the task is done/canceled and resource recycling (a VERY tiny time window), then the result would include one of these as the status: "killed", "aborted" or "ended", if it was respectively canceled, died out of another panic (not pm-related) or finished successfully. The "killed" status may optionally add a user-defined message, provided through the HTTP /procs/<id> DELETE method. For the cancellation feature to be useful, applications should collaborate. Go lacks a mechanism to cancel arbitrary routines (it even lacks identifiers for them), so programs willing to provide the feature must be willing to help. It's a good practice to add cancellation points every once in a while, particularly when lengthy operations are run. However, applications are not expected to change status that frequently. This package provides the function CheckCancel() for that. It works as a cancellation point by definition, without messing with the task status, nor leaving a trace in history. Finally, please note that cancellation requests yield panics in the same routine that called Start() with that given identifier. However, it's not unusual for servers to spawn additional Go routines to handle the same request. The application is responsible of cleaning up, if there are additional resources that should be recycled. The proper way to do this is by catching CancelErr type panics, cleaning-up and then re-panic, i.e.:
Package lfucache implements an O(1) LFU (Least Frequenty Used) cache. **DEPRECATION WARNING** This package is unmaintained and probably not what you're looking for. This structure is described in the paper "An O(1) algorithm for implementing the LFU cache eviction scheme" by K. Shah, A. Mitra and D. Matani. It is based on two levels of doubly linked lists and gives O(1) insert, access and delete operations. It is implemented here with a few practical optimizations and extra features. The cache supports sending evicted items to interested listeners via channels, and manually evicting cache items matching a certain criteria. This is useful for example when using the package as a write cache for a database, where items must be written to the backing store on eviction. The cache structure is not thread safe. Example: --- Copyright (c) 2013 Jakob Borg. Licensed under the MIT license.
fifo_queue.Queue is a simple FIFO queue. It supports pushing an item at the end, and popping an item from the front. It is implemented as a single-linked list of arrays of items.
Package ora implements an Oracle database driver. ### Golang Oracle Database Driver ### #### TL;DR; just use it #### Call stored procedure with OUT parameters: An Oracle database may be accessed through the database/sql(http://golang.org/pkg/database/sql) package or through the ora package directly. database/sql offers connection pooling, thread safety, a consistent API to multiple database technologies and a common set of Go types. The ora package offers additional features including pointers, slices, nullable types, numerics of various sizes, Oracle-specific types, Go return type configuration, and Oracle abstractions such as environment, server and session. The ora package is written with the Oracle Call Interface (OCI) C-language libraries provided by Oracle. The OCI libraries are a standard for client application communication and driver communication with Oracle databases. The ora package has been verified to work with: * Oracle Standard 11g (11.2.0.4.0), Linux x86_64 (RHEL6) * Oracle Enterprise 12c (12.1.0.1.0), Windows 8.1 and AMD64. --- * [Installation](https://github.com/rana/ora#installation) * [Data Types](https://github.com/rana/ora#data-types) * [SQL Placeholder Syntax](https://github.com/rana/ora#sql-placeholder-syntax) * [Working With The Sql Package](https://github.com/rana/ora#working-with-the-sql-package) * [Working With The Oracle Package Directly](https://github.com/rana/ora#working-with-the-oracle-package-directly) * [Logging](https://github.com/rana/ora#logging) * [Test Database Setup](https://github.com/rana/ora#test-database-setup) * [Limitations](https://github.com/rana/ora#limitations) * [License](https://github.com/rana/ora#license) * [API Reference](http://godoc.org/github.com/rana/ora#pkg-index) * [Examples](./examples) --- Minimum requirements are Go 1.3 with CGO enabled, a GCC C compiler, and Oracle 11g (11.2.0.4.0) or Oracle Instant Client (11.2.0.4.0). Install Oracle or Oracle Instant Client. Copy the [oci8.pc](contrib/oci8.pc) from the `contrib` folder (or the one for your system, maybe tailored to your specific locations) to a folder in `$PKG_CONFIG_PATH` or a system folder, such as The ora package has no external Go dependencies and is available on GitHub and gopkg.in: *WARNING*: If you have Oracle Instant Client 11.2, you'll need to add "=lnnz11" to the list of linked libs! Otherwise, you may encounter "undefined reference to `nzosSCSP_SetCertSelectionParams' " errors. Oracle Instant Client 12.1 does not need this. The ora package supports all built-in Oracle data types. The supported Oracle built-in data types are NUMBER, BINARY_DOUBLE, BINARY_FLOAT, FLOAT, DATE, TIMESTAMP, TIMESTAMP WITH TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE, INTERVAL YEAR TO MONTH, INTERVAL DAY TO SECOND, CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR2, LONG, CLOB, NCLOB, BLOB, LONG RAW, RAW, ROWID and BFILE. SYS_REFCURSOR is also supported. Oracle does not provide a built-in boolean type. Oracle provides a single-byte character type. A common practice is to define two single-byte characters which represent true and false. The ora package adopts this approach. The oracle package associates a Go bool value to a Go rune and sends and receives the rune to a CHAR(1 BYTE) column or CHAR(1 CHAR) column. The default false rune is zero '0'. The default true rune is one '1'. The bool rune association may be configured or disabled when directly using the ora package but not with the database/sql package. Within a SQL string a placeholder may be specified to indicate where a Go variable is placed. The SQL placeholder is an Oracle identifier, from 1 to 30 characters, prefixed with a colon (:). For example: Placeholders within a SQL statement are bound by position. The actual name is not used by the ora package driver e.g., placeholder names :c1, :1, or :xyz are treated equally. The `database/sql` package provides a LastInsertId method to return the last inserted row's id. Oracle does not provide such functionality, but if you append `... RETURNING col /*LastInsertId*/` to your SQL, then it will be presented as LastInsertId. Note that you have to mark with a `/*LastInsertId*/` (case insensitive) your `RETURNING` part, to allow ora to return the last column as `LastInsertId()`. That column must fit in `int64`, though! You may access an Oracle database through the database/sql package. The database/sql package offers a consistent API across different databases, connection pooling, thread safety and a set of common Go types. database/sql makes working with Oracle straight-forward. The ora package implements interfaces in the database/sql/driver package enabling database/sql to communicate with an Oracle database. Using database/sql ensures you never have to call the ora package directly. When using database/sql, the mapping between Go types and Oracle types may be changed slightly. The database/sql package has strict expectations on Go return types. The Go-to-Oracle type mapping for database/sql is: The "ora" driver is automatically registered for use with sql.Open, but you can call ora.SetCfg to set the used configuration options including statement configuration and Rset configuration. When configuring the driver for use with database/sql, keep in mind that database/sql has strict Go type-to-Oracle type mapping expectations. The ora package allows programming with pointers, slices, nullable types, numerics of various sizes, Oracle-specific types, Go return type configuration, and Oracle abstractions such as environment, server and session. When working with the ora package directly, the API is slightly different than database/sql. When using the ora package directly, the mapping between Go types and Oracle types may be changed. The Go-to-Oracle type mapping for the ora package is: An example of using the ora package directly: Pointers may be used to capture out-bound values from a SQL statement such as an insert or stored procedure call. For example, a numeric pointer captures an identity value: A string pointer captures an out parameter from a stored procedure: Slices may be used to insert multiple records with a single insert statement: The ora package provides nullable Go types to support DML operations such as insert and select. The nullable Go types provided by the ora package are Int64, Int32, Int16, Int8, Uint64, Uint32, Uint16, Uint8, Float64, Float32, Time, IntervalYM, IntervalDS, String, Bool, Binary and Bfile. For example, you may insert nullable Strings and select nullable Strings: The `Stmt.Prep` method is variadic accepting zero or more `GoColumnType` which define a Go return type for a select-list column. For example, a Prep call can be configured to return an int64 and a nullable Int64 from the same column: Go numerics of various sizes are supported in DML operations. The ora package supports int64, int32, int16, int8, uint64, uint32, uint16, uint8, float64 and float32. For example, you may insert a uint16 and select numerics of various sizes: If a non-nullable type is defined for a nullable column returning null, the Go type's zero value is returned. GoColumnTypes defined by the ora package are: When Stmt.Prep doesn't receive a GoColumnType, or receives an incorrect GoColumnType, the default value defined in RsetCfg is used. EnvCfg, SrvCfg, SesCfg, StmtCfg and RsetCfg are the main configuration structs. EnvCfg configures aspects of an Env. SrvCfg configures aspects of a Srv. SesCfg configures aspects of a Ses. StmtCfg configures aspects of a Stmt. RsetCfg configures aspects of Rset. StmtCfg and RsetCfg have the most options to configure. RsetCfg defines the default mapping between an Oracle select-list column and a Go type. StmtCfg may be set in an EnvCfg, SrvCfg, SesCfg and StmtCfg. RsetCfg may be set in a Stmt. EnvCfg.StmtCfg, SrvCfg.StmtCfg, SesCfg.StmtCfg may optionally be specified to configure a statement. If StmtCfg isn't specified default values are applied. EnvCfg.StmtCfg, SrvCfg.StmtCfg, SesCfg.StmtCfg cascade to new descendent structs. When ora.OpenEnv() is called a specified EnvCfg is used or a default EnvCfg is created. Creating a Srv with env.OpenSrv() will use SrvCfg.StmtCfg if it is specified; otherwise, EnvCfg.StmtCfg is copied by value to SrvCfg.StmtCfg. Creating a Ses with srv.OpenSes() will use SesCfg.StmtCfg if it is specified; otherwise, SrvCfg.StmtCfg is copied by value to SesCfg.StmtCfg. Creating a Stmt with ses.Prep() will use SesCfg.StmtCfg if it is specified; otherwise, a new StmtCfg with default values is set on the Stmt. Call Stmt.Cfg() to change a Stmt's configuration. An Env may contain multiple Srv. A Srv may contain multiple Ses. A Ses may contain multiple Stmt. A Stmt may contain multiple Rset. Setting a RsetCfg on a StmtCfg does not cascade through descendent structs. Configuration of Stmt.Cfg takes effect prior to calls to Stmt.Exe and Stmt.Qry; consequently, any updates to Stmt.Cfg after a call to Stmt.Exe or Stmt.Qry are not observed. One configuration scenario may be to set a server's select statements to return nullable Go types by default: Another scenario may be to configure the runes mapped to bool values: Oracle-specific types offered by the ora package are ora.Rset, ora.IntervalYM, ora.IntervalDS, ora.Raw, ora.Lob and ora.Bfile. ora.Rset represents an Oracle SYS_REFCURSOR. ora.IntervalYM represents an Oracle INTERVAL YEAR TO MONTH. ora.IntervalDS represents an Oracle INTERVAL DAY TO SECOND. ora.Raw represents an Oracle RAW or LONG RAW. ora.Lob may represent an Oracle BLOB or Oracle CLOB. And ora.Bfile represents an Oracle BFILE. ROWID columns are returned as strings and don't have a unique Go type. #### LOBs The default for SELECTing [BC]LOB columns is a safe Bin or S, which means all the contents of the LOB is slurped into memory and returned as a []byte or string. The DefaultLOBFetchLen says LOBs are prefetched only a minimal way, to minimize extra memory usage - you can override this using `stmt.SetCfg(stmt.Cfg().SetLOBFetchLen(100))`. If you want more control, you can use ora.L in Prep, Qry or `ses.SetCfg(ses.Cfg().SetBlob(ora.L))`. But keep in mind that Oracle restricts the use of LOBs: it is forbidden to do ANYTHING while reading the LOB! No another query, no exec, no close of the Rset - even *advance* to the next record in the result set is forbidden! Failing to adhere these rules results in "Invalid handle" and ORA-03127 errors. You cannot start reading another LOB till you haven't finished reading the previous LOB, not even in the same row! Failing this results in ORA-24804! For examples, see [z_lob_test.go](z_lob_test.go). #### Rset Rset is used to obtain Go values from a SQL select statement. Methods Rset.Next, Rset.NextRow, and Rset.Len are available. Fields Rset.Row, Rset.Err, Rset.Index, and Rset.ColumnNames are also available. The Next method attempts to load data from an Oracle buffer into Row, returning true when successful. When no data is available, or if an error occurs, Next returns false setting Row to nil. Any error in Next is assigned to Err. Calling Next increments Index and method Len returns the total number of rows processed. The NextRow method is convenient for returning a single row. NextRow calls Next and returns Row. ColumnNames returns the names of columns defined by the SQL select statement. Rset has two usages. Rset may be returned from Stmt.Qry when prepared with a SQL select statement: Or, *Rset may be passed to Stmt.Exe when prepared with a stored procedure accepting an OUT SYS_REFCURSOR parameter: Stored procedures with multiple OUT SYS_REFCURSOR parameters enable a single Exe call to obtain multiple Rsets: The types of values assigned to Row may be configured in StmtCfg.Rset. For configuration to take effect, assign StmtCfg.Rset prior to calling Stmt.Qry or Stmt.Exe. Rset prefetching may be controlled by StmtCfg.PrefetchRowCount and StmtCfg.PrefetchMemorySize. PrefetchRowCount works in coordination with PrefetchMemorySize. When PrefetchRowCount is set to zero only PrefetchMemorySize is used; otherwise, the minimum of PrefetchRowCount and PrefetchMemorySize is used. The default uses a PrefetchMemorySize of 134MB. Opening and closing Rsets is managed internally. Rset does not have an Open method or Close method. IntervalYM may be be inserted and selected: IntervalDS may be be inserted and selected: Transactions on an Oracle server are supported. DML statements auto-commit unless a transaction has started: Ses.PrepAndExe, Ses.PrepAndQry, Ses.Ins, Ses.Upd, and Ses.Sel are convenient one-line methods. Ses.PrepAndExe offers a convenient one-line call to Ses.Prep and Stmt.Exe. Ses.PrepAndQry offers a convenient one-line call to Ses.Prep and Stmt.Qry. Ses.Ins composes, prepares and executes a sql INSERT statement. Ses.Ins is useful when you have to create and maintain a simple INSERT statement with a long list of columns. As table columns are added and dropped over the lifetime of a table Ses.Ins is easy to read and revise. Ses.Upd composes, prepares and executes a sql UPDATE statement. Ses.Upd is useful when you have to create and maintain a simple UPDATE statement with a long list of columns. As table columns are added and dropped over the lifetime of a table Ses.Upd is easy to read and revise. Ses.Sel composes, prepares and queries a sql SELECT statement. Ses.Sel is useful when you have to create and maintain a simple SELECT statement with a long list of columns that have non-default GoColumnTypes. As table columns are added and dropped over the lifetime of a table Ses.Sel is easy to read and revise. The Ses.Ping method checks whether the client's connection to an Oracle server is valid. A call to Ping requires an open Ses. Ping will return a nil error when the connection is fine: The Srv.Version method is available to obtain the Oracle server version. A call to Version requires an open Ses: Further code examples are available in the [example file](https://github.com/rana/ora/blob/master/z_example_test.go), test files and [samples folder](https://github.com/rana/ora/tree/master/samples). The ora package provides a simple ora.Logger interface for logging. Logging is disabled by default. Specify one of three optional built-in logging packages to enable logging; or, use your own logging package. ora.Cfg().Log offers various options to enable or disable logging of specific ora driver methods. For example: To use the standard Go log package: which produces a sample log of: Messages are prefixed with 'ORA I' for information or 'ORA E' for an error. The log package is configured to write to os.Stderr by default. Use the ora/lg.Std type to configure an alternative io.Writer. To use the glog package: which produces a sample log of: To use the log15 package: which produces a sample log of: See https://github.com/rana/ora/tree/master/samples/lg15/main.go for sample code which uses the log15 package. Tests are available and require some setup. Setup varies depending on whether the Oracle server is configured as a container database or non-container database. It's simpler to setup a non-container database. An example for each setup is explained. Non-container test database setup steps: Container test database setup steps: Some helpful SQL maintenance statements: Run the tests. database/sql method Stmt.QueryRow is not supported. Go 1.6 introduced stricter cgo (call C from Go) rules, and introduced runtime checks. This is good, as the possibility of C code corrupting Go code is almost completely eliminated, but it also means a severe call overhead grow. [Sometimes](https://groups.google.com/forum/#!topic/golang-nuts/ccMkPG6Bi5k) this can be 22x the go 1.5.3 call time! So if you need performance more than correctness, start your programs with "GODEBUG=cgocheck=0" environment setting. Copyright 2017 Rana Ian, Tamás Gulácsi. All rights reserved. Use of this source code is governed by The MIT License found in the accompanying LICENSE file.
Package list implements a doubly linked list. To iterate over a list (where l is a *List):
Supplies API to append/fetch key/value/docid from kv-file. kv-file is opened and managed by the WStore structure. entry format is, Maximum size of each entry is int32, that is 2^31. translates btree blocks from persistant storage to in-memory data structure, called btree-node. A btree node can be a knode (also called leaf node) or it can be a inode. `block` structure is fundamental to both type of nodes. Btree indexing algorithm for json {key,docid,value} triplets. `keys` and `values` are expected to be in json, while `docid` is the primary key of json document which contains the key fragment. `value` can optionally be used to store fragment of a document. since keys generated for seconday indexes may not be unique, indexing a.k.a sorting is done on {key,docid}. This module provides a lock free interface to cache data structure. The general idea is to create an array of pointers, each pointing to the head of DCacheItem linked list. Since nodes are cached based on their file-position (fpos), fpos is hashed to index into the array. Refer indexFor() to know how. Subsequenly we walk the linked list for a match in fpos and corresponding cached node. For deleting we simply remove the node from the list. For inserting, we prepend the new DCacheItem as the head of the list, and walk the remaining list to delete the item, if it is already present. (fpos & hashmask) >> rshift defer goroutine that handles mvcc snapshots. Functionalites of this process augments the functionalit of MVCC process. Manages list of free blocks in btree index-file. Manages head sector of btree index-file. Head sector contains the following items, Index mutation due to {key,docid,value} insert. MVCC controller process. Handles all btree traversals except, insert() and remove() The following is the general idea on cache structure. commits*-----------| MVCC |<-* | recyles *------* | reclaims | | The cycle of ping-pong, reads will always refer to the pong-cache. reads will populate the cache from disk, when ever cache lookup fails. writes will refer to the commitQ maintained by MVCC, if node is not in commitQ it will refer to pong-cache. ping-cache is operated only by the MVCC controller. MVCC controller will _populate_ the ping-cache when new nodes are generated due to index mutations. MVCC controller will _evict_ the pong-cache as and when nodes become stale due to index mutations. ping2Pong() happens when snapshot is flushed to disk. pong becomes ping, and MVCC controller will _populate_ and _evict_ the newly flipped ping-cache based on commited, recycled and reclaimed node, before allowing further mutations. Data store for btree, organised in two files, index-file and kv-file. index-file, kv-file, Common functions used across test cases. Contains necessary functions to do index writing.
Package singlyLinkedList implements a singly linked list Package singlyLinkedList implements a singly linked list
Package deque provides a fast ring-buffer deque (double-ended queue) implementation. Deque generalizes a queue and a stack, to efficiently add and remove items at either end with O(1) performance. Queue (FIFO) operations are supported using PushBack() and PopFront(). Stack (LIFO) operations are supported using PushBack() and PopBack(). The ring-buffer automatically resizes by powers of two, growing when additional capacity is needed and shrinking when only a quarter of the capacity is used, and uses bitwise arithmetic for all calculations. The ring-buffer implementation significantly improves memory and time performance with fewer GC pauses, compared to implementations based on slices and linked lists. For maximum speed, this deque implementation leaves concurrency safety up to the application to provide, however the application chooses, if needed at all. Since it is OK for the deque to contain a nil value, it is necessary to either panic or return a second boolean value to indicate the deque is empty, when reading or removing an element. This deque panics when reading from an empty deque. This is a run-time check to help catch programming errors, which may be missed if a second return value is ignored. Simply check Deque.Len() before reading from the deque.
Command pigeon generates parsers in Go from a PEG grammar. This version is a fork: https://github.com/fy0/pigeon From Wikipedia [0]: Its features and syntax are inspired by the PEG.js project [1], while the implementation is loosely based on [2]. Formal presentation of the PEG theory by Bryan Ford is also an important reference [3]. An introductory blog post can be found at [4]. The pigeon tool must be called with PEG input as defined by the accepted PEG syntax below. The grammar may be provided by a file or read from stdin. The generated parser is written to stdout by default. The following options can be specified: If the code blocks in the grammar (see below, section "Code block") are golint- and go vet-compliant, then the resulting generated code will also be golint- and go vet-compliant. The generated code doesn't use any third-party dependency unless code blocks in the grammar require such a dependency. The accepted syntax for the grammar is formally defined in the grammar/pigeon.peg file, using the PEG syntax. What follows is an informal description of this syntax. Identifiers, whitespace, comments and literals follow the same notation as the Go language, as defined in the language specification (http://golang.org/ref/spec#Source_code_representation): The grammar must be Unicode text encoded in UTF-8. New lines are identified by the \n character (U+000A). Space (U+0020), horizontal tabs (U+0009) and carriage returns (U+000D) are considered whitespace and are ignored except to separate tokens. A PEG grammar consists of a set of rules. A rule is an identifier followed by a rule definition operator and an expression. An optional display name - a string literal used in error messages instead of the rule identifier - can be specified after the rule identifier. E.g.: The rule definition operator can be any one of those: A rule is defined by an expression. The following sections describe the various expression types. Expressions can be grouped by using parentheses, and a rule can be referenced by its identifier in place of an expression. The choice expression is a list of expressions that will be tested in the order they are defined. The first one that matches will be used. Expressions are separated by the forward slash character "/". E.g.: Because the first match is used, it is important to think about the order of expressions. For example, in this rule, "<=" would never be used because the "<" expression comes first: The sequence expression is a list of expressions that must all match in that same order for the sequence expression to be considered a match. Expressions are separated by whitespace. E.g.: A labeled expression consists of an identifier followed by a colon ":" and an expression. A labeled expression introduces a variable named with the label that can be referenced in the code blocks in the same scope. The variable will have the value of the expression that follows the colon. E.g.: The variable is typed as an empty interface, and the underlying type depends on the following: For terminals (character and string literals, character classes and the any matcher), the value is []byte. E.g.: For predicates (& and !), the value is always nil. E.g.: For a sequence, the value is a slice of empty interfaces, one for each expression value in the sequence. The underlying types of each value in the slice follow the same rules described here, recursively. E.g.: For a repetition (+ and *), the value is a slice of empty interfaces, one for each repetition. The underlying types of each value in the slice follow the same rules described here, recursively. E.g.: For a choice expression, the value is that of the matching choice. E.g.: For the optional expression (?), the value is nil or the value of the expression. E.g.: Of course, the type of the value can be anything once an action code block is used. E.g.: An expression prefixed with the ampersand "&" is the "and" predicate expression: it is considered a match if the following expression is a match, but it does not consume any input. An expression prefixed with the exclamation point "!" is the "not" predicate expression: it is considered a match if the following expression is not a match, but it does not consume any input. E.g.: The expression following the & and ! operators can be a code block. In that case, the code block must return a bool and an error. The operator's semantic is the same, & is a match if the code block returns true, ! is a match if the code block returns false. The code block has access to any labeled value defined in its scope. E.g.: An expression followed by "*", "?" or "+" is a match if the expression occurs zero or more times ("*"), zero or one time "?" or one or more times ("+") respectively. The match is greedy, it will match as many times as possible. E.g. A literal matcher tries to match the input against a single character or a string literal. The literal may be a single-quoted single character, a double-quoted string or a backtick-quoted raw string. The same rules as in Go apply regarding the allowed characters and escapes. The literal may be followed by a lowercase "i" (outside the ending quote) to indicate that the match is case-insensitive. E.g.: A character class matcher tries to match the input against a class of characters inside square brackets "[...]". Inside the brackets, characters represent themselves and the same escapes as in string literals are available, except that the single- and double-quote escape is not valid, instead the closing square bracket "]" must be escaped to be used. Character ranges can be specified using the "[a-z]" notation. Unicode classes can be specified using the "[\pL]" notation, where L is a single-letter Unicode class of characters, or using the "[\p{Class}]" notation where Class is a valid Unicode class (e.g. "Latin"). As for string literals, a lowercase "i" may follow the matcher (outside the ending square bracket) to indicate that the match is case-insensitive. A "^" as first character inside the square brackets indicates that the match is inverted (it is a match if the input does not match the character class matcher). E.g.: The any matcher is represented by the dot ".". It matches any character except the end of file, thus the "!." expression is used to indicate "match the end of file". E.g.: Code blocks can be added to generate custom Go code. There are three kinds of code blocks: the initializer, the action and the predicate. All code blocks appear inside curly braces "{...}". The initializer must appear first in the grammar, before any rule. It is copied as-is (minus the wrapping curly braces) at the top of the generated parser. It may contain function declarations, types, variables, etc. just like any Go file. Every symbol declared here will be available to all other code blocks. Although the initializer is optional in a valid grammar, it is usually required to generate a valid Go source code file (for the package clause). E.g.: Action code blocks are code blocks declared after an expression in a rule. Those code blocks are turned into a method on the "*current" type in the generated source code. The method receives any labeled expression's value as argument (as any) and must return two values, the first being the value of the expression (an any), and the second an error. If a non-nil error is returned, it is added to the list of errors that the parser will return. E.g.: Predicate code blocks are code blocks declared immediately after the and "&" or the not "!" operators. Like action code blocks, predicate code blocks are turned into a method on the "*current" type in the generated source code. The method receives any labeled expression's value as argument (as any) and must return two opt, the first being a bool and the second an error. If a non-nil error is returned, it is added to the list of errors that the parser will return. E.g.: State change code blocks are code blocks starting with "#". In contrast to action and predicate code blocks, state change code blocks are allowed to modify values in the global "state" store (see below). State change code blocks are turned into a method on the "*current" type in the generated source code. The method is passed any labeled expression's value as an argument (of type any) and must return a value of type error. If a non-nil error is returned, it is added to the list of errors that the parser will return, note that the parser does NOT backtrack if a non-nil error is returned. E.g: The "*current" type is a struct that provides four useful fields that can be accessed in action, state change, and predicate code blocks: "pos", "text", "state" and "globalStore". The "pos" field indicates the current position of the parser in the source input. It is itself a struct with three fields: "line", "col" and "offset". Line is a 1-based line number, col is a 1-based column number that counts runes from the start of the line, and offset is a 0-based byte offset. The "text" field is the slice of bytes of the current match. It is empty in a predicate code block. The "state" field is a global store, with backtrack support, of type "map[string]any". The values in the store are tied to the parser's backtracking, in particular if a rule fails to match then all updates to the state that occurred in the process of matching the rule are rolled back. For a key-value store that is not tied to the parser's backtracking, see the "globalStore". The values in the "state" store are available for read access in action and predicate code blocks, any changes made to the "state" store will be reverted once the action or predicate code block is finished running. To update values in the "state" use state change code blocks ("#{}"). IMPORTANT: The "globalStore" field is a global store of type "map[string]any", which allows to store arbitrary values, which are available in action and predicate code blocks for read as well as write access. It is important to notice, that the global store is completely independent from the backtrack mechanism of PEG and is therefore not set back to its old state during backtrack. The initialization of the global store may be achieved by using the GlobalStore function (http://godoc.org/github.com/mna/pigeon/test/predicates#GlobalStore). Be aware, that all keys starting with "_pigeon" are reserved for internal use of pigeon and should not be used nor modified. Those keys are treated as internal implementation details and therefore there are no guarantees given in regards of API stability. With options -support-left-recursion pigeon supports left recursion. E.g.: Supports indirect recursion: The implementation is based on the [Left-recursive PEG Grammars][9] article that links to [Left Recursion in Parsing Expression Grammars][10] and [Packrat Parsers Can Support Left Recursion][11] papers. References: pigeon supports an extension of the classical PEG syntax called failure labels, proposed by Maidl et al. in their paper "Error Reporting in Parsing Expression Grammars" [7]. The used syntax for the introduced expressions is borrowed from their lpeglabel [8] implementation. This extension allows to signal different kinds of errors and to specify, which recovery pattern should handle a given label. With labeled failures it is possible to distinguish between an ordinary failure and an error. Usually, an ordinary failure is produced when the matching of a character fails, and this failure is caught by ordered choice. An error (a non-ordinary failure), by its turn, is produced by the throw operator and may be caught by the recovery operator. In pigeon, the recovery expression consists of the regular expression, the recovery expression and a set of labels to be matched. First, the regular expression is tried. If this fails with one of the provided labels, the recovery expression is tried. If this fails as well, the error is propagated. E.g.: To signal a failure condition, the throw expression is used. E.g.: For concrete examples, how to use throw and recover, have a look at the examples "labeled_failures" and "thrownrecover" in the "test" folder. The implementation of the throw and recover operators work as follows: The failure recover expression adds the recover expression for every failure label to the recovery stack and runs the regular expression. The throw expression checks the recovery stack in reversed order for the provided failure label. If the label is found, the respective recovery expression is run. If this expression is successful, the parser continues the processing of the input. If the recovery expression is not successful, the parsing fails and the parser starts to backtrack. If throw and recover expressions are used together with global state, it is the responsibility of the author of the grammar to reset the global state to a valid state during the recovery operation. The parser generated by pigeon exports a few symbols so that it can be used as a package with public functions to parse input text. The exported API is: See the godoc page of the generated parser for the test/predicates grammar for an example documentation page of the exported API: http://godoc.org/github.com/mna/pigeon/test/predicates. Like the grammar used to generate the parser, the input text must be UTF-8-encoded Unicode. The start rule of the parser is the first rule in the PEG grammar used to generate the parser. A call to any of the Parse* functions returns the value generated by executing the grammar on the provided input text, and an optional error. Typically, the grammar should generate some kind of abstract syntax tree (AST), but for simple grammars it may evaluate the result immediately, such as in the examples/calculator example. There are no constraints imposed on the author of the grammar, it can return whatever is needed. When the parser returns a non-nil error, the error is always of type errList, which is defined as a slice of errors ([]error). Each error in the list is of type *parserError. This is a struct that has an "Inner" field that can be used to access the original error. So if a code block returns some well-known error like: The original error can be accessed this way: By default the parser will continue after an error is returned and will cumulate all errors found during parsing. If the grammar reaches a point where it shouldn't continue, a panic statement can be used to terminate parsing. The panic will be caught at the top-level of the Parse* call and will be converted into a *parserError like any error, and an errList will still be returned to the caller. The divide by zero error in the examples/calculator grammar leverages this feature (no special code is needed to handle division by zero, if it happens, the runtime panics and it is recovered and returned as a parsing error). Providing good error reporting in a parser is not a trivial task. Part of it is provided by the pigeon tool, by offering features such as filename, position, expected literals and rule name in the error message, but an important part of good error reporting needs to be done by the grammar author. For example, many programming languages use double-quotes for string literals. Usually, if the opening quote is found, the closing quote is expected, and if none is found, there won't be any other rule that will match, there's no need to backtrack and try other choices, an error should be added to the list and the match should be consumed. In order to do this, the grammar can look something like this: This is just one example, but it illustrates the idea that error reporting needs to be thought out when designing the grammar. Because the above mentioned error types (errList and parserError) are not exported, additional steps have to be taken, ff the generated parser is used as library package in other packages (e.g. if the same parser is used in multiple command line tools). One possible implementation for exported errors (based on interfaces) and customized error reporting (caret style formatting of the position, where the parsing failed) is available in the json example and its command line tool: http://godoc.org/github.com/mna/pigeon/examples/json Generated parsers have user-provided code mixed with pigeon code in the same package, so there is no package boundary in the resulting code to prevent access to unexported symbols. What is meant to be implementation details in pigeon is also available to user code - which doesn't mean it should be used. For this reason, it is important to precisely define what is intended to be the supported API of pigeon, the parts that will be stable in future versions. The "stability" of the version 1.0 API attempts to make a similar guarantee as the Go 1 compatibility [5]. The following lists what part of the current pigeon code falls under that guarantee (features may be added in the future): The pigeon command-line flags and arguments: those will not be removed and will maintain the same semantics. The explicitly exported API generated by pigeon. See [6] for the documentation of this API on a generated parser. The PEG syntax, as documented above. The code blocks (except the initializer) will always be generated as methods on the *current type, and this type is guaranteed to have the fields pos (type position) and text (type []byte). There are no guarantees on other fields and methods of this type. The position type will always have the fields line, col and offset, all defined as int. There are no guarantees on other fields and methods of this type. The type of the error value returned by the Parse* functions, when not nil, will always be errList defined as a []error. There are no guarantees on methods of this type, other than the fact it implements the error interface. Individual errors in the errList will always be of type *parserError, and this type is guaranteed to have an Inner field that contains the original error value. There are no guarantees on other fields and methods of this type. The above guarantee is given to the version 1.0 (https://github.com/mna/pigeon/releases/tag/v1.0.0) of pigeon, which has entered maintenance mode (bug fixes only). The current master branch includes the development toward a future version 2.0, which intends to further improve pigeon. While the given API stability should be maintained as far as it makes sense, breaking changes may be necessary to be able to improve pigeon. The new version 2.0 API has not yet stabilized and therefore changes to the API may occur at any time. References: