Package ql implements a pure Go embedded SQL database engine.
QL is a member of the SQL family of languages. It is less complex and less
powerful than SQL (whichever specification SQL is considered to be).
2017-01-10: Release v1.1.0 fixes some bugs and adds a configurable WAL
headroom.
2016-07-29: Release v1.0.6 enables alternatively using = instead of == for
equality operation.
2016-07-11: Release v1.0.5 undoes vendoring of lldb. QL now uses stable lldb
(github.com/cznic/lldb).
2016-07-06: Release v1.0.4 fixes a panic when closing the WAL file.
2016-04-03: Release v1.0.3 fixes a data race.
2016-03-23: Release v1.0.2 vendors github.com/cznic/exp/lldb and
github.com/camlistore/go4/lock.
2016-03-17: Release v1.0.1 adjusts for latest goyacc. Parser error messages
are improved and changed, but their exact form is not considered a API
change.
2016-03-05: The current version has been tagged v1.0.0.
2015-06-15: To improve compatibility with other SQL implementations, the
count built-in aggregate function now accepts * as its argument.
2015-05-29: The execution planner was rewritten from scratch. It should use
indices in all places where they were used before plus in some additional
situations. It is possible to investigate the plan using the newly added
EXPLAIN statement. The QL tool is handy for such analysis. If the planner
would have used an index, but no such exists, the plan includes hints in
form of copy/paste ready CREATE INDEX statements.
The planner is still quite simple and a lot of work on it is yet ahead. You
can help this process by filling an issue with a schema and query which
fails to use an index or indices when it should, in your opinion. Bonus
points for including output of `ql 'explain <query>'`.
2015-05-09: The grammar of the CREATE INDEX statement now accepts an
expression list instead of a single expression, which was further limited to
just a column name or the built-in id(). As a side effect, composite
indices are now functional. However, the values in the expression-list style
index are not yet used by other statements or the statement/query planner.
The composite index is useful while having UNIQUE clause to check for
semantically duplicate rows before they get added to the table or when such
a row is mutated using the UPDATE statement and the expression-list style
index tuple of the row is thus recomputed.
2015-05-02: The Schema field of table __Table now correctly reflects any
column constraints and/or defaults. Also, the (*DB).Info method now has that
information provided in new ColumInfo fields NotNull, Constraint and
Default.
2015-04-20: Added support for {LEFT,RIGHT,FULL} [OUTER] JOIN.
2015-04-18: Column definitions can now have constraints and defaults.
Details are discussed in the "Constraints and defaults" chapter below the
CREATE TABLE statement documentation.
2015-03-06: New built-in functions formatFloat and formatInt. Thanks
urandom! (https://github.com/urandom)
2015-02-16: IN predicate now accepts a SELECT statement. See the updated
"Predicates" section.
2015-01-17: Logical operators || and && have now alternative spellings: OR
and AND (case insensitive). AND was a keyword before, but OR is a new one.
This can possibly break existing queries. For the record, it's a good idea
to not use any name appearing in, for example, [7] in your queries as the
list of QL's keywords may expand for gaining better compatibility with
existing SQL "standards".
2015-01-12: ACID guarantees were tightened at the cost of performance in
some cases. The write collecting window mechanism, a formerly used
implementation detail, was removed. Inserting rows one by one in a
transaction is now slow. I mean very slow. Try to avoid inserting single
rows in a transaction. Instead, whenever possible, perform batch updates of
tens to, say thousands of rows in a single transaction. See also:
http://www.sqlite.org/faq.html#q19, the discussed synchronization principles
involved are the same as for QL, modulo minor details.
Note: A side effect is that closing a DB before exiting an application, both
for the Go API and through database/sql driver, is no more required,
strictly speaking. Beware that exiting an application while there is an open
(uncommitted) transaction in progress means losing the transaction data.
However, the DB will not become corrupted because of not closing it. Nor
that was the case before, but formerly failing to close a DB could have
resulted in losing the data of the last transaction.
2014-09-21: id() now optionally accepts a single argument - a table name.
2014-09-01: Added the DB.Flush() method and the LIKE pattern matching
predicate.
2014-08-08: The built in functions max and min now accept also time values.
Thanks opennota! (https://github.com/opennota)
2014-06-05: RecordSet interface extended by new methods FirstRow and Rows.
2014-06-02: Indices on id() are now used by SELECT statements.
2014-05-07: Introduction of Marshal, Schema, Unmarshal.
2014-04-15:
Added optional IF NOT EXISTS clause to CREATE INDEX and optional IF EXISTS
clause to DROP INDEX.
2014-04-12:
The column Unique in the virtual table __Index was renamed to IsUnique
because the old name is a keyword. Unfortunately, this is a breaking change,
sorry.
2014-04-11: Introduction of LIMIT, OFFSET.
2014-04-10: Introduction of query rewriting.
2014-04-07: Introduction of indices.
QL imports zappy[8], a block-based compressor, which speeds up its
performance by using a C version of the compression/decompression
algorithms. If a CGO-free (pure Go) version of QL, or an app using QL, is
required, please include 'purego' in the -tags option of go
{build,get,install}. For example:
If zappy was installed before installing QL, it might be necessary to
rebuild zappy first (or rebuild QL with all its dependencies using the -a
option):
The syntax is specified using Extended Backus-Naur Form (EBNF)
Lower-case production names are used to identify lexical tokens.
Non-terminals are in CamelCase. Lexical tokens are enclosed in double quotes
"" or back quotes “.
The form a … b represents the set of characters from a through b as
alternatives. The horizontal ellipsis … is also used elsewhere in the spec
to informally denote various enumerations or code snippets that are not
further specified.
QL source code is Unicode text encoded in UTF-8. The text is not
canonicalized, so a single accented code point is distinct from the same
character constructed from combining an accent and a letter; those are
treated as two code points. For simplicity, this document will use the
unqualified term character to refer to a Unicode code point in the source
text.
Each code point is distinct; for instance, upper and lower case letters are
different characters.
Implementation restriction: For compatibility with other tools, the parser
may disallow the NUL character (U+0000) in the statement.
Implementation restriction: A byte order mark is disallowed anywhere in QL
statements.
The following terms are used to denote specific character classes
The underscore character _ (U+005F) is considered a letter.
Lexical elements are comments, tokens, identifiers, keywords, operators and
delimiters, integer, floating-point, imaginary, rune and string literals and
QL parameters.
Line comments start with the character sequence // or -- and stop at the end
of the line. A line comment acts like a space.
General comments start with the character sequence /* and continue through
the character sequence */. A general comment acts like a space.
Comments do not nest.
Tokens form the vocabulary of QL. There are four classes: identifiers,
keywords, operators and delimiters, and literals. White space, formed from
spaces (U+0020), horizontal tabs (U+0009), carriage returns (U+000D), and
newlines (U+000A), is ignored except as it separates tokens that would
otherwise combine into a single token.
The formal grammar uses semicolons ";" as separators of QL statements. A
single QL statement or the last QL statement in a list of statements can
have an optional semicolon terminator. (Actually a separator from the
following empty statement.)
Identifiers name entities such as tables or record set columns. An
identifier is a sequence of one or more letters and digits. The first
character in an identifier must be a letter.
For example
No identifiers are predeclared, however note that no keyword can be used as
an identifier. Identifiers starting with two underscores are used for meta
data virtual tables names. For forward compatibility, users should generally
avoid using any identifiers starting with two underscores. For example
The following keywords are reserved and may not be used as identifiers.
Keywords are not case sensitive.
The following character sequences represent operators, delimiters, and other
special tokens
Operators consisting of more than one character are referred to by names in
the rest of the documentation
An integer literal is a sequence of digits representing an integer constant.
An optional prefix sets a non-decimal base: 0 for octal, 0x or 0X for
hexadecimal. In hexadecimal literals, letters a-f and A-F represent values
10 through 15.
For example
A floating-point literal is a decimal representation of a floating-point
constant. It has an integer part, a decimal point, a fractional part, and an
exponent part. The integer and fractional part comprise decimal digits; the
exponent part is an e or E followed by an optionally signed decimal
exponent. One of the integer part or the fractional part may be elided; one
of the decimal point or the exponent may be elided.
For example
An imaginary literal is a decimal representation of the imaginary part of a
complex constant. It consists of a floating-point literal or decimal integer
followed by the lower-case letter i.
For example
A rune literal represents a rune constant, an integer value identifying a
Unicode code point. A rune literal is expressed as one or more characters
enclosed in single quotes. Within the quotes, any character may appear
except single quote and newline. A single quoted character represents the
Unicode value of the character itself, while multi-character sequences
beginning with a backslash encode values in various formats.
The simplest form represents the single character within the quotes; since
QL statements are Unicode characters encoded in UTF-8, multiple
UTF-8-encoded bytes may represent a single integer value. For instance, the
literal 'a' holds a single byte representing a literal a, Unicode U+0061,
value 0x61, while 'ä' holds two bytes (0xc3 0xa4) representing a literal
a-dieresis, U+00E4, value 0xe4.
Several backslash escapes allow arbitrary values to be encoded as ASCII
text. There are four ways to represent the integer value as a numeric
constant: \x followed by exactly two hexadecimal digits; \u followed by
exactly four hexadecimal digits; \U followed by exactly eight hexadecimal
digits, and a plain backslash \ followed by exactly three octal digits. In
each case the value of the literal is the value represented by the digits in
the corresponding base.
Although these representations all result in an integer, they have different
valid ranges. Octal escapes must represent a value between 0 and 255
inclusive. Hexadecimal escapes satisfy this condition by construction. The
escapes \u and \U represent Unicode code points so within them some values
are illegal, in particular those above 0x10FFFF and surrogate halves.
After a backslash, certain single-character escapes represent special
values
All other sequences starting with a backslash are illegal inside rune
literals.
For example
A string literal represents a string constant obtained from concatenating a
sequence of characters. There are two forms: raw string literals and
interpreted string literals.
Raw string literals are character sequences between back quotes “. Within
the quotes, any character is legal except back quote. The value of a raw
string literal is the string composed of the uninterpreted (implicitly
UTF-8-encoded) characters between the quotes; in particular, backslashes
have no special meaning and the string may contain newlines. Carriage
returns inside raw string literals are discarded from the raw string value.
Interpreted string literals are character sequences between double quotes
"". The text between the quotes, which may not contain newlines, forms the
value of the literal, with backslash escapes interpreted as they are in rune
literals (except that \' is illegal and \" is legal), with the same
restrictions. The three-digit octal (\nnn) and two-digit hexadecimal (\xnn)
escapes represent individual bytes of the resulting string; all other
escapes represent the (possibly multi-byte) UTF-8 encoding of individual
characters. Thus inside a string literal \377 and \xFF represent a single
byte of value 0xFF=255, while ÿ, \u00FF, \U000000FF and \xc3\xbf represent
the two bytes 0xc3 0xbf of the UTF-8 encoding of character U+00FF.
For example
These examples all represent the same string
If the statement source represents a character as two code points, such as a
combining form involving an accent and a letter, the result will be an error
if placed in a rune literal (it is not a single code point), and will appear
as two code points if placed in a string literal.
Literals are assigned their values from the respective text representation
at "compile" (parse) time. QL parameters provide the same functionality as
literals, but their value is assigned at execution time from an expression
list passed to DB.Run or DB.Execute. Using '?' or '$' is completely
equivalent.
For example
Keywords 'false' and 'true' (not case sensitive) represent the two possible
constant values of type bool (also not case sensitive).
Keyword 'NULL' (not case sensitive) represents an untyped constant which is
assignable to any type. NULL is distinct from any other value of any type.
A type determines the set of values and operations specific to values of
that type. A type is specified by a type name.
Named instances of the boolean, numeric, and string types are keywords. The
names are not case sensitive.
Note: The blob type is exchanged between the back end and the API as []byte.
On 32 bit platforms this limits the size which the implementation can handle
to 2G.
A boolean type represents the set of Boolean truth values denoted by the
predeclared constants true and false. The predeclared boolean type is bool.
A duration type represents the elapsed time between two instants as an int64
nanosecond count. The representation limits the largest representable
duration to approximately 290 years.
A numeric type represents sets of integer or floating-point values. The
predeclared architecture-independent numeric types are
The value of an n-bit integer is n bits wide and represented using two's
complement arithmetic.
Conversions are required when different numeric types are mixed in an
expression or assignment.
A string type represents the set of string values. A string value is a
(possibly empty) sequence of bytes. The case insensitive keyword for the
string type is 'string'.
The length of a string (its size in bytes) can be discovered using the
built-in function len.
A time type represents an instant in time with nanosecond precision. Each
time has associated with it a location, consulted when computing the
presentation form of the time.
The following functions are implicitly declared
An expression specifies the computation of a value by applying operators and
functions to operands.
Operands denote the elementary values in an expression. An operand may be a
literal, a (possibly qualified) identifier denoting a constant or a function
or a table/record set column, or a parenthesized expression.
A qualified identifier is an identifier qualified with a table/record set
name prefix.
For example
Primary expression are the operands for unary and binary expressions.
For example
A primary expression of the form
denotes the element of a string indexed by x. Its type is byte. The value x
is called the index. The following rules apply
- The index x must be of integer type except bigint or duration; it is in
range if 0 <= x < len(s), otherwise it is out of range.
- A constant index must be non-negative and representable by a value of type
int.
- A constant index must be in range if the string a is a literal.
- If x is out of range at run time, a run-time error occurs.
- s[x] is the byte at index x and the type of s[x] is byte.
If s is NULL or x is NULL then the result is NULL.
Otherwise s[x] is illegal.
For a string, the primary expression
constructs a substring. The indices low and high select which elements
appear in the result. The result has indices starting at 0 and length equal
to high - low.
For convenience, any of the indices may be omitted. A missing low index
defaults to zero; a missing high index defaults to the length of the sliced
operand
The indices low and high are in range if 0 <= low <= high <= len(a),
otherwise they are out of range. A constant index must be non-negative and
representable by a value of type int. If both indices are constant, they
must satisfy low <= high. If the indices are out of range at run time, a
run-time error occurs.
Integer values of type bigint or duration cannot be used as indices.
If s is NULL the result is NULL. If low or high is not omitted and is NULL
then the result is NULL.
Given an identifier f denoting a predeclared function,
calls f with arguments a1, a2, … an. Arguments are evaluated before the
function is called. The type of the expression is the result type of f.
In a function call, the function value and arguments are evaluated in the
usual order. After they are evaluated, the parameters of the call are passed
by value to the function and the called function begins execution. The
return value of the function is passed by value when the function returns.
Calling an undefined function causes a compile-time error.
Operators combine operands into expressions.
Comparisons are discussed elsewhere. For other binary operators, the operand
types must be identical unless the operation involves shifts or untyped
constants. For operations involving constants only, see the section on
constant expressions.
Except for shift operations, if one operand is an untyped constant and the
other operand is not, the constant is converted to the type of the other
operand.
The right operand in a shift expression must have unsigned integer type or
be an untyped constant that can be converted to unsigned integer type. If
the left operand of a non-constant shift expression is an untyped constant,
the type of the constant is what it would be if the shift expression were
replaced by its left operand alone.
Expressions of the form
yield a boolean value true if expr2, a regular expression, matches expr1
(see also [6]). Both expression must be of type string. If any one of the
expressions is NULL the result is NULL.
Predicates are special form expressions having a boolean result type.
Expressions of the form
are equivalent, including NULL handling, to
The types of involved expressions must be comparable as defined in
"Comparison operators".
Another form of the IN predicate creates the expression list from a result
of a SelectStmt.
The SelectStmt must select only one column. The produced expression list is
resource limited by the memory available to the process. NULL values
produced by the SelectStmt are ignored, but if all records of the SelectStmt
are NULL the predicate yields NULL. The select statement is evaluated only
once. If the type of expr is not the same as the type of the field returned
by the SelectStmt then the set operation yields false. The type of the
column returned by the SelectStmt must be one of the simple (non blob-like)
types:
Expressions of the form
are equivalent, including NULL handling, to
The types of involved expressions must be ordered as defined in "Comparison
operators".
Expressions of the form
yield a boolean value true if expr does not have a specific type (case A) or
if expr has a specific type (case B). In other cases the result is a boolean
value false.
Unary operators have the highest precedence.
There are five precedence levels for binary operators. Multiplication
operators bind strongest, followed by addition operators, comparison
operators, && (logical AND), and finally || (logical OR)
Binary operators of the same precedence associate from left to right. For
instance, x / y * z is the same as (x / y) * z.
Note that the operator precedence is reflected explicitly by the grammar.
Arithmetic operators apply to numeric values and yield a result of the same
type as the first operand. The four standard arithmetic operators (+, -, *,
/) apply to integer, rational, floating-point, and complex types; + also
applies to strings; +,- also applies to times. All other arithmetic
operators apply to integers only.
sum integers, rationals, floats, complex values, strings
difference integers, rationals, floats, complex values, times
product integers, rationals, floats, complex values
/ quotient integers, rationals, floats, complex values
% remainder integers
& bitwise AND integers
| bitwise OR integers
^ bitwise XOR integers
&^ bit clear (AND NOT) integers
<< left shift integer << unsigned integer
>> right shift integer >> unsigned integer
Strings can be concatenated using the + operator
String addition creates a new string by concatenating the operands.
A value of type duration can be added to or subtracted from a value of type time.
Times can subtracted from each other producing a value of type duration.
For two integer values x and y, the integer quotient q = x / y and remainder
r = x % y satisfy the following relationships
with x / y truncated towards zero ("truncated division").
As an exception to this rule, if the dividend x is the most negative value
for the int type of x, the quotient q = x / -1 is equal to x (and r = 0).
If the divisor is a constant expression, it must not be zero. If the divisor
is zero at run time, a run-time error occurs. If the dividend is
non-negative and the divisor is a constant power of 2, the division may be
replaced by a right shift, and computing the remainder may be replaced by a
bitwise AND operation
The shift operators shift the left operand by the shift count specified by
the right operand. They implement arithmetic shifts if the left operand is a
signed integer and logical shifts if it is an unsigned integer. There is no
upper limit on the shift count. Shifts behave as if the left operand is
shifted n times by 1 for a shift count of n. As a result, x << 1 is the same
as x*2 and x >> 1 is the same as x/2 but truncated towards negative
infinity.
For integer operands, the unary operators +, -, and ^ are defined as follows
For floating-point and complex numbers, +x is the same as x, while -x is the
negation of x. The result of a floating-point or complex division by zero is
not specified beyond the IEEE-754 standard; whether a run-time error occurs
is implementation-specific.
Whenever any operand of any arithmetic operation, unary or binary, is NULL,
as well as in the case of the string concatenating operation, the result is
NULL.
For unsigned integer values, the operations +, -, *, and << are computed
modulo 2n, where n is the bit width of the unsigned integer's type. Loosely
speaking, these unsigned integer operations discard high bits upon overflow,
and expressions may rely on “wrap around”.
For signed integers with a finite bit width, the operations +, -, *, and <<
may legally overflow and the resulting value exists and is deterministically
defined by the signed integer representation, the operation, and its
operands. No exception is raised as a result of overflow. An evaluator may
not optimize an expression under the assumption that overflow does not
occur. For instance, it may not assume that x < x + 1 is always true.
Integers of type bigint and rationals do not overflow but their handling is
limited by the memory resources available to the program.
Comparison operators compare two operands and yield a boolean value.
In any comparison, the first operand must be of same type as is the second
operand, or vice versa.
The equality operators == and != apply to operands that are comparable. The
ordering operators <, <=, >, and >= apply to operands that are ordered.
These terms and the result of the comparisons are defined as follows
- Boolean values are comparable. Two boolean values are equal if they are
either both true or both false.
- Complex values are comparable. Two complex values u and v are equal if
both real(u) == real(v) and imag(u) == imag(v).
- Integer values are comparable and ordered, in the usual way. Note that
durations are integers.
- Floating point values are comparable and ordered, as defined by the
IEEE-754 standard.
- Rational values are comparable and ordered, in the usual way.
- String values are comparable and ordered, lexically byte-wise.
- Time values are comparable and ordered.
Whenever any operand of any comparison operation is NULL, the result is
NULL.
Note that slices are always of type string.
Logical operators apply to boolean values and yield a boolean result. The
right operand is evaluated conditionally.
The truth tables for logical operations with NULL values
Conversions are expressions of the form T(x) where T is a type and x is an
expression that can be converted to type T.
A constant value x can be converted to type T in any of these cases:
- x is representable by a value of type T.
- x is a floating-point constant, T is a floating-point type, and x is
representable by a value of type T after rounding using IEEE 754
round-to-even rules. The constant T(x) is the rounded value.
- x is an integer constant and T is a string type. The same rule as for
non-constant x applies in this case.
Converting a constant yields a typed constant as result.
A non-constant value x can be converted to type T in any of these cases:
- x has type T.
- x's type and T are both integer or floating point types.
- x's type and T are both complex types.
- x is an integer, except bigint or duration, and T is a string type.
Specific rules apply to (non-constant) conversions between numeric types or
to and from a string type. These conversions may change the representation
of x and incur a run-time cost. All other conversions only change the type
but not the representation of x.
A conversion of NULL to any type yields NULL.
For the conversion of non-constant numeric values, the following rules
apply
1. When converting between integer types, if the value is a signed integer,
it is sign extended to implicit infinite precision; otherwise it is zero
extended. It is then truncated to fit in the result type's size. For
example, if v == uint16(0x10F0), then uint32(int8(v)) == 0xFFFFFFF0. The
conversion always yields a valid value; there is no indication of overflow.
2. When converting a floating-point number to an integer, the fraction is
discarded (truncation towards zero).
3. When converting an integer or floating-point number to a floating-point
type, or a complex number to another complex type, the result value is
rounded to the precision specified by the destination type. For instance,
the value of a variable x of type float32 may be stored using additional
precision beyond that of an IEEE-754 32-bit number, but float32(x)
represents the result of rounding x's value to 32-bit precision. Similarly,
x + 0.1 may use more than 32 bits of precision, but float32(x + 0.1) does
not.
In all non-constant conversions involving floating-point or complex values,
if the result type cannot represent the value the conversion succeeds but
the result value is implementation-dependent.
1. Converting a signed or unsigned integer value to a string type yields a
string containing the UTF-8 representation of the integer. Values outside
the range of valid Unicode code points are converted to "\uFFFD".
2. Converting a blob to a string type yields a string whose successive bytes
are the elements of the blob.
3. Converting a value of a string type to a blob yields a blob whose
successive elements are the bytes of the string.
4. Converting a value of a bigint type to a string yields a string
containing the decimal decimal representation of the integer.
5. Converting a value of a string type to a bigint yields a bigint value
containing the integer represented by the string value. A prefix of “0x” or
“0X” selects base 16; the “0” prefix selects base 8, and a “0b” or “0B”
prefix selects base 2. Otherwise the value is interpreted in base 10. An
error occurs if the string value is not in any valid format.
6. Converting a value of a rational type to a string yields a string
containing the decimal decimal representation of the rational in the form
"a/b" (even if b == 1).
7. Converting a value of a string type to a bigrat yields a bigrat value
containing the rational represented by the string value. The string can be
given as a fraction "a/b" or as a floating-point number optionally followed
by an exponent. An error occurs if the string value is not in any valid
format.
8. Converting a value of a duration type to a string returns a string
representing the duration in the form "72h3m0.5s". Leading zero units are
omitted. As a special case, durations less than one second format using a
smaller unit (milli-, micro-, or nanoseconds) to ensure that the leading
digit is non-zero. The zero duration formats as 0, with no unit.
9. Converting a string value to a duration yields a duration represented by
the string. A duration string is a possibly signed sequence of decimal
numbers, each with optional fraction and a unit suffix, such as "300ms",
"-1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs"), "ms", "s",
"m", "h".
10. Converting a time value to a string returns the time formatted using the
format string
When evaluating the operands of an expression or of function calls,
operations are evaluated in lexical left-to-right order.
For example, in the evaluation of
the function calls and evaluation of c happen in the order h(), i(), j(), c.
Floating-point operations within a single expression are evaluated according
to the associativity of the operators. Explicit parentheses affect the
evaluation by overriding the default associativity. In the expression x + (y
+ z) the addition y + z is performed before adding x.
Statements control execution.
The empty statement does nothing.
Alter table statements modify existing tables. With the ADD clause it adds
a new column to the table. The column must not exist. With the DROP clause
it removes an existing column from a table. The column must exist and it
must be not the only (last) column of the table. IOW, there cannot be a
table with no columns.
For example
When adding a column to a table with existing data, the constraint clause of
the ColumnDef cannot be used. Adding a constrained column to an empty table
is fine.
Begin transactions statements introduce a new transaction level. Every
transaction level must be eventually balanced by exactly one of COMMIT or
ROLLBACK statements. Note that when a transaction is roll-backed because of
a statement failure then no explicit balancing of the respective BEGIN
TRANSACTION is statement is required nor permitted.
Failure to properly balance any opened transaction level may cause dead
locks and/or lose of data updated in the uppermost opened but never properly
closed transaction level.
For example
A database cannot be updated (mutated) outside of a transaction. Statements
requiring a transaction
A database is effectively read only outside of a transaction. Statements not
requiring a transaction
The commit statement closes the innermost transaction nesting level. If
that's the outermost level then the updates to the DB made by the
transaction are atomically made persistent.
For example
Create index statements create new indices. Index is a named projection of
ordered values of a table column to the respective records. As a special
case the id() of the record can be indexed. Index name must not be the same
as any of the existing tables and it also cannot be the same as of any
column name of the table the index is on.
For example
Now certain SELECT statements may use the indices to speed up joins and/or
to speed up record set filtering when the WHERE clause is used; or the
indices might be used to improve the performance when the ORDER BY clause is
present.
The UNIQUE modifier requires the indexed values tuple to be index-wise
unique or have all values NULL.
The optional IF NOT EXISTS clause makes the statement a no operation if the
index already exists.
A simple index consists of only one expression which must be either a column
name or the built-in id().
A more complex and more general index is one that consists of more than one
expression or its single expression does not qualify as a simple index. In
this case the type of all expressions in the list must be one of the non
blob-like types.
Note: Blob-like types are blob, bigint, bigrat, time and duration.
Create table statements create new tables. A column definition declares the
column name and type. Table names and column names are case sensitive.
Neither a table or an index of the same name may exist in the DB.
For example
The optional IF NOT EXISTS clause makes the statement a no operation if the
table already exists.
The optional constraint clause has two forms. The first one is found in many
SQL dialects.
This form prevents the data in column DepartmentName to be NULL.
The second form allows an arbitrary boolean expression to be used to
validate the column. If the value of the expression is true then the
validation succeeded. If the value of the expression is false or NULL then
the validation fails. If the value of the expression is not of type bool an
error occurs.
The optional DEFAULT clause is an expression which, if present, is
substituted instead of a NULL value when the colum is assigned a value.
Note that the constraint and/or default expressions may refer to other
columns by name:
When a table row is inserted by the INSERT INTO statement or when a table
row is updated by the UPDATE statement, the order of operations is as
follows:
1. The new values of the affected columns are set and the values of all the
row columns become the named values which can be referred to in default
expressions evaluated in step 2.
2. If any row column value is NULL and the DEFAULT clause is present in the
column's definition, the default expression is evaluated and its value is
set as the respective column value.
3. The values, potentially updated, of row columns become the named values
which can be referred to in constraint expressions evaluated during step 4.
4. All row columns which definition has the constraint clause present will
have that constraint checked. If any constraint violation is detected, the
overall operation fails and no changes to the table are made.
Delete from statements remove rows from a table, which must exist.
For example
If the WHERE clause is not present then all rows are removed and the
statement is equivalent to the TRUNCATE TABLE statement.
Drop index statements remove indices from the DB. The index must exist.
For example
The optional IF EXISTS clause makes the statement a no operation if the
index does not exist.
Drop table statements remove tables from the DB. The table must exist.
For example
The optional IF EXISTS clause makes the statement a no operation if the
table does not exist.
Insert into statements insert new rows into tables. New rows come from
literal data, if using the VALUES clause, or are a result of select
statement. In the later case the select statement is fully evaluated before
the insertion of any rows is performed, allowing to insert values calculated
from the same table rows are to be inserted into. If the ColumnNameList part
is omitted then the number of values inserted in the row must be the same as
are columns in the table. If the ColumnNameList part is present then the
number of values per row must be same as the same number of column names.
All other columns of the record are set to NULL. The type of the value
assigned to a column must be the same as is the column's type or the value
must be NULL.
For example
If any of the columns of the table were defined using the optional
constraints clause or the optional defaults clause then those are processed
on a per row basis. The details are discussed in the "Constraints and
defaults" chapter below the CREATE TABLE statement documentation.
Explain statement produces a recordset consisting of lines of text which
describe the execution plan of a statement, if any.
For example, the QL tool treats the explain statement specially and outputs
the joined lines:
The explanation may aid in uderstanding how a statement/query would be
executed and if indices are used as expected - or which indices may possibly
improve the statement performance. The create index statements above were
directly copy/pasted in the terminal from the suggestions provided by the
filter recordset pipeline part returned by the explain statement.
If the statement has nothing special in its plan, the result is the original
statement.
To get an explanation of the select statement of the IN predicate, use the EXPLAIN
statement with that particular select statement.
The rollback statement closes the innermost transaction nesting level
discarding any updates to the DB made by it. If that's the outermost level
then the effects on the DB are as if the transaction never happened.
For example
The (temporary) record set from the last statement is returned and can be
processed by the client.
In this case the rollback is the same as 'DROP TABLE tmp;' but it can be a
more complex operation.
Select from statements produce recordsets. The optional DISTINCT modifier
ensures all rows in the result recordset are unique. Either all of the
resulting fields are returned ('*') or only those named in FieldList.
RecordSetList is a list of table names or parenthesized select statements,
optionally (re)named using the AS clause.
The result can be filtered using a WhereClause and orderd by the OrderBy
clause.
For example
If Recordset is a nested, parenthesized SelectStmt then it must be given a
name using the AS clause if its field are to be accessible in expressions.
A field is an named expression. Identifiers, not used as a type in
conversion or a function name in the Call clause, denote names of (other)
fields, values of which should be used in the expression.
The expression can be named using the AS clause. If the AS clause is not
present and the expression consists solely of a field name, then that field
name is used as the name of the resulting field. Otherwise the field is
unnamed.
For example
The SELECT statement can optionally enumerate the desired/resulting fields
in a list.
No two identical field names can appear in the list.
When more than one record set is used in the FROM clause record set list,
the result record set field names are rewritten to be qualified using
the record set names.
If a particular record set doesn't have a name, its respective fields became
unnamed.
The optional JOIN clause, for example
is mostly equal to
except that the rows from a which, when they appear in the cross join, never
made expr to evaluate to true, are combined with a virtual row from b,
containing all nulls, and added to the result set. For the RIGHT JOIN
variant the discussed rules are used for rows from b not satisfying expr ==
true and the virtual, all-null row "comes" from a. The FULL JOIN adds the
respective rows which would be otherwise provided by the separate executions
of the LEFT JOIN and RIGHT JOIN variants. For more thorough OUTER JOIN
discussion please see the Wikipedia article at [10].
Resultins rows of a SELECT statement can be optionally ordered by the ORDER
BY clause. Collating proceeds by considering the expressions in the
expression list left to right until a collating order is determined. Any
possibly remaining expressions are not evaluated.
All of the expression values must yield an ordered type or NULL. Ordered
types are defined in "Comparison operators". Collating of elements having a
NULL value is different compared to what the comparison operators yield in
expression evaluation (NULL result instead of a boolean value).
Below, T denotes a non NULL value of any QL type.
NULL collates before any non NULL value (is considered smaller than T).
Two NULLs have no collating order (are considered equal).
The WHERE clause restricts records considered by some statements, like
SELECT FROM, DELETE FROM, or UPDATE.
It is an error if the expression evaluates to a non null value of non bool
type.
The GROUP BY clause is used to project rows having common values into a
smaller set of rows.
For example
Using the GROUP BY without any aggregate functions in the selected fields is
in certain cases equal to using the DISTINCT modifier. The last two examples
above produce the same resultsets.
The optional OFFSET clause allows to ignore first N records. For example
The above will produce only rows 11, 12, ... of the record set, if they
exist. The value of the expression must a non negative integer, but not
bigint or duration.
The optional LIMIT clause allows to ignore all but first N records. For
example
The above will return at most the first 10 records of the record set. The
value of the expression must a non negative integer, but not bigint or
duration.
The LIMIT and OFFSET clauses can be combined. For example
Considering table t has, say 10 records, the above will produce only records
4 - 8.
After returning record #8, no more result rows/records are computed.
1. The FROM clause is evaluated, producing a Cartesian product of its source
record sets (tables or nested SELECT statements).
2. If present, the JOIN cluase is evaluated on the result set of the
previous evaluation and the recordset specified by the JOIN clause. (...
JOIN Recordset ON ...)
3. If present, the WHERE clause is evaluated on the result set of the
previous evaluation.
4. If present, the GROUP BY clause is evaluated on the result set of the
previous evaluation(s).
5. The SELECT field expressions are evaluated on the result set of the
previous evaluation(s).
6. If present, the DISTINCT modifier is evaluated on the result set of the
previous evaluation(s).
7. If present, the ORDER BY clause is evaluated on the result set of the
previous evaluation(s).
8. If present, the OFFSET clause is evaluated on the result set of the
previous evaluation(s). The offset expression is evaluated once for the
first record produced by the previous evaluations.
9. If present, the LIMIT clause is evaluated on the result set of the
previous evaluation(s). The limit expression is evaluated once for the first
record produced by the previous evaluations.
Truncate table statements remove all records from a table. The table must
exist.
For example
Update statements change values of fields in rows of a table.
For example
Note: The SET clause is optional.
If any of the columns of the table were defined using the optional
constraints clause or the optional defaults clause then those are processed
on a per row basis. The details are discussed in the "Constraints and
defaults" chapter below the CREATE TABLE statement documentation.
To allow to query for DB meta data, there exist specially named tables, some
of them being virtual.
Note: Virtual system tables may have fake table-wise unique but meaningless
and unstable record IDs. Do not apply the built-in id() to any system table.
The table __Table lists all tables in the DB. The schema is
The Schema column returns the statement to (re)create table Name. This table
is virtual.
The table __Colum lists all columns of all tables in the DB. The schema is
The Ordinal column defines the 1-based index of the column in the record.
This table is virtual.
The table __Colum2 lists all columns of all tables in the DB which have the
constraint NOT NULL or which have a constraint expression defined or which
have a default expression defined. The schema is
It's possible to obtain a consolidated recordset for all properties of all
DB columns using
The Name column is the column name in TableName.
The table __Index lists all indices in the DB. The schema is
The IsUnique columns reflects if the index was created using the optional
UNIQUE clause. This table is virtual.
Built-in functions are predeclared.
The built-in aggregate function avg returns the average of values of an
expression. Avg ignores NULL values, but returns NULL if all values of a
column are NULL or if avg is applied to an empty record set.
The column values must be of a numeric type.
The built-in function contains returns true if substr is within s.
If any argument to contains is NULL the result is NULL.
The built-in aggregate function count returns how many times an expression
has a non NULL values or the number of rows in a record set. Note: count()
returns 0 for an empty record set.
For example
Date returns the time corresponding to
in the appropriate zone for that time in the given location.
The month, day, hour, min, sec, and nsec values may be outside their usual
ranges and will be normalized during the conversion. For example, October 32
converts to November 1.
A daylight savings time transition skips or repeats times. For example, in
the United States, March 13, 2011 2:15am never occurred, while November 6,
2011 1:15am occurred twice. In such cases, the choice of time zone, and
therefore the time, is not well-defined. Date returns a time that is correct
in one of the two zones involved in the transition, but it does not
guarantee which.
A location maps time instants to the zone in use at that time. Typically,
the location represents the collection of time offsets in use in a
geographical area, such as "CEST" and "CET" for central Europe. "local"
represents the system's local time zone. "UTC" represents Universal
Coordinated Time (UTC).
The month specifies a month of the year (January = 1, ...).
If any argument to date is NULL the result is NULL.
The built-in function day returns the day of the month specified by t.
If the argument to day is NULL the result is NULL.
The built-in function formatTime returns a textual representation of the
time value formatted according to layout, which defines the format by
showing how the reference time,
would be displayed if it were the value; it serves as an example of the
desired output. The same display rules will then be applied to the time
value.
If any argument to formatTime is NULL the result is NULL.
NOTE: The string value of the time zone, like "CET" or "ACDT", is dependent
on the time zone of the machine the function is run on. For example, if the
t value is in "CET", but the machine is in "ACDT", instead of "CET" the
result is "+0100". This is the same what Go (time.Time).String() returns and
in fact formatTime directly calls t.String().
returns
on a machine in the CET time zone, but may return
on a machine in the ACDT zone. The time value is in both cases the same so
its ordering and comparing is correct. Only the display value can differ.
The built-in functions formatFloat and formatInt format numbers
to strings using go's number format functions in the `strconv` package. For
all three functions, only the first argument is mandatory. The default values
of the rest are shown in the examples. If the first argument is NULL, the
result is NULL.
returns
returns
returns
Unlike the `strconv` equivalent, the formatInt function handles all integer
types, both signed and unsigned.
The built-in function hasPrefix tests whether the string s begins with prefix.
If any argument to hasPrefix is NULL the result is NULL.
The built-in function hasSuffix tests whether the string s ends with suffix.
If any argument to hasSuffix is NULL the result is NULL.
The built-in function hour returns the hour within the day specified by t,
in the range [0, 23].
If the argument to hour is NULL the result is NULL.
The built-in function hours returns the duration as a floating point number
of hours.
If the argument to hours is NULL the result is NULL.
The built-in function id takes zero or one arguments. If no argument is
provided, id() returns a table-unique automatically assigned numeric
identifier of type int. Ids of deleted records are not reused unless the DB
becomes completely empty (has no tables).
For example
If id() without arguments is called for a row which is not a table record
then the result value is NULL.
For example
If id() has one argument it must be a table name of a table in a cross join.
For example
The built-in function len takes a string argument and returns the lentgh of
the string in bytes.
The expression len(s) is constant if s is a string constant.
If the argument to len is NULL the result is NULL.
The built-in aggregate function max returns the largest value of an
expression in a record set. Max ignores NULL values, but returns NULL if
all values of a column are NULL or if max is applied to an empty record set.
The expression values must be of an ordered type.
For example
The built-in aggregate function min returns the smallest value of an
expression in a record set. Min ignores NULL values, but returns NULL if
all values of a column are NULL or if min is applied to an empty record set.
For example
The column values must be of an ordered type.
The built-in function minute returns the minute offset within the hour
specified by t, in the range [0, 59].
If the argument to minute is NULL the result is NULL.
The built-in function minutes returns the duration as a floating point
number of minutes.
If the argument to minutes is NULL the result is NULL.
The built-in function month returns the month of the year specified by t
(January = 1, ...).
If the argument to month is NULL the result is NULL.
The built-in function nanosecond returns the nanosecond offset within the
second specified by t, in the range [0, 999999999].
If the argument to nanosecond is NULL the result is NULL.
The built-in function nanoseconds returns the duration as an integer
nanosecond count.
If the argument to nanoseconds is NULL the result is NULL.
The built-in function now returns the current local time.
The built-in function parseTime parses a formatted string and returns the
time value it represents. The layout defines the format by showing how the
reference time,
would be interpreted if it were the value; it serves as an example of the
input format. The same interpretation will then be made to the input string.
Elements omitted from the value are assumed to be zero or, when zero is
impossible, one, so parsing "3:04pm" returns the time corresponding to Jan
1, year 0, 15:04:00 UTC (note that because the year is 0, this time is
before the zero Time). Years must be in the range 0000..9999. The day of the
week is checked for syntax but it is otherwise ignored.
In the absence of a time zone indicator, parseTime returns a time in UTC.
When parsing a time with a zone offset like -0700, if the offset corresponds
to a time zone used by the current location, then parseTime uses that
location and zone in the returned time. Otherwise it records the time as
being in a fabricated location with time fixed at the given zone offset.
When parsing a time with a zone abbreviation like MST, if the zone
abbreviation has a defined offset in the current location, then that offset
is used. The zone abbreviation "UTC" is recognized as UTC regardless of
location. If the zone abbreviation is unknown, Parse records the time as
being in a fabricated location with the given zone abbreviation and a zero
offset. This choice means that such a time can be parses and reformatted
with the same layout losslessly, but the exact instant used in the
representation will differ by the actual zone offset. To avoid such
problems, prefer time layouts that use a numeric zone offset.
If any argument to parseTime is NULL the result is NULL.
The built-in function second returns the second offset within the minute
specified by t, in the range [0, 59].
If the argument to second is NULL the result is NULL.
The built-in function seconds returns the duration as a floating point
number of seconds.
If the argument to seconds is NULL the result is NULL.
The built-in function since returns the time elapsed since t. It is
shorthand for now()-t.
If the argument to since is NULL the result is NULL.
The built-in aggregate function sum returns the sum of values of an
expression for all rows of a record set. Sum ignores NULL values, but
returns NULL if all values of a column are NULL or if sum is applied to an
empty record set.
The column values must be of a numeric type.
The built-in function timeIn returns t with the location information set to
loc. For discussion of the loc argument please see date().
If any argument to timeIn is NULL the result is NULL.
The built-in function weekday returns the day of the week specified by t.
Sunday == 0, Monday == 1, ...
If the argument to weekday is NULL the result is NULL.
The built-in function year returns the year in which t occurs.
If the argument to year is NULL the result is NULL.
The built-in function yearDay returns the day of the year specified by t, in
the range [1,365] for non-leap years, and [1,366] in leap years.
If the argument to yearDay is NULL the result is NULL.
Three functions assemble and disassemble complex numbers. The built-in
function complex constructs a complex value from a floating-point real and
imaginary part, while real and imag extract the real and imaginary parts of
a complex value.
The type of the arguments and return value correspond. For complex, the two
arguments must be of the same floating-point type and the return type is the
complex type with the corresponding floating-point constituents: complex64
for float32, complex128 for float64. The real and imag functions together
form the inverse, so for a complex value z, z == complex(real(z), imag(z)).
If the operands of these functions are all constants, the return value is a
constant.
If any argument to any of complex, real, imag functions is NULL the result
is NULL.
For the numeric types, the following sizes are guaranteed
Portions of this specification page are modifications based on work[2]
created and shared by Google[3] and used according to terms described in the
Creative Commons 3.0 Attribution License[4].
This specification is licensed under the Creative Commons Attribution 3.0
License, and code is licensed under a BSD license[5].
Links from the above documentation
This section is not part of the specification.
WARNING: The implementation of indices is new and it surely needs more time
to become mature.
Indices are used currently used only by the WHERE clause. The following
expression patterns of 'WHERE expression' are recognized and trigger index
use.
The relOp is one of the relation operators <, <=, ==, >=, >. For the
equality operator both operands must be of comparable types. For all other
operators both operands must be of ordered types. The constant expression is
a compile time constant expression. Some constant folding is still a TODO.
Parameter is a QL parameter ($1 etc.).
Consider tables t and u, both with an indexed field f. The WHERE expression
doesn't comply with the above simple detected cases.
However, such query is now automatically rewritten to
which will use both of the indices. The impact of using the indices can be
substantial (cf. BenchmarkCrossJoin*) if the resulting rows have low
"selectivity", ie. only few rows from both tables are selected by the
respective WHERE filtering.
Note: Existing QL DBs can be used and indices can be added to them. However,
once any indices are present in the DB, the old QL versions cannot work with
such DB anymore.
Running a benchmark with -v (-test.v) outputs information about the scale
used to report records/s and a brief description of the benchmark. For
example
Running the full suite of benchmarks takes a lot of time. Use the -timeout
flag to avoid them being killed after the default time limit (10 minutes).