The Influx Query Language Specification
Introduction
This is a reference for the Influx Query Language ("InfluxQL").
InfluxQL is a SQL-like query language for interacting with InfluxDB. It has
been lovingly crafted to feel familiar to those coming from other SQL or
SQL-like environments while providing features specific to storing and analyzing
time series data.
Notation
The syntax is specified using Extended Backus-Naur Form ("EBNF"). EBNF is the
same notation used in the Go programming language
specification, which can be found here. Not so
coincidentally, InfluxDB is written in Go.
Production = production_name "=" [ Expression ] "." .
Expression = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term = production_name | token [ "…" token ] | Group | Option | Repetition .
Group = "(" Expression ")" .
Option = "[" Expression "]" .
Repetition = "{" Expression "}" .
Notation operators in order of increasing precedence:
| alternation
() grouping
[] option (0 or 1 times)
{} repetition (0 to n times)
Both single and multiline comments are supported. A comment is treated
the same as whitespace by the parser.
-- single line comment
/*
multiline comment
*/
Single line comments will skip all text until the scanner hits a
newline. Multiline comments will skip all text until the end comment
marker is hit. Nested multiline comments are not supported so the
following does not work:
/* /* this does not work */ */
Query representation
Characters
InfluxQL is Unicode text encoded in UTF-8.
newline = /* the Unicode code point U+000A */ .
unicode_char = /* an arbitrary Unicode code point except newline */ .
Letters and digits
Letters are the set of ASCII characters plus the underscore character _ (U+005F)
is considered a letter.
Only decimal digits are supported.
letter = ascii_letter | "_" .
ascii_letter = "A" … "Z" | "a" … "z" .
digit = "0" … "9" .
Identifiers
Identifiers are tokens which refer to database names, retention policy names,
user names, measurement names, tag keys, and field keys.
The rules:
- double quoted identifiers can contain any unicode character other than a new line
- double quoted identifiers can contain escaped
"
characters (i.e., \"
)
- double quoted identifiers can contain InfluxQL keywords
- unquoted identifiers must start with an upper or lowercase ASCII character or "_"
- unquoted identifiers may contain only ASCII letters, decimal digits, and "_"
identifier = unquoted_identifier | quoted_identifier .
unquoted_identifier = ( letter ) { letter | digit } .
quoted_identifier = `"` unicode_char { unicode_char } `"` .
Examples:
cpu
_cpu_stats
"1h"
"anything really"
"1_Crazy-1337.identifier>NAME👍"
Keywords
ALL ALTER ANALYZE ANY AS ASC
BEGIN BY CREATE CONTINUOUS DATABASE DATABASES
DEFAULT DELETE DESC DESTINATIONS DIAGNOSTICS DISTINCT
DROP DURATION END EVERY EXPLAIN FIELD
FOR FROM GRANT GRANTS GROUP GROUPS
IN INF INSERT INTO KEY KEYS
KILL LIMIT SHOW MEASUREMENT MEASUREMENTS NAME
OFFSET ON ORDER PASSWORD POLICY POLICIES
PRIVILEGES QUERIES QUERY READ REPLICATION RESAMPLE
RETENTION REVOKE SELECT SERIES SET SHARD
SHARDS SLIMIT SOFFSET STATS SUBSCRIPTION SUBSCRIPTIONS
TAG TO USER USERS VALUES WHERE
WITH WRITE
Literals
Integers
InfluxQL supports decimal integer literals. Hexadecimal and octal literals are
not currently supported.
int_lit = [ "+" | "-" ] ( "1" … "9" ) { digit } .
Floats
InfluxQL supports floating-point literals. Exponents are not currently supported.
float_lit = [ "+" | "-" ] ( "." digit { digit } | digit { digit } "." { digit } ) .
Strings
String literals must be surrounded by single quotes. Strings may contain '
characters as long as they are escaped (i.e., \'
).
string_lit = `'` { unicode_char } `'` .
Durations
Duration literals specify a length of time. An integer literal followed
immediately (with no spaces) by a duration unit listed below is interpreted as
a duration literal.
Duration units
u or µ | microseconds (1 millionth of a second) |
ms | milliseconds (1 thousandth of a second) |
s | second |
m | minute |
h | hour |
d | day |
w | week |
duration_lit = int_lit duration_unit .
duration_unit = "u" | "µ" | "ms" | "s" | "m" | "h" | "d" | "w" .
Dates & Times
The date and time literal format is not specified in EBNF like the rest of this document. It is specified using Go's date / time parsing format, which is a reference date written in the format required by InfluxQL. The reference date time is:
InfluxQL reference date time: January 2nd, 2006 at 3:04:05 PM
time_lit = "2006-01-02 15:04:05.999999" | "2006-01-02" .
Booleans
bool_lit = TRUE | FALSE .
Regular Expressions
regex_lit = "/" { unicode_char } "/" .
Comparators:
=~
matches against
!~
doesn't match against
Note: Use regular expressions to match measurements and tags.
You cannot use regular expressions to match databases, retention policies, or fields.
Queries
A query is composed of one or more statements separated by a semicolon.
query = statement { ";" statement } .
statement = alter_retention_policy_stmt |
create_continuous_query_stmt |
create_database_stmt |
create_retention_policy_stmt |
create_subscription_stmt |
create_user_stmt |
delete_stmt |
drop_continuous_query_stmt |
drop_database_stmt |
drop_measurement_stmt |
drop_retention_policy_stmt |
drop_series_stmt |
drop_shard_stmt |
drop_subscription_stmt |
drop_user_stmt |
explain_stmt |
grant_stmt |
kill_query_statement |
show_continuous_queries_stmt |
show_databases_stmt |
show_field_keys_stmt |
show_grants_stmt |
show_measurements_stmt |
show_queries_stmt |
show_retention_policies |
show_series_stmt |
show_shard_groups_stmt |
show_shards_stmt |
show_subscriptions_stmt|
show_tag_keys_stmt |
show_tag_values_stmt |
show_users_stmt |
revoke_stmt |
select_stmt .
Statements
ALTER RETENTION POLICY
alter_retention_policy_stmt = "ALTER RETENTION POLICY" policy_name on_clause
retention_policy_option
[ retention_policy_option ]
[ retention_policy_option ]
[ retention_policy_option ] .
Replication factors do not serve a purpose with single node instances.
Examples:
ALTER RETENTION POLICY "1h.cpu" ON "mydb" DEFAULT
ALTER RETENTION POLICY "policy1" ON "somedb" DURATION 1h REPLICATION 4
CREATE CONTINUOUS QUERY
create_continuous_query_stmt = "CREATE CONTINUOUS QUERY" query_name on_clause
[ "RESAMPLE" resample_opts ]
"BEGIN" select_stmt "END" .
query_name = identifier .
resample_opts = (every_stmt for_stmt | every_stmt | for_stmt) .
every_stmt = "EVERY" duration_lit
for_stmt = "FOR" duration_lit
Examples:
CREATE CONTINUOUS QUERY "10m_event_count"
ON "db_name"
BEGIN
SELECT count("value")
INTO "6_months"."events"
FROM "events"
GROUP BY time(10m)
END;
CREATE CONTINUOUS QUERY "1h_event_count"
ON "db_name"
BEGIN
SELECT sum("count") as "count"
INTO "2_years"."events"
FROM "6_months"."events"
GROUP BY time(1h)
END;
CREATE CONTINUOUS QUERY "cpu_mean"
ON "db_name"
RESAMPLE EVERY 10s FOR 2m
BEGIN
SELECT mean("value")
INTO "cpu_mean"
FROM "cpu"
GROUP BY time(1m)
END;
CREATE DATABASE
create_database_stmt = "CREATE DATABASE" db_name
[ WITH
[ retention_policy_duration ]
[ retention_policy_replication ]
[ retention_policy_shard_group_duration ]
[ retention_policy_name ]
] .
Replication factors do not serve a purpose with single node instances.
Examples:
CREATE DATABASE "foo"
CREATE DATABASE "bar" WITH DURATION 1d REPLICATION 1 SHARD DURATION 30m NAME "myrp"
CREATE DATABASE "mydb" WITH NAME "myrp"
CREATE RETENTION POLICY
create_retention_policy_stmt = "CREATE RETENTION POLICY" policy_name on_clause
retention_policy_duration
retention_policy_replication
[ retention_policy_shard_group_duration ]
[ "DEFAULT" ] .
Replication factors do not serve a purpose with single node instances.
Examples
CREATE RETENTION POLICY "10m.events" ON "somedb" DURATION 60m REPLICATION 2
CREATE RETENTION POLICY "10m.events" ON "somedb" DURATION 60m REPLICATION 2 DEFAULT
CREATE RETENTION POLICY "10m.events" ON "somedb" DURATION 60m REPLICATION 2 SHARD DURATION 30m
CREATE SUBSCRIPTION
Subscriptions tell InfluxDB to send all the data it receives to Kapacitor or other third parties.
create_subscription_stmt = "CREATE SUBSCRIPTION" subscription_name "ON" db_name "." retention_policy "DESTINATIONS" ("ANY"|"ALL") host { "," host} .
Examples:
CREATE SUBSCRIPTION "sub0" ON "mydb"."autogen" DESTINATIONS ALL 'udp://example.com:9090'
CREATE SUBSCRIPTION "sub0" ON "mydb"."autogen" DESTINATIONS ANY 'udp://h1.example.com:9090', 'udp://h2.example.com:9090'
CREATE USER
create_user_stmt = "CREATE USER" user_name "WITH PASSWORD" password
[ "WITH ALL PRIVILEGES" ] .
Examples:
CREATE USER "jdoe" WITH PASSWORD '1337password'
CREATE USER "jdoe" WITH PASSWORD '1337password' WITH ALL PRIVILEGES
Note: The password string must be wrapped in single quotes.
DELETE
delete_stmt = "DELETE" ( from_clause | where_clause | from_clause where_clause ) .
Examples:
DELETE FROM "cpu"
DELETE FROM "cpu" WHERE time < '2000-01-01T00:00:00Z'
DELETE WHERE time < '2000-01-01T00:00:00Z'
DROP CONTINUOUS QUERY
drop_continuous_query_stmt = "DROP CONTINUOUS QUERY" query_name on_clause .
Example:
DROP CONTINUOUS QUERY "myquery" ON "mydb"
DROP DATABASE
drop_database_stmt = "DROP DATABASE" db_name .
Example:
DROP DATABASE "mydb"
DROP MEASUREMENT
drop_measurement_stmt = "DROP MEASUREMENT" measurement .
Examples:
DROP MEASUREMENT "cpu"
DROP RETENTION POLICY
drop_retention_policy_stmt = "DROP RETENTION POLICY" policy_name on_clause .
Example:
DROP RETENTION POLICY "1h.cpu" ON "mydb"
DROP SERIES
drop_series_stmt = "DROP SERIES" ( from_clause | where_clause | from_clause where_clause ) .
Example:
DROP SERIES FROM "telegraf"."autogen"."cpu" WHERE cpu = 'cpu8'
DROP SHARD
drop_shard_stmt = "DROP SHARD" ( shard_id ) .
Example:
DROP SHARD 1
DROP SUBSCRIPTION
drop_subscription_stmt = "DROP SUBSCRIPTION" subscription_name "ON" db_name "." retention_policy .
Example:
DROP SUBSCRIPTION "sub0" ON "mydb"."autogen"
DROP USER
drop_user_stmt = "DROP USER" user_name .
Example:
DROP USER "jdoe"
EXPLAIN
NOTE: This functionality is unimplemented.
explain_stmt = "EXPLAIN" [ "ANALYZE" ] select_stmt .
GRANT
NOTE: Users can be granted privileges on databases that do not exist.
grant_stmt = "GRANT" privilege [ on_clause ] to_clause .
Examples:
GRANT ALL TO "jdoe"
GRANT READ ON "mydb" TO "jdoe"
KILL QUERY
kill_query_statement = "KILL QUERY" query_id .
Examples:
--- kill a query with the query_id 36
KILL QUERY 36
NOTE: Identify the query_id
from the SHOW QUERIES
output.
SHOW CONTINUOUS QUERIES
show_continuous_queries_stmt = "SHOW CONTINUOUS QUERIES" .
Example:
SHOW CONTINUOUS QUERIES
SHOW DATABASES
show_databases_stmt = "SHOW DATABASES" .
Example:
SHOW DATABASES
SHOW FIELD KEYS
show_field_keys_stmt = "SHOW FIELD KEYS" [ from_clause ] .
Examples:
SHOW FIELD KEYS
SHOW FIELD KEYS FROM "cpu"
SHOW GRANTS
show_grants_stmt = "SHOW GRANTS FOR" user_name .
Example:
SHOW GRANTS FOR "jdoe"
SHOW MEASUREMENTS
show_measurements_stmt = "SHOW MEASUREMENTS" [on_clause] [ with_measurement_clause ] [ where_clause ] [ limit_clause ] [ offset_clause ] .
Examples:
SHOW MEASUREMENTS
SHOW MEASUREMENTS ON *.*
SHOW MEASUREMENTS ON mydb.myrp
SHOW MEASUREMENTS WHERE "region" = 'uswest' AND "host" = 'serverA'
SHOW MEASUREMENTS WITH MEASUREMENT =~ /h2o.*/
SHOW QUERIES
show_queries_stmt = "SHOW QUERIES" .
Example:
SHOW QUERIES
SHOW RETENTION POLICIES
show_retention_policies = "SHOW RETENTION POLICIES" on_clause .
Example:
SHOW RETENTION POLICIES ON "mydb"
SHOW SERIES
show_series_stmt = "SHOW SERIES" [ from_clause ] [ where_clause ] [ limit_clause ] [ offset_clause ] .
Example:
SHOW SERIES FROM "telegraf"."autogen"."cpu" WHERE cpu = 'cpu8'
SHOW SHARD GROUPS
show_shard_groups_stmt = "SHOW SHARD GROUPS" .
Example:
SHOW SHARD GROUPS
SHOW SHARDS
show_shards_stmt = "SHOW SHARDS" .
Example:
SHOW SHARDS
SHOW SUBSCRIPTIONS
show_subscriptions_stmt = "SHOW SUBSCRIPTIONS" .
Example:
SHOW SUBSCRIPTIONS
SHOW TAG KEYS
show_tag_keys_stmt = "SHOW TAG KEYS" [ from_clause ] [ where_clause ] [ group_by_clause ]
[ limit_clause ] [ offset_clause ] .
Examples:
SHOW TAG KEYS
SHOW TAG KEYS FROM "cpu"
SHOW TAG KEYS FROM "cpu" WHERE "region" = 'uswest'
SHOW TAG KEYS WHERE "host" = 'serverA'
SHOW TAG VALUES
show_tag_values_stmt = "SHOW TAG VALUES" [ from_clause ] with_tag_clause [ where_clause ]
[ group_by_clause ] [ limit_clause ] [ offset_clause ] .
Examples:
SHOW TAG VALUES WITH KEY = "region"
SHOW TAG VALUES FROM "cpu" WITH KEY = "region"
SHOW TAG VALUES WITH KEY !~ /.*c.*/
SHOW TAG VALUES FROM "cpu" WITH KEY IN ("region", "host") WHERE "service" = 'redis'
SHOW USERS
show_users_stmt = "SHOW USERS" .
Example:
SHOW USERS
REVOKE
revoke_stmt = "REVOKE" privilege [ on_clause ] "FROM" user_name .
Examples:
REVOKE ALL PRIVILEGES FROM "jdoe"
REVOKE READ ON "mydb" FROM "jdoe"
SELECT
select_stmt = "SELECT" fields from_clause [ into_clause ] [ where_clause ]
[ group_by_clause ] [ order_by_clause ] [ limit_clause ]
[ offset_clause ] [ slimit_clause ] [ soffset_clause ]
[ timezone_clause ] .
Examples:
SELECT mean("value") FROM "cpu" WHERE "region" = 'uswest' GROUP BY time(10m) fill(0)
SELECT mean("value") INTO "cpu_1h".:MEASUREMENT FROM /cpu.*/
SELECT mean("value") FROM "cpu" GROUP BY region, time(1d) fill(0) tz("America/Chicago")
Clauses
from_clause = "FROM" measurements .
group_by_clause = "GROUP BY" dimensions fill(fill_option).
into_clause = "INTO" ( measurement | back_ref ).
limit_clause = "LIMIT" int_lit .
offset_clause = "OFFSET" int_lit .
slimit_clause = "SLIMIT" int_lit .
soffset_clause = "SOFFSET" int_lit .
timezone_clause = tz(string_lit) .
on_clause = "ON" db_name .
order_by_clause = "ORDER BY" sort_fields .
to_clause = "TO" user_name .
where_clause = "WHERE" expr .
with_measurement_clause = "WITH MEASUREMENT" ( "=" measurement | "=~" regex_lit ) .
with_tag_clause = "WITH KEY" ( "=" tag_key | "!=" tag_key | "=~" regex_lit | "IN (" tag_keys ")" ) .
Expressions
binary_op = "+" | "-" | "*" | "/" | "%" | "&" | "|" | "^" | "AND" |
"OR" | "=" | "!=" | "<>" | "<" | "<=" | ">" | ">=" .
expr = unary_expr { binary_op unary_expr } .
unary_expr = "(" expr ")" | var_ref | time_lit | string_lit | int_lit |
float_lit | bool_lit | duration_lit | regex_lit .
Other
alias = "AS" identifier .
back_ref = ( policy_name ".:MEASUREMENT" ) |
( db_name "." [ policy_name ] ".:MEASUREMENT" ) .
db_name = identifier .
dimension = expr .
dimensions = dimension { "," dimension } .
field_key = identifier .
field = expr [ alias ] .
fields = field { "," field } .
fill_option = "null" | "none" | "previous" | "linear" | int_lit | float_lit .
host = string_lit .
measurement = measurement_name |
( policy_name "." measurement_name ) |
( db_name "." [ policy_name ] "." measurement_name ) .
measurements = measurement { "," measurement } .
measurement_name = identifier | regex_lit .
password = string_lit .
policy_name = identifier .
privilege = "ALL" [ "PRIVILEGES" ] | "READ" | "WRITE" .
query_id = int_lit .
query_name = identifier .
retention_policy = identifier .
retention_policy_option = retention_policy_duration |
retention_policy_replication |
retention_policy_shard_group_duration |
"DEFAULT" .
retention_policy_duration = "DURATION" duration_lit .
retention_policy_replication = "REPLICATION" int_lit .
retention_policy_shard_group_duration = "SHARD DURATION" duration_lit .
retention_policy_name = "NAME" identifier .
series_id = int_lit .
shard_id = int_lit .
sort_field = field_key [ ASC | DESC ] .
sort_fields = sort_field { "," sort_field } .
subscription_name = identifier .
tag_key = identifier .
tag_keys = tag_key { "," tag_key } .
user_name = identifier .
var_ref = measurement .
Query Engine Internals
Once you understand the language itself, it's important to know how these
language constructs are implemented in the query engine. This gives you an
intuitive sense for how results will be processed and how to create efficient
queries.
The life cycle of a query looks like this:
-
InfluxQL query string is tokenized and then parsed into an abstract syntax
tree (AST). This is the code representation of the query itself.
-
The AST is passed to the QueryExecutor
which directs queries to the
appropriate handlers. For example, queries related to meta data are executed
by the meta service and SELECT
statements are executed by the shards
themselves.
-
The query engine then determines the shards that match the SELECT
statement's time range. From these shards, iterators are created for each
field in the statement.
-
Iterators are passed to the emitter which drains them and joins the resulting
points. The emitter's job is to convert simple time/value points into the
more complex result objects that are returned to the client.
Understanding Iterators
Iterators are at the heart of the query engine. They provide a simple interface
for looping over a set of points. For example, this is an iterator over Float
points:
type FloatIterator interface {
Next() (*FloatPoint, error)
}
These iterators are created through the IteratorCreator
interface:
type IteratorCreator interface {
CreateIterator(m *Measurement, opt IteratorOptions) (Iterator, error)
}
The IteratorOptions
provide arguments about field selection, time ranges,
and dimensions that the iterator creator can use when planning an iterator.
The IteratorCreator
interface is used at many levels such as the Shards
,
Shard
, and Engine
. This allows optimizations to be performed when applicable
such as returning a precomputed COUNT()
.
Iterators aren't just for reading raw data from storage though. Iterators can be
composed so that they provided additional functionality around an input
iterator. For example, a DistinctIterator
can compute the distinct values for
each time window for an input iterator. Or a FillIterator
can generate
additional points that are missing from an input iterator.
This composition also lends itself well to aggregation. For example, a statement
such as this:
SELECT MEAN(value) FROM cpu GROUP BY time(10m)
In this case, MEAN(value)
is a MeanIterator
wrapping an iterator from the
underlying shards. However, if we can add an additional iterator to determine
the derivative of the mean:
SELECT DERIVATIVE(MEAN(value), 20m) FROM cpu GROUP BY time(10m)
Understanding Auxiliary Fields
Because InfluxQL allows users to use selector functions such as FIRST()
,
LAST()
, MIN()
, and MAX()
, the engine must provide a way to return related
data at the same time with the selected point.
For example, in this query:
SELECT FIRST(value), host FROM cpu GROUP BY time(1h)
We are selecting the first value
that occurs every hour but we also want to
retrieve the host
associated with that point. Since the Point
types only
specify a single typed Value
for efficiency, we push the host
into the
auxiliary fields of the point. These auxiliary fields are attached to the point
until it is passed to the emitter where the fields get split off to their own
iterator.
Built-in Iterators
There are many helper iterators that let us build queries:
-
Merge Iterator - This iterator combines one or more iterators into a single
new iterator of the same type. This iterator guarantees that all points
within a window will be output before starting the next window but does not
provide ordering guarantees within the window. This allows for fast access
for aggregate queries which do not need stronger sorting guarantees.
-
Sorted Merge Iterator - This iterator also combines one or more iterators
into a new iterator of the same type. However, this iterator guarantees
time ordering of every point. This makes it slower than the MergeIterator
but this ordering guarantee is required for non-aggregate queries which
return the raw data points.
-
Limit Iterator - This iterator limits the number of points per name/tag
group. This is the implementation of the LIMIT
& OFFSET
syntax.
-
Fill Iterator - This iterator injects extra points if they are missing from
the input iterator. It can provide null
points, points with the previous
value, or points with a specific value.
-
Buffered Iterator - This iterator provides the ability to "unread" a point
back onto a buffer so it can be read again next time. This is used extensively
to provide lookahead for windowing.
-
Reduce Iterator - This iterator calls a reduction function for each point in
a window. When the window is complete then all points for that window are
output. This is used for simple aggregate functions such as COUNT()
.
-
Reduce Slice Iterator - This iterator collects all points for a window first
and then passes them all to a reduction function at once. The results are
returned from the iterator. This is used for aggregate functions such as
DERIVATIVE()
.
-
Transform Iterator - This iterator calls a transform function for each point
from an input iterator. This is used for executing binary expressions.
-
Dedupe Iterator - This iterator only outputs unique points. It is resource
intensive so it is only used for small queries such as meta query statements.
Call Iterators
Function calls in InfluxQL are implemented at two levels. Some calls can be
wrapped at multiple layers to improve efficiency. For example, a COUNT()
can
be performed at the shard level and then multiple CountIterator
s can be
wrapped with another CountIterator
to compute the count of all shards. These
iterators can be created using NewCallIterator()
.
Some iterators are more complex or need to be implemented at a higher level.
For example, the DERIVATIVE()
needs to retrieve all points for a window first
before performing the calculation. This iterator is created by the engine itself
and is never requested to be created by the lower levels.
Subqueries
Subqueries are built on top of iterators. Most of the work involved in
supporting subqueries is in organizing how data is streamed to the
iterators that will process the data.
The final ordering of the stream has to output all points from one
series before moving to the next series and it also needs to ensure
those points are printed in order. So there are two separate concepts we
need to consider when creating an iterator: ordering and grouping.
When an inner query has a different grouping than the outermost query,
we still need to group together related points into buckets, but we do
not have to ensure that all points from one buckets are output before
the points in another bucket. In fact, if we do that, we will be unable
to perform the grouping for the outer query correctly. Instead, we group
all points by the outermost query for an interval and then, within that
interval, we group the points for the inner query. For example, here are
series keys and times in seconds (fields are omitted since they don't
matter in this example):
cpu,host=server01 0
cpu,host=server01 10
cpu,host=server01 20
cpu,host=server01 30
cpu,host=server02 0
cpu,host=server02 10
cpu,host=server02 20
cpu,host=server02 30
With the following query:
SELECT mean(max) FROM (SELECT max(value) FROM cpu GROUP BY host, time(20s)) GROUP BY time(20s)
The final grouping keeps all of the points together which means we need
to group server01
with server02
. That means we output the points
from the underlying engine like this:
cpu,host=server01 0
cpu,host=server01 10
cpu,host=server02 0
cpu,host=server02 10
cpu,host=server01 20
cpu,host=server01 30
cpu,host=server02 20
cpu,host=server02 30
Within each one of those time buckets, we calculate the max()
value
for each unique host so the output stream gets transformed to look like
this:
cpu,host=server01 0
cpu,host=server02 0
cpu,host=server01 20
cpu,host=server02 20
Then we can process the mean()
on this stream of data instead and it
will be output in the correct order. This is true of any order of
grouping since grouping can only go from more specific to less specific.
When it comes to ordering, unordered data is faster to process, but we
always need to produce ordered data. When processing a raw query with no
aggregates, we need to ensure data coming from the engine is ordered so
the output is ordered. When we have an aggregate, we know one point is
being emitted for each interval and will always produce ordered output.
So for aggregates, we can take unordered data as the input and get
ordered output. Any ordered data as input will always result in ordered
data so we just need to look at how an iterator processes unordered
data.
ordered input | ordered output | ordered output | ordered output | ordered output |
unordered input | unordered output | unordered output | ordered output | ordered output |
Since we always need ordered output, we just need to work backwards and
determine which pattern of input gives us ordered output. If both
ordered and unordered input produce ordered output, we prefer unordered
input since it is faster.
There are also certain aggregates that require ordered input like
median()
and percentile()
. These functions will explicitly request
ordered input. It is also important to realize that selectors that are
grouped by time are the equivalent of an aggregator. It is only
selectors without a group by time that are different.