Columnq
Simple CLI to help you query tabular data with support for a rich set of
growing formats and data sources.
It supports JSON, CSV, Parquet, Arrow and all other formats that are supported
by ROAPI, which is documented at
here.
It also supports querying datasets from remote locations like S3 and HTTPs, see
ROAPI's blob store
documentation for more
info.
Installation
Pre-built binary
The pre-built binaries hosted on GitHub releases.
These binaries are self-contained so you can just drop them into your PATH.
The same set of binaries are also distributed through PyPI:
pip install columnq-cli
Build from source
cargo install --locked --git https://github.com/roapi/roapi --branch main --bins columnq-cli
Usage
One off query
The sql
subcommand executes a provided SQL query against specificed static
dataset and returns the result in stdout on exit. This is usually useful for
script automation tasks.
$ columnq sql --table test_data/spacex_launches.json \
"SELECT COUNT(id), DATE_TRUNC('year', CAST(date_utc AS TIMESTAMP)) as d FROM spacex_launches WHERE success = true GROUP BY d ORDER BY d DESC"
+-----------+---------------------+
| COUNT(id) | d |
+-----------+---------------------+
| 4 | 2021-01-01 00:00:00 |
| 26 | 2020-01-01 00:00:00 |
| 13 | 2019-01-01 00:00:00 |
| 21 | 2018-01-01 00:00:00 |
| 18 | 2017-01-01 00:00:00 |
| 8 | 2016-01-01 00:00:00 |
| 6 | 2015-01-01 00:00:00 |
| 6 | 2014-01-01 00:00:00 |
| 3 | 2013-01-01 00:00:00 |
| 2 | 2012-01-01 00:00:00 |
| 2 | 2010-01-01 00:00:00 |
| 1 | 2009-01-01 00:00:00 |
| 1 | 2008-01-01 00:00:00 |
+-----------+---------------------+
By default, the sql
subcommand outputs results in human friendly table
format. You can change the output format using --output
option to make it
more friendly for automations.
$ columnq sql --table test_data/spacex_launches.json --output json "SELECT COUNT(id) AS total_launches FROM spacex_launches"
[{"total_launches":132}]
Automate with UNIX pipes
Just like other UNIX tools, columnq supports consuming data stream from stdin
to integrate with other CLI tools using UNIX pipe:
find . -printf "%M|%n|%u|%s|%P\n" | columnq sql \
--table 't=stdin,format=csv,has_header=false,delimiter=|' \
"SELECT SUM(column_4) as total_size FROM t"
+------------+
| total_size |
+------------+
| 9875017987 |
+------------+
Format conversion
The Columnq CLI can also be used as a handy utility to convert tabular data
between various formats: json
, parquet
, csv
, yaml
, arrow
, etc.
$ columnq sql --table 't=test_data/uk_cities_with_headers.csv' 'SELECT * FROM t' --output json
$ cat test_data/blogs.parquet | columnq sql --table 't=stdin,format=parquet' 'SELECT * FROM t' --output json
Interactive console
For dataset exploration, you can use the console
subcommand to query multiple
datasets in an interactive console environment:
$ columnq console \
--table "uk_cities=test_data/uk_cities_with_headers.csv" \
--table "test_data/spacex_launches.json"
columnq(sql)> SELECT * FROM uk_cities WHERE lat > 57;
+-----------------------------+-----------+-----------+
| city | lat | lng |
+-----------------------------+-----------+-----------+
| Elgin, Scotland, the UK | 57.653484 | -3.335724 |
| Aberdeen, Aberdeen City, UK | 57.149651 | -2.099075 |
| Inverness, the UK | 57.477772 | -4.224721 |
+-----------------------------+-----------+-----------+
columnq(sql)> SELECT COUNT(*) FROM spacex_launches WHERE success=true AND upcoming=false;
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 111 |
+-----------------+
Explore in memory catalog and table schemas:
columnq(sql)> SHOW TABLES;
+---------------+--------------------+-----------------+------------+
| table_catalog | table_schema | table_name | table_type |
+---------------+--------------------+-----------------+------------+
| datafusion | public | uk_cities | BASE TABLE |
| datafusion | public | spacex_launches | BASE TABLE |
| datafusion | information_schema | tables | VIEW |
| datafusion | information_schema | columns | VIEW |
+---------------+--------------------+-----------------+------------+
columnq(sql)> SHOW COLUMNS FROM uk_cities;
+---------------+--------------+------------+-------------+-----------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
+---------------+--------------+------------+-------------+-----------+-------------+
| datafusion | public | uk_cities | city | Utf8 | NO |
| datafusion | public | uk_cities | lat | Float64 | NO |
| datafusion | public | uk_cities | lng | Float64 | NO |
+---------------+--------------+------------+-------------+-----------+-------------+
Development
Debug mode
Set the RUST_LOG
environment variable to info,columnq=debug
to run columnq
in verbose debug logging.