npm install @apla/clickhouse
Synopsis
var ch = new ClickHouse ({host: clickhouse.host, port: 8123, auth: "user:password"});
var ch = new ClickHouse (clickhouse.host);
ch.query ("CREATE DATABASE clickhouse_test", function (err, data) {
});
ch.querying ("CREATE DATABASE clickhouse_test").then (…);
var stream = ch.query ("SELECT 1");
var rows = [];
stream.on ('metadata', function (columns) {
});
stream.on ('data', function (row) {
rows.push (row);
});
stream.on ('error', function (err) {
});
stream.on ('end', function () {
assert (rows.length === stream.supplemental.rows);
console.log ('rows in result set', stream.supplemental.rows_before_limit_at_least);
});
var tsvStream = fs.createReadStream ('data.tsv');
var clickhouseStream = clickHouse.query (statement, {inputFormat: 'TSV'});
tsvStream.pipe (clickhouseStream);
var clickhouseStream = clickHouse.query (statement, {inputFormat: 'TSV'}, function (err) {
console.log ('Insert complete!');
});
clickhouseStream.write ([1, 2.22, "erbgwerg", new Date ()]);
clickhouseStream.write ("1\t2.22\terbgwerg\t2017-07-17 17:17:17");
clickhouse.end ();
API
new ClickHouse (options)
var options = {
host: "clickhouse.msk",
queryOptions: {
profile: "web",
database: "test"
},
omitFormat: false
};
var clickHouse = new ClickHouse (options);
If you provide options as a string, they are assumed as a host parameter in connection options
Connection options (accept all options documented
for http.request):
- auth: authentication as
user:password
, optional - host: host to connect, can contain port name
- pathname: pathname of ClickHouse server or
/
if omited, - port: port number,
- protocol: "https:" or "http:", default "http:".
queryOptions
object can contain any option from Settings (docs:
en
ru
)
For example:
- database: default database name to lookup tables etc.
- profile: settings profile to use
- readonly: don't allow to change data
- max_rows_to_read: self explanatory
Driver options:
- dataObjects: use
FORMAT JSON
instead of FORMAT JSONCompact
for output.
By default (false), you'll receive array of values for each row. If you set dataObjects
to true, every row will become an object with format: {fieldName: fieldValue, …}
- format: this is format for data loading with
INSERT
statements. - syncParser: collect data, then parse entire response. Should be faster, but for
large datasets all your dataset goes into memory (actually, entire response + entire dataset).
Default:
false
- omitFormat:
FORMAT JSONCompact
will be added by default to every query
which returns dataset. Currently SELECT|SHOW|DESC|DESCRIBE|EXISTS\s+TABLE
.
You can change this behaviour by providing this option. In this case you should
add FORMAT JSONCompact
by yourself. Should be detected automatically. Default false
;
var stream = clickHouse.query (statement, [options], [callback])
Query sends a statement to a server
Stream is a regular nodejs object stream, it can be piped to process records.
Stream events:
- metadata: when a column information is parsed,
- data: when a row is available,
- error: something is wrong,
- end: when entire response is processed
After response is processed, you can read a supplemental response data, such as
row count via stream.supplemental
.
Options are the same for query
and constructor
excluding connection.
Callback is optional and will be called upon completion with
a standard node (error, result)
signature.
You should have at least one error handler listening. Via callbacks or via stream errors.
If you have callback and stream listener, you'll have error notification in both listeners.
Promise interface
Promise interface is not recommended for INSERT
and SELECT
queries.
INSERT
cannot bulk load data with promise interfaceSELECT
will collect entire query result in the memory
With promise interface query result are parsed snchronously.
This means that large query result in promise interface:
- Will snchronously block JS thread/event loop
- May lead to memory leaks in your app
Use it only for queries where resulting data size is is known and extreemly small.
The good cases to use it is DESCRIBE TABLE
or EXISTS TABLE
clickHouse.querying (statement, [options]).then (…)
Return promise
, that will be resolved with entire query result.
This is an alias to ch.query(query, {syncParser: true}, (error, data) => {})
Usage:
ch.querying ("SELECT 1").then((result) => console.log(result.data))
ch.querying ("DESCRIBE TABLE system.numbers", {dataObjects: true}).then((result) => console.log(result.data))
clickHouse.ping (function (err, response) {})
Sends an empty query and check if it "Ok.\n"
clickHouse.pinging ().then (…)
Promise interface for ping
Notes
Bulk data loading with INSERT statements
INSERT
can be used for bulk data loading. There is a 2 formats easily implementable
with javascript: CSV and TabSeparated/TSV.
CSV is useful for loading from file, thus you can read and pipe into clickhouse
file contents. To activate CSV parsing you should set inputFormat
option to CSV
for driver or query (BEWARE: not works as expected, use TSV):
var csvStream = fs.createReadStream ('data.csv');
var clickhouseStream = ch.query (statement, {inputFormat: CSV});
csvStream.pipe (clickhouseStream);
TSV is useful for loading from file and bulk loading from external sources, such as other databases.
Only \\
, \t
and \n
need to be escaped in strings; numbers, nulls,
bools and date objects need some minor processing. You can send prepared TSV data strings
(line ending will be appended automatically), buffers (always passed as is) or Arrays with fields.
Internally, every field will be converted to the format which ClickHouse can accept.
Then escaped and joined with delimiter for the particular format.
If you ever need to store rows (in arrays) and send preformatted data, you can do it.
ClickHouse also supports JSONEachRow format
which can be useful to insert javascript objects if you have such recordset.
const stream = ch.query (statement, {format: 'JSONEachRow'})
stream.write (object)
stream.end ()
Memory size
You can read all the records into memory in single call like this:
var ch = new ClickHouse ({host: host, port: port});
ch.query ("SELECT number FROM system.numbers LIMIT 10", {syncParser: true}, function (err, result) {
});
In this case whole JSON response from the server will be read into memory,
then parsed into memory hogging your CPU. Default parser will parse server response
line by line and emits events. This is slower, but much more memory and CPU efficient
for larger datasets.