ez-streams is a simple but powerful streaming library for node.js.
The data that you push or pull may be anything: buffers and strings of course, but also simple values like numbers or Booleans, JavaScript objects, nulls, etc. Only one value has a special meaning: `undefined`. Reading `undefined` means that you have reached the end of a reader stream. Writing `undefined` signals that you want to end a writer stream.
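
To make the end-of-stream convention concrete, here is a minimal self-contained sketch in plain callback style. The `arrayReader` and `readAll` helpers are hypothetical stand-ins written for this illustration; they are not part of ez-streams and they call back synchronously for simplicity.

```javascript
// Hypothetical stand-in for an EZ reader: NOT the ez-streams implementation.
// read(cb) delivers the next value, then undefined once the data is exhausted.
function arrayReader(values) {
    var i = 0;
    return {
        read: function (cb) {
            cb(null, i < values.length ? values[i++] : undefined);
        }
    };
}

// Drain a reader: keep reading until undefined comes back.
function readAll(reader, cb) {
    var results = [];
    (function next() {
        reader.read(function (err, val) {
            if (err) return cb(err);
            if (val === undefined) return cb(null, results); // end of stream
            results.push(val); // null, 0, false and '' are ordinary values
            next();
        });
    })();
}

var collected;
readAll(arrayReader([0, null, false, '', { a: 1 }]), function (err, values) {
    if (err) throw err;
    collected = values;
});
```

Only `undefined` ends the loop, so falsy values such as `null` or `0` travel through the stream like any other entry.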
## Creating a stream
The `devices` modules let you get or create various kinds of EZ streams. For example:

```javascript
var ez = require('ez-streams');
var log = ez.devices.console.log;
var stdin = ez.devices.std.in('utf8');
var textRd = ez.devices.file.text.reader(path);
var binWr = ez.devices.file.binary.writer(path);
var stringRd = ez.devices.string.reader(text);
```
You can also wrap any node.js stream into an EZ stream, with the `node` device. For example:

```javascript
var reader = ez.devices.node.reader(fs.createReadStream(path));
var writer = ez.devices.node.writer(fs.createWriteStream(path));
```
The `ez.devices.http` and `ez.devices.net` modules give you wrappers for servers and clients in which the request and response objects are EZ readers and writers.
The `ez.devices.generic` module lets you create your own EZ streams. For example, here is how you would implement a reader that returns the numbers from 0 to n - 1:

```javascript
var numberReader = function(n) {
    var i = 0;
    return ez.devices.generic.reader(function read(_) {
        if (i < n) return i++;
        else return undefined;
    });
};
```
To define your own reader you just need to pass an asynchronous `read(_) {...}` function to `ez.devices.generic.reader`.

To define your own writer you just need to pass an asynchronous `write(_, val) {...}` function to `ez.devices.generic.writer`.
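
For readers who do not use streamline.js, the sketch below shows the shape of such `read` and `write` functions in plain callback style, where the `_` parameter becomes a Node-style callback. The `makeRead` and `makeWrite` factories are hypothetical names chosen for this example, and the functions are driven by hand rather than through `ez.devices.generic`, so nothing here is verified against the library itself.

```javascript
// Callback-style read function: yields 0 .. n-1, then undefined.
function makeRead(n) {
    var i = 0;
    return function read(cb) {
        cb(null, i < n ? i++ : undefined);
    };
}

// Callback-style write function: stores values until undefined ends the stream.
function makeWrite(sink) {
    var done = false;
    return function write(cb, val) {
        if (val === undefined) done = true;
        if (!done) sink.push(val);
        cb(null);
    };
}

// Drive the two functions by hand, without any library.
var sink = [];
var read = makeRead(3);
var write = makeWrite(sink);
(function pump() {
    read(function (err, val) {
        if (err) throw err;
        write(function (err) {
            if (err) throw err;
            if (val !== undefined) pump(); // stop after forwarding the end marker
        }, val);
    });
})();
```

After the pump has run, `sink` holds the three numbers produced by the read function.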
So, for example, here is how you can wrap mongodb APIs into EZ streams:

```javascript
var reader = function(cursor) {
    return ez.devices.generic.reader(function(_) {
        var obj = cursor.nextObject(_);
        return obj == null ? undefined : obj;
    });
};
var writer = function(collection) {
    var done;
    return ez.devices.generic.writer(function(_, val) {
        if (val === undefined) done = true;
        if (!done) collection.insert(val, _);
    });
};
```
## Array-like API
You can treat an EZ reader very much like a JavaScript array: you can filter it, map it, reduce it, etc. For example you can write:
```javascript
console.log("pi~=" + 4 * numberReader(10000).filter(function(_, n) {
    return n % 2;
}).map(function(_, n) {
    return n % 4 === 1 ? 1 / n : -1 / n;
}).reduce(_, function(_, res, val) {
    return res + val;
}, 0));
```
This will compute 4 * (1 - 1/3 + 1/5 - 1/7 ...).
For those not used to streamline.js, this chain can be rewritten with callbacks as:

```javascript
numberReader(10000).filter(function(cb, n) {
    cb(null, n % 2);
}).map(function(cb, n) {
    cb(null, n % 4 === 1 ? 1 / n : -1 / n);
}).reduce(function(err, result) {
    console.log("pi~=" + 4 * result);
}, function(cb, res, val) {
    cb(null, res + val);
}, 0);
```
Every step of the chain, except the last one, returns a new reader. The first reader produces all integers up to 9999. The second one, which is returned by the `filter` call, lets only the odd integers go through. The third one, returned by the `map` call, transforms the odd integers into alternating fractions. The `reduce` step at the end combines the alternating fractions to produce the final result.

Note that the `reduce` function takes a continuation callback as first parameter while the other functions don't. This is because the other functions (`filter`, `map`) return another reader immediately, while `reduce` pulls all the values from the stream and combines them to produce a result. So `reduce` can only produce its result once all the operations have completed, and it does so by returning its result through a continuation callback.
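
The difference between chainable steps and reducers can be illustrated with a small self-contained model. Everything in it (`makeReader`, the `pulled` counter, the local `numberReader`) is a hypothetical sketch written for this explanation, not the ez-streams implementation. It invokes callbacks synchronously, which keeps the example short but would overflow the stack on very long streams, so the demonstration uses only 100 entries.

```javascript
// Hypothetical miniature reader, written for illustration only.
function makeReader(read) {
    var reader = {
        read: read,
        // filter and map return a new reader at once; nothing is read yet.
        filter: function (fn) {
            return makeReader(function next(cb) {
                reader.read(function (err, val) {
                    if (err || val === undefined) return cb(err, val);
                    fn(function (err, keep) {
                        if (err) return cb(err);
                        if (keep) cb(null, val);
                        else next(cb); // skip this entry, read the following one
                    }, val);
                });
            });
        },
        map: function (fn) {
            return makeReader(function (cb) {
                reader.read(function (err, val) {
                    if (err || val === undefined) return cb(err, val);
                    fn(cb, val);
                });
            });
        },
        // reduce pulls every entry, so it needs a continuation callback.
        reduce: function (cb, fn, acc) {
            (function next() {
                reader.read(function (err, val) {
                    if (err) return cb(err);
                    if (val === undefined) return cb(null, acc);
                    fn(function (err, res) {
                        if (err) return cb(err);
                        acc = res;
                        next();
                    }, acc, val);
                });
            })();
        }
    };
    return reader;
}

var pulled = 0; // counts reads on the source
function numberReader(n) {
    var i = 0;
    return makeReader(function (cb) {
        pulled++;
        cb(null, i < n ? i++ : undefined);
    });
}

var chain = numberReader(100)
    .filter(function (cb, n) { cb(null, n % 2); })
    .map(function (cb, n) { cb(null, n % 4 === 1 ? 1 / n : -1 / n); });
var pulledBeforeReduce = pulled; // building the chain has read nothing yet

var piEstimate;
chain.reduce(function (err, sum) {
    if (err) throw err;
    piEstimate = 4 * sum;
}, function (cb, res, val) { cb(null, res + val); }, 0);
```

In this model the source is not touched until `reduce` starts pulling, which is why only the reducer needs a callback to report completion.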
The callbacks that you pass to `filter`, `map` and `reduce` are slightly different from the callbacks that you pass to normal array functions. They receive a continuation callback (`_`) as first parameter. This allows you to call asynchronous functions from these callbacks. We did not do it in the example above but this would be easy to do. For example we could slow down the computation by injecting a `setTimeout` call in the filter operation:
```javascript
console.log("pi~=" + 4 * numberReader(10000).filter(function(_, n) {
    setTimeout(_, 10);
    return n % 2;
})...
```

The delay is rather academic here, but in real life you often need to query databases or external services when filtering or mapping stream entries, so this capability is very useful.
The Array-like API also includes `every`, `some` and `forEach`. On the other hand, it includes neither `reduceRight` nor `sort`, as these functions are incompatible with streaming (they would need to buffer the entire stream).

The `forEach`, `every` and `some` functions are reducers and take a continuation callback, like `reduce` (see example further down).

Note: the `filter`, `every` and `some` methods can also be controlled by a mongodb filter condition rather than a function. The following are equivalent:
```javascript
reader = numberReader(1000).filter(function(_, n) {
    return n >= 10 && n < 20;
});
reader = numberReader(1000).filter({
    $gte: 10,
    $lt: 20,
});
```
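
The equivalence between the two forms can be sketched by converting a condition object into an ordinary predicate. The `conditionToPredicate` helper and its `operators` table are hypothetical and cover only a few comparison operators; the set of operators that ez-streams actually supports may differ.

```javascript
// Hypothetical converter; only the handful of operators below is handled,
// and the operator set really supported by ez-streams may differ.
var operators = {
    $gt: function (val, ref) { return val > ref; },
    $gte: function (val, ref) { return val >= ref; },
    $lt: function (val, ref) { return val < ref; },
    $lte: function (val, ref) { return val <= ref; },
    $ne: function (val, ref) { return val !== ref; }
};

function conditionToPredicate(condition) {
    var names = Object.keys(condition);
    names.forEach(function (name) {
        if (!Object.prototype.hasOwnProperty.call(operators, name)) {
            throw new Error('unsupported operator: ' + name);
        }
    });
    // All operators must hold, as with the && in the function form.
    return function (val) {
        return names.every(function (name) {
            return operators[name](val, condition[name]);
        });
    };
}

var inRange = conditionToPredicate({ $gte: 10, $lt: 20 });
var selected = [];
for (var n = 0; n < 1000; n++) {
    if (inRange(n)) selected.push(n);
}
```

Here `selected` ends up with the integers from 10 to 19, the same entries that the function form lets through.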
## Pipe
Readers have a `pipe` method that lets you pipe them into a writer:

```javascript
reader.pipe(_, writer)
```
For example we can output the odd numbers up to 100 to the console by piping the number reader to the console device:
```javascript
numberReader(100).filter(function(_, n) {
    return n % 2;
}).pipe(_, ez.devices.console.log);
```
Note that `pipe` is also a reducer. It takes a continuation callback. So you can schedule operations which will be executed after the pipe has been fully processed.

A major difference with standard node streams is that `pipe` operations only appear once in a chain, at the end, instead of being inserted between processing steps. The EZ `pipe` does not return a reader. Instead it returns (asynchronously) its writer argument, so that you can chain other operations on the writer itself. Here is a typical use:
```javascript
var result = numberReader(100).map(function(_, n) {
    return n + ' ';
}).pipe(_, ez.devices.string.writer()).toString();
```

In this example, the integers are mapped to strings which are written to an in-memory string writer. The string writer is returned by the `pipe` call and we obtain its contents by applying `toString()`.
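
The behavior described above (pump everything, then hand back the writer) can be modeled in a few lines of callback-style code. The `pipe`, `spacedNumberReader` and `stringWriter` functions below are hypothetical stand-ins for this sketch and make no claim about how ez-streams implements them.

```javascript
// Hypothetical model of pipe: pump the reader into the writer, forward the
// undefined end marker, then hand the writer back through the callback.
function pipe(cb, reader, writer) {
    (function next() {
        reader.read(function (err, val) {
            if (err) return cb(err);
            writer.write(function (err) {
                if (err) return cb(err);
                if (val === undefined) cb(null, writer); // fully processed
                else next();
            }, val);
        });
    })();
}

// Produces "0 ", "1 ", ... for n entries, then undefined.
function spacedNumberReader(n) {
    var i = 0;
    return {
        read: function (cb) {
            cb(null, i < n ? (i++) + ' ' : undefined);
        }
    };
}

// Accumulates everything written to it in a string.
function stringWriter() {
    var text = '';
    return {
        write: function (cb, val) {
            if (val !== undefined) text += val;
            cb(null);
        },
        toString: function () { return text; }
    };
}

var result;
pipe(function (err, writer) {
    if (err) throw err;
    result = writer.toString();
}, spacedNumberReader(5), stringWriter());
```

Because the callback receives the writer only after the end marker has been written, any code chained on it sees the complete output.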
## Infinite streams
You can easily create an infinite stream. For example, here is a reader stream that will return all numbers (*) in sequence:
```javascript
var infiniteReader = function() {
    var i = 0;
    return ez.devices.generic.reader(function read(_) {
        return i++;
    });
};
```

(*): not quite, as `i++` will stop moving when `i` reaches 2**53.
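
The footnote follows from the way JavaScript numbers are stored as double-precision floats. A short check, using only standard language features:

```javascript
var limit = Math.pow(2, 53); // 9007199254740992
var i = limit;
i++; // 2^53 + 1 is not representable as a double, so i keeps its value
var stuck = (i === limit);
var lastSafe = Number.MAX_SAFE_INTEGER; // 2^53 - 1, the last safely incrementable integer
```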
EZ streams have methods like `skip`, `limit`, `until` and `while` that let you control how many entries you will read, even if the stream is potentially infinite. Here are two examples:

```javascript
infiniteReader().skip(20).limit(100).pipe(_, ez.devices.console.log);
infiniteReader().until(function(_, n) {
    return n * n > 1000;
}).pipe(_, ez.devices.console.log);
```

Note: `while` and `until` conditions can also be expressed as mongodb conditions.
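
To see why these methods are safe on an infinite source, consider the simplified synchronous model below. The `wrap` helper and its `toArray` method are hypothetical, and the choice to drop the entry that satisfies the `until` condition is an assumption of this sketch rather than documented library behavior.

```javascript
// Hypothetical synchronous model: read() returns the next value directly.
function wrap(read) {
    var reader = {
        read: read,
        skip: function (count) {
            return wrap(function () {
                for (; count > 0; count--) reader.read(); // discard entries
                return reader.read();
            });
        },
        limit: function (count) {
            return wrap(function () {
                // Once the quota is used up the source is never read again.
                return count-- > 0 ? reader.read() : undefined;
            });
        },
        until: function (test) {
            var done = false;
            return wrap(function () {
                if (done) return undefined;
                var val = reader.read();
                if (val === undefined || test(val)) {
                    done = true;
                    return undefined;
                }
                return val;
            });
        },
        toArray: function () {
            var out = [], val;
            while ((val = reader.read()) !== undefined) out.push(val);
            return out;
        }
    };
    return reader;
}

function infiniteReader() {
    var i = 0;
    return wrap(function () { return i++; });
}

var slice = infiniteReader().skip(20).limit(5).toArray();
var small = infiniteReader().until(function (n) {
    return n * n > 1000;
}).toArray();
```

Both chains terminate because each wrapper decides on its own when to report the end of the stream, regardless of how much data the source could still produce.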
## Transformations
The array functions are nice but they have limited power. They work well to process stream entries independently from each other but they don't allow us to do more complex operations like combining several entries into a bigger one, or splitting one entry into several smaller ones, or a mix of both. This is something we typically do when we parse text streams: we receive chunks of text, we look for special boundaries and we emit the items that we have isolated between boundaries. Usually, there is no one-to-one correspondence between the chunks that we receive and the items that we emit.
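
The mismatch between incoming chunks and emitted items is easy to see with a line splitter. The `makeLineSplitter` helper below is a hypothetical, library-independent sketch: it buffers the text that follows the last boundary and emits only complete lines.

```javascript
// Hypothetical line splitter: chunks in, complete lines out.
function makeLineSplitter() {
    var remainder = ''; // text after the last boundary seen so far
    return {
        // Accepts one chunk and returns the lines it completes.
        push: function (chunk) {
            var parts = (remainder + chunk).split('\n');
            remainder = parts.pop(); // incomplete until more data arrives
            return parts;
        },
        // Call at end of stream to emit the trailing text, if any.
        flush: function () {
            var last = remainder;
            remainder = '';
            return last.length > 0 ? [last] : [];
        }
    };
}

var splitter = makeLineSplitter();
var lines = [];
['name,a', 'ge\njohn,35\nmary,4', '1\nbob,27'].forEach(function (chunk) {
    lines = lines.concat(splitter.push(chunk));
});
lines = lines.concat(splitter.flush());
```

Three chunks go in and four lines come out, and no line boundary coincides with a chunk boundary.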
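The `transform` function is designed to handle these more complex operations. Typical code looks like:

```javascript
stream.transform(function(_, reader, writer) {
    // read items with reader.read(_)
    // combine or split them as needed
    // write the results with writer.write(_, result)
}).filter(...).map(...).reduce(...);
```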
You have complete freedom to organize your read and write calls: you can read several items, combine them and write only one result, you can read one item, split it and write several results, you can drop data that you don't want to transfer, or inject additional data with extra writes, etc.
Also, you are not limited to reading with the `read(_)` call; you can use any API available on a reader, even another transform. For example, here is how you can implement a simple CSV parser:

```javascript
var csvParser = function(_, reader, writer) {
    var linesParser = ez.transforms.lines.parser();
    reader = reader.transform(linesParser);
    var keys = reader.read(_).split(',');
    reader.forEach(_, function(_, line) {
        if (line.length === 0) return;
        var values = line.split(',');
        var obj = {};
        keys.forEach(function(key, i) {
            obj[key] = values[i];
        });
        writer.write(_, obj);
    });
};
```
You can then use this transform as:
```javascript
ez.devices.file.text.reader('mydata.csv').transform(csvParser)
    .pipe(_, ez.devices.console.log);
```

Note that the transform is written with a `forEach` call which loops through all the items read from the input chain. This may seem incompatible with streaming but it is not. This loop advances by executing asynchronous `reader.read(_)` and `writer.write(_, obj)` calls. So it yields to the event loop and gives it a chance to wake up other pending calls at other steps of the chain. So, even though the code may look like a tight loop, it is not. It gets processed one piece at a time, interleaved with other steps in the chain.
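For example, you can read from a CSV file, filter its entries and write the output to a JSON file by chaining `ez.transforms.csv.parser()`, a `filter` call and `ez.transforms.json.formatter()`; the Exception handling section shows this chain in full.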
The transforms library is rather embryonic at this stage but you can expect it to grow.
## Interoperability with native node.js streams
`ez-streams` are fully interoperable with native node.js streams.

You can convert a node.js stream to an ez stream:

```javascript
var reader = ez.devices.node.reader(stream);
var writer = ez.devices.node.writer(stream);
```

You can also convert in the reverse direction, from an ez stream to a node.js stream:

```javascript
var stream = reader.nodify();
var stream = writer.nodify();
```

And you can transform an ez stream with a node duplex stream:

```javascript
reader = reader.nodeTransform(duplexStream)
```
This part of the API is still fairly experimental and may change a bit.
## Exception handling
Exceptions are propagated through the chains and you can trap them in the reducer which pulls the items from the chain. If you write your code with streamline.js, you will naturally use try/catch:
```javascript
try {
    ez.devices.file.text.reader('users.csv').transform(ez.transforms.csv.parser())
        .filter(function(_, item) {
            return item.gender === 'F';
        }).transform(ez.transforms.json.formatter({ space: '\t' }))
        .pipe(_, ez.devices.file.text.writer('females.json'));
} catch (ex) {
    logger.write(_, ex);
}
```

If you write your code with callbacks, you will receive the exception as first parameter in your continuation callback:

```javascript
ez.devices.file.text.reader('users.csv').transform(ez.transforms.csv.parser())
    .filter(function(cb, item) {
        cb(null, item.gender === 'F');
    }).transform(ez.transforms.json.formatter({ space: '\t' })).pipe(function(err) {
        if (err) logger.write(function(e) {}, err);
    }, ez.devices.file.text.writer('females.json'));
```
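
The propagation itself can be modeled without the library. In the hypothetical sketch below, `arrayReader`, `mapReader` and `forEach` are illustrative helpers: an exception thrown inside a mapping step is forwarded down the chain and surfaces in the callback of the reducer that pulls the items.

```javascript
// Hypothetical model: each step forwards errors to whoever pulls from it.
function arrayReader(values) {
    var i = 0;
    return {
        read: function (cb) {
            cb(null, i < values.length ? values[i++] : undefined);
        }
    };
}

function mapReader(source, fn) {
    return {
        read: function (cb) {
            source.read(function (err, val) {
                if (err || val === undefined) return cb(err, val);
                var mapped;
                try {
                    mapped = fn(val);
                } catch (ex) {
                    return cb(ex); // the exception travels down the chain
                }
                cb(null, mapped);
            });
        }
    };
}

// Reducer: pulls items until the end of the stream or the first error.
function forEach(reader, onItem, cb) {
    (function next() {
        reader.read(function (err, val) {
            if (err) return cb(err);
            if (val === undefined) return cb(null);
            onItem(val);
            next();
        });
    })();
}

var lines = ['{"name":"ann"}', 'not json', '{"name":"bob"}'];
var received = [];
var caught = null;
forEach(mapReader(arrayReader(lines), function (line) {
    return JSON.parse(line);
}), function (item) {
    received.push(item.name);
}, function (err) {
    caught = err;
});
```

The first entry is delivered, the malformed second entry raises a `SyntaxError`, and the reducer callback receives that error instead of the remaining items.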
## Stopping a stream
Streams are not always consumed in full. If a consumer stops reading before it has reached the end of a stream, it must inform the stream that it won't read any further so that the stream can release its resources. This is achieved by propagating a `stop` notification upwards, to the source of the stream. Streams that wrap node streams will release their event listeners when they receive this notification.

The stop API is a simple `stop` method on readers:

```javascript
reader.stop(arg);
```
Stopping becomes a bit tricky when a stream has been forked or teed. The stop API provides 3 options to stop a branch:

- Stopping only the current branch: the notification will be propagated to the fork but not further upwards, unless the other branches have also been stopped. This is the default when `arg` is falsy or omitted.
- Stopping the current branch and closing the other branches silently. This is achieved by passing `true` as `arg`. The consumers of the other branches will receive the `undefined` end-of-stream marker when reading further.
- Stopping the current branch and closing the other branches with an error. This is achieved by passing an error object as `arg`. The consumers of the other branches will get this error when reading further.
Note: In the second and third case values which had been buffered in the other branches before the stop call will still be delivered, before the end-of-stream marker or the error. So they may not stop immediately.
Operations like `limit`, `while` or `until` send a `stop` notification upwards.
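
As a rough illustration of this upward notification, the sketch below models a `limit` step that stops its source once its quota is exhausted. The `resourceReader` and `limit` functions are hypothetical, and the exact moment at which the real library sends the notification is an assumption here.

```javascript
// Hypothetical source that records when it is asked to release resources.
function resourceReader(values, log) {
    var i = 0;
    return {
        read: function (cb) {
            cb(null, i < values.length ? values[i++] : undefined);
        },
        stop: function (arg) {
            log.push('source released');
        }
    };
}

// Hypothetical limit step: forwards at most `count` entries, then stops
// the source. Assumption: the stop is sent on the first read past the quota.
function limit(source, count) {
    var stopped = false;
    function stopOnce(arg) {
        if (!stopped) {
            stopped = true;
            source.stop(arg);
        }
    }
    return {
        read: function (cb) {
            if (count <= 0) {
                stopOnce();
                return cb(null, undefined);
            }
            count--;
            source.read(cb);
        },
        stop: stopOnce
    };
}

var log = [];
var limited = limit(resourceReader(['a', 'b', 'c', 'd'], log), 2);
var received = [];
(function next() {
    limited.read(function (err, val) {
        if (err) throw err;
        if (val === undefined) return;
        received.push(val);
        next();
    });
})();
limited.stop(); // a second stop is ignored
```

The source still holds unread entries when it is released, which is exactly the situation the stop notification exists for.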
A writer may also decide to stop its stream processing chain. If its `write` method throws an exception, the current branch will be stopped and the exception will be propagated to other branches. A writer may also stop the chain silently by throwing a `new StopException(arg)` where `arg` is the falsy or `true` value which will be propagated towards the source of the chain.

Note: writers also have a `stop` method but this method is only used internally to propagate exceptions in a `tee` or `fork`.
## Writer chaining
You can also chain operations on writers via a special `pre` property. For example:

```javascript
var rawWriter = ez.devices.file.binary.writer("data.gzip");
var zipWriter = rawWriter.pre.nodeTransform(zlib.createGzip());
```

All the chainable operations available on readers (`map`, `filter`, `transform`, `nodeTransform`, ...) can also be applied to writers through this `pre` property.

Note: the `pre` property was introduced to stress the fact that the operation is applied before writing to the original writer, even though it appears after in the chain.
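
The "applied before writing" idea can be sketched with a writer-side map. The `preMap` function below is a hypothetical model of what an operation behind `pre` might do; it is not the library's implementation.

```javascript
// Hypothetical model of a writer-side map: transform first, then write.
function preMap(writer, fn) {
    return {
        write: function (cb, val) {
            // The end marker is passed through untouched.
            if (val === undefined) return writer.write(cb, val);
            fn(function (err, mapped) {
                if (err) return cb(err);
                writer.write(cb, mapped);
            }, val);
        }
    };
}

var stored = [];
var rawWriter = {
    write: function (cb, val) {
        if (val !== undefined) stored.push(val);
        cb(null);
    }
};
var upperWriter = preMap(rawWriter, function (cb, val) {
    cb(null, val.toUpperCase());
});

function done(err) {
    if (err) throw err;
}
upperWriter.write(done, 'a');
upperWriter.write(done, 'b');
upperWriter.write(done, undefined);
```

Values written to `upperWriter` reach `rawWriter` already transformed, even though the mapping was attached after `rawWriter` was created.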