Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

batchflow

Package Overview
Dependencies
Maintainers
1
Versions
10
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

batchflow

Batch process collections in parallel or sequentially.

  • 0.4.0
  • latest
  • npm
  • Socket score

Version published
Maintainers
1
Created
Source

Node.js - batchflow

build status

Batch process collections in parallel or sequentially.

Why?

I really got tired of writing the following patterns over and over again:

Sequential:

var files = [... list of files ...];
function again(x) {
	if (x < files.length) {
		fs.readFile(files[x], function(err, data) {
			//... do something with data ...
			again(x + 1);
		});
	} else {
		console.log('Done.');
	}
}

again(0);

or..

Parallel:

var files = [... list of files ...];
var pending = 0;
files.forEach(function(file, i) {
	pending += 1;
	fs.readFile(file, function(err, data) {
		//... do something with data ....
		
		pending -= 1;
		if (pending === 0 && i === files.length -1) {
			console.log('Done.');
		}
	});
});

That's ugly. For more complicated examples it requires a bit more thinking.

Why don't I use the wonderful library async? Well, async tries to do way too much. I also suffer from a severe case of NIH syndrome. Kidding, or else I'd rewrite Express.js. Or, am I? Muahahhaa. async syntax is also very ugly and not CoffeeScript friendly.

Installation

npm install batchflow

Examples

50 Foot Overview

Simple Sequential Example:

var a = [
        function(finished) { setTimeout(function(){finished(1)}, 1); }, //executes in 1 ms
        function(finished) { setTimeout(function(){finished(2)}, 20); }, //executes in 20 ms
        function(finished) { setTimeout(function(){finished(3)}, 2); } //executes in 2 ms
    ];

//sequential
batch(a).sequential()
.each(function(i, item, done) {
  item(done);
}).end(function(results) {
  for (var i = 0; i < results.length; ++i) {
    console.log(results[i]);
  }
});

/*
  1
  2
  3
*/

Simple Parallel Example:

//parallel
batch(a).parallel()
.each(function(i, item, done) {
  item(done);
}).end(function(results) {
  for (var i = 0; i < results.length; ++i) {
    console.log(results[i]);
  }
});

/*
  1
  3
  2
*/

Arrays

Let's rewrite the previous file patterns mentioned in Why? into a sequential example:

Sequential:

var batch = require('batchflow');

var files = [... list of files ...];
batch(files).sequential()
.each(function(i, item, next) {
	fs.readFile(item, function(err, data) {
		//do something with data
		next(someResult);
	});
}).end(function(results) {
	//analyze results
});

How about the parallel example?

Parallel:

var batch = require('batchflow');

var files = [... list of files ...];
batch(files).parallel()
.each(function(i, item, done) {
   fs.readFile(item, function(err, data) {
   	//do something with data
   	done(someResult); //<---- yes, you must still call done() in parallel, this way we can know when to trigger `end()`.
   });
}).end(function(results) {
   //analyze results
});

What's that, your data is not stored in an array? Oh, you say it's stored in an object? That's OK too...

Objects

Sequential:

var batch = require('batchflow');

var files = {'file1': 'path'.... 'filen': 'pathn'}
batch(files).sequential()
.each(function(key, val, next) {
	fs.readFile(val, function(err, data) {
		//do something with data
		next(someResult);
	});
}).end(function(results) {
	//analyze results
});

How about the parallel example?

Parallel:

var batch = require('batchflow');

var files = {'file1': 'path'.... 'filen': 'pathn'}
batch(files).parallel()
.each(function(key, val, done) {
   fs.readFile(val, function(err, data) {
   	//do something with data
   	done(someResult);
   });
}).end(function(results) {
   //analyze results
});

Misc

  1. Is sequential() or parallel() too long? Fine. series() and seq() are aliases for sequential(). par() is an alias for parallel().
  2. You don't like the fluent API? That's OK too:

Non-fluent API BatchFlow

var batch = require('batchflow');
var bf = batch(files);
bf.sequential()

bf.each(function(i, file, next) {
	next(someResult);
});
 
bf.end(function(results) {
	//blah blah
});

CoffeeScript Friendly

batch = require('batchflow')
files = [... list of files ...]
bf = batch(files).seq().each (i, file, done) ->
  fs.readFile file, done
bf.error (err) ->
  console.log(err);
bf.end (results) ->
  console.log fr.toString() for fr in results

Error Handling

What's that, you want error handling? Well, you might as well call me Burger King... have it your way.

Note that before version 0.3, it would exit prematurely if an error happened. This was a boneheaded design decision. After 0.3, it'll keep happily processing even if an error occured.

Catch an error in the callback parameter...

var a = {'f': '/tmp/file_DOES_NOT_exist_hopefully' + Math.random()};
batch(a).parallel().each(function(i, item, done) {
  fs.readFile(item, done);
}).error(function(err) {
  console.error(err);
}).end(function(fileData) {
  //do something with file data
});

Catch an error in the function...

var a = ['/tmp/file_DOES_NOT_exist_hopefully' + Math.random()];
batch(a).series().each(function(i, item, done) {
  throw new Error('err');
}).error(function(err) {
  console.error(err);
}).end(function() {
  //do something
});

Limits

You can set a limit to how many items can be processed in parallel. In fact, sequential mode is the same as having the limit set to 1 and calling parallel. In other words: batch(myArray).sequential() .... is the same as batch(myArray).parallel(1).

To set the limit, just pass the limit as a parameter to parallel(). The default is 2^53 which is the max integer size in JavaScript.

Example:

batch(myArray).parallel(8)
.each(function(i, item, done){
  // ... code here ... 
}).end(function(results){
  // ... code here ...
})

Difference between done() and next()

So you noticed that in all of the examples where I was calling sequential() the third parameter is named next and in the examples where I was calling parallel() the third parameter is named done? This is really just a matter of convention. It could be named fruitypebbles. But in sequential processing, it makes sense for it to be next because you want to process the next one. However, in parallel processing, you want to alert the system that the callback is done.

Sequential...

batch(myArray).sequential()
.each(function(i, item, next) {
  // ... code here ...
}).end();

Parallel...

batch(myArray).parallel()
.each(function(i, item, done) {
  // ... code here ...
}).end();

Progress

You can keep track of progress by accessing the finished field.

Compute percentage by this formula: (this.finished / this.total) * 100.0.

Example:

var myar = {w: 'hi', b: 'hello', c: 'hola', z: 'gutentag'}
batch(myar).sequential()
.each(function(i, item, next) {
  console.log(this.finished) //the number finished.
  console.log(this.total) //4
  console.log((this.finished / this.total) * 100.0) //{percent complete}
})
.end();

Author

node-batchflow was written by JP Richardson. You should follow him on Twitter @jprichardson. Also read his coding blog Procbits. If you write software with others, you should checkout Gitpilot to make collaboration with Git simple.

License

(MIT License)

Copyright 2012, JP jprichardson@gmail.com

Keywords

FAQs

Package last updated on 24 Apr 2013

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc