A standard/convention for running tasks over a list of files based around Node core streams2.
Compatible with Node 0.8.x as well thanks to readable-stream by isaacs.
grunt is the most popular JavaScript task runner, but I mostly prefer using makefiles since they require less ceremony.
However, sometimes you want to express something as an operation applied to a list of files, while keeping the ability to plug in more tasks via unix pipes and custom functions. That's what this is, a simple convention for working on a list of files using constructs from the Node core.
minitask is not a makefile replacement; it is a convention for writing things that apply a bunch of pipes to a list of files.
minitask doesn't even define any new APIs (unlike, say, node-task, which is destined to become grunt's next set of internals and which seems to implement its own (!) synchronous (!!) version of core streams). In minitask, everything is just based on using Node core streams in a specific way and structuring your code into reusable tasks. The minitask repo is a bunch of functions that support those conventions.
Each minitask starts with a list of files, which is simply an object that looks like this:
{
  files: [
    { name: '/full/path/to/file.js' }
  ]
}
The minitask core API has a file iterator that can build these lists for consumption, given path specifications as inputs.
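As a rough illustration of how such a list could be produced, here is a minimal sketch that walks a directory and returns the structure shown above. The `buildList` helper is hypothetical and is not the minitask iterator API, which is not documented in this section.

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical helper (not the minitask API): walk a directory tree
// and return a list object in the { files: [ { name: ... } ] } shape.
function buildList(root) {
  var list = { files: [] };
  (function walk(dir) {
    // sort entries so the resulting order is predictable
    fs.readdirSync(dir).sort().forEach(function(entry) {
      var full = path.join(dir, entry);
      if (fs.statSync(full).isDirectory()) {
        walk(full);
      } else {
        list.files.push({ name: full });
      }
    });
  })(path.resolve(root));
  return list;
}

// Usage (assumes ./lib exists):
// console.log(buildList('./lib'));
```

Note that this sketch does no filtering at all while walking; that is left to list tasks.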
This array of files is then filtered and annotated using list tasks, which are functions. For example, filter-git.js:
// filter-git-directories: a list task that filters out .git directories from the list
module.exports = function(list) {
  list.files = list.files.filter(function(item) {
    // use a regex literal so that the dot is escaped and matches only a literal "."
    return !/\/\.git\//.test(item.name);
  });
};
List tasks are basically any tasks that include / exclude or otherwise work on metadata.
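To show how a list task is used, here is a small sketch that applies a .git filter to a hand-written list. The `applyListTasks` helper and the sample paths are assumptions made for illustration; they are not part of minitask.

```javascript
// A list task in the style of filter-git.js, inlined to keep the sketch self-contained
function filterGit(list) {
  list.files = list.files.filter(function(item) {
    return !/\/\.git\//.test(item.name);
  });
}

// Hypothetical helper: run list tasks in order against the same list object
function applyListTasks(list, tasks) {
  tasks.forEach(function(task) { task(list); });
  return list;
}

var list = {
  files: [
    { name: '/project/index.js' },
    { name: '/project/.git/config' },
    { name: '/project/lib/util.js' }
  ]
};

applyListTasks(list, [filterGit]);
console.log(list.files.map(function(item) { return item.name; }));
```

Because each list task mutates the same object, the order of the tasks array is the order in which filters and annotations take effect.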
To add metadata, you should add properties either to each file, or to the list object itself. For example, annotate-stat.js:
var fs = require('fs');
// This task adds a .stat property to every file in the list
module.exports = function(list) {
  list.files.forEach(function(item, i) {
    list.files[i].stat = fs.statSync(item.name);
  });
};
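Once metadata is attached, later list tasks can rely on it. The sketch below chains an annotation task with two hypothetical tasks (`filterEmpty` and `annotateTotalSize`, both invented for this example): one filters on the per-file `.stat` property, the other stores a property on the list object itself. The temporary files exist only to make the example runnable.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Annotation task, as in annotate-stat.js
function annotateStat(list) {
  list.files.forEach(function(item) {
    item.stat = fs.statSync(item.name);
  });
}

// Hypothetical list task: drop empty files, relying on the .stat annotation
function filterEmpty(list) {
  list.files = list.files.filter(function(item) {
    return item.stat.size > 0;
  });
}

// Hypothetical list task: store a total on the list object itself
function annotateTotalSize(list) {
  list.totalSize = list.files.reduce(function(sum, item) {
    return sum + item.stat.size;
  }, 0);
}

// Demo data: two throwaway files in a temporary directory
var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'minitask-stat-'));
fs.writeFileSync(path.join(dir, 'full.js'), 'var a = 1;');
fs.writeFileSync(path.join(dir, 'empty.js'), '');

var annotated = {
  files: [
    { name: path.join(dir, 'full.js') },
    { name: path.join(dir, 'empty.js') }
  ]
};
[annotateStat, filterEmpty, annotateTotalSize].forEach(function(task) {
  task(annotated);
});
console.log(annotated.files.length, annotated.totalSize);
```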
The key benefit of separating tasks such as filtering and annotating metadata into a step that occurs after the list of files is created is that it makes those tasks easier to reuse and test. Previously, I would perform filtering at the same time as I was reading in the file tree. The problem with doing both filtering and file tree iteration is that you end up with some unchangeable filtering logic that's embedded inside your file iterator.
Having your filtering and annotation embedded in the file iterator gets really annoying in some cases: for example, for gluejs there are multiple filtering rules: package.json files, .npmignore files and user-specified rules. Those were applied in various separate components that basically excluded some paths from traversal based on custom logic.
Rather than special casing and doing two things at the same time, with minitask you read in a file tree and then all filters work on the same structure: an array of paths with metadata. Since filtering is an operation that's separate from reading in the initial tree, it's much easier to see and configure what gets excluded and to define new metadata-related operations.
File tasks are the other type of task.
There are three different alternatives, corresponding to different native APIs:
They use the Node 0.10.x stream interface based on a convention that makes using child_process.spawn particularly easy:
// uglify-task: runs uglify
var spawn = require('child_process').spawn;
module.exports = function(options) {
  var task = spawn('uglifyjs', ['--no-copyright']);
  task.on('exit', function(code) {
    // only report an error when the child process actually failed
    if (code !== 0) {
      task.emit('error', new Error('Child process exited with nonzero exit code: ' + code));
    }
  });
  return task;
};
You have to return an object with a writable stdin stream and a readable stdout stream.
Note that child_process.spawn() returns exactly the right kind of object.
The key here is that every file task is a Node 0.10.x stream. Streams are easy to compose together via pipe(), and all I/O objects in Node are streams. This makes it easy to compose file tasks and to redirect them to different places.
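A minimal sketch of that composition, using in-process Transform streams instead of child processes so that it runs without external tools. The `makeTask` helper is hypothetical; it only produces objects with `stdin` and `stdout` properties, the same properties the spawn example relies on.

```javascript
var Transform = require('stream').Transform;

// Hypothetical helper: build a file task from a string -> string function.
// The function is applied to each chunk as it passes through.
function makeTask(fn) {
  var stream = new Transform({
    transform: function(chunk, encoding, done) {
      done(null, fn(chunk.toString()));
    }
  });
  return { stdin: stream, stdout: stream };
}

var upper = makeTask(function(s) { return s.toUpperCase(); });
var exclaim = makeTask(function(s) { return s + '!'; });

// Compose: the stdout of one task feeds the stdin of the next
upper.stdout.pipe(exclaim.stdin);

// Collect the final output; a promise is used only to make the result easy to inspect
var piped = new Promise(function(resolve, reject) {
  var out = '';
  exclaim.stdout.on('data', function(chunk) { out += chunk; });
  exclaim.stdout.on('end', function() { resolve(out); });
  exclaim.stdout.on('error', reject);
});

upper.stdin.end('hello');
piped.then(function(out) { console.log(out); });
```

Redirecting the result elsewhere is a matter of piping `exclaim.stdout` into a different writable stream, such as a file stream or `process.stdout`.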
If you're doing a JS-based stream transformation, then you can return an instance of Node core's stream.Transform duplex stream, wrapped to look like a process:
// use readable-stream to use Node 0.10.x streams in Node 0.8.x
var Transform = require('readable-stream').Transform;
function Wrap(options) {
  Transform.call(this, options);
  this.first = true;
}
// this is just the recommended boilerplate from the Node core docs
Wrap.prototype = Object.create(Transform.prototype, { constructor: { value: Wrap }});
Wrap.prototype._transform = function(chunk, encoding, done) {
  if(this.first) {
    this.push('!!');
    this.first = false;
  }
  this.push(chunk);
  done();
};
Wrap.prototype._flush = function(done) {
  this.push('!!');
  done();
};
module.exports = function(options) {
  var instance = new Wrap(options);
  // since it's a duplex stream, let the stdin and stdout point to the same thing
  return {
    stdin: instance,
    stdout: instance
  };
};
This also means that any third-party code that implements stream.Transform is immediately usable with just a wrapping function that creates a new instance.
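As an illustration of such a wrapping function, the sketch below wraps `zlib.createGzip()` from Node core, used here as a stand-in for a third-party Transform implementation. The `gzipTask` name is an assumption made for this example.

```javascript
var zlib = require('zlib');

// File task wrapping an existing Transform stream (zlib's Gzip).
// Any other Transform implementation could be substituted here.
function gzipTask(options) {
  var instance = zlib.createGzip(options);
  return { stdin: instance, stdout: instance };
}

var task = gzipTask();
var gzipped = new Promise(function(resolve, reject) {
  var chunks = [];
  task.stdout.on('data', function(chunk) { chunks.push(chunk); });
  task.stdout.on('end', function() { resolve(Buffer.concat(chunks)); });
  task.stdout.on('error', reject);
});

task.stdin.end('hello minitask');
gzipped.then(function(buf) {
  // Round-trip through gunzip to show the task produced valid gzip output
  console.log(zlib.gunzipSync(buf).toString());
});
```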
The last piece of minitask is the runner.
The runner is the last task; it is responsible for using list tasks and file tasks to achieve whatever it wants. There are no strong requirements here. In my opinion it's not worth trying to standardize the runner: the overhead of dealing with some kind of standard for expressing a workflow outweighs the benefits of reuse. Whatever can be reused should be extracted into file tasks and list tasks, and the runner is everything that can't be reused.
The first parameter is the list structure of files, without any filters or tasks applied to it.
// serve-index:
var http = require('http');
module.exports = function(list, options) {
  // the list holds objects, so extract each file's name for display
  var names = list.files.map(function(item) { return item.name; });
  http.createServer(function(req, res) {
    if(req.url == '/') {
      res.end('<html><ul><li>'+ names.join('</li><li>') +'</li></ul></html>');
    } else {
      res.end('Unknown: ' + req.url);
    }
  }).listen(8000).on('listening', function() {
    console.log('Listening on localhost:8000');
  });
};
The runner is king: it gets to decide what to do with the file list and options it's supplied.
The minitask core basically defines a set of helpers that support these conventions:
list.js is the thing that iterates paths and returns a file list array for further consumption
runner.js is a function that applies a set of file tasks on a readable stream and returns a writable stream
TODO: document the list
TODO: specify how the list should be annotated with tasks
The runner is a helper method that takes an input stream (e.g. an object { stdout: ... }), an array of tasks and a done function. It instantiates tasks if necessary, and pipes the tasks together, and ensures that the last task in the pipeline calls the done function.
Usage example:
var fs = require('fs'),
    runner = require('minitask').runner,
    tasks = [ fileTask, ... ];
var last = runner({ stdout: fs.createReadStream(filename) }, tasks, function() {
  console.log('done');
});
// need to do this here so we can catch the second-to-last stream's "end" event;
last.stdout.pipe(process.stdout, { end: false });
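The behaviour described for the runner (instantiate tasks if necessary, pipe them together, call done at the end) can be approximated in a few lines. This is a hypothetical sketch named `runPipeline`, not minitask's actual implementation, and it assumes every task exposes `stdin` and `stdout` streams.

```javascript
var stream = require('stream');

// Hypothetical sketch of a runner, NOT minitask's actual implementation.
// input: { stdout: ReadableStream }; tasks: task objects or factory functions.
function runPipeline(input, tasks, done) {
  var last = tasks.reduce(function(prev, task) {
    // Instantiate the task if a factory function was passed
    var current = (typeof task === 'function') ? task() : task;
    prev.stdout.pipe(current.stdin);
    return current;
  }, input);
  // Signal completion once the final stage has emitted all of its output
  last.stdout.on('end', done);
  return last;
}

// Demo task factory: upper-cases whatever passes through
function upperTask() {
  var t = new stream.Transform({
    transform: function(chunk, encoding, cb) {
      cb(null, chunk.toString().toUpperCase());
    }
  });
  return { stdin: t, stdout: t };
}

var source = new stream.PassThrough();
var output = '';
var finished = new Promise(function(resolve) {
  // One factory and one ready-made instance, to exercise both paths
  var last = runPipeline({ stdout: source }, [upperTask, upperTask()], function() {
    resolve(output);
  });
  // The caller consumes the last stdout, as in the usage example
  last.stdout.on('data', function(chunk) { output += chunk; });
});

source.end('ab');
finished.then(function(result) { console.log(result); });
```

In this sketch the done callback only fires if something consumes the last stdout, which is why the caller attaches a consumer right after the pipeline is built.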
File processing tasks such as package builds and metadata reads are often run multiple times. It is useful to cache the output from these tasks and only re-run the processing when a file has changed. GNU Make, for example, relies on dependency resolution + file last modified timestamps to skip work where possible.
A cacheable task is any task that reads a specific file path and writes to a writable stream at the end.
The caching system can use either an md5 hash or the last modified time plus file size to determine whether a task needs to be re-run. Additionally, an options hash can be passed so that different task options produce separate cache entries.
When the caching system is used, the task output is additionally written to a separate file. The assumption here is that each file task (with a task options hash and input md5) performs the same deterministic transformation. When the current input file's md5 and task options hash match, then the previously written cached result is streamed directly rather than running the full stack of transformations.
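The matching rule can be sketched as a cache file name derived from the input fingerprint and the task options. All helper names here (`md5`, `contentFingerprint`, `statFingerprint`, `cachePath`) and the `.cache` naming scheme are assumptions for illustration; they do not describe how minitask stores its cache.

```javascript
var crypto = require('crypto');
var path = require('path');

// Hypothetical helpers, not the minitask cache API.

function md5(input) {
  return crypto.createHash('md5').update(input).digest('hex');
}

// Option A: fingerprint the input by content
function contentFingerprint(contents) {
  return md5(contents);
}

// Option B: fingerprint by last-modified time and size (no file read needed)
function statFingerprint(stat) {
  return stat.mtime.getTime() + '-' + stat.size;
}

// The cache file name combines the input fingerprint with a hash of the task options
function cachePath(cacheDir, fingerprint, options) {
  return path.join(cacheDir, fingerprint + '-' + md5(JSON.stringify(options)) + '.cache');
}

var a = cachePath('/tmp/cache', contentFingerprint('var x = 1;'), { minify: true });
var b = cachePath('/tmp/cache', contentFingerprint('var x = 1;'), { minify: true });
var c = cachePath('/tmp/cache', contentFingerprint('var x = 2;'), { minify: true });
var d = cachePath('/tmp/cache', contentFingerprint('var x = 1;'), { minify: false });

console.log(a === b); // same input and options: would be a cache hit
console.log(a === c); // input changed: would be a cache miss
console.log(a === d); // options changed: would be a cache miss
```

One caveat of this sketch: `JSON.stringify` is sensitive to property order, so two option objects with the same keys in a different order would produce different cache paths.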
The cache API looks a lot like the runner API, but it requires an explicit file path and options hash.
var last = cache({ filepath: filepath, cachepath: ..., md5: ..., stat: ..., options: ... }, tasks, function() {
});