shepherd: asynchronous dependency injection and more!

Shepherd is a graph-based dependency resolution system, designed to simplify request pipelines that have multiple asynchronous steps. Shepherd makes it easy to split code into fine-grained, composable units.
For example, a feed may draw on multiple sources of data which need to be fetched in parallel (and each may have multiple processing steps), but they may have some common dependencies. With Shepherd, you would break each step into a single function, which would return immediately or through a promise, and specify any direct dependencies. Once all of the components are defined as atomic units, the Graph will decide when each node runs and provide you with the processed output once all steps have completed.
Getting started
To get started with shepherd, you need to create a Graph. A Graph is a registry of all of the things you want to be able to do (units of work). First, instantiate the Graph:
var shepherd = require("shepherd")
var graph = new shepherd.Graph()
Next, add some nodes to the Graph to perform those units of work. Let's add two nodes:
graph.add('timestamp-nowMillis', Date.now)
function toUpper(str) {
  return str.toUpperCase()
}
graph.add('str-toUpper', toUpper, ['str'])
Now that you have a Graph of things that can be done, you need to create a Builder which will connect those different pieces together to produce a desired result. In this case we'll create a Builder which will uppercase an input string and return the current timestamp in millis:
var builder = graph.newBuilder()
.builds('str-toUpper')
.builds('timestamp-nowMillis')
Finally, you can run the Builder with a given set of inputs and it will optimally run through the units of work:
builder.run({str: "Hello"}, function (err, data) {
  // handle err, or read the outputs of the built nodes from data
})
And that's the simple flow through shepherd, though there's a lot of additional functionality listed below.
What is the Graph really?
The Graph in shepherd is the place where you put all the things your application is able to do. Before you add any nodes to a Graph, you'll need to instantiate it:
var shepherd = require("shepherd")
var graph = new shepherd.Graph()
Adding nodes to the Graph
In order to make the Graph do some work for you, you'll have to add some nodes with Graph#add. The first argument is the name of the node (which will be used to reference this node later), the second is the handler function to perform the work of this node, and the third is an optional array of arguments to be passed in to this node.
An alternate syntax is to pass the function in via .fn() and the arguments via .args():
graph.add("str-toUpper", function (str) { return str.toUpperCase() }, ['str'])
graph.add("str-toLower")
.args('str')
.fn(function (str) { return str.toLowerCase() })
About node names
Many functions in shepherd can take either the name of a node or an object (arg-to-node or node-to-arg mapping based on context) as an input. If the name of a node is passed in, the following rules are used to determine what the corresponding arg name should be (returning as soon as a rule is met):
- If the node name contains a . (using members, defined later), the arg name is anything to the right of the last . e.g.: user.username becomes username
- If the node name contains a -, the arg name is anything to the left of the first - e.g.: user-fromSpain becomes user
- If no previous match is made, the arg name is the node name e.g.: userCredentials becomes userCredentials
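The three rules above can be sketched as a small standalone function. This is only an illustration of the rules as written here; argNameFromNodeName is a hypothetical helper name and is not part of shepherd's API.

```javascript
// Hypothetical helper (not part of shepherd): derives an arg name from a
// node name by applying the three rules in order.
function argNameFromNodeName(nodeName) {
  var lastDot = nodeName.lastIndexOf('.')
  if (lastDot !== -1) {
    // Rule 1: anything to the right of the last "."
    return nodeName.slice(lastDot + 1)
  }
  var firstDash = nodeName.indexOf('-')
  if (firstDash !== -1) {
    // Rule 2: anything to the left of the first "-"
    return nodeName.slice(0, firstDash)
  }
  // Rule 3: the node name itself
  return nodeName
}

console.log(argNameFromNodeName('user.username'))
console.log(argNameFromNodeName('user-fromSpain'))
console.log(argNameFromNodeName('userCredentials'))
```

Because the "." rule is checked first, a member reference on a dashed node name such as user-byId.username resolves to the member name rather than to the part before the dash.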
This leads to certain suggested naming patterns within the shepherd world:
- Nodes should be named as TYPE_OF_RESPONSE-SOURCE_OF_RESPONSE e.g.: user-byId
- Nodes should be referenced as specifically as possible via members (defined later) e.g.: user-byId.username
Following these naming patterns pays off, because nodes that take in args with a given name can automatically infer their inputs:
graph.add('name-toUpper', function (name) { return name.toUpperCase() }, ['name'])
graph.add('name-fromLiteral', graph.literal('Jeremy'))
graph.add('userObj', {name: "Jeremy"})
graph.newBuilder()
.builds('name-toUpper')
.using('name-fromLiteral')
graph.newBuilder()
.builds('name-toUpper')
.using('userObj.name')
Arguments to nodes
By default, all args and builds are passed to the handler function in the order they're written.
graph.add('str-formatTimestamp')
.args('timestamp')
.builds('formatter-forTimestamp')
.fn(function (timestamp, formatter) {
return formatter.format(timestamp)
})
To change the order, you can use inject(callback) instead of fn(callback). inject will try to read the arguments of the function literal, and match up the named arguments with the function arguments.
graph.add('str-formatTimestamp')
.args('timestamp')
.builds('formatter-forTimestamp')
.inject(function (formatter, timestamp) {
return formatter.format(timestamp)
})
inject will throw errors if it can't find a match for the function literal arguments.
graph.add('str-formatTimestamp')
.args('timestamp')
.builds('formatter-forTimestamp')
.inject(function (formater, timestomp) {})
inject may not work with functions built with higher-order primitives like Function.prototype.bind, because it won't be able to figure out the argument list.
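To see why name-based injection depends on the function literal, here is a rough sketch of the general technique: read the parameter names from the function's source text and look each one up in a map of available inputs. paramNames and injectArgs are hypothetical helpers written for this illustration; shepherd's actual parsing may differ, and this sketch only handles plain function expressions and declarations.

```javascript
// Hypothetical helper: reads parameter names from a function's source text.
// Only understands "function (...)" syntax, not arrow functions or defaults.
function paramNames(fn) {
  var match = /^function[^(]*\(([^)]*)\)/.exec(fn.toString())
  if (!match) return []
  var list = match[1].trim()
  if (!list) return []
  return list.split(',').map(function (name) { return name.trim() })
}

// Hypothetical helper: builds the argument array for fn from named inputs,
// throwing when a parameter name has no matching input.
function injectArgs(fn, available) {
  return paramNames(fn).map(function (name) {
    if (!Object.prototype.hasOwnProperty.call(available, name)) {
      throw new Error('No input named "' + name + '"')
    }
    return available[name]
  })
}

var handler = function (formatter, timestamp) {
  return formatter.format(timestamp)
}
var inputs = {
  timestamp: 0,
  formatter: { format: function (t) { return 'ts:' + t } }
}
// The inputs are matched by name, so their order in the map is irrelevant.
var result = handler.apply(null, injectArgs(handler, inputs))

// A bound function no longer exposes its parameter list in its source text.
var boundNames = paramNames(handler.bind(null))
```

In this sketch boundNames is empty, which mirrors the limitation described above: once a function has been wrapped with bind, there are no names left to match against.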
Returning and errors
Graph nodes may return or throw synchronously, or they may resolve or reject asynchronously, either through a promise (we currently use the node module kew, which is a lighter implementation of much of Q's functionality) or via a node-style callback passed in as the last argument:
graph.add('result-sync', function () { return true })
graph.add('result-promise', function () { return require('kew').resolve(true) })
graph.add('result-callback', function (next) { next(undefined, true) })
graph.add('throws-sync', function () { throw new Error('NOOOO') })
graph.add('throws-promise', function () { return require('kew').reject(new Error('NOOOO')) })
graph.add('throws-callback', function (next) { next(new Error('NOOOO')) })
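As a rough illustration of how the three styles can be funneled into one outcome, the sketch below runs a handler and reports its result through a single node-style callback. runHandler is a hypothetical function for this example only and makes no claim about how shepherd does this internally. It assumes a handler that returns undefined intends to use the callback.

```javascript
// Hypothetical runner (not shepherd's implementation): calls handler with
// args plus a trailing callback and reports exactly one outcome to done.
function runHandler(handler, args, done) {
  var finished = false
  function finish(err, value) {
    if (finished) return
    finished = true
    done(err, value)
  }

  var result
  try {
    result = handler.apply(null, args.concat(finish))
  } catch (e) {
    // Synchronous throw
    return finish(e)
  }

  if (result && typeof result.then === 'function') {
    // Promise (or any thenable): resolve or reject asynchronously
    result.then(function (value) { finish(undefined, value) }, finish)
  } else if (result !== undefined) {
    // Synchronous return
    finish(undefined, result)
  }
  // Otherwise wait for the handler to call the callback it was given.
}

var outcomes = {}
function record(name) {
  return function (err, value) {
    outcomes[name] = err ? 'error: ' + err.message : value
  }
}

runHandler(function () { return true }, [], record('sync'))
runHandler(function (next) { next(undefined, true) }, [], record('callback'))
runHandler(function () { throw new Error('NOOOO') }, [], record('throws'))
```

A handler that returns undefined and never calls its callback would leave this sketch waiting forever, so a real runner would need some policy for that case.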
Literals
Literals can be added to the graph using a special wrapper object or through the utility method Graph#literal:
graph.add('name-fromObject', {_literal: 'Jeremy'})
graph.add('name-fromFunction', graph.literal('Jeremy'))
If a literal is guaranteed not to be a string, and not to be undefined, it may be added directly without using Graph#literal.
graph.add('secret-ofLifeTheUniverseAndEverything', 42)
graph.add('object-test', {isTest: true})
Strings require a special case due to the cloning behavior built into shepherd. If you wish to clone a node into a node with a new name, call .add() with the old name and the new name:
graph.add('name-oldNode', 'name-newNode')
Modifiers
A node may also define one or more modifiers for itself when it is added to the Graph. Modifiers exist as other nodes in the Graph and serve to transform the output of a node (asynchronously). If a modifier is added with only the name of the node, the name of the argument to use when calling the modifier is deduced from the parent's name:
graph.add('name-toUpper', function (name) { return name.toUpperCase() }, ['name'])
graph.add('name-fromObject', {_literal: 'Jeremy'})
graph.add('name-fromObjectUpper')
.builds('name-fromObject')
.modifiers('name-toUpper', 'name-someOtherModifier')
You can explicitly pass in the name of the argument for the modifier by creating an object with a key of the modifier node name and a value of the argument name (think "into modifier as argument"):
graph.add('str-toUpper', function (str) { return str.toUpperCase() }, ['str'])
graph.add('name-fromObject', {_literal: 'Jeremy'})
graph.add('name-fromObjectUpper')
.builds('name-fromObject')
.modifiers({'str-toUpper': 'str'})
You may also create modifiers from functions which take the node to be modified as the first argument:
function toUpper(str) { return str.toUpperCase() }
graph.add('name-fromObject', {_literal: 'Jeremy'})
graph.add('name-fromObjectUpper')
.builds('name-fromObject')
.modifiers(toUpper)
A node passed to .modifiers() may only have one input. If the node requires additional inputs, you must set those inputs first with a .configure() call.
graph.add('str-method', function (str, method) {
  return str[method]()
}, ['str', 'method'])
builder
.configure('str-method').using({method: graph.literal('toUpperCase')})
.builds('str-fromInput')
.modifiers('str-method')
Caching and de-duplication
By default, a Builder instance will merge all nodes that it finds have the exact same handler function when it runs through its compile() phase, and these functions will only ever run once during a Builder#run call. If you wish to make sure that a node is run every time it is referenced by another node, you can call .disableNodeCache() on the node when adding the node to the Graph:
graph.add("timestamp-nowMillis", Date.now)
.disableNodeCache()
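The effect of de-duplication can be pictured with a small per-run cache. This is a simplified sketch, not shepherd's compile() logic: createRunCache and its useCache flag are hypothetical, and the cache is keyed only on handler identity and ignores inputs entirely.

```javascript
// Hypothetical per-run cache (illustration only): a handler runs at most
// once per cache unless caching is switched off for that call.
function createRunCache() {
  var results = new Map()
  return function runNode(handler, args, useCache) {
    if (useCache === false) {
      // Comparable in spirit to disableNodeCache(): always run again
      return handler.apply(null, args)
    }
    if (!results.has(handler)) {
      results.set(handler, handler.apply(null, args))
    }
    return results.get(handler)
  }
}

var calls = 0
function countCalls() {
  calls += 1
  return calls
}

var runNode = createRunCache()
var first = runNode(countCalls, [])        // runs the handler
var second = runNode(countCalls, [])       // reuses the cached result
var third = runNode(countCalls, [], false) // cache disabled, runs again
```

This also shows why a node like timestamp-nowMillis is a natural candidate for disabling the cache: with caching on, every reference within one run sees the same value.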
A word about chaining
The methods defined above that apply to nodes when they're added to the graph (.args() and .modifiers()) must be run before any .builds() calls are run for the node in order to disambiguate what node the calls should affect.
Building nodes
Now that we have nodes added to a Graph instance, we need to set up relationships between them and make a Builder which will create our output. All of the functions specified here can be applied to either a node that has been added to the Graph or to a node being built in a Builder. Basic examples will be provided for both options and the variations can be extrapolated from those.
.builds()
.builds() should be called when a node needs to be run in the current context:
builder
.builds('name-fromLiteral')
graph.add('name-validated', validateName)
.builds('name-fromLiteral')
Member variables
If you only wish to retrieve a member variable of a node, you can access it using standard JavaScript . delimiters:
builder
.builds('user.name')
.builds('user.emails.primary')
Remapping
Nodes can be renamed / aliased at build time by passing an object to .builds() with the new name of the node as the key and the current node name as the value. This allows you to call a node multiple times while providing each instance with different inputs:
builder
.builds({'user1': 'user-byUserId'})
.using({userId: 1})
.builds({'user2': 'user-byUserId'})
.using({userId: 2})
Void nodes
Nodes can also be built and run without their output being provided to the
requester by prefixing the node name with ? e.g.: ?helper-fromId (if the
node is remapped, prefix the alias with ? e.g.: {'?helper':
'helper-fromId'}). This is useful if you need to build a value for another
node, but don't want to pass it to the callback function.
graph.add('user-byEmail')
.args('?email')
.builds('?userId-byEmail').using('email')
.builds('user-byId').using('userId-byEmail')
.fn(function (user) {
return user
})
In the above example, ?email and ?userId-byEmail are not passed to the
callback function, because they are prefixed with ?.
Important nodes
Nodes can also be built and run before all other nodes by prefixing the node
name with !, e.g.: !validateEmail (if the node is remapped, prefix the
alias with ! e.g.: {'!validator': 'validateEmail'}). This is similar to
!important in CSS.
This is particularly useful in the case of validators and permission-checks,
which may throw an Error if a condition isn't met. The following
example uses an important node to stop the work from being done:
graph.add('updateEmail')
.args('user', 'email')
.builds('!validateEmail')
.using('args.email')
.builds('bool-writeEmailToDatabase')
.using('args.email')
In the above example, validateEmail will get run before
bool-writeEmailToDatabase, and all the nodes that
bool-writeEmailToDatabase depends on.
Like !important, this is a one-shot deal: you can't have importanter nodes
or !!important. It does not make sense for important nodes to depend on
other important nodes.
Important nodes are also void: they do not pass their outputs to their handler.
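The two prefixes can be summarized with a small parsing sketch. parseNodeRef is a hypothetical helper written for this illustration and is not a shepherd function. It only encodes what is stated above: ? marks a void node, ! marks an important node, and important nodes are also void.

```javascript
// Hypothetical helper (not part of shepherd): splits a node reference into
// its bare name and the flags implied by a leading "?" or "!".
function parseNodeRef(ref) {
  var prefix = ref.charAt(0)
  var hasPrefix = prefix === '?' || prefix === '!'
  return {
    name: hasPrefix ? ref.slice(1) : ref,
    isImportant: prefix === '!',
    // Important nodes are also void, so both prefixes set this flag.
    isVoid: hasPrefix
  }
}

console.log(parseNodeRef('?userId-byEmail'))
console.log(parseNodeRef('!validateEmail'))
console.log(parseNodeRef('user-byId'))
```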
.using()
Nodes defined via .builds() will often need to be wired up to know what context they should be called in. .using() provides this ability by specifying where a node should get its inputs from. Inputs may be literals (using the rules provided above), anonymous functions, other nodes, or arguments provided to the parent node (in the case of a Graph node).
graph.add('str-fromInput', {_literal: "This is my string"})
graph.add('str-toUpper', toUpper, ['str'])
builder
.builds('str-toUpper')
.using('str-fromInput')
builder
.builds('str-toUpper')
.using({'str': { _literal: 'This is my string' }})
builder
.builds('str-toUpper')
.using({'str': function () { return 'This is my string' }})
graph.add('str-transformed', transformString)
.args('?str')
.builds('str-toUpper')
.using('args.str')
An array passed as an argument through .using() is treated as a list of node names, and the node receives an array of the values produced by those nodes:
function nameFromParts(parts) {
return parts.join(' ')
}
graph.add('name-fromParts', nameFromParts, ['parts'])
builder
.builds('name-fromParts')
.using({parts: ['firstName', 'middleInitial', 'lastName']})
.modifiers()
Nodes defined via .builds() may have modifiers added on in a manner identical to a node adding a modifier to itself (as defined above):
function toUpper(str) { return str.toUpperCase() }
builder
.builds('str-fromInput')
.modifiers('str-toUpper', {'str-toLower': 'str'}, toUpper)
Lazy nodes
Shepherd has a "lazy" primitive that allows you to delay execution of a subgraph until run-time.
Nodes defined with graph.addLazy will return immediately with a function. The function, when called, will begin evaluating the subgraph and return a promise with its result.
graph.addLazy('lazyNum-one')
.fn(function () {
console.log('evaluating oneLazyNode')
return 1
})
graph.add('num-one')
.builds('lazyNum-one')
.inject(function (lazyNum) {
return lazyNum().fail(function (e) {
throw new Error('failed evaluating oneLazyNode: ' + e)
})
})
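The idea behind a lazy node can be shown without shepherd at all: instead of a value, the consumer receives a function, and nothing is computed until that function is called. The sketch below uses native Promises and a hypothetical makeLazy helper. Note that shepherd uses kew promises (hence .fail() in the example above), whereas native Promises use .catch(). Whether shepherd caches the result across repeated calls is not stated here, and this sketch does not.

```javascript
// Hypothetical helper (illustration only): wraps compute so that it runs
// only when the returned function is called, and yields a promise.
function makeLazy(compute) {
  return function () {
    // The executor runs immediately when the lazy function is called;
    // a throw inside compute() becomes a rejected promise.
    return new Promise(function (resolve) {
      resolve(compute())
    })
  }
}

var evaluations = 0
var lazyOne = makeLazy(function () {
  evaluations += 1
  return 1
})

var evaluationsBeforeCall = evaluations // nothing has run yet
var promise = lazyOne()                 // evaluation starts here
var evaluationsAfterCall = evaluations

promise.then(function (value) {
  console.log('lazy value:', value)
})
```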
Running the Builder
Once your Graph has been constructed and your Builder has been built, you can actually do something with it! Builder#run() accepts an input object, which will add new nodes to the graph (data-only) at run-time, and an optional callback:
builder.run({name: "Jeremy", currentTimestamp: Date.now()}, function (err, data) {
})
.then(function (data) {
})
.fail(function (err) {
})
Utility Methods
Graph#forceClone()
If you wish to create a base Graph instance which may be extended without mutating the original, you may explicitly call Graph#clone() or use Graph#forceClone() to cause any mutating changes to create a new Graph instance with a copy of all existing NodeDefinition instances:
var newGraph = graph.clone()
graph.forceClone()
var newGraph = graph.add('someNode', someFunction)
Graph#validator()
Graph#validator() may be called to create a function which takes a string input and validates it against a regular expression. If the regular expression fails, an Error is returned with a specified error message:
graph.add('email-validated', graph.validator(/^[^@]+@.*/, "An invalid-email address was provided"), ['email'])
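To make the shape of such a function concrete, here is a minimal sketch of a validator factory. makeValidator is a hypothetical stand-in, not the source of Graph#validator(). It assumes the failure is signaled by throwing the Error and that the input is passed through unchanged on success, neither of which is confirmed above.

```javascript
// Hypothetical factory (assumed behavior, not shepherd's source): returns a
// function that passes the value through if it matches regex, and throws
// an Error carrying message otherwise.
function makeValidator(regex, message) {
  return function (value) {
    if (!regex.test(value)) {
      throw new Error(message)
    }
    return value
  }
}

var validateEmail = makeValidator(/^[^@]+@.*/, 'An invalid-email address was provided')

var accepted = validateEmail('jeremy@example.com')

var rejectedMessage = null
try {
  validateEmail('not-an-email')
} catch (e) {
  rejectedMessage = e.message
}
```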
Graph#setter()
Graph#setter() may be called to create a function which takes an object and a value and sets a field on the object to that value:
graph.add('user-setEmail', graph.setter('email'), ['user', 'email'])
Graph#deleter()
Graph#deleter() may be called to create a function which takes an object and deletes a specified field on the object:
graph.add('user-deleteEmail', graph.deleter('email'), ['user'])
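The setter and deleter helpers are both small closures over a field name. The sketch below shows one plausible shape for them. makeSetter and makeDeleter are hypothetical names, and returning the mutated object is an assumption made so the result can feed another node; it is not documented above.

```javascript
// Hypothetical factory: returns a function that sets obj[field] = value.
// Returning obj is an assumption for this sketch.
function makeSetter(field) {
  return function (obj, value) {
    obj[field] = value
    return obj
  }
}

// Hypothetical factory: returns a function that deletes obj[field].
// Returning obj is an assumption for this sketch.
function makeDeleter(field) {
  return function (obj) {
    delete obj[field]
    return obj
  }
}

var setEmail = makeSetter('email')
var deleteEmail = makeDeleter('email')

var user = { name: 'Jeremy' }
setEmail(user, 'jeremy@example.com')
var emailAfterSet = user.email

deleteEmail(user)
var hasEmailAfterDelete = 'email' in user
```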
Graph#subgraph()
When adding a node, if no value or function is specified, the function assigned to a node will default to Graph#subgraph.
graph.add('user-updateEmail', graph.subgraph)
graph.add('user-updateEmail')
Graph#subgraph() returns a function which will always return the last non-callback parameter passed into it. This is useful for creating a "subgraph" within a Graph which doesn't contain any new functions but may coordinate complex operations:
graph.add('user-updateEmail')
.args('?user', '?email')
.builds('?user-setEmail')
.using('args.email')
.modifiers('user-validateEmail')
.builds('user-save')
.using('user-setEmail')
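The "last non-callback parameter" behavior can be sketched as a plain function. lastNonCallbackArg is a hypothetical stand-in for what Graph#subgraph() is described as returning. It assumes the callback, when present, is a function in the final position, matching the node-style convention mentioned earlier.

```javascript
// Hypothetical stand-in: drops a trailing callback if there is one, then
// returns the last remaining argument (undefined if none remain).
function lastNonCallbackArg() {
  var args = Array.prototype.slice.call(arguments)
  if (typeof args[args.length - 1] === 'function') {
    args.pop()
  }
  return args[args.length - 1]
}

var savedUser = { userId: 7 }
var withCallback = lastNonCallbackArg({ userId: 1 }, savedUser, function next() {})
var withoutCallback = lastNonCallbackArg({ userId: 1 }, savedUser)
```

In the example above, user-save is the last non-void build, so under this reading its value is what the user-updateEmail node would produce.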
In addition, Graph#subgraph() has an optional .returns() method which can be used to specify which of the previous arguments should be returned (including children of those arguments). .returns() must always be the last function in the Graph#subgraph() chain:
graph.add('userId-updateEmail')
.args('user', 'email')
.builds('user-setEmail')
.using('args.email')
.modifiers('user-validateEmail')
.builds('user-save')
.using('user-setEmail')
.returns('user-save.userId')
Utility Nodes
_requiredFields
Nodes may use the _requiredFields node to reflect on what member variables are expected of them from within a given Builder. If a node is only ever referenced via member variables, _requiredFields will return an array of required member names. Otherwise, _requiredFields will return '*' to signify that the entire object is being asked for:
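function getUser(userId, _requiredFields) {
  var user
  if (_requiredFields == '*') {
    // the entire object was requested: load the full user
  } else {
    // only some members were requested: load just the fields listed
  }
  return user
}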
graph.add('user-byUserId', getUser, ['userId', '_requiredFields'])
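The rule for what _requiredFields contains can be sketched as a function over the list of references a Builder makes to a node. requiredFieldsFor is a hypothetical helper for illustration. It assumes that nested references such as user.emails.primary contribute only their first-level member name, which is not specified above.

```javascript
// Hypothetical helper (illustration only): returns '*' if the node is ever
// referenced whole, otherwise the unique first-level member names requested.
function requiredFieldsFor(nodeName, references) {
  var prefix = nodeName + '.'
  var fields = []
  for (var i = 0; i < references.length; i++) {
    var ref = references[i]
    if (ref === nodeName) {
      // The whole object is asked for somewhere
      return '*'
    }
    if (ref.indexOf(prefix) === 0) {
      var member = ref.slice(prefix.length).split('.')[0]
      if (fields.indexOf(member) === -1) {
        fields.push(member)
      }
    }
  }
  return fields
}

var membersOnly = requiredFieldsFor('user', ['user.name', 'user.emails.primary', 'user.name'])
var wholeObject = requiredFieldsFor('user', ['user.name', 'user'])
```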
Contributing
Questions, comments, bug reports, and pull requests are all welcome.
Submit them at the project on GitHub.
Bug reports that include steps-to-reproduce (including code) are the
best. Even better, make them in the form of pull requests that update
the test suite. Thanks!
Author
Jeremy Stanley supported by The Obvious Corporation.
License
Copyright 2012 The Obvious Corporation.
Licensed under the Apache License, Version 2.0.
See the top-level file LICENSE.TXT and (http://www.apache.org/licenses/LICENSE-2.0).