@vltpkg/graph-run
Figure out which items in a dependency graph can be handled in
parallel, and which must be deferred until their dependencies are
completed. Handles cyclic and acyclic graphs, but if cycles are
allowed, then dependency order may not be guaranteed.
For example, given the following directed graph:
flowchart TD
A --> B
A --> C
B --> E
B --> D
C --> D
C --> G
D --> F
F --> E
it will allow you to operate in this order:
- E and G in parallel (since they are leaf nodes)
- F when E is complete
- D when F is complete
- B when D and E are complete (possibly in parallel with C)
- C when D and G are complete (possibly in parallel with B)
- A when B and C are complete
Operation will be maximally parallelized, so that each step in the
graph only waits until its dependencies are completed.
Cycles
If a cycle exists in the graph, then at least one node in the cycle
will by necessity be operated upon before its dependencies are
complete.
Cycles may be disallowed by throwing in the onCycle method provided.
Contract
- Each node reachable from the initial entry points provided will be
visited exactly once.
- Except in the case of cycles, each node's dependencies will be
visited before itself.
- In the default async forms of the methods provided:
- All methods will be awaited before continuing.
- Any operations that can be done in parallel will be.
- All pending visits and dependency lookups will be skipped if a
provided
AbortSignal fires.
Difference between this and typical topological sort
This is not just a topological sort, and there are better libraries
available that implement Kahn's Topological Sort very efficiently.
However, topological sorting is insufficient if you wish to know which
items can be operated on in parallel, and Kahn's algorithm only
works when the graph is guaranteed to be acyclic and all nodes are
known ahead of time.
This implementation does not require that the graph be entirely loaded
ahead of time, and provides a mechanism to treat cycles as a warning
rather than an error, with the caveat that if cycles are allowed then
topological ordering is of course not guaranteed.
Caveat
While this can be used in theory to explore any infinitely large
graph, note that the result set will be stored in memory.
So, if you were to use it to try to crawl all the links in Wikipedia
or something, you're going to have a Bad Time if the result set gets
too big.
USAGE
import {
graphRun,
allSettled,
any,
race,
graphRunSync,
allSettledSync,
anySync,
raceSync,
} from '@vltpk/graph-run'
const ac = new AbortController()
const results = await graphRun({
graph: [some, nodes, known, at, the, start],
getDeps: async (n) => {
return [the, dependency, node, objects]
},
visit: async (node, signal, path) => {
await doSomething(n)
},
onCycle: (node, cycle, path) => {
console.error(
`warning: while evaluating ${
node
} at path ${
path.join('->')
} encountered cycle: ${
cycle.join('->')
}. Proceeding without this dependency.`
)
},
failFast: true
signal: ac.signal
})
const isDAG = (graph) => {
try {
graphRunSync({
graph: [graph],
getDeps: n => n.children ?? [],
onCycle: () => { throw 'cycle detected' },
visit: () => {}
})
return true
} catch {
return false
}
}
const g = {
children: [
{
name: 'a',
children: [{ name: 'b' }],
}
]
}
console.log(isDAG(g))
g.children[0].children[0].children = g.children
console.log(isDAG(g))
Other Options
How This Is Different
- More flexible than most as far as the data structures used. Ie,
Node can be any value type.
- Dependency edges are resolved on-demand, and may be calculated
asynchronously, making it efficient in cases where loading the
entire graph may be expensive.
- Both cyclic and acyclic graphs are supported.