Kahn proposed an algorithm in 1962 to topologically sort any DAG (directed acyclic graph). Pseudocode copied from Wikipedia:
L ← Empty list that will contain the
I'm going to suggest a less literal implementation of the algorithm: you don't need to manipulate the DAG at all, you just need to manipulate info about the DAG. The only "interesting" things the algorithm needs are a mapping from each node to its children (the opposite of what your DAG actually stores) and a count of each node's parents.
These are easy to compute, and dicts can be used to associate this info with node names (assuming all names are distinct - if not, you can invent unique names with a bit more code).
Then this should work:
    def topsort(dag):
        name2node = {node.name: node for node in dag.nodes}
        # map name to number of predecessors (parents)
        name2npreds = {}
        # map name to list of successors (children)
        name2succs = {name: [] for name in name2node}

        for node in dag.nodes:
            thisname = node.name
            name2npreds[thisname] = len(node.parents)
            for p in node.parents:
                name2succs[p.name].append(thisname)

        result = [n for n, npreds in name2npreds.items() if npreds == 0]
        for p in result:
            for c in name2succs[p]:
                npreds = name2npreds[c]
                assert npreds
                npreds -= 1
                name2npreds[c] = npreds
                if npreds == 0:
                    result.append(c)

        if len(result) < len(name2node):
            raise ValueError("no topsort - cycle")
        return tuple(name2node[p] for p in result)
There's one subtle point here: the outer loop appends to result while it's iterating over result. That's intentional. The effect is that every element in result is processed exactly once by the outer loop, regardless of whether an element was in the initial result or added later.
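To see the append-while-iterating idiom in isolation, here is a minimal sketch that uses a list as its own work queue. The names reachable and succs are made up for this illustration and are not part of the answer's code; the point is only that a Python for loop over a list also visits items appended during the loop.

```python
def reachable(start, succs):
    """Return the nodes reachable from start, in discovery order."""
    found = [start]   # doubles as the work queue
    seen = {start}
    for node in found:                  # also visits items appended below
        for nxt in succs.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)       # picked up later by the same loop
    return found


# Hypothetical example graph: name -> list of successor names.
succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(reachable("a", succs))  # ['a', 'b', 'c', 'd']
```

Each name is appended at most once, so the loop ends when it runs out of newly discovered items, which is the same reason the outer loop in topsort terminates.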
Note that while the input DAG and Nodes are traversed, nothing in them is altered.
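If you want to try this out, here is a rough sketch of the input shape the function assumes (a DAG with a .nodes list, and nodes with .name and .parents), plus a checker for a proposed ordering. The Node and DAG classes, make_dag, and is_topological are all hypothetical stand-ins invented for this example; your real classes will differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:               # hypothetical stand-in for the real node class
    name: str
    parents: list = field(default_factory=list)


@dataclass
class DAG:                # hypothetical stand-in for the real DAG class
    nodes: list = field(default_factory=list)


def make_dag(edges):
    """Build a DAG from (parent_name, child_name) pairs."""
    by_name = {}
    for parent, child in edges:
        for name in (parent, child):
            by_name.setdefault(name, Node(name))
        by_name[child].parents.append(by_name[parent])
    return DAG(list(by_name.values()))


def is_topological(order):
    """True if every node appears after all of its parents.

    Assumes order contains every node of the DAG.
    """
    position = {node.name: i for i, node in enumerate(order)}
    return all(position[p.name] < position[node.name]
               for node in order for p in node.parents)


dag = make_dag([("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
by_name = {n.name: n for n in dag.nodes}
good = [by_name[k] for k in "abcd"]
bad = [by_name[k] for k in "dbca"]
print(is_topological(good), is_topological(bad))  # True False
```

With the topsort function from this answer in scope, is_topological(topsort(dag)) should hold for this input, and the parents lists on the nodes should be the same before and after the call.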