Deceptively simple implementation of topological sorting in Python

情书的邮戳 2020-12-08 07:21

Extracted from here, we have a minimal iterative DFS routine. I call it minimal because you can hardly simplify the code further:

def iterative_dfs(graph, star         


        
3 answers
  • 2020-12-08 08:16

    I was also trying to simplify this so I came up with this:

    from collections import deque
    
    def dfs(graph, source, stack, visited):
        visited.add(source)
    
        for neighbour in graph[source]:
            if neighbour not in visited:
                dfs(graph, neighbour, stack, visited)
        
        stack.appendleft(source)
    
    def topological_sort_of(graph):
        stack = deque()
        visited = set()
    
        for vertex in graph.keys():
            if vertex not in visited:
                dfs(graph, vertex, stack, visited)
    
        return stack
    
    if __name__ == "__main__":
        graph = {
            0: [1, 2],
            1: [2, 5],
            2: [3],
            3: [],
            4: [],
            5: [3, 4],
            6: [1, 5],
        }
    
        topological_sort = topological_sort_of(graph)
        print(topological_sort)
    

    The function dfs (depth-first search) builds a stack of vertices ordered by finishing time. A vertex finishes once all of its neighbours have been fully explored, that is, once no unvisited neighbour remains to explore from it. The first vertex pushed onto the stack is the first one to finish, and the last vertex pushed is the last one to finish. Because each vertex is pushed on the left, the stack reads from last-finished to first-finished.

    Read from left to right, the stack is then a topological order of the graph.

    Using a Python set for visited gives constant-time membership checks on average, and using a deque as the stack gives constant-time insertion on the left as well.

    The high-level idea was inspired by CLRS [1].

    [1] Cormen, Thomas H., et al. Introduction to Algorithms. MIT Press, 2009.
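
    One way to sanity-check the result is to confirm that every edge points forward in the returned order. The sketch below repeats the two functions from this answer and adds is_topological_order, a hypothetical helper that is not part of the original answer. It assumes every vertex appears as a key of graph, as in the example.

```python
from collections import deque


def dfs(graph, source, stack, visited):
    visited.add(source)
    for neighbour in graph[source]:
        if neighbour not in visited:
            dfs(graph, neighbour, stack, visited)
    stack.appendleft(source)  # pushed only after all neighbours are done


def topological_sort_of(graph):
    stack = deque()
    visited = set()
    for vertex in graph:
        if vertex not in visited:
            dfs(graph, vertex, stack, visited)
    return stack


def is_topological_order(graph, order):
    """Hypothetical helper: True if order lists every vertex exactly once
    and every edge u -> v has u placed before v."""
    order = list(order)
    position = {vertex: index for index, vertex in enumerate(order)}
    if len(order) != len(graph) or set(position) != set(graph):
        return False
    return all(position[u] < position[v] for u in graph for v in graph[u])


graph = {
    0: [1, 2],
    1: [2, 5],
    2: [3],
    3: [],
    4: [],
    5: [3, 4],
    6: [1, 5],
}

order = topological_sort_of(graph)
print(is_topological_order(graph, order))
```

    A check like this is handy when comparing the different implementations in this thread, since a graph usually has several valid topological orders and the functions need not agree on which one they return.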

  • 2020-12-08 08:21

    My idea is based on two key observations:

    1. Don't pop the next item from the stack; keep it there to emulate stack unwinding.
    2. Instead of pushing all children onto the stack, push just one.

    Both of these help us traverse the graph exactly like recursive DFS. As the other answer here noted, this is important for this particular problem. The rest should be easy.

    def iterative_topological_sort(graph, start, path=set()):
        q = [start]
        ans = []
        while q:
            v = q[-1]                     # item 1: just access, don't pop
            path = path.union({v})        # rebinds path, so the default set is never mutated
            children = [x for x in graph[v] if x not in path]
            if not children:              # no child, or all of them already visited
                ans = [v] + ans
                q.pop()
            else:
                q.append(children[0])     # item 2: push just one child
    
        return ans
    

    q here is our stack. In the main loop, we access the current node v at the top of the stack. We access it rather than pop it because we need to be able to come back to this node. We then find all unvisited children of the current node and push only the first one onto the stack (q.append(children[0])), not all of them together. Again, this is precisely what recursive DFS does.

    If no eligible child is found (if not children), we have visited the entire subtree under v, so it is ready to be added to ans. This is when we really pop it.

    (Needless to say, this is not great performance-wise. Instead of building the full list of unvisited children in the children variable, we should generate only the first one, generator style, perhaps using filter. We should also replace ans = [v] + ans with ans.append(v) and reverse ans once at the end. These improvements are omitted because of the OP's insistence on simplicity.)
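
    The performance changes mentioned in the parenthetical note could look roughly like the sketch below. It is only one possible reading of that note: the function name, the seen set and the finished list are hypothetical, and keeping one iterator per stack entry is a swapped-in way to fetch just the next unvisited child instead of rebuilding the whole list. It assumes the graph is acyclic and that every node is a key of graph.

```python
def iterative_topological_sort_lazy(graph, start):
    seen = {start}
    # Each stack entry pairs a node with an iterator over its children,
    # so we resume scanning where we left off instead of starting over.
    stack = [(start, iter(graph[start]))]
    finished = []
    while stack:
        node, children = stack[-1]        # peek, don't pop
        for child in children:
            if child not in seen:
                seen.add(child)
                stack.append((child, iter(graph[child])))
                break                     # push just one child
        else:
            # No unvisited child left: the subtree under node is done.
            finished.append(node)         # O(1) append instead of [v] + ans
            stack.pop()
    finished.reverse()                    # reverse once at the end
    return finished


graph = {
    0: [1, 2],
    1: [2, 5],
    2: [3],
    3: [],
    4: [],
    5: [3, 4],
    6: [1, 5],
}

print(iterative_topological_sort_lazy(graph, 6))
```

    Each edge is examined once here, whereas the simple version rescans all children of a node every time it returns to the top of the stack.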

  • 2020-12-08 08:23

    It's not easy to turn an iterative implementation of DFS into a topological sort, since the change that needs to be made is more natural with a recursive implementation. You can still do it, but it requires that you implement your own stack.

    First off, here's a slightly improved version of your code (it's much more efficient and not much more complicated):

    def iterative_dfs_improved(graph, start):
        seen = set()  # efficient set to look up nodes in
        path = []     # there was no good reason for this to be an argument in your code
        q = [start]
        while q:
            v = q.pop()   # no reason not to pop from the end, where it's fast
            if v not in seen:
                seen.add(v)
                path.append(v)
                q.extend(graph[v]) # this will add the nodes in a slightly different order
                                   # if you want the same order, use reversed(graph[v])
    
        return path
    

    Here's how I'd modify that code to do a topological sort:

    def iterative_topological_sort(graph, start):
        seen = set()
        stack = []    # path variable is gone, stack and order are new
        order = []    # order will be in reverse order at first
        q = [start]
        while q:
            v = q.pop()
            if v not in seen:
                seen.add(v) # no need to append to path any more
                q.extend(graph[v])
    
                while stack and v not in graph[stack[-1]]: # new stuff here!
                    order.append(stack.pop())
                stack.append(v)
    
        return stack + order[::-1]   # new return value!
    

    The part I commented with "new stuff here" is the part that figures out the order as you move up the stack. It checks if the new node that's been found is a child of the previous node (which is on the top of the stack). If not, it pops the top of the stack and adds the value to order. While we're doing the DFS, order will be in reverse topological order, starting from the last values. We reverse it at the end of the function, and concatenate it with the remaining values on the stack (which conveniently are already in the correct order).

    Because this code needs to check v not in graph[stack[-1]] many times, it will be much more efficient if the values in the graph dictionary are sets rather than lists (a membership test on a list takes time proportional to its length). A graph usually doesn't care about the order its edges are saved in, so making such a change shouldn't cause problems with most other algorithms, though code that produces or updates the graph might need fixing. If you ever intend to extend your graph code to support weighted graphs, you'll probably end up changing the lists to dictionaries mapping from node to weight anyway, and that would work just as well for this code (dictionary lookups are O(1) on average, just like set lookups). Alternatively, we could build the sets we need ourselves, if graph can't be modified directly.
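
    As a rough sketch of that last alternative, the variant below leaves graph untouched and builds a separate dictionary of sets for the membership test. The function name and the children_of dictionary are hypothetical. Traversal still uses the original lists, so the visiting order does not depend on set iteration order.

```python
def iterative_topological_sort_with_sets(graph, start):
    # Assumption: graph maps every node to a list of its children.
    # The sets are used only for the "is v a child of the stack top?" test.
    children_of = {node: set(children) for node, children in graph.items()}
    seen = set()
    stack = []
    order = []
    q = [start]
    while q:
        v = q.pop()
        if v not in seen:
            seen.add(v)
            q.extend(graph[v])            # traverse using the original lists
            while stack and v not in children_of[stack[-1]]:
                order.append(stack.pop())
            stack.append(v)
    return stack + order[::-1]


graph = {
    0: [1, 2],
    1: [2, 5],
    2: [3],
    3: [],
    4: [],
    5: [3, 4],
    6: [1, 5],
}

print(iterative_topological_sort_with_sets(graph, 6))
```

    Building children_of costs one pass over all edges, which is usually small compared with repeated linear scans of long adjacency lists.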

    For reference, here's a recursive version of DFS, and a modification of it to do a topological sort. The modification needed is very small indeed:

    def recursive_dfs(graph, node):
        result = []
        seen = set()
    
        def recursive_helper(node):
            for neighbor in graph[node]:
                if neighbor not in seen:
                    result.append(neighbor)     # this line will be replaced below
                    seen.add(neighbor)
                    recursive_helper(neighbor)
    
        recursive_helper(node)
        return result
    
    def recursive_topological_sort(graph, node):
        result = []
        seen = set()
    
        def recursive_helper(node):
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    recursive_helper(neighbor)
            result.insert(0, node)              # this line replaces the result.append line
    
        recursive_helper(node)
        return result
    

    That's it! One line gets removed and a similar one gets added at a different location. If you care about performance, you should probably do result.append in the second helper function too, and do return result[::-1] in the top level recursive_topological_sort function. But using insert(0, ...) is a more minimal change.
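
    For illustration, the append-then-reverse variant mentioned above might be sketched as follows. The function name is hypothetical; everything else mirrors recursive_topological_sort.

```python
def recursive_topological_sort_reversed(graph, node):
    result = []
    seen = set()

    def recursive_helper(node):
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                recursive_helper(neighbor)
        result.append(node)               # O(1), unlike result.insert(0, node)

    recursive_helper(node)
    return result[::-1]                   # reverse once at the top level


graph = {
    0: [1, 2],
    1: [2, 5],
    2: [3],
    3: [],
    4: [],
    5: [3, 4],
    6: [1, 5],
}

print(recursive_topological_sort_reversed(graph, 6))
```

    Inserting at the front of a list shifts every existing element, so the insert(0, ...) version does quadratic work in the worst case, while this one stays linear in the size of the result.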

    It's also worth noting that if you want a topological order of the whole graph, you shouldn't need to specify a starting node. Indeed, there may not be a single node that lets you traverse the entire graph, so you may need several traversals to reach everything. An easy way to make that happen in the iterative topological sort is to initialize q to list(graph) (a list of all the graph's keys) instead of a list with only a single starting node. For the recursive version, replace the call to recursive_helper(node) with a loop over every node in the graph that adds each node not yet in seen to seen and then calls the helper function on it. Marking the node first matters because the helper only marks neighbors, so an unmarked starting node could be reached again from a later one and inserted twice.
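
    A sketch of both whole-graph variants, under the assumption that the graph is acyclic and every node is a key of graph. The two function names are hypothetical. In the recursive version each starting node is added to seen before the helper is called, since the helper itself only marks neighbors.

```python
def iterative_topological_sort_all(graph):
    seen = set()
    stack = []
    order = []
    q = list(graph)                       # start from every node, not just one
    while q:
        v = q.pop()
        if v not in seen:
            seen.add(v)
            q.extend(graph[v])
            while stack and v not in graph[stack[-1]]:
                order.append(stack.pop())
            stack.append(v)
    return stack + order[::-1]


def recursive_topological_sort_all(graph):
    result = []
    seen = set()

    def recursive_helper(node):
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                recursive_helper(neighbor)
        result.insert(0, node)

    for node in graph:
        if node not in seen:
            seen.add(node)                # mark the starting node as well
            recursive_helper(node)
    return result


graph = {
    0: [1, 2],
    1: [2, 5],
    2: [3],
    3: [],
    4: [],
    5: [3, 4],
    6: [1, 5],
}

print(iterative_topological_sort_all(graph))
print(recursive_topological_sort_all(graph))
```

    The two functions may return different orders for the same graph. Both are valid as long as every edge points from an earlier node to a later one.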
