Can this breadth-first search be made faster?

野的像风 2021-02-09 23:28

I have a data set which is a large unweighted cyclic graph. The cycles occur in loops of about 5-6 paths. It consists of about 8000 nodes, and each node has from 1-6 (usually abou

4 Answers
  • 2021-02-09 23:33

    I'll bet that machine has more than one core, doesn't it? Run it in parallel.

    Python Threading
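    Because of the GIL, threads won't speed up pure-Python graph walking, but they do help when each neighbour lookup blocks on I/O, such as the per-node SQL query in the question. A minimal sketch of that idea (the `bfs_parallel` and `get_neighbours` names are illustrative, not from the question):

```python
from concurrent.futures import ThreadPoolExecutor

def bfs_parallel(start, get_neighbours, max_workers=4):
    """Level-synchronous BFS: fetch the neighbour lists of the whole
    frontier concurrently, then build the next frontier.
    Only pays off when get_neighbours does blocking I/O."""
    visited = {start}
    frontier = [start]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            next_frontier = []
            # one get_neighbours call per frontier node, run in the pool
            for neighbours in pool.map(get_neighbours, frontier):
                for n in neighbours:
                    if n not in visited:
                        visited.add(n)
                        next_frontier.append(n)
            frontier = next_frontier
    return visited
```

    With an in-memory dict the pool is pure overhead; the sketch only makes sense while the lookups are still remote calls.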

  • 2021-02-09 23:34

    Hmm, doesn't BFS involve marking nodes you've already seen so you don't visit them again?

  • 2021-02-09 23:37

    Something like this:

    #!/usr/bin/env python3
    
    from queue import Queue
    
    def traverse_path(fromNode, toNode, nodes):
        def getNeighbours(current, nodes):
            return nodes.get(current, [])
    
        def make_path(toNode, graph):
            # walk the parent links back to the root, then reverse
            result = []
            while toNode != 'Root':
                result.append(toNode)
                toNode = graph[toNode]
            result.reverse()
            return result
    
        q = Queue()
        q.put(fromNode)
        # graph maps each visited node to its BFS parent
        graph = {fromNode: 'Root'}
    
        while not q.empty():
            # get the next node and add its neighbours to the queue
            current = q.get()
            for neighbor in getNeighbours(current, nodes):
                # only continue with this neighbour if not already visited
                if neighbor not in graph:
                    graph[neighbor] = current
                    q.put(neighbor)
    
            # check if we reached the destination
            if current == toNode:
                return make_path(toNode, graph)
        return []
    
    if __name__ == '__main__':
        nodes = {
            'E1123': ['D111', 'D222', 'D333', 'D444'],
            'D111': ['C01', 'C02', 'C04'],
            'D222': ['C11', 'C03', 'C05'],
            'D333': ['C01'],
            'C02': ['B1'],
            'B1': ['A3455']
        }
        result = traverse_path('E1123', 'A3455', nodes)
        print(result)
    
    ['E1123', 'D111', 'C02', 'B1', 'A3455']
    
    

    If you replace your SQL queries with a dictionary of lists (and that would be the tricky part), you will get this kind of performance.

  • 2021-02-09 23:58

    Well, given the upvotes on the comment, I'll make it an answer now.

    The SQL in the tight loop is definitely slowing you down. I don't care how fast the call is. Think about it -- you're asking for a query to be parsed, a lookup to be run -- as fast as that is, it's still in a tight loop. What does your data set look like? Can you just SELECT the entire data set into memory, or at least work with it outside of MySQL?

    If you work with that data in memory, you will see a significant performance gain.
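    A minimal sketch of pulling everything in with one query, assuming a SQLite connection and an `edges(src, dst)` table (the table and column names are illustrative; the question's actual schema isn't shown):

```python
import sqlite3
from collections import defaultdict

def load_graph(conn):
    """Run one SELECT over the whole edge table and build an
    adjacency dict, so the BFS loop never touches the database."""
    graph = defaultdict(list)
    for src, dst in conn.execute("SELECT src, dst FROM edges"):
        graph[src].append(dst)
    return graph
```

    With roughly 8000 nodes and at most 6 edges each, the whole graph is tiny by in-memory standards, and every neighbour lookup becomes a dict access instead of a round trip to the database.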
