I have a data set which is a large unweighted cyclic graph. The cycles occur in loops of about 5-6 paths. It consists of about 8000 nodes and each node has from 1-6 (usually abou
I'll bet that machine has more than one core, doesn't it? Run it in parallel.
Python Threading
Hmm, doesn't BFS involve marking nodes you've already seen so you don't visit them again?
Something like this:
#!/usr/bin/env python
from queue import Queue

def traverse_path(fromNode, toNode, nodes):
    def getNeighbours(current, nodes):
        return nodes.get(current, [])

    def make_path(toNode, graph):
        # walk the parent pointers back to the root, then reverse
        result = []
        while 'Root' != toNode:
            result.append(toNode)
            toNode = graph[toNode]
        result.reverse()
        return result

    q = Queue()
    q.put(fromNode)
    # graph maps each visited node to its parent; it doubles as the visited set
    graph = {fromNode: 'Root'}

    while not q.empty():
        # get the next node and add its neighbours to the queue
        current = q.get()
        for neighbor in getNeighbours(current, nodes):
            # only continue if not already visited
            if neighbor not in graph:
                graph[neighbor] = current
                q.put(neighbor)
        # check if we have reached the destination
        if current == toNode:
            return make_path(toNode, graph)
    return []

if __name__ == '__main__':
    nodes = {
        'E1123': ['D111', 'D222', 'D333', 'D444'],
        'D111': ['C01', 'C02', 'C04'],
        'D222': ['C11', 'C03', 'C05'],
        'D333': ['C01'],
        'C02': ['B1'],
        'B1': ['A3455']
    }
    result = traverse_path('E1123', 'A3455', nodes)
    print(result)

This prints:

['E1123', 'D111', 'C02', 'B1', 'A3455']
If you replace your SQL queries with a dictionary of lists (and that would be the tricky part), you should see this kind of performance.
Well, given the upvotes on the comment, I'll make it an answer now.
The SQL in the tight loop is definitely slowing you down. I don't care how fast the call is. Think about it -- you're asking for a query to be parsed and a lookup to be run -- as fast as that is, it's still inside a tight loop. What does your data set look like? Can you just SELECT the entire data set into memory, or at least work with it outside of MySQL?
If you work with that data in memory, you will see a significant performance gain.
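As a minimal sketch of what "work with it in memory" could look like: assuming your edges live in a table with `src` and `dst` columns (a hypothetical schema -- adjust the names to your own), you can pull the whole thing into a dictionary of lists with one query up front, then run the BFS above against that dict. sqlite3 stands in for MySQL here just to keep the example self-contained.

```python
import sqlite3
from collections import defaultdict

# Stand-in for your real database; the `edges(src, dst)` schema is assumed.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE edges (src TEXT, dst TEXT)')
conn.executemany('INSERT INTO edges VALUES (?, ?)',
                 [('E1123', 'D111'), ('D111', 'C02'),
                  ('C02', 'B1'), ('B1', 'A3455')])

# One SELECT instead of one query per node inside the traversal loop.
nodes = defaultdict(list)
for src, dst in conn.execute('SELECT src, dst FROM edges'):
    nodes[src].append(dst)

print(dict(nodes))
# {'E1123': ['D111'], 'D111': ['C02'], 'C02': ['B1'], 'B1': ['A3455']}
```

For 8000 nodes with at most 6 neighbours each, that dictionary is tiny -- it fits in memory many times over, and every neighbour lookup becomes a hash-table access instead of a round trip to the database.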