Efficient algorithm to find all the paths from A to Z?

失恋的感觉 2020-12-23 15:01

With a set of random inputs like this (20k lines):

A B
U Z
B A
A C
Z A
K Z
A Q
D A
U K
P U
U P
B Y
Y R
Y U
C R
R Q
A D
Q Z

Find all the pat

4 answers
  • 2020-12-23 15:23

    Your data is essentially an adjacency list, which lets you construct a tree rooted at the node corresponding to A. To obtain all the paths between A and Z, you can run any tree traversal algorithm.

    Of course, when you're building the tree you have to ensure that you don't introduce cycles.
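    As a rough illustration of this idea, the sketch below parses the sample lines from the question into an adjacency map and then walks the implicit tree of partial paths with an explicit stack, skipping any neighbor already on the path. It assumes each line is a directed edge "source target"; the names `build_adjacency` and `iter_paths` are made up for this example.

```python
from collections import defaultdict

EDGES_TEXT = """A B
U Z
B A
A C
Z A
K Z
A Q
D A
U K
P U
U P
B Y
Y R
Y U
C R
R Q
A D
Q Z"""


def build_adjacency(text):
    """Map each node to the list of nodes it points to (assumption: directed edges)."""
    adjacency = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            source, target = parts
            adjacency[source].append(target)
    return adjacency


def iter_paths(adjacency, start, goal):
    """Lazily yield every cycle-free path from start to goal."""
    stack = [(start,)]  # each stack entry is one partial path (one tree node)
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            yield list(path)
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in path:  # never revisit a node on this path
                stack.append(path + (neighbor,))


adjacency = build_adjacency(EDGES_TEXT)
paths = list(iter_paths(adjacency, "A", "Z"))
for path in paths:
    print(" -> ".join(path))
```

    Because the paths are yielded one at a time, the caller can stop early or stream them to disk instead of holding all of them in memory.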

  • 2020-12-23 15:38

    I would proceed recursively where I would build a list of all possible paths between all pairs of nodes.

    I would start by building, for all pairs (X, Y), the list L_2(X, Y), which is the list of paths of length 2 that go from X to Y (length is counted in nodes here, so a path of length 2 is a single edge); that's trivial to build since that's the input list you are given.

    Then I would build the lists L_3(X, Y), recursively, using the known lists L_2(X, Z) and L_2(Z, Y), looping over Z. For example, for (C, Q), you have to try all Z in L_2(C, Z) and L_2(Z, Q); in this case Z can only be R, and you get L_3(C, Q) = {C -> R -> Q}. For other pairs, you might have an empty L_3(X, Y), or there could be many paths of length 3 from X to Y. However, you have to be careful when building the paths, since some of them must be rejected because they have cycles. If a path contains the same node twice, it is rejected.

    Then you build L_4(X, Y) for all pairs by combining all paths L_2(X, Z) and L_3(Z, Y) while looping over all possible values for Z. You still remove paths with cycles.

    And so on... until you get to L_17576(X, Y).

    One worry with this method is that you might run out of memory to store those lists. Note however that after having computed the L_4's, you can get rid of the L_3's, etc. Of course you don't want to delete L_3(A, Z) since those paths are valid paths from A to Z.
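    A minimal sketch of this level-by-level construction, run on the sample edges from the question, might look as follows. It assumes directed edges, stores each level in a dictionary keyed by the pair (X, Y), and keeps only the A-to-Z paths once a level has been used; `first_level`, `next_level` and `all_paths` are hypothetical names.

```python
from collections import defaultdict

EDGES = [tuple(p.split()) for p in
         "A B|U Z|B A|A C|Z A|K Z|A Q|D A|U K|P U|U P|B Y|Y R|Y U|C R|R Q|A D|Q Z".split("|")]


def first_level(edges):
    """L_2: every edge is a two-node path, keyed by its (start, end) pair."""
    level = defaultdict(set)
    for source, target in edges:
        if source != target:
            level[(source, target)].add((source, target))
    return level


def next_level(edges, level):
    """Build L_(k+1) by prepending one edge X -> Z to every path in L_k(Z, Y)."""
    by_start = defaultdict(list)
    for (start, _end), paths in level.items():
        by_start[start].extend(paths)
    longer = defaultdict(set)
    for source, mid in set(edges):
        for path in by_start.get(mid, ()):
            if source not in path:  # reject paths that would contain a node twice
                longer[(source, path[-1])].add((source,) + path)
    return longer


def all_paths(edges, start, goal):
    """Collect L_k(start, goal) for every k, keeping only one level in memory."""
    found = []
    level = first_level(edges)
    while level:  # cycle-free paths cannot grow forever, so this terminates
        found.extend(sorted(level.get((start, goal), ())))
        level = next_level(edges, level)  # the previous level is dropped here
    return found


paths = all_paths(EDGES, "A", "Z")
for path in paths:
    print(" -> ".join(path))
```

    Even with only one level alive at a time, a single level can hold a very large number of paths on a dense graph, which is the memory concern raised above.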

    Implementation detail: you could put L_3(X, Y) in a 17576 x 17576 array, where the element at (X, Y) is some structure that stores all paths between X and Y. However, if most elements are empty (no paths), you could instead use a HashMap<Pair, Set<Path>>, where Pair is just some object that stores (X, Y). It's not clear to me whether most elements of L_3(X, Y) are empty, and if so, whether that is also the case for L_4334(X, Y).
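    To put rough numbers on the dense-array option, the small calculation below assumes three-letter node names (which is where 17576 = 26^3 comes from) and treats the "20k lines" from the question as an upper bound on the number of distinct edges. Under those assumptions at most about 0.0065% of the cells of L_2 can be non-empty; it says nothing about how full the later levels get.

```python
NAMES = 26 ** 3               # three-letter names AAA..ZZZ (assumption)
DENSE_CELLS = NAMES * NAMES   # one cell per ordered pair (X, Y)
INPUT_EDGES = 20_000          # "20k lines" from the question, used as an upper bound

# Each input line can fill at most one cell of L_2.
max_fill_ratio = INPUT_EDGES / DENSE_CELLS

print(f"cells in the dense array: {DENSE_CELLS:,}")
print(f"largest possible fill ratio of L_2: {max_fill_ratio:.4%}")
```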

    Thanks to @Lie Ryan for pointing out this identical question on mathoverflow. My solution is basically the one by MRA; Huang claims it's not valid, but by removing the paths with duplicate nodes, I think my solution is fine.

    I guess my solution needs fewer computations than the brute-force approach, but it requires more memory. So much so that I'm not even sure it is feasible on a computer with a reasonable amount of memory.

  • 2020-12-23 15:45

    What you're proposing is a scheme for DFS, only with backtracking. It's correct, unless you want to permit cyclic paths (you didn't specify whether you do).

    There are two gotchas, though.

    1. You have to keep track of the nodes you have already visited on the current path (to eliminate cycles).
    2. You have to know how to select the next node when backtracking, so that you don't descend into the same subtree of the graph when you have already visited it on the current path.

    The pseudocode is more or less as follows:

    getPaths(A, current_path) :
        if (A is destination node): return [current_path]
        result = []
        for B = next-not-visited-neighbor(A) :
            if (not B already on current path)
                result = result + getPaths(B, current_path + B)
        return result

    list_of_paths = getPaths(A, [A])
    

    which is almost what you said.

    Be careful though, as finding all paths in complete graph is pretty time and memory consuming.

    Edit: for clarification, the algorithm takes at least factorial time in the worst case, Ω((n-2)!), because in a complete graph of size n it has to list all paths from one vertex to another, and there are at least (n-2)! paths of the form <A, permutation of all nodes except A and Z, Z>. There is no way to do better when merely listing the result takes that long.
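    The growth can be checked on small cases. The sketch below counts the cycle-free paths between two fixed nodes of a complete graph on n nodes by brute force (every ordered selection of intermediate nodes is a valid path there) and compares the count with the sum of (n-2)!/(n-2-k)! over k = 0..n-2, whose last term is the (n-2)! mentioned above. The function names are illustrative only.

```python
from itertools import permutations
from math import factorial


def count_paths_complete(n):
    """Count cycle-free paths between two fixed nodes of the complete graph on n nodes."""
    inner = range(n - 2)  # the nodes other than the two endpoints
    # A path is determined by an ordered choice of k distinct intermediate nodes.
    return sum(1 for k in range(n - 1) for _ in permutations(inner, k))


def closed_form(n):
    """Sum over k of the number of ordered k-subsets of the n-2 intermediate nodes."""
    return sum(factorial(n - 2) // factorial(n - 2 - k) for k in range(n - 1))


for n in range(2, 9):
    print(n, count_paths_complete(n), closed_form(n), factorial(n - 2))
```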

  • 2020-12-23 15:48

    As I understand your question, Dijkstra's algorithm cannot be applied as is, since the shortest-path problem by definition finds a single path out of the set of all possible paths. Your task is to find all paths as such.

    Many optimizations of Dijkstra's algorithm involve cutting off search trees with higher costs. You won't be able to cut off those parts in your search, as you need all findings.

    And I assume you mean all paths excluding cycles.

    Algorithm:

    • Load the network into a two-dimensional 26x26 array of boolean/integer, fromTo[i,j]. Set 1/true for each existing link.

    • Starting from the first node, trace all following nodes (search the links for 1/true).

    • Keep visited nodes in some structure (array/list). Since the maximal depth seems to be 26, this should be possible via recursion.

    • And as @soulcheck has written below, you may think about cutting off paths you have already visited. You may keep a list of paths towards the destination in each element of the array. Adjust the breaking condition accordingly.

    • Break when

      • visiting the end node (store the result)
      • visiting a node that has been visited before (cycle)
      • visiting a node for which you have already found all paths to the destination; in that case, merge your current path with all the existing ones from that node.
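    The first two break conditions can be sketched with static structures as follows, assuming single-letter node names A..Z and directed links. The third condition (reusing paths already found from a node) is deliberately left out of this sketch: a stored continuation may pass through a node that is already on the current path, so it cannot be merged blindly. `build_matrix` and `trace` are hypothetical names.

```python
import string

NODES = string.ascii_uppercase  # assumption: node names are the letters A..Z
INDEX = {name: i for i, name in enumerate(NODES)}
EDGES = [tuple(p.split()) for p in
         "A B|U Z|B A|A C|Z A|K Z|A Q|D A|U K|P U|U P|B Y|Y R|Y U|C R|R Q|A D|Q Z".split("|")]


def build_matrix(edges):
    """from_to[i][j] is True when there is a link from node i to node j."""
    from_to = [[False] * 26 for _ in range(26)]
    for source, target in edges:
        from_to[INDEX[source]][INDEX[target]] = True
    return from_to


def trace(from_to, node, goal, visited, path, results):
    """Recursive search; depth is bounded by 26 because no node repeats."""
    if node == goal:  # break: end node reached, store the result
        results.append("".join(NODES[i] for i in path))
        return
    for nxt in range(26):
        if from_to[node][nxt] and not visited[nxt]:  # break: skip visited nodes
            visited[nxt] = True
            path.append(nxt)
            trace(from_to, nxt, goal, visited, path, results)
            path.pop()
            visited[nxt] = False  # free the node for other paths


from_to = build_matrix(EDGES)
visited = [False] * 26
visited[INDEX["A"]] = True
results = []
trace(from_to, INDEX["A"], INDEX["Z"], visited, [INDEX["A"]], results)
print(results)
```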

    Performance-wise, I'd vote against using hashmaps and lists and prefer static structures.

    Hmm, while re-reading the question, I realized that the node names cannot be limited to A-Z. You mention 20k lines; with 26 letters, a fully connected A-Z network would require far fewer links. Maybe you should skip recursion and static structures :)

    OK, with valid names from AAA to ZZZ an array would become far too large, so you had better create a dynamic structure for the network as well. Counter-question: regarding performance, what is the best data structure for a sparsely populated array such as my algorithm would require? I'd vote for a two-dimensional ArrayList. Anyone?
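    One way to read the "2 dim ArrayList" idea is a list of successor lists indexed by a small integer id per node name, so that memory grows with the number of links rather than with the square of the number of possible names. The sketch below is only an illustration of that layout; the class `SparseGraph` and its methods are invented for this example, and whether it is the fastest choice is not settled here.

```python
class SparseGraph:
    """Adjacency stored as a list of lists, with node names mapped to integer ids."""

    def __init__(self):
        self.ids = {}          # name -> id, assigned in order of first appearance
        self.names = []        # id -> name
        self.successors = []   # successors[i] = ids of nodes linked from node i

    def node_id(self, name):
        """Return the id for a name, creating a new row on first sight."""
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
            self.successors.append([])
        return self.ids[name]

    def add_edge(self, source, target):
        s, t = self.node_id(source), self.node_id(target)
        if t not in self.successors[s]:  # ignore repeated input lines
            self.successors[s].append(t)

    def neighbors(self, name):
        return [self.names[t] for t in self.successors[self.ids[name]]]


graph = SparseGraph()
for source, target in [("AAA", "ABC"), ("AAA", "ZZZ"), ("ABC", "ZZZ"), ("AAA", "ABC")]:
    graph.add_edge(source, target)
print(graph.neighbors("AAA"))
```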
