Time complexity of this algorithm: Word Ladder

Question:

Given two words (beginWord and endWord), and a dictionary's word list, find all shortest transformation sequence(s) from beginWord to endWord, such that:

1) Only one letter can be changed at a time.

2) Each transformed word must exist in the word list. Note that beginWord is not a transformed word.

Example 1:

Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]

Output: [ ["hit","hot","dot","dog","cog"], ["hit","hot","lot","log","cog"] ]

My solution is based on this idea, but how do I analyze the time and space complexity of this solution?

1) Perform a BFS starting at beginWord: replace each letter of the current word with each of the other 25 letters and check whether the transformed word is in wordList; if so, add it to the queue.

2) During the BFS, maintain a graph of {word: nextWords} holding all valid next words for each word.

3) When the BFS reaches endWord, do a backtracking DFS (pre-order traversal) on the graph to get all paths.

import collections


class Solution:
    def findLadders(self, beginWord, endWord, wordList):
        """
        :type beginWord: str
        :type endWord: str
        :type wordList: List[str]
        :rtype: List[List[str]]
        """
        # Include beginWord so the per-level removal below never raises KeyError.
        wordListSet = set(wordList + [beginWord])
        graph = collections.defaultdict(list)
        q = {beginWord}
        result = []
        while q:
            newQ = set()
            # Remove the whole current level first, so edges only lead to deeper levels.
            for word in q:
                wordListSet.remove(word)
            for word in q:
                if word == endWord:
                    self.getAllPaths(graph, beginWord, endWord, result, [])
                    return result
                for i in range(len(word)):
                    for sub in 'abcdefghijklmnopqrstuvwxyz':
                        if sub != word[i]:
                            newWord = word[:i] + sub + word[i+1:]
                            if newWord in wordListSet:
                                graph[word].append(newWord)
                                newQ.add(newWord)
            q = newQ
        return []

    def getAllPaths(self, graph, node, target, result, output):
        # Backtracking pre-order DFS on the layered DAG built by the BFS.
        output.append(node)
        if node == target:
            result.append(output[:])
        else:
            for child in graph[node]:
                self.getAllPaths(graph, child, target, result, output)
                output.pop()  # undo the child's append before trying the next child

I have a hard time coming up with the time and space complexity of it. My contention:

Time: O(26*L*N + N), where L is the average word length and N is the number of words in wordList. In the worst case every word in the list is reached, and each of them is expanded into 26 * L candidate words. The DFS part is just O(N). So asymptotically it's just O(L*N).

Space: O(N)

Answer 1:

You won't find all simple paths because there might be alternative shortest paths to the end word. The simplest counterexample is as follows:

beginWord = "aa"
endWord = "bb"
wordList = ["aa", "ab", "ba", "bb"]

Your algorithm would miss the path aa -> ba -> bb. In fact, it will always find at most one path.
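
Whether the second path is missed depends on when words are marked as visited, and that can be checked by running the example. The sketch below is a minimal re-implementation, assuming a whole BFS level is removed from the word set before that level is expanded; `find_ladders` and `collect_paths` are hypothetical names used only for this illustration.

```python
import collections


def find_ladders(begin_word, end_word, word_list):
    """Level-by-level BFS that records every edge into the next level."""
    remaining = set(word_list) | {begin_word}
    graph = collections.defaultdict(list)
    level = {begin_word}
    while level:
        # Drop the whole level at once; words on the next level stay
        # available to every word on the current level.
        remaining.difference_update(level)
        if end_word in level:
            paths = []
            collect_paths(graph, begin_word, end_word, [], paths)
            return paths
        next_level = set()
        for word in level:
            for i in range(len(word)):
                for sub in "abcdefghijklmnopqrstuvwxyz":
                    candidate = word[:i] + sub + word[i + 1:]
                    if candidate in remaining:
                        graph[word].append(candidate)
                        next_level.add(candidate)
        level = next_level
    return []


def collect_paths(graph, node, target, prefix, paths):
    """Backtracking DFS over the layered graph built by the BFS."""
    prefix.append(node)
    if node == target:
        paths.append(prefix[:])
    else:
        for child in graph[node]:
            collect_paths(graph, child, target, prefix, paths)
    prefix.pop()  # restore the prefix for the caller


print(sorted(find_ladders("aa", "bb", ["aa", "ab", "ba", "bb"])))
```

Under that assumption the run lists both aa -> ab -> bb and aa -> ba -> bb. A variant that removes a word from the set as soon as it is first generated would record only one parent for bb and return a single path.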

The time complexity is indeed O(L * N) as you wrote, but the space complexity is O(L * N), which is the space that your graph or wordList occupies (N words of length L).

Answer 2:

This sounds like a fun problem. Yes, the answer is O(L * N). If you fixed your code to return all solutions, the recursive print routine is O(L!).
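
One way to see where a factorial can come from: if the word list holds every length-L string over two letters, each shortest ladder from "aa...a" to "bb...b" fixes the L positions in some order, so there should be L! of them. The sketch below counts shortest ladders (without listing them) to check this on small inputs; the function name `count_shortest_ladders` and the two-letter word list are assumptions made for the illustration.

```python
import itertools
import math


def count_shortest_ladders(begin_word, end_word, word_list):
    """Count shortest ladders with a level-by-level BFS, without listing them."""
    remaining = set(word_list) | {begin_word}
    ways = {begin_word: 1}  # number of shortest ladders reaching each word
    while ways:
        remaining.difference_update(ways)
        if end_word in ways:
            return ways[end_word]
        next_ways = {}
        for word, count in ways.items():
            for i in range(len(word)):
                for sub in "abcdefghijklmnopqrstuvwxyz":
                    candidate = word[:i] + sub + word[i + 1:]
                    if candidate in remaining:
                        next_ways[candidate] = next_ways.get(candidate, 0) + count
        ways = next_ways
    return 0


for length in range(1, 7):
    words = ["".join(p) for p in itertools.product("ab", repeat=length)]
    total = count_shortest_ladders("a" * length, "b" * length, words)
    # Columns: L, number of words (2^L), shortest ladders found, L!
    print(length, len(words), total, math.factorial(length))
```

Any routine that lists every ladder has to do at least as much work as there are ladders, so on inputs like this one it cannot stay within O(L * N).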

You have this outer loop over all nodes being considered, and the number of nodes can equal the length of your word list. Consider the fully connected set of three-letter combinations ['aaa', 'aab', ... 'zzz']. The node count is 26^3, or 17,576. Transforming from aaa to zzz has six answers: aaa->zaa->zza->zzz, aaa->zaa->zaz->zzz, aaa->aza->zza->zzz, etc. You would be considering all length-three paths, (25+25+25)(25+25)(25) or 93,750 paths, to be sure there wasn't a shorter path.
There are two candidates for the inner loop: for i in range(len(word)), and your recursive call to getAllPaths() to list all the paths. The first is of order length_of_word, implying O(L * N). Note that O(L * N * 26) means the same thing; big-O notation ignores constant factors. I haven't proved that you stay linear in the getAllPaths() loop.
This is a special case of Dijkstra's Shortest Path. You can do better by adding a heuristic specific to your problem. The total path length through a node is always greater than or equal to the distance so far plus the number of letters still wrong. That means, in the fully connected case with endWord bbb, you have aaa (0 length)->aab (1)->abb (2)->bbb (3), so you avoid exploring aaa (0 actual + 3 heuristic) -> aac (1 actual + 3 heuristic).
You can correct your code to return all the word ladders, and I did so here. The problem is that the recursive getAllPaths() routine now grows faster than O(L * N). In the code sample, an input has two sets of "path choices", or subgraphs, each of which multiplies the number of paths. So, tripling the number of nodes would triple the number of path choices, cubing the number of paths.
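
The heuristic idea above (distance so far plus the number of letters still wrong) can be sketched as an A* search. This is a minimal illustration under stated assumptions, not the code this answer links to: it returns only the length of one shortest ladder rather than all ladders, and `hamming` and `ladder_length_astar` are hypothetical names.

```python
import heapq


def hamming(a, b):
    """Number of positions where two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def ladder_length_astar(begin_word, end_word, word_list):
    """Return the number of steps in a shortest ladder, or -1 if none exists."""
    words = set(word_list)
    best = {begin_word: 0}  # fewest steps found so far to reach each word
    # Heap entries are (steps so far + letters still wrong, steps so far, word).
    heap = [(hamming(begin_word, end_word), 0, begin_word)]
    while heap:
        _, steps, word = heapq.heappop(heap)
        if word == end_word:
            return steps
        if steps > best.get(word, float("inf")):
            continue  # stale entry: a shorter route to this word was found later
        for i in range(len(word)):
            for sub in "abcdefghijklmnopqrstuvwxyz":
                candidate = word[:i] + sub + word[i + 1:]
                if candidate == word or candidate not in words:
                    continue
                new_steps = steps + 1
                if new_steps < best.get(candidate, float("inf")):
                    best[candidate] = new_steps
                    priority = new_steps + hamming(candidate, end_word)
                    heapq.heappush(heap, (priority, new_steps, candidate))
    return -1


print(ladder_length_astar("hit", "cog", ["hot", "dot", "dog", "lot", "log", "cog"]))
```

Each step changes exactly one letter, so the number of letters still wrong never overestimates the remaining distance, which is what lets the search stop at the first time endWord is popped. For the example from the question this prints 4.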

Answer 3:

The answer should be O(L^2 * N).

Generating the new words from one word costs O(L^2) in total. First, we loop over the positions of the current word, which costs O(L); then building each new string, newWord = word[:i] + sub + word[i+1:], costs another O(L).
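
The extra factor of L can be made visible by counting the characters copied while generating candidates. A small sketch, assuming the same slicing expression as in the question; `neighbor_candidates` is a hypothetical helper name.

```python
def neighbor_candidates(word):
    """Yield every string that differs from `word` in exactly one position."""
    for i in range(len(word)):
        for sub in "abcdefghijklmnopqrstuvwxyz":
            if sub != word[i]:
                # Slicing and concatenation copy all len(word) characters.
                yield word[:i] + sub + word[i + 1:]


for word in ("hit", "ladder"):
    candidates = list(neighbor_candidates(word))
    copied = sum(len(c) for c in candidates)
    # Columns: word, candidates generated (25 * L), characters copied (25 * L^2)
    print(word, len(candidates), copied)
```

For "hit" this gives 75 candidates and 225 copied characters, and for "ladder" 150 candidates and 900 copied characters: doubling L doubles the candidates but quadruples the characters copied.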

Source: https://stackoverflow.com/questions/53075364/time-complexity-of-this-algorithm-word-ladder

Tags

python

time-complexity

breadth-first-search