Implementing a simple Trie for efficient Levenshtein Distance calculation - Java

不思量自难忘° 2020-12-22 18:51

UPDATE 3

Done. Below is the code that finally passed all of my tests. Again, this is modeled after Murilo Vasconcelo's modified version of Steve

11 answers
  •  隐瞒了意图╮
    2020-12-22 19:37

    In many ways, Steve Hanov's algorithm (presented in the first article linked in the question, Fast and Easy Levenshtein distance using a Trie), the ports of it written by Murilo and by you (the OP), and quite possibly every pertinent algorithm built on a trie or similar structure work much like a Levenshtein automaton (which has been mentioned several times here) does:

    Given:
           dict is a dictionary represented as a DFA (ex. trie or dawg)
           dictState is a state in dict
           dictStartState is the start state in dict
           dictAcceptState is a dictState arrived at after following the transitions defined by a word in dict
           editDistance is an edit distance
           laWord is a word
           la is a Levenshtein Automaton defined for laWord and editDistance
           laState is a state in la
           laStartState is the start state in la
           laAcceptState is a laState arrived at after following the transitions defined by a word that is within editDistance of laWord
           charSequence is a sequence of chars
           traversalDataStack is a stack of (dictState, laState, charSequence) tuples
           resultSet is the set of words in dict found to be within editDistance of laWord
    
    Define dictState as dictStartState
    Define laState as laStartState
    Push (dictState, laState, "") on to traversalDataStack
    While traversalDataStack is not empty
        Define currentTraversalDataTuple as the product of a pop of traversalDataStack
        Define currentDictState as the dictState in currentTraversalDataTuple
        Define currentLAState as the laState in currentTraversalDataTuple
        Define currentCharSequence as the charSequence in currentTraversalDataTuple
        For each char in alphabet
            Check if currentDictState has outgoing transition labeled by char
            Check if currentLAState has outgoing transition labeled by char
            If both currentDictState and currentLAState have outgoing transitions labeled by char
                Define newDictState as the state arrived at after following the outgoing transition of currentDictState labeled by char
                Define newLAState as the state arrived at after following the outgoing transition of currentLAState labeled by char
                Define newCharSequence as concatenation of currentCharSequence and char
                Push (newDictState, newLAState, newCharSequence) on to traversalDataStack
                If newDictState is a dictAcceptState, and if newLAState is a laAcceptState
                    Add newCharSequence to resultSet
                endIf
            endIf
        endFor
    endWhile
    

    Steve Hanov's algorithm and its aforementioned derivatives use a Levenshtein distance computation matrix in place of a formal Levenshtein automaton. That is pretty fast, but a formal Levenshtein automaton can have its parametric states (abstract states which describe the concrete states of the automaton) generated and used for traversal, bypassing any edit-distance-related runtime computation whatsoever. So it should run even faster than the aforementioned algorithms.
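For what it's worth, here is a minimal Java sketch of that correspondence. It is not Steve Hanov's, Murilo's, or the OP's code. It follows the stack-based traversal from the pseudocode, with a trie as the dictionary and one row of the Levenshtein matrix standing in for laState. All names (`TrieFuzzySearch`, `Node`, `Frame`, `step`, `isAlive`) are made up for illustration. A row is treated as having an outgoing transition when some entry is still within the allowed distance, and as accepting when its last entry is.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TrieFuzzySearch {

    /** Trie node; plays the role of dictState. */
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true when this node is a dictAcceptState
    }

    /** One stack entry: (dictState, laState, charSequence). */
    static final class Frame {
        final Node node;
        final int[] row;     // DP row standing in for laState
        final String prefix; // chars consumed so far
        Frame(Node node, int[] row, String prefix) {
            this.node = node;
            this.row = row;
            this.prefix = prefix;
        }
    }

    static Node buildTrie(Iterable<String> words) {
        Node root = new Node();
        for (String word : words) {
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.isWord = true;
        }
        return root;
    }

    /** Transition of the implicit automaton: the next DP row after reading c. */
    static int[] step(int[] prev, String target, char c) {
        int[] next = new int[prev.length];
        next[0] = prev[0] + 1;
        for (int i = 1; i < next.length; i++) {
            int cost = target.charAt(i - 1) == c ? 0 : 1;
            next[i] = Math.min(Math.min(next[i - 1] + 1, prev[i] + 1), prev[i - 1] + cost);
        }
        return next;
    }

    /** A row is worth following only if some entry is still within the limit. */
    static boolean isAlive(int[] row, int maxDistance) {
        for (int value : row) {
            if (value <= maxDistance) return true;
        }
        return false;
    }

    static Set<String> search(Node root, String target, int maxDistance) {
        Set<String> results = new TreeSet<>();
        int[] start = new int[target.length() + 1];
        for (int i = 0; i < start.length; i++) start[i] = i;

        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(root, start, ""));
        while (!stack.isEmpty()) {
            Frame frame = stack.pop();
            // Acceptance is checked on pop (not on push, as in the pseudocode)
            // so that the start state is covered as well.
            if (frame.node.isWord && frame.row[target.length()] <= maxDistance) {
                results.add(frame.prefix);
            }
            // Only chars with a dictionary transition need to be tried.
            for (Map.Entry<Character, Node> edge : frame.node.children.entrySet()) {
                int[] next = step(frame.row, target, edge.getKey());
                if (isAlive(next, maxDistance)) {
                    stack.push(new Frame(edge.getValue(), next, frame.prefix + edge.getKey()));
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Node root = buildTrie(List.of("cat", "cart", "car", "dog", "cats", "at"));
        System.out.println(search(root, "cat", 1));
    }
}
```

Every call to `step` costs time proportional to the length of the query word. That per-transition work is what a precomputed automaton would avoid.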

    If you (or anybody else) are interested in a formal Levenshtein automaton solution, have a look at LevenshteinAutomaton. It implements the aforementioned parametric-state-based algorithm, as well as a pure concrete-state-traversal-based algorithm (outlined above) and dynamic-programming-based algorithms (for both edit distance and neighbor determination). It's maintained by yours truly :) .
