Levenshtein Distance Algorithm better than O(n*m)?

说谎 2020-11-28 22:49

I have been looking for an advanced Levenshtein distance algorithm, and the best I have found so far is O(n*m), where n and m are the lengths of the two strings. The reason w

4 answers
  • 2020-11-28 22:57

    Are you interested in reducing the time complexity or the space complexity? The average time complexity can be reduced to O(n + d^2), where n is the length of the longer string and d is the edit distance. If you only need the edit distance itself and do not need to reconstruct the edit sequence, you only have to keep the last two rows of the matrix in memory, so the space is O(n).
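
    The two-row idea can be sketched as follows. This is a minimal illustration in Python, not a tuned implementation; the function name `levenshtein` and the choice to swap the arguments so that the rows follow the shorter string are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance that keeps only two rows of the DP matrix in memory."""
    # Keep b as the shorter string so each row has min(len(a), len(b)) + 1 cells.
    if len(a) < len(b):
        a, b = b, a

    # Row 0: distance from the empty prefix of a to each prefix of b.
    previous = list(range(len(b) + 1))

    for i, ca in enumerate(a, start=1):
        # First cell of row i: distance from a[:i] to the empty string.
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current

    return previous[-1]
```

    The running time is still O(n*m); only the memory drops to one row's worth of cells. For example, `levenshtein("kitten", "sitting")` gives 3.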

    If you can afford to approximate, there are poly-logarithmic approximations.

    For the O(n + d^2) algorithm, look for Ukkonen's optimization or its enhancement, Enhanced Ukkonen. The best approximation that I know of is the one by Andoni, Krauthgamer, and Onak.

  • 2020-11-28 23:09

    I found another optimization that claims to be O(max(m, n)), which presumably refers to space rather than time:

    http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C

    (the second C implementation)

  • 2020-11-28 23:13

    If you only want the threshold function (e.g., to test whether the distance is under a certain threshold k), you can reduce the time and space complexity by calculating only the cells within k of the main diagonal of the array. You can also use Levenshtein automata to evaluate many words against a single base word in O(n) time each, and the construction of the automaton can be done in O(m) time, too.
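
    The diagonal-band check can be sketched like this. It is an illustrative Python version under stated assumptions: the name `within_distance` and the parameter `k` (the threshold) are made up for the example, and any value above k is capped at k + 1 to mean "too far".

```python
def within_distance(a: str, b: str, k: int) -> bool:
    """Return True if the edit distance between a and b is at most k."""
    n, m = len(a), len(b)
    # The length difference alone is a lower bound on the distance.
    if k < 0 or abs(n - m) > k:
        return False

    inf = k + 1  # sentinel: every value above k is treated the same
    previous = [min(j, inf) for j in range(m + 1)]  # row 0
    current = [inf] * (m + 1)

    for i in range(1, n + 1):
        # Only columns within k of the diagonal can hold a value <= k.
        lo = max(1, i - k)
        hi = min(m, i + k)

        # Guard cell just left of the band (column 0 holds the real value).
        current[lo - 1] = min(i, inf) if lo == 1 else inf

        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            current[j] = min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
                inf,
            )

        # Guard cell just right of the band, read by the next row.
        if hi < m:
            current[hi + 1] = inf

        previous, current = current, previous

    return previous[m] <= k
```

    Each row touches at most 2k + 1 cells, so the work is roughly O(m + n*k) instead of O(n*m). For example, `within_distance("kitten", "sitting", 3)` is true, while the same call with a threshold of 2 is false.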

  • 2020-11-28 23:15

    Look at Wikipedia; it has some ideas for reducing this algorithm's space complexity:

    Wiki-Link: Levenshtein distance

    Quoting:

    We can adapt the algorithm to use less space, O(m) instead of O(mn), since it only requires that the previous row and current row be stored at any one time.
