Levenshtein distance with bound/limit

╄→гoц情女王★ 提交于 2020-03-14 19:06:09

问题


I have found some Python implementations of the Levenshtein distance.

I am wondering though how these algorithms can be efficiently modified so that they break if the Levenshtein distance is greater than n (e.g. 3) instead of running until the end?

So essentially I do not want to let the algorithm run for too long to calculate the final distance if I simply want to know if the distance is greater than a threshold or not.

I have found some relevant posts here:

  1. Modifying Levenshtein Distance algorithm to not calculate all distances
  2. Levenstein distance limit
  3. Most efficient way to calculate Levenshtein distance
  4. Levenshtein Distance Algorithm better than O(n*m)?

but still, I do not see any Python code which does what I describe above (which is more or less what these posts describe too).

P.S.: The solution provided by @amirouche below is based on the fastest implementation that I have tested with some benchmarking (from here: https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python, https://stackoverflow.com/a/32558749/9024698) and its bounded version is the fastest one of its kind from my tests (without excluding that there may be even faster ones).


回答1:


As described in Levenstein distance limit, you can add a test over the row that is computed to return early:

def levenshtein(s1, s2, maximum):
    if len(s1) > len(s2):
        s1, s2 = s2, s1

    distances = range(len(s1) + 1)
    for i2, c2 in enumerate(s2):
        distances_ = [i2+1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                distances_.append(distances[i1])
            else:
                distances_.append(1 + min((distances[i1], distances[i1 + 1], distances_[-1])))
        if all((x >= maximum for x in distances_)):
            return False
        distances = distances_
    return distances[-1]


来源:https://stackoverflow.com/questions/59686989/levenshtein-distance-with-bound-limit

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!