Question
Given two strings of equal length, Levenshtein distance gives the minimum number of edits needed to turn the first string into the second. However, I'd like to find a way to adapt the algorithm to multiple pairs of strings, given that they were all generated in the same way.
Answer 1:
Reading the comments, it appears that this is the problem:
You are given a set of pairs of strings, all of the same length, where each pair is an input to some function together with that function's output. So, for the pair A,B, we know that f(A)=B. The goal is to reverse engineer f() from a large set of A,B pairs.
Running Levenshtein distance over the entire set will, at best, tell you the largest number of edits that any single pair requires.
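To make that concrete, here is a minimal sketch of computing the distance for every pair and taking the largest. The function names `levenshtein` and `max_distance` are illustrative, not from any library, and the sample pairs are made up.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def max_distance(pairs):
    """Largest edit distance seen over all (input, output) pairs."""
    return max(levenshtein(a, b) for a, b in pairs)


# Hypothetical sample data
pairs = [("ABCDE", "ABxDE"), ("ABCDE", "xABCD")]
print([levenshtein(a, b) for a, b in pairs])  # [1, 2]
print(max_distance(pairs))                    # 2
```

The numbers say how much each pair changed, but not where or how, which is why they reveal so little about f().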
A better start would be Hamming distance (modified to allow multiple characters) or Jaccard similarity, to identify which positions in the strings never change across all of the pairs. Then you are left with only the positions that do change.
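A rough sketch of the position-wise idea, assuming plain per-character comparison (the "multiple characters" variant is not shown). The names `hamming` and `unchanged_positions` are illustrative, and the sample pairs are invented.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def unchanged_positions(pairs):
    """Indices where input and output agree in every pair."""
    if not pairs:
        return []
    length = len(pairs[0][0])
    stable = set(range(length))
    for a, b in pairs:
        if len(a) != length or len(b) != length:
            raise ValueError("all strings must have the same length")
        # Drop every index that differs in this pair
        stable -= {i for i in range(length) if a[i] != b[i]}
    return sorted(stable)


# Hypothetical sample data: only index 2 is ever modified
pairs = [("ABCDE", "ABxDE"), ("HELLO", "HEyLO"), ("WORLD", "WOzLD")]
print(unchanged_positions(pairs))  # [0, 1, 3, 4]
```

Whatever is left after removing the stable indices is the part of the string that f() actually touches.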
This will fail if the letters shift.
To detect a shift, you want to use global alignment (Needleman-Wunsch). You will then see something like "ABCDE"=>"xABCD", which shows that the characters were shifted by one position (to the right, in this example) from the input to the output.
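As an illustration, here is a small Needleman-Wunsch sketch. The scoring values (match +1, mismatch -1, gap -1) are an assumption chosen for the example, not something the problem prescribes, and `needleman_wunsch` is an illustrative name.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


top, bottom, total = needleman_wunsch("ABCDE", "xABCD")
print(top)     # -ABCDE
print(bottom)  # xABCD-
print(total)   # 2
```

The leading gap in the first row and the trailing gap in the second are the signature of a one-position shift: the block "ABCD" lines up perfectly once the offset is allowed.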
Overall, I feel that Levenshtein distance will do very little to help you get at the original algorithm.
Source: https://stackoverflow.com/questions/4809525/how-to-compute-multiple-related-levenshtein-distances