Interpolation search on strings

佛祖请我去吃肉 2021-02-04 19:14

For those of you not familiar with interpolation search, it is a method to search for a value in a sorted array that is potentially faster than binary search. You look at the firs
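The question text breaks off above. As background, the usual form of interpolation search over a sorted list of integers can be sketched as follows. This is a minimal sketch, not code from the thread, and the function name `interpolation_search` is illustrative. Instead of probing the middle, it estimates where the target should sit from the values at the two ends of the range.

```python
def interpolation_search(arr, target):
    """Return the index of target in the sorted integer list arr, or -1."""
    lo, hi = 0, len(arr) - 1
    # Only keep going while target can still lie between the endpoints.
    while lo <= hi and arr[lo] <= target <= arr[hi]:
        if arr[hi] == arr[lo]:
            # The range holds one repeated value, which must equal target
            # (it is bracketed by the endpoints); avoid dividing by zero.
            return lo
        # Guess the position by linear interpolation between the endpoints
        # instead of always probing the middle as binary search does.
        pos = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        if arr[pos] == target:
            return pos
        if arr[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


print(interpolation_search([10, 20, 30, 40, 50], 40))  # 3
```

On evenly spaced values like these, the first guess lands directly on the target.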

1 answer
  •  名媛妹妹
    2021-02-04 19:55

    To find the "distance" between two strings, a simple method would be to look at the first letter that is different between them and assign a numeric value to each, then take the difference.

    For example, the distance from "a" to "y" would be 24 and the distance from "y" to "z" would be 1, if each letter were assigned a value equal to its position in the alphabet.
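    The first-differing-letter distance described above might look like this in Python. It is a sketch assuming lowercase a-z input. The helper names are hypothetical, and treating a position past the end of the shorter string as 0 is an added assumption so that a prefix sorts before its extensions.

```python
def letter_value(ch):
    """Position in the alphabet: 'a' -> 1, ..., 'z' -> 26 (lowercase only)."""
    return ord(ch) - ord("a") + 1


def first_diff_distance(s, t):
    """Signed distance from s to t, judged only by the first differing letter.

    A position past the end of the shorter string counts as 0, so a prefix
    sorts before any longer string that extends it.
    """
    for i in range(max(len(s), len(t))):
        a = letter_value(s[i]) if i < len(s) else 0
        b = letter_value(t[i]) if i < len(t) else 0
        if a != b:
            return b - a
    return 0  # the strings are identical


print(first_diff_distance("a", "y"))  # 24
print(first_diff_distance("y", "z"))  # 1
```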

    A better-performing method would use a dictionary to weight the letters by how common they are in actual words.
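    One possible reading of the dictionary weighting is sketched below. It assumes the weight of a letter comes from how many words *start* with it: each letter maps to the number of words beginning with an earlier letter, so a distance counts dictionary words rather than alphabet positions. The word list and helper names are hypothetical stand-ins. A real version would load an actual dictionary.

```python
import string
from collections import Counter


def build_letter_weights(words):
    """Map each lowercase letter to the number of words that start with an
    earlier letter, i.e. a cumulative count over the word list."""
    counts = Counter(w[0] for w in words if w)
    weights, running = {}, 0
    for ch in string.ascii_lowercase:
        weights[ch] = running
        running += counts[ch]
    return weights


def weighted_distance(s, t, weights):
    """Distance from s to t at the first differing letter, in words rather
    than alphabet positions. For brevity this returns 0 when one string is
    a prefix of the other."""
    for a, b in zip(s, t):
        if a != b:
            return weights[b] - weights[a]
    return 0


# Hypothetical stand-in for a real dictionary.
words = ["ant", "apple", "axe", "bee", "cat", "zoo"]
w = build_letter_weights(words)
print(weighted_distance("cat", "zoo", w))  # 1 (plain alphabet positions give 23)
```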

    Another refinement would be to look at two characters. For example, "aa" is farther from "bz" than "az" is from "ba", even though both pairs differ by just one step in the first letter. Going beyond two characters wouldn't buy you much.
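    The two-character refinement can be sketched by reading the two characters at the first differing position as one base-27 number. Base 27 is an assumption: 0 is reserved for "past the end of the string" and the letters take 1 to 26. The helper names are hypothetical.

```python
def char_value(s, j):
    """'a' -> 1, ..., 'z' -> 26; a position past the end counts as 0."""
    return ord(s[j]) - ord("a") + 1 if j < len(s) else 0


def two_char_distance(s, t):
    """Signed distance from s to t using two characters, starting at the
    first position where the strings differ, read as a base-27 number."""
    i = 0
    while i < len(s) and i < len(t) and s[i] == t[i]:
        i += 1
    s_val = char_value(s, i) * 27 + char_value(s, i + 1)
    t_val = char_value(t, i) * 27 + char_value(t, i + 1)
    return t_val - s_val


print(two_char_distance("aa", "bz"))  # 52
print(two_char_distance("az", "ba"))  # 2
```

    Looking only at the first letter, both pairs would be 1 apart. With the second character included, the gap between them becomes visible.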

    The reason this method isn't more popular is that it complicates the binary search algorithm for not a lot of gain. If you were to time it, you might even find that standard binary search is faster: what you gain in fewer comparisons, you lose in the cost of computing the distances.

    Also note that the worst-case performance of this algorithm is worse than that of binary search. Consider, for example, searching for "ae" in the list "aa", "ab", "ac", "ad", "ae", "zz". The outlier "zz" biases every interpolated guess toward the beginning of the remaining range, so each probe advances only about one element. Under these conditions the search degrades to O(n), versus O(log n) for binary search.
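    The worst case in the paragraph above can be reproduced with a small sketch that counts probes. It uses an order-preserving numeric key built from the first three characters. The key width, the base-27 encoding, the midpoint fallback when the keys cannot separate a range, and all function names are assumptions for illustration, not the answerer's code.

```python
def prefix_key(s, width=3):
    """Read the first `width` characters as a base-27 number ('a' -> 1, ...,
    'z' -> 26, past the end -> 0). Order-preserving but not one-to-one:
    strings that share the first `width` characters get the same key."""
    k = 0
    for j in range(width):
        k = k * 27 + (ord(s[j]) - ord("a") + 1 if j < len(s) else 0)
    return k


def interpolation_search_str(arr, target):
    """Interpolation search over sorted lowercase strings.
    Returns (index or -1, number of probes)."""
    lo, hi, probes = 0, len(arr) - 1, 0
    kt = prefix_key(target)
    while lo <= hi and arr[lo] <= target <= arr[hi]:
        klo, khi = prefix_key(arr[lo]), prefix_key(arr[hi])
        if khi == klo:
            # Keys cannot tell the range apart; fall back to the midpoint.
            pos = (lo + hi) // 2
        else:
            pos = lo + (kt - klo) * (hi - lo) // (khi - klo)
        probes += 1
        if arr[pos] == target:
            return pos, probes
        if arr[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1, probes


def binary_search_str(arr, target):
    """Plain binary search, also returning (index or -1, number of probes)."""
    lo, hi, probes = 0, len(arr) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if arr[mid] == target:
            return mid, probes
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


arr = ["aa", "ab", "ac", "ad", "ae", "zz"]
print(interpolation_search_str(arr, "ae"))  # (4, 5): probes aa, ab, ac, ad, ae
print(binary_search_str(arr, "ae"))         # (4, 2): probes ac, ae
```

    Because "zz" makes the key range huge, every interpolated guess rounds down to the low end of the range, and the search walks the list one element at a time.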
