Longest Common Prefixes

限于喜欢 提交于 2019-12-06 13:05:01

Based on your comment, I'll assume we have access to the suffix array SA as well as the standard LCP array, i.e. a data structure that tells us, at index i>0, what the length of the longest common prefix of suffix SA[i] and its lexicographic predecessor SA[i-1] is.

I'll use the letter L to refer to the special LCP array we want to construct, as described in the question. I'll use the letter N to refer to the length of the input string str.

Then what we can do is this:

  1. Determine the position of str within the suffix array. We can do this by screening SA linearly to find the entry 0. (Explanation: str is the suffix of str starting at position 0. Therefore, 0 must appear as an entry of the suffix array.)

  2. Suppose the entry we find is at index k. Then we can set L[k]:=N, we because SA[k] is the string itself and has a prefix of N characters in common with itself.

  3. Then we can set L[k-1]:=LCP[k] and L[k+1]:=LCP[k+1] because that is how the standard LCP is defined.

  4. Then we go backwards from i:=k-2 down to 0 and set

    L[i] := min(LCP[i+1],L[i+1])
    

    This works because, at each iteration i, LCP[i+1] tells us the longest common prefix of the adjacent suffixes SA[i] and SA[i+1], and L[i+1] tells us the longest common prefix of the previously processed suffix SA[i+1] and the input string str. L[i] must be the minimum of those two, because L[i] indicates how long a prefix SA[i] has in common with str, and that cannot be longer than the prefix it has in common with SA[i+1], otherwise its position in the suffix array would be closer to k.

  5. We also count forward from i:=k+2 to N and set

    L[i] := min(LCP[i],L[i-1])
    

    based on the same reasoning.

Then all N values of L have been set, and it took no more than O(N) time, assuming that random access to the arrays and integer comparison are O(1), respectively.

Since the array we compute is N entries in length, a complexity of O(N) is optimal.

(Note. You can start the loops in steps 4 and 5 at k-1 and k+1, respectively, and get rid of step 3. The extra step only serves to make the explanation -- hopefully -- a little easier to follow.)

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!