Is this algorithm linear?

后端未结

关注

 2  583

Inspired by these two questions: String manipulation: calculate the "similarity of a string with its suffixes" and Program execution varies as the I/P size increas

相关标签:

2条回答

無奈伤痛

2020-12-28 13:48
You might want to have a look at the Z-algorithm, that's provably linear:

s is a C-string of length N
```
Z[0] = N;
int a = 0, b = 0;
for (int i = 1; i < N; ++i)
{
  int k = i < b ? min(b - i, Z[i - a]) : 0;
  while (i + k < N && s[i + k] == s[k]) ++k;
    Z[i] = k;
  if (i + k > b) { a = i; b = i + k; }
}
```
Now similarity is just the sum of entries of Z.
0 讨论(0)
发布评论:

提交评论
- 加载中...
失恋的感觉

2020-12-28 14:07
This looks like a really neat idea, but sadly I believe the worst case behaviour is O(n^2).

Here is my attempt at a counterexample. (I'm not a mathematician so please forgive my use of Python instead of equations to express my ideas!)

Consider the string with 4K+1 symbols
```
s = 'a'*K+'X'+'a'*3*K
```
This will have
```
borders[1:] = range(K)*2+[K]*(2*K+1)

ne_borders[1:] = [-1]*(K-1)+[K-1]+[-1]*K+[K]*(2*K+1)
```
Note that:

1) ne_borders[i] will equal K for (2K+1) values of i.

2) for 0<=j<=K, borders[j]=j-1

3) the final loop in your algorithm will go into the inner loop with j==K for 2K+1 values of i

4) the inner loop will iterate K times to reduce j to 0

5) This results in the algorithm needing more than N*N/8 operations to do a worst case string of length N.

For example, for K=4 it goes round the inner loop 39 times
```
s = 'aaaaXaaaaaaaaaaaa'
borders[1:] = [0, 1, 2, 3, 0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4]
ne_borders[1:] = [-1, -1, -1, 3, -1, -1, -1, -1, 4, 4, 4, 4, 4, 4, 4, 4, 4]
```
For K=2,248 it goes round the inner loop 10,111,503 times!

Perhaps there is a way to fix the algorithm for this case?
0 讨论(0)
发布评论:

提交评论
- 加载中...