Rabin Karp Algorithm - How is the worst case O(m*n) for the given input?

Question

In TopCoder's code for the Rabin-Karp (RK) algorithm:

// correctly calculates a mod b even if a < 0
function int_mod(int a, int b)
{
  return (a % b + b) % b;
}

function Rabin_Karp(text[], pattern[])
{
  // let n be the size of the text, m the size of the
  // pattern, B - the base of the numeral system,
  // and M - a big enough prime number

  if(n < m) return; // no match is possible

  // calculate the hash value of the pattern
  hp = 0;
  for(i = 0; i < m; i++) 
    hp = int_mod(hp * B + pattern[i], M);

  // calculate the hash value of the first segment 
  // of the text of length m
  ht = 0;
  for(i = 0; i < m; i++) 
    ht = int_mod(ht * B + text[i], M);

  if(ht == hp) check character by character if the first
               segment of the text matches the pattern;

  // start the "rolling hash" - for every next character in
  // the text calculate the hash value of the new segment
  // of length m; E = B^(m-1) modulo M
  for(i = m; i < n; i++) {
    ht = int_mod(ht - int_mod(text[i - m] * E, M), M);
    ht = int_mod(ht * B, M);
    ht = int_mod(ht + text[i], M);

    if(ht == hp) check character by character if the
                 current segment of the text matches
                 the pattern; 
  }
}

The article then states:

Unfortunately, there are still cases when we will have to run the entire inner loop of the “naive” method for every starting position in the text – for example, when searching for the pattern “aaa” in the string “aaaaaaaaaaaaaaaaaaaaaaaaa” — so in the worst case we will still need (n * m) iterations.

But won't the algorithm stop at the very first iteration, as soon as it sees that the first three characters are 'a' and therefore match the needle?

Answer 1:

Suppose the string we are searching for is not "aaa" but rather some other string whose hash is the same as the hash of "aaa". Then the comparison will be needed at every point.

Of course, we would expect the comparison to fail before all m characters have been checked, but in the worst case it could require O(m) character comparisons.
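As a rough illustration of that scenario, here is a small Python sketch. Everything in it is an assumption made for the demo and not part of the TopCoder code: the helper names `poly_hash` and `count_spurious_work`, and the deliberately weak parameters B = 256, M = 255 (not prime, chosen so that B mod M = 1 and the hash degenerates to a sum of character codes). Under those parameters the pattern "aa\u0160" hashes to the same value as "aaa" but differs in its last character, so every window is a false hit that costs m comparisons.

```python
def poly_hash(s, B, M):
    """Polynomial hash of s in base B modulo M."""
    h = 0
    for ch in s:
        h = (h * B + ord(ch)) % M
    return h


def count_spurious_work(text, pattern, B, M):
    """Return (spurious hits, character comparisons) over all windows.

    The window hash is recomputed from scratch for brevity; a real
    Rabin-Karp implementation would roll it instead.
    """
    m = len(pattern)
    hp = poly_hash(pattern, B, M)
    spurious = comparisons = 0
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        if poly_hash(window, B, M) != hp:
            continue  # hashes differ, no character check needed
        for a, b in zip(window, pattern):
            comparisons += 1
            if a != b:
                spurious += 1  # hash matched but the strings differ
                break
    return spurious, comparisons


# Deliberately weak, hypothetical parameters: 256 % 255 == 1.
B, M = 256, 255
text = "a" * 25
pattern = "aa\u0160"  # ord("\u0160") == 352 == 97 + 255

print(poly_hash("aaa", B, M) == poly_hash(pattern, B, M))
print(count_spurious_work(text, pattern, B, M))
```

With n = 25 and m = 3 there are 23 windows; each one is a spurious hit that is only rejected at the third character, giving 69 comparisons and no match at all. A large prime modulus makes such collisions unlikely, but it cannot rule them out.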

Having said that, a common use of RK is to find all (possibly overlapping) occurrences, in which case the cited example clearly takes O(mn) time.

Answer 2:

The Rabin-Karp algorithm computes the hash value of every substring of the text of length m and compares it with the hash value of the pattern. Several different substrings can have the same hash value.

So when the hash of the pattern matches the hash of some substring of the text, we still have to compare the two character by character to make sure they are actually the same.

In the case of pattern = "AAA" and text = "AAAAAAAAAAAAA", there are O(n) substrings whose hash matches that of the pattern. Each of those matches has to be confirmed in O(m) time, hence the worst-case complexity of O(n*m).
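To make that count concrete, here is a minimal Python sketch of the rolling-hash search that reports every occurrence and counts the character comparisons spent on verification. The function name `rabin_karp_all` and the defaults B = 256 and M = 1_000_003 are assumptions for illustration, not values taken from the TopCoder article.

```python
def rabin_karp_all(text, pattern, B=256, M=1_000_003):
    """Return (match positions, number of character comparisons)."""
    n, m = len(text), len(pattern)
    if m == 0 or n < m:
        return [], 0  # no match is possible

    E = pow(B, m - 1, M)  # B^(m-1) mod M, weight of the leading character
    hp = ht = 0
    for i in range(m):
        hp = (hp * B + ord(pattern[i])) % M
        ht = (ht * B + ord(text[i])) % M

    matches, comparisons = [], 0
    for start in range(n - m + 1):
        if ht == hp:
            # Hashes agree, so verify character by character.
            k = 0
            while k < m:
                comparisons += 1
                if text[start + k] != pattern[k]:
                    break
                k += 1
            if k == m:
                matches.append(start)
        if start + m < n:
            # Roll the hash: drop text[start], append text[start + m].
            # Python's % already returns a non-negative result here.
            ht = (ht - ord(text[start]) * E) % M
            ht = (ht * B + ord(text[start + m])) % M
    return matches, comparisons


matches, comparisons = rabin_karp_all("a" * 25, "aaa")
print(len(matches), comparisons)
```

For the text of 25 a's, all n - m + 1 = 23 windows are genuine matches and each one costs m = 3 comparisons to confirm, so the verification step alone performs 69 = (n - m + 1) * m comparisons. An implementation that returns after the first match would stop after 3.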

Source: https://stackoverflow.com/questions/39054972/rabin-karp-algorithm-how-is-the-worst-case-omn-for-the-given-input

Tags

algorithm

rabin-karp