Find the smallest period of input string in O(n)?

烈酒焚心 submitted on 2019-11-30 07:27:55

Alright, so this problem can definitely be solved in O(n); we just have to use KMP cleverly, as you suggested.

Solving the longest proper prefix which is also a suffix problem is a vital part of KMP that we will make use of.

The longest proper prefix which is also a suffix problem is a mouthful so let's just call it the prefix suffix problem for now.

The prefix suffix problem can be pretty hard to understand so I'll include some examples.

The prefix suffix solution for "abcabc" is "abc" since that is the longest string which is both a proper prefix and a proper suffix (proper prefixes and suffixes cannot be the entire string).

The prefix suffix solution for "abcabca" is "abca" (the prefix and the suffix are allowed to overlap).

Hmmmmmmmmm wait a minute, if we chop the solution "abca" off the end of "abcabca" we are left with "abc", and that is exactly the smallest period. Likewise, chopping "abc" off the end of "abcabc" leaves "abc". Very interesting. (This is pretty much the solution, but I will talk about why it works.)
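
To make the prefix suffix problem concrete, here is a minimal brute-force sketch. It is not the O(n) method, only a direct restatement of the definition, and the names BorderDemo and longestBorder are my own.

```java
class BorderDemo {
    // Returns the longest proper prefix of s that is also a suffix of s.
    // Brute force: try every candidate length from longest to shortest, O(n^2) overall.
    static String longestBorder(String s) {
        for (int len = s.length() - 1; len > 0; len--) {
            String suffix = s.substring(s.length() - len);
            if (s.startsWith(suffix)) {
                return s.substring(0, len);
            }
        }
        return ""; // no non-empty proper prefix is also a suffix
    }

    public static void main(String[] args) {
        System.out.println(longestBorder("abcabc")); // abc
        System.out.println(longestBorder("aaaaa"));  // aaaa
    }
}
```

The lps array discussed further down stores the length of this answer for every prefix of the pattern, computed in O(n) instead of by brute force.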

Alright let's try to formalize this intuition a bit more and see if we can arrive at a solution.

I will use one key assumption in my argument:

The smallest period of our pattern is a valid period of every larger period in our pattern

Let us store the length of the prefix suffix solution for the first i+1 characters of our pattern (the prefix ending at index i) in lps[i]. This lps array can be calculated in O(n) and is used in the KMP algorithm; you can read more about how to calculate it in O(n) here: https://www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/

Just so we are clear, here are some example lps arrays:

Pattern:"aaaaa"

lps: [0, 1, 2, 3, 4]

Pattern:"aabbcc"

lps: [0, 1, 0, 0, 0, 0]

Pattern:"abcabcabc"

lps: [0, 0, 0, 1, 2, 3, 4, 5, 6]

Alright, now let's define some variables to help us find out why this lps array is useful.

Let l be the length of our pattern, and let k be the last value in our lps array (k = lps[l-1]).

The value k tells us that the first k characters of our string are the same as the last k characters of our string. And we can use this fact to find a period!

Using this information we can now show that the prefix consisting of the first l-k characters of our string forms a valid period. The first k characters of our string are the same as the last k characters, and the last k characters start at index l-k, so every character at index i >= l-k matches the character at index i-(l-k). In other words, shifting the string by l-k lines it up with itself. And because k is the longest such overlap, l-k is the smallest such shift, which makes this prefix the smallest period.
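
If you want to convince yourself of this, a brute-force checker is handy. The sketch below is only a testing aid under my own naming (PeriodCheck, isPeriod, smallestPeriod are not from any library); it takes k = 6 from the "abcabcabc" lps example listed earlier.

```java
class PeriodCheck {
    // True if the prefix of length p is a period of s:
    // every character equals the character p positions before it.
    static boolean isPeriod(String s, int p) {
        if (p <= 0 || p > s.length()) {
            return false;
        }
        for (int i = p; i < s.length(); i++) {
            if (s.charAt(i) != s.charAt(i - p)) {
                return false;
            }
        }
        return true;
    }

    // Smallest period length by trying every candidate, O(n^2); useful as a reference answer.
    static int smallestPeriod(String s) {
        for (int p = 1; p < s.length(); p++) {
            if (isPeriod(s, p)) {
                return p;
            }
        }
        return s.length(); // only the whole string works
    }

    public static void main(String[] args) {
        String s = "abcabcabc";
        int l = s.length();
        int k = 6; // last lps value for this pattern, taken from the example above
        System.out.println(isPeriod(s, l - k));  // true
        System.out.println(smallestPeriod(s));   // 3
    }
}
```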

In practice you can implement this in a couple of lines, as shown below, where index marks the end of the prefix that forms the smallest period. A single shift back is enough. Repeating the shift while lps[index] is non-zero can overshoot: for "abaaba" it would end at "ab", which is not a period, while the single shift gives "aba".

public static void main(String[] args){
    String pattern="abcabcabcabca";
    int[] lps= calculateLPS(pattern);
    //k is the length of the longest proper prefix that is also a suffix
    int k=lps[lps.length-1];
    //start at the end of the string and shift back once
    int index=lps.length-1-k;
    //prints the smallest period, "abc"
    System.out.println(pattern.substring(0,index+1));
}

And since calculating lps happens in O(n), and the shift itself takes O(1), the time complexity for the whole procedure is simply O(n).

I borrowed heavily from the GeeksforGeeks implementation of KMP in my calculateLPS() method. If you would like to see my exact code it is below, but I recommend that you also look at their explanation: https://www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/

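static int[] calculateLPS(String pat) {
    // lps[i] = length of the longest proper prefix of pat[0..i] that is also a suffix of it
    // lps[0] is always 0; Java zero-initializes the array, so no explicit assignment is needed
    int[] lps = new int[pat.length()];
    int len = 0; // length of the prefix suffix found for the previous position
    int i = 1;

    while (i < pat.length()) {
        if (pat.charAt(i) == pat.charAt(len)) {
            // the current match extends by one character
            len++;
            lps[i] = len;
            i++;
        }
        else {
            if (len != 0) {
                // fall back to the next shorter candidate; i is not advanced
                len = lps[len - 1];
            }
            else {
                // nothing matches at this position
                lps[i] = len;
                i++;
            }
        }
    }
    // debug output; needs import java.util.Arrays;
    System.out.println(Arrays.toString(lps));
    return lps;
}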

Last but not least, thanks for posting such an interesting problem; it was pretty fun to figure out! Also, I am new to this, so please let me know if any part of my explanation doesn't make sense.

I'm not sure that I understand your attempted solution. KMP is a useful subroutine, though -- the smallest period is how far KMP moves the needle string (i.e., S) after a complete match.

Ahmed Algaml

This problem can be solved using the Z function; this tutorial can help you.
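
For readers without the tutorial at hand, here is a rough sketch of how the Z function could be applied. It is my own illustration, not code from the tutorial: z[i] is the length of the longest common prefix of the string and its suffix starting at i, and the smallest p with p + z[p] == n is taken as the period length. The names ZPeriod, zFunction and smallestPeriod are assumptions.

```java
class ZPeriod {
    // z[i] = length of the longest common prefix of s and s.substring(i); z[0] is left as 0.
    static int[] zFunction(String s) {
        int n = s.length();
        int[] z = new int[n];
        int left = 0, right = 0; // [left, right) is the rightmost matched window so far
        for (int i = 1; i < n; i++) {
            if (i < right) {
                // reuse what is already known inside the window
                z[i] = Math.min(right - i, z[i - left]);
            }
            // extend the match character by character
            while (i + z[i] < n && s.charAt(z[i]) == s.charAt(i + z[i])) {
                z[i]++;
            }
            if (i + z[i] > right) {
                left = i;
                right = i + z[i];
            }
        }
        return z;
    }

    // Smallest p such that the suffix starting at p matches the prefix up to the end of s.
    static int smallestPeriod(String s) {
        int n = s.length();
        int[] z = zFunction(s);
        for (int p = 1; p < n; p++) {
            if (p + z[p] == n) {
                return p;
            }
        }
        return n; // no shorter period; only the whole string qualifies
    }

    public static void main(String[] args) {
        String s = "abcabcabcabca";
        System.out.println(s.substring(0, smallestPeriod(s))); // abc
    }
}
```

Both passes are linear, so this stays within O(n) like the KMP approach.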

See if this solution works for O(n). I used rotation of strings.

public static int stringPeriod(String s){

    String s1= s;
    String s2= s1;

    for (int i=1; i <s1.length();i++){
        s2=rotate(s2);
        if(s1.equals(s2)){
            return i;
        }
    }

    return -1;
}

public static String rotate(String s1){

    // move the first character to the end (left rotation by one)
    return s1.substring(1)+s1.substring(0,1);

}
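
A small experiment, assuming the rotation code above is condensed into one method (the class name RotationCheck is my own), suggests two things to watch for. Each step copies and compares the whole string, so the loop looks quadratic rather than O(n). And a rotation only matches when the period divides the length, so the pattern "abcabcabcabca" from the first answer comes back as -1 even though "abc" repeats through it with a partial copy at the end.

```java
class RotationCheck {
    // Same idea as stringPeriod above: rotate left one step at a time and compare.
    static int stringPeriod(String s) {
        String rotated = s;
        for (int i = 1; i < s.length(); i++) {
            rotated = rotated.substring(1) + rotated.charAt(0); // O(n) copy per step
            if (s.equals(rotated)) {                            // O(n) comparison per step
                return i;
            }
        }
        return -1; // no rotation shorter than the full length reproduces s
    }

    public static void main(String[] args) {
        System.out.println(stringPeriod("abcabc"));        // 3
        System.out.println(stringPeriod("abcabcabcabca")); // -1
    }
}
```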

The complete program is available in this github repository
