Given the following problem :
Definition :
Let S be a string over alphabet Σ .
S\'
is the smallest period ofS
Alright so this problem can definitely be solved in O(n), we just have to cleverly use KMP as you suggested.
Solving the longest proper prefix which is also a suffix problem is a vital part of KMP that we will make use of.
The longest proper prefix which is also a suffix problem is a mouthful so let's just call it the prefix suffix problem for now.
The prefix suffix problem can be pretty hard to understand so I'll include some examples.
The prefix suffix solution for "abcabc" is "abc" since that is the longest string which is both a proper prefix and a proper suffix (proper prefixes and suffixes cannot be the entire string).
The prefix suffix solution for "abcabca" is "a"
Hmmmmmmmmm wait a minute if we just chop off "a" from the end of "abcabca" we are left with "abcabc" and if we get the solution("abc") for this new string and chop it off again we are left with "abc" Hmmmmmmmmm. Very interesting.(This is pretty much the solution but I will talk about why this works)
Alright let's try to formalize this intuition a bit more and see if we can arrive at a solution.
I will use one key assumption in my argument:
The smallest period of our pattern is a valid period of every larger period in our pattern
Let us store the prefix suffix solution for the first i
characters of our pattern in lps[i]
. This lps
array can be calculated in O(n)
and it is used in the KMP algorithm, you can read more about how to calculate it in O(n)
here: https://www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/
Just so we are clear I will list some examples of some lps
arrays
Pattern:"aaaaa"
lps: [0, 1, 2, 3, 4]
Pattern:"aabbcc"
lps: [0, 1, 0, 0, 0, 0]
Pattern:"abcabcabc"
lps: [0, 0, 0, 1, 2, 3, 4, 5, 6]
Alright now lets define some variables, to help us find out why this lps
array is useful.
Let
l
be the length of our pattern, and letk
be the last value in our lps array(k=lps[l-1]
)
The value k
tells us that the first k
characters of our string are the same as the last k
characters of our string. And we can use this fact to find a period!
Using this information we can now show that the prefix consisting of the first l-k
characters of our string form a valid period. This is clear because the next k
characters which are not in our prefix must match the first k
characters of our prefix, because of how we defined our lps
array. The first k
characters that from our prefix must be the same as the last k
characters which form our suffix.
In practice you can implement this with a simple while loop as shown below where index
marks the end of the suffix you are currently considering to be the smallest period.
public static void main(String[] args){
String pattern="abcabcabcabca";
int[] lps= calculateLPS(pattern);
//start at the end of the string
int index=lps.length-1;
while(lps[index]!=0){
//shift back
index-=lps[index];
}
System.out.println(pattern.substring(0,index+1));
}
And since calculating lps
happens in O(n)
, and you are always moving at least 1 step back in the while loop the time complexity for the whole procedure is simply O(n)
I borrowed heavily from the geeksForGeeks implementation of KMP in my calculateLPS() method if you would like to see my exact code it is below, but I reccomend that you also look at their explanation: https://www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/
static int[] calculateLPS(String pat) {
int[] lps = new int[pat.length()];
int len = 0;
int i = 1;
lps[0] = 0;
while (i < pat.length()) {
if (pat.charAt(i) == pat.charAt(len)) {
len++;
lps[i] = len;
i++;
}
else {
if (len != 0) {
len = lps[len - 1];
}
else {
lps[i] = len;
i++;
}
}
}
System.out.println(Arrays.toString(lps));
return lps;
}
Last but not least, thanks for posting such an interesting problem it was pretty fun to figure out! Also I am new to this so please let me know if any part of my explanation doesn't make sense.
See if this solution works for O(n). I used rotation of strings.
public static int stringPeriod(String s){
String s1= s;
String s2= s1;
for (int i=1; i <s1.length();i++){
s2=rotate(s2);
if(s1.equals(s2)){
return i;
}
}
return -1;
}
public static String rotate(String s1){
String rotS= s1;
rotS = s1.substring(1)+s1.substring(0,1);
return rotS;
}
The complete program is available in this github repository
this problem can be solved using the Z function , this tutorial can help you .
I'm not sure that I understand your attempted solution. KMP is a useful subroutine, though -- the smallest period is how far KMP moves the needle string (i.e., S
) after a complete match.
This problem can easily be solved by KMP
Example:
Original string = abaaba
n = 6
New string = abaabaabaaba
KMP values for this new string: 0 0 1 1 2 3 4 5 6 7 8 9
The first value >= n is 6 which is at position 8. 8 - 6 + 1 = 3 is the length of the shortest period of the string (aba).