My problem is to find the repeating sequence of characters in a given array. Put simply, I want to identify the pattern in which the characters appear.
Not sure how you define "efficiently". For an easy, quick implementation you could do this in Java:
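import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static String findSequence(String text) {
    // Shortest unit (.+?) followed by one or more copies of itself (\1+)
    Pattern pattern = Pattern.compile("(.+?)\\1+");
    Matcher matcher = pattern.matcher(text);
    // matches() succeeds only if the repetitions cover the whole input
    return matcher.matches() ? matcher.group(1) : null;
}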
It tries to find the shortest string (.+?) that must be repeated at least once (\1+) to match the entire input text.
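To see how the regex behaves on a few inputs, here is a small self-contained sketch. The class name RepeatDemo and the sample strings are made up for illustration; findSequence is the method from the answer, repeated so the example compiles on its own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatDemo {
    // Same regex as in the answer: shortest unit (.+?) followed by
    // one or more copies of itself (\1+), anchored by matches().
    static String findSequence(String text) {
        Matcher matcher = Pattern.compile("(.+?)\\1+").matcher(text);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(findSequence("abcabcabc")); // abc
        System.out.println(findSequence("aaaa"));      // a
        System.out.println(findSequence("abcab"));     // null (trailing partial repeat)
        System.out.println(findSequence("abcd"));      // null (no repetition)
    }
}
```

Note that because matches() has to consume the whole input, a text that ends in the middle of a repetition (such as "abcab") is reported as having no pattern.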
Here is a more general solution to the problem. It finds repeating subsequences within a sequence (of anything), where the subsequences do not have to start at the beginning or immediately follow each other.
Given a sequence b[0..n-1] containing the data in question, and a threshold t, the minimum subsequence length to find:
int l_max = 0, i_max = 0, j_max = 0;
for (int i = 0; i <= n - 2 * t; i++) {
    for (int j = i + t; j <= n - t; j++) {
        // Count matching elements, stopping before the first copy runs into the second
        int l = 0;
        while (i + l < j && j + l < n && b[i + l] == b[j + l])
            l++;
        if (l >= t) {
            System.out.println("Sequence of length " + l + " found at " + i + " and " + j);
            if (l > l_max) {
                l_max = l;
                i_max = i;
                j_max = j;
            }
        }
    }
}
if (l_max >= t) {
    System.out.println("Longest repeated subsequence found at " + i_max + " and " + j_max + " (" + l_max + " long)");
}
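For anyone who wants to run this directly, the loops above can be wrapped in a method roughly as follows. This is only a sketch under a few assumptions: the data is a char[], minLen plays the role of t and is at least 1, and the names RepeatFinder and longestRepeat are invented for this example. It returns the longest match instead of printing every one.

```java
import java.util.Arrays;

class RepeatFinder {
    // Returns {i, j, length} for the longest non-overlapping repeat of at
    // least minLen elements, or null if there is none. Brute force: two
    // nested start positions plus a linear comparison, so O(n^3) worst case.
    static int[] longestRepeat(char[] b, int minLen) {
        int n = b.length;
        int[] best = null;
        for (int i = 0; i + 2 * minLen <= n; i++) {
            for (int j = i + minLen; j + minLen <= n; j++) {
                int l = 0;
                // Extend while elements agree and the first copy stays before the second
                while (i + l < j && j + l < n && b[i + l] == b[j + l]) {
                    l++;
                }
                if (l >= minLen && (best == null || l > best[2])) {
                    best = new int[] {i, j, l};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] r = longestRepeat("xxabcdyyabcdzz".toCharArray(), 3);
        System.out.println(Arrays.toString(r)); // [2, 8, 4]
    }
}
```

In the sample input, "abcd" starts at index 2 and again at index 8, which is why the result is {2, 8, 4}. When several repeats share the maximum length, this sketch keeps the first one found.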
Basically: