I have an alphanumeric string and I want to check it for repeated patterns, considering only the digits. The repetitions must be contiguous.
Example
My theory is that you can use a data structure known as a suffix tree to achieve what you want.
Going through the initial string, collect each contiguous sequence of digits and build its suffix tree. For your example it would look like (for the first 4 suffixes):
R - root
| | | |
| | | |
| | | |
12341234$ 2341234$ 341234$ 41234$
Now, the next suffix in order would be 1234$. However, when inserting, we notice that it matches the prefix 1234 of the first suffix. A counter is kept in parallel and incremented every time a suffix is added to the tree.
At each step we compare the counter with the length of the match between the current suffix to be inserted and the substring with which it matches. If the length of the match is a multiple of the counter, then we have a repetition.
In the above case, the counter would be 4 (starting from 0) by the time we insert 1234$ and the length of the match with the prefix of 12341234$ is also 4, so 1234 is repeated.
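The check described above can be sketched without building a real suffix tree. The version below is only a rough stand-in: it compares suffixes of each digit run directly (quadratic or worse, unlike a proper suffix tree) and reports a block as repeated when the match between two suffixes is at least as long as the distance between them. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuffixRepetition {
    // Length of the common prefix of the suffixes starting at i and j (i < j).
    static int matchLength(String run, int i, int j) {
        int len = 0;
        while (j + len < run.length() && run.charAt(i + len) == run.charAt(j + len)) {
            len++;
        }
        return len;
    }

    // Returns the first block that is immediately followed by a copy of itself, or null.
    static String findRepetition(String run) {
        for (int i = 0; i < run.length(); i++) {
            // counter plays the role of the suffix counter: the distance between the two suffixes
            for (int counter = 1; i + counter < run.length(); counter++) {
                if (matchLength(run, i, i + counter) >= counter) {
                    return run.substring(i, i + counter);
                }
            }
        }
        return null;
    }

    // Applies the check to every contiguous sequence of digits in the input.
    static List<String> findRepetitions(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(input);
        while (m.find()) {
            String rep = findRepetition(m.group());
            if (rep != null) {
                found.add(rep);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findRepetitions("12341234qwe")); // [1234]
        System.out.println(findRepetitions("1234qwe1234")); // []
    }
}
```

For 12341234 the suffix at offset 4 matches the start of the run over 4 characters, which equals the counter, so 1234 is reported.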
I am not sure if you are familiar with regular expressions (regex), but this code works for your examples:
// Note: (.+) matches any characters, not only digits.
String str = "12341234qwe";
// If some group is immediately followed by itself, the whole string is replaced by that group.
String rep = str.replaceAll(".*(.+)\\1.*", "$1");
if (rep.equals(str))
    System.out.println(str + " has no repetition");
else
    System.out.println(str + " has repetition " + rep);

str = "1234qwe1234";
rep = str.replaceAll(".*(.+)\\1.*", "$1");
if (rep.equals(str))
    System.out.println(str + " has no repetition");
else
    System.out.println(str + " has repetition " + rep);
Here is a tutorial: http://docs.oracle.com/javase/tutorial/essential/regex/
Apache Commons Lang has a class, org.apache.commons.lang.StringUtils, with a method that counts the occurrences of a given substring. It already exists, so you can use it directly instead of writing your own solution.
// The first parameter is the string to search; the second is the substring to find.
StringUtils.countMatches("12341234", "1234"); // returns 2
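If pulling in the library is not an option, the same kind of non-overlapping count can be sketched with plain `String.indexOf`. This is not the library's code, just a small illustration; `OccurrenceCounter` and `countOccurrences` are made-up names.

```java
class OccurrenceCounter {
    // Counts non-overlapping occurrences of sub inside text.
    static int countOccurrences(String text, String sub) {
        if (text == null || sub == null || sub.isEmpty()) {
            return 0;
        }
        int count = 0;
        int idx = 0;
        while ((idx = text.indexOf(sub, idx)) != -1) {
            count++;
            idx += sub.length(); // skip past this occurrence
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("12341234", "1234"));    // 2
        System.out.println(countOccurrences("1234qwe1234", "1234")); // 2
    }
}
```

Note that both inputs give 2: a plain count says nothing about whether the occurrences are contiguous, and you need to know the candidate substring in advance.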
First you'd want to define some rules for what counts as a pattern. If a pattern can have any arbitrary length, then you should store the int values as they arrive (building up the pattern) and start checking for a repetition at the first repeated int.
In this case, 1234123q, you build the pattern 1234; then, since 1 is repeated, you keep storing values AND start comparing each one with the stored pattern.
How do you handle repetitions inside a pattern?
In the case: 123124123124
the pattern 123124 is repeated twice. Should it register as a repetition, or should the check stop at the first 4, since 123 != 124?
If you choose to register such cases as valid repetitions, you'll need to create parallel patterns and check them at the same time as you keep building them up.
The first case (stopping at the first non-repeated value) is simple; the second case will generate a lot of parallel patterns to build and check at the same time.
Once you reach the end of the stream you could do the search using String-provided existing methods.
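A minimal sketch of the two choices above, assuming the pattern always starts at the first digit of the sequence and the whole sequence is already available as a String. The names `PatternBuilder`, `findRepeatedPrefix` and `tryLongerPatterns` are hypothetical; the flag switches between the simple case and the one that keeps trying longer patterns.

```java
class PatternBuilder {
    // Returns the leading pattern that is immediately repeated, or null if none is found.
    static String findRepeatedPrefix(String digits, boolean tryLongerPatterns) {
        if (digits.isEmpty()) {
            return null;
        }
        char first = digits.charAt(0);
        for (int len = 1; 2 * len <= digits.length(); len++) {
            if (digits.charAt(len) != first) {
                continue; // first value not repeated yet: keep building the pattern
            }
            // First value is repeated: compare the following values with the stored pattern.
            if (digits.regionMatches(0, digits, len, len)) {
                return digits.substring(0, len);
            }
            if (!tryLongerPatterns) {
                return null; // simple case: stop at the first value that does not repeat
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findRepeatedPrefix("12341234", false));     // 1234
        System.out.println(findRepeatedPrefix("123124123124", false)); // null
        System.out.println(findRepeatedPrefix("123124123124", true));  // 123124
    }
}
```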
I think you can use a regex to solve this. Consider code like this:
// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String[] arr = {"12341234abc", "1234foo1234", "12121212", "111111111", "1a1212b123123c12341234d1234512345"};
String regex = "(\\d+?)\\1";
Pattern p = Pattern.compile(regex);
for (String elem : arr) {
    boolean noMatchFound = true;
    Matcher matcher = p.matcher(elem);
    // Each find() reports the next block of digits that is immediately repeated.
    while (matcher.find()) {
        noMatchFound = false;
        System.out.println(elem + " got repeated: " + matcher.group(1));
    }
    if (noMatchFound) {
        System.out.println(elem + " has no repetition");
    }
}
OUTPUT:
12341234abc got repeated: 1234
1234foo1234 has no repetition
12121212 got repeated: 12
12121212 got repeated: 12
111111111 got repeated: 1
111111111 got repeated: 1
111111111 got repeated: 1
111111111 got repeated: 1
1a1212b123123c12341234d1234512345 got repeated: 12
1a1212b123123c12341234d1234512345 got repeated: 123
1a1212b123123c12341234d1234512345 got repeated: 1234
1a1212b123123c12341234d1234512345 got repeated: 12345
Regex being used is (\\d+?)\\1
where
\\d - means a numerical digit
\\d+ - means 1 or more occurrences of a digit
\\d+? - means reluctant (non-greedy) match of 1 OR more digits
( and ) - to group the above regex into group # 1
\\1 - means back reference to group # 1
(\\d+?)\\1 - repeat the group # 1 immediately after group # 1
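To see what the reluctant quantifier changes, here is a small sketch comparing it with the greedy form on the same input. The helper `repeatedBlocks` and the class name are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantVsGreedy {
    // Collects group 1 of every match of the given regex in the input.
    static List<String> repeatedBlocks(String regex, String input) {
        List<String> blocks = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            blocks.add(m.group(1));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Reluctant: the shortest repeated block is found first.
        System.out.println(repeatedBlocks("(\\d+?)\\1", "12121212")); // [12, 12]
        // Greedy: the longest repeated block is found first.
        System.out.println(repeatedBlocks("(\\d+)\\1", "12121212"));  // [1212]
    }
}
```

With \\d+? the smallest repeating unit is reported, which is usually what you want when looking for the base pattern.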