Parsing a random string looking for repeating sequences using Java and Regex.
Consider strings:
aaabbaaacccbb
I'd like to find a regular expression
This seems to work, though it gives subsequences as well:
(To be fair, this was built off of Guillame's code)
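import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(final String[] args) {
        // final String s = "RonRonJoeJoe";
        // final String s = "RonBobRonJoe";
        final String s = "aaabbaaacccbb";
        // Group 1 is a sequence that occurs again somewhere later in the string.
        final Pattern p = Pattern.compile("(.+).*\\1");
        final Matcher m = p.matcher(s);
        int start = 0;
        // Resume right after group 1, so the text between the two
        // occurrences is searched again.
        while (m.find(start)) {
            System.out.println(m.group(1));
            start = m.end(1);
        }
    }
}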
If overlapping matches are acceptable, you could put the whole pattern inside a lookahead:
// overlapped 1 or more chars
(?=(\w{1,}).*\1)
// overlapped 2 or more chars
(?=(\w{2,}).*\1)
// overlapped 3 or more chars, etc ..
(?=(\w{3,}).*\1)
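Or you could consume the repeated text, so that the matches do not overlap. The space before \1 is only there for readability; Java matches it literally unless the pattern is compiled with Pattern.COMMENTS, so remove it otherwise.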
// 1 or more chars
(?=(\w{1,}).*\1) \1
// 2 or more chars
(?=(\w{2,}).*\1) \1
// 3 or more chars, etc ..
(?=(\w{3,}).*\1) \1
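To make the difference between the two variants concrete, here is a small sketch that runs the "2 or more chars" patterns from above against a short string. The class name RepeatDemo and the helper groups are made up for illustration; the readability space is removed and the backslashes are doubled because the patterns sit in Java string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatDemo {
    // Hypothetical helper: collects group 1 of every match of regex in s, in order.
    public static List<String> groups(String regex, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "abcabc";
        // Lookahead only: each match is zero-width, so the next search starts
        // one character later and results may overlap.
        System.out.println(groups("(?=(\\w{2,}).*\\1)", s));    // [abc, bc]
        // Lookahead followed by \1: the first occurrence is consumed,
        // so the next search starts after it.
        System.out.println(groups("(?=(\\w{2,}).*\\1)\\1", s)); // [abc]
    }
}
```

The overlapped form reports "bc" as well because the search resumes inside the first "abc"; the consuming form skips past it.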
The code below should work for all requirements. It combines a couple of the answers here, and it prints the substrings that are repeated anywhere later in the string.
I set it to only return substrings of at least 2 characters, but that is easily changed to allow single characters by replacing "{2,}" in the regex with "+".
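import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test
{
    public static void main(String[] args)
    {
        String s = "RonSamJoeJoeSamRon";
        // Group 1: two or more non-whitespace characters that occur again later on.
        Matcher m = Pattern.compile("(\\S{2,})(?=.*?\\1)").matcher(s);
        while (m.find())
        {
            for (int i = 1; i <= m.groupCount(); i++)
            {
                System.out.println(m.group(i));
            }
        }
    }
}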
Output:
Ron
Sam
Joe
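If you want the minimum length as a parameter rather than editing the regex by hand, something like the following might do. It is only a sketch: the class and method name repeatedSubstrings are invented here, and it collects the matches into a set so that a substring matched more than once is reported a single time.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedSubstrings {
    // Hypothetical helper: returns the distinct matches of (\S{minLen,})(?=.*?\1),
    // i.e. runs of non-whitespace characters that occur again later in s.
    public static Set<String> repeatedSubstrings(String s, int minLen) {
        Pattern p = Pattern.compile("(\\S{" + minLen + ",})(?=.*?\\1)");
        Matcher m = p.matcher(s);
        Set<String> found = new LinkedHashSet<>(); // keeps first-seen order
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(repeatedSubstrings("RonSamJoeJoeSamRon", 2)); // [Ron, Sam, Joe]
        System.out.println(repeatedSubstrings("abXabYab", 2));           // [ab]
    }
}
```

Passing 1 as minLen is the same as using "+" in the regex, since {1,} and + are equivalent quantifiers.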
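This does it for consecutive repetitions: the pattern (.+)\1+ matches a sequence that is immediately followed by one or more copies of itself.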
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        String s = "aaabbaaacccbb";
        find(s);
        String s1 = "RonRonRonJoeJoe .... ,,,,";
        find(s1);
        System.err.println("---");
        String s2 = "RonBobRonJoe";
        find(s2);
    }
    private static void find(String s) {
        Matcher m = Pattern.compile("(.+)\\1+").matcher(s);
        while (m.find()) {
            System.err.println(m.group());
        }
    }
}
OUTPUT:
aaa
bb
aaa
ccc
bb
RonRonRon
JoeJoe
....
,,,,
---
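In case it helps to see what the group captures, here is a rough sketch that reports each run as the repeated unit plus a count, computed from group(1) and the full match. The names RunLengths and runs are invented for this example. Note that the greedy (.+) picks the longest unit that repeats, while the reluctant (.+?) picks the shortest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RunLengths {
    // Hypothetical helper: describes each match as "unit xN", where unit is
    // group 1 and N is how many times it fits into the whole match.
    public static List<String> runs(String regex, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            String unit = m.group(1);
            int count = m.group().length() / unit.length();
            out.add(unit + " x" + count);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runs("(.+)\\1+", "RonRonRonJoeJoe")); // [Ron x3, Joe x2]
        System.out.println(runs("(.+)\\1+", "abababab"));        // [abab x2]
        System.out.println(runs("(.+?)\\1+", "abababab"));       // [ab x4]
    }
}
```

The matched text is the same in the last two calls; only the way it is split into unit and repetitions differs.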
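You can use this regex, based on a positive lookahead. It matches a run of one repeated character (group 1) only if the same run occurs again later in the string:
((\\w)\\2+)(?=.*\\1)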
String elem = "aaabbaaacccbb";
String regex = "((\\w)\\2+)(?=.*\\1)";
Pattern p = Pattern.compile(regex);
Matcher matcher = p.matcher(elem);
for (int i = 1; matcher.find(); i++) {
    System.out.println("Group # " + i + " got: " + matcher.group(1));
}
Output:
Group # 1 got: aaa
Group # 2 got: bb