How do I use Java Regex to find all repeating character sequences in a string?

借酒劲吻你 2021-01-11 20:38

I am parsing an arbitrary string, looking for repeating character sequences, using Java and regex.

Consider a string such as:

aaabbaaacccbb

I'd like to find a regular expression

5 Answers
  • 2021-01-11 20:57

    This seems to work, though it also reports shorter sub-sequences of the repeats:

    (To be fair, this was built off of Guillame's code)

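    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RepeatFinder {
        public static void main(final String[] args) {
            // final String s = "RonRonJoeJoe";
            // final String s = "RonBobRonJoe";
            final String s = "aaabbaaacccbb";

            // Group 1 is any sequence that occurs again later in the string.
            final Pattern p = Pattern.compile("(.+).*\\1");

            final Matcher m = p.matcher(s);
            int start = 0;
            // Resume each search right after the previous group 1,
            // not after the whole match, so later repeats are still seen.
            while (m.find(start)) {
                System.out.println(m.group(1));
                start = m.end(1);
            }
        }
    }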
    
  • 2021-01-11 21:01

    You could allow the matches to overlap by keeping the whole pattern inside a lookahead:

    // overlapped 1 or more chars
    (?=(\w{1,}).*\1)
    // overlapped 2 or more chars
    (?=(\w{2,}).*\1)
    // overlapped 3 or more chars, etc ..
    (?=(\w{3,}).*\1)
    

    Or you could consume the match, so that the results do not overlap (in a Java string literal, double each backslash):

    // 1 or more chars
    (?=(\w{1,}).*\1)\1
    // 2 or more chars
    (?=(\w{2,}).*\1)\1
    // 3 or more chars, etc ..
    (?=(\w{3,}).*\1)\1
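
    As a rough illustration of the difference between the two forms, here is a minimal Java sketch using the "2 or more chars" variants on the string from the question. The class name `LookaheadRepeats` and the helper `collect` are made up for this example and are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadRepeats {
    // Zero-width form: everything sits inside the lookahead, so nothing is
    // consumed and the next search starts one character further on.
    public static final Pattern OVERLAPPED = Pattern.compile("(?=(\\w{2,}).*\\1)");

    // Consuming form: after the lookahead succeeds, \1 consumes the captured
    // text, so the next search starts after it.
    public static final Pattern CONSUMED = Pattern.compile("(?=(\\w{2,}).*\\1)\\1");

    // Hypothetical helper: returns group 1 of every match, in order.
    public static List<String> collect(Pattern p, String s) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(s);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "aaabbaaacccbb";
        System.out.println(collect(OVERLAPPED, s)); // prints [aaa, aa, bb]
        System.out.println(collect(CONSUMED, s));   // prints [aaa, bb]
    }
}
```

    The overlapped form also reports "aa", because the search restarts inside the "aaa" it has just found; the consuming form skips past it.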
    
  • 2021-01-11 21:03

    The code below should cover all of the requirements. It combines a couple of the other answers here, and it prints every substring that is repeated somewhere later in the string.

    I set it to return only substrings of at least 2 characters, but you can easily include single characters by changing "{2,}" in the regex to "+".

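    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RepeatedSubstringDemo
    {
      public static void main(String[] args)
      {
        String s = "RonSamJoeJoeSamRon";
        // Group 1: 2+ non-whitespace chars that the lookahead finds again later.
        Matcher m = Pattern.compile("(\\S{2,})(?=.*?\\1)").matcher(s);
        while (m.find())
        {
          for (int i = 1; i <= m.groupCount(); i++)
          {
            System.out.println(m.group(i));
          }
        }
      }
    }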
    

    Output:
    Ron
    Sam
    Joe
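
    To make the "{2,}" versus "+" choice concrete, here is a minimal sketch of the same regex with the minimum length passed in as a parameter. The class `RepeatedSubstrings`, the helper `repeated` and its `minLength` argument are assumptions for illustration, not part of the answer above. The sketch also collects results into a set, since a substring that occurs three or more times is matched more than once.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedSubstrings {
    // Hypothetical helper: minLength = 2 gives "{2,}", minLength = 1 behaves like "+".
    public static Set<String> repeated(String s, int minLength) {
        if (minLength < 1) {
            throw new IllegalArgumentException("minLength must be at least 1");
        }
        Pattern p = Pattern.compile("(\\S{" + minLength + ",})(?=.*?\\1)");
        Matcher m = p.matcher(s);
        // LinkedHashSet keeps first-seen order and drops duplicate reports.
        Set<String> result = new LinkedHashSet<>();
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(repeated("RonSamJoeJoeSamRon", 2)); // prints [Ron, Sam, Joe]
        System.out.println(repeated("abXabYab", 2));           // prints [ab]
    }
}
```

    In the second call "ab" is found twice by the regex (once at each of its first two occurrences), and the set reduces that to a single entry.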

  • 2021-01-11 21:07

    This does it for sequences that repeat back to back:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    public class Test {
        public static void main(String[] args) {
            String s = "aaabbaaacccbb";
            find(s);
            String s1 = "RonRonRonJoeJoe .... ,,,,";
            find(s1);
            System.err.println("---");
            String s2 = "RonBobRonJoe";
            find(s2);
        }
    
        private static void find(String s) {
            Matcher m = Pattern.compile("(.+)\\1+").matcher(s);
            while (m.find()) {
                System.err.println(m.group());
            }
        }
    }
    

    OUTPUT:

    aaa
    bb
    aaa
    ccc
    bb
    RonRonRon
    JoeJoe
    ....
    ,,,,
    ---
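
    If you also want the repeated unit and how often it occurs, group 1 of the same regex holds the unit. The following is a minimal sketch under that assumption; the class `AdjacentRepeats` and the helper `describe` are invented for this example. Note that the greedy `(.+)` picks the longest unit that works at each position, so "...." is reported as ".." twice rather than "." four times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdjacentRepeats {
    // Same regex as in the answer: a unit followed by one or more copies of itself.
    private static final Pattern ADJACENT = Pattern.compile("(.+)\\1+");

    // Hypothetical helper: describes each match as "<unit> x<count>".
    public static List<String> describe(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = ADJACENT.matcher(s);
        while (m.find()) {
            String unit = m.group(1);
            // The whole match is the unit repeated, so the lengths divide evenly.
            int times = m.group().length() / unit.length();
            out.add(unit + " x" + times);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("RonRonRonJoeJoe")); // prints [Ron x3, Joe x2]
        System.out.println(describe("RonBobRonJoe"));    // prints []
    }
}
```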
    
  • You can use this regex, which is based on a positive lookahead:

    ((\\w)\\2+)(?=.*\\1)
    

    Code:

    String elem = "aaabbaaacccbb";
    String regex = "((\\w)\\2+)(?=.*\\1)";
    Pattern p = Pattern.compile(regex);
    Matcher matcher = p.matcher(elem);
    for (int i = 1; matcher.find(); i++) {
        System.out.println("Group # " + i + " got: " + matcher.group(1));
    }
    

    OUTPUT:

    Group # 1 got: aaa
    Group # 2 got: bb
    