Regex backreferences in Java

你的背包 2021-01-03 04:26

I had to match a digit followed by itself 14 times, and I came up with the following regular expression in the regexstor.net/tester:

(\d)\1{14}


        
2 Answers
  • 2021-01-03 04:52

    $1 is not a back reference in Java's regexes, nor in any other flavor I can think of. You only use $1 when you are replacing something:

    String input="A12.3 bla bla my input";
    input = StringUtils.replacePattern(
                input, "^([A-Z]\\d{2}\\.\\d).*$", "$1");
    //                                            ^^^^
    

    There is some misinformation about what a back reference is, including the very place I got that snippet from: "simple java regex with backreference does not work".


    Java modeled its regex syntax after other existing flavors where the $ was already a meta character. It anchors to the end of the string (or line in multi-line mode).

    Similarly, Java uses \1 for back references. Because the regex is written as a Java string literal, the backslash itself must be escaped: \\1.
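The two notations can be put side by side in a small sketch. The class and method names here are made up for illustration, and it uses the standard `String.replaceAll` rather than the `StringUtils.replacePattern` call from the snippet above: `\\1` appears inside the pattern, `$1` only in the replacement string.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class BackrefVsReplacement {

    // Back reference inside the pattern: \1 must repeat whatever group 1 captured.
    static boolean hasDoubledLetter(String s) {
        return Pattern.compile("([a-z])\\1").matcher(s).find();
    }

    // Group reference inside the replacement string: $1 inserts whatever group 1 captured.
    static String collapseDoubledLetters(String s) {
        return s.replaceAll("([a-z])\\1", "$1");
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledLetter("letter"));        // true ("tt")
        System.out.println(hasDoubledLetter("later"));         // false
        System.out.println(collapseDoubledLetters("letter"));  // leter
    }
}
```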

    From a lexical/syntactic standpoint it is true that $1 could be used unambiguously (as a bonus it would prevent the need for the "evil escaped escape" when using back references).

    To match a 1 that comes after the end of a line the regex would need to be $\n1:

    this line
    1
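As a rough check of this, the sketch below (hypothetical class and helper names) compiles `$\n1` in multi-line mode against the two-line input above, and also shows that `$1` inside a pattern is read as the end anchor followed by a literal 1, not as a reference to group 1.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class DollarAnchorDemo {

    // Returns true if the regex is found anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "this line\n1";

        // In MULTILINE mode, $ matches just before the line terminator.
        System.out.println(found("$\\n1", Pattern.MULTILINE, input)); // true

        // Without MULTILINE, $ does not match in the middle of the input.
        System.out.println(found("$\\n1", 0, input));                 // false

        // "$1" in a pattern: end anchor, then a literal 1. Nothing can follow the end.
        System.out.println(found("(\\d)$1", 0, "11"));                // false

        // The actual back reference matches the repeated digit.
        System.out.println(found("(\\d)\\1", 0, "11"));               // true
    }
}
```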
    

    It just makes more sense to use a familiar syntax instead of changing the rules, most of which came from Perl.

    The first version of Perl came out in 1987, which is much earlier than Java, which was released in beta in 1995.

    I dug up the man pages for Perl 1, which say:

    The bracketing construct ( ... ) may also be used, in which case \<digit> matches the digit'th substring. (Outside of the pattern, always use $ instead of \ in front of the digit. The scope of $<digit> (and $`, $& and $') extends to the end of the enclosing BLOCK or eval string, or to the next pattern match with subexpressions. The \<digit> notation sometimes works outside the current pattern, but should not be relied upon.) You may have as many parentheses as you wish. If you have more than 9 substrings, the variables $10, $11, ... refer to the corresponding substring. Within the pattern, \10, \11, etc. refer back to substrings if there have been at least that many left parens before the backreference. Otherwise (for backward compatibility) \10 is the same as \010, a backspace, and \11 the same as \011, a tab. And so on. (\1 through \9 are always backreferences.)

  • 2021-01-03 04:52

    I think the main problem is not the backreference, which works perfectly fine with \1 in Java.

    Your problem is more likely the escaping of the regex pattern as a whole in Java.

    If you want to have the pattern

    (\d)\1{14}
    

    passed to the regex engine, you first need to escape each backslash, because you write the pattern as a Java string literal:

    (\\d)\\1{14}
    

    Voilà, works like a charm: goo.gl/BNCx7B (prepend http://; SO does not allow URL shorteners, but tutorialspoint.com seems to offer no other option)

    Offline example:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class HelloWorld {

        public static void main(String[] args) {
            // One 5 followed by fourteen more 5s (15 digits in total).
            String test = "555555555555555";

            // Java string literal for the regex (\d)\1{14}.
            String pattern = "(\\d)\\1{14}";

            Pattern r = Pattern.compile(pattern);
            Matcher m = r.matcher(test);
            // find() succeeds if the pattern occurs anywhere in the input.
            if (m.find()) {
                System.out.println("Matched!");
            } else {
                System.out.println("not matched :-(");
            }
        }
    }
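One thing the example above does not show is the difference between `find()` and `matches()`. The following is only a sketch with made-up class and method names: `find()` accepts a run of 15 identical digits anywhere in the input, while `matches()` requires the whole input to be exactly such a run. Which of the two fits depends on what the original question needed, which is not stated here.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class RepeatedDigitCheck {

    // The Java literal "(\\d)\\1{14}" holds the ten characters (\d)\1{14}.
    static final Pattern REPEATED = Pattern.compile("(\\d)\\1{14}");

    // True if a run of 15 identical digits occurs anywhere in the input.
    static boolean containsRun(String s) {
        return REPEATED.matcher(s).find();
    }

    // True only if the whole input is one run of exactly 15 identical digits.
    static boolean isExactlyRun(String s) {
        return REPEATED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String fifteen = "55555" + "55555" + "55555";       // 15 fives
        String broken = "55555" + "55555" + "55554";        // 14 fives, then a 4

        System.out.println(REPEATED.pattern());              // (\d)\1{14}
        System.out.println(containsRun(fifteen));            // true
        System.out.println(isExactlyRun(fifteen));           // true
        System.out.println(containsRun("id-" + fifteen));    // true
        System.out.println(isExactlyRun("id-" + fifteen));   // false
        System.out.println(containsRun(broken));             // false
    }
}
```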
    