Regex backreferences in Java

你的背包 2021-01-03 04:26

I had to match a number followed by itself 14 times. I came up with the following regular expression in the regexstor.net/tester:

(\d)\1{14}
         


        
2 answers
  •  借酒劲吻你
    2021-01-03 04:52

    $1 is not a back reference in Java's regexes, nor in any other flavor I can think of. You only use $1 when you are replacing something:

    // StringUtils here is org.apache.commons.lang3.StringUtils (Apache Commons Lang).
    String input = "A12.3 bla bla my input";
    input = StringUtils.replacePattern(
                input, "^([A-Z]\\d{2}\\.\\d).*$", "$1");
    //                                            ^^^^
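
For readers without Apache Commons Lang, the same replacement can be sketched with the JDK alone. The class name `ReplacementDemo` and the method `keepPrefix` are made up for this illustration; the point is only that `String.replaceAll` interprets `$1` in its replacement argument as "the text captured by group 1".

```java
final class ReplacementDemo {
    // Keeps only the leading code (letter, two digits, dot, digit) of the input.
    // "$1" is handled by the replacement step; it is not part of the regex.
    static String keepPrefix(String input) {
        return input.replaceAll("^([A-Z]\\d{2}\\.\\d).*$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepPrefix("A12.3 bla bla my input")); // prints A12.3
    }
}
```

If the input does not match the pattern, `replaceAll` returns it unchanged.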
    

    There is some misinformation about what a back reference is, including in the very place I got that snippet from: the question "simple java regex with backreference does not work".


    Java modeled its regex syntax after other existing flavors where the $ was already a meta character. It anchors to the end of the string (or line in multi-line mode).
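
A small sketch of how `$` behaves as an anchor, assuming a two-line input chosen for illustration (the class and method names are hypothetical). It counts matches of `\d+$` with and without `Pattern.MULTILINE`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EndAnchorDemo {
    // Counts how many times the pattern is found in the text.
    static int countMatches(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "a1\nb22";
        // Default mode: $ anchors at the end of the input
        // (or just before a final line terminator).
        System.out.println(countMatches(Pattern.compile("\\d+$"), text)); // prints 1
        // MULTILINE: $ also anchors just before each line terminator.
        System.out.println(countMatches(Pattern.compile("\\d+$", Pattern.MULTILINE), text)); // prints 2
    }
}
```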

    Similarly, Java uses \1 for back references. Because a Java regex is written as a string literal, the backslash itself must be escaped: \\1.
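
Applied to the pattern from the question, a minimal sketch might look like the following. The class name `BackrefDemo` and the method `isSameDigitRun` are invented for the example; the test strings are split into groups of five only to make the length of 15 easy to check.

```java
import java.util.regex.Pattern;

final class BackrefDemo {
    // (\d) captures one digit; \1{14} requires that same digit 14 more times,
    // so a full match is exactly 15 identical digits.
    private static final Pattern FIFTEEN_SAME = Pattern.compile("(\\d)\\1{14}");

    static boolean isSameDigitRun(String s) {
        return FIFTEEN_SAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSameDigitRun("77777" + "77777" + "77777")); // prints true
        System.out.println(isSameDigitRun("77777" + "77777" + "77778")); // prints false
    }
}
```

Note that `matches()` requires the whole string to match; `find()` would instead look for such a run anywhere inside a longer input.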

    From a lexical/syntactic standpoint it is true that $1 could be used unambiguously (as a bonus it would prevent the need for the "evil escaped escape" when using back references).

    To match a 1 that comes after the end of a line the regex would need to be $\n1:

    this line
    1
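
As a rough check of that pattern in Java, assuming multi-line mode is switched on via `Pattern.MULTILINE` (the class and method names below are hypothetical):

```java
import java.util.regex.Pattern;

final class LineEndDemo {
    // $ asserts the end of a line, \n consumes the line break,
    // and 1 matches the first character of the following line.
    private static final Pattern ONE_AFTER_LINE_END =
            Pattern.compile("$\\n1", Pattern.MULTILINE);

    static boolean hasOneAfterLineEnd(String text) {
        return ONE_AFTER_LINE_END.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasOneAfterLineEnd("this line\n1")); // prints true
    }
}
```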
    

    It just makes more sense to use a familiar syntax instead of changing the rules, most of which came from Perl.

    The first version of Perl came out in 1987, well before Java, whose first beta was released in 1995.

    I dug up the man pages for Perl 1, which say:

    The bracketing construct ( ... ) may also be used, in which case \<digit> matches the digit'th substring. (Outside of the pattern, always use $ instead of \ in front of the digit. The scope of $<digit> (and $`, $& and $') extends to the end of the enclosing BLOCK or eval string, or to the next pattern match with subexpressions. The \<digit> notation sometimes works outside the current pattern, but should not be relied upon.) You may have as many parentheses as you wish. If you have more than 9 substrings, the variables $10, $11, ... refer to the corresponding substring. Within the pattern, \10, \11, etc. refer back to substrings if there have been at least that many left parens before the backreference. Otherwise (for backward compatibility) \10 is the same as \010, a backspace, and \11 the same as \011, a tab. And so on. (\1 through \9 are always backreferences.)
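
Java handles multi-digit back references slightly differently from the Perl rule quoted above. According to the `java.util.regex.Pattern` documentation, \1 through \9 are always back references, and a larger number is accepted only if that many groups exist at that point; otherwise digits are dropped until the number refers to an existing group or is a single digit. The following sketch illustrates that reading; the class name and the sample patterns are made up for the example.

```java
import java.util.regex.Pattern;

final class MultiDigitBackrefDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Ten groups exist, so \10 refers back to group 10 ("j").
        String tenGroups = "(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)\\10";
        System.out.println(matches(tenGroups, "abcdefghijj")); // prints true
        // Only one group exists, so \10 is read as \1 followed by a literal 0.
        System.out.println(matches("(a)\\10", "aa0")); // prints true
    }
}
```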
