Regular expression doesn't match empty string in multiline mode (Java)

温柔的废话 2021-01-04 01:31

I just observed this behavior:

Pattern p1 = Pattern.compile("^$");
Matcher m1 = p1.matcher("");
System.out.println(m1.matches()); /* true */

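Pattern p2 = Pattern.compile("^$", Pattern.MULTILINE);
Matcher m2 = p2.matcher("");
System.out.println(m2.matches()); /* false */
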
3 Answers
  • 2021-01-04 01:55

    Let's look a bit closer at your second example:

    Pattern p2 = Pattern.compile("^$", Pattern.MULTILINE);
    Matcher m2 = p2.matcher("");
    System.out.println(m2.matches()); /* false */
    

    So the input to m2 is a line that is empty, or that contains only a line terminator and no other characters. Therefore your pattern, in order to match that line, should be only "$", i.e.:

    // Your example
    Pattern p2 = Pattern.compile("^$", Pattern.MULTILINE);
    Matcher m2 = p2.matcher("");
    System.out.println(m2.matches()); /* false */
    
    // Let's check if it is start of the line
    p2 = Pattern.compile("^", Pattern.MULTILINE);
    m2 = p2.matcher("");
    System.out.println(m2.matches()); /* false */
    
    // Let's check if it is end of the line
    p2 = Pattern.compile("$", Pattern.MULTILINE);
    m2 = p2.matcher("");
    System.out.println(m2.matches()); /* true */
    
  • 2021-01-04 02:03

    Sounds like a bug. At most, multiline mode should let "^" and "$" additionally match at internal line boundaries; it shouldn't stop them from matching at the start and end of the string. Perhaps Java doesn't keep an extended state structure the way Perl does, but I don't know if that is even the cause.

    The fact that /^test$/m matches just proves that ^ and $ work in multiline mode except when the string is empty (in Java). Clearly, though, a multiline-mode test for the empty string is ludicrous, since /^$/ works for that.
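    A minimal Java sketch of the point that multiline mode only adds internal line boundaries: "^test$" finds a line inside a larger string only when compiled with Pattern.MULTILINE, and it still matches a one-line input in that mode. The class InternalLineBoundaries and the helper firstMatch are illustrative names, not from the original post.

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Sketch: with MULTILINE, ^ and $ can also match at internal line boundaries.
    // Class and method names are illustrative.
    class InternalLineBoundaries {

        // Returns the start index of the first match of regex in input, or -1 if there is none.
        static int firstMatch(String regex, int flags, String input) {
            Matcher m = Pattern.compile(regex, flags).matcher(input);
            return m.find() ? m.start() : -1;
        }

        public static void main(String[] args) {
            String text = "foo\ntest\nbar";
            // Default mode: ^ only anchors at the start of the whole input, so no match.
            System.out.println(firstMatch("^test$", 0, text));
            // Multiline mode: "test" sits on its own line, starting at index 4.
            System.out.println(firstMatch("^test$", Pattern.MULTILINE, text));
            // A one-line, non-empty input matches at index 0.
            System.out.println(firstMatch("^test$", Pattern.MULTILINE, "test"));
        }
    }
    ```

    Using find() rather than matches() here is deliberate: it reports where a line matched, which makes the internal line boundary visible.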

    Testing in Perl, everything works as expected:

    if ( "" =~ /^$/m   ) { print "/^\$/m    matches\n"; }
    if ( "" =~ /^$/    ) { print "/^\$/     matches\n"; }
    if ( "" =~ /\A\Z/m ) { print "/\\A\\Z/m  matches\n"; }
    if ( "" =~ /\A\Z/  ) { print "/\\A\\Z/   matches\n"; }
    if ( "" =~ /\A\z/  ) { print "/\\A\\z/   matches\n"; }
    if ( "" =~ /^/m    ) { print "/^/m     matches\n"; }
    if ( "" =~ /$/m    ) { print "/\$/m     matches\n"; }
    
    
    __END__
    
    
    /^$/m    matches
    /^$/     matches
    /\A\Z/m  matches
    /\A\Z/   matches
    /\A\z/   matches
    /^/m     matches
    /$/m     matches
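    For comparison, here is a rough Java counterpart to the Perl script above (a sketch; the class and helper names are made up). Perl's =~ searches anywhere in the string, so Matcher.find() is used instead of matches(). Going by the documented MULTILINE rule for ^, only the two rows that use ^ with MULTILINE should come out differently from Perl.

    ```java
    import java.util.regex.Pattern;

    // Rough Java counterpart to the Perl script: each regex is searched for in "".
    // Perl's =~ looks for a match anywhere, so Matcher.find() is the closest analogue.
    // Class and method names are illustrative.
    class PerlComparison {

        // True if the regex, compiled with the given flags, finds a match in the empty string.
        static boolean found(String regex, int flags) {
            return Pattern.compile(regex, flags).matcher("").find();
        }

        static void report(String label, String regex, int flags) {
            System.out.println(label + (found(regex, flags) ? " matches" : " does not match"));
        }

        public static void main(String[] args) {
            int m = Pattern.MULTILINE;
            report("/^$/m   ", "^$", m);        // expected: does not match
            report("/^$/    ", "^$", 0);        // expected: matches
            report("/\\A\\Z/m ", "\\A\\Z", m);  // expected: matches
            report("/\\A\\Z/  ", "\\A\\Z", 0);  // expected: matches
            report("/\\A\\z/  ", "\\A\\z", 0);  // expected: matches
            report("/^/m    ", "^", m);         // expected: does not match
            report("/$/m    ", "$", m);         // expected: matches
        }
    }
    ```

    The \A rows agree with Perl because \A anchors at the beginning of input regardless of MULTILINE.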
    
  • 2021-01-04 02:15

    If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input.

    With an empty input, the beginning of input is also the end of input, so ^ can't match in multiline mode.

    This is surprising, even disgusting, but nevertheless consistent with the documentation.
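    To see the documented rule in action, the sketch below (hypothetical class and method names) collects every index where ^ matches under Pattern.MULTILINE. The position after a trailing line terminator and the single position of an empty string are both left out, because each of them is the end of input.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Sketch: list every index where (?m)^ matches, illustrating the rule
    // "beginning of input and after any line terminator except at the end of input".
    // Class and method names are illustrative.
    class CaretPositions {

        static List<Integer> caretPositions(String input) {
            Matcher m = Pattern.compile("^", Pattern.MULTILINE).matcher(input);
            List<Integer> positions = new ArrayList<>();
            // ^ is zero-width; find() steps past each empty match before searching again.
            while (m.find()) {
                positions.add(m.start());
            }
            return positions;
        }

        public static void main(String[] args) {
            System.out.println(caretPositions("a\nb"));   // [0, 2]
            System.out.println(caretPositions("a\nb\n")); // [0, 2]: nothing after the trailing \n
            System.out.println(caretPositions(""));       // []: start of input is also end of input
        }
    }
    ```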
