Java Regular Expression Two Question marks (??)

野的像风 2021-01-17 22:48

I know that /? means the / is optional, so "toys?" will match both toy and toys. My understanding is that if I make it lazy and use "toys??" I will match both toy and to

1 Answer
  • 2021-01-17 23:23

    ?? is lazy while ? is greedy.

    Given (pattern)??, the engine first tries to match the empty string; only if the rest of the regex then fails to match does it go back and try pattern.

    In contrast, (pattern)? tries pattern first and falls back to the empty string on backtracking.
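
The order of attempts becomes visible when nothing follows the optional part. The following is a minimal sketch, assuming Matcher.find() is used to locate the first match; the class name LazyOptionalDemo and the helper firstMatch are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyOptionalDemo {
    // Hypothetical helper: returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: the engine tries to consume the "s" first, so the match includes it.
        System.out.println(firstMatch("toys?", "toys"));   // toys
        // Lazy: the engine tries the empty string first; the regex ends there, so it stops.
        System.out.println(firstMatch("toys??", "toys"));  // toy
    }
}
```

With matches(), which requires the whole input to match, both patterns would accept "toy" and "toys" alike; the difference only shows up in what find() returns.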


    Now, change the regular expression to "toys??2" and it still matches toys2 and toy2. In both cases, it returns the entire string without the s removed. Is there any functional difference between searching for "toys?2" and "toys??2"?

    The difference is in the order of searching:

    • "toys?2" searches for toys2, then toy2
    • "toys??2" searches for toy2, then toys2

    For these two patterns, however, the result is the same regardless of the input string, because the 2 that follows s? or s?? must always be matched.
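
A quick way to convince yourself is to run both patterns over a few inputs and compare the first match. This is only a sketch: the sample inputs are arbitrary, and the names TrailingLiteralDemo and firstMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingLiteralDemo {
    // Hypothetical helper: returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"toys2", "toy2", "toys", "my toys2 and toy2"};
        for (String in : inputs) {
            // The mandatory trailing "2" forces both variants onto the same match.
            System.out.println(in + " -> greedy: " + firstMatch("toys?2", in)
                    + ", lazy: " + firstMatch("toys??2", in));
        }
    }
}
```

The lazy version first tries to match 2 right after toy; when that fails on toys2, it backtracks, consumes the s, and ends up with the same match as the greedy version.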


    As for the pattern you found:

    Pattern.compile("</??tag(\\s+?.*?)??>", Pattern.CASE_INSENSITIVE)
    

    Both ?? can be changed to ? without affecting the result:

    • / and t (in tag) are mutually exclusive. You either match one or the other.
    • > and \s are also mutually exclusive. The requirement of at least one whitespace character in \s+? is important to this conclusion; without it, the result might differ.
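
The claim can be spot-checked by compiling both variants and comparing the whole match and capture group 1 on a handful of inputs. This is a sketch under the assumption that these samples are representative; it is not a proof, and the names TagPatternDemo, LAZY, GREEDY, SAMPLES and describe are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagPatternDemo {
    // The pattern from the question, with lazy optional quantifiers (??).
    static final Pattern LAZY =
            Pattern.compile("</??tag(\\s+?.*?)??>", Pattern.CASE_INSENSITIVE);
    // The same pattern with both ?? replaced by greedy ?.
    static final Pattern GREEDY =
            Pattern.compile("</?tag(\\s+?.*?)?>", Pattern.CASE_INSENSITIVE);

    // Arbitrary sample inputs chosen for illustration.
    static final String[] SAMPLES = {
        "<tag>", "</tag>", "<TAG class=\"x\">", "<tag  >", "<tags>",
        "text <tag a=\"1\"> more >"
    };

    // Describes the first match as "wholeMatch|group1", or "no match".
    static String describe(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() + "|" + m.group(1) : "no match";
    }

    public static void main(String[] args) {
        for (String s : SAMPLES) {
            String lazy = describe(LAZY, s);
            String greedy = describe(GREEDY, s);
            System.out.println(s + " -> " + lazy + " (same as greedy: " + lazy.equals(greedy) + ")");
        }
    }
}
```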

    This is probably a micro-optimization by the author, who presumably assumes that the opening tag is always present while the closing tag might be forgotten, and that opening/closing tags without attributes or stray spaces appear more often than those with them.

    By the way, the engine might run into expensive backtracking because of \\s+?.*? when the input contains <tag followed by many spaces with no > anywhere nearby.
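
One rough way to observe this cost without relying on timing is to wrap the input in a CharSequence that counts how often the engine reads a character. The sketch below assumes the engine reads its input through charAt; the exact counts depend on the JDK's regex implementation, and CountingSequence, BacktrackingDemo and readsForFailedMatch are hypothetical names introduced here.

```java
import java.util.regex.Pattern;

class BacktrackingDemo {
    // CharSequence wrapper that counts how often a character is read from it.
    static final class CountingSequence implements CharSequence {
        private final String text;
        long reads = 0;

        CountingSequence(String text) { this.text = text; }

        @Override public char charAt(int index) { reads++; return text.charAt(index); }
        @Override public int length() { return text.length(); }
        @Override public CharSequence subSequence(int start, int end) { return text.subSequence(start, end); }
        @Override public String toString() { return text; }
    }

    static final Pattern TAG =
            Pattern.compile("</??tag(\\s+?.*?)??>", Pattern.CASE_INSENSITIVE);

    // Counts character reads for "<tag" followed by `spaces` spaces and no ">" at all.
    static long readsForFailedMatch(int spaces) {
        StringBuilder sb = new StringBuilder("<tag");
        for (int i = 0; i < spaces; i++) sb.append(' ');
        CountingSequence input = new CountingSequence(sb.toString());
        if (TAG.matcher(input).find()) {
            throw new IllegalStateException("unexpected match");
        }
        return input.reads;
    }

    public static void main(String[] args) {
        // Doubling the number of spaces should much more than double the work.
        for (int n : new int[] {100, 200, 400, 800}) {
            System.out.println(n + " spaces -> " + readsForFailedMatch(n) + " character reads");
        }
    }
}
```

For each number of spaces consumed by \s+?, the lazy .*? walks forward to the end of the input looking for >, which is why the work grows roughly quadratically with the number of spaces.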
