Java Regular Expression running very slow

失恋的感觉 2021-01-02 06:12

I'm trying to use the Daring Fireball Regular Expression for matching URLs in Java, and I've found a URL that causes the evaluation to take forever. I've modified the or

1 answer
  • 2021-01-02 06:30

    The problem is here:

    "(?:" +                           // One or more:
    "[^\\s()<>]+" +                      // Run of non-space, non-()<>
    "|" +                               //   or
    "\\((?:[^\\s()<>]+|(?:\\([^\\s()<>]+\\)))*\\)" +  // balanced parens, up to 2 levels
    ")+"
    

    What you've got here is nested quantifiers: a quantified group whose contents are themselves quantified. This plays havoc with a backtracking matcher. As an example, consider the regex /^(a+)+$/ matched against the string

    aaaaaaaaaab
    

    As a first attempt, the inner quantifier will match all of the a's. Then the regex fails at the b, so the inner quantifier backs off by one. Then the outer quantifier tries to match again, swallowing up the last a, and the regex fails once more. We basically get exponential behaviour as the quantifiers try all sorts of ways of splitting up the run of a's, without actually making any progress.
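
    To get a feel for the size of that search space, here is a small sketch that counts how many ways a run of n a's can be cut into one or more non-empty pieces, which is what (a+)+ is free to try. The class and method names are made up for illustration, and the count is the number of candidate splits, not the exact number of steps any particular engine takes (real engines may prune or optimize some of them).

```java
public class SplitCount {
    // Number of ways to cut a run of n identical characters into
    // consecutive non-empty pieces (hypothetical helper for illustration).
    static long countSplits(int n) {
        if (n == 0) {
            return 1; // nothing left to split: exactly one way (stop here)
        }
        long total = 0;
        // The inner a+ claims 'first' characters; the outer + deals with the rest.
        for (int first = 1; first <= n; first++) {
            total += countSplits(n - first);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 16; n += 5) {
            System.out.println(n + " a's: " + countSplits(n) + " possible splits");
        }
    }
}
```

    The count doubles with every extra a (it equals 2^(n-1)), which is why adding a handful of characters to the input can turn a quick failure into one that appears to hang.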

    The solution is possessive quantifiers, which we denote by tacking a + onto the end of a quantifier. We set up the inner quantifiers so that once they have a match, they don't let it go: they hold onto it until the match fails or an earlier quantifier backs off and they have to rematch starting somewhere else in the string. If we instead used /^(a++)+$/ as our regex, we would fail immediately on the non-matching string above, rather than going exponential trying to match it.
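
    A minimal Java sketch of the possessive version, using only java.util.regex. The class and method names are hypothetical. It also shows the price of possessiveness: a possessive run never hands characters back, so a pattern that relies on backtracking into that run stops matching.

```java
import java.util.regex.Pattern;

public class PossessiveDemo {
    // Possessive inner quantifier: a++ keeps every 'a' it has consumed.
    static final Pattern POSSESSIVE = Pattern.compile("^(a++)+$");

    static boolean matchesRun(String s) {
        return POSSESSIVE.matcher(s).matches();
    }

    // Greedy a+ gives one 'a' back so the trailing literal 'a' can match.
    static boolean greedyThenA(String s) {
        return Pattern.matches("a+a", s);
    }

    // Possessive a++ gives nothing back, so the trailing 'a' can never match.
    static boolean possessiveThenA(String s) {
        return Pattern.matches("a++a", s);
    }

    public static void main(String[] args) {
        System.out.println(matchesRun("aaaaaaaaaab")); // no match, no re-splitting of the run
        System.out.println(matchesRun("aaaa"));
        System.out.println(greedyThenA("aaa"));
        System.out.println(possessiveThenA("aaa"));
    }
}
```

    The last two calls are the thing to keep in mind before changing a real pattern: switching a quantifier to possessive is only safe when nothing after it needs characters from that run.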

    Try making those inner quantifiers possessive and see if it helps.
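
    As a sketch of what that could look like, here is only the fragment quoted above with its three inner runs changed from + to ++. The rest of the URL pattern is not reproduced, so the class name, the constant, and the helper method are assumptions for illustration, and the fragment is tested on its own rather than inside the full expression.

```java
import java.util.regex.Pattern;

public class PossessiveFragment {
    // The fragment from the answer, with each inner [^\s()<>]+ made possessive.
    static final String FRAGMENT =
        "(?:" +                              // One or more:
        "[^\\s()<>]++" +                     //   run of non-space, non-()<> (possessive)
        "|" +                                //   or
        "\\((?:[^\\s()<>]++|(?:\\([^\\s()<>]++\\)))*\\)" + // balanced parens, up to 2 levels
        ")+";

    static final Pattern PATTERN = Pattern.compile(FRAGMENT);

    // Hypothetical helper: does the whole input match the fragment?
    static boolean matchesWhole(String s) {
        return PATTERN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("example.com/foo_(bar)"));
        System.out.println(matchesWhole("a(b(c)d)e"));
        System.out.println(matchesWhole("foo_(bar")); // unbalanced paren
    }
}
```

    Because a possessive run does not give characters back, the full pattern should be re-tested after a change like this: if whatever follows this group needs to claim the last character of a run, that part may need to stay greedy.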
