How can I match a quote-delimited string with a regex?

一生所求 2020-12-01 05:05

If I'm trying to match a quote-delimited string with a regex, which of the following is "better" (where "better" means both more efficient and less likely to do something…

9 answers
  • 2020-12-01 05:42

    More complicated, but it handles escaped quotes and also escaped backslashes (an escaped backslash followed by a quote is not a problem):

    /(["'])((\\{2})*|(.*?[^\\](\\{2})*))\1/
    

    Examples:
      "hello\"world" matches "hello\"world"
      "hello\\"world" matches "hello\\"
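    A quick way to check the examples above is to run the pattern through Python's `re` module. This is only a sketch: the helper name `first_quoted` is hypothetical, and in Python `.` does not match newlines by default, so quoted strings spanning lines would need `re.DOTALL`.

```python
import re

# The answer's pattern: an opening quote (group 1), then either only pairs of
# escaped backslashes, or any text ending in a non-backslash character
# followed by zero or more backslash pairs, then the same quote again (\1).
QUOTED = re.compile(r"""(["'])((\\{2})*|(.*?[^\\](\\{2})*))\1""")


def first_quoted(text):
    """Return the first quote-delimited substring of text, or None."""
    m = QUOTED.search(text)
    return m.group(0) if m else None


# Raw strings keep the backslashes exactly as they appear in the examples.
print(first_quoted(r'"hello\"world"'))   # "hello\"world"
print(first_quoted(r'"hello\\"world"'))  # "hello\\"
```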

  • 2020-12-01 05:43

    Considering that I didn't even know about the "*?" thing until today, and I've been using regular expressions for 20+ years, I'd vote in favour of the first. It certainly makes it clear what you're trying to do: you're trying to match a string that doesn't include quotes.

  • 2020-12-01 05:45

    From a performance perspective (extremely heavy, long-running loop over long strings), I could imagine that

    "[^"]*"
    

    is faster than

    ".*?"
    

    because the latter would do an additional check for each step: peeking at the next character. The former would be able to mindlessly roll over the string.

    As I said, in real-world scenarios this would hardly be noticeable. Therefore I would go with number two (if my current regex flavor supports it, that is) because it is much more readable. Otherwise with number one, of course.
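    The two patterns can be compared directly in Python. This sketch assumes Python's `re` engine; timings vary across engines and inputs, so no numbers are claimed here. It also shows one behavioral difference worth knowing: `[^"]` matches newlines, while `.` does not unless `re.DOTALL` is set.

```python
import re
import timeit

NEGATED = re.compile(r'"[^"]*"')  # quote, any run of non-quote chars, quote
LAZY = re.compile(r'".*?"')       # quote, as few chars as possible, quote

sample = 'a "first" b "second" c "" d ' * 1000

# On single-line input both patterns find the same quoted strings.
assert NEGATED.findall(sample) == LAZY.findall(sample)

# They differ once a quoted string spans a line break.
multiline = '"line one\nline two"'
print(NEGATED.findall(multiline))  # ['"line one\nline two"']
print(LAZY.findall(multiline))     # []

# Rough timing comparison; results depend on the engine and the input.
for name, rx in (("negated class", NEGATED), ("lazy dot", LAZY)):
    seconds = timeit.timeit(lambda: rx.findall(sample), number=200)
    print(f"{name}: {seconds:.4f}s")
```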

  • 2020-12-01 05:52

    I'd say the second one is better, because it fails faster when the terminating " is missing. The first one will backtrack over the string, a potentially expensive operation. An alternative regex, if you are using Perl 5.10 or later, would be /"[^"]++"/. The possessive ++ never gives back characters it has matched, so there is nothing to backtrack into. It conveys the same meaning as version 1, but is as fast as version two.
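    A sketch of the possessive variant in Python rather than Perl: Python's `re` module accepts possessive quantifiers such as `++` from Python 3.11 on. For older versions the sketch falls back to a common emulation, a lookahead that captures the run followed by a backreference to it, relying on the fact that the engine does not backtrack into a lookahead once it has matched. The speed claim is not measured here; the code only shows that the possessive form matches the same strings and rejects an unterminated one. The helper name `quoted` is hypothetical.

```python
import re
import sys

if sys.version_info >= (3, 11):
    # Possessive "++": once [^"] has consumed the run, it never gives it back.
    POSSESSIVE = re.compile(r'"[^"]++"')
else:
    # Emulation for older Pythons: the lookahead grabs the maximal run of
    # non-quote characters, and \1 then consumes exactly that run.
    POSSESSIVE = re.compile(r'"(?=([^"]+))\1"')

PLAIN = re.compile(r'"[^"]+"')


def quoted(rx, text):
    """Return the first quoted substring matched by rx, or None."""
    m = rx.search(text)
    return m.group(0) if m else None


for text in ['say "hi" now', '"abc"', '"unterminated', '""']:
    print(repr(text), quoted(PLAIN, text), quoted(POSSESSIVE, text))
```

    Note that with `+` (as in the pattern above) an empty string `""` is not matched at all.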

  • 2020-12-01 05:54

    I would suggest:

    ([\"'])(?:\\\1|.)*?\1
    

    But only because it handles escaped quote chars and allows either ' or " to be the quote char. I would also suggest looking at this article, which goes into this problem in depth:

    http://blog.stevenlevithan.com/archives/match-quoted-string
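    A sketch of the suggested pattern in Python's `re` module. The backreference `\1` makes the closing quote match whichever quote character opened the string, and the `\\\1` alternative lets an escaped quote of that same kind pass through. The helper name `find_quoted` is hypothetical, and like the other `.`-based patterns here it does not cross newlines without `re.DOTALL`.

```python
import re

# (["']) captures the opening quote; (?:\\\1|.)*? lazily consumes either an
# escaped copy of that quote or any single character; \1 closes the string.
QUOTED = re.compile(r"""(["'])(?:\\\1|.)*?\1""")


def find_quoted(text):
    """Return (quote_char, whole_match) for the first quoted string, or None."""
    m = QUOTED.search(text)
    return (m.group(1), m.group(0)) if m else None


print(find_quoted(r'say "hello\"world" now'))  # ('"', '"hello\\"world"')
print(find_quoted(r"x = 'a\'b'"))              # ("'", "'a\\'b'")
```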

    However, unless you have a serious performance issue or cannot be sure of embedded quotes, go with the simpler and more readable:

    /".*?"/
    

    I must admit that non-greedy quantifiers are not part of basic Unix-style 'ed' regular expressions, but they are getting pretty common. I am still not used to group operators like (?:stuff).

  • 2020-12-01 05:59

    I'd go for number two since it's much easier to read. But I'd still like to match empty strings, so I would use:

    /".*?"/
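    The difference between `+` and `*` here can be sketched in Python. This assumes the question's second option used `.+?`; the question text above is truncated, so that is an inference from this answer. With `+`, an empty string `""` cannot match on its own, so the engine pairs its opening quote with a later one instead.

```python
import re

text = 'a "" b "c"'

# ".+?" needs at least one character between the quotes, so the empty
# string's second quote is consumed as content and the match runs on.
print(re.findall(r'".+?"', text))  # ['"" b "']

# ".*?" allows zero characters, so "" and "c" are matched separately.
print(re.findall(r'".*?"', text))  # ['""', '"c"']
```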
    