If I\'m trying to match a quote-delimited string with a regex, which of the following is \"better\" (where \"better\" means both more efficient and less likely to do somethi
More complicated, but it handles escaped quotes and also escaped backslashes (escaped backslashes followed by a quote is not a problem)
/(["'])((\\{2})*|(.*?[^\\](\\{2})*))\1/
Examples:
"hello\"world" matches "hello\"world"
"hello\\"world" matches "hello\\"
Considering that I didn't even know about the "*?" thing until today, and I've been using regular expressions for 20+ years, I'd vote in favour of the first. It certainly makes it clear what you're trying to do - you're trying to match a string that doesn't include quotes.
From a performance perspective (extremely heavy, long-running loop over long strings), I could imagine that
"[^"]*"
is faster than
".*?"
because the latter would do an additional check for each step: peeking at the next character. The former would be able to mindlessly roll over the string.
As I said, in real-world scenarios this would hardly be noticeable. Therefore I would go with number two (if my current regex flavor supports it, that is) because it is much more readable. Otherwise with number one, of course.
I'd say the second one is better, because it fails faster when the terminating "
is missing. The first one will backtrack over the string, a potentially expensive operation. An alternative regexp if you are using perl 5.10 would be /"[^"]++"/
. It conveys the same meaning as version 1 does, but is as fast as version two.
I would suggest:
([\"'])(?:\\\1|.)*?\1
But only because it handles escaped quote chars and allows both the ' and " to be the quote char. I would also suggest looking at this article that goes into this problem in depth:
http://blog.stevenlevithan.com/archives/match-quoted-string
However, unless you have a serious performance issue or cannot be sure of embedded quotes, go with the simpler and more readable:
/".*?"/
I must admit that non-greedy patterns are not the basic Unix-style 'ed' regular expression, but they are getting pretty common. I still am not used to group operators like (?:stuff).
I'd go for number two since it's much easier to read. But I'd still like to match empty strings so I would use:
/".*?"/