Question
I wonder what the problem is with the backreference here:
preg_match_all('/__\((\'|")([^\1]+)\1/', "__('match this') . 'not this'", $matches);
It is expected to match the string between __('') but it actually returns:
match this') . 'not this
Any ideas?
Answer 1:
Make your regex ungreedy:
preg_match_all('/__\((\'|")([^\1]+)\1/U', "__('match this') . 'not this'", $matches);
Answer 2:
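A minimal sketch of what the ungreedy modifier changes, assuming the opening parenthesis is escaped as in the question. Note that [^\1] is still not a backreference here; the U modifier only makes the quantifier stop at the first closing quote. The variable name $subject is mine.

```php
<?php
// The sample input from the question.
$subject = "__('match this') . 'not this'";

// Same pattern as the question, plus the U (ungreedy) modifier.
preg_match_all('/__\((\'|")([^\1]+)\1/U', $subject, $matches);

// Group 2 holds the text between the quotes.
print_r($matches[2]);
```

With the sample input, group 2 should contain only the text inside the first pair of quotes.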
You can't use a backreference inside a character class because a character class matches exactly one character, and a backreference can potentially match any number of characters, or none.
What you're trying to do requires a negative lookahead, not a negated character class:
preg_match_all('/__\(([\'"])(?:(?!\1).)+\1\)/',
"__('match this') . 'not this'", $matches);
I also changed your alternation, \'|", to a character class, [\'"], because it's much more efficient, and I escaped the outer parentheses to make them match literal parentheses.
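The lookahead approach can be tried out with a short sketch. The extra capturing group around the (?:(?!\1).)+ part is my own addition so that the quoted text is available as group 2; the second test string is a made-up example.

```php
<?php
// Group 1 captures the opening quote; group 2 (added here for illustration)
// captures every character that is not the start of that same quote.
$pattern = '/__\(([\'"])((?:(?!\1).)+)\1\)/';

preg_match_all($pattern, "__('match this') . 'not this'", $matches);
print_r($matches[2]);

// A double-quoted argument may contain an apostrophe, because the
// lookahead only rejects the quote character that opened the string.
preg_match_all($pattern, '__("it\'s fine")', $other);
print_r($other[2]);
```

Because the lookahead is checked before each character, the match ends at the first occurrence of the opening quote character.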
EDIT: I guess I need to expand on that "more efficient" remark. I took the example Friedl used to demonstrate this point and tested it in RegexBuddy. Applied to the target text abababdedfg, ^[a-g]+$ reports success after three steps, while ^(?:a|b|c|d|e|f|g)+$ takes 55 steps.
And that's for a successful match. When I try it on abababdedfz, ^[a-g]+$ reports failure after 21 steps; ^(?:a|b|c|d|e|f|g)+$ takes 99 steps.
In this particular case the impact on performance is so trivial it's not even worth mentioning. I'm just saying whenever you find yourself choosing between a character class and an alternation that both match the same things, you should almost always go with the character class. Just a rule of thumb.
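As a rough check that the two forms really are interchangeable, the sketch below runs both patterns against the two target strings above. It only compares match results; it does not reproduce RegexBuddy's step counts, and the variable names are mine.

```php
<?php
$class       = '/^[a-g]+$/';
$alternation = '/^(?:a|b|c|d|e|f|g)+$/';

$results = [];
foreach (['abababdedfg', 'abababdedfz'] as $text) {
    // preg_match returns 1 on a match and 0 on no match.
    $results[$text] = [preg_match($class, $text), preg_match($alternation, $text)];
    printf("%s: class=%d alternation=%d\n", $text, ...$results[$text]);
}
```

Both patterns should agree on every input; only the amount of work the engine does to get there differs.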
Answer 3:
I'm surprised it didn't give you an unbalanced parenthesis error message.
/
__
(
(\'|")
([^\1]+)
\1
/
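This [^\1] will not take the contents of capture buffer 1 and put it into a character class. Inside a character class, \1 is not a backreference; PCRE reads it as an octal escape, so the class means all characters that are NOT the control character \x01.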
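A quick way to see what [^\1] actually excludes is to test it directly. This sketch assumes PHP's PCRE engine, where, as far as I know, \1 inside a character class is read as an octal escape for the byte 0x01 rather than as a backreference; the test strings are made up.

```php
<?php
$pattern = '/^[^\1]+$/';

// Quotes and the digit 1 are all accepted by the class.
$plain = preg_match($pattern, "it's 1 \"string\"");

// A string containing the byte 0x01 is rejected.
$control = preg_match($pattern, "abc\x01def");

var_dump($plain, $control);
```

So the class never looks at what group 1 captured, which is why the original pattern ran past the closing quote.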
Try this:
/__\(('|").*?\1\).*/
You can add an inner capturing group to capture just what's between the quotes:
/__\(('|")(.*?)\1\).*/
Edit: If no inner delimiter is allowed, use Qtax's regex, since ('|").*?\1, even though non-greedy, will still match everything up to the trailing anchor. In this case it matches __('all'this'will"match'), and it's better to use ('[^']*'|"[^"]*")
as
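The difference between the lazy-dot pattern and the quote-specific alternation can be seen on the inner-delimiter example. This is only a sketch; the names $lazy and $strict are mine, and the trailing .* is dropped since it does not affect the captured group.

```php
<?php
$subject = "__('all'this'will\"match')";

// Lazy dot: keeps expanding until a quote followed by ")" is found.
$lazy = '/__\((\'|")(.*?)\1\)/';
$lazyHit = preg_match($lazy, $subject, $m);

// Quote-specific classes: no inner delimiter of the same kind is allowed.
$strict = '/__\((\'[^\']*\'|"[^"]*")\)/';
$strictHit = preg_match($strict, $subject);

var_dump($lazyHit, $m[2], $strictHit);
```

The lazy version accepts the whole string, inner quotes included, while the stricter pattern refuses to match it at all.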
Answer 4:
You can use something like:
/__\(("[^"]+"|'[^']+')\)/
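Applied to the input from the question, this pattern could be used roughly as follows. Since group 1 includes the surrounding quotes, the sketch strips them afterwards; that post-processing step and the variable names are my own assumptions, not part of the answer.

```php
<?php
$subject = "__('match this') . 'not this'";
$pattern = '/__\(("[^"]+"|\'[^\']+\')\)/';

preg_match_all($pattern, $subject, $matches);

// Group 1 still carries the quotes, so drop the first and last character.
$strings = array_map(function ($quoted) {
    return substr($quoted, 1, -1);
}, $matches[1]);

print_r($strings);
```

Note that [^"]+ and [^']+ require at least one character, so an empty string such as __('') would not match.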
Source: https://stackoverflow.com/questions/6050427/regex-problem-with-backreference-in-pattern-with-preg-match-all