Question
I have noticed that the word boundary \bword\b does not work inside brackets when doing a preg_replace() in PHP.
Specifically, I'm trying to exclude the full word &gt; (which stands for > in HTML), but since the word boundary does not trigger inside brackets, as in [^\b&gt;\b], any of those characters by itself, like g or &, is treated as a non-match. If you try to do a match outside the brackets, \b works as expected in PHP even though the word starts with &, a non-word character.
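The behaviour described above can be reproduced with a minimal sketch. It relies on the fact that PCRE reads \b inside a character class as the backspace character (0x08) rather than a word boundary, so the class is only a set of single characters. The sample string and variable names are made up for illustration.

```php
<?php
// Inside [...], \b means backspace (0x08), not a word boundary.
$matchesBackspace = preg_match('/[\b]/', "\x08"); // 1: the class matches a backspace
$matchesWordEdge  = preg_match('/[\b]/', 'a b');  // 0: no word-boundary meaning here

// So [^\b&gt;\b] is "any one character except backspace, &, g, t or ;".
// Removing everything the class matches keeps every lone g, t, & and ;
// and not just the complete entity.
$input    = 'a &gt; b > g'; // hypothetical sample text
$stripped = preg_replace('/[^\b&gt;\b]/', '', $input);

echo $stripped, PHP_EOL; // &gt;g
```

The trailing g in the output shows that a single g is treated exactly like the g inside &gt;, which is the effect the question describes.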
Any thoughts/ideas to get around this situation?
Answer 1:
To exclude in PHP, (*SKIP)(*F) is your friend
In PHP, excluding anything is frighteningly simple thanks to the powerful (*SKIP)(*F) syntax (also available in Perl).
To exclude &gt; and match something else, you can just do this:
&gt;(*SKIP)(*F)|something_else
The left side of the alternation | matches a complete &gt; and then deliberately fails, after which the engine skips past the text it just matched and resumes at the next position in the string. The right side matches something_else, and we know that it is not part of &gt; because that text was already consumed by the expression on the left. Just make sure that something_else is not something generic such as .*, as that could roll over all the following &gt; instances. For instance, here, \w+ would be a perfectly fine pattern for something_else, as it does not clash with &gt;.
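As a rough illustration of the alternation described above, the sketch below uses \w+ and the literal gt as stand-ins for something_else. The sample strings and variable names are assumptions made for the example, not part of the original answer.

```php
<?php
$input = 'if a &gt; b then gt &gt; lt'; // hypothetical sample text

// Plain \w+ also picks up the "gt" that sits inside each "&gt;".
preg_match_all('/\w+/', $input, $naive);

// Left branch: match a whole "&gt;", then (*SKIP)(*F) throws it away and
// resumes after it. Right branch: the words we actually want.
preg_match_all('/&gt;(*SKIP)(*F)|\w+/', $input, $words);

echo implode(', ', $naive[0]), PHP_EOL; // if, a, gt, b, then, gt, gt, lt
echo implode(', ', $words[0]), PHP_EOL; // if, a, b, then, gt, lt

// The same idea with preg_replace(): change "gt" only outside the entity.
$replaced = preg_replace('/&gt;(*SKIP)(*F)|gt/', 'GT', 'gt &gt; gt');

echo $replaced, PHP_EOL; // GT &gt; GT
```

The standalone word gt still matches in both calls, because only the complete &gt; sequence is consumed by the left branch.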
Further reading about this and other techniques to exclude patterns in regex
How to match (or replace) a pattern except in situations s1, s2, s3...
Answer 2:
One solution to my own question is: instead of using a [^word] condition, check that the word/sentence I want is not immediately followed by the word I don't want. As in:
>(?!>)
For my particular case, it worked.
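A minimal sketch of the negative-lookahead idea follows. The token foo is a hypothetical stand-in for "the word I want", and &gt; plays the role of the word that must not follow it. The sample string is invented for the example.

```php
<?php
$input = 'foo&gt; foo foo;'; // hypothetical sample text

// (?!&gt;) asserts that the next characters are not "&gt;".
// It consumes nothing, so only "foo" itself is replaced.
$result = preg_replace('/foo(?!&gt;)/', 'bar', $input);

echo $result, PHP_EOL; // foo&gt; bar bar;
```

The first foo is left alone because the full &gt; follows it, while the third is replaced even though a lone ; follows, since the lookahead tests the whole sequence and not individual characters.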
Source: https://stackoverflow.com/questions/24257055/word-boundary-does-not-work-inside-brackets-in-regex