Word boundary does not work inside brackets in regex [duplicate]

落花浮王杯 submitted on 2019-12-24 13:51:20

Question


I have noticed that the word boundary \bword\b does not work inside brackets when doing a preg_replace() in PHP.

Specifically, I'm trying to exclude the full word &gt; (which stands for > in HTML). Since \b does not act as a word boundary inside brackets, as in [^\b&gt;\b], any one of those characters by itself, such as g or &, is excluded from the match. Outside the brackets, \b works as expected in PHP, even though the word starts with &, a non-word character.
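To make the behavior concrete, here is a minimal sketch, assuming PHP's PCRE engine and a made-up input string (the variable names and sample text are illustrative, not from the original post). Inside a character class PCRE reads \b as the backspace character, so the class simply excludes individual characters:

```php
<?php
// Inside brackets, \b is the backspace character (0x08), not a word boundary.
$isBackspace = preg_match('/[\b]/', "\x08");   // matches a literal backspace
$isBoundary  = preg_match('/[\b]/', 'a b');    // no backspace here, so no match

// The class from the question excludes the single characters
// backspace, '&', 'g', 't' and ';', not the entity as a unit.
$class = '/[^\b&gt;\b]+/';

// Hypothetical sample input for illustration.
$input = 'tag &gt; big';

preg_match_all($class, $input, $m);
// The runs break at every lone 't' and 'g', not only at the entity:
// $m[0] is ['a', ' ', ' bi']
```

The word "tag" is broken up just like the entity, which is the symptom described above.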

Any thoughts/ideas to get around this situation?


Answer 1:


To exclude in PHP, (*SKIP)(*F) is your friend

In PHP, excluding anything is frighteningly simple thanks to the powerful (*SKIP)(*F) syntax (also available in Perl).

To exclude &gt; and match something else, you can just do this:

&gt;(*SKIP)(*F)|something_else

The left side of the alternation | matches a complete &gt; and then deliberately fails, after which the engine skips past it and resumes the search at the next position in the string. The right side matches something_else, and we know that it is not &gt; because it was not matched by the expression on the left. Just make sure that something_else is not something generic such as .*, as that could roll over all the following &gt; instances. For instance, here, \w+ would be a perfectly fine pattern for something_else, as it does not clash with &gt;.
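As a rough illustration of the alternation described above, the sketch below upper-cases every word except the letters inside the entity. The input string and the choice of strtoupper are assumptions made for the example; only the pattern shape comes from the answer.

```php
<?php
// Hypothetical input: the entity "&gt;" must be left alone.
$input = 'a &gt; b and gt alone';

// Left branch consumes "&gt;" and throws it away; right branch matches words.
$pattern = '/&gt;(*SKIP)(*F)|\w+/';

$result = preg_replace_callback(
    $pattern,
    function ($m) {
        return strtoupper($m[0]);
    },
    $input
);
// $result is 'A &gt; B AND GT ALONE'

// For comparison, plain \w+ also touches the "gt" inside the entity.
$naive = preg_replace_callback(
    '/\w+/',
    function ($m) {
        return strtoupper($m[0]);
    },
    $input
);
// $naive is 'A &GT; B AND GT ALONE'
```

Note that the standalone word "gt" is still matched in both cases; only the occurrence inside the entity is protected by the left branch.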

Further reading about this and other techniques to exclude patterns in regex

How to match (or replace) a pattern except in situations s1, s2, s3...




Answer 2:


One solution to my own question is: instead of doing a [^word] condition, check if the word/sentence I want is not immediately followed by the word I don't want. As in:

&gt;(?!&gt;)

For my particular case, it worked.
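A minimal sketch of the same negative-lookahead idea, using a placeholder word foo and a made-up input (both are assumptions for illustration, not the pattern from my actual case):

```php
<?php
// Hypothetical input: one "foo" is followed by the entity, one is not.
$input = 'foo&gt; foo bar';

// Match "foo" only when it is NOT immediately followed by "&gt;".
$pattern = '/foo(?!&gt;)/';

$result = preg_replace($pattern, 'X', $input);
// $result is 'foo&gt; X bar'
```

The lookahead consumes nothing, so the text after the wanted word is left in place for later matches.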



Source: https://stackoverflow.com/questions/24257055/word-boundary-does-not-work-inside-brackets-in-regex
