Regex: match pattern as long as it's not in the beginning

Posted by 魔方 西西 on 2019-11-27 01:47:25

Question


Assume the following strings:

aaa bbb ccc
bbb aaa ccc

I want to match aaa as long as it is not at the start of the string. I'm trying to negate it by doing something like this:

[^^]aaa

But I don't think this is right. I'm using preg_replace.


Answer 1:


You can use a negative lookbehind to make sure it is not at the beginning: (?<!^)aaa
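As a rough illustration of the lookbehind, here is a minimal sketch in Python, with `re.sub` standing in for `preg_replace` (the pattern syntax is the same for this case). The function name `replace_unless_leading` and the replacement text `zzz` are made up for the example.

```python
import re

# Negative lookbehind: succeed only where the start of the string
# is NOT immediately to the left of the current position.
NOT_AT_START = re.compile(r"(?<!^)aaa")

def replace_unless_leading(text, replacement="zzz"):
    """Replace every 'aaa' except one that sits at index 0 (hypothetical helper)."""
    return NOT_AT_START.sub(replacement, text)

print(replace_unless_leading("aaa bbb ccc"))  # aaa bbb ccc
print(replace_unless_leading("bbb aaa ccc"))  # bbb zzz ccc
```

Because the lookbehind consumes nothing, only the three letters are replaced and the surrounding text is left alone.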




Answer 2:


Since I came here via Google search, and was interested in a solution that is not using a lookbehind, here are my 2 cents.

The [^^]aaa pattern matches any single character other than ^ followed by three a's, anywhere inside a string. [^...] is a negated character class: the first ^, right after [, is special because it denotes negation, while the second one is just a literal caret symbol.

Thus, a ^ cannot be used inside [...] to denote the start of the string.
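The behaviour of the character class can be checked with a short sketch. It uses Python's `re` module as a stand-in for PCRE, and the input `b^aaa` is an extra test string invented here to show the literal-caret case.

```python
import re

pattern = re.compile(r"[^^]aaa")

# The class consumes one real character, so the match includes it.
m = pattern.search("bbb aaa ccc")
print(repr(m.group()))  # ' aaa'

# Nothing precedes 'aaa' here, so the class has nothing to consume.
print(pattern.search("aaa bbb ccc"))  # None

# A literal caret before 'aaa' is the one character the class rejects.
print(pattern.search("b^aaa"))  # None
```

So the pattern happens to skip a leading aaa, but only because it demands a preceding character, which then becomes part of the match.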

A solution is to use a negative lookaround; either of these two works equally well. A lookbehind:

(?<!^)aaa

and a lookahead:

(?!^)aaa

Why does the lookahead work, too? Lookarounds are zero-width assertions, and anchors are zero-width as well: they consume no text. Literally speaking, (?<!^) checks that there is no start-of-string position immediately to the left of the current location, and (?!^) checks that there is no start-of-string position immediately to the right of the current location. Both check the same location, which is why both work.
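One way to convince yourself that the two assertions test the same position is to compare where each pattern matches. This is a small sketch in Python's `re`; the helper `match_starts` and the input `aaa bbb aaa` are assumptions made for the demonstration.

```python
import re

def match_starts(pattern, text):
    """Return the start index of every match of pattern in text (hypothetical helper)."""
    return [m.start() for m in re.finditer(pattern, text)]

text = "aaa bbb aaa"
behind = match_starts(r"(?<!^)aaa", text)  # negative lookbehind
ahead = match_starts(r"(?!^)aaa", text)    # negative lookahead
print(behind, ahead)  # [8] [8]
```

Both patterns skip the aaa at index 0 and report the one at index 8.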




Answer 3:


If you don't want to use lookbehind then use this regex:

/.(aaa)/

Then use capture group 1.
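A minimal sketch of reading the group, again in Python's `re` rather than PHP; `find_non_leading` is a hypothetical name. The full match includes the character consumed by the dot, so the position of aaa itself comes from group 1.

```python
import re

def find_non_leading(text):
    """Return the (start, end) span of group 1, or None if there is no match."""
    m = re.search(r".(aaa)", text)
    return m.span(1) if m else None

m = re.search(r".(aaa)", "bbb aaa ccc")
print(repr(m.group(0)))       # ' aaa'
print(m.group(1), m.span(1))  # aaa (4, 7)

print(find_non_leading("aaa bbb ccc"))  # None
```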




Answer 4:


This situation is the first time that I've seen lookarounds outperform \K. Interesting.

Typically, capture groups and lookarounds cost additional steps. But due to the nature of this task, the regex engine can scan the string for aaa first and only then look back for the start-of-string anchor, which turns out to be faster.

I'll add a couple of \K patterns for comparison.

I am using the s pattern modifier in case the leading character is a newline (which . would not normally match). I am adding it to preemptively cover a fringe case that someone might raise.

Again, this is an enlightening scenario because in all other regex cases that I've dealt with \K beats out the other techniques.

Step Count Comparison Matrix:

| Input         | `~.\Kaaa~s` | `~.+?\Kaaa~s` | `(?<!^)aaa` | `(?!^)aaa` | `.(aaa)` |
|---------------|-------------|---------------|-------------|------------|----------|
| `aaa bbb ccc` |   12 steps  |    67 steps   |   8 steps   |  8 steps   | 16 steps |
| `bbb aaa ccc` |   15 steps  |    12 steps   |   6 steps   |  6 steps   | 12 steps |

The takeaway: to learn about the efficiency of your patterns, paste them into regex101.com and compare the step counts.

Also, if you know exactly what substring you are looking for and you don't need a regex pattern, then you should be using strpos() as a matter of best practice (and just check that the returned value is > 0).
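The same plain-substring idea can be sketched in Python, where `str.find` plays roughly the role of PHP's strpos() (it returns -1 rather than false when nothing is found). A search like this reports only the first occurrence, so the sketch starts at index 1 to also handle a string that both begins with the substring and contains it again later. The name `contains_after_start` is hypothetical.

```python
def contains_after_start(haystack, needle):
    """True if needle occurs anywhere other than index 0 (hypothetical helper)."""
    # Starting the search at index 1 skips a match at the very beginning.
    return haystack.find(needle, 1) != -1

print(contains_after_start("aaa bbb ccc", "aaa"))  # False
print(contains_after_start("bbb aaa ccc", "aaa"))  # True
print(contains_after_start("aaa bbb aaa", "aaa"))  # True
```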




Answer 5:


This will work to find what you are looking for:

(?<!^)aaa

Example in use: http://regexr.com?34ab2




Answer 6:


I came here looking for a solution for the RE2 engine, used by Google spreadsheets, which doesn't support lookarounds. But the answers here gave me the idea of using the following. I don't understand why I have to replace by the captured group, but anyhow, it works.

aaa bbb ccc
bbb aaa ccc

([^^])aaa

replace with:

$1zzz

results in:

aaa bbb ccc
bbb zzz ccc
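The reason the captured group has to be written back is that [^^] consumes the character before aaa, and a replacement overwrites the whole match, including that character. The sketch below shows the difference. It runs on Python's `re` rather than RE2, which is an assumption of convenience, and Python spells the backreference `\1` where the replacement above uses `$1`.

```python
import re

lines = ["aaa bbb ccc", "bbb aaa ccc"]

# Without the group, the consumed space is replaced along with 'aaa'.
without_group = [re.sub(r"[^^]aaa", "zzz", s) for s in lines]
print(without_group)  # ['aaa bbb ccc', 'bbbzzz ccc']

# With the group written back, the space survives.
with_group = [re.sub(r"([^^])aaa", r"\1zzz", s) for s in lines]
print(with_group)  # ['aaa bbb ccc', 'bbb zzz ccc']
```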



Source: https://stackoverflow.com/questions/15669557/regex-match-pattern-as-long-as-its-not-in-the-beginning
