R-regex: match strings not beginning with a pattern

守給你的承諾、 submitted on 2019-11-30 04:56:43

Yeah. Put the zero-width negative lookahead (?!hede) /outside/ the other parentheses, right after the ^ anchor. That should give you this:

> grepl("^(?!hede).*$", "hede", perl = TRUE)
[1] FALSE
> grepl("^(?!hede).*$", "foohede", perl = TRUE)
[1] TRUE

which I think is what you want.
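Here is a rough sketch of the same idea applied to several strings at once. The test vector x is made up for illustration. grepl() is vectorised, and because it only asks whether a match exists, the trailing .*$ can be dropped. For a literal prefix like "hede", base R's startsWith() should give the same answer without a regex.

```r
# Hypothetical test inputs
x <- c("hede", "foohede", "hedge", "hedex", "")

# TRUE where the string does NOT start with "hede" (lookahead needs perl = TRUE)
not_hede <- grepl("^(?!hede)", x, perl = TRUE)
not_hede                                    # FALSE TRUE TRUE FALSE TRUE

# Regex-free equivalent for a fixed literal prefix
identical(not_hede, !startsWith(x, "hede")) # TRUE
```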

Alternatively, if you want to capture the entire string, ^(?!hede)(.*)$ and ^((?!hede).*)$ are equivalent, and either is fine.
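Here is a quick check that the two capturing forms behave the same, using a made-up input vector x. Because both patterns span the whole string, the full match returned by regmatches() is the same as capture group 1. Elements that begin with "hede" have no match and are simply dropped from the result.

```r
# Hypothetical inputs
x <- c("hede", "foohede", "bar")

# regexpr() finds the match (or -1 if none); regmatches() extracts the matched text
m1 <- regmatches(x, regexpr("^(?!hede)(.*)$", x, perl = TRUE))
m2 <- regmatches(x, regexpr("^((?!hede).*)$", x, perl = TRUE))

m1                 # "foohede" "bar"
identical(m1, m2)  # TRUE
```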

I got stuck on the following special case, so I thought I would share it.

What if the delimiter pattern occurs several times in the string, but you still only want the first segment?

You can turn off the default greediness of a quantifier by appending "?" to it: in a Perl-compatible regular expression, ".+?" matches as few characters as possible instead of as many.
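Here is a minimal illustration of greedy versus lazy matching on a short, hypothetical string s. It uses regexpr() and regmatches() with perl = TRUE, which the lookahead (?=__) requires.

```r
s <- "a__b__c"  # hypothetical string with two "__" delimiters

# Greedy: .+ grabs as much as possible, then backs off until "__" follows
greedy <- regmatches(s, regexpr("^.+(?=__)", s, perl = TRUE))

# Lazy: .+? grabs as little as possible, so it stops at the first "__"
lazy <- regmatches(s, regexpr("^.+?(?=__)", s, perl = TRUE))

greedy  # "a__b"
lazy    # "a"
```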

Suppose the string I wanted to process was

# Segments are separated by "__"; a single "_" may appear inside a segment.
myExampleString <- paste0(c(letters[1:13], "_", letters[14:26], "__",
                            LETTERS[1:13], "_", LETTERS[14:26], "__",
                            "laksjdl", "_", "lakdjlfalsjdf"),
                          collapse = "")
myExampleString

"abcdefghijklm_nopqrstuvwxyz__ABCDEFGHIJKLM_NOPQRSTUVWXYZ__laksjdl_lakdjlfalsjdf"

and that I wanted only the segment before the first "__". I cannot simply split on "_", because in this example string a single underscore is an ordinary character inside a segment, not a delimiter.

The following doesn't work. Because ".+" is greedy by default, it runs to the last "__" and returns the first and second segments. It does not return the third segment, because the lookahead requires "__" to follow the match.

gsub("^(.+(?=__)).*$", "\\1", myExampleString, perl = TRUE)

"abcdefghijklm_nopqrstuvwxyz__ABCDEFGHIJKLM_NOPQRSTUVWXYZ"

But this does work

gsub("^(.+?(?=__)).*$", "\\1", myExampleString, perl = TRUE)

"abcdefghijklm_nopqrstuvwxyz"

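The only difference is the "?" after ".+", which makes the quantifier lazy (non-greedy) rather than greedy: ".+?" stops at the first position where the lookahead "(?=__)" succeeds, i.e. just before the first "__". Note that lookaheads in R require perl = TRUE.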
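For this particular "first segment" task, a lookahead is not strictly required. The sketch below compares two alternative approaches, sub() and strsplit(), against the lazy-lookahead pattern. These alternatives are swapped in here and are not the method described above. The sketch assumes the segments are always separated by a literal "__".

```r
myExampleString <- paste0(c(letters[1:13], "_", letters[14:26], "__",
                            LETTERS[1:13], "_", LETTERS[14:26], "__",
                            "laksjdl", "_", "lakdjlfalsjdf"),
                          collapse = "")

# 1. Delete everything from the first "__" onward: sub() replaces only the
#    leftmost match, and the leftmost "__" is the first delimiter.
first_by_sub <- sub("__.*$", "", myExampleString)

# 2. Split on the literal delimiter and keep the first piece.
first_by_split <- strsplit(myExampleString, "__", fixed = TRUE)[[1]][1]

# 3. The lazy-lookahead pattern, for comparison.
first_by_lazy <- gsub("^(.+?(?=__)).*$", "\\1", myExampleString, perl = TRUE)

first_by_sub  # "abcdefghijklm_nopqrstuvwxyz"
identical(first_by_sub, first_by_split) && identical(first_by_sub, first_by_lazy)  # TRUE
```

sub() is vectorised, so it also works on a whole character vector of such strings. The strsplit() line as written only looks at the first element. If a string contains no "__" at all, all three approaches return it unchanged.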
