Question
Given this vector:
ba <- c('baa','aba','abba','abbba','aaba','aabba')
I want to change the final a of each word to i, except in baa and aba.
I wrote the following line ...
gsub('(?<=a[ab]b{1,2})a','i',ba,perl=T)
but was told: PCRE pattern compilation error 'lookbehind assertion is not fixed length' at ')a'.
I looked around a little and apparently the PCRE engine R uses with perl=T accepts a variable-width lookahead but not a variable-width lookbehind. Is there any workaround for this problem? Thanks!
Answer 1:
You can use the lookbehind alternative \K instead. This escape sequence resets the starting point of the reported match, so any previously consumed characters are no longer included in it.
Quoted from rexegg:
The key difference between \K and a lookbehind is that in PCRE, a lookbehind does not allow you to use quantifiers: the length of what you look for must be fixed. On the other hand, \K can be dropped anywhere in a pattern, so you are free to have any quantifiers you like before \K.
Using it in context:
sub('a[ab]b{1,2}\\Ka', 'i', ba, perl=T)
# [1] "baa" "aba" "abbi" "abbbi" "aabi" "aabbi"
Avoiding lookarounds:
sub('(a[ab]b{1,2})a', '\\1i', ba)
# [1] "baa" "aba" "abbi" "abbbi" "aabi" "aabbi"
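To check that the two approaches above agree, here is a minimal sketch using only base R. The variable names with_k, with_group and with_plus are mine, not from the answer. The last call swaps b{1,2} for b+, just to illustrate that an unbounded quantifier is accepted before \K, which a PCRE lookbehind would reject.

```r
ba <- c('baa', 'aba', 'abba', 'abbba', 'aaba', 'aabba')

# \K approach: the prefix is matched, then dropped from the reported match,
# so only the final 'a' is replaced
with_k <- sub('a[ab]b{1,2}\\Ka', 'i', ba, perl = TRUE)

# Capture-group approach: the prefix is matched and written back via \\1
with_group <- sub('(a[ab]b{1,2})a', '\\1i', ba)

# Illustrative variant (not from the answer): unbounded b+ before \K
with_plus <- sub('a[ab]b+\\Ka', 'i', ba, perl = TRUE)

print(identical(with_k, with_group))
print(with_plus)
```

For this particular vector all three calls should give the same result, since no word has more than three consecutive b's. On other data the b+ variant would of course match longer runs.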
Answer 2:
Another solution, for the current case only, where the only quantifier used is a limiting quantifier, is to use stringr::str_replace_all / stringr::str_replace:
> library(stringr)
> str_replace_all(ba, '(?<=a[ab]b{1,2})a', 'i')
[1] "baa" "aba" "abbi" "abbbi" "aabi" "aabbi"
It works because stringr regex functions are based on ICU regex, which features a constrained-width lookbehind:
The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.)
So ICU lookbehinds do not accept arbitrary patterns, but it is good to know that a limiting quantifier such as {1,2} is allowed, which helps when you need to match overlapping text within a known, bounded distance.
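If you would rather stay in base R, a similar effect can be sketched for a limiting quantifier by writing it out as fixed-length alternatives. This is not from the answer above; it relies on PCRE allowing the top-level alternatives of a lookbehind to have different lengths, as long as each one is itself fixed length. Here b{1,2} is expanded by hand into b and bb, and the name res is just for illustration.

```r
ba <- c('baa', 'aba', 'abba', 'abbba', 'aaba', 'aabba')

# b{1,2} expanded into two fixed-length branches:
#   'a[ab]b'  (3 characters) and 'a[ab]bb' (4 characters)
res <- gsub('(?<=a[ab]b|a[ab]bb)a', 'i', ba, perl = TRUE)

print(res)
# [1] "baa"   "aba"   "abbi"  "abbbi" "aabi"  "aabbi"
```

This only scales to small ranges, since every length in the range needs its own branch. For anything larger, \K or a capture group is probably the simpler route.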
Source: https://stackoverflow.com/questions/29308348/r-workaround-for-variable-width-lookbehind