Question
I am trying to replace all punctuation and non-word characters except for "." and "-" in a string, but I am struggling to find the right regular expression.
I've been using the following str_replace_all() code in R, but now I want it to leave "." and "-" untouched. I've tried including things like [^.-] and ([.-]), but I'm not getting the desired output.
str_replace_all("[APPLE/O.ORANGE*PLUM-11]", regex("[\\W+,[:punct:]]", perl=T)," ")
" APPLE O ORANGE PLUM 11 " #current output
" APPLE O.ORANGE PLUM-11 " #desired output
Any suggestions would be greatly appreciated. Thanks!
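Answer 1:
It's probably easier to put ^ at the start of the bracket expression, which makes it match every character not listed inside the brackets. By listing all letters, digits, . and - inside the brackets, you keep those characters from being replaced (the - goes last so that it is read literally rather than as a range).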
library(stringr)
str_replace_all("[APPLE/O.ORANGE*PLUM-11]", "[^a-zA-Z0-9.-]"," ")
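For reference, here is a minimal sketch of what that pattern returns. It uses base R gsub instead of stringr so that it runs without extra packages (that swap is my assumption, as are the variable names x and cleaned):

```r
# Input string from the question
x <- "[APPLE/O.ORANGE*PLUM-11]"

# Replace every character that is NOT a letter, digit, "." or "-" with a space.
# The "-" sits last inside the brackets, so it is a literal hyphen, not a range.
cleaned <- gsub("[^a-zA-Z0-9.-]", " ", x)

print(cleaned)
# [1] " APPLE O.ORANGE PLUM-11 "
```

The leading and trailing spaces come from the replaced "[" and "]"; wrap the call in trimws() if you don't want them.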
Answer 2:
Note that str_replace_all does not accept PCRE patterns: the stringr library is powered by the ICU regex engine.
What you need can be done with base R gsub and perl=TRUE, using the following pattern:
> x<-"[APPLE/O.ORANGE*PLUM-11]"
> gsub("[^\\w.-]", " ", x, perl=TRUE)
[1] " APPLE O.ORANGE PLUM-11 "
The [^\\w.-] pattern matches any character other than a word character (letter, digit, or _), . or -, since [^...] is a negated character class.
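One practical difference between the two answers is the underscore: \w counts _ as a word character, while an explicit a-zA-Z0-9 list does not. A small sketch to illustrate this, where the test string y and the result names are made up for this example:

```r
# Hypothetical input containing an underscore
y <- "A_B/C.D-1"

# \w includes "_", so the underscore is kept
keep_underscore <- gsub("[^\\w.-]", " ", y, perl = TRUE)

# The explicit ranges do not include "_", so it is replaced by a space
drop_underscore <- gsub("[^a-zA-Z0-9.-]", " ", y)

print(keep_underscore)
# [1] "A_B C.D-1"
print(drop_underscore)
# [1] "A B C.D-1"
```

Pick whichever pattern matches how you want underscores to be treated.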
Source: https://stackoverflow.com/questions/41984513/r-str-replace-all-except-periods-and-dashes