R str_replace_all except periods and dashes

荒凉一梦 submitted on 2020-01-15 10:05:03

Question


I am trying to replace all punctuation and non-word characters in a string except for "." and "-", but I am struggling to find the right regular expression.

I've been using the following str_replace_all() code in R, but now I want it to leave "." and "-" untouched. I've tried adding things like [^.-] and ([.-]) to the pattern, but I'm not getting the desired output.

str_replace_all("[APPLE/O.ORANGE*PLUM-11]", regex("[\\W+,[:punct:]]", perl=T)," ")

" APPLE O ORANGE PLUM 11 " #current output

" APPLE O.ORANGE PLUM-11 " #desired output

Any suggestions would be greatly appreciated. Thanks!


Answer 1:


It's probably easier to use ^ at the start of a bracket expression, which negates it so that it matches every character not listed inside the brackets. By listing all letters, digits, ".", and "-" inside the brackets, you leave those characters alone and replace everything else.

library(stringr) 
str_replace_all("[APPLE/O.ORANGE*PLUM-11]", "[^a-zA-Z0-9.-]"," ")
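As a quick check of what this negated class keeps and drops, here is a minimal sketch. It assumes base R's gsub() as a stand-in for str_replace_all() so that it runs without any packages; the second input string is made up for illustration. Note that an explicit a-zA-Z0-9 class does not include the underscore, so "_" is replaced as well.

```r
# Base R sketch: gsub() stands in for stringr::str_replace_all() here
# (an assumption, so the snippet runs without extra packages).
x <- c("[APPLE/O.ORANGE*PLUM-11]", "A_B,C.D-E")  # second string is illustrative

# Negated class: replace every character that is NOT an ASCII letter,
# a digit, "." or "-". The "-" comes last so it is read literally
# rather than as a range operator.
cleaned <- gsub("[^a-zA-Z0-9.-]", " ", x)
print(cleaned)
```

The first element should come out as the desired output from the question, with the leading and trailing brackets turned into spaces.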



Answer 2:


Note that str_replace_all does not accept PCRE patterns: the stringr library is powered by the ICU regex engine, so perl=T has no effect there.

What you need to do can be done with a base R gsub using the following pattern:

> x<-"[APPLE/O.ORANGE*PLUM-11]"
> gsub("[^\\w.-]", " ", x, perl=TRUE)
[1] " APPLE O.ORANGE PLUM-11 "


The [^\\w.-] pattern is a negated character class ([^...]): it matches any character other than a word character (letter, digit, or _), ".", or "-".
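Because \w also covers the underscore, this pattern leaves "_" in place. The sketch below wraps the gsub call in a small helper to show that behavior and one possible way to replace underscores too. The function name keep_word_dot_dash, its drop_underscore argument, and the test string are hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answers): blank out everything
# except word characters, "." and "-".
keep_word_dot_dash <- function(x, drop_underscore = FALSE) {
  # Adding "|_" as an alternative makes the underscore a match as well,
  # so it is replaced even though \w would otherwise protect it.
  pattern <- if (drop_underscore) "[^\\w.-]|_" else "[^\\w.-]"
  gsub(pattern, " ", x, perl = TRUE)
}

with_underscore    <- keep_word_dot_dash("A_B/C.D-E")
without_underscore <- keep_word_dot_dash("A_B/C.D-E", drop_underscore = TRUE)

print(with_underscore)
print(without_underscore)
```

If underscores never occur in the data, the two variants behave the same and the plain [^\\w.-] pattern is enough.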



Source: https://stackoverflow.com/questions/41984513/r-str-replace-all-except-periods-and-dashes
