Using mgsub function with word boundaries for replacement values

Submitted by 吃可爱长大的小学妹 on 2020-01-04 06:09:28

Question


I am trying to remove certain whole words from the string elements of a vector by replacing them with empty strings. Below are the vectors we are considering:

test <- c("PALMA DE MALLORCA", "THE RICH AND THE POOR", "A CAMEL IN THE DESERT", "SANTANDER SL", "LA")

lista <- c("EL", "LA", "ES", "DE", "Y", "DEL", "LOS", "S.L.", "S.A.", "S.C.", "LAS",
       "DEL", "THE", "OF", "AND", "BY", "S", "L", "A", "C", "SA", "SC", "SL")

Then if we apply the mgsub function as it is, we get the following output:

library(qdap)
mgsub(lista, "", test)
# [1] "PM MOR"   "RIH POOR" "M IN ERT" "NTER"     ""  

So I wrapped each term in word boundaries and ran it again:

lista <- paste("\\b", lista, "\\b", sep = "")
mgsub(lista, "", test)
# [1] "PALMA DE MALLORCA"     "THE RICH AND THE POOR" "A CAMEL IN THE DESERT"
# [4] "SANTANDER SL"          "LA"   

I cannot get the word boundary regex to work for this function.


Answer 1:


According to the multigsub {qdap} documentation:

mgsub(pattern, replacement = NULL, text.var, leadspace = FALSE, trailspace = FALSE, fixed = TRUE, trim = TRUE, ...)
...
fixed
logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments.

To make sure your vector of search terms is parsed as regular expressions, you need to "manually" set the fixed parameter to FALSE.
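The effect of fixed can be seen in isolation with base R's gsub, which has the same argument. This is only a sketch: the vector x and the variable names are made up for illustration, and it shows gsub's behavior rather than qdap's internals.

```r
# Hypothetical two-element input, reusing strings from the question
x <- c("PALMA DE MALLORCA", "LA")
pattern <- "\\bLA\\b"

# fixed = TRUE: the pattern is the literal 6-character text \bLA\b,
# which never occurs in x, so nothing is replaced
literal <- gsub(pattern, "", x, fixed = TRUE)

# perl = TRUE: \b is read as a word boundary, so the stand-alone "LA" is removed
regex <- gsub(pattern, "", x, perl = TRUE)

print(literal)  # "PALMA DE MALLORCA" "LA"
print(regex)    # "PALMA DE MALLORCA" ""
```

This matches the symptom in the question: with fixed left at TRUE, the strings come back unchanged.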

Another important note: a word boundary (\b) placed after a . matches only when a word character follows it, so it fails when the term is followed by a space or by the end of the string. It is safer to use the negative lookahead (?!\w) in this case. Look-arounds in R require Perl-compatible regular expressions (perl=TRUE). Thus, I suggest using this (if a non-word character can appear only at the end of a search term):

lista <- paste("\\b", lista, "(?!\\w)", sep = "")

or (if there can be a non-word character at the beginning, too):

lista <- paste("(?<!\\w)", lista, "(?!\\w)", sep = "")

and then

mgsub(lista, "", test, fixed=FALSE, perl=TRUE)
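For readers without qdap, the same idea can be sketched in base R. The first part checks why \b is fragile after a trailing dot; the second applies the look-around patterns one by one with gsub. The input "ACME S.L.", the escaped dots in the first part, and the helper name strip_terms are assumptions made for this illustration. The whitespace cleanup at the end is this sketch's own step and is not a claim about how mgsub works internally.

```r
# Part 1: "\\b" after a literal dot needs a word character to follow it.
# At the end of the string there is none, so the boundary does not match.
with_boundary  <- grepl("S\\.L\\.\\b", "ACME S.L.", perl = TRUE)
with_lookahead <- grepl("S\\.L\\.(?!\\w)", "ACME S.L.", perl = TRUE)
print(with_boundary)   # FALSE
print(with_lookahead)  # TRUE

# Part 2: remove every term in lista when it stands alone as a word.
test <- c("PALMA DE MALLORCA", "THE RICH AND THE POOR",
          "A CAMEL IN THE DESERT", "SANTANDER SL", "LA")
lista <- c("EL", "LA", "ES", "DE", "Y", "DEL", "LOS", "S.L.", "S.A.", "S.C.", "LAS",
           "DEL", "THE", "OF", "AND", "BY", "S", "L", "A", "C", "SA", "SC", "SL")

# Same look-around wrapping as in the answer. Note that the dots in terms
# such as "S.L." are left unescaped here, so they match any character.
patterns <- paste0("(?<!\\w)", lista, "(?!\\w)")

# strip_terms is a hypothetical helper, not part of qdap
strip_terms <- function(text, patterns) {
  for (p in patterns) {
    text <- gsub(p, "", text, perl = TRUE)
  }
  # Collapse the runs of spaces left behind, then trim both ends
  trimws(gsub("\\s+", " ", text, perl = TRUE))
}

result <- strip_terms(test, patterns)
print(result)
# "PALMA MALLORCA" "RICH POOR" "CAMEL IN DESERT" "SANTANDER" ""
```

If the terms should be matched literally, the dots could be escaped first, for example with gsub(".", "[.]", lista, fixed = TRUE), before the look-arounds are pasted on.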


Source: https://stackoverflow.com/questions/33411524/using-mgsub-function-with-word-boundaries-for-replacement-values
