str_replace_all replacing named vector elements iteratively not all at once

若如初见. Posted on 2020-02-24 05:53:26

Question


Let's say I have a long character string: pneumonoultramicroscopicsilicovolcanoconiosis. I'd like to use stringr::str_replace_all to replace certain letters with others. According to the documentation, str_replace_all can take a named vector and replaces each name with its value. That works fine for one replacement, but with several it seems to apply them iteratively, so each replacement acts on the output of the previous one rather than on the original string. I'm not sure this is the intended behaviour.

library(tidyverse)
text_string = "developer"
text_string %>% 
  str_replace_all(c(e ="X")) #this works fine
#> [1] "dXvXlopXr"
text_string %>% 
  str_replace_all(c(e ="p", p = "e")) #not intended behaviour
#> [1] "develoeer"

Desired result:

[1] "dpvploepr"

Which I get by introducing a new character:

text_string %>% 
  str_replace_all(c(e ="X", p = "e", X = "p"))

It's a usable workaround but hardly generalisable. Is this a bug or are my expectations wrong?

I'd like to also be able to replace n letters with n other letters simultaneously, preferably using either two vectors (like "old" and "new") or a named vector as input.

reprex edited for easier human reading


Answer 1:


I'm working on a package to deal with this type of problem. It is safer than the qdap::mgsub function because it does not rely on placeholders. It fully supports regular expressions in both the pattern and the replacement. You provide a named list where the names are the strings to match and the values are their replacements.

devtools::install_github("bmewing/mgsub")
library(mgsub)
mgsub("developer",list("e" ="p", "p" = "e"))
#> [1] "dpvploepr"

qdap::mgsub(c("e","p"),c("p","e"),"developer")
#> [1] "dpvploppr"



Answer 2:


My workaround would be to take advantage of the fact that str_replace_all accepts a function as the replacement.

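library(stringr)
text_string <- "developer"
pattern <- "p|e"
fun <- function(query) {
    # The pattern only matches "e" or "p", so anything that is not "e" is "p".
    # ifelse() is vectorised, so this works for one match or for many at once.
    ifelse(query == "e", "p", "e")
}

str_replace_all(text_string, pattern, fun)
#> [1] "dpvploepr"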

Of course, if you need to scale up, I would suggest using a more sophisticated function.
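One possible way to scale this up, offered as a sketch rather than as the answerer's own code, is to build both the pattern and the lookup from a named vector. The helper name `str_swap_all` is hypothetical (it is not part of stringr), and the sketch assumes the names in `rules` are plain literals with no regex metacharacters.

```r
library(stringr)

# Hypothetical helper, not part of stringr.
# Assumes names(rules) are literal strings with no regex metacharacters.
str_swap_all <- function(string, rules) {
  # One alternation pattern that matches any of the "old" strings
  pattern <- paste(names(rules), collapse = "|")
  # Look each match up in the named vector; indexing is vectorised, so this
  # works whether the function receives one match or several at a time
  str_replace_all(string, pattern, function(m) unname(rules[m]))
}

str_swap_all("developer", c(e = "p", p = "e"))
#> [1] "dpvploepr"
```

Because every match is looked up against the original text in a single pass, no replacement can be undone by a later rule.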




Answer 3:


The function probably applies the replacements in order: after replacing every c with s, it replaces every s with c, so only c remains. Try two passes with placeholder characters:

long_string <- "pneumonoultramicroscopicsilicovolcanoconiosis"
long_string %>% str_replace_all(c(c = "X", s = "U")) %>% str_replace_all(c(X = "s", U = "c"))
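One caveat with placeholder characters is that they must not already occur in the input. The following sketch uses a made-up string that contains the placeholder "X" to show what happens when that assumption fails.

```r
library(stringr)

# Made-up input that already contains the placeholder "X"
x <- "Xcs"

result <- x %>%
  str_replace_all(c(c = "X", s = "U")) %>%  # "Xcs" becomes "XXU"
  str_replace_all(c(X = "s", U = "c"))      # the original "X" is rewritten too

result
#> [1] "ssc"
```

Swapping only c and s would have given "Xsc", so the placeholders have to be chosen from characters that cannot appear in the data.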



Answer 4:


The iterative behavior is intended. That said, we can write our own workaround. I am going to use character subsetting for the replacement.
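As a brief aside on what "iterative" means here, the sketch below applies the question's two rules one at a time in an explicit loop. It illustrates the observed behavior and is not a description of stringr's internals.

```r
library(stringr)

rules <- c(e = "p", p = "e")
out <- "developer"

# Apply each rule in turn to the result of the previous one
for (i in seq_along(rules)) {
  out <- str_replace_all(out, names(rules)[i], rules[[i]])
  print(out)
}
#> [1] "dpvploppr"
#> [1] "develoeer"
```

After the first rule every e has become p, so the second rule can no longer tell original p's from converted ones.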

In a named vector, we can look up things by name and get a replacement value for each name. This is like doing all the replacement simultaneously.

rules <- c(a = "X", b = "Y", X = "a")
chars <- c("a", "a", "b", "X", "X")
rules[chars]
#>   a   a   b   X   X 
#> "X" "X" "Y" "a" "a"

So here, looking up "a" in the rules vector gets us "X", effectively replacing "a" with "X". The same goes for the other characters.

One problem is that names without a match yield NA.

rules <- c(a = "X", b = "Y", X = "a")
chars <- c("a", "Y", "Z")
rules[chars]
#>    a <NA> <NA> 
#>  "X"   NA   NA

To prevent the NAs from appearing, we can expand the rules to include any new characters so that a character is replaced by itself.

rules <- c(a = "X", b = "Y", X = "a")
chars <- c("a", "Y", "Z")
no_rule <- chars[! chars %in% names(rules)]
rules2 <- c(rules, setNames(no_rule, no_rule))
rules2[chars]
#>   a   Y   Z 
#> "X" "Y" "Z"

And that's the logic behind the following function.

  • Break strings into characters
  • Create a full list of replacement rules
  • Look up replacement values
  • Glue strings back together

library(stringr)

str_replace_chars <- function(string, rules) {
  # Expand rules to replace characters with themselves 
  # if those characters do not have a replacement rule
  chars <- unique(unlist(strsplit(string, "")))
  complete_rules <- setNames(chars, chars)
  complete_rules[names(rules)] <- rules

  # Split each string into characters, replace and unsplit
  for (string_i in seq_along(string)) {
    chars_i <- unlist(strsplit(string[string_i], ""))
    string[string_i] <- paste0(complete_rules[chars_i], collapse = "")
  }
  string
}

rules <- c(a = "X", p = "e", e = "p")
string <- c("application", "developer")
str_replace_chars(string, rules)
#> [1] "XeelicXtion" "dpvploepr"
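When every rule maps one character to one character, as in this example, base R's chartr() offers the same simultaneous translation. The sketch below assumes each name and each value in `rules` is exactly one character long; it does not cover multi-character rules.

```r
# Assumes every name and every value in `rules` is exactly one character
rules <- c(a = "X", p = "e", e = "p")
string <- c("application", "developer")

old <- paste(names(rules), collapse = "")  # "ape"
new <- paste(rules, collapse = "")         # "Xep"

result <- chartr(old, new, string)
result
#> [1] "XeelicXtion" "dpvploepr"
```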


Source: https://stackoverflow.com/questions/48169135/str-replace-all-replacing-named-vector-elements-iteratively-not-all-at-once
