Replace multiple strings in one gsub() or chartr() statement in R?

遇见更好的自我 2020-11-30 07:28

I have a string variable containing letters [a-z], spaces [ ], and apostrophes ['], e.g. x <- "a'b c". I want to replace the apostrophe ['] with blank [], and replac

6 Answers
  • 2020-11-30 08:04

    I am a fan of the syntax that the %<>% and %>% operators from the magrittr package provide.

    library(magrittr)
    
    x <- "a'b c"
    
    x %<>%
      gsub("'", "", .) %>%
      gsub(" ", "_", .) 
    x
    ##[1] "ab_c"
    

    gsubfn is wonderful, but I like the chaining that %>% allows.

  • 2020-11-30 08:05

    I think nested gsub will do the job.

    gsub("Find","Replace",gsub("Find","Replace",X))
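
    Applied to the string from the question, the template above might look like the following sketch. The variable name result and the use of fixed = TRUE are additions for illustration, not part of the original answer.

    ```r
    x <- "a'b c"

    # Inner call removes apostrophes; outer call turns spaces into underscores.
    # fixed = TRUE treats each pattern as a literal string, not a regular expression.
    result <- gsub(" ", "_", gsub("'", "", x, fixed = TRUE), fixed = TRUE)
    result
    # [1] "ab_c"
    ```

    Nesting reads inside-out, so the innermost replacement runs first.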
    
  • 2020-11-30 08:11

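    You can use gsubfn. Here the pattern "." matches every character, and each match is looked up by name in the replacement list; characters without an entry are left unchanged: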

    library(gsubfn)
    gsubfn(".", list("'" = "", " " = "_"), x)
    # [1] "ab_c"
    

    Similarly, we can use mgsub, which takes a vector of patterns and a matching vector of replacements:

    mgsub::mgsub(x, c("'", " "), c("", "_"))
    #[1] "ab_c"
    
  • 2020-11-30 08:26
    gsub("\\s", "", chartr("' ", " _", x)) # Use whitespace and then remove it
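
    The one-liner above works in two steps, which the following sketch spells out. chartr() maps each character in its first argument to the character at the same position in its second, so it cannot delete anything by itself; the apostrophe is therefore turned into a space first and stripped afterwards. The names step1 and step2 are illustrative only.

    ```r
    x <- "a'b c"

    # Step 1: ' becomes a space, and the original space becomes _.
    step1 <- chartr("' ", " _", x)
    step1
    # [1] "a b_c"

    # Step 2: remove the placeholder whitespace introduced in step 1.
    step2 <- gsub("\\s", "", step1)
    step2
    # [1] "ab_c"
    ```

    Note that step 2 would also remove any other whitespace (tabs, newlines) already present in the input.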
    
  • 2020-11-30 08:27

    I'd go with the quite fast function stri_replace_all_fixed from library(stringi):

    library(stringi)    
    stri_replace_all_fixed("a'b c", pattern = c("'", " "), replacement = c("", "_"), vectorize_all = FALSE)
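
    With vectorize_all = FALSE, every pattern is applied to the input string in turn, rather than the i-th pattern being paired with the i-th string. A rough base R sketch of that behaviour, assuming literal matching, is shown below; replace_all_sequential is a hypothetical helper written for illustration and is not part of stringi.

    ```r
    # Hypothetical helper (not a stringi function): apply each
    # pattern/replacement pair, in order, to the same input.
    replace_all_sequential <- function(x, patterns, replacements) {
      stopifnot(length(patterns) == length(replacements))
      for (i in seq_along(patterns)) {
        # fixed = TRUE matches the pattern literally, as the *_fixed functions do.
        x <- gsub(patterns[i], replacements[i], x, fixed = TRUE)
      }
      x
    }

    out <- replace_all_sequential("a'b c", c("'", " "), c("", "_"))
    out
    # [1] "ab_c"
    ```

    Because the pairs in this sketch are applied one after another, a later pattern can match text produced by an earlier replacement.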
    

    Here is a benchmark taking into account most of the other suggested solutions:

    library(stringi)
    library(microbenchmark)
    library(gsubfn)
    library(mgsub)
    library(magrittr)
    library(dplyr)
    
    x_gsubfn <-
    x_mgsub <-
    x_nested_gsub <-
    x_magrittr <-
    x_stringi <- "a'b c"
    
    microbenchmark("gsubfn" = { gsubfn(".", list("'" = "", " " = "_"), x_gsubfn) },
                   "mgsub" = { mgsub::mgsub(x_mgsub, c("'", " "), c("", "_")) },
                   # Note: "Find" never occurs in the input, so this times two calls that match nothing.
                   "nested_gsub" = { gsub("Find", "Replace", gsub("Find","Replace", x_nested_gsub)) },
                   # Note: %<>% overwrites x_magrittr, so later runs likely have nothing left to replace.
                   "magrittr" = { x_magrittr %<>% gsub("'", "", .) %>% gsub(" ", "_", .) },
                   "stringi" = { stri_replace_all_fixed(x_stringi, pattern = c("'", " "), replacement = c("", "_"), vectorize_all = FALSE) }
                   )
    

    Unit: microseconds
            expr     min       lq      mean   median       uq     max neval
          gsubfn 458.217 482.3130 519.12820 513.3215 538.0100 715.371   100
           mgsub 180.521 200.8650 221.20423 216.0730 231.6755 460.587   100
     nested_gsub  14.615  15.9980  17.92178  17.7760  18.7630  40.687   100
        magrittr 113.765 133.7125 148.48202 142.9950 153.0680 296.261   100
         stringi   3.950   7.7030   8.41780   8.2960   9.0860  26.071   100
    
  • 2020-11-30 08:29

    I would opt for a magrittr and/or dplyr solution, as well. However, I prefer not making a new copy of the object, especially if it is in a function and can be returned cheaply.

    For example, inside a function body:

    return(
      catInTheHat %>% gsub('Thing1', 'Thing2', .) %>% gsub('Red Fish', 'Blue Fish', .)
    )
    

    ...and so on.
