Several substitutions in one line in R

终归单人心 2020-12-10 13:07

I have a column in a dataframe in R with values "-1", "0", "1". I'd like to replace these values with "no", "maybe" and "yes" respectively. I'll do this by usi

3 answers
  • 2020-12-10 14:00

    By adding 2 to -1, 0, and 1, you could get indices into a vector of the desired outcomes:

    c("no", "maybe", "yes")[dat + 2]
    # [1] "no"    "yes"   "maybe" "yes"   "yes"   "no"  
    

    A related option could make use of the match function to figure out the indexing:

    c("no", "maybe", "yes")[match(dat, -1:1)]
    # [1] "no"    "yes"   "maybe" "yes"   "yes"   "no"  
    

    Alternately, you could use a named vector for recoding:

    unname(c("-1"="no", "0"="maybe", "1"="yes")[as.character(dat)])
    # [1] "no"    "yes"   "maybe" "yes"   "yes"   "no"   
    

    You could also use a nested ifelse:

    ifelse(dat == -1, "no", ifelse(dat == 0, "maybe", "yes"))
    # [1] "no"    "yes"   "maybe" "yes"   "yes"   "no"   
    

    If you don't mind loading a new package, the Recode function from the car package does this:

    library(car)
    Recode(dat, "-1='no'; 0='maybe'; 1='yes'")
    # [1] "no"    "yes"   "maybe" "yes"   "yes"   "no"  
    

    Data:

    dat <- c(-1, 1, 0, 1, 1, -1)
    

    Note that all but the first would also work if dat were stored as a string; for the first you would need to use as.numeric(dat) before adding 2.
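
    A minimal sketch of that caveat, assuming the values arrive as strings in a hypothetical vector dat_chr (not part of the original data). The index arithmetic needs an explicit conversion, while match compares the values as strings on its own:

    ```r
    # Hypothetical character version of the data above
    dat_chr <- c("-1", "1", "0", "1", "1", "-1")

    # dat_chr + 2 would fail with a "non-numeric argument" error,
    # so convert to numeric before computing the index
    res_index <- c("no", "maybe", "yes")[as.numeric(dat_chr) + 2]

    # match() coerces -1:1 to character for the comparison,
    # so no conversion is needed here
    res_match <- c("no", "maybe", "yes")[match(dat_chr, -1:1)]

    identical(res_index, res_match)
    # [1] TRUE
    ```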

    If code clarity is your main objective, pick the one that you find easiest to understand. I would personally pick the second or the last, but that is a matter of preference.

    If code speed is of interest, then you can benchmark the solutions. Here are the benchmarks of the five options I've presented, along with the two solutions currently posted as other answers, run on a random vector of length 100k:

    set.seed(144)
    dat <- sample(c(-1, 0, 1), replace=TRUE, 100000)
    opt1 <- function(dat) c("no", "maybe", "yes")[dat + 2]
    opt2 <- function(dat) c("no", "maybe", "yes")[match(dat, -1:1)]
    opt3 <- function(dat) unname(c("-1"="no", "0"="maybe", "1"="yes")[as.character(dat)])
    opt4 <- function(dat) ifelse(dat == -1, "no", ifelse(dat == 0, "maybe", "yes"))
    opt5 <- function(dat) Recode(dat, "-1='no'; 0='maybe'; 1='yes'")
    AnandaMahto <- function(dat) factor(dat, levels = c(-1, 0, 1), labels = c("no", "maybe", "yes"))
    hrbrmstr <- function(dat) sapply(as.character(dat), switch, `-1`="no", `0`="maybe", `1`="yes", USE.NAMES=FALSE)
    library(microbenchmark)
    microbenchmark(opt1(dat), opt2(dat), opt3(dat), opt4(dat), opt5(dat), AnandaMahto(dat), hrbrmstr(dat))
    # Unit: milliseconds
    #              expr        min         lq       mean     median         uq        max neval
    #         opt1(dat)   1.513500   2.553022   2.763685   2.656010   2.837673   4.384149   100
    #         opt2(dat)   2.153438   3.013502   3.251850   3.117058   3.269230   5.851234   100
    #         opt3(dat)  59.716271  61.890470  64.978685  62.509046  63.723048 144.708757   100
    #         opt4(dat)  62.934734  64.715815  71.181477  65.652195  71.123384 123.840577   100
    #         opt5(dat)  82.976441  84.849147  89.071808  85.752429  88.473162 155.347273   100
    #  AnandaMahto(dat)  57.267227  58.643889  60.508402  59.065642  60.368913  80.852157   100
    #     hrbrmstr(dat) 137.883307 148.626496 158.051220 153.441243 162.594752 228.271336   100
    

    The first two options appear to be more than an order of magnitude quicker than any of the other options, though either the vector would have to be pretty huge or you would need to be repeating the operation a number of times for any of this to make a difference.

    As pointed out by @AnandaMahto, these results are qualitatively different if we have character input instead of numeric input:

    set.seed(144)
    dat <- sample(c("-1", "0", "1"), replace=TRUE, 100000)
    opt1 <- function(dat) c("no", "maybe", "yes")[as.numeric(dat) + 2]
    opt2 <- function(dat) c("no", "maybe", "yes")[match(dat, -1:1)]
    opt3 <- function(dat) unname(c("-1"="no", "0"="maybe", "1"="yes")[as.character(dat)])
    opt4 <- function(dat) ifelse(dat == -1, "no", ifelse(dat == 0, "maybe", "yes"))
    opt5 <- function(dat) Recode(dat, "-1='no'; 0='maybe'; 1='yes'")
    AnandaMahto <- function(dat) factor(dat, levels = c(-1, 0, 1), labels = c("no", "maybe", "yes"))
    hrbrmstr <- function(dat) sapply(dat, switch, `-1`="no", `0`="maybe", `1`="yes", USE.NAMES=FALSE)
    library(microbenchmark)
    microbenchmark(opt1(dat), opt2(dat), opt3(dat), opt4(dat), opt5(dat), AnandaMahto(dat), hrbrmstr(dat))
    # Unit: milliseconds
    #              expr       min        lq       mean     median         uq        max neval
    #         opt1(dat)  8.397194  9.519075  10.784108   9.693706  10.163203   55.78417   100
    #         opt2(dat)  2.281438  3.091418   4.231162   3.210794   3.436038   49.39879   100
    #         opt3(dat)  3.606863  5.481115   6.466393   5.720282   6.344651   48.47924   100
    #         opt4(dat) 66.819638 69.996704  74.596960  71.290522  73.404043  127.52415   100
    #         opt5(dat) 32.897019 35.701401  38.488489  36.336489  38.950272   88.20915   100
    #  AnandaMahto(dat)  1.329443  2.114504   2.824306   2.275736   2.493907   46.19333   100
    #     hrbrmstr(dat) 81.898572 91.043729 154.331766 100.006203 141.425717 1594.17447   100
    

    Now, the factor solution proposed by @AnandaMahto is the quickest, followed by vector indexing with match and named vector lookup. Again, all runtimes are fast enough that you would need a large vector or many runs for any of this to matter.
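
    Timings only matter if the candidates return the same values, so it may be worth checking that first. The sketch below uses base R only (Recode and the switch variant are left out to avoid extra dependencies and noise). The names expected, results and all_agree are illustrative, not from the answers:

    ```r
    # Small numeric input and the recoding we expect from it
    dat <- c(-1, 1, 0, 1, 1, -1)
    expected <- c("no", "yes", "maybe", "yes", "yes", "no")

    # Each base-R approach, reduced to a plain character vector
    results <- list(
      index  = c("no", "maybe", "yes")[dat + 2],
      match  = c("no", "maybe", "yes")[match(dat, -1:1)],
      named  = unname(c("-1"="no", "0"="maybe", "1"="yes")[as.character(dat)]),
      ifelse = ifelse(dat == -1, "no", ifelse(dat == 0, "maybe", "yes")),
      factor = as.character(factor(dat, levels = c(-1, 0, 1),
                                   labels = c("no", "maybe", "yes")))
    )

    # TRUE only if every approach reproduces the expected vector exactly
    all_agree <- all(vapply(results, identical, logical(1), expected))
    all_agree
    # [1] TRUE
    ```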

  • 2020-12-10 14:07

    factor is commonly used for this type of task, and leads to some pretty easily readable code:

    vec <- c(0, 1, -1, -1, 1, 0)
    vec
    # [1]  0  1 -1 -1  1  0
    
    factor(vec, levels = c(-1, 0, 1), labels = c("no", "maybe", "yes"))
    # [1] maybe yes   no    no    yes   maybe
    # Levels: no maybe yes
    

    If you want just the character output, wrap it in as.character.
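
    As a rough sketch of that last step, and of how it might look on a data frame column as in the question (df, answer and label are hypothetical names used only for illustration):

    ```r
    vec <- c(0, 1, -1, -1, 1, 0)

    # Recode to a factor, then drop the factor structure
    chr <- as.character(
      factor(vec, levels = c(-1, 0, 1), labels = c("no", "maybe", "yes"))
    )
    chr
    # [1] "maybe" "yes"   "no"    "no"    "yes"   "maybe"

    # Same idea applied to a column of a hypothetical data frame
    df <- data.frame(answer = vec)
    df$label <- as.character(
      factor(df$answer, levels = c(-1, 0, 1), labels = c("no", "maybe", "yes"))
    )
    ```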


    If the column values are already strings, you just modify the levels argument in factor to use as.character:

    vec2 <- as.character(c(0, 1, -1, -1, 1, 0))
    vec2
    # [1] "0"  "1"  "-1" "-1" "1"  "0" 
    
    factor(vec2, levels = as.character(c(-1, 0, 1)), labels = c("no", "maybe", "yes"))
    # [1] maybe yes   no    no    yes   maybe
    # Levels: no maybe yes
    
  • 2020-12-10 14:07

    This could also be an evil application for switch:

    set.seed(1492)
    thing <- sample(c(-1, 0, 1), 100, replace=TRUE)
    sapply(as.character(thing), switch, `-1`="no", `0`="maybe", `1`="yes", USE.NAMES=FALSE)
    

    If they are in fact characters already you can leave off the as.character() bit.
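
    For instance, a small sketch with a hypothetical character vector thing_chr, where switch can match the strings directly:

    ```r
    # Hypothetical input that is already character
    thing_chr <- c("-1", "0", "1", "1", "-1")

    # switch() matches each string against the backquoted argument names
    res <- sapply(thing_chr, switch, `-1`="no", `0`="maybe", `1`="yes", USE.NAMES=FALSE)
    res
    # [1] "no"    "maybe" "yes"   "yes"   "no"
    ```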

    NOTE: I'm not necessarily recommending this, just showing all the possible ways (and this is more of a way out of twisty mazes of ifelse passages).

    IMO factors are the way to go.
