generate column values with multiple conditions in R

甜味超标 2020-12-29 00:22

I have a data frame z and I want to create a new column based on the values of two existing columns of z. The process is as follows:

8 answers
  • 2020-12-29 00:51

    Generate a multiplier vector:

    tt <- rep(1, max(z$x))
    tt[2] <- 2
    tt[4] <- 4
    tt[7] <- 3
    

    And here is your new column:

    > z$t * tt[z$x]
     [1] 21 44 23 96 25 26 81 28 29 30
    
    > z$q <- z$t * tt[z$x]
    > z
        x  y  t  q
    1   1 11 21 21
    2   2 12 22 44
    3   3 13 23 23
    4   4 14 24 96
    5   5 15 25 25
    6   6 16 26 26
    7   7 17 27 81
    8   8 18 28 28
    9   9 19 29 29
    10 10 20 30 30
    

    This uses z$x directly as a vector index, so it will not work if z$x contains zero or negative values.
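
    The steps above can be run end to end with the sketch below. The construction of z is an assumption reconstructed from the printed output (x = 1..10, y = 11..20, t = 21..30); the original question's data is not shown here. The last two lines illustrate why zero or negative values in the index are a problem.

    ```r
    # Assumed reconstruction of z from the printed table
    z <- data.frame(x = 1:10, y = 11:20, t = 21:30)

    # Multiplier vector: 1 everywhere, except at positions 2, 4 and 7
    tt <- rep(1, max(z$x))
    tt[c(2, 4, 7)] <- c(2, 4, 3)

    # Look up the multiplier by position and apply it to t
    z$q <- z$t * tt[z$x]

    # A zero index is silently dropped and a negative index removes elements,
    # so the result no longer lines up with the rows of z
    n_zero <- length(tt[c(0, 1, 2)])  # 2 elements instead of 3
    n_neg  <- length(tt[-1])          # 9 elements: the first one was removed
    ```

    Here z$q should match the q column printed above.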

    Edited

    Here is a generalization of the above, where a function is used to generate the multiplier vector. More precisely, we write a function that builds the multiplier function from a set of parameters.

    We want to transform the following values:

    2 -> 2
    4 -> 4
    7 -> 3
    

    Otherwise a default of 1 is taken.

    Here is a function which generates the desired function:

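    f <- function(default, x, y) {
      # Build a lookup table covering min(x)..max(x), pre-filled with the default
      x.min <- min(x)
      x.max <- max(x)
      y.vals <- rep(default, x.max-x.min+1)
      # Shift by x.min so that the smallest key lands at position 1
      y.vals[x-x.min+1] <- y
    
      function(z) {
        result <- rep(default, length(z))
        # Only values inside [x.min, x.max] are looked up; the rest keep the default
        tmp <- z>=x.min & z<=x.max
        result[tmp] <- y.vals[z[tmp]-x.min+1]
        result
      }
    }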
    

    Here is how we use it:

    x <- c(2,4,7)
    y <- c(2,4,3)
    
    g <- f(1, x, y)
    

    g is the function that we want. It should be clear that any mapping can be supplied via the x and y parameters to f.

    g(z$x)
    ## [1] 1 2 1 4 1 1 3 1 1 1
    
    g(z$x)*z$t
    ## [1] 21 44 23 96 25 26 81 28 29 30
    

    Because the lookup is done by position, this only works for integer values.
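
    If the keys are not integers, one possible alternative (not part of the approach above) is to look values up with base R's match() instead of indexing by position. The sketch below is a minimal illustration; the data frame and the names keys, vals, idx and mult are made up for the example.

    ```r
    # Hypothetical data with a negative and a non-integer value in x
    z <- data.frame(x = c(-1, 2, 2.5, 4, 7), t = c(10, 20, 30, 40, 50))

    keys <- c(2, 4, 7)   # values of x that get a special multiplier
    vals <- c(2, 4, 3)   # the corresponding multipliers

    # match() returns the position of each x in keys, or NA if it is absent
    idx  <- match(z$x, keys)

    # Fall back to the default multiplier of 1 where there was no match
    mult <- ifelse(is.na(idx), 1, vals[idx])

    z$q <- z$t * mult
    ```

    Note that match() compares doubles exactly, so keys produced by floating-point arithmetic may fail to match.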

  • 2020-12-29 00:56

    Here's a version of SQL's decode in R for character vectors (untested with factors) that operates just like the SQL version, i.e. it takes an arbitrary number of target/replacement pairs and an optional last argument that acts as a default value (note that the default won't overwrite NAs).

    I can see it being pretty useful in conjunction with dplyr's mutate operation.

    > x <- c("apple","apple","orange","pear","pear",NA)
    
    > decode(x, apple, banana)
    [1] "banana" "banana" "orange" "pear"   "pear"   NA      
    
    > decode(x, apple, banana, fruit)
    [1] "banana" "banana" "fruit"  "fruit"  "fruit"  NA      
    
    > decode(x, apple, banana, pear, passionfruit)
    [1] "banana"       "banana"       "orange"       "passionfruit" "passionfruit" NA            
    
    > decode(x, apple, banana, pear, passionfruit, fruit)
    [1] "banana"       "banana"       "fruit"        "passionfruit" "passionfruit" NA  
    

    Here's the code I'm using, with a gist I'll keep up to date here (link).

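    decode <- function(x, ...) {
    
      # Capture the unevaluated arguments as strings, so bare names like apple work
      args <- as.character(eval(substitute(alist(...))))
    
      # Even positions are replacements; odd positions are targets
      idx          <- seq_along(args)
      replacements <- args[idx %% 2 == 0]
      targets      <- args[idx %% 2 == 1][seq_along(replacements)]
    
      # An odd number of arguments means the last one is the default
      if(length(args) %% 2 == 1)
        x[! x %in% targets & ! is.na(x)] <- tail(args,1)
    
      for(i in seq_along(targets))
        x <- ifelse(x == targets[i], replacements[i], x)
    
      return(x)
    
    }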
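
    For comparison, a similar mapping can be sketched with a named character vector and match(), passing the pairs as ordinary strings rather than bare names. This is only an illustration; decode_lookup and its arguments are hypothetical and not part of the code above.

    ```r
    # Hypothetical helper: `lookup` is a named vector, c(target = "replacement")
    decode_lookup <- function(x, lookup, default = NULL) {
      hit <- match(x, names(lookup))   # NA where x has no mapping or x is NA
      out <- x
      # Apply the default to unmatched, non-NA values only
      if (!is.null(default)) out[!is.na(x) & is.na(hit)] <- default
      # Overwrite matched values with their replacements
      out[!is.na(hit)] <- unname(lookup[hit[!is.na(hit)]])
      out
    }

    x <- c("apple", "apple", "orange", "pear", "pear", NA)
    pairs <- c(apple = "banana", pear = "passionfruit")

    no_default   <- decode_lookup(x, pairs)
    with_default <- decode_lookup(x, pairs, default = "fruit")
    ```

    With these inputs, no_default and with_default should reproduce the third and fourth decode results shown earlier.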
    