Can the value.var in dcast be a list or have multiple value variables?

醉酒成梦 2020-12-01 06:16

In the help files for dcast.data.table, there is a note stating that a new feature has been implemented: "dcast.data.table allows value.var column to be of typ

3 Answers
  • 2020-12-01 06:46

    Update

    Apparently, the fix was much easier...


    Technically, your statement that "apparently there is no such feature" isn't quite correct. There is such a feature in the recast function (which sort of hides the melting and casting process), but it seems like Hadley forgot to finish the function or something: the function returns a list of the relevant parts of your operation.

    Here's a minimal example...

    Some sample data:

    set.seed(1)
    mydf <- data.frame(x1 = rep(1:3, each = 3),
                       x2 = rep(1:3, 3),
                       salt = sample(10, 9, TRUE),
                       sugar = sample(7, 9, TRUE))
    
    mydf
    #   x1 x2 salt sugar
    # 1  1  1    3     1
    # 2  1  2    4     2
    # 3  1  3    6     2
    # 4  2  1   10     5
    # 5  2  2    3     3
    # 6  2  3    9     6
    # 7  3  1   10     4
    # 8  3  2    7     6
    # 9  3  3    7     7
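
    A note on reproducibility: sample() changed its default algorithm in R 3.6.0, so set.seed(1) does not reproduce the values printed above on current R versions. A minimal sketch that types the same data frame out explicitly (values copied from the printed output):

    ```r
    # Same values as the printed mydf, written out so the result does not
    # depend on the R version's random number generator.
    mydf <- data.frame(x1 = rep(1:3, each = 3),
                       x2 = rep(1:3, 3),
                       salt = c(3L, 4L, 6L, 10L, 3L, 9L, 10L, 7L, 7L),
                       sugar = c(1L, 2L, 2L, 5L, 3L, 6L, 4L, 6L, 7L))
    ```

    Alternatively, calling RNGkind(sample.kind = "Rounding") before set.seed(1) should restore the old sampling behaviour (R emits a warning when you do this).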
    

    The effect you seem to be trying to achieve:

    reshape(mydf, idvar='x1', timevar='x2', direction='wide')
    #   x1 salt.1 sugar.1 salt.2 sugar.2 salt.3 sugar.3
    # 1  1      3       1      4       2      6       2
    # 4  2     10       5      3       3      9       6
    # 7  3     10       4      7       6      7       7
    

    recast in action. (Note that the values are all what we would expect, in the dimensions we would expect.)

    library(reshape2)
    out <- recast(mydf, x1 ~ x2 + variable, measure.var = c("salt", "sugar"))
    ### recast(mydf, x1 ~ x2 + variable, id.var = c("x1", "x2"))
    out
    # $data
    #      [,1] [,2] [,3] [,4] [,5] [,6]
    # [1,]    3    1    4    2    6    2
    # [2,]   10    5    3    3    9    6
    # [3,]   10    4    7    6    7    7
    # 
    # $labels
    # $labels[[1]]
    #   x1
    # 1  1
    # 2  2
    # 3  3
    # 
    # $labels[[2]]
    #   x2 variable
    # 1  1     salt
    # 2  1    sugar
    # 3  2     salt
    # 4  2    sugar
    # 5  3     salt
    # 6  3    sugar
    

    I'm honestly not sure if this was an incomplete function, or if it is a helper function to another function.

    All of the information is there to be able to put the data back together again, making it easy to write a function like this:

    recast2 <- function(...) {
      inList <- recast(...)
      setNames(cbind(inList[[2]][[1]], inList[[1]]),
               c(names(inList[[2]][[1]]), 
                 do.call(paste, c(rev(inList[[2]][[2]]), sep = "_"))))
    }
    recast2(mydf, x1 ~ x2 + variable, measure.var = c("salt", "sugar"))
    #   x1 salt_1 sugar_1 salt_2 sugar_2 salt_3 sugar_3
    # 1  1      3       1      4       2      6       2
    # 2  2     10       5      3       3      9       6
    # 3  3     10       4      7       6      7       7
    

    Again, a possible advantage with the recast2 approach is the ability to aggregate as well as reshape in the same step.
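
    To illustrate aggregating and reshaping in the same step, here is a minimal sketch that calls melt and dcast directly (the melting and casting that recast hides) and passes fun.aggregate = sum. The data frame is typed out with the values printed earlier; the names molten and agg are only illustrative.

    ```r
    library(reshape2)

    mydf <- data.frame(x1 = rep(1:3, each = 3),
                       x2 = rep(1:3, 3),
                       salt = c(3L, 4L, 6L, 10L, 3L, 9L, 10L, 7L, 7L),
                       sugar = c(1L, 2L, 2L, 5L, 3L, 6L, 4L, 6L, 7L))

    # Melt to long form, then cast to one row per x1 and one column per
    # measure, summing over the x2 values that get collapsed.
    molten <- melt(mydf, id.vars = c("x1", "x2"))
    agg <- dcast(molten, x1 ~ variable, value.var = "value", fun.aggregate = sum)
    agg
    #   x1 salt sugar
    # 1  1   13     5
    # 2  2   22    14
    # 3  3   24    17
    ```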

  • 2020-12-01 06:53

    Using the sample data frame mydf from A5C1D2H2I1M1N2O1R2T1's answer.

    Edit December 2016 using tidyr

    reshape2 has been superseded by the tidyr package.

    library(tidyr)
    mydf  %>% 
        gather(variable, value, -x1, -x2)  %>% 
        unite(x2_variable, x2, variable)  %>% 
        spread(x2_variable, value)
    
    #   x1 1_salt 1_sugar 2_salt 2_sugar 3_salt 3_sugar
    # 1  1      3       1      4       2      6       2
    # 2  2     10       5      3       3      9       6
    # 3  3     10       4      7       6      7       7
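
    In tidyr 1.0.0 and later, gather and spread are themselves superseded by pivot_longer and pivot_wider, and pivot_wider accepts several value columns at once. A minimal sketch, assuming tidyr >= 1.0.0 is installed, with the data frame typed out from the values printed earlier (the name wide is only illustrative):

    ```r
    library(tidyr)

    mydf <- data.frame(x1 = rep(1:3, each = 3),
                       x2 = rep(1:3, 3),
                       salt = c(3L, 4L, 6L, 10L, 3L, 9L, 10L, 7L, 7L),
                       sugar = c(1L, 2L, 2L, 5L, 3L, 6L, 4L, 6L, 7L))

    # names_from supplies the column suffix; values_from lists both value
    # columns, so no intermediate long format is needed.
    wide <- pivot_wider(mydf, names_from = x2, values_from = c(salt, sugar))
    names(wide)
    # [1] "x1"      "salt_1"  "salt_2"  "salt_3"  "sugar_1" "sugar_2" "sugar_3"
    ```

    Unlike the gather/unite/spread chain, the value name comes first in the resulting column names (salt_1 rather than 1_salt), and the result is a tibble.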
    

    Original answer based on reshape2

    @AlexR added to his question:

    Sure, you can 'melt' the 2 value variables into a single column,

    For those who come here looking for an answer based on reshape2, here is how to melt the data and then use dcast based on the "variable" column.

    library(reshape2)
    dt2 <- melt(mydf, id.vars = c("x1", "x2"))
    

    The variable column will now contain 'salt' and 'sugar'. You can achieve the desired effect with

    dt3 <- dcast(dt2, x1 ~ x2 + variable, value.var="value")
    dt3
    #   x1 1_salt 1_sugar 2_salt 2_sugar 3_salt 3_sugar
    # 1  1      3       1      4       2      6       2
    # 2  2     10       5      3       3      9       6
    # 3  3     10       4      7       6      7       7
    

    value.var is optional in this function call as dcast will automatically guess it.

  • 2020-12-01 06:59

    From v1.9.6 of data.table, we can cast multiple value.var columns simultaneously (and also use multiple aggregation functions in fun.aggregate). Please see ?dcast and the Efficient reshaping using data.tables vignette for more.

    Here's how we could use dcast:

    library(data.table)
    dcast(setDT(mydf), x1 ~ x2, value.var = c("salt", "sugar"))
    #    x1 salt_1 salt_2 salt_3 sugar_1 sugar_2 sugar_3
    # 1:  1      3      4      6       1       2       2
    # 2:  2     10      3      9       5       3       6
    # 3:  3     10      7      7       4       6       7
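
    The multiple-aggregation feature can be sketched the same way. This example assumes data.table >= 1.9.6 and adds a hypothetical grouping column, half, purely so that some cells contain more than one row to aggregate; the names dt and res are illustrative as well.

    ```r
    library(data.table)

    dt <- data.table(x1 = rep(1:3, each = 3),
                     x2 = rep(1:3, 3),
                     salt = c(3L, 4L, 6L, 10L, 3L, 9L, 10L, 7L, 7L),
                     sugar = c(1L, 2L, 2L, 5L, 3L, 6L, 4L, 6L, 7L))

    # 'half' is a hypothetical column, not part of the original data:
    # x2 values 1 and 2 go to "lo", x2 value 3 goes to "hi".
    dt[, half := ifelse(x2 <= 2, "lo", "hi")]

    # Every function in the list is applied to every value.var column.
    res <- dcast(dt, x1 ~ half, value.var = c("salt", "sugar"),
                 fun.aggregate = list(sum, mean))
    dim(res)
    # 3 rows; 1 id column + 2 value columns x 2 functions x 2 groups = 9 columns
    ```

    Passing value.var as a list instead, for example list("salt", "sugar"), pairs each column with one function rather than computing all combinations; see ?dcast for the details.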
    