How to split one column into different columns with dcast without aggregating?

后端 未结 1 2047
谎友^
谎友^ 2020-12-20 06:03

I\'m trying to reshape my data using dcast. I\'m working with samples where each sample has 10-30 sample units. I can\'t have my data aggregate.

My data is in this f

相关标签:
1条回答
  • 2020-12-20 06:56

    You can do this with dcast() but you have to add row numbers for each ID.

    The data.table package is another package besides reshape2 which implements dcast(). data.table has a handy rowid() function to generate unique row ids within each group. WIth that, we get:

    library(data.table)
    dcast(setDT(DF), rowid(ID) ~ ID, value.var = "total")
    #    ID sample_1 sample_2 sample_3
    # 1:  1        1        2        0
    # 2:  2        0        1        0
    # 3:  3        2        2        1
    # 4:  4        1        0        2
    # 5:  5        0        0        1
    # 6:  6        0        0        0
    # 7:  7        2        1        2
    # 8:  8        1        2        1
    # 9:  9        0        1        4
    #10: 10        2        4        5
    #11: 11        1        5        1
    #12: 12        4        2        1
    #13: 13       NA        1        0
    #14: 14       NA       NA        0
    #15: 15       NA       NA        1
    

    However, I recommend to continue any data processing in long format and use grouping. That's much easier than to work on individual columns. For instance,

    # count observations by group
    DF[, .N, by = ID]
    #         ID  N
    #1: sample_1 12
    #2: sample_2 13
    #3: sample_3 15
    
    # compute mean by group
    DF[, mean(total), by = ID]
    #         ID       V1
    #1: sample_1 1.166667
    #2: sample_2 1.615385
    #3: sample_3 1.266667
    
    # get min and max by group
    DF[, .(min = min(total), max = max(total)), by = ID]
    #         ID min max
    #1: sample_1   0   4
    #2: sample_2   0   5
    #3: sample_3   0   5
    
    # the same using range()
    DF[, as.list(range(total)), by = ID]
    #         ID V1 V2
    #1: sample_1  0  4
    #2: sample_2  0  5
    #3: sample_3  0  5
    

    Data

    DF <- structure(list(ID = c("sample_1", "sample_1", "sample_1", "sample_1", 
    "sample_1", "sample_1", "sample_1", "sample_1", "sample_1", "sample_1", 
    "sample_1", "sample_1", "sample_2", "sample_2", "sample_2", "sample_2", 
    "sample_2", "sample_2", "sample_2", "sample_2", "sample_2", "sample_2", 
    "sample_2", "sample_2", "sample_2", "sample_3", "sample_3", "sample_3", 
    "sample_3", "sample_3", "sample_3", "sample_3", "sample_3", "sample_3", 
    "sample_3", "sample_3", "sample_3", "sample_3", "sample_3", "sample_3"
    ), total = c(1L, 0L, 2L, 1L, 0L, 0L, 2L, 1L, 0L, 2L, 1L, 4L, 
    2L, 1L, 2L, 0L, 0L, 0L, 1L, 2L, 1L, 4L, 5L, 2L, 1L, 0L, 0L, 1L, 
    2L, 1L, 0L, 2L, 1L, 4L, 5L, 1L, 1L, 0L, 0L, 1L)), .Names = c("ID", 
    "total"), row.names = c(NA, -40L), class = "data.frame")
    
    0 讨论(0)
提交回复
热议问题