How to split one column into different columns with dcast without aggregating?


I'm trying to reshape my data using dcast. I'm working with samples where each sample has 10-30 sample units. I can't have my data aggregated.

My data is in this f


1 answer

滥情空心

2020-12-20 06:56

You can do this with dcast() but you have to add row numbers for each ID.

Besides reshape2, the data.table package also implements dcast(). data.table has a handy rowid() function that generates unique row ids within each group. With that, we get:

library(data.table)
dcast(setDT(DF), rowid(ID) ~ ID, value.var = "total")
#    ID sample_1 sample_2 sample_3
# 1:  1        1        2        0
# 2:  2        0        1        0
# 3:  3        2        2        1
# 4:  4        1        0        2
# 5:  5        0        0        1
# 6:  6        0        0        0
# 7:  7        2        1        2
# 8:  8        1        2        1
# 9:  9        0        1        4
#10: 10        2        4        5
#11: 11        1        5        1
#12: 12        4        2        1
#13: 13       NA        1        0
#14: 14       NA       NA        0
#15: 15       NA       NA        1
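
To make it easier to see what the row id is doing, here is a minimal sketch on a small made-up table. The name `toy`, its values, and the helper column `rn` are assumptions for illustration only, not part of the question's data. The counter restarts at 1 within each ID, so every (rn, ID) pair occurs exactly once and dcast() has nothing to aggregate; shorter groups are padded with NA.

```r
library(data.table)

# Hypothetical toy data: group "a" has 3 units, group "b" has 2
toy <- data.table(
  ID    = c("a", "a", "a", "b", "b"),
  total = c(1L, 0L, 2L, 5L, 4L)
)

# Explicit within-group row counter (what rowid(ID) computes on the fly)
toy[, rn := seq_len(.N), by = ID]

# Each (rn, ID) combination is unique, so no aggregation takes place
wide <- dcast(toy, rn ~ ID, value.var = "total")
print(wide)
```

Keeping `rn` as a real column can be convenient if the same row index is needed again later, for example when going back to long format.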

However, I recommend continuing any data processing in long format and using grouping. That is much easier than working on individual columns. For instance,

# count observations by group
DF[, .N, by = ID]
#         ID  N
#1: sample_1 12
#2: sample_2 13
#3: sample_3 15

# compute mean by group
DF[, mean(total), by = ID]
#         ID       V1
#1: sample_1 1.166667
#2: sample_2 1.615385
#3: sample_3 1.266667

# get min and max by group
DF[, .(min = min(total), max = max(total)), by = ID]
#         ID min max
#1: sample_1   0   4
#2: sample_2   0   5
#3: sample_3   0   5

# the same using range()
DF[, as.list(range(total)), by = ID]
#         ID V1 V2
#1: sample_1  0  4
#2: sample_2  0  5
#3: sample_3  0  5
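
If the data is already in wide form, it can be brought back to long format with melt() before grouping. The following is only a sketch under assumptions: the table `wide`, its column names `rn`, `a`, `b`, and the summary column names `n` and `avg` are hypothetical and chosen for illustration.

```r
library(data.table)

# Hypothetical wide table padded with NA, similar in shape to the dcast() result above
wide <- data.table(rn = 1:3, a = c(1L, 0L, 2L), b = c(5L, 4L, NA))

# melt() stacks the sample columns again; na.rm = TRUE drops the NA padding
long <- melt(wide, id.vars = "rn", variable.name = "ID",
             value.name = "total", na.rm = TRUE)

# Grouped summaries work without referring to individual sample columns
stats <- long[, .(n = .N, avg = mean(total)), by = ID]
print(stats)
```

Because the summaries are computed by group, the same code keeps working when more samples are added.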

Data

DF <- structure(list(ID = c("sample_1", "sample_1", "sample_1", "sample_1", 
"sample_1", "sample_1", "sample_1", "sample_1", "sample_1", "sample_1", 
"sample_1", "sample_1", "sample_2", "sample_2", "sample_2", "sample_2", 
"sample_2", "sample_2", "sample_2", "sample_2", "sample_2", "sample_2", 
"sample_2", "sample_2", "sample_2", "sample_3", "sample_3", "sample_3", 
"sample_3", "sample_3", "sample_3", "sample_3", "sample_3", "sample_3", 
"sample_3", "sample_3", "sample_3", "sample_3", "sample_3", "sample_3"
), total = c(1L, 0L, 2L, 1L, 0L, 0L, 2L, 1L, 0L, 2L, 1L, 4L, 
2L, 1L, 2L, 0L, 0L, 0L, 1L, 2L, 1L, 4L, 5L, 2L, 1L, 0L, 0L, 1L, 
2L, 1L, 0L, 2L, 1L, 4L, 5L, 1L, 1L, 0L, 0L, 1L)), .Names = c("ID", 
"total"), row.names = c(NA, -40L), class = "data.frame")
