How to split one column into different columns with dcast without aggregating?

喜夏-厌秋 submitted on 2019-11-28 06:58:54

Question


I'm trying to reshape my data using dcast. I'm working with samples where each sample has 10-30 sample units. I need to keep every individual value, so the data must not be aggregated.

My data is in this format:

ID  total
sample_1    1
sample_1    0
sample_1    2
sample_1    1
sample_1    0
sample_1    0
sample_1    2
sample_1    1
sample_1    0
sample_1    2
sample_1    1
sample_1    4
sample_2    2
sample_2    1
sample_2    2
sample_2    0
sample_2    0
sample_2    0
sample_2    1
sample_2    2
sample_2    1
sample_2    4
sample_2    5
sample_2    2
sample_2    1
sample_3    0
sample_3    0
sample_3    1
sample_3    2
sample_3    1
sample_3    0
sample_3    2
sample_3    1
sample_3    4
sample_3    5
sample_3    1
sample_3    1
sample_3    0
sample_3    0
sample_3    1

And I want it to look like this:

sample_1    sample_2    sample_3
1           2           0
0           1           0
2           2           1
1           0           2
0           0           1
0           0           0
2           1           2
1           2           1
0           1           4
2           4           5
1           5           1
4           2           1
            1           0
                        0
                        1

That is, each sample ID becomes its own column.

I tried several approaches, but R keeps aggregating the data.


Answer 1:


You can do this with dcast() but you have to add row numbers for each ID.

Besides reshape2, the data.table package also implements dcast(). data.table has a handy rowid() function that generates unique row ids within each group. With that, we get:

library(data.table)
dcast(setDT(DF), rowid(ID) ~ ID, value.var = "total")
#    ID sample_1 sample_2 sample_3
# 1:  1        1        2        0
# 2:  2        0        1        0
# 3:  3        2        2        1
# 4:  4        1        0        2
# 5:  5        0        0        1
# 6:  6        0        0        0
# 7:  7        2        1        2
# 8:  8        1        2        1
# 9:  9        0        1        4
#10: 10        2        4        5
#11: 11        1        5        1
#12: 12        4        2        1
#13: 13       NA        1        0
#14: 14       NA       NA        0
#15: 15       NA       NA        1
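
To make the role of the row numbers more visible, here is a minimal sketch that builds the within-group index as an explicit column instead of calling rowid() inside the formula. The toy data and the names toy, rn and wide are made up for illustration and are not part of the question's data. Because every (rn, ID) pair occurs only once, dcast() has nothing to aggregate; shorter groups are padded with NA.

```r
library(data.table)

# Toy data (hypothetical, much smaller than the data in the question)
toy <- data.table(
  ID    = c("a", "a", "a", "b", "b"),
  total = c(1L, 0L, 2L, 5L, 3L)
)

# Explicit row number within each ID; rowid(ID) yields the same numbering
toy[, rn := seq_len(.N), by = ID]

# Each (rn, ID) combination is unique, so no aggregation takes place
wide <- dcast(toy, rn ~ ID, value.var = "total")

# Drop the helper column if only the sample columns are wanted
wide[, rn := NULL]
print(wide)
```

The helper column can be kept if the position of a value within its sample matters later on.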

However, I recommend continuing any further data processing in long format and using grouping. That is much easier than working on individual columns. For instance,

# count observations by group
DF[, .N, by = ID]
#         ID  N
#1: sample_1 12
#2: sample_2 13
#3: sample_3 15

# compute mean by group
DF[, mean(total), by = ID]
#         ID       V1
#1: sample_1 1.166667
#2: sample_2 1.615385
#3: sample_3 1.266667

# get min and max by group
DF[, .(min = min(total), max = max(total)), by = ID]
#         ID min max
#1: sample_1   0   4
#2: sample_2   0   5
#3: sample_3   0   5

# the same using range()
DF[, as.list(range(total)), by = ID]
#         ID V1 V2
#1: sample_1  0  4
#2: sample_2  0  5
#3: sample_3  0  5

Data

DF <- structure(list(ID = c("sample_1", "sample_1", "sample_1", "sample_1", 
"sample_1", "sample_1", "sample_1", "sample_1", "sample_1", "sample_1", 
"sample_1", "sample_1", "sample_2", "sample_2", "sample_2", "sample_2", 
"sample_2", "sample_2", "sample_2", "sample_2", "sample_2", "sample_2", 
"sample_2", "sample_2", "sample_2", "sample_3", "sample_3", "sample_3", 
"sample_3", "sample_3", "sample_3", "sample_3", "sample_3", "sample_3", 
"sample_3", "sample_3", "sample_3", "sample_3", "sample_3", "sample_3"
), total = c(1L, 0L, 2L, 1L, 0L, 0L, 2L, 1L, 0L, 2L, 1L, 4L, 
2L, 1L, 2L, 0L, 0L, 0L, 1L, 2L, 1L, 4L, 5L, 2L, 1L, 0L, 0L, 1L, 
2L, 1L, 0L, 2L, 1L, 4L, 5L, 1L, 1L, 0L, 0L, 1L)), .Names = c("ID", 
"total"), row.names = c(NA, -40L), class = "data.frame")


Source: https://stackoverflow.com/questions/42914533/how-to-split-one-column-into-different-columns-with-dcast-without-aggregating
