问题
I'm trying to reshape my data using dcast. I'm working with samples where each sample has 10-30 sample units. I can't have my data aggregate.
My data is in this format:
ID total
sample_1 1
sample_1 0
sample_1 2
sample_1 1
sample_1 0
sample_1 0
sample_1 2
sample_1 1
sample_1 0
sample_1 2
sample_1 1
sample_1 4
sample_2 2
sample_2 1
sample_2 2
sample_2 0
sample_2 0
sample_2 0
sample_2 1
sample_2 2
sample_2 1
sample_2 4
sample_2 5
sample_2 2
sample_2 1
sample_3 0
sample_3 0
sample_3 1
sample_3 2
sample_3 1
sample_3 0
sample_3 2
sample_3 1
sample_3 4
sample_3 5
sample_3 1
sample_3 1
sample_3 0
sample_3 0
sample_3 1
And I want it to looks like it:
sample_1 sample_2 sample_3
1 2 0
0 1 0
2 2 1
1 0 2
0 0 1
0 0 0
2 1 2
1 2 1
0 1 4
2 4 5
1 5 1
4 2 1
1 0
0
1
Where my sample ID's turn into different columns.
I tried in several ways but R keep aggregating it.
回答1:
You can do this with dcast()
but you have to add row numbers for each ID
.
The data.table
package is another package besides reshape2
which implements dcast()
. data.table
has a handy rowid()
function to generate unique row ids within each group. WIth that, we get:
library(data.table)
dcast(setDT(DF), rowid(ID) ~ ID, value.var = "total")
# ID sample_1 sample_2 sample_3
# 1: 1 1 2 0
# 2: 2 0 1 0
# 3: 3 2 2 1
# 4: 4 1 0 2
# 5: 5 0 0 1
# 6: 6 0 0 0
# 7: 7 2 1 2
# 8: 8 1 2 1
# 9: 9 0 1 4
#10: 10 2 4 5
#11: 11 1 5 1
#12: 12 4 2 1
#13: 13 NA 1 0
#14: 14 NA NA 0
#15: 15 NA NA 1
However, I recommend to continue any data processing in long format and use grouping. That's much easier than to work on individual columns. For instance,
# count observations by group
DF[, .N, by = ID]
# ID N
#1: sample_1 12
#2: sample_2 13
#3: sample_3 15
# compute mean by group
DF[, mean(total), by = ID]
# ID V1
#1: sample_1 1.166667
#2: sample_2 1.615385
#3: sample_3 1.266667
# get min and max by group
DF[, .(min = min(total), max = max(total)), by = ID]
# ID min max
#1: sample_1 0 4
#2: sample_2 0 5
#3: sample_3 0 5
# the same using range()
DF[, as.list(range(total)), by = ID]
# ID V1 V2
#1: sample_1 0 4
#2: sample_2 0 5
#3: sample_3 0 5
Data
DF <- structure(list(ID = c("sample_1", "sample_1", "sample_1", "sample_1",
"sample_1", "sample_1", "sample_1", "sample_1", "sample_1", "sample_1",
"sample_1", "sample_1", "sample_2", "sample_2", "sample_2", "sample_2",
"sample_2", "sample_2", "sample_2", "sample_2", "sample_2", "sample_2",
"sample_2", "sample_2", "sample_2", "sample_3", "sample_3", "sample_3",
"sample_3", "sample_3", "sample_3", "sample_3", "sample_3", "sample_3",
"sample_3", "sample_3", "sample_3", "sample_3", "sample_3", "sample_3"
), total = c(1L, 0L, 2L, 1L, 0L, 0L, 2L, 1L, 0L, 2L, 1L, 4L,
2L, 1L, 2L, 0L, 0L, 0L, 1L, 2L, 1L, 4L, 5L, 2L, 1L, 0L, 0L, 1L,
2L, 1L, 0L, 2L, 1L, 4L, 5L, 1L, 1L, 0L, 0L, 1L)), .Names = c("ID",
"total"), row.names = c(NA, -40L), class = "data.frame")
来源:https://stackoverflow.com/questions/42914533/how-to-split-one-column-into-different-columns-with-dcast-without-aggregating