Question
With this data frame,
df <- expand.grid(id="01", parameter=c("blood", "saliva"), visit=c("V1", "V2", "V3"))
df$value <- c(1:6)
df$sex <- rep("f", 6)
df
> df
id parameter visit value sex
1 01 blood V1 1 f
2 01 saliva V1 2 f
3 01 blood V2 3 f
4 01 saliva V2 4 f
5 01 blood V3 5 f
6 01 saliva V3 6 f
When I reshape it to the "wide" format, I get identical results with both the base reshape function and the dcast function from reshape2.
reshape(df,
timevar="visit",
idvar=c("id", "parameter", "sex"),
direction="wide")
id parameter sex value.V1 value.V2 value.V3
1 01 blood f 1 3 5
2 01 saliva f 2 4 6
library(reshape2)
dcast(df,
id+parameter+sex~visit,
value.var="value")
id parameter sex V1 V2 V3
1 01 blood f 1 3 5
2 01 saliva f 2 4 6
But if I add some missing values, the results differ:
df$value <- c(1,2,NA,NA,NA,NA)
df$sex <- c(NA,NA,NA,NA,NA,NA)
df
> df
id parameter visit value sex
1 01 blood V1 1 NA
2 01 saliva V1 2 NA
3 01 blood V2 NA NA
4 01 saliva V2 NA NA
5 01 blood V3 NA NA
6 01 saliva V3 NA NA
With base reshape, I get only one row:
reshape(df,
timevar="visit",
idvar=c("id", "parameter", "sex"),
direction="wide")
id parameter sex value.V1 value.V2 value.V3
1 01 blood NA 1 NA NA
With dcast, I get two rows:
dcast(df,
id+parameter+sex~visit,
value.var="value")
id parameter sex V1 V2 V3
1 01 blood NA 1 NA NA
2 01 saliva NA 2 NA NA
Is there a way to handle these missing values with the base reshape function? I'd prefer to use that one.
Answer 1:
The relevant part of the reshape code is the line:
data[, tempidname] <- interaction(data[, idvar], drop = TRUE)
Look at how interaction works:
> interaction("A", "B")
[1] A.B
Levels: A.B
> interaction("A", "B", NA)
[1] <NA>
Levels:
But compare what happens if NA is retained as a level:
> interaction("A", "B", addNA(NA))
[1] A.B.NA
Levels: A.B.NA
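To connect this to the data in the question, here is a minimal sketch of what the combined key looks like before and after addNA. It rebuilds the question's data frame, and the names ids and ids_na are only illustrative; they are not variables used inside reshape itself.

```r
# Rebuild the data frame from the question, with sex entirely missing.
df <- expand.grid(id = "01", parameter = c("blood", "saliva"),
                  visit = c("V1", "V2", "V3"))
df$value <- c(1, 2, NA, NA, NA, NA)
df$sex <- NA

# Combined key built the same way as in the reshape line quoted above.
ids <- interaction(df[, c("id", "parameter", "sex")], drop = TRUE)
all(is.na(ids))        # every key is NA, so the rows cannot be told apart
sum(!duplicated(ids))  # number of distinct keys that survive

# Same key, but with NA kept as a level of sex.
ids_na <- interaction(df$id, df$parameter, addNA(df$sex), drop = TRUE)
nlevels(ids_na)        # blood and saliva are distinct keys again
```

Because every key is NA in the first case, only one distinct id remains, which matches the single row that reshape returns in the question.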
Thus, if you want the same result with base R's reshape, you need to make sure that any "idvar" columns have NA retained as a level.
Example:
df$sex <- addNA(df$sex)
reshape(df,
timevar="visit",
idvar=c("id", "parameter", "sex"),
direction="wide")
# id parameter sex value.V1 value.V2 value.V3
# 1 01 blood <NA> 1 NA NA
# 2 01 saliva <NA> 2 NA NA
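If several id columns might contain missing values, the same idea can be applied to all of them at once. The following is only a sketch under that assumption: the names idvars, has_na and wide are hypothetical, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the data frame from the question, with sex entirely missing.
df <- expand.grid(id = "01", parameter = c("blood", "saliva"),
                  visit = c("V1", "V2", "V3"))
df$value <- c(1, 2, NA, NA, NA, NA)
df$sex <- NA

idvars <- c("id", "parameter", "sex")

# Find the id columns that contain NA and keep NA as a level in each of them.
has_na <- vapply(df[idvars], anyNA, logical(1))
df[idvars[has_na]] <- lapply(df[idvars[has_na]], addNA)

wide <- reshape(df,
                timevar = "visit",
                idvar = idvars,
                direction = "wide")
nrow(wide)  # one row per id/parameter combination
```

Note that addNA converts the affected columns to factors, so sex is a factor in the reshaped result rather than a logical vector.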
Of course, the other question is how NA can be treated as an identifying variable :-)
Source: https://stackoverflow.com/questions/34894358/reshape-from-base-vs-dcast-from-reshape2-with-missing-values