问题
I'm working with a large data frame that I want to pivot, so that variables in a column become rows across the top.
I've found the reshape package very useful in such cases, except that the cast function defaults to fun.aggregate=length. Presumably this is because I'm performing these operations by "case" and the number of variables measured varies among cases.
I would like to pivot so that missing variables are denoted as "NA"s in the pivoted data frame.
So, in other words, I want to go from a molten data frame like this:
Case | Variable | Value
1 1 2.3
1 2 2.1
1 3 1.3
2 1 4.3
2 2 2.5
3 1 1.8
3 2 1.9
3 3 2.3
3 4 2.2
To something like this:
Case | Variable 1 | Variable 2 | Variable 3 | Variable 4
1 2.3 2.1 1.3 NA
2 4.3 2.5 NA NA
3 1.8 1.9 2.3 2.2
The code dcast(data,...~Variable) again defaults to fun.aggregate=length, which does not preserve the original values.
Thanks for your help, and let me know if anything is unclear!
回答1:
It is just a matter of including all of the variables in the cast
call. Reshape expects the Value
column to be called value
, so it throws a warning, but still works fine. The reason that it was using fun.aggregate=length
is because of the missing Case
in the formula. It was aggregating over the values in Case
.
Try: cast(data, Case~Variable)
data <- data.frame(Case=c(1,1,1,2,2,3,3,3,3),
Variable=c(1,2,3,1,2,1,2,3,4),
Value=c(2.3,2.1,1.3,4.3,2.5,1.8,1.9,2.3,2.2))
cast(data,Case~Variable)
Using Value as value column. Use the value argument to cast to override this choice
Case 1 2 3 4
1 1 2.3 2.1 1.3 NA
2 2 4.3 2.5 NA NA
3 3 1.8 1.9 2.3 2.2
Edit: as a response to the comment from @Jon. What do you do if there is one more variable in the data frame?
data <- data.frame(expt=c(1,1,1,1,2,2,2,2,2),
func=c(1,1,1,2,2,3,3,3,3),
variable=c(1,2,3,1,2,1,2,3,4),
value=c(2.3,2.1,1.3,4.3,2.5,1.8,1.9,2.3,2.2))
cast(data,expt+variable~func)
expt variable 1 2 3
1 1 1 2.3 4.3 NA
2 1 2 2.1 NA NA
3 1 3 1.3 NA NA
4 2 1 NA NA 1.8
5 2 2 NA 2.5 1.9
6 2 3 NA NA 2.3
7 2 4 NA NA 2.2
回答2:
Here is one solution. It does not use the package or function you mention, but it could be of use. Suppose your data frame is called df
:
M <- matrix(NA,
nrow = length(unique(df$Case)),
ncol = length(unique(df$Variable))+1,
dimnames = list(NULL,c('Case',paste('Variable',sort(unique(df$Variable))))))
irow <- match(df$Case,unique(df$Case))
icol <- match(df$Variable,unique(df$Variable)) + 1
ientry <- irow + (icol-1)*nrow(M)
M[ientry] <- df$Value
M[,1] <- unique(df$Case)
回答3:
To avoid the warning message, you could subset the data frame according to another variable, i.e a categorical variable having three levels a,b,c. Because in you current data for category a it has 70 cases, for b 80 cases, c has 90. Then the cast function doesn't know how to aggregate them.
Hope this helps.
来源:https://stackoverflow.com/questions/6391470/using-melt-cast-with-variables-of-uneven-length-in-r