Using melt / cast with variables of uneven length in R

Submitted by ◇◆丶佛笑我妖孽 on 2019-12-12 12:22:46

Question


I'm working with a large data frame that I want to pivot, so that the values in a variable column become column headers across the top.

I've found the reshape package very useful in such cases, except that the cast function defaults to fun.aggregate=length. Presumably this is because I'm performing these operations by "case" and the number of variables measured varies among cases.

I would like to pivot so that missing variables are denoted as "NA"s in the pivoted data frame.

So, in other words, I want to go from a molten data frame like this:

Case | Variable | Value
 1         1        2.3
 1         2        2.1
 1         3        1.3
 2         1        4.3
 2         2        2.5
 3         1        1.8
 3         2        1.9
 3         3        2.3
 3         4        2.2

To something like this:

Case | Variable 1 | Variable 2 | Variable 3 | Variable 4
 1         2.3          2.1          1.3         NA
 2         4.3          2.5          NA          NA
 3         1.8          1.9          2.3         2.2 

The code dcast(data,...~Variable) again defaults to fun.aggregate=length, which does not preserve the original values.

Thanks for your help, and let me know if anything is unclear!


Answer 1:


It is just a matter of including all of the variables in the cast call. Reshape expects the value column to be called value; here it is called Value, so cast prints a notice but still works fine. It was using fun.aggregate=length because Case was missing from the formula, so it was aggregating over the values in Case.
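The explanation above can be checked in base R, without reshape. The sketch below assumes the question's data and simply counts how many rows would land in each output cell, first with Case left out and then with Case included; the names cells_without_case and cells_with_case are made up for this illustration.

```r
# Molten data from the question
data <- data.frame(Case = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                   Variable = c(1, 2, 3, 1, 2, 1, 2, 3, 4),
                   Value = c(2.3, 2.1, 1.3, 4.3, 2.5, 1.8, 1.9, 2.3, 2.2))

# Rows per output cell when only Variable identifies a cell (Case left out)
cells_without_case <- table(data$Variable)

# Rows per output cell when Case and Variable together identify a cell
cells_with_case <- table(data$Case, data$Variable)

# A maximum above 1 means several values compete for one cell,
# which is when an aggregation function becomes necessary
max(cells_without_case)
max(cells_with_case)
```

When the second maximum is 1, every cell holds at most one value, so nothing needs to be aggregated and the original values can be kept.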

Try: cast(data, Case~Variable)

library(reshape)

# Molten data from the question
data <- data.frame(Case=c(1,1,1,2,2,3,3,3,3),
  Variable=c(1,2,3,1,2,1,2,3,4),
  Value=c(2.3,2.1,1.3,4.3,2.5,1.8,1.9,2.3,2.2))

# Case identifies the rows, Variable the columns
cast(data,Case~Variable)
Using Value as value column.  Use the value argument to cast to override this choice
  Case   1   2   3   4
1    1 2.3 2.1 1.3  NA
2    2 4.3 2.5  NA  NA
3    3 1.8 1.9 2.3 2.2

Edit, in response to the comment from @Jon: what do you do if there is one more variable in the data frame?

data <- data.frame(expt=c(1,1,1,1,2,2,2,2,2),
               func=c(1,1,1,2,2,3,3,3,3),
               variable=c(1,2,3,1,2,1,2,3,4),
               value=c(2.3,2.1,1.3,4.3,2.5,1.8,1.9,2.3,2.2))

cast(data,expt+variable~func)
  expt variable   1   2   3
1    1        1 2.3 4.3  NA
2    1        2 2.1  NA  NA
3    1        3 1.3  NA  NA
4    2        1  NA  NA 1.8
5    2        2  NA 2.5 1.9
6    2        3  NA  NA 2.3
7    2        4  NA  NA 2.2
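For comparison, a similar wide layout can be sketched with base R's reshape() function, which needs no extra package. This is an alternative, not what the answer used: the id columns go in idvar, the column to spread goes in timevar, and the result columns are named value.1, value.2 and value.3 rather than 1, 2 and 3. The name wide is made up for this illustration.

```r
# Same data as in the edit above
data <- data.frame(expt = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
                   func = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                   variable = c(1, 2, 3, 1, 2, 1, 2, 3, 4),
                   value = c(2.3, 2.1, 1.3, 4.3, 2.5, 1.8, 1.9, 2.3, 2.2))

# expt and variable identify a row; each level of func becomes its own column.
# Combinations that were never measured are filled with NA.
wide <- reshape(data,
                idvar = c("expt", "variable"),
                timevar = "func",
                direction = "wide")

wide
```

Rows come out in order of first appearance rather than sorted, so the row order may differ from the cast output.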



Answer 2:


Here is one solution. It does not use the package or function you mention, but it could be of use. Suppose your data frame is called df:

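# Distinct cases (in order of appearance) and variables (sorted, so that
# the column labels line up with the column positions used below)
cases <- unique(df$Case)
vars <- sort(unique(df$Variable))
# One row per case; column 1 holds the case id, the rest start out as NA
M <- matrix(NA,
            nrow = length(cases),
            ncol = length(vars) + 1,
            dimnames = list(NULL, c('Case', paste('Variable', vars))))
irow <- match(df$Case, cases)
icol <- match(df$Variable, vars) + 1
# Fill in the observed values by (row, column) position
M[cbind(irow, icol)] <- df$Value
M[, 1] <- cases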
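In the same package-free spirit, a shorter sketch uses base R's tapply(), which builds the Case by Variable grid directly and leaves unmeasured combinations as NA. It assumes each Case/Variable pair occurs at most once, so taking the first value of each cell loses nothing; df and wide are illustrative names, with df holding the question's data.

```r
# Molten data from the question
df <- data.frame(Case = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 Variable = c(1, 2, 3, 1, 2, 1, 2, 3, 4),
                 Value = c(2.3, 2.1, 1.3, 4.3, 2.5, 1.8, 1.9, 2.3, 2.2))

# Rows are cases, columns are variables; each cell takes the single
# value observed for that pair, and empty cells stay NA
wide <- with(df, tapply(Value,
                        list(Case = Case, Variable = Variable),
                        function(v) v[1]))

wide
```

The result is a matrix whose row and column names are the case and variable labels, rather than a data frame with a Case column.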



Answer 3:


To avoid the warning message, you could subset the data frame according to another variable, e.g. a categorical variable with three levels a, b and c. If, in your current data, category a has 70 cases, b has 80 and c has 90, then the cast function doesn't know how to aggregate them.

Hope this helps.



Source: https://stackoverflow.com/questions/6391470/using-melt-cast-with-variables-of-uneven-length-in-r
