R + reshape : variance of columns of a data.frame

Submitted by 前提是你 on 2020-01-03 02:29:10

Question


I'm using reshape in R to compute aggregate statistics over columns of a data.frame. Here's my data.frame:

> df
  a a b b ID
1 1 1 1 1  1
2 2 3 2 3  2
3 3 5 3 5  3

which is just a little test data.frame to try and understand the reshape package. I melt, and then cast, to try and find the mean of the as and the bs:

> melt(df, id = "ID") -> df.m
> cast(df.m, ID ~ variable, fun = mean)
  ID a b
1  1 1 1
2  2 2 2
3  3 3 3

Argh! What? Was hoping the mean of c(2,3) was 2.5 and so on. What's going on? Here's a thing:

> df.m
   ID variable value
1   1        a     1
2   2        a     2
3   3        a     3
4   1        a     1
5   2        a     2
6   3        a     3
7   1        b     1
8   2        b     2
9   3        b     3
10  1        b     1
11  2        b     2
12  3        b     3

What's going on? Where did both my 5s go? Do I have a very basic misunderstanding here? If so, what is it?


Answer 1:


I updated my answer here to fix this: R: aggregate columns of a data.frame

Apparently, if your data frame doesn't have unique column names, it won't melt properly.

Edit: Instead of having column names of a a a b b, you need unique column names for melt() to work properly. Minimally a.1 a.2 a.3 b.1 b.2, or something similar. After using melt(), your options for getting sensible levels for variable are either to use gsub() on the levels of variable to strip the disambiguating suffixes, or to use colsplit() to create two new columns. For the dummy names I just gave, that would look like:

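# Option 1: strip the ".N" suffix so the levels collapse back to a and b
levels(df.m$variable) <- gsub("\\..*", "", levels(df.m$variable))
# Option 2: split variable into two new columns, Measure and N
df.m <- cbind(df.m, colsplit(df.m$variable, split = "\\.", names = c("Measure","N")))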
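
For reference, here is a minimal end-to-end sketch of the gsub() route on the data from the question. It assumes the reshape package is installed and that the duplicate columns have been renamed a.1, a.2, b.1, b.2 (a naming scheme picked for illustration, not taken from the original post).

```r
library(reshape)

# The question's data, rebuilt with unique column names (assumed naming scheme)
df <- data.frame(a.1 = c(1, 2, 3), a.2 = c(1, 3, 5),
                 b.1 = c(1, 2, 3), b.2 = c(1, 3, 5),
                 ID  = 1:3)

# Long format: one row per (ID, variable) pair
df.m <- melt(df, id.vars = "ID")

# Drop the ".N" suffix; the now-identical level names are merged into a and b
levels(df.m$variable) <- gsub("\\..*", "", levels(df.m$variable))

# Average all values that share an ID and a (collapsed) variable name
res <- cast(df.m, ID ~ variable, fun.aggregate = mean)
res
```

With these values, the row for ID 2 should show 2.5 for both a and b, which is the result the question was hoping for. Passing var instead of mean as fun.aggregate should give the per-ID variance in the same way.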



Answer 2:


This is not a valid data frame because the columns do not have unique names.
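
A quick base R check illustrates the point. The sketch below rebuilds the question's data with check.names = FALSE (needed to keep the duplicate names at all), shows that name-based lookup only ever reaches the first matching column, and then repairs the names with make.unique(). The variable names vals, group, means and vars are introduced here for illustration only.

```r
# Rebuild the question's frame; check.names = FALSE keeps the duplicate names
df <- data.frame(a = c(1, 2, 3), a = c(1, 3, 5),
                 b = c(1, 2, 3), b = c(1, 3, 5),
                 ID = 1:3, check.names = FALSE)

anyDuplicated(names(df))   # non-zero: at least one column name is repeated
df[["a"]]                  # only the first "a" column is returned: 1 2 3

# Make the names unique: a, a.1, b, b.1, ID
names(df) <- make.unique(names(df), sep = ".")

# Group the value columns by their base name (everything before the first dot)
vals  <- setdiff(names(df), "ID")
group <- sub("\\..*", "", vals)

# Row-wise mean and variance within each group of columns
means <- sapply(split.default(df[vals], group), rowMeans)
vars  <- sapply(split.default(df[vals], group),
                function(cols) apply(cols, 1, var))
means
vars
```

This route skips melt() and cast() entirely, so it is an alternative to the reshape workflow rather than a description of how reshape behaves internally.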



Source: https://stackoverflow.com/questions/3356923/r-reshape-variance-of-columns-of-a-data-frame