There are some questions similar to this topic on SO but not exactly like my usecase. I have a dataset where the columns are laid out as shown below
Id Description Value
10 Cat 19
10 Cat 20
10 Cat 5
10 Cat 13
11 Cat 17
11 Cat 23
11 Cat 7
11 Cat 14
10 Dog 19
10 Dog 20
10 Dog 5
10 Dog 13
11 Dog 17
11 Dog 23
11 Dog 7
11 Dog 14
What I am trying to do is capture the mean of the Value column by Id, Description. The final dataset would look like this.
Id Cat Dog
10 14.25 28.5
11 15.25 15.25
I can do this in a very rough manner not very efficient like this
tempdf1 <- df %>%
filter(str_detect(Description, "Cat")) %>%
group_by(Id, Description) %>%
summarize(Mean_Value = mean(Value) , na.rm = TRUE))
This is not very convenient. Any advise on how how to accomplish the expected results more efficiently is much appreciated.
Use dcast
or even acast
from reshape2()
package
dcast(dat,Id~Description,mean)
Id Cat Dog
1 10 14.25 14.25
2 11 15.25 15.25
Base R
might be abit longer:
reshape(aggregate(.~Id+Description,dat,mean),direction = "wide",v.names = "Value",idvar = "Id",timevar = "Description")
Id Value.Cat Value.Dog
1 10 14.25 14.25
2 11 15.25 15.25
I would do this with tapply
:
with( dat, tapply(Value, list(Id,Description), mean))
Cat Dog
10 14.25 14.25
11 15.25 15.25
Does return a matrix object so don't try accessing with "$".
You can aggregate (calculate average) per groups using data.table
and get wanted table format using dcast()
:
library(data.table)
foo <- setDT(d)[, mean(Value), .(Id, Description)]
# Id Description V1
# 1: 10 Cat 14.25
# 2: 11 Cat 15.25
# 3: 10 Dog 14.25
# 4: 11 Dog 15.25
dcast(foo, Id ~ Description, value.var = "V1")
# Id Cat Dog
# 1: 10 14.25 14.25
# 2: 11 15.25 15.25
You can do the summarise
using dplyr
and the transformation from long to wide using tidyr::spread
:
library(dplyr)
library(tidyr)
df %>%
group_by(Id, Description) %>%
summarise(Mean = mean(Value)) %>%
spread(Description, Mean)
Id Cat Dog
* <int> <dbl> <dbl>
1 10 14.25 14.25
2 11 15.25 15.25
来源:https://stackoverflow.com/questions/47947278/aggregate-by-multiple-columns-and-reshape-from-long-to-wide