Question
I have a wide data frame like this:
data <- data.frame("id"=c(1,2,3,4,5,6,7,8,9,10),
"group"=c(1,1,2,1,2,2,2,2,1,2),
"type"=c(1,1,2,3,2,2,3,3,3,1),
# scores are drawn at random (no seed is set), so the weighted counts
# shown in the answers below will not be reproduced exactly
"score1"=sample(1:4, 10, replace = TRUE),
"score2"=sample(1:4, 10, replace = TRUE),
"score3"=sample(1:4, 10, replace = TRUE),
"score4"=sample(1:4, 10, replace = TRUE),
"score5"=sample(1:4, 10, replace = TRUE),
"weight1"=c(173,109,136,189,186,146,173,102,178,174),
"weight2"=c(147,187,125,126,120,165,142,129,144,197),
"weight3"=c(103,192,102,159,128,179,195,193,135,145),
"weight4"=c(114,182,199,101,111,116,198,123,119,181),
"weight5"=c(159,125,104,171,166,154,197,124,180,154))
library(reshape2)
library(plyr)
data1 <- reshape(data, direction = "long",
varying = list(c(paste0("score",1:5)),c(paste0("weight",1:5))),
v.names = c("score","weight"),
idvar = "id", timevar = "count", times = c(1:5))
data1 <- data1[order(data1$id), ]
And what I want to create is a new data frame like so:
want <- expand.grid("score"=1:4,
"group"=1:2,
"type"=1:3)  # all 4 x 2 x 3 = 24 combinations
want$weightedCOUNT <- NA # how to calculate this? count(data1, score, wt = weight)
What I can't figure out is how to calculate weightedCOUNT. For each combination of score, group and type, it should be a weighted count: instead of counting the rows with that score, it should add up their weights.
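To make the difference between a plain count and a weighted count concrete, here is a minimal sketch on a hypothetical toy long table `toy` (made-up values, not the data above). The weighted count of a cell is the sum of the weights of the rows in that cell rather than the number of rows.

```r
# Hypothetical toy data in long format (made-up values)
toy <- data.frame(
  score  = c(1, 1, 2, 1),
  group  = c(1, 1, 1, 2),
  type   = c(1, 1, 1, 1),
  weight = c(10, 5, 7, 3)
)

# Plain count: number of rows per (score, group, type)
unweighted <- aggregate(weight ~ score + group + type, data = toy, FUN = length)

# Weighted count: sum of weight per (score, group, type)
weighted <- aggregate(weight ~ score + group + type, data = toy, FUN = sum)
weighted
```

The cell score = 1, group = 1, type = 1 has two rows, so its plain count is 2 but its weighted count is 10 + 5 = 15.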
Answer 1:
An option would be melt (from data.table), which can take multiple measure patterns, and then get the weighted count grouped by 'group' and 'type':
library(data.table)
library(dplyr)
melt(setDT(data), measure = patterns('^score', "^weight"),
value.name = c("score", "weight")) %>%
group_by(group, type) %>%
count(score, wt = weight)
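To see what the multi-pattern melt() produces before any counting, here is a sketch on a small hypothetical `wide` data.table with two score/weight pairs per id. Each regex in patterns() selects one set of columns, and the matching columns are stacked pairwise (score1 with weight1, score2 with weight2). Note that `measure.vars` is the full argument name; `measure` in the code above works through R's partial argument matching.

```r
library(data.table)

# Hypothetical wide table: two score/weight pairs per id
wide <- data.table(
  id = 1:2, group = c(1, 2), type = c(1, 1),
  score1 = c(1, 2), score2 = c(2, 2),
  weight1 = c(10, 20), weight2 = c(5, 6)
)

# "^score" picks score1, score2; "^weight" picks weight1, weight2.
# The remaining columns (id, group, type) become id columns.
long <- melt(wide, measure.vars = patterns("^score", "^weight"),
             value.name = c("score", "weight"))
long
```

The result has one row per id and item, with a `variable` column indexing the item and one `score` and one `weight` column, which is exactly the shape count(score, wt = weight) needs.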
If we need the complete set of combinations (missing ones filled with 0):
library(tidyr)
melt(setDT(data), measure = patterns('^score', "^weight"),
value.name = c("score", "weight")) %>%
group_by(group, type) %>%
count(score, wt = weight) %>%  # the weighted count step is needed before completing
ungroup %>%
complete(group, type, score, fill = list(n = 0))
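The reshape can also be done inside the tidyverse: tidyr::pivot_longer() (tidyr 1.0.0 or later) with the special ".value" entry in `names_to` splits the scoreN/weightN columns into one `score` and one `weight` column. This swaps pivot_longer() in for melt() and is not part of the original answer; the sketch below uses a small hypothetical `wide` table and a hypothetical `item` column name, and includes the completion step.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data with the question's layout (2 items instead of 5)
wide <- data.frame(
  id      = 1:3,
  group   = c(1, 1, 2),
  type    = c(1, 1, 2),
  score1  = c(1, 2, 1),
  score2  = c(1, 1, 2),
  weight1 = c(10, 20, 30),
  weight2 = c(5, 6, 7)
)

res <- wide %>%
  pivot_longer(
    cols = c(starts_with("score"), starts_with("weight")),
    names_to = c(".value", "item"),          # ".value" keeps score/weight as columns
    names_pattern = "^(score|weight)(\\d+)$"
  ) %>%
  count(group, type, score, wt = weight) %>%        # n = sum of weight per cell
  complete(group, type, score, fill = list(n = 0))  # add absent cells with n = 0
res
```

Here 4 of the 8 combinations of group, type and score never occur, so complete() adds them with n = 0.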
Answer 2:
If I understand correctly, weightedCOUNT is the sum of weights grouped by score, group, and type.
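Written out as a formula, under the assumption that x_ik denotes the value of column score<k> for row i and w_ik the matching weight<k> (notation introduced here for illustration), this reading of weightedCOUNT is:

```latex
\mathrm{weightedCOUNT}(s, g, t) = \sum_{i=1}^{n} \sum_{k=1}^{5} w_{ik} \, \mathbf{1}\left[ x_{ik} = s \wedge \mathrm{group}_i = g \wedge \mathrm{type}_i = t \right]
```

Setting every w_ik to 1 turns this into the ordinary count of (row, item) pairs in each cell.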
For the sake of completeness, I would like to show how the accepted solution would look when implemented in pure base R and in pure data.table syntax, respectively.
Base R
The OP was almost there. They had already reshaped data from wide to long format for multiple value variables; only the final aggregation step was missing:
data1 <- reshape(data, direction = "long",
varying = list(c(paste0("score",1:5)),c(paste0("weight",1:5))),
v.names = c("score","weight"),
idvar = "id", timevar = "count", times = c(1:5))
result <- aggregate(weight ~ score + group + type, data1, FUN = sum)
result
   score group type weight
1      1     1    1    479
2      3     1    1    558
3      4     1    1    454
4      1     2    1    378
5      2     2    1    154
6      3     2    1    174
7      4     2    1    145
8      1     2    2    535
9      2     2    2    855
10     3     2    2    248
11     4     2    2    499
12     1     1    3    189
13     2     1    3    351
14     3     1    3    600
15     4     1    3    362
16     1     2    3    596
17     2     2    3    265
18     3     2    3    193
19     4     2    3    522
result can be reordered by score, group and type with
with(result, result[order(score, group, type), ])
   score group type weight
1      1     1    1    479
12     1     1    3    189
4      1     2    1    378
8      1     2    2    535
16     1     2    3    596
13     2     1    3    351
5      2     2    1    154
9      2     2    2    855
17     2     2    3    265
2      3     1    1    558
14     3     1    3    600
6      3     2    1    174
10     3     2    2    248
18     3     2    3    193
3      4     1    1    454
15     4     1    3    362
7      4     2    1    145
11     4     2    2    499
19     4     2    3    522
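aggregate() only returns combinations that actually occur. If the zero rows are wanted in base R as well, one option (not part of the original answer) is xtabs(): with a variable on the left-hand side of the formula it sums that variable within each cell of the full cross-classification and reports empty cells as 0. A sketch on a hypothetical stand-in `long` for data1:

```r
# Hypothetical long data standing in for data1
long <- data.frame(
  score  = c(1, 1, 2, 3),
  group  = c(1, 1, 2, 2),
  type   = c(1, 1, 2, 2),
  weight = c(10, 5, 7, 3)
)

# Sum of weight in every (score, group, type) cell; empty cells are 0
tab <- xtabs(weight ~ score + group + type, data = long)

# Back to a data frame with one row per combination
complete_df <- as.data.frame(tab, responseName = "weightedCOUNT")
complete_df
```

xtabs() only knows the values that appear in the data; if some score value never occurs at all, converting first with factor(score, levels = 1:4) makes it show up with 0 as well. The grouping columns come back as factors.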
data.table
As shown by akrun, melt() from the data.table package can be combined with dplyr. Alternatively, we can stay with data.table syntax for the aggregation as well:
library(data.table)
cols <- c("score", "weight") # to save typing
melt(setDT(data), measure = patterns(cols), value.name = cols)[
, .(weightedCOUNT = sum(weight)), keyby = .(score, group, type)]
    score group type weightedCOUNT
 1:     1     1    1           479
 2:     1     1    3           189
 3:     1     2    1           378
 4:     1     2    2           535
 5:     1     2    3           596
 6:     2     1    3           351
 7:     2     2    1           154
 8:     2     2    2           855
 9:     2     2    3           265
10:     3     1    1           558
11:     3     1    3           600
12:     3     2    1           174
13:     3     2    2           248
14:     3     2    3           193
15:     4     1    1           454
16:     4     1    3           362
17:     4     2    1           145
18:     4     2    2           499
19:     4     2    3           522
The keyby parameter groups and orders the output in one step (it also sets the key of the result to the grouping columns).
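The difference between by and keyby can be seen on a small hypothetical table `dt` whose groups are deliberately unsorted: by keeps the groups in order of first appearance, while keyby sorts them and sets the key.

```r
library(data.table)

# Hypothetical toy table, groups in unsorted order
dt <- data.table(g = c(2, 1, 2, 1), w = c(1, 2, 3, 4))

by_res    <- dt[, .(total = sum(w)), by = g]     # order of first appearance
keyby_res <- dt[, .(total = sum(w)), keyby = g]  # sorted by g, key set on g

by_res$g        # 2 1
keyby_res$g     # 1 2
key(keyby_res)  # "g"
```

With by, a separate setorder() or setkey() call would be needed to get the sorted output that keyby gives directly.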
Completing the missing combinations of the grouping variables is also possible in data.table syntax, using the cross join function CJ():
melt(setDT(data), measure = patterns(cols), value.name = cols)[
, .(weightedCOUNT = sum(weight)), keyby = .(score, group, type)][
CJ(score, group, type, unique = TRUE), on = .(score, group, type)][
is.na(weightedCOUNT), weightedCOUNT := 0][]
    score group type weightedCOUNT
 1:     1     1    1           479
 2:     1     1    2             0
 3:     1     1    3           189
 4:     1     2    1           378
 5:     1     2    2           535
 6:     1     2    3           596
 7:     2     1    1             0
 8:     2     1    2             0
 9:     2     1    3           351
10:     2     2    1           154
11:     2     2    2           855
12:     2     2    3           265
13:     3     1    1           558
14:     3     1    2             0
15:     3     1    3           600
16:     3     2    1           174
17:     3     2    2           248
18:     3     2    3           193
19:     4     1    1           454
20:     4     1    2             0
21:     4     1    3           362
22:     4     2    1           145
23:     4     2    2           499
24:     4     2    3           522
    score group type weightedCOUNT
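The CJ() step is easier to follow on a tiny hypothetical aggregated table `agg` with one combination missing. CJ(..., unique = TRUE) builds the sorted cross product of the unique values, and joining the aggregate onto that grid with on = yields one row per combination, with NA where there was no match, which is then set to 0.

```r
library(data.table)

# Hypothetical aggregated table; the combination a = 2, b = "y" is missing
agg <- data.table(a = c(1, 1, 2), b = c("x", "y", "x"), n = c(5, 2, 4))

# All combinations of the unique values of a and b (sorted)
grid <- CJ(a = agg$a, b = agg$b, unique = TRUE)

# One row per grid row; unmatched combinations get NA in n
full <- agg[grid, on = .(a, b)]
full[is.na(n), n := 0]  # replace NA by 0 in place
full
```

In the answer's chained version, CJ(score, group, type, unique = TRUE) is written inside the brackets of the aggregated table, so the column names refer to that table's columns.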
Source: https://stackoverflow.com/questions/56411704/r-loop-to-new-data-frame-summary-weighted