Group together two columns with ID, do the cumulative for two columns

寵の児 submitted on 2019-12-13 09:29:48

Question


Edit: I wrote the question in a way too unstructured manner, so let me try again.

I want to add two new columns, winner_total_points and loser_total_points, to the dataset below.

winner <- c(1,2,3,4,1,2)
loser <- c(2,3,1,3,3,1)
winner_points <- c(5,4,12,2,1,6)
loser_points <- c(5,2,2,6,6,2)
test_data <- data.frame(winner, loser, winner_points, loser_points)

winner_total_points should sum all the points the winner has scored so far (excluding this match), whether as the winner or as the loser.

loser_total_points should do the same for the loser.

Note that the winner and loser columns contain the respective player ids.

Now, it's fairly easy using the ave() function, but that only works for grouping by a single column and computing the cumulative sum of a single column.
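
For reference, a minimal sketch of the single-column case described above, using the same test_data. The column name winner_cum is made up for illustration. It accumulates winner_points per winner id only, so points the same player scored as a loser are ignored, which is exactly the limitation in question.

```r
winner <- c(1, 2, 3, 4, 1, 2)
loser <- c(2, 3, 1, 3, 3, 1)
winner_points <- c(5, 4, 12, 2, 1, 6)
loser_points <- c(5, 2, 2, 6, 6, 2)
test_data <- data.frame(winner, loser, winner_points, loser_points)

# cumulative sum of winner_points within each winner id
# (one grouping column, one value column; winner_cum is a hypothetical name)
test_data$winner_cum <- ave(test_data$winner_points, test_data$winner, FUN = cumsum)
test_data$winner_cum
# 5 4 12 2 6 10
```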

Desired output:

winner loser winner_points loser_points winner_total loser_total
1      2     5             5            5            5
2      3     4             2            9 (5+4)      2
3      1     12            2            14 (2+12)    7 (5+2)
4      3     2             6            2            20 (2+12+6)
1      3     1             6            8 (5+2+1)    26 (2+12+6+6)
2      1     6             2            15 (5+4+6)   10 (5+2+1+2)

Answer 1:


I also am having trouble understanding but maybe this...?

library(dplyr)

# points each player scored as a winner and as a loser, over all matches
as.winner <- test_data %>% group_by(winner) %>% summarise(winner_sum = sum(winner_points))
as.loser <- test_data %>% group_by(loser) %>% summarise(loser_sum = sum(loser_points))
names(as.winner)[1] <- 'player'
names(as.loser)[1] <- 'player'
# full outer join so that players who only won or only lost are kept
totals <- merge(as.winner, as.loser, by = 'player', all = TRUE)
totals[is.na(totals)] <- 0
totals <- transform(totals, total_points = winner_sum + loser_sum)
totals



Answer 2:


If I have understood the OP's requirements correctly, they want a cumulative sum of the points by player id, regardless of whether the points come from winner_points or loser_points. The important point to note here is that the winner and loser columns contain the respective player ids.

The solution below reshapes the data from wide to long format (reshaping two value variables simultaneously), computes the cumulative sum for each player id, and finally reshapes from long back to wide format.

library(data.table)
cols <- c("winner", "loser")
setDT(test_data)[
  # append row id column required for subsequent reshaping
  , rn := .I][
    # reshape multiple value variables simultaneously from wide to long format
    , melt(.SD, id.vars = "rn", 
           measure.vars = list(cols, paste0(cols, "_points")), 
           value.name = c("id", "points"))][
             # rename variable column
             , variable := forcats::lvls_revalue(variable, cols)][
               # order by row id and compute cumulative points by id
               order(rn), total := cumsum(points), by = id][
                 # reshape multiple value variables simultaneously from long to wide format
                 , dcast(.SD, rn ~ variable, value.var = c("id", "points", "total"))]
   rn id_winner id_loser points_winner points_loser total_winner total_loser
1:  1         1        2             5            5            5           5
2:  2         2        3             4            2            9           2
3:  3         3        1            12            2           14           7
4:  4         4        3             2            6            2          20
5:  5         1        3             1            6            8          26
6:  6         2        1             6            2           15          10

Edit: The result above matches the expected result posted by the OP; it counts the points scored in the current match as well. Meanwhile, the OP has posted a similar question where the expected result excludes the current match.
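
As a rough sketch of that second variant, the running total can be computed in long format and the current match's points subtracted afterwards. This uses base R rather than data.table, and it assumes that "excludes the current match" simply means the inclusive total minus the points of that row; the follow-up question's expected output is not shown here, so treat the column names (winner_total_excl, loser_total_excl) and the interpretation as assumptions.

```r
winner <- c(1, 2, 3, 4, 1, 2)
loser <- c(2, 3, 1, 3, 3, 1)
winner_points <- c(5, 4, 12, 2, 1, 6)
loser_points <- c(5, 2, 2, 6, 6, 2)
test_data <- data.frame(winner, loser, winner_points, loser_points)

n <- nrow(test_data)

# stack winner and loser entries into one long table, keeping the match number
long <- data.frame(
  match  = rep(seq_len(n), times = 2),
  role   = rep(c("winner", "loser"), each = n),
  id     = c(test_data$winner, test_data$loser),
  points = c(test_data$winner_points, test_data$loser_points)
)

# order by match so the cumulative sum runs in match order
long <- long[order(long$match), ]

# running total per player id, including the current match
long$total_incl <- ave(long$points, long$id, FUN = cumsum)

# assumed definition of "excluding the current match"
long$total_excl <- long$total_incl - long$points

# back to wide: one winner row and one loser row per match, already in match order
test_data$winner_total_excl <- long$total_excl[long$role == "winner"]
test_data$loser_total_excl  <- long$total_excl[long$role == "loser"]
test_data
```

With the sample data, total_incl reproduces the winner_total and loser_total columns of the desired output, and the exclusive columns start at 0 for each player's first match.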



Source: https://stackoverflow.com/questions/44591583/group-together-two-columns-with-id-do-the-cumulative-for-two-columns
