Group together two columns with ID, do the cumulative for two columns

Question

Edit: I wrote the question in a way that was too unstructured, so let me try again.

I want to add two new columns, winner_total_points and loser_total_points, to the dataset below.

winner <- c(1,2,3,4,1,2)
loser <- c(2,3,1,3,3,1)
winner_points <- c(5,4,12,2,1,6)
loser_points <- c(5,2,2,6,6,2)
test_data <- data.frame(winner, loser, winner_points, loser_points)

winner_total_points should sum all the points the winner has earned so far (excluding this match), whether as the winner or as the loser of earlier matches.

loser_total_points should do the same for the loser.

Note that the winner and loser columns contain the respective player ids.

This is fairly easy with the ave() function, but that only works when grouping by a single column and taking the cumulative sum of a single column.
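
For reference, the single-column case can be sketched as follows. The data frame is rebuilt so the snippet runs on its own, and the name winner_only is made up for illustration. It only tracks points scored as the winner, which is exactly the limitation described.

```r
# Same data as in the question, rebuilt so the snippet is self-contained
test_data <- data.frame(
  winner = c(1, 2, 3, 4, 1, 2),
  loser = c(2, 3, 1, 3, 3, 1),
  winner_points = c(5, 4, 12, 2, 1, 6),
  loser_points = c(5, 2, 2, 6, 6, 2)
)

# Running sum of winner_points within each winner id (one grouping column only)
winner_only <- ave(test_data$winner_points, test_data$winner, FUN = cumsum)
winner_only
# [1]  5  4 12  2  6 10
```

Points that the same player scored as the loser are ignored here, so this does not yet give the desired totals.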

Desired output:

winner loser winner_points loser_points winner_total loser_total
1      2     5             5            5            5
2      3     4             2            9 (5+4)      2
3      1     12            2            14 (2+12)    7 (5+2)
4      3     2             6            2            20 (2+12+6)
1      3     1             6            8 (5+2+1)    26 (2+12+6+6)
2      1     6             2            15 (5+4+6)   10 (5+2+1+2)

Answer 1:

I'm also having trouble understanding the question, but maybe this...?

library(dplyr)

# Total points each player scored as winner and as loser
as.winner <- test_data %>% group_by(winner) %>% summarise(winner_sum = sum(winner_points))
as.loser <- test_data %>% group_by(loser) %>% summarise(loser_sum = sum(loser_points))
names(as.winner)[1] <- 'player'
names(as.loser)[1] <- 'player'
# Full outer join so players who only won or only lost are kept
totals <- merge(as.winner, as.loser, by = 'player', all = TRUE)
# A player missing from one side scored 0 points in that role
totals[is.na(totals)] <- 0
totals <- transform(totals, total_points = winner_sum + loser_sum)
totals

Answer 2:

If I have understood the OP's requirements correctly, he wants to compute a cumulative sum of the points by player id, irrespective of whether they were scored as winner_points or loser_points. The important point to note is that the winner and loser columns contain the respective player ids.

The solution below reshapes the data from wide to long format, reshaping two value variables simultaneously, then computes the cumulative sums for each player id, and finally reshapes from long back to wide format.

library(data.table)
cols <- c("winner", "loser")
setDT(test_data)[
  # append row id column required for subsequent reshaping
  , rn := .I][
    # reshape multiple value variables simultaneously from wide to long format
    , melt(.SD, id.vars = "rn", 
           measure.vars = list(cols, paste0(cols, "_points")), 
           value.name = c("id", "points"))][
             # rename variable column
             , variable := forcats::lvls_revalue(variable, cols)][
               # order by row id and compute cumulative points by id
               order(rn), total := cumsum(points), by = id][
                 # reshape multiple value variables simultaneously from long to wide format
                 , dcast(.SD, rn ~ variable, value.var = c("id", "points", "total"))]

   rn id_winner id_loser points_winner points_loser total_winner total_loser
1:  1         1        2             5            5            5           5
2:  2         2        3             4            2            9           2
3:  3         3        1            12            2           14           7
4:  4         4        3             2            6            2          20
5:  5         1        3             1            6            8          26
6:  6         2        1             6            2           15          10

Edit: The result above matches the expected result posted by the OP, so each running total includes the points scored in the current match. Meanwhile, the OP has posted a similar question in which the expected result excludes the current match.
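
If the totals should exclude the current match, one option is to subtract each row's own points from the inclusive running total. The following is a minimal base R sketch of that idea, not the approach from the OP's follow-up question. The helper names (long, incl, excl and the *_total_excl columns) are made up for illustration.

```r
# Same data as in the question, rebuilt so the snippet is self-contained
test_data <- data.frame(
  winner = c(1, 2, 3, 4, 1, 2),
  loser = c(2, 3, 1, 3, 3, 1),
  winner_points = c(5, 4, 12, 2, 1, 6),
  loser_points = c(5, 2, 2, 6, 6, 2)
)
n <- nrow(test_data)

# Long format: one row per player per match, keeping the match number
long <- data.frame(
  match = rep(seq_len(n), times = 2),
  role = rep(c("winner", "loser"), each = n),
  id = c(test_data$winner, test_data$loser),
  points = c(test_data$winner_points, test_data$loser_points)
)

# Sort chronologically so cumsum() sees the matches in order
long <- long[order(long$match), ]

# Running total per player id, including the current match
long$incl <- ave(long$points, long$id, FUN = cumsum)

# Removing the current match's points leaves the total before the match
long$excl <- long$incl - long$points

# Back to one row per match (both subsets are already ordered by match)
test_data$winner_total_excl <- long$excl[long$role == "winner"]
test_data$loser_total_excl <- long$excl[long$role == "loser"]
test_data
# winner_total_excl: 0 5 2 0 7 9
# loser_total_excl:  0 0 5 14 20 8
```

The incl column reproduces the inclusive totals shown above, so the same long table serves both variants.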

Source: https://stackoverflow.com/questions/44591583/group-together-two-columns-with-id-do-the-cumulative-for-two-columns

Tags

cumsum