Assign max value of group to all rows in that group

问题

I would like to assign the max value of a group to all rows within that group. How do I do that?

I have a dataframe containing the names of the group and the max number of credits that belongs to it.

course_credits <- aggregate(bsc_academic$Credits, by = list(bsc_academic$Course_code), max)

which gives

    Course    Credits
1   ABC1000  6.5
2   ABC1003  6.5
3   ABC1004  6.5
4   ABC1007  5.0
5   ABC1010  6.5
6   ABC1021  6.5
7   ABC1023  6.5

The main dataframe looks like this:

Appraisal.Type   Resits   Credits Course_code   Student_ID          
Final result       0       6.5    ABC1000           10                
Final result       0       6.5    ABC1003           10               
Grade supervisor   0       0      ABC1000           10               
Grade supervisor   0       0      ABC1003           10 
Final result       0       12     ABC1294           23   
Grade supervisor   0       0      ABC1294           23

As you see, student 10 took course ABC1000, worth 6.5 credits. For each course (per student), however, two rows exist: Final result and Grade supervisor. In the end, Final result should be deleted, but the credits should be kept. Therefore, I want to assign the max value of 6.5 to the Grade supervisor row. Likewise, student 23 has followed course ABC1294, worth 12 credits.

In the end, this should be the result:

Appraisal.Type   Resits   Credits Course_code   Student_ID                      
Grade supervisor   0       6.5      ABC1000           10               
Grade supervisor   0       6.5      ABC1003           10    
Grade supervisor   0       12       ABC1294           23

How do I go about this?

回答1:

An option would be to group by 'Student_ID', mutate the 'Credits' with max of 'Credits' and filter the rows with 'Appraisal.Type' as "Grade supervisor"

library(dplyr)
df1 %>%
   group_by(Student_ID) %>%
   dplyr::mutate(Credits = max(Credits)) %>%
   ungroup %>%
   filter(Appraisal.Type == "Grade supervisor")
# A tibble: 2 x 5
#  Appraisal.Type   Resits Credits Course_code Student_ID
#  <chr>             <int>   <dbl> <chr>            <int>
#1 Grade supervisor      0     6.5 ABC1000             10
#2 Grade supervisor      0     6.5 ABC1003             10

If we also need 'Course_code' to be included in the grouping

df2 %>%
  group_by(Student_ID, Course_code) %>% 
  dplyr::mutate(Credits = max(Credits)) %>%  
  filter(Appraisal.Type == "Grade supervisor")
# A tibble: 3 x 5
# Groups:   Student_ID, Course_code [3]
#  Appraisal.Type   Resits Credits Course_code Student_ID
#  <chr>             <int>   <dbl> <chr>            <int>
#1 Grade supervisor      0     6.5 ABC1000             10
#2 Grade supervisor      0     6.5 ABC1003             10
#3 Grade supervisor      0    12   ABC1294             23

NOTE: I case, plyr package is also loaded, there can be some masking of functions esp summarise/mutate which is also found in plyr. To prevent it, either do this on a fresh session without loading plyr or explicitly specify dplyr::mutate

data

df1 <- structure(list(Appraisal.Type = c("Final result", "Final result", 
"Grade supervisor", "Grade supervisor"), Resits = c(0L, 0L, 0L, 
0L), Credits = c(6.5, 6.5, 0, 0), Course_code = c("ABC1000", 
"ABC1003", "ABC1000", "ABC1003"), Student_ID = c(10L, 10L, 10L, 
10L)), class = "data.frame", row.names = c(NA, -4L)) 



df2 <- structure(list(Appraisal.Type = c("Final result", "Final result", 
"Grade supervisor", "Grade supervisor", "Final result", "Grade supervisor"
), Resits = c(0L, 0L, 0L, 0L, 0L, 0L), Credits = c(6.5, 6.5, 
0, 0, 12, 0), Course_code = c("ABC1000", "ABC1003", "ABC1000", 
"ABC1003", "ABC1294", "ABC1294"), Student_ID = c(10L, 10L, 10L, 
10L, 23L, 23L)), class = "data.frame", row.names = c(NA, -6L))

回答2:

Generate a sample dataset.

data <- as.data.frame(list(Appraisal.Type = c(rep("Final result", 2), rep("Grade supervisor", 2)), 
                      Resits = rep(0, 4), 
                      Credits = c(rep(6.5, 2), rep(0, 2)), 
                      Course_code = rep(c("ABC1000", "ABC1003"), 2), 
                      Student_ID = rep(10, 4)))

Assign the max value of a group to all rows in this group and then delete rows that contain "Final results".

##Reassign the values of "Credits" column
for (i in 1: nlevels(as.factor(data$Course_code))) {
  Course_code <- unique(data$Course_code)[i]
  data$Credits [data$Course_code == Course_code] <- max (data$Credits [data$Course_code == Course_code]) 
}
##New dataset without "Final result" rows
data <- data[data$Appraisal.Type != "Final result",]

Here is the result.

data
    Appraisal.Type Resits Credits Course_code Student_ID
3 Grade supervisor      0     6.5     ABC1000         10
4 Grade supervisor      0     6.5     ABC1003         10

回答3:

Here's a data.table solution,

DT[,Credits := max(Credits),by=Student_ID]
Result <- DT[Appraisal.Type == "Grade supervisor"]

来源：https://stackoverflow.com/questions/56315103/assign-max-value-of-group-to-all-rows-in-that-group

标签

group-by

dplyr

max