Question
I would like to assign the max value of a group to all rows within that group. How do I do that?
I have a data frame containing the name of each group and the maximum number of credits that belongs to it.
course_credits <- aggregate(bsc_academic$Credits, by = list(bsc_academic$Course_code), max)
which gives
Course Credits
1 ABC1000 6.5
2 ABC1003 6.5
3 ABC1004 6.5
4 ABC1007 5.0
5 ABC1010 6.5
6 ABC1021 6.5
7 ABC1023 6.5
The main dataframe looks like this:
Appraisal.Type Resits Credits Course_code Student_ID
Final result 0 6.5 ABC1000 10
Final result 0 6.5 ABC1003 10
Grade supervisor 0 0 ABC1000 10
Grade supervisor 0 0 ABC1003 10
Final result 0 12 ABC1294 23
Grade supervisor 0 0 ABC1294 23
As you can see, student 10 took course ABC1000, worth 6.5 credits. For each course (per student), however, two rows exist: Final result and Grade supervisor. In the end, the Final result rows should be deleted, but the credits should be kept. Therefore, I want to assign the max value of 6.5 to the Grade supervisor row. Likewise, student 23 took course ABC1294, worth 12 credits.
In the end, this should be the result:
Appraisal.Type Resits Credits Course_code Student_ID
Grade supervisor 0 6.5 ABC1000 10
Grade supervisor 0 6.5 ABC1003 10
Grade supervisor 0 12 ABC1294 23
How do I go about this?
Answer 1:
One option is to group by 'Student_ID', mutate 'Credits' to the max of 'Credits' within each group, and filter the rows where 'Appraisal.Type' is "Grade supervisor":
library(dplyr)
df1 %>%
  group_by(Student_ID) %>%
  dplyr::mutate(Credits = max(Credits)) %>%
  ungroup() %>%
  filter(Appraisal.Type == "Grade supervisor")
# A tibble: 2 x 5
# Appraisal.Type Resits Credits Course_code Student_ID
# <chr> <int> <dbl> <chr> <int>
#1 Grade supervisor 0 6.5 ABC1000 10
#2 Grade supervisor 0 6.5 ABC1003 10
If 'Course_code' also needs to be part of the grouping, so that each course keeps its own maximum per student:
df2 %>%
  group_by(Student_ID, Course_code) %>%
  dplyr::mutate(Credits = max(Credits)) %>%
  filter(Appraisal.Type == "Grade supervisor")
# A tibble: 3 x 5
# Groups: Student_ID, Course_code [3]
# Appraisal.Type Resits Credits Course_code Student_ID
# <chr> <int> <dbl> <chr> <int>
#1 Grade supervisor 0 6.5 ABC1000 10
#2 Grade supervisor 0 6.5 ABC1003 10
#3 Grade supervisor 0 12 ABC1294 23
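To see when the choice of grouping matters, here is a minimal sketch with a hypothetical student who takes two courses with different credit values. The data frame df3 is made up for illustration and is not part of the original data; the credit values 6.5 for ABC1000 and 5.0 for ABC1007 are taken from the course table in the question.

```r
library(dplyr)

# Hypothetical data: one student, two courses with different credits.
df3 <- data.frame(
  Appraisal.Type = c("Final result", "Grade supervisor",
                     "Final result", "Grade supervisor"),
  Resits         = c(0L, 0L, 0L, 0L),
  Credits        = c(6.5, 0, 5.0, 0),
  Course_code    = c("ABC1000", "ABC1000", "ABC1007", "ABC1007"),
  Student_ID     = c(10L, 10L, 10L, 10L),
  stringsAsFactors = FALSE
)

# Grouping by student only: every row gets the student's overall maximum.
by_student <- df3 %>%
  group_by(Student_ID) %>%
  dplyr::mutate(Credits = max(Credits)) %>%
  ungroup() %>%
  filter(Appraisal.Type == "Grade supervisor")

# Grouping by student and course: each course keeps its own maximum.
by_course <- df3 %>%
  group_by(Student_ID, Course_code) %>%
  dplyr::mutate(Credits = max(Credits)) %>%
  ungroup() %>%
  filter(Appraisal.Type == "Grade supervisor")

by_student$Credits  # 6.5 6.5 (ABC1007 inherits the 6.5 from ABC1000)
by_course$Credits   # 6.5 5.0
```

With data like this, grouping by 'Student_ID' alone would overstate the credits of ABC1007, so including 'Course_code' in the grouping is probably the safer default for the question as asked.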
NOTE: If the plyr package is also loaded, some functions can be masked, especially summarise and mutate, which also exist in plyr. To prevent this, either run the code in a fresh session without loading plyr or explicitly specify dplyr::mutate.
data
df1 <- structure(list(Appraisal.Type = c("Final result", "Final result",
"Grade supervisor", "Grade supervisor"), Resits = c(0L, 0L, 0L,
0L), Credits = c(6.5, 6.5, 0, 0), Course_code = c("ABC1000",
"ABC1003", "ABC1000", "ABC1003"), Student_ID = c(10L, 10L, 10L,
10L)), class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(Appraisal.Type = c("Final result", "Final result",
"Grade supervisor", "Grade supervisor", "Final result", "Grade supervisor"
), Resits = c(0L, 0L, 0L, 0L, 0L, 0L), Credits = c(6.5, 6.5,
0, 0, 12, 0), Course_code = c("ABC1000", "ABC1003", "ABC1000",
"ABC1003", "ABC1294", "ABC1294"), Student_ID = c(10L, 10L, 10L,
10L, 23L, 23L)), class = "data.frame", row.names = c(NA, -6L))
Answer 2:
Generate a sample dataset.
data <- as.data.frame(list(Appraisal.Type = c(rep("Final result", 2), rep("Grade supervisor", 2)),
Resits = rep(0, 4),
Credits = c(rep(6.5, 2), rep(0, 2)),
Course_code = rep(c("ABC1000", "ABC1003"), 2),
Student_ID = rep(10, 4)))
Assign the max value of a group to all rows in this group and then delete the rows that contain "Final result".
## Reassign the values of the "Credits" column, one course at a time
for (code in unique(data$Course_code)) {
  in_course <- data$Course_code == code  # rows belonging to this course
  data$Credits[in_course] <- max(data$Credits[in_course])
}
##New dataset without "Final result" rows
data <- data[data$Appraisal.Type != "Final result",]
Here is the result.
data
Appraisal.Type Resits Credits Course_code Student_ID
3 Grade supervisor 0 6.5 ABC1000 10
4 Grade supervisor 0 6.5 ABC1003 10
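As an alternative to the explicit loop, base R's ave() can compute a per-group value and return it aligned with the original rows. The following is a minimal sketch, not the answer author's method, that rebuilds the same sample data and groups by Course_code only, as the loop does.

```r
# Same sample data as in the answer above.
data <- data.frame(
  Appraisal.Type = c(rep("Final result", 2), rep("Grade supervisor", 2)),
  Resits         = rep(0, 4),
  Credits        = c(rep(6.5, 2), rep(0, 2)),
  Course_code    = rep(c("ABC1000", "ABC1003"), 2),
  Student_ID     = rep(10, 4),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each group and returns a vector with one
# value per original row, so it can be assigned straight back.
data$Credits <- ave(data$Credits, data$Course_code, FUN = max)

# Drop the "Final result" rows, keeping the filled-in credits.
result <- data[data$Appraisal.Type != "Final result", ]
result
```

Further grouping vectors, such as data$Student_ID, can be passed to ave() as additional arguments before FUN if the maximum should be taken per student and course.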
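Answer 3:
Here's a data.table solution, assuming the data frame (for example df1 from the first answer) is first converted to a data.table:
library(data.table)
DT <- as.data.table(df1)
DT[, Credits := max(Credits), by = Student_ID]
Result <- DT[Appraisal.Type == "Grade supervisor"]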
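For a self-contained version, the sketch below builds the six-row example from the question directly as a data.table and groups by both Student_ID and Course_code. Grouping by both columns is an assumption about the intended result (the answer groups by Student_ID only); it matters when one student takes courses with different credit values.

```r
library(data.table)

# The six example rows from the question.
DT <- data.table(
  Appraisal.Type = c("Final result", "Final result", "Grade supervisor",
                     "Grade supervisor", "Final result", "Grade supervisor"),
  Resits         = 0L,
  Credits        = c(6.5, 6.5, 0, 0, 12, 0),
  Course_code    = c("ABC1000", "ABC1003", "ABC1000",
                     "ABC1003", "ABC1294", "ABC1294"),
  Student_ID     = c(10L, 10L, 10L, 10L, 23L, 23L)
)

# := updates Credits by reference; max() is evaluated once per group.
DT[, Credits := max(Credits), by = .(Student_ID, Course_code)]

# Keep only the "Grade supervisor" rows.
Result <- DT[Appraisal.Type == "Grade supervisor"]
Result
```

Because := modifies DT in place, make a copy with copy(DT) first if the original values are still needed.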
Source: https://stackoverflow.com/questions/56315103/assign-max-value-of-group-to-all-rows-in-that-group