问题
The data below have an IndID
field as well as three columns containing numbers, including NA in some instances, with a varying number of rows for each IndID
.
library(dplyr)
n = 10
set.seed(123)
dat <- data.frame(IndID = sample(c("AAA", "BBB", "CCC", "DDD"), n, replace = T),
Num1 = c(2,4,2,4,4,1,3,4,3,2),
Num2 = sample(c(1,2,5,8,7,8,NA), n, replace = T),
Num3 = sample(c(NA, NA,NA,8,7,9,NA), n, replace = T)) %>%
arrange(IndID)
head(dat)
IndID Num1 Num2 Num3
1 AAA 1 NA 7
2 BBB 2 NA NA
3 BBB 2 7 7
4 BBB 2 NA NA
5 CCC 3 2 8
6 CCC 3 5 NA
For each IndID
, I would like to make a new column Max
that contains the maximum value for Num1
:Num3
. In most instances this involves finding the max value across multiple rows and columns. Within dplyr
I am missing the final step (below) and would appreciate any suggestions.
dat %>%
group_by(IndID) %>%
mutate(Max = "???")
回答1:
An option is pmax
to get the rowwise maxs
dat %>%
mutate(Max = pmax(Num1, Num2, Num3, na.rm = TRUE))
If there are many columns, we can get the column names, convert it to symbol and then evaluate (!!!
)
dat %>%
mutate(Max = pmax(!!! rlang::syms(names(.)[-1]), na.rm = TRUE))
# A tibble: 10 x 5
# Groups: IndID [4]
# IndID Num1 Num2 Num3 Max
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 AAA 1 NA 7 7
# 2 BBB 2 NA NA 2
# 3 BBB 2 7 7 7
# 4 BBB 2 NA NA 2
# 5 CCC 3 2 8 8
# 6 CCC 3 5 NA 5
# 7 DDD 4 8 7 8
# 8 DDD 4 7 NA 7
# 9 DDD 4 1 7 7
#10 DDD 4 1 7 7
If this is to get the max of all 'Num' column grouped by 'IndID', there are multiple ways.
1) From the above step, we can extend it to group by 'IndID' and then take the max
of row maxs ('Max')
dat %>%
mutate(Max = pmax(!!! rlang::syms(names(.)[-1]), na.rm = TRUE)) %>%
group_by(IndID) %>%
mutate(Max = max(Max))
2) Another option is to convert the 'wide' format to 'long' with gather
, then grouped by 'IndID', get the max
of 'val' column and right_join
with the original dataset
library(tidyverse)
gather(dat, key, val, -IndID) %>%
group_by(IndID) %>%
summarise(Max = max(val,na.rm = TRUE)) %>%
right_join(dat)
3) Or another option without reshaping into 'long' format would be to nest
the dataset after grouping by 'IndID', unlist
and get the max
of the 'Num' columns
dat %>%
group_by(IndID) %>%
nest %>%
mutate(data = map(data, ~ .x %>%
mutate(Max = max(unlist(.), na.rm = TRUE)))) %>%
unnest
来源:https://stackoverflow.com/questions/51584602/select-a-maximum-value-across-rows-and-columns-with-grouped-data