Question
I want to use dplyr and case_when to collapse a series of indicator columns into a single column. The challenge is that I want to be able to collapse over an unspecified/dynamic number of columns.
Consider the following dataset, in which gear has been split into a series of indicator columns.
library(dplyr)
data(mtcars)
mtcars = mtcars %>%
  mutate(g2 = ifelse(gear == 2, 1, 0),
         g3 = ifelse(gear == 3, 1, 0),
         g4 = ifelse(gear == 4, 1, 0)) %>%
  select(g2, g3, g4)
I am trying to write a function that does the reverse.
When I know how many cases there are, this can be done as follows:
combine_indices = function(db, cols, vals){
  db %>% mutate(new_col = case_when(!!sym(cols[1]) == 1 ~ vals[1],
                                    !!sym(cols[2]) == 1 ~ vals[2],
                                    !!sym(cols[3]) == 1 ~ vals[3]))
}
cols = c("g2", "g3", "g4")
vals = c(2,3,4)
combine_indices(mtcars, cols, vals)
However, I would like the combine_indices function to handle any number of index columns (right now it works for exactly three).
According to the documentation (?case_when), "if your patterns are stored in a list, you can splice that in with !!!". But I cannot get this working:
patterns = list(sym(cols[1]) == 1 ~ vals[1],
                sym(cols[2]) == 1 ~ vals[2],
                sym(cols[3]) == 1 ~ vals[3])
mtcars %>% mutate(new_col = case_when(!!!patterns))
This only produces a new column filled with NAs.
If !!!patterns worked, it would be straightforward to take the lists cols and vals and generate patterns. However, I cannot get the quosures correct. I am hoping someone more familiar with quosures knows how.
Note: some similar questions here on SO were solved using joins or other functions. However, I am restricted to using case_when because of how it translates to SQL when using dbplyr.
Answer 1:
We can create a vector of condition strings, parse them into expressions with parse_exprs, and splice the result into case_when with !!!.
library(dplyr)
library(rlang)
combine_indices = function(db, cols, vals){
  db %>% mutate(new_col = case_when(!!!parse_exprs(paste(cols, '== 1 ~', vals))))
}
cols = c("g2", "g3", "g4")
vals = c(2,3,4)
combine_indices(mtcars, cols, vals)
which returns:
# g2 g3 g4 new_col
#1 0 0 1 4
#2 0 0 1 4
#3 0 0 1 4
#4 0 1 0 3
#5 0 1 0 3
#6 0 1 0 3
#....
where paste generates the conditions for case_when dynamically.
paste(cols, '== 1 ~', vals)
#[1] "g2 == 1 ~ 2" "g3 == 1 ~ 3" "g4 == 1 ~ 4"
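One caveat with the string-building approach, offered as a minimal sketch rather than as part of the original answer: because the conditions are parsed from text, character values would be read as variable names unless they are quoted inside the string. The example below assumes a hypothetical labels vector (not from the question) and uses base R's encodeString() to add the quotes before parsing.

```r
library(rlang)

cols <- c("g2", "g3", "g4")
labels <- c("two", "three", "four")  # hypothetical character values, not from the question

# Without quoting, "g2 == 1 ~ two" would parse `two` as a symbol (a variable name).
# encodeString(quote = '"') wraps each label in double quotes so it parses as a string.
conditions <- paste(cols, "== 1 ~", encodeString(labels, quote = '"'))
patterns <- parse_exprs(conditions)

# Each element is an unevaluated two-sided formula whose right-hand side is a string
patterns[[1]]
```

The resulting list can be spliced into case_when with !!! in the same way as the numeric version.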
Answer 2:
This solution should create a column for any value in the gear column:
library(dplyr)
library(tidyr)  # provides pivot_wider() and replace_na()
data <- mtcars %>%
  mutate(mygear = gear) %>%
  pivot_wider(values_from = gear, names_from = gear, names_prefix = "g") %>%
  mutate_at(vars(starts_with('g')), function(x) x/.$mygear) %>%
  mutate_if(is.numeric, replace_na, replace = 0) %>%
  rename(gear = mygear)
I do need to create a temporary column mygear, as pivot_wider does not retain the pivot column.
> data
# A tibble: 32 x 14
mpg cyl disp hp drat wt qsec vs am carb gear g4 g3 g5
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 1 0 0
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 1 0 0
3 22.8 4 108 93 3.85 2.32 18.6 1 1 1 4 1 0 0
4 21.4 6 258 110 3.08 3.22 19.4 1 0 1 3 0 1 0
5 18.7 8 360 175 3.15 3.44 17.0 0 0 2 3 0 1 0
6 18.1 6 225 105 2.76 3.46 20.2 1 0 1 3 0 1 0
7 14.3 8 360 245 3.21 3.57 15.8 0 0 4 3 0 1 0
8 24.4 4 147. 62 3.69 3.19 20 1 0 2 4 1 0 0
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 2 4 1 0 0
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 1 0 0
# … with 22 more rows
Answer 3:
You need to make each object in the list an unevaluated expression with expr(), so that case_when can evaluate it later.
To be honest, I don't understand it completely, but it works.
patterns <- list(expr(!!sym(cols[1]) == 1 ~ vals[1]),
                 expr(!!sym(cols[2]) == 1 ~ vals[2]),
                 expr(!!sym(cols[3]) == 1 ~ vals[3]))
Or, more simply:
patterns <- exprs(!!sym(cols[1]) == 1 ~ vals[1],
                  !!sym(cols[2]) == 1 ~ vals[2],
                  !!sym(cols[3]) == 1 ~ vals[3])
mtcars %>% mutate(new_col = case_when(!!!patterns))
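The snippets above still spell out three cases by hand. As a rough sketch of how the same expr() idea might extend to any number of columns, the formulas can be built in a loop over cols and vals. The helper name combine_indices_expr and the toy data frame are assumptions made for illustration, and unquoting the value with !!val is a choice made here rather than something taken from the answer.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: builds one `col == 1 ~ val` formula per column/value pair
combine_indices_expr <- function(db, cols, vals) {
  patterns <- Map(
    function(col, val) expr(!!sym(col) == 1 ~ !!val),
    cols, vals
  )
  # unname() so the spliced formulas are passed as positional arguments
  db %>% mutate(new_col = case_when(!!!unname(patterns)))
}

cols <- c("g2", "g3", "g4")
vals <- c(2, 3, 4)
toy <- data.frame(g2 = c(0, 0, 0), g3 = c(1, 0, 0), g4 = c(0, 1, 0))  # toy data
res <- combine_indices_expr(toy, cols, vals)
res
```

Rows where no indicator equals 1 fall through every case and receive NA, as with any case_when that has no default.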
Answer 4:
For the sake of completeness: for this particular use case only, the result can be obtained using matrix multiplication:
library(dplyr)
combine_indices = function(db, cols, vals){
  db %>% mutate(new_col = as.matrix(db[, cols]) %*% vals)
}
cols = c("g2", "g3", "g4")
vals = c(2, 3, 4)
combine_indices(mtcars, cols, vals)
   g2 g3 g4 new_col
1   0  0  1       4
2   0  0  1       4
3   0  0  1       4
4   0  1  0       3
5   0  1  0       3
6   0  1  0       3
7   0  1  0       3
8   0  0  1       4
9   0  0  1       4
10  0  0  1       4
11  0  0  1       4
12  0  1  0       3
13  0  1  0       3
14  0  1  0       3
15  0  1  0       3
16  0  1  0       3
17  0  1  0       3
18  0  0  1       4
19  0  0  1       4
20  0  0  1       4
21  0  1  0       3
22  0  1  0       3
23  0  1  0       3
24  0  1  0       3
25  0  1  0       3
26  0  0  1       4
27  0  0  0       0
28  0  0  0       0
29  0  0  0       0
30  0  0  0       0
31  0  0  0       0
32  0  0  1       4
Explanation
For row 1, we get
0 * 2 + 0 * 3 + 1 * 4 = 4
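The same arithmetic can be checked in base R on a few hand-written rows. This is only an illustrative sketch: the indicators matrix is made up for the example. It also shows that a row with no indicator set yields 0 rather than NA, which matches rows 27 to 31 in the output above.

```r
vals <- c(2, 3, 4)

# Three example indicator rows: gear 4, gear 3, and a row with no indicator set
indicators <- rbind(c(0, 0, 1),
                    c(0, 1, 0),
                    c(0, 0, 0))

# Each row is multiplied element-wise by vals and summed; drop() returns a plain vector
result <- drop(indicators %*% vals)
result
# [1] 4 3 0
```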
Answer 5:
Perhaps I'm looking at it wrong, but I think this can be done more efficiently with a join:
cols <- tibble(g2 = c(1, 0, 0), g3 = c(0, 1, 0), g4 = c(0, 0, 1), val = c(2, 3, 4))
cols
# # A tibble: 3 x 4
# g2 g3 g4 val
# <dbl> <dbl> <dbl> <dbl>
# 1 1 0 0 2
# 2 0 1 0 3
# 3 0 0 1 4
# using your mtcars
left_join(mtcars, cols, by = c("g2", "g3", "g4"))
# g2 g3 g4 val
# 1 0 0 1 4
# 2 0 0 1 4
# 3 0 0 1 4
# 4 0 1 0 3
# 5 0 1 0 3
# 6 0 1 0 3
# 7 0 1 0 3
# 8 0 0 1 4
# 9 0 0 1 4
# 10 0 0 1 4
# 11 0 0 1 4
# 12 0 1 0 3
# 13 0 1 0 3
# 14 0 1 0 3
# 15 0 1 0 3
# 16 0 1 0 3
# 17 0 1 0 3
# 18 0 0 1 4
# 19 0 0 1 4
# 20 0 0 1 4
# 21 0 1 0 3
# 22 0 1 0 3
# 23 0 1 0 3
# 24 0 1 0 3
# 25 0 1 0 3
# 26 0 0 1 4
# 27 0 0 0 NA
# 28 0 0 0 NA
# 29 0 0 0 NA
# 30 0 0 0 NA
# 31 0 0 0 NA
# 32 0 0 1 4
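The lookup table above is written out by hand for three columns. One possible way to generate it for any number of columns, sketched here under the assumption that each value corresponds to exactly one indicator column, is to start from an identity matrix. The helper name make_lookup and the toy data frame are hypothetical. The question's note about being restricted to case_when for dbplyr still applies to any join-based approach.

```r
library(dplyr)

# Hypothetical helper: one lookup row per indicator column, built from an identity matrix
make_lookup <- function(cols, vals) {
  lookup <- as.data.frame(diag(length(cols)))
  names(lookup) <- cols
  lookup$val <- vals
  lookup
}

cols <- c("g2", "g3", "g4")
vals <- c(2, 3, 4)
toy <- data.frame(g2 = c(0, 0, 0), g3 = c(1, 0, 0), g4 = c(0, 1, 0))  # toy data

# Rows that match no lookup row (all indicators 0) get NA in val
joined <- left_join(toy, make_lookup(cols, vals), by = cols)
joined
```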
Source: https://stackoverflow.com/questions/61789717/dplyr-case-when-with-dynamic-number-of-cases