Question
Is it possible to refer to column names in a lambda function inside across()?
df <- tibble(age = c(12, 45), sex = c('f', 'f'))
allowed_values <- list(age = 18:100, sex = c("f", "m"))
df %>%
  mutate(across(c(age, sex),
                c(valid = ~ .x %in% allowed_values[[COLNAME]])))
I just came across this question where OP asks about validating columns in a dataframe based on a list of allowed values. dplyr just gained across() and it seems like a natural choice, but we need the column names to look up the allowed values.
The best I could come up with was a call to imap_dfr, but it is more cumbersome to integrate into an analysis pipeline, because the results need to be re-combined with the original dataframe.
Answer 1:
I think that you may be asking too much of across at this point (but this may spur additional development, so maybe someday it will work the way you suggest).
I think that the imap functions from the purrr package may give you what you want at this point:
> df <- tibble(age = c(12, 45), sex = c('f', 'f'))
> allowed_values <- list(age = 18:100, sex = c("f", "m"))
>
> df %>% imap( ~ .x %in% allowed_values[[.y]])
$age
[1] FALSE TRUE
$sex
[1] TRUE TRUE
> df %>% imap_dfc( ~ .x %in% allowed_values[[.y]])
# A tibble: 2 x 2
  age   sex
  <lgl> <lgl>
1 FALSE TRUE
2 TRUE  TRUE
If you want a single column with the combined validity then you can pass the result through reduce:
> df %>% imap( ~ .x %in% allowed_values[[.y]]) %>%
+ reduce(`&`)
[1] FALSE TRUE
This could then be added as a new column to the original data, or just used for subsetting the data. I am not expert enough with the tidyverse yet to know if this could be combined with mutate to add the columns directly.
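One possible way to recombine the per-column checks with the original data is sketched below. The `_valid` suffix and the names `flags` and `result` are illustrative assumptions, not part of the question; treat this as one option rather than the recommended approach.

```r
library(dplyr)
library(purrr)

df <- tibble(age = c(12, 45), sex = c("f", "f"))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

# Per-column validity flags; imap passes the column name as .y
flags <- imap_dfc(df, ~ .x %in% allowed_values[[.y]])

# Rename the flags so they do not clash with the original columns,
# then bind them back onto the data (hypothetical "_valid" suffix)
result <- bind_cols(df, rename_with(flags, ~ paste0(.x, "_valid")))

# Single combined flag per row: TRUE only if every column is valid
result$all_valid <- reduce(flags, `&`)

result
```

The combined flag can then be used for subsetting, for example `filter(result, all_valid)`.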
Answer 2:
The answer is yes, you can refer to column names in dplyr's across. You need to use cur_column(). Your original attempt was so close! Insert cur_column() into your solution where you want the column name:
library(tidyverse)
df <- tibble(age = c(12, 45), sex = c('f', 'f'))
allowed_values <- list(age = 18:100, sex = c("f", "m"))
df %>%
  mutate(across(c(age, sex),
                c(valid = ~ .x %in% allowed_values[[cur_column()]])
  )
  )
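As a rough sketch of what the cur_column() call produces and how it might be used downstream: because the function is supplied with the name `valid`, across() should name the new columns `age_valid` and `sex_valid` by default. The names `checked` and `valid_rows` are illustrative only.

```r
library(dplyr)

df <- tibble(age = c(12, 45), sex = c("f", "f"))
allowed_values <- list(age = 18:100, sex = c("f", "m"))

# cur_column() returns the name of the column across() is currently processing,
# so each column is checked against its own entry in allowed_values
checked <- df %>%
  mutate(across(c(age, sex),
                list(valid = ~ .x %in% allowed_values[[cur_column()]])))

# New columns are appended after the originals
names(checked)

# Keep only the rows where every checked column holds an allowed value
valid_rows <- checked %>% filter(age_valid, sex_valid)
```

Since the flags stay in the same data frame, this slots into a pipeline without a separate recombination step.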
Reference: https://dplyr.tidyverse.org/articles/colwise.html#current-column
Source: https://stackoverflow.com/questions/62155957/refering-to-column-names-inside-dplyrs-across