Why do quosures work in group_by() but not filter()?

Posted by 我只是一个虾纸丫 on 2019-12-23 11:58:18

Question


I'm building a function that manipulates a data frame based on a string. Within the function, I build a column name from the string and use it to manipulate the data frame, something like this:

library(dplyr)

# tibble() replaces the deprecated data_frame()
orig_df <- tibble(
     id = 1:3
   , amt = c(100, 200, 300)
   , anyA = c(TRUE, FALSE, TRUE)
   , othercol = c(FALSE, FALSE, TRUE)
)


summarize_my_df_broken <- function(df, my_string) {

  my_column <- quo(paste0("any", my_string))

  df %>% 
    filter(!!my_column) %>% 
    group_by(othercol) %>% 
    summarize(
        n = n()
      , total = sum(amt)
    ) %>%
    # I need the original string as new column which is why I can't
    # pass in just the column name
    mutate(stringid = my_string)


}


summarize_my_df_works <- function(df, my_string) {

  my_column <- quo(paste0("any", my_string))

  df %>% 
    group_by(!!my_column, othercol) %>% 
    summarize(
        n = n()
      , total = sum(amt)
    )  %>%
    mutate(stringid = my_string)

}

# throws an error: 
# Argument 2 filter condition does not evaluate to a logical vector
summarize_my_df_broken(orig_df, "A")

# works just fine
summarize_my_df_works(orig_df, "A")

I understand what the problem is: unquoting the quosure as an argument to filter() in the broken version is not referencing the actual column anyA.

What I don't understand is why it works in group_by() but not in filter(). Why is there a difference?


Answer 1:


Right now you are making quosures of strings, not symbol names. That's not how they are supposed to be used. There's a big difference between quo("hello") and quo(hello). If you want to make a proper symbol name from a string, you need to use rlang::sym. So a quick fix would be:

summarize_my_df_broken <- function(df, my_string) {

  my_column <- rlang::sym(paste0("any", my_string))
  ...
}
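
For completeness, here is a sketch of what the full fixed function might look like. The name summarize_my_df_fixed is made up for illustration, and the data frame is rebuilt with tibble() so the snippet runs on its own; the rest follows the code from the question.

```r
library(dplyr)

orig_df <- tibble(
  id = 1:3,
  amt = c(100, 200, 300),
  anyA = c(TRUE, FALSE, TRUE),
  othercol = c(FALSE, FALSE, TRUE)
)

# Hypothetical name; the body is the question's function with sym() swapped in
summarize_my_df_fixed <- function(df, my_string) {
  # Build a symbol (not a string) so that !! inserts a column reference
  my_column <- rlang::sym(paste0("any", my_string))

  df %>%
    filter(!!my_column) %>%
    group_by(othercol) %>%
    summarize(n = n(), total = sum(amt)) %>%
    mutate(stringid = my_string)
}

result <- summarize_my_df_fixed(orig_df, "A")
print(result)
```

Only the rows where anyA is TRUE survive the filter, and the original string is still available for the stringid column.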

If you look more closely, I think you'll see that the group_by/summarize version isn't actually working the way you expect either (you just don't get an error message). These two do not produce the same results:

summarize_my_df_works(orig_df, "A")
#  `paste0("any", my_string)` othercol     n total
#                        <chr>    <lgl> <int> <dbl>
# 1                       anyA    FALSE     2   300
# 2                       anyA     TRUE     1   300

orig_df  %>% 
  group_by(anyA, othercol) %>% 
  summarize(
    n = n()
    , total = sum(amt)
  )  %>%
  mutate(stringid = "A")
#    anyA othercol     n total stringid
#   <lgl>    <lgl> <int> <dbl>    <chr>
# 1 FALSE    FALSE     1   200        A
# 2  TRUE    FALSE     1   100        A
# 3  TRUE     TRUE     1   300        A

Again the problem is using a string instead of a symbol.
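
To see why the two verbs react differently, it may help to evaluate the two objects directly. group_by() accepts arbitrary expressions and creates a grouping column from the result, so a constant string is simply recycled into a new column; filter() needs a logical vector and therefore complains. The snippet below is a minimal sketch using rlang::eval_tidy(); the small data frame and the variable names are made up for illustration.

```r
library(rlang)

my_string <- "A"

# A quosure wrapping the paste0() call versus a symbol built from its result
quo_of_call <- quo(paste0("any", my_string))
col_symbol  <- sym(paste0("any", my_string))

df <- data.frame(anyA = c(TRUE, FALSE, TRUE))

# The quosure evaluates to a plain string, regardless of the data
value_from_quo <- eval_tidy(quo_of_call, data = df)

# The symbol is looked up in the data and yields the column's contents
value_from_sym <- eval_tidy(col_symbol, data = df)

print(value_from_quo)
print(value_from_sym)
```

The first value is a character string, which group_by() happily turns into a constant column named after the expression; the second is the logical column that filter() actually needs.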




Answer 2:


You don't have any condition for filter() in your 'broken' function; you only specify the column name.

Beyond that, I'm not sure if you can insert quosures into larger expressions. For example, here you might try something like:

df %>% filter((!!my_column) == TRUE)

But I don't think that would work.
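
For what it's worth, a quick check suggests that unquoting inside a larger expression is fine as long as the unquoted object is a symbol rather than a quosured string. This is a minimal sketch with a made-up two-column data frame, assuming a reasonably recent dplyr and rlang:

```r
library(dplyr)

df <- tibble(anyA = c(TRUE, FALSE, TRUE), amt = c(100, 200, 300))

# A symbol built from the string, not quo("anyA")
my_column <- rlang::sym("anyA")

# !! splices the symbol in, so the condition reads as anyA == TRUE
kept <- df %>% filter((!!my_column) == TRUE)
print(kept)
```

The parentheses around !!my_column keep operator precedence unambiguous when the unquoted symbol is combined with ==.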

Instead, I would suggest using the conditional function filter_at() to target the appropriate column. In that case, you separate the quosure from the filter condition:

summarize_my_df_broken <- function(df, my_string) {

  my_column <- quo(paste0("any", my_string))

  df %>% 
    filter_at(vars(!!my_column), all_vars(. == TRUE)) %>% 
    group_by(othercol) %>% 
    summarize(
      n = n()
      , total = sum(amt)
    ) %>%
    mutate(stringid = my_string)

}
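
Another option that keeps the column name as a plain string is the .data pronoun, which dplyr makes available inside verbs such as filter(). The sketch below is an alternative rather than anything proposed in the thread; the function name summarize_with_pronoun is made up, and the data frame mirrors the one in the question.

```r
library(dplyr)

orig_df <- tibble(
  id = 1:3,
  amt = c(100, 200, 300),
  anyA = c(TRUE, FALSE, TRUE),
  othercol = c(FALSE, FALSE, TRUE)
)

# Hypothetical helper: looks the column up by name through .data
summarize_with_pronoun <- function(df, my_string) {
  col_name <- paste0("any", my_string)

  df %>%
    filter(.data[[col_name]]) %>%
    group_by(othercol) %>%
    summarize(n = n(), total = sum(amt)) %>%
    mutate(stringid = my_string)
}

pronoun_result <- summarize_with_pronoun(orig_df, "A")
print(pronoun_result)
```

Because the lookup happens by name, no quosure or symbol needs to be built at all.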



Source: https://stackoverflow.com/questions/46713002/why-do-quosures-work-in-group-by-but-not-filter
