Passing column name as parameter to a function using dplyr

Posted by 天大地大妈咪最大 on 2021-02-08 02:12:06

Question


I have a data frame like the one below:

transid<-c(1,2,3,4,5,6,7,8)
accountid<-c(a,a,b,a,b,b,a,b)
month<-c(1,1,1,2,2,3,3,3)
amount<-c(10,20,30,40,50,60,70,80)
transactions<-data.frame(transid,accountid,month,amount)

I am trying to write a function that computes the total monthly amount for each accountid using dplyr verbs.

my_sum<-function(df,col1,col2,col3){
df %>% group_by_(col1,col2) %>%summarise_(total_sum = sum(col3))
}

my_sum(transactions, "accountid","month","amount")

I want to get a result like this:

accountid   month  total_sum
a            1       30
a            2       40
a            3       70
b            1       30
b            2       50
b            3       140

I am getting this error: Error in sum(col3) : invalid 'type' (character) of argument. How can I pass a column name as a parameter, without quotes, to the summarise function?
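The error can be reproduced without dplyr, which may help show where it comes from. This is a minimal sketch that assumes, as in the call above, that col3 holds the string "amount"; the variable name result is only for illustration.

```r
# Inside my_sum(), col3 is the character string "amount", not the column,
# so sum(col3) is evaluated as sum("amount") before dplyr ever sees it.
col3 <- "amount"

# Capture the error message instead of stopping the script.
result <- tryCatch(sum(col3), error = function(e) conditionMessage(e))
print(result)
```

In other words, the string has to be turned into a reference to the column before sum() is applied.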


Answer 1:


I would suggest the following solution:

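library(dplyr)

my_sum <- function(df, col_to_sum, ...) {

    # Capture the unquoted column names as quosures
    col_to_sum <- enquo(col_to_sum)
    group_vars <- quos(...)

    df %>%
        group_by(!!!group_vars) %>%                   # splice the grouping columns
        summarise(total_sum = sum(!!col_to_sum)) %>%  # unquote the column to sum
        ungroup()
}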

transactions %>% my_sum(amount, accountid, month)

Results

>> transactions %>% my_sum(amount, accountid, month)
# A tibble: 6 x 3
  accountid month total_sum
     <fctr> <dbl>     <dbl>
1         a     1        30
2         a     2        40
3         a     3        70
4         b     1        30
5         b     2        50
6         b     3       140
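As a possible alternative, newer versions of the tidyverse offer the embrace operator {{ }}, which captures and unquotes an argument in one step. The sketch below assumes rlang 0.4.0 or later and dplyr 1.0.0 or later (for the .groups argument); my_sum_embrace is a hypothetical name, and accountid is built with quoted strings rather than Hmisc::Cs.

```r
library(dplyr)

# Same data as in the question, with accountid as quoted strings.
transactions <- data.frame(
    transid   = 1:8,
    accountid = c("a", "a", "b", "a", "b", "b", "a", "b"),
    month     = c(1, 1, 1, 2, 2, 3, 3, 3),
    amount    = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# Hypothetical variant of my_sum(): the grouping columns are forwarded
# through ..., and {{ }} captures and unquotes col_to_sum in one step.
# .groups = "drop" returns an ungrouped tibble, so ungroup() is not needed.
my_sum_embrace <- function(df, col_to_sum, ...) {
    df %>%
        group_by(...) %>%
        summarise(total_sum = sum({{ col_to_sum }}), .groups = "drop")
}

res <- my_sum_embrace(transactions, amount, accountid, month)
print(res)
```

The call keeps the same argument order as my_sum(): the column to sum first, then any number of grouping columns.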

Data

In your original question you passed unquoted strings. I worked around that using the Hmisc::Cs function but, in principle, you should surround your strings with ""; unless, of course, you are referring to objects named a, b and so forth. It wasn't clear from the original question.

Used data:

transid <- c(1, 2, 3, 4, 5, 6, 7, 8)
accountid <- Hmisc::Cs(a, a, b, a, b, b, a, b)
month <- c(1, 1, 1, 2, 2, 3, 3, 3)
amount <- c(10, 20, 30, 40, 50, 60, 70, 80)
transactions <- data.frame(transid, accountid, month, amount)

Notes

  • If you look at the Capturing multiple variables section of the Programming with dplyr article, you will see that a very similar problem is solved with the quos() function. In effect, your task is a perfect example of how the quos() function should be used.

  • The ellipsis ... should then come at the end, as the assumption is that the function will be used to group data by multiple columns. Naturally, if desired, you could pass the columns one by one and enquo() every single column, but using ... is more natural and consistent with the recommended solution discussed in the article mentioned above. Please note that this approach changes the order of arguments in your function call, as ... should come at the end.

  • If you are using summarise() with a single grouping variable, you don't strictly have to ungroup() your data as I do in my example, because summarise() drops the last grouping level. For instance, the code:

    mtcars %>% group_by(am) %>% summarise(mean_disp = mean(disp)) %>% mutate(am = am + 1) 
    

    will work; whereas the code:

    mtcars %>% group_by(am)  %>% mutate(am = am + 1)
    

    will return the expected error:

    Error in mutate_impl(.data, dots) : Column am can't be modified because it's a grouping variable

    You should use ungroup() if you are going to mutate() your original data or do other operations that keep your grouping variable intact. Passing a grouped tibble on may later prove problematic, but I would say it's mostly a matter of taste and order in your dplyr workflow. If you and other users of the function will remember that the tibble may still carry a grouping variable, then there is no issue; personally, I tend to forget about that, so my preference is to ungroup() the data if I'm not interested in carrying the grouping variable.
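The point about grouping can be checked with group_vars(), which lists the grouping variables a tibble still carries. The sketch below is only an illustration on mtcars; the object names one_group and two_groups are made up for this example. With two grouping variables, as in my_sum(), summarise() removes only the last one, which is why ungroup() still does something there.

```r
library(dplyr)

# One grouping variable: summarise() drops it, so nothing is left to ungroup.
one_group <- mtcars %>%
    group_by(am) %>%
    summarise(mean_disp = mean(disp))
group_vars(one_group)   # character(0)

# Two grouping variables: only the last one (cyl) is dropped,
# so the result is still grouped by am.
two_groups <- mtcars %>%
    group_by(am, cyl) %>%
    summarise(mean_disp = mean(disp))
group_vars(two_groups)  # "am"

# ungroup() removes the remaining grouping.
group_vars(ungroup(two_groups))  # character(0)
```

Recent dplyr versions also print a message about the .groups argument in the second case, which is a reminder of the same behaviour.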




Answer 2:


You can pass quosure objects as arguments using quo() and then unquote them inside the function, where they are evaluated in the context of the data; in this example I use !!:

library(tidyverse)
my_sum <- function(df, col1, col2, col3) {
    df %>%
        group_by(!!col1, !!col2) %>%
        summarise(total_sum = sum(!!col3))
}

my_sum(transactions, quo(accountid),quo(month),quo(amount))
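If the column names have to stay as strings, as in the call from the question, one possible approach is to convert them to symbols with rlang::syms() for grouping and to use the .data pronoun for the column to sum. This is a sketch under those assumptions, not the method from either answer; my_sum_chr is a hypothetical name.

```r
library(dplyr)

# Same data as in the question, with accountid as quoted strings.
transactions <- data.frame(
    transid   = 1:8,
    accountid = c("a", "a", "b", "a", "b", "b", "a", "b"),
    month     = c(1, 1, 1, 2, 2, 3, 3, 3),
    amount    = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# Hypothetical string-based variant: col1, col2 and col3 are character strings.
my_sum_chr <- function(df, col1, col2, col3) {
    df %>%
        # Turn the strings into symbols and splice them into group_by()
        group_by(!!!rlang::syms(c(col1, col2))) %>%
        # Look the column up by name through the .data pronoun
        summarise(total_sum = sum(.data[[col3]])) %>%
        ungroup()
}

res <- my_sum_chr(transactions, "accountid", "month", "amount")
print(res)
```

This keeps the original calling convention, my_sum(transactions, "accountid", "month", "amount"), at the cost of losing unquoted column names.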


Source: https://stackoverflow.com/questions/47494975/passing-column-name-as-parameter-to-a-function-using-dplyr
