Question
I have a data frame like the one below:
transid<-c(1,2,3,4,5,6,7,8)
accountid<-c(a,a,b,a,b,b,a,b)
month<-c(1,1,1,2,2,3,3,3)
amount<-c(10,20,30,40,50,60,70,80)
transactions<-data.frame(transid,accountid,month,amount)
I am trying to write a function that computes the total monthly amount for each accountid using dplyr verbs.
my_sum<-function(df,col1,col2,col3){
df %>% group_by_(col1,col2) %>%summarise_(total_sum = sum(col3))
}
my_sum(transactions, "accountid","month","amount")
I want to get a result like the one below:
accountid  month  total_sum
a              1         30
a              2         40
a              3         70
b              1         30
b              2         50
b              3        140
I am getting this error: Error in sum(col3) : invalid 'type' (character) of argument. How can I pass a column name as a parameter without quotes to the summarise function?
Answer 1:
I would suggest the following solution:
library(dplyr)
my_sum <- function(df, col_to_sum, ...) {
  # Capture the bare column to sum and the bare grouping columns
  col_to_sum <- enquo(col_to_sum)
  group_cols <- quos(...)
  df %>%
    group_by(!!!group_cols) %>%
    summarise(total_sum = sum(!!col_to_sum)) %>%
    ungroup()
}
transactions %>% my_sum(amount, accountid, month)
Results
> transactions %>% my_sum(amount, accountid, month)
# A tibble: 6 x 3
  accountid month total_sum
  <fctr>    <dbl>     <dbl>
1 a             1        30
2 a             2        40
3 a             3        70
4 b             1        30
5 b             2        50
6 b             3       140
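The call in the question passed the column names as strings. For readers who want to keep that interface, here is a minimal sketch rather than part of the original answer. It assumes dplyr 1.0 or later (for across(), all_of() and the .groups argument); my_sum_chr and its parameter names are hypothetical, and the data frame is rebuilt with quoted account ids so the block runs on its own.

```r
library(dplyr)

# Same data as in the question, with the account ids quoted
transactions <- data.frame(
  transid   = 1:8,
  accountid = c("a", "a", "b", "a", "b", "b", "a", "b"),
  month     = c(1, 1, 1, 2, 2, 3, 3, 3),
  amount    = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# Hypothetical string-based variant: every column is given by name
my_sum_chr <- function(df, group_cols, sum_col) {
  df %>%
    # all_of() selects the grouping columns named in the character vector
    group_by(across(all_of(group_cols))) %>%
    # .data[[...]] looks the column up by its string name
    summarise(total_sum = sum(.data[[sum_col]]), .groups = "drop")
}

result_chr <- my_sum_chr(transactions, c("accountid", "month"), "amount")
print(result_chr)
```

Passing the grouping columns as one character vector keeps the number of grouping columns flexible, in the same spirit as the ... argument above.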
Data
In your original question you passed unquoted strings. I worked around that using the Hmisc::Cs function but, in principle, you should surround your strings with "" unless, of course, you are referring to objects named a, b and so forth. That wasn't clear from the original question.
Used data:
transid <- c(1, 2, 3, 4, 5, 6, 7, 8)
accountid <- Hmisc::Cs(a, a, b, a, b, b, a, b)
month <- c(1, 1, 1, 2, 2, 3, 3, 3)
amount <- c(10, 20, 30, 40, 50, 60, 70, 80)
transactions <- data.frame(transid, accountid, month, amount)
Notes
If you look at the "Capturing multiple variables" section of the "Programming with dplyr" article, you will see that a very similar problem is solved with the quos() function. In effect, your task is a perfect example of how quos() should be used.
The ellipsis ... should come at the end, on the assumption that the function will be used to group data by multiple columns. Naturally, if desired, you could pass the columns one by one, enquo() every single column and so forth, but using ... is more natural and consistent with the recommended solution discussed in that article. Please note that this approach changes the order of arguments in your function call, as ... should come at the end.
If you are using summarise() on data grouped by a single variable, you don't have to ungroup() your data as I do in my example. For instance, the code:
mtcars %>% group_by(am) %>% summarise(mean_disp = mean(disp)) %>% mutate(am = am + 1)
will work, whereas the code:
mtcars %>% group_by(am) %>% mutate(am = am + 1)
will return the expected error:
Error in mutate_impl(.data, dots) : Column `am` can't be modified because it's a grouping variable
You should use ungroup() if you are going to mutate() your original data or do other operations that keep your grouping variable intact. Passing a grouped variable may later prove problematic; I would say it's mostly a matter of taste/order in your dplyr workflow. If you and the other users of the function are going to remember that the tibble may be carrying a grouping variable, then there is no issue; personally, I tend to forget about that, so my preference is to ungroup() the data if I'm not interested in carrying the grouping variable.
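One detail that may be worth checking when there is more than one grouping column: summarise() drops only the last grouping level, so without ungroup() a result grouped by two columns is still grouped by the first one. The following is a minimal sketch, not taken from the answer, that assumes dplyr is installed and uses group_vars() to inspect the remaining grouping on the built-in mtcars data.

```r
library(dplyr)

# Group by two columns, then summarise without ungrouping
by_two <- mtcars %>%
  group_by(am, cyl) %>%
  summarise(mean_disp = mean(disp))

# Grouping variables left over after summarise()
print(group_vars(by_two))

# Grouping variables left over after an explicit ungroup()
print(group_vars(ungroup(by_two)))
```

Recent dplyr versions also print a message about the regrouped output here, which is harmless. This is one reason the explicit ungroup() in my_sum can matter for a two-column grouping such as accountid and month.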
Answer 2:
You can pass quosures as arguments using quo() and then evaluate them lazily with an unquoting operator; in this example I use !!:
library(tidyverse)
my_sum <- function(df, col1, col2, col3) {
  df %>%
    group_by(!!col1, !!col2) %>%
    summarise(total_sum = sum(!!col3))
}
my_sum(transactions, quo(accountid),quo(month),quo(amount))
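If wrapping every argument in quo() at the call site feels heavy, the quoting can be moved inside the function with enquo(), one call per column. This is a minimal sketch under that assumption, not the answerer's code; my_sum_bare is a hypothetical name, and the data frame is rebuilt with quoted account ids so the block runs on its own.

```r
library(dplyr)

# Same data as in the question, with the account ids quoted
transactions <- data.frame(
  transid   = 1:8,
  accountid = c("a", "a", "b", "a", "b", "b", "a", "b"),
  month     = c(1, 1, 1, 2, 2, 3, 3, 3),
  amount    = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# Hypothetical variant: enquo() captures each bare column name
my_sum_bare <- function(df, col1, col2, col3) {
  col1 <- enquo(col1)
  col2 <- enquo(col2)
  col3 <- enquo(col3)
  df %>%
    group_by(!!col1, !!col2) %>%
    summarise(total_sum = sum(!!col3)) %>%
    ungroup()
}

# Callers pass bare column names, no quo() needed
result_bare <- my_sum_bare(transactions, accountid, month, amount)
print(result_bare)
```

The trade-off is a fixed number of grouping columns, whereas the quos(...) approach accepts any number.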
Source: https://stackoverflow.com/questions/47494975/passing-column-name-as-parameter-to-a-function-using-dplyr