Question
I'm doing some programming using dplyr, and am curious about how to pass an expression as an argument to mapply (specifically inside MoreArgs)?
Consider a simple function F that subsets a data.frame based on some ids and a time_range, then outputs a summary statistic based on some other column x.
require(dplyr)
require(data.table)  # provides %chin%, used inside F
F <- function(ids, time_range, df, date_column, x) {
  # Capture the column arguments unevaluated so they can be unquoted with !!
  date_column <- enquo(date_column)
  x <- enquo(x)
  df %>%
    filter(person_id %chin% ids) %>%
    filter(time_range[1] <= (!!date_column) & (!!date_column) <= time_range[2]) %>%
    summarise(newvar = sum(!!x))
}
We can make up some example data to which we can apply our function F.
person_ids <- lapply(1:2, function(i) sample(letters, size = 10))
time_ranges <- lapply(list(c("2014-01-01", "2014-12-31"),
                           c("2015-01-01", "2015-12-31")), as.Date)
require(data.table)
dt <- CJ(person_id = letters,
         date_col = seq.Date(from = as.Date('2014-01-01'), to = as.Date('2015-12-31'), by = '1 day'))
dt[, z := rnorm(nrow(dt))] # The variable we will later sum over, i.e. apply F to.
We can successfully apply our function to each of our inputs.
F(person_ids[[1]], time_ranges[[1]], dt, date_col, z)
F(person_ids[[2]], time_ranges[[2]], dt, date_col, z)
So, if I wanted, I could write a simple for-loop to solve my problem. But if we try to apply syntactic sugar and wrap everything within mapply, we get an error.
mapply(F, ids = person_ids, time_range = time_ranges, MoreArgs = list(df = dt, date_column = date_col, x = z))
# Error in mapply... object 'date_col' not found
Answer 1:
In mapply, MoreArgs is provided as a list, and R evaluates the list elements as soon as the list is built. Since date_col and z are columns of dt rather than variables in the calling environment, that lookup fails with the error you see. As suggested by @Gregor, you can quote those MoreArgs that you don't want evaluated immediately, which prevents the error and lets the function proceed. This can be done with base quote or dplyr quo:
mapply(F, person_ids, time_ranges, MoreArgs = list(dt, quote(date_col), quote(z)))
mapply(F, person_ids, time_ranges, MoreArgs = list(dt, quo(date_col), quo(z)))
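To see why quoting helps, here is a minimal base R sketch of the same mechanism, without dplyr. The names col_a and captured_name are made up for illustration: col_a is never defined as a variable, and captured_name is only a stand-in for F that reports the expression it receives, roughly what enquo would capture.

```r
# Base R only. `col_a` is a hypothetical column name that is never defined
# as a variable, and `captured_name()` is a hypothetical stand-in for F.

# list() evaluates its arguments immediately, so an undefined name fails here,
# before mapply ever gets to run.
eager <- tryCatch(list(col = col_a), error = function(e) conditionMessage(e))

# quote() stores the symbol itself instead of looking up its value.
deferred <- list(col = quote(col_a))

# Reports the expression it was handed, without evaluating it.
captured_name <- function(i, col) deparse(substitute(col))

# The quoted symbol reaches the function as an unevaluated expression.
res <- mapply(captured_name, 1:2, MoreArgs = deferred)

print(eager)                    # the "not found" error message, as a string
print(is.symbol(deferred$col))
print(res)
```

The error comes from building the list, not from mapply itself, which is why wrapping the names in quote (or quo) is enough.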
Another option is to use map2 from the purrr package, which is the tidyverse equivalent of mapply with two input vectors. tidyverse functions are set up to work with non-standard evaluation, which avoids the error you're getting with mapply without the need for quoting the arguments:
library(purrr)
map2(person_ids, time_ranges, F, dt, date_col, z)
[[1]]
    newvar
1 40.23419

[[2]]
    newvar
1 71.42327
More generally, you could use pmap, which iterates in parallel over any number of input vectors:
pmap(list(person_ids, time_ranges), F, dt, date_col, z)
Source: https://stackoverflow.com/questions/47022818/passing-an-expression-into-moreargs-of-mapply