Concatenate expressions to subset a dataframe

Submitted by 柔情痞子 on 2020-02-02 10:57:57

Question


I am attempting to create a function that will calculate the mean of a column in a subsetted dataframe. The trick here is that I always want to apply a couple of fixed subsetting conditions, and then have the option to pass more conditions to the function to subset the dataframe further.

Suppose my data look like this:

dat <- data.frame(var1 = rep(letters, 26), var2 = rep(letters, each = 26), var3 = runif(26^2))

head(dat)
  var1 var2      var3
1    a    a 0.7506109
2    b    a 0.7763748
3    c    a 0.6014976
4    d    a 0.6229010
5    e    a 0.5648263
6    f    a 0.5184999

I want to be able to perform the subset shown below, where the first condition is applied in every function call and the second can change from call to call. The second condition could also be on other variables (I'm using a single variable, var2, for parsimony, but the condition could involve multiple variables).

subset(dat, var1 %in% c('a', 'b', 'c') & var2 %in% c('a', 'b'))
   var1 var2      var3
1     a    a 0.7506109
2     b    a 0.7763748
3     c    a 0.6014976
27    a    b 0.7322357
28    b    b 0.4593551
29    c    b 0.2951004

My example function and function call would look something like:

getMean <- function(expr) {  
  return(with(subset(dat, var1 %in% c('a', 'b', 'c') eval(expr)), mean(var3)))  
}
getMean(expression(& var2 %in% c('a', 'b')))

An alternative call could look like:

getMean(expression(& var4 < 6 & var5 > 10))

Any help is much appreciated.


EDIT: With Wojciech Sobala's help, I came up with the following function, which gives me the option of passing in 0 or more conditions.

getMean <- function(expr = NULL) {
  sub <- if(is.null(expr)) { expression(var1 %in% c('a', 'b', 'c'))
  } else expression(var1 %in% c('a', 'b', 'c') & eval(expr))
  return(with(subset(dat, eval(sub)), mean(var3)))
}
getMean()
getMean(expression(var2 %in% c('a', 'b')))
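
Since the title asks about concatenating expressions, here is a minimal sketch of another way to join the fixed condition and an optional extra condition into a single call using bquote(), and then evaluate it against the dataframe. The helper name combineConds and its arguments are hypothetical and do not appear in the original post; set.seed(1) is added only to make the example reproducible.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
dat <- data.frame(var1 = rep(letters, 26),
                  var2 = rep(letters, each = 26),
                  var3 = runif(26^2))

# Hypothetical helper: join two unevaluated conditions with `&`.
# `fixed` and `extra` are language objects created with quote().
combineConds <- function(fixed, extra = NULL) {
  if (is.null(extra)) fixed else bquote(.(fixed) & .(extra))
}

fixed_cond <- quote(var1 %in% c('a', 'b', 'c'))
extra_cond <- quote(var2 %in% c('a', 'b'))

combined <- combineConds(fixed_cond, extra_cond)

# Evaluate the combined condition with the dataframe's columns in scope
keep <- eval(combined, dat)
sub_mean <- mean(dat$var3[keep])
```

Here combined is an ordinary unevaluated call, so it can be printed or inspected before it is evaluated, which may help when debugging the conditions passed in.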

Answer 1:


It can be simplified by giving expr a default value of TRUE, so that the extra condition becomes a no-op when nothing is passed.

getMean <- function(expr = TRUE) {
  return(with(subset(dat, var1 %in% c('a', 'b', 'c') & eval(expr)), mean(var3)))
}
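
As a quick check of the default, the sketch below calls the function with and without an extra condition. It rebuilds dat as in the question, with set.seed(1) added as an assumption for reproducibility; the variable names base_only and with_extra are illustrative only.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
dat <- data.frame(var1 = rep(letters, 26),
                  var2 = rep(letters, each = 26),
                  var3 = runif(26^2))

getMean <- function(expr = TRUE) {
  # `& eval(expr)` is `& TRUE` when no extra condition is supplied
  with(subset(dat, var1 %in% c('a', 'b', 'c') & eval(expr)), mean(var3))
}

# Only the fixed condition on var1 applies
base_only <- getMean()

# The fixed condition plus an extra condition on var2
with_extra <- getMean(expression(var2 %in% c('a', 'b')))
```

One caveat with this pattern: the name expr is looked up in the dataframe first, so it would be shadowed if dat had a column called expr.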



Answer 2:


This is how I would approach it. The function getMean makes use of R's handy default argument values:

getMean <- function(x, subset_var1, subset_var2 = unique(x$var2)) {
    # By default subset_var2 holds every value of var2, so the second test keeps all rows
    xs <- subset(x, var1 %in% subset_var1 & var2 %in% subset_var2)

    mean(xs$var3)
}

getMean(dat, c('a', 'b', 'c'))
[1] 0.4762141

getMean(dat, c('a', 'b', 'c'), c('a', 'b'))
[1] 0.3814149
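
The function above only supports membership tests on var2, whereas the question also asks for conditions on other variables. One possible variant, not taken from either answer, captures an arbitrary unquoted condition with substitute() and evaluates it against the dataframe. The name getMeanWhere and the argument cond are hypothetical, and set.seed(1) is an assumption added for reproducibility.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
dat <- data.frame(var1 = rep(letters, 26),
                  var2 = rep(letters, each = 26),
                  var3 = runif(26^2))

# Hypothetical variant: `cond` is written unquoted by the caller
getMeanWhere <- function(x, cond = TRUE) {
  cond_call <- substitute(cond)                 # capture the condition unevaluated
  extra <- eval(cond_call, x, parent.frame())   # evaluate it with x's columns in scope
  keep <- x$var1 %in% c('a', 'b', 'c') & extra
  mean(x$var3[keep & !is.na(keep)])             # drop rows where the condition is NA
}

m_fixed <- getMeanWhere(dat)
m_var2  <- getMeanWhere(dat, var2 %in% c('a', 'b'))
m_multi <- getMeanWhere(dat, var2 == 'a' & var3 > 0.5)
```

Because the condition is captured with substitute(), this style is convenient interactively but harder to call from other functions; passing expression() or quote() objects, as in the question's EDIT, is easier to program with.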


Source: https://stackoverflow.com/questions/5531238/concatenate-expressions-to-subset-a-dataframe
