Use a weights argument in a list of lm lapply calls [duplicate]

Submitted by 一曲冷凌霜 on 2019-12-08 16:43:40

Question


Here is my problem (fictional data in order to be reproducible) :

set.seed(42)
df<-data.frame("x"=rnorm(1000),"y"=rnorm(1000),"z"=rnorm(1000))
df2<-data.frame("x"=rnorm(100),"y"=rnorm(100),"z"=rnorm(100))
breaks<-c(-1000,-0.68,-0.01315,0.664,1000)
divider<-cut(df$x,breaks)
divider2<-cut(df2$x,breaks)
subDF<-by(df,INDICES=divider,data.frame)
subDF2<-by(df2,INDICES=divider2,data.frame)
reg<-lapply(subDF,lm,formula=x~.)
pre<-lapply(1:4,function(x){predict(reg[[x]],subDF2[[x]])})
lapply(1:4,function(x){summary(reg[[x]])$r.squared})

The above code works fine. What I am doing is the following: according to the values of x, I split df into 4 data frames and run a regression on each of them, in order to predict values for another dataset. Splitting the data frame allows a better prediction, because the range of x has a great impact in the actual data.

What I am trying to do is add a weights argument to the regression, to give greater importance to the most recent data. My weights argument is weights<-0.999^seq(250,1,by=-1) if there are 250 rows. With a seed of 42 and the breaks above, all 4 subsets have 250 rows.

When I try reg<-lapply(subDF,lm,formula=x~.,weights=0.999^seq(250,1,by=-1)), I get this error:

Error in eval(expr, envir, enclos) : 
  ..2 used in an incorrect context, no ... to look in

This is quite strange, as lapply has a ... argument, used here for the formula, but it doesn't accept the weights.

So I really don't know how to add those weights. What should I correct in my code, or should I (almost) entirely change it to be able to use the weights?

For the example, and to make it (perhaps) easier, I chose the breaks so that the 4 subsets have the same size, but ideally the answer would work even if the 4 subsets are not the same size (with breaks<-c(-1000,-0.75,0,0.75,1000), for instance).

This post on CrossValidated has quite the same problem, but without a working solution so that didn't help me.


Answer 1:


I don't know why you got that error (I thought the ... argument was made for exactly that). However, I found a small workaround; is this in the direction of what you need? I created an anonymous function inside lapply, which calculates the weights (depending on the number of rows in each subset) and returns the fitted model.

reg2 <- lapply(subDF, function(chunk){
  #calculate weights (!dependent on data ordering)
  weights <- 0.999^seq(nrow(chunk),1,by=-1)

  #fit model
  fit <- lm(x~., data=chunk, weights=weights)
  return(fit)
})
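As a usage sketch, the same idea can be run on the unequal breaks mentioned in the question and then used for prediction. This is only an illustration under a few assumptions: split() is used in place of by() for brevity, and fit_weighted, reg2 and pre2 are names made up for this example.

```r
set.seed(42)
df  <- data.frame(x = rnorm(1000), y = rnorm(1000), z = rnorm(1000))
df2 <- data.frame(x = rnorm(100),  y = rnorm(100),  z = rnorm(100))

# Unequal breaks, so the four subsets have different numbers of rows
breaks <- c(-1000, -0.75, 0, 0.75, 1000)
subDF  <- split(df,  cut(df$x,  breaks))   # assumption: split() instead of by()
subDF2 <- split(df2, cut(df2$x, breaks))

# Hypothetical helper: weights are rebuilt per subset from its own row count,
# so the last (most recent) row always gets the largest weight
fit_weighted <- function(chunk) {
  w <- 0.999^seq(nrow(chunk), 1, by = -1)
  lm(x ~ ., data = chunk, weights = w)
}

reg2 <- lapply(subDF, fit_weighted)

# Predict each subset of df2 with the model fitted on the matching subset of df
pre2 <- Map(predict, reg2, subDF2)

# Each model carries as many weights as its subset has rows
sapply(reg2, function(fit) length(weights(fit)))
```

Because the weights are computed from nrow(chunk) inside the function, nothing has to be passed through lapply's ... at all.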



Answer 2:


Unfortunately, you have experienced first hand what is arguably the nastiest kind of error in R: one caused by non-standard evaluation (NSE).

After a bit of digging in the code, I think I have found the culprit. Let's take things one at a time.

First of all, let's have a look at the traceback():

weights <- 0.999^seq(250,1,by=-1)

lapply(subDF, lm, formula=x~., weights=weights)
Error in eval(expr, envir, enclos) : 
  ..2 used in an incorrect context, no ... to look in
> traceback()
8: eval(expr, envir, enclos)
7: eval(extras, data, env)
6: model.frame.default(formula = ..1, data = X[[1L]], weights = ..2, 
       drop.unused.levels = TRUE)
5: stats::model.frame(formula = ..1, data = X[[1L]], weights = ..2, 
       drop.unused.levels = TRUE)
4: eval(expr, envir, enclos)
3: eval(mf, parent.frame())
2: FUN(X[[1L]], ...)
1: lapply(subDF, lm, formula = x ~ ., weights = weights)

It looks like the problem occurs inside model.frame.default, so let's have a look at its source code. I will not post the whole thing, but if you type model.frame.default in the console, you will see somewhere in the middle:

extras <- substitute(list(...))
extranames <- names(extras[-1L])
extras <- eval(extras, data, env)

The last line is where it fails. The first line is the NSE part: substitute captures the arguments passed through ... as an unevaluated expression, i.e. something to be evaluated later by eval, instead of their values. As the eval call shows, extras is evaluated in data first and, for anything not found there, in env. For the formula this is fine, because x~. only refers to columns of data. weights, though, is not in data, so eval looks for it in env. But what is env?
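The capture-then-evaluate pattern described above can be sketched in isolation. This is a toy illustration, not the model.frame.default code itself; capture_dots, dat and env are hypothetical names.

```r
# Hypothetical helper: return the arguments in ... as an unevaluated call
capture_dots <- function(...) substitute(list(...))

dat <- data.frame(wt = 1:3)

# The arguments are captured, not evaluated, so wt and k need not exist yet
expr <- capture_dots(w = wt * k)
print(expr)

# An environment that holds k but not wt
env <- new.env()
assign("k", 10, envir = env)

# eval() looks names up in dat first (wt), then in env (k)
res <- eval(expr, dat, env)
print(res$w)
```

Any name that is found neither in the data nor in the fallback environment makes eval fail, which is the situation weights ends up in here.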

env is an environment, created within model.frame.default in the line:

env <- environment(formula)

So, what does this mean? Let's see another example:

xtest <- function(x) {
  new_func <- function(x) {
    env <- environment(x)
    print(env)
  }
  new_func(x)
} 

> xtest(x~z)
<environment: R_GlobalEnv>

In the function above I try to replicate, in fewer lines, what env will be in model.frame.default. As you can see, environment(formula) points to the global environment, because that is where the formula was written.

So that is where eval tries to find ..2, i.e. the second argument passed through ... (weights). As there is no ... in the global environment, you get the error. I hope it is clear now!

The best solution, and what I would do, is to use @Heroka's answer (or you could rewrite model.frame.default and lm from scratch without NSE, but I think the first option is more reasonable :) ).
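Why the anonymous-function approach avoids the problem can be sketched with the same environment(formula) check. This is a minimal illustration with hypothetical names (make_formula, w): a formula written inside a function keeps that function's frame as its environment, so a weights variable defined next to it is visible when the lookup falls back to env.

```r
# Hypothetical helper: the formula is created inside the function body,
# so its environment is the function's own frame
make_formula <- function() {
  w <- c(0.5, 1)   # stands in for a locally computed weights vector
  x ~ y
}

f <- make_formula()

# The local variable w lives in the formula's environment
vars_in_env <- ls(environment(f))
print(vars_in_env)

# A lookup that falls back to this environment can therefore find w
print(get("w", envir = environment(f)))
```

In the question's original call the formula is written at top level instead, and the weights arrive only as ..2, which cannot be resolved there.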



Source: https://stackoverflow.com/questions/33479862/use-a-weights-argument-in-a-list-of-lm-lapply-calls
