Question
Here is my problem (with made-up data so that it is reproducible):
set.seed(42)
df<-data.frame("x"=rnorm(1000),"y"=rnorm(1000),"z"=rnorm(1000))
df2<-data.frame("x"=rnorm(100),"y"=rnorm(100),"z"=rnorm(100))
breaks<-c(-1000,-0.68,-0.01315,0.664,1000)
divider<-cut(df$x,breaks)
divider2<-cut(df2$x,breaks)
subDF<-by(df,INDICES=divider,data.frame)
subDF2<-by(df2,INDICES=divider2,data.frame)
reg<-lapply(subDF,lm,formula=x~.)
pre<-lapply(1:4,function(x){predict(reg[[x]],subDF2[[x]])})
lapply(1:4,function(x){summary(reg[[x]])$r.squared})
The above code works fine. What I am doing is the following: according to the values of x, I split df into 4 data frames and run a regression on each of them, in order to be able to predict values for another dataset. Splitting the data frame allows a better prediction, as the range of x has a great impact for the actual data.
What I am trying to do is add a weights argument to the regression, to give greater importance to the most recent data. My weights argument is weights<-0.999^seq(250,1,by=-1) if there are 250 observations. With a seed of 42 and the previous breaks, each of the 4 subsets has 250 rows.
When I try reg<-lapply(subDF,lm,formula=x~.,weights=0.999^seq(250,1,by=-1)), I get this error:
Error in eval(expr, envir, enclos) :
..2 used in an incorrect context, no ... to look in
Which is quite strange, as lapply has a ... argument, used here for the formula, but it doesn't accept the weights.
So I really don't know what to do to add those weights. What should I correct in my code, or should I (almost) entirely change it to be able to use the weights?
For the example, and in order to make it (perhaps) easier, I chose the breaks so that the 4 subsets have the same size, but ideally the answer would work even if the 4 subsets have different sizes (with breaks<-c(-1000,-0.75,0,0.75,1000), for instance).
This post on CrossValidated describes much the same problem, but without a working solution, so it didn't help me.
Answer 1:
I don't know why you got that error (I thought the ... argument was made for that). However, I found a slight workaround; is this in the direction of what you need? I created an anonymous function inside lapply, which calculates the weights (depending on the number of rows of each chunk) and returns a model.
reg2 <- lapply(subDF, function(chunk){
  # calculate weights (!dependent on data ordering)
  weights <- 0.999^seq(nrow(chunk),1,by=-1)
  # fit model
  fit <- lm(x~., data=chunk, weights=weights)
  return(fit)
})
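As a rough check that this workaround also copes with subsets of different sizes, here is a minimal sketch using the alternative breaks from the question. It assumes split() as a stand-in for by(), and the helper name fit_weighted is hypothetical; neither comes from the original code.

```r
set.seed(42)
df  <- data.frame(x = rnorm(1000), y = rnorm(1000), z = rnorm(1000))
df2 <- data.frame(x = rnorm(100),  y = rnorm(100),  z = rnorm(100))

# Breaks that give subsets of unequal size
breaks <- c(-1000, -0.75, 0, 0.75, 1000)

# split() is used here instead of by(); it returns a plain named list
subDF  <- split(df,  cut(df$x,  breaks))
subDF2 <- split(df2, cut(df2$x, breaks))

# Hypothetical helper: the weights are rebuilt from the row count of each
# chunk, so the later rows of a chunk get the larger weights
fit_weighted <- function(chunk) {
  w <- 0.999^seq(nrow(chunk), 1, by = -1)
  lm(x ~ ., data = chunk, weights = w)
}

reg2 <- lapply(subDF, fit_weighted)

# Predict each subset of df2 with the model fitted on the matching subset of df
pre2 <- Map(predict, reg2, subDF2)
```

Because the weights are derived from nrow(chunk), nothing in the call depends on every subset having 250 rows.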
Answer 2:
Unfortunately, you have experienced first-hand what is arguably the nastiest kind of error in R: the so-called non-standard evaluation (NSE) error.
After a bit of digging in the code I think I have found the culprit. Let's take things one by one:
First of all, let's have a look at the traceback():
weights <- 0.999^seq(250,1,by=-1)
lapply(subDF, lm, formula=x~., weights=weights)
Error in eval(expr, envir, enclos) :
..2 used in an incorrect context, no ... to look in
> traceback()
8: eval(expr, envir, enclos)
7: eval(extras, data, env)
6: model.frame.default(formula = ..1, data = X[[1L]], weights = ..2,
drop.unused.levels = TRUE)
5: stats::model.frame(formula = ..1, data = X[[1L]], weights = ..2,
drop.unused.levels = TRUE)
4: eval(expr, envir, enclos)
3: eval(mf, parent.frame())
2: FUN(X[[1L]], ...)
1: lapply(subDF, lm, formula = x ~ ., weights = weights)
It looks like the problem occurs inside model.frame.default. So, let's have a look at the source code. I will not post the whole source, but if you type model.frame.default in the console, you will see somewhere in the middle:
extras <- substitute(list(...))
extranames <- names(extras[-1L])
extras <- eval(extras, data, env)
The last line is where it fails. The first line is an instance of NSE and relies on substitute. substitute creates what is called an expression, i.e. something like an object to be evaluated later inside eval. As you can see in the eval call, extras will be looked up in data first and then, if not found there, in env. For the formula this is fine, because x~. only refers to the columns in data. weights, however, is not in data. Therefore, eval will look for it in env. But what is env?
Apparently, env is an environment and is created within model.frame.default in the line:
env <- environment(formula)
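To make the two-step lookup of eval(extras, data, env) more concrete, here is a small sketch that mimics it outside of model.frame.default. The function capture_extras and the objects d, e and k are hypothetical names chosen for illustration only.

```r
# Hypothetical stand-in for the three lines quoted from model.frame.default
capture_extras <- function(data, env, ...) {
  extras <- substitute(list(...))  # unevaluated call, here list(w = y * k)
  eval(extras, data, env)          # look in data first, then in env
}

d <- data.frame(y = 1:3)           # plays the role of data
e <- new.env()                     # plays the role of env
assign("k", 10, envir = e)         # k exists only in e, not in d

res <- capture_extras(d, e, w = y * k)
res$w                              # 10 20 30: y came from d, k came from e
```

In model.frame.default, the role of e is played by env, the environment attached to the formula.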
So, what does this mean? Let's see another example:
xtest <- function(x) {
  new_func <- function(x) {
    env <- environment(x)
    print(env)
  }
  new_func(x)
}
> xtest(x~z)
<environment: R_GlobalEnv>
In the function above I try to replicate in fewer lines what env will be in model.frame.default. As you can see, environment(formula) points to the global environment. So that is where eval tries to find ..2, i.e. the second argument passed in ... (i.e. weights), but as there is no ... in the global environment, you get an error. Hope it is clear now!
The best solution, and what I would do, is to use @Heroka's answer (or you could rewrite the whole of model.frame.default and lm from scratch without using NSE, but I think the first is more reasonable :) ).
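A plausible reason why the anonymous-function workaround avoids the problem is that a formula remembers the environment in which it was created. The sketch below only illustrates that property; make_formula and w_local are hypothetical names, and w_local is assumed not to exist where the snippet is run.

```r
here <- environment()              # the environment this snippet runs in

f_outer <- x ~ z                   # formula created at this level

make_formula <- function() {
  w_local <- c(1, 2, 3)            # hypothetical weights, local to the function
  x ~ z                            # formula created inside the function
}
f_inner <- make_formula()

# The outer formula points back to the environment it was written in
same_as_here <- identical(environment(f_outer), here)

# Only the formula created inside the function can see w_local
sees_w_outer <- exists("w_local", envir = environment(f_outer), inherits = FALSE)
sees_w_inner <- exists("w_local", envir = environment(f_inner), inherits = FALSE)
```

When lm(x~., data=chunk, weights=weights) is written inside a function, the formula and the weights variable live in the same environment, so the lookup in env can succeed.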
Source: https://stackoverflow.com/questions/33479862/use-a-weights-argument-in-a-list-of-lm-lapply-calls