Understanding lm and environment

Question

I'm executing lm() with arguments formula, data, na.action, and weights. My weights are stored in a numeric variable.

When I specify formula as a character (i.e. formula = "Response~0+."), I get an error that weights is not of the proper length (even though it is).
When I specify formula without the quotes (i.e. formula = Response~0+.), the function works fine.

I stumbled upon this sentence in the lm() documentation:

All of weights, subset and offset are evaluated in the same way as variables in formula, that is first in data and then in the environment of formula.

This is difficult for me to interpret, but I sense that it contains the answer to my question.

Answer 1:

(This has nothing to do with the real problem you have [@DWin has addressed that, as have commenters on your question], but is by way of explanation of the part of the documentation you quote.)

The quoted help text means that the process used to find the variables/objects referenced in a model formula is also used to find the variables/objects supplied to the arguments weights, subset, etc.

R looks for the objects referenced in the formula, and by the arguments weights, subset, and offset, first in the data object and then in the environment of the formula (which is usually the global environment during interactive use).

The docs mention this explicitly because lm(), like many R functions that employ model-formula interfaces, uses the so-called standard non-standard evaluation. The upshot is that if one supplies weights = foo, R won't necessarily look for the object foo where the call was made when evaluating the argument. Instead, it will look for an object named foo in the object supplied to the data argument and, if it doesn't find it there, in the environment attached to the model formula, which, as mentioned, doesn't always have to be the global environment.
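As a rough illustration of that lookup order, here is a minimal sketch. The data and the names dat, y, x, w, w_local and make_fit are made up for the example and do not come from the question.

```r
# Hypothetical data; all names here are illustrative only.
set.seed(1)
dat <- data.frame(y = rnorm(10), x = rnorm(10), w = runif(10))

# A different 'w' in the calling environment. lm() does not use this one,
# because 'w' is found in 'data' first.
w <- rep(100, 10)
fit_data <- lm(y ~ x, data = dat, weights = w)

# Here the weights are not a column of 'dat', so they are looked up in the
# environment of the formula, which is the frame of make_fit().
make_fit <- function() {
  w_local <- rep(2, 10)
  f <- y ~ x            # environment(f) is this function's frame
  lm(f, data = dat, weights = w_local)
}
fit_env <- make_fit()
```

Comparing weights(fit_data) with dat$w should show that the column in dat was used rather than the w defined outside it, and fit_env fits even though w_local is not visible from where make_fit() is called.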

Answer 2:

When you supply an argument that is intended to be a formula, R expects a language object (a call, in the R sense). It does not expect a character string delimited by quotes. That is why you will see people build formula arguments with paste(.) and then finish them off by passing the string (more correctly, the character object) to as.formula(). What gets returned has been given a class of "formula" and a mode of "call":

> class( as.formula("Y ~ x") )
[1] "formula"
> mode( as.formula("Y ~ x") )
[1] "call"
> class( "Y ~ x")
[1] "character"
> mode( "Y ~ x")
[1] "character"
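Applied to the call in the question, the paste() plus as.formula() pattern might look like the sketch below. Only the formula string comes from the question; the data frame dat, its columns a and b, and the weights vector wts are assumptions made for the example.

```r
# Hypothetical data; only the formula string is taken from the question.
set.seed(1)
dat <- data.frame(Response = rnorm(10), a = rnorm(10), b = rnorm(10))
wts <- runif(10)

# Build the formula as text, then convert it before handing it to lm().
f_chr <- paste("Response", "~", "0 + .")  # a character object
f <- as.formula(f_chr)                    # now an object of class "formula"

# 'wts' is not a column of 'dat', so it is looked up in environment(f).
fit <- lm(f, data = dat, weights = wts)
```

By default, as.formula() attaches the environment it was called from to the result, so calling it yourself where the weights vector is defined should make the environment of the formula the one that contains wts.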

Source: https://stackoverflow.com/questions/6877534/understanding-lm-and-environment
