问题
I am puzzled at a behaviour of a function that I am trying to write. My example comes from the survival
package but I think that the question is more general than that. Basically, the following code
library(survival)
data(bladder) ## this will load "bladder", "bladder1" and "bladder2"
mod_init <- coxph(Surv(start, stop, event) ~ rx + number, data = bladder2, method = "breslow")
survfit(mod_init)
Will yield an object that I am interested in. However, when I write it in a function,
my_function <- function(formula, data) {
mod_init <- coxph(formula = formula, data = data, method = "breslow")
survfit(mod_init)
}
my_function(Surv(start, stop, event) ~ rx + number, data = bladder2)
the function will return an error at the last line:
Error in eval(predvars, data, env) :
invalid 'envir' argument of type 'closure'
10 eval(predvars, data, env)
9 model.frame.default(formula = Surv(start, stop, event) ~ rx +
number, data = data)
8 stats::model.frame(formula = Surv(start, stop, event) ~ rx +
number, data = data)
7 eval(expr, envir, enclos)
6 eval(temp, environment(formula$terms), parent.frame())
5 model.frame.coxph(object)
4 stats::model.frame(object)
3 survfit.coxph(mod_init)
2 survfit(mod_init)
1 my_function(Surv(start, stop, event) ~ rx + number, data = bladder2)
I am curious whether there is something obvious that I am missing or whether such behaviour is normal. I find it strange, since in the environment of my_function
I would have the same objects as in the global environment when running the first portion of the code.
Edit: I also received useful input from Terry Therneau, the author of the survival
package. This is his answer:
This is a problem that stems from the non-standard evaluation done by model.frame. The only way out of it that I have found is to add model.frame=TRUE to the original coxph call. I consider it a serious design flaw in R. Non-standard evaluation is like the dark side -- a tempting and easy path that always ends badly. Terry T.
回答1:
Diagnose
From the error message:
2 survfit(mod_init, newdata = base_case)
1 my_function(Surv(start, stop, event) ~ rx + number, data = bladder2)
the problem is clearly not with coxph
during model fitting, but with survfit
.
And from this message:
10 eval(predvars, data, env)
9 model.frame.default(formula = Surv(start, stop, event) ~ rx +
number, data = data)
I can tell that the problem is that during early stage of survfit
, the function model.frame.default()
can not find a model frame containing relevant data used in formula Surv(start, stop, event) ~ rx + number
. Hence it complains.
What is a model frame?
A model frame, is formed from the data
argument passed to fitting routine, like lm()
, glm()
and mgcv:::gam()
. It is a data frame with the same number of rows as data
, but:
- dropping all variables not referenced by
formula
- adding many attributes, the most important of which is
envrionement
Most model fitting routines, like lm()
, glm()
, and mgcv:::gam()
, will keep the model frame in their fitted object by default. This has advantage that if we later call predict
, and no newdata
is provided, it will find data from this model frame for evaluation. However, a clear disadvantage is that it will substantially increase the size of your fitted object.
However, survival:::coxph()
is an exception. It will by default not retain such model frame in their fitted object. Well, clearly, this makes the resulting fitted object much smaller in size, but, expose you to the problem you have encountered. If we want to ask survival:::coxph()
to keep this model frame, then use model = TRUE
of this function.
Test with survial:::coxph()
library(survival); data(bladder)
my_function <- function(myformula, mydata, keep.mf = TRUE) {
fit <- coxph(myformula, mydata, method = "breslow", model = keep.mf)
survfit(fit)
}
Now, this function call will fail, as you have seen:
my_function(Surv(start, stop, event) ~ rx + number, bladder2, keep.mf = FALSE)
but this function call will succeed:
my_function(Surv(start, stop, event) ~ rx + number, bladder2, keep.mf = TRUE)
Same behaviour for lm()
We can actually demonstrate the same behaviour in lm()
:
## generate some toy data
foo <- data.frame(x = seq(0, 1, length = 20), y = seq(0, 1, length = 20) + rnorm(20, 0, 0.15))
## a wrapper function
bar <- function(myformula, mydata, keep.mf = TRUE) {
fit <- lm(myformula, mydata, model = keep.mf)
predict.lm(fit)
}
Now this will succeed, by keeping model frame:
bar(y ~ x - 1, foo, keep.mf = TRUE)
while this will fail, by discarding model frame:
bar(y ~ x - 1, foo, keep.mf = FALSE)
Using argument newdata
?
Note that my example for lm()
is slightly artificial, because we can actually use newdata
argument in predict.lm()
to get through this problem:
bar1 <- function(myformula, mydata, keep.mf = TRUE) {
fit <- lm(myformula, mydata, model = keep.mf)
predict.lm(fit, newdata = lapply(mydata, mean))
}
Now whether we keep model frame, both will succeed:
bar1(y ~ x - 1, foo, keep.mf = TRUE)
bar1(y ~ x - 1, foo, keep.mf = FALSE)
Then you may wonder: can we do the same for survfit()
?
survfit()
is a generic function, in your code, you are really calling survfit.coxph()
. There is indeed a newdata
argument for this function. The documentation reads:
newdata:
a data frame with the same variable names as those that appear in the ‘coxph’ formula. ... ... Default is the mean of the covariates used in the ‘coxph’ fit.
So, let's try:
my_function1 <- function(myformula, mydata) {
mtrace.off()
fit <- coxph(myformula, mydata, method = "breslow")
survival:::survfit.coxph(fit, newdata = lapply(mydata, mean))
}
and we hope this work:
my_function1(Surv(start, stop, event) ~ rx + number, bladder2)
But:
Error in is.data.frame(data) (from #5) : object 'mydata' not found
1: my_function1(Surv(start, stop, event) ~ rx + number, bladder2)
2: #5: survival:::survfit.coxph(fit, lapply(mydata, mean))
3: stats::model.frame(object)
4: model.frame.coxph(object)
5: eval(temp, environment(formula$terms), parent.frame())
6: eval(expr, envir, enclos)
7: stats::model.frame(formula = Surv(start, stop, event) ~ rx + number, data =
8: model.frame.default(formula = Surv(start, stop, event) ~ rx + number, data
9: is.data.frame(data)
Note that although we pass in newdata
, it is not used in construction of model frame:
3: stats::model.frame(object)
Only object
, a copy of fitted model, is passed to model.frame.default()
.
This is very different from what happens in predict.lm()
, predict.glm()
and mgcv:::predict.gam()
. In these routines, newdata
is passed to model.frame.default()
. For example, in lm()
, there is:
m <- model.frame(Terms, newdata, na.action = na.action, xlev = object$xlevels)
I don't use survival
package, so not sure how newdata
works in this package. So I think we really need some expert explaining this.
回答2:
I think it might be that if your
Surv(start, stop, event) ~ rx + number
is in as a parameter, it does not get properly created. Try put
is.Surv(formula)
as your first line in the function. I suspect it wont work, then I would suggest using apply family of functions.
来源:https://stackoverflow.com/questions/37364571/r-model-frame-and-non-standard-evaluation