pass family= to step() via glm() programmatically

Posted by 让人想犯罪 on 2020-03-22 06:44:32

Question


I am trying to demonstrate via simulation the performance of different models and feature selection techniques, so I wish to pass various arguments to glm() programmatically.

Under ?glm we read (italics mine):

family: a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For glm.fit only the third option is supported. (See family for details of family functions.)

The problem is that when I then call step() on the resulting model, there seems to be a scoping problem and the family= parameter is no longer recognized.

Here is a minimal example:

getCoef <- function(formula, 
                family = c("gaussian", "binomial"),
                data){

  model_fam <- match.arg(family, c("gaussian", "binomial"))

  fit_null <- glm(update(formula,".~1"), 
                   family = model_fam, 
                   data = data)

  message("So far so good")

  fit_stepBIC <- step(fit_null, 
                      formula, 
                      direction="forward",
                      k = log(nrow(data)),
                      trace=0)

  message("Doesn't make it this far")

  fit_stepBIC$coefficients
}

# returns error 'model_fam' not found 
getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", data = iris)

Error message with traceback:

> getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", data = iris)
So far so good

 Error in stats::glm(formula = Petal.Length ~ Petal.Width + Species, family = model_fam,  : 
  object 'model_fam' not found 
9 stats::glm(formula = Petal.Length ~ Petal.Width + Species, family = model_fam, 
    data = data, method = "model.frame") 
8 eval(expr, envir, enclos) 
7 eval(fcall, env) 
6 model.frame.glm(fob, xlev = object$xlevels) 
5 model.frame(fob, xlev = object$xlevels) 
4 add1.glm(fit, scope$add, scale = scale, trace = trace, k = k, 
    ...) 
3 add1(fit, scope$add, scale = scale, trace = trace, k = k, ...) 
2 step(fit_null, formula, direction = "forward", k = log(nrow(data)), 
    trace = 0) 
1 getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", 
    data = iris) 

> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] rsconnect_0.4.1.11 tools_3.2.4       

What is the most natural way to pass this parameter so that it is recognized by step()? One possible workaround I'm aware of would be to call glm() with the explicit family name via if-then-else conditioned on model_fam.


Answer 1:


I think the following solution, based on eval, bquote and .() might solve your problem.

I also have R version 3.2.4 installed, and I got exactly the same error from your code. The solution below made it work on my machine.

getCoef <- function(formula, 
                family = c("gaussian", "binomial"),
                data){

    model_fam <- match.arg(family, c("gaussian", "binomial"))

    fit_null <- eval(bquote(
        glm(update(.(formula),".~1"), 
            family = .(model_fam), 
            data = .(data))))

    message("So far so good")

    fit_stepBIC <- step(fit_null, 
                        formula, 
                        direction="forward",
                        k = log(nrow(data)),
                        trace=0)

    message("Doesn't make it this far")

    fit_stepBIC$coefficients
}

# now runs without the 'model_fam' not found error
getCoef(formula = Petal.Length ~ Petal.Width + Species,
        family = "gaussian",
        data = iris)

So far so good
Doesn't make it this far
      (Intercept) Speciesversicolor  Speciesvirginica       Petal.Width 
         1.211397          1.697791          2.276693          1.018712   
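
To see why this helps, here is a minimal sketch of what bquote() and .() do to the call before it is evaluated. It uses the built-in iris data and, for brevity, leaves data as a plain symbol rather than splicing it in as the function above does. The point is only that the value of model_fam, not the name, ends up in the stored call, so nothing has to be looked up later.

```r
model_fam <- "gaussian"

# .() is replaced by the *value* of model_fam while the call is being built
quoted <- bquote(glm(Petal.Length ~ 1, family = .(model_fam), data = iris))
print(quoted)
# glm(Petal.Length ~ 1, family = "gaussian", data = iris)

fit <- eval(quoted)

# The call stored in the fit holds the literal string, not the symbol model_fam,
# so re-evaluating it elsewhere (as step() does) needs no lookup of model_fam
fit$call$family
```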



Answer 2:


The problem is that step() eventually calls model.frame(), and model.frame() re-evaluates the stored glm() call in a special environment, namely the environment in which the formula was defined. That will normally be the environment from which getCoef is called. In that environment model_fam doesn't exist, because it is defined inside getCoef. One way to fix it is adding

environment(formula) <- environment()

after

model_fam <- match.arg(family, c("gaussian", "binomial"))

or something to that effect.
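
Put together, the change might look like the sketch below. It is the function from the question with that one line added and the progress messages dropped; coef() is used in place of $coefficients, which is an editorial choice rather than part of the original answer. One assumption worth stating: because the formula now points at the function's environment, the returned fit keeps that environment (including data) alive.

```r
getCoef <- function(formula,
                    family = c("gaussian", "binomial"),
                    data) {

  model_fam <- match.arg(family, c("gaussian", "binomial"))

  # Make the formula look up names inside this function, so that the
  # re-evaluation done by step() can find both model_fam and data
  environment(formula) <- environment()

  fit_null <- glm(update(formula, ".~1"),
                  family = model_fam,
                  data = data)

  fit_stepBIC <- step(fit_null,
                      formula,
                      direction = "forward",
                      k = log(nrow(data)),
                      trace = 0)

  coef(fit_stepBIC)
}

cf <- getCoef(Petal.Length ~ Petal.Width + Species,
              family = "gaussian",
              data = iris)
print(cf)
```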



Source: https://stackoverflow.com/questions/36750217/pass-family-to-step-via-glm-programmatically
