How to extract fitted splines from a GAM (`mgcv::gam`)

礼貌的吻别 2020-12-02 10:33

I am using a GAM to model time trends in a logistic regression. I would like to extract the fitted spline from it and add it to another model, that cannot be fitted in

1 Answer
  • 2020-12-02 11:14

    In mgcv::gam there is a way to do this (your Q2), via the predict.gam method and type = "lpmatrix".

    ?predict.gam even has an example, which I reproduce below:

     library(mgcv)
     n <- 200
     sig <- 2
     dat <- gamSim(1,n=n,scale=sig)
    
     b <- gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + offset(x3), data = dat)
    
     newd <- data.frame(x0=(0:30)/30, x1=(0:30)/30, x2=(0:30)/30, x3=(0:30)/30)
    
     Xp <- predict(b, newd, type="lpmatrix")
    
     ##################################################################
     ## The following shows how to use an "lpmatrix" as a lookup 
     ## table for approximate prediction. The idea is to create 
     ## approximate prediction matrix rows by appropriate linear 
     ## interpolation of an existing prediction matrix. The additivity 
     ## of a GAM makes this possible. 
     ## There is no reason to ever do this in R, but the following 
     ## code provides a useful template for predicting from a fitted 
     ## gam *outside* R: all that is needed is the coefficient vector 
     ## and the prediction matrix. Use larger `Xp'/ smaller `dx' and/or 
     ## higher order interpolation for higher accuracy.  
     ###################################################################
    
     xn <- c(.341,.122,.476,.981) ## want prediction at these values
     x0 <- 1         ## intercept column
     dx <- 1/30      ## covariate spacing in `newd'
     for (j in 0:2) { ## loop through smooth terms
       cols <- 1+j*9 +1:9      ## relevant cols of Xp
       i <- floor(xn[j+1]*30)  ## find relevant rows of Xp
       w1 <- (xn[j+1]-i*dx)/dx ## interpolation weights
       ## find approx. predict matrix row portion, by interpolation
       x0 <- c(x0,Xp[i+2,cols]*w1 + Xp[i+1,cols]*(1-w1))
     }
     dim(x0)<-c(1,28) 
     fv <- x0%*%coef(b) + xn[4];fv    ## evaluate and add offset
     se <- sqrt(x0%*%b$Vp%*%t(x0));se ## get standard error
     ## compare to normal prediction
     predict(b,newdata=data.frame(x0=xn[1],x1=xn[2],
             x2=xn[3],x3=xn[4]),se=TRUE)
    

    That goes through the entire process, including the prediction step, which would be done outside R or outside the GAM model. You will have to modify the example a bit to do what you want, because it evaluates all terms in the model and you have two other terms besides the spline. Essentially you do the same thing, but only for the spline term, which involves finding the relevant columns and rows of the Xp matrix for the spline. Note also that the spline is centred, so you may or may not want to undo that too.
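
    As a minimal sketch of restricting the lpmatrix approach to a single spline: the model below (two parametric terms plus one smooth of x2, fitted to gamSim data) is a hypothetical stand-in for yours, and the names m, newd, idx and spline_fit are made up for illustration. It relies on the first.para and last.para entries that mgcv stores for each smooth to locate the relevant columns of Xp.

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 200, scale = 2, verbose = FALSE)

# Hypothetical model: two parametric terms plus one spline
m <- gam(y ~ x0 + x1 + s(x2), data = dat)

# Values at which to evaluate the spline; other covariates held at their means
newd <- data.frame(x0 = mean(dat$x0), x1 = mean(dat$x1),
                   x2 = seq(0, 1, length.out = 31))
Xp <- predict(m, newd, type = "lpmatrix")

# Columns of Xp (and elements of coef(m)) that belong to the first smooth
sm  <- m$smooth[[1]]
idx <- sm$first.para:sm$last.para

# Centred spline contribution on the linear predictor scale
spline_fit <- drop(Xp[, idx] %*% coef(m)[idx])

# Cross-check against the "terms" prediction for the same smooth
chk <- predict(m, newd, type = "terms")[, "s(x2)"]
all.equal(unname(spline_fit), unname(chk))
```

    Outside R, only Xp[, idx] and coef(m)[idx] would be needed to reproduce spline_fit, which is the same idea as the lookup-table example above applied to one term.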

    For your Q1, choose appropriate values for the xn vector/matrix in the example. The nth element of xn is the covariate value for the nth term in the model. So set the ones you want held fixed to some mean value and vary the one associated with the spline.

    If you are doing all of this in R, it would be easier to just evaluate the spline at the values of the spline covariate in the data that is going into the other model. You do that by creating a data frame of values at which to predict, then use

    predict(mod, newdata = newdat, type = "terms")
    

    where mod is the fitted GAM model (via mgcv::gam) and newdat is the data frame containing a column for each variable in the model (including the parametric terms; set the terms you don't want to vary to some constant value [say the average of the variable in the data set], or to a certain level if a factor). With type = "terms", predict returns a matrix with one row for each row in newdat and one column for each term in the model, holding the "contribution" of that term to the fitted value, including the spline term. Just take the column of this matrix that corresponds to the spline. Again, it is centred.
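
    A small sketch of that workflow for a logistic GAM, using simulated stand-in data (the variables time, z and y and the object time_spline are assumptions for illustration, not taken from the question):

```r
library(mgcv)

set.seed(2)
n <- 300
# Simulated stand-in data: binary outcome, one parametric covariate, a time trend
d <- data.frame(time = runif(n, 0, 10), z = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + sin(d$time) + 0.3 * d$z))

mod <- gam(y ~ z + s(time), family = binomial, data = d, method = "REML")

# Observed time values; the parametric covariate is held at its mean
newdat <- data.frame(time = d$time, z = mean(d$z))

tt <- predict(mod, newdata = newdat, type = "terms")
colnames(tt)                    # one column per model term
time_spline <- tt[, "s(time)"]  # centred spline, on the logit (link) scale

# The intercept is not included in any column; it is stored as an attribute
attr(tt, "constant")
```

    Because the spline is centred, time_spline averages to roughly zero over the data used to fit mod. Add attr(tt, "constant") back if the other model needs the uncentred curve.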

    Perhaps I misunderstood your Q1. If you want to control the knots, see the knots argument to mgcv::gam. By default, mgcv::gam places a knot at each extreme of the data and then spreads the remaining "knots" evenly over the interval. mgcv::gam doesn't find the knots - it places them for you, and you can control where it places them via the knots argument.
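
    As a hedged illustration of the knots argument, the sketch below supplies knot locations for a cubic regression spline (bs = "cr"), for which the number of supplied knots has to equal k. The data, the knot vector kn and the choice of basis are assumptions made for the example; other bases treat supplied knots differently.

```r
library(mgcv)

set.seed(3)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

# Hypothetical knot locations; for bs = "cr" their number must equal k
kn <- c(0, 0.1, 0.25, 0.5, 0.75, 0.9, 1)

m_kn <- gam(y ~ s(x, bs = "cr", k = length(kn)), knots = list(x = kn))

# For a "cr" smooth the knot locations used are stored in the xp component
m_kn$smooth[[1]]$xp
```

    The list passed to knots is named by covariate, so a model with several smooths can have knots supplied for some terms and left to the default placement for others.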
