Using clustered covariance matrix in predict.lm()

心在旅途 2021-02-09 00:04

I am analyzing a dataset in which data is clustered in several groups (towns in regions). The dataset looks like:

R> df <- data.frame(x = rnorm(10), 
             


        
2 Answers
  • 2021-02-09 00:32

    I modified the code from the other answer slightly to be more consistent with predict(). It drops the response from the model terms, so you are not expected to supply values for the outcome in the newdata data frame.

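    predict.rob <- function(x, clcov, newdata) {
      # Default to the data the model was fitted on
      if (missing(newdata)) newdata <- x$model
      # Drop the response so newdata only needs the predictors
      Terms <- delete.response(terms(x))
      m.mat <- model.matrix(Terms, data = newdata)
      fit <- as.vector(m.mat %*% coef(x))
      # Standard errors from the supplied (e.g. clustered) covariance matrix
      se.fit <- sqrt(diag(m.mat %*% clcov %*% t(m.mat)))
      list(fit = fit, se.fit = se.fit)
    }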
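
As a sanity check on the function above, one option is to pass it the model's ordinary covariance matrix and compare the result with predict(). The sketch below does that on made-up data (the data frame `d`, the seed, and the new `x` values are all illustrative assumptions, not the asker's dataset). Note that `nd` has no outcome column.

```r
# Compact copy of predict.rob() so this block runs on its own
predict.rob <- function(x, clcov, newdata) {
  if (missing(newdata)) newdata <- x$model
  m.mat <- model.matrix(delete.response(terms(x)), data = newdata)
  list(fit = as.vector(m.mat %*% coef(x)),
       se.fit = sqrt(diag(m.mat %*% clcov %*% t(m.mat))))
}

# Illustrative toy data (assumption, not from the question)
set.seed(7)
d <- data.frame(x = rnorm(30))
d$y <- 2 - d$x + rnorm(30)
fit <- lm(y ~ x, data = d)

nd <- data.frame(x = c(-2, 0, 2))          # predictors only, no y column
mine <- predict.rob(fit, vcov(fit), nd)    # ordinary vcov instead of a clustered one
ref <- predict(fit, newdata = nd, se.fit = TRUE)

# With the ordinary vcov, both approaches should agree up to rounding
agree <- isTRUE(all.equal(mine$fit, unname(ref$fit))) &&
  isTRUE(all.equal(unname(mine$se.fit), unname(ref$se.fit)))
```

If `agree` is TRUE, swapping `vcov(fit)` for a clustered covariance matrix changes only the standard errors, not the fitted values.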
    
  • 2021-02-09 00:49

    The se.fit in predict() is not calculated from the vcov matrix, but from the QR decomposition and the residual variance. The same goes for vcov(): it takes the unscaled covariance matrix from summary.lm() and multiplies it by the residual variance. The unscaled covariance matrix is, again, calculated from the QR decomposition.
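
The relationship described here can be checked numerically. The following is a small sketch on made-up data (the data frame `d` and the seed are assumptions for illustration). It assumes a full-rank model, so no column pivoting occurs in the QR decomposition.

```r
# Illustrative toy data (assumption, not from the question)
set.seed(1)
d <- data.frame(x = rnorm(20))
d$y <- 1 + 2 * d$x + rnorm(20)
fit <- lm(y ~ x, data = d)
s <- summary(fit)

# (X'X)^{-1} recovered from the R factor of the stored QR decomposition
xtx_inv <- chol2inv(qr.R(fit$qr))
dimnames(xtx_inv) <- dimnames(s$cov.unscaled)

# The unscaled covariance matrix matches the QR-based inverse ...
same_unscaled <- isTRUE(all.equal(xtx_inv, s$cov.unscaled))
# ... and vcov() is that matrix scaled by the residual variance
same_vcov <- isTRUE(all.equal(vcov(fit), s$sigma^2 * s$cov.unscaled))
```

Both flags should come out TRUE, which is why there is no slot in the fitted object where a clustered covariance matrix could simply be stored for predict() to pick up.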

    So I'm afraid the answer is "no, there is no other option than to write your own function". You can't really set the vcov matrix, as it is recalculated when needed. Yet, writing your own function is rather trivial.

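    predict.rob <- function(x, clcov, newdata) {
        # Default to the data the model was fitted on
        if (missing(newdata)) newdata <- x$model
        # x$terms includes the response, so newdata needs the outcome column as well
        m.mat <- model.matrix(x$terms, data = newdata)
        fit <- as.vector(m.mat %*% coef(x))
        # Standard errors from the supplied covariance matrix clcov
        se.fit <- sqrt(diag(m.mat %*% clcov %*% t(m.mat)))
        list(fit = fit, se.fit = se.fit)
    }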
    

    I avoided calling predict() inside the function so that it does not do unnecessary calculations. Using it wouldn't shorten the code much anyway.
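
For completeness, here is a sketch of how a clustered covariance matrix might be built and fed into the same calculation as predict.rob(). Everything about the data is an assumption (the question's data frame is truncated, so `d`, `region`, and the effect sizes are invented), and the estimator is the basic cluster-robust sandwich form without any small-sample correction. Packaged estimators such as `sandwich::vcovCL()` are an alternative and apply corrections that this sketch omits.

```r
# Hypothetical toy data: 40 towns in 5 regions (not the asker's dataset)
set.seed(42)
d <- data.frame(region = factor(rep(1:5, each = 8)), x = rnorm(40))
d$y <- 1 + 0.5 * d$x + rep(rnorm(5), each = 8) + rnorm(40)
fit <- lm(y ~ x, data = d)

# Basic cluster-robust sandwich estimator (no small-sample correction)
X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))                 # (X'X)^{-1}
scores <- rowsum(X * e, group = d$region)    # one row of X_g' e_g per region
clcov <- bread %*% crossprod(scores) %*% bread

# Fitted values and clustered standard errors at new x values
new_x <- data.frame(x = c(-1, 0, 1))
m.mat <- model.matrix(delete.response(terms(fit)), data = new_x)
fit_vals <- as.vector(m.mat %*% coef(fit))
# Row-wise form of sqrt(diag(M V M')); avoids building the full n x n matrix
se_cl <- sqrt(rowSums((m.mat %*% clcov) * m.mat))
```

The `rowSums()` form gives the same numbers as `sqrt(diag(m.mat %*% clcov %*% t(m.mat)))` but scales better when newdata has many rows.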


    On a side note, questions like this are better asked on stats.stackexchange.com.
