Why does `fastLm()` return results when I run a regression with one observation?

深忆病人 2021-01-25 04:55

Why does fastLm() return results when I run regressions with one observation?

In the following, why aren't the lm() and fastLm()

1 answer
  • 2021-01-25 05:35

    Because fastLm doesn't worry about rank-deficiency; this is part of the price you pay for speed.

    From ?fastLm:

    ... The reason that Armadillo can do something like lm.fit faster than the functions in the stats package is because Armadillo uses the Lapack version of the QR decomposition while the stats package uses a modified Linpack version. Hence Armadillo uses level-3 BLAS code whereas the stats package uses level-1 BLAS. However, Armadillo will either fail or, worse, produce completely incorrect answers on rank-deficient model matrices whereas the functions from the stats package will handle them properly due to the modified Linpack code.

    Looking at the fastLm source, the core of the computation is

     arma::colvec coef = arma::solve(X, y);
    

    This does a QR decomposition. We can match the fastLm results with qr() from base R (here I am using only base R constructs rather than relying on data.table):

    set.seed(1)
    dd <- data.frame(y = rnorm(5), 
          x1 = rnorm(5), x2 = rnorm(5), my.key = 1:5)
    
    X <- model.matrix(~1+x1+x2, data=subset(dd,my.key==1))
    qr(X)
    ## $qr
    ##   (Intercept)         x1       x2
    ## 1           1 -0.8204684 1.511781
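
To see why a solver can return numbers here at all: with one row and three columns, the system X b = y has infinitely many exact solutions, so a routine that does not check rank can simply hand back one of them. The sketch below uses base R only. It does not reproduce what arma::solve() returns; the minimum-norm formula is shown as one plausible candidate, not as a confirmed description of Armadillo's behaviour, and the names b_basic and b_minnorm are made up for illustration.

```r
set.seed(1)
dd <- data.frame(y = rnorm(5), x1 = rnorm(5), x2 = rnorm(5), my.key = 1:5)
X <- model.matrix(~ 1 + x1 + x2, data = subset(dd, my.key == 1))
y <- dd$y[1]

# Solution 1: put everything on the intercept
# (analogous to lm() keeping only the intercept and dropping x1, x2).
b_basic <- c(y, 0, 0)

# Solution 2: the minimum-norm solution X' (X X')^{-1} y.
b_minnorm <- unname(drop(t(X) %*% solve(X %*% t(X), y)))

# Both coefficient vectors reproduce the single response exactly,
# so the residuals printed here are zero up to rounding.
drop(X %*% b_basic) - y
drop(X %*% b_minnorm) - y
```

Neither vector is "the" answer; the data simply cannot distinguish between them, which is why the NA coefficients from lm() are the more honest output.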
    

    You can look at the code of lm.fit() to see what R does about rank deficiency when fitting linear models; the underlying modified LINPACK routine it calls does QR with pivoting ...
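
As a rough illustration of that pivoting, the sketch below calls lm.fit() directly on a deliberately rank-deficient matrix and inspects the pieces it returns. The data are made up (x2 is an exact copy of x1); only documented components of the lm.fit() result are used.

```r
# Two identical columns make the 5 x 3 model matrix rank-deficient.
set.seed(1)
x1 <- rnorm(5)
X <- cbind(`(Intercept)` = 1, x1 = x1, x2 = x1)
y <- rnorm(5)

fit <- lm.fit(X, y)

fit$rank          # numerical rank found by the pivoted QR (2 here, not 3)
fit$qr$pivot      # column order used; deficient columns are moved to the end
fit$coefficients  # the aliased column x2 is reported as NA
```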

    If you want to flag these situations, I think that Matrix::rankMatrix() will do the trick:

    library(Matrix)
    rankMatrix(X) < ncol(X)  ## TRUE
    X1 <- model.matrix(~1+x1+x2, data=dd)
    rankMatrix(X1) < ncol(X1)  ## FALSE
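
One way to use that check in practice is to wrap it in a small guard that runs before the fast fit. This is only a sketch: has_full_rank() is a hypothetical helper name, not part of Matrix or RcppArmadillo, and the reaction to a failed check (here just a message) is a placeholder.

```r
library(Matrix)

# Hypothetical helper: TRUE when the model matrix has full column rank.
# as.integer() drops the attributes that rankMatrix() attaches to its result.
has_full_rank <- function(X) {
  as.integer(rankMatrix(X)) == ncol(X)
}

set.seed(1)
dd <- data.frame(y = rnorm(5), x1 = rnorm(5), x2 = rnorm(5), my.key = 1:5)
X  <- model.matrix(~ 1 + x1 + x2, data = subset(dd, my.key == 1))
X1 <- model.matrix(~ 1 + x1 + x2, data = dd)

has_full_rank(X)   # FALSE: one row cannot support three columns
has_full_rank(X1)  # TRUE

# Skip (or fall back to lm()) when the rank check fails.
if (!has_full_rank(X)) {
  message("Model matrix is rank-deficient; skipping the fast fit.")
}
```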
    