Why does `fastLm()` return results when I run a regression with one observation?

深忆病人

In the following, why aren't the lm() and fastLm()

1 answer

遇见更好的自我

2021-01-25 05:35
Because fastLm doesn't worry about rank-deficiency; this is part of the price you pay for speed.

From ?fastLm:

... The reason that Armadillo can do something like lm.fit faster than the functions in the stats package is because Armadillo uses the Lapack version of the QR decomposition while the stats package uses a modified Linpack version. Hence Armadillo uses level-3 BLAS code whereas the stats package uses level-1 BLAS. However, Armadillo will either fail or, worse, produce completely incorrect answers on rank-deficient model matrices whereas the functions from the stats package will handle them properly due to the modified Linpack code.
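
To see why a rank-deficient fit can return numbers that look plausible yet carry no meaning, consider one observation and three coefficients: the system has infinitely many exact solutions. Here is a minimal base-R sketch; the values and the names `b1`, `b2` are made up for illustration, and nothing here is taken from the fastLm source or claims which solution fastLm picks:

```r
## One observation, three unknowns: X is 1 x 3, so X %*% b = y is underdetermined.
X <- matrix(c(1, -0.82, 1.51), nrow = 1)   # illustrative values only
y <- -0.63

## Solution 1: put everything on the intercept.
b1 <- c(y, 0, 0)

## Solution 2: the minimum-norm solution t(X) %*% solve(X %*% t(X)) %*% y.
b2 <- drop(t(X) %*% solve(X %*% t(X), y))

## Both reproduce y exactly, yet the coefficient vectors differ.
fit1 <- drop(X %*% b1)
fit2 <- drop(X %*% b2)
print(c(fit1 = fit1, fit2 = fit2))
print(rbind(b1, b2))
```

Any solver that does not check the rank can hand back one of these vectors without complaint, which is why the result should not be read as an estimate.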

Looking at the fastLm source code, the guts of the code are
```
 arma::colvec coef = arma::solve(X, y);
```
This does a QR decomposition. We can match the fastLm results with qr() from base R (here I am using only base R constructs rather than relying on data.table):
```
set.seed(1)
dd <- data.frame(y = rnorm(5), 
      x1 = rnorm(5), x2 = rnorm(5), my.key = 1:5)

X <- model.matrix(~1+x1+x2, data=subset(dd,my.key==1))
qr(X)  ## QR decomposition of the one-row model matrix (a second positional argument would be taken as tol)
## $qr
##   (Intercept)         x1       x2
## 1           1 -0.8204684 1.511781
```
You can look at the code of lm.fit() to see what R does about rank deficiency when fitting linear models; the underlying modified LINPACK routine it calls does QR with pivoting ...
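
As a rough illustration of what that pivoting buys you, the sketch below calls lm.fit() directly on a one-observation model matrix, using only base R. The data are regenerated so the block runs on its own, and the name `d1` is just a label chosen here:

```r
set.seed(1)
dd <- data.frame(y = rnorm(5), x1 = rnorm(5), x2 = rnorm(5), my.key = 1:5)
d1 <- subset(dd, my.key == 1)          # a single observation

X <- model.matrix(~ 1 + x1 + x2, data = d1)
fit <- lm.fit(X, d1$y)

fit$rank          # numerical rank detected by the pivoted QR
fit$coefficients  # coefficients that cannot be estimated are reported as NA
```

With one row, only the intercept is estimable, so lm.fit() reports a rank below the number of columns and marks the remaining coefficients as NA rather than returning arbitrary values.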

If you want to flag these situations, I think that Matrix::rankMatrix() will do the trick:
```
library(Matrix)
rankMatrix(X) < ncol(X)  ## TRUE
X1 <- model.matrix(~1+x1+x2, data=dd)
rankMatrix(X1) < ncol(X1)  ## FALSE
```
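
If you want the check to run automatically before every fast fit, one option is a small guard function. This is only a sketch: `check_full_rank()` is a hypothetical helper written for this answer, not part of Matrix or RcppArmadillo, and the data are regenerated so the block runs on its own.

```r
library(Matrix)

## Hypothetical helper: stop early when the model matrix is rank-deficient,
## so a solver without rank checks is never handed such a problem.
check_full_rank <- function(X) {
  r <- as.integer(rankMatrix(X))   # drop the attributes rankMatrix() attaches
  if (r < ncol(X)) {
    stop(sprintf("rank-deficient model matrix: rank %d < %d columns",
                 r, ncol(X)))
  }
  invisible(TRUE)
}

set.seed(1)
dd <- data.frame(y = rnorm(5), x1 = rnorm(5), x2 = rnorm(5), my.key = 1:5)
X_one <- model.matrix(~ 1 + x1 + x2, data = subset(dd, my.key == 1))
X_all <- model.matrix(~ 1 + x1 + x2, data = dd)

## Full data: the check passes silently.
ok_all <- check_full_rank(X_all)

## One observation: the check raises an error, caught here as FALSE.
ok_one <- tryCatch(check_full_rank(X_one), error = function(e) FALSE)
```

You would call the guard on the model matrix first and only pass it on to fastLm() when it succeeds; the extra rank computation costs some of the speed you gained, so whether it is worth it depends on how often degenerate groups occur in your data.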