Why does fastLm() return results when I run regressions with one observation?
In the following, why aren't the lm() and fastLm() results equal?
Because fastLm() doesn't worry about rank-deficiency; this is part of the price you pay for speed.
From ?fastLm:
... The reason that Armadillo can do something like lm.fit faster than the functions in the stats package is because Armadillo uses the Lapack version of the QR decomposition while the stats package uses a modified Linpack version. Hence Armadillo uses level-3 BLAS code whereas the stats package uses level-1 BLAS. However, Armadillo will either fail or, worse, produce completely incorrect answers on rank-deficient model matrices whereas the functions from the stats package will handle them properly due to the modified Linpack code.
Looking at the fastLm source code, the core of the computation is
arma::colvec coef = arma::solve(X, y);
This does a QR decomposition. We can match the fastLm() results with qr() from base R (here I am using only base R constructs rather than relying on data.table):
set.seed(1)
dd <- data.frame(y = rnorm(5),
                 x1 = rnorm(5), x2 = rnorm(5), my.key = 1:5)
X <- model.matrix(~ 1 + x1 + x2, data = subset(dd, my.key == 1))
qr(X)  ## decompose the 1 x 3 model matrix (qr() takes no response argument)
## $qr
## (Intercept) x1 x2
## 1 1 -0.8204684 1.511781
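To see why a solver that ignores rank can still return numbers for this one-row matrix, note that with one equation and three unknowns the system X b = y has infinitely many exact solutions. The sketch below, in base R only, builds two of them. The names b_min and b_alt are illustrative, and it makes no claim about which solution Armadillo actually returns:

```r
set.seed(1)
dd <- data.frame(y = rnorm(5), x1 = rnorm(5), x2 = rnorm(5), my.key = 1:5)
one_obs <- subset(dd, my.key == 1)
X <- model.matrix(~ 1 + x1 + x2, data = one_obs)
y <- one_obs$y

# Minimum-norm solution of the underdetermined system: b = X' (X X')^{-1} y
b_min <- drop(t(X) %*% solve(X %*% t(X), y))

# Another exact solution: put the whole response on the intercept
b_alt <- c(y, 0, 0)

# Both residuals are (numerically) zero, yet the coefficient vectors differ
drop(X %*% b_min) - y
drop(X %*% b_alt) - y
rbind(b_min, b_alt)
```

Both vectors reproduce the single observation exactly, so neither one says anything meaningful about the slopes.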
You can look at the code of lm.fit() to see what R does about rank deficiency when fitting linear models; the underlying modified LINPACK routine it calls does QR with pivoting ...
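As a rough illustration of that behaviour, the sketch below calls lm.fit() on the same one-observation model matrix and inspects what the pivoted QR reports. The data are the simulated values used in this answer; the object names are only for illustration:

```r
set.seed(1)
dd <- data.frame(y = rnorm(5), x1 = rnorm(5), x2 = rnorm(5), my.key = 1:5)
one_obs <- subset(dd, my.key == 1)
X <- model.matrix(~ 1 + x1 + x2, data = one_obs)

fit <- lm.fit(X, one_obs$y)

fit$rank           # numerical rank found by the pivoted QR
fit$qr$pivot       # column order after pivoting
fit$coefficients   # coefficients that cannot be estimated are reported as NA
```

The detected rank is smaller than the number of columns, and the coefficients that cannot be identified come back as NA rather than as arbitrary numbers.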
If you want to flag these situations, I think that Matrix::rankMatrix() will do the trick:
library(Matrix)
rankMatrix(X) < ncol(X) ## TRUE
X1 <- model.matrix(~1+x1+x2, data=dd)
rankMatrix(X1) < ncol(X1) ## FALSE
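If the regressions are run per group, the same check can be applied to each subset before handing it to a fast solver. This is a minimal sketch under assumptions: the grouping column grp, the simulated data, and the helper is_rank_deficient() are all hypothetical names introduced here, not part of any package:

```r
library(Matrix)

set.seed(1)
# Hypothetical data: group "a" has a single row, group "b" has seven
dd <- data.frame(y = rnorm(8), x1 = rnorm(8), x2 = rnorm(8),
                 grp = c("a", rep("b", 7)))

# Hypothetical helper: TRUE when the subset's model matrix lacks full column rank
is_rank_deficient <- function(d, formula = ~ 1 + x1 + x2) {
  X <- model.matrix(formula, data = d)
  as.integer(rankMatrix(X)) < ncol(X)
}

# One logical flag per group
flags <- vapply(split(dd, dd$grp), is_rank_deficient, logical(1))
flags
```

Groups flagged TRUE could then be skipped or routed to lm(), while the remaining groups go to fastLm().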