Is there a faster lm function?

闹比i 2020-12-01 17:50

I would like to get the slope of a linear regression fit for 1M separate data sets (1M * 50 rows as a data.frame, or 1M * 50 as an array). I am currently using lm()

4 Answers
  • 2020-12-01 18:33

    Since R 3.1.0 there is a .lm.fit() function. It should be faster than both lm() and lm.fit().

    It is described, and benchmarked against the other lm functions, here: https://rpubs.com/maechler/fast_lm
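
    As a rough illustration of the call (a minimal sketch on simulated data; the variable names are made up for this example), .lm.fit() takes a numeric design matrix instead of a formula, so the intercept column has to be added by hand:

```r
set.seed(1)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)   # simulated data, true slope 3

# No formula interface: build the design matrix yourself,
# with a column of ones for the intercept.
X <- cbind(1, x)
fit <- .lm.fit(X, y)

# Coefficients come back as a plain vector: intercept first, slope second.
slope_fast <- fit$coefficients[2]

# Same fit through lm(), for comparison only.
slope_lm <- unname(coef(lm(y ~ x))[2])
```

    Both slopes should agree up to floating-point error; the gain comes from skipping the formula parsing and the construction of the full lm object.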

  • 2020-12-01 18:35

    speedlm() from the speedglm package should do it, as it is meant for large data sets.

  • 2020-12-01 18:39

    Yes there are:

    • R itself has lm.fit(), which is more bare-bones: no formula notation and a much simpler result set.

    • Several of our Rcpp-related packages have fastLm() implementations: RcppArmadillo, RcppEigen, RcppGSL.

    We have described fastLm() in a number of blog posts and presentations. If you want the fastest approach, do not use the formula interface: parsing the formula and preparing the model matrix takes more time than the actual regression.
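
    To make the "skip the formula interface" point concrete, here is a hedged sketch using base R's lm.fit(). It assumes, which the question does not state, that every data set shares the same predictor x; in that case lm.fit() accepts a matrix of responses and fits all data sets in one call. The sizes and names are illustrative only.

```r
set.seed(42)
n_sets <- 1000   # illustrative; the question has 1M data sets
n_obs  <- 50

# Assumption: one shared predictor for all data sets.
x <- seq_len(n_obs)

# One data set per row; row i is simulated with true slope i / n_sets.
Y <- outer(seq_len(n_sets) / n_sets, x) +
  matrix(rnorm(n_sets * n_obs), nrow = n_sets)

# Build the design matrix once, by hand, instead of via a formula.
X <- cbind(1, x)

# lm.fit() wants one response per column, hence the transpose.
fit <- lm.fit(X, t(Y))

# Coefficients form a 2 x n_sets matrix: row 1 intercepts, row 2 slopes.
slopes <- fit$coefficients[2, ]
```

    If each data set has its own x, this shortcut does not apply and the fit has to be done per data set.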

    That said, if you are regressing a single vector on a single vector, you can simplify this further, as no matrix package is needed.
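
    One way to read that last remark (my own sketch, not necessarily what the answer had in mind): for a single predictor with an intercept, the least-squares slope has the closed form sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2), and this can be vectorised over all data sets at once. The layout below, with one data set per row of the matrices X and Y, is an assumption for illustration.

```r
set.seed(7)
n_sets <- 1000   # illustrative; the question has 1M data sets
n_obs  <- 50

# Assumed layout: row i of X and row i of Y hold data set i.
X <- matrix(rnorm(n_sets * n_obs), nrow = n_sets)
Y <- 1 + 2 * X + matrix(rnorm(n_sets * n_obs), nrow = n_sets)

# Centre each row. The vector of row means is recycled down the columns,
# so element [i, j] has the mean of row i subtracted.
Xc <- X - rowMeans(X)
Yc <- Y - rowMeans(Y)

# Closed-form simple-regression slope, one value per data set.
slopes <- rowSums(Xc * Yc) / rowSums(Xc^2)
```

    This avoids any per-data-set function call, so it should scale to 1M rows as long as the two matrices fit in memory.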

  • 2020-12-01 18:48

    lmfit() from the Rfast package is even faster than .lm.fit(). The only drawback is that it does not work when the design matrix is not of full rank.
