Is there a faster lm function

Posted by 折月煮酒 on 2019-11-27 17:22:40

Question


I would like to get the slope of a linear regression fit for each of 1M separate data sets (1M * 50 rows in a data.frame, or a 1M * 50 array). Currently I am using the lm() function, which takes a very long time (about 10 min).

Is there any faster function for linear regression?


Answer 1:


Yes, there are:

  • R itself has lm.fit() which is more bare-bones: no formula notation, much simpler result set

  • several of our Rcpp-related packages have fastLm() implementations: RcppArmadillo, RcppEigen, RcppGSL.

We have described fastLm() in a number of blog posts and presentations. If you want the fastest possible fit, do not use the formula interface: parsing the formula and preparing the model matrix takes more time than the actual regression.
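To illustrate the point about skipping the formula interface, here is a minimal sketch using base R's lm.fit(). The simulated data and the variable names (x, y, X, slope_lm, slope_fit) are made up for the example; the only assumption is a model with an intercept and one predictor.

```r
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)   # simulated data, for illustration only

# Formula interface: the formula is parsed and a model frame is built on every call.
slope_lm <- coef(lm(y ~ x))[["x"]]

# Matrix interface: build the design matrix (intercept column plus x) yourself
# and hand it straight to lm.fit().
X <- cbind(Intercept = 1, x = x)
slope_fit <- lm.fit(X, y)$coefficients[["x"]]
```

Both calls should give the same slope; the second one just skips the formula handling, which is where much of the time goes when the fit is repeated many times.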

That said, if you are regressing a single vector on a single vector, you can simplify this further, as no matrix package is needed: the slope is just cov(x, y) / var(x).
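For the single-predictor case, the slope has a closed form, so all data sets can be handled at once with row-wise sums. The sketch below assumes the data are stored as two matrices with one data set per row (a layout the question only hints at), and uses 1000 rows as a stand-in for the 1M in the question. All names are hypothetical.

```r
set.seed(42)
n_sets <- 1000   # stand-in for the 1M data sets in the question
n_obs  <- 50
x <- matrix(rnorm(n_sets * n_obs), n_sets, n_obs)
y <- 1 + 0.5 * x + matrix(rnorm(n_sets * n_obs), n_sets, n_obs)

# Centre each row; subtracting rowMeans() recycles down the columns,
# so every element in row i has the mean of row i removed.
xc <- x - rowMeans(x)
yc <- y - rowMeans(y)

# Slope per data set: sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)
slopes <- rowSums(xc * yc) / rowSums(xc^2)

# Spot check of the first data set against lm()
ref <- coef(lm(y[1, ] ~ x[1, ]))[[2]]
```

This avoids calling any fitting function per data set, which is usually what dominates the run time when there are many small regressions.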




Answer 2:


Since R 3.1.0 there is a .lm.fit() function. This function should be faster than lm() and lm.fit().

It is described, and its performance is compared with other lm functions, at https://rpubs.com/maechler/fast_lm.
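A minimal sketch of calling .lm.fit() directly, with simulated data and made-up variable names. It takes the design matrix and the response, like lm.fit(), and the coefficients are picked out by position here (column order of X) rather than by name.

```r
set.seed(7)
n <- 50
x <- rnorm(n)
y <- -1 + 4 * x + rnorm(n)   # simulated data, for illustration only
X <- cbind(1, x)             # intercept column plus predictor

# Second coefficient corresponds to the second column of X, i.e. the slope.
slope_dot <- as.numeric(.lm.fit(X, y)$coefficients[2])

# Same fit through lm.fit() for comparison.
slope_fit <- as.numeric(lm.fit(X, y)$coefficients[2])
```

The two slopes should agree; any speed difference is best checked on your own data, for example with system.time() over a loop of fits.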




Answer 3:


speedlm() from the speedglm package should do it, as it is designed to work on large data sets.




Answer 4:


lmfit() in the Rfast package is even faster than .lm.fit(). The only drawback is that it does not work when the design matrix does not have full rank.



Source: https://stackoverflow.com/questions/25416413/is-there-a-faster-lm-function
