biglm predict unable to allocate a vector of size xx.x MB

Submitted by 断了今生、忘了曾经 on 2019-12-11 02:36:53

Question


I have this code:

library(biglm)
library(ff)

# training data as a file-backed ff data frame; test data as a regular data frame
myData <- read.csv.ffdf(file = "myFile.csv")
testData <- read.csv(file = "test.csv")
form <- dependent ~ .
model <- biglm(form, data = myData)
predictedData <- predict(model, newdata = testData)

The model is created without problems, but when I make the prediction it runs out of memory:

unable to allocate a vector of size xx.x MB

Any hints? Or how can I use ff to reserve memory for the predictedData variable?


Answer 1:


I have not used the biglm package before. Based on what you said, you ran out of memory when calling predict, and your new dataset has nearly 7,000,000 rows.

To resolve the memory issue, prediction must be done chunk-wise, for example by iteratively predicting 20,000 rows at a time. I am not sure whether predict.bigglm can do chunk-wise prediction.
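
A minimal sketch of manual chunk-wise prediction (the 20,000-row chunk size is arbitrary; it assumes model and testData from the question, and wraps predict in as.numeric() in case it returns a one-column matrix):

n <- nrow(testData)
chunkSize <- 20000
predictedData <- numeric(n)
for (s in seq(1, n, by = chunkSize)) {
  e <- min(s + chunkSize - 1, n)
  # predict only chunkSize rows at a time to bound peak memory use
  predictedData[s:e] <- as.numeric(predict(model, newdata = testData[s:e, , drop = FALSE]))
}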

Why not have a look at the mgcv package? bam can fit linear models, generalized linear models, generalized additive models, etc., to large data sets. Similar to biglm, it performs chunk-wise matrix factorization when fitting the model. But predict.bam supports chunk-wise prediction, which is really useful in your case. Furthermore, it does parallel model fitting and prediction, backed by the parallel package [use the cluster argument of bam(); see the examples under ?bam and ?predict.bam for parallel usage].

Just do library(mgcv) and check ?bam and ?predict.bam.


Remark

Do not use the nthreads argument for parallelism; it is not useful for parametric regression.
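
A rough sketch of that workflow (trainData and the predictors x1, x2 are placeholders; it assumes the training data fits in memory as a regular data frame, since bam takes a data frame rather than an ffdf):

library(mgcv)
library(parallel)

cl <- makeCluster(2)  # adjust to the number of cores you can spare
model <- bam(dependent ~ x1 + x2, data = trainData, cluster = cl)
# predict.bam processes newdata block.size rows at a time, optionally on the cluster
pred <- predict(model, newdata = testData, block.size = 50000, cluster = cl)
stopCluster(cl)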




Answer 2:


Here are the possible causes and solutions:

  1. Cause: You're using 32-bit R

    Solution: Use 64-bit R

  2. Cause: You're just plain out of RAM

    Solution: Allocate more RAM if you can (?memory.limit, on Windows). If you can't, consider using ff, working in chunks, running gc(), or, at worst, scaling up by leveraging the cloud. Chunking is often the key to success with big data: try doing the predictions 10% at a time, saving the results to disk after each chunk and removing the in-memory objects after use (see the sketch after this list).

  3. Cause: There's a bug in your code leaking memory

    Solution: Fix the bug. This doesn't look like your case; however, make sure your data is the size you expect, and keep an eye on your resource monitor to verify nothing funny is going on.
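
A hypothetical sketch of the chunking advice in point 2 (the file names and the ten-chunk split are made up; model and testData are from the question):

n <- nrow(testData)
bounds <- floor(seq(0, n, length.out = 11))  # boundaries of 10 roughly equal chunks
for (i in 1:10) {
  rows <- (bounds[i] + 1):bounds[i + 1]
  chunk <- predict(model, newdata = testData[rows, , drop = FALSE])
  saveRDS(chunk, sprintf("pred_chunk_%02d.rds", i))  # spill each result to disk
  rm(chunk)
  gc()  # free memory before the next chunk
}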




Answer 3:


I've tried biglm and mgcv, but memory and factor problems came quickly. I have had some success with the h2o library.
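
For example, a minimal sketch with h2o (the column name "dependent" and the file names are taken from the question; untested here):

library(h2o)
h2o.init()  # starts a local H2O cluster

# data is held by the H2O cluster, not in R's memory
train <- h2o.importFile("myFile.csv")
test <- h2o.importFile("test.csv")

model <- h2o.glm(y = "dependent",
                 x = setdiff(names(train), "dependent"),
                 training_frame = train)
pred <- h2o.predict(model, newdata = test)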



Source: https://stackoverflow.com/questions/38151057/biglm-predict-unable-to-allocate-a-vector-of-size-xx-x-mb
