I am working on a Windows 8 OS with a RAM of 8 GB . I have a data.frame of 1.8 million rows x 270 columns on which I have to perform a glm. (l
I have the impression you are not using ffbase::bigglm.ffdf but you want to. Namely the following will put all your data in RAM and will use biglm::bigglm.function, which is not what you want.
require(biglm)
mymodel <- bigglm(OPEN_FLG ~ new_list_id+NUM_OF_ADULTS_IN_HHLD+OCCUP_MIX_PCT, data = x)
You need to use ffbase::bigglm.ffdf, which works chunkwise on an ffdf. So load package ffbase which exports bigglm.ffdf. If you use ffbase, you can use the following:
require(ffbase)
mymodeldataset <- xex[c("OPEN_FLG","new_list_id","NUM_OF_ADULTS_IN_HHLD","OCCUP_MIX_PCT")]
mymodeldataset$OPEN_FLG <- with(mymodeldataset["OPEN_FLG"], ifelse(OPEN_FLG == "Y", TRUE, FALSE))
mymodel <- bigglm(OPEN_FLG ~ new_list_id+NUM_OF_ADULTS_IN_HHLD+OCCUP_MIX_PCT, data = mymodeldataset, family=binomial())
Explanation: Because you don't limit yourself to the columns you use in the model, you will get all your columns of your xex ffdf in RAM which is not needed. You were using a gaussian model on a factor response, bizarre? I believe you were trying to do a logistic regression, so use the appropriate family argument? And it will use ffbase::bigglm.ffdf and not biglm::bigglm.function.
If that does not work - which I doubt, it is because you have other things in RAM which you are not aware of. In that case do.
require(ffbase)
mymodeldataset <- xex[c("OPEN_FLG","new_list_id","NUM_OF_ADULTS_IN_HHLD","OCCUP_MIX_PCT")]
mymodeldataset$OPEN_FLG <- with(mymodeldataset["OPEN_FLG"], ifelse(OPEN_FLG == "Y", TRUE, FALSE))
ffsave(mymodeldataset, file = "mymodeldataset")
## Open R again
require(ffbase)
require(biglm)
ffload("mymodeldataset")
mymodel <- bigglm(OPEN_FLG ~ new_list_id+NUM_OF_ADULTS_IN_HHLD+OCCUP_MIX_PCT, data = mymodeldataset, family=binomial())
And off you go.