Question
I found several questions on Stack Overflow regarding this topic (some of them without any answer), but so far nothing related to this error in the context of regression.
I'm running a probit model in R with (I'm guessing) too many fixed effects (year and place):
myprobit <- glm(factor(Y) ~ factor(T) + factor(X1) + factor(X2) + factor(X3) +
                  factor(YEAR) + factor(PLACE),
                family = binomial(link = "probit"),
                data = DT)
The PLACE variable has about 1,000 unique values and YEAR has 8. The dataset DT has 13,099,225 observations and 79 columns.
The error I got is:
Error: cannot allocate vector of size 59.3 Gb
Error during wrapup: long vectors not supported yet: ../include/Rinlinedfuns.h:519
The machine I'm using has 128 GB of RAM.
So I don't know what I can do without changing the function. Does anyone know how to deal with this issue? Thanks!
Answer 1:
In order to close this question, I have to mention that @Axeman's answer is the only feasible approach for my problem. The whole issue is that there is not enough memory to manage such a huge design matrix.
Therefore, running the probit regression with the biglm package's bigglm() function is the only solution I have found so far.
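For reference, a minimal sketch of what that call looks like (the chunksize value is only an illustration, and I keep the response as a numeric 0/1 column because I am not sure bigglm() coerces a factor response the way glm() does):

library(biglm)

# Same specification as the glm() call above, but fitted chunk by chunk
# so the full design matrix is never materialised in memory at once.
myprobit <- bigglm(
  Y ~ factor(T) + factor(X1) + factor(X2) + factor(X3) +
    factor(YEAR) + factor(PLACE),   # Y assumed to be coded as numeric 0/1
  family = binomial(link = "probit"),
  data = DT,
  chunksize = 100000                # rows per chunk; tune to available RAM
)
summary(myprobit)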
Nevertheless, I realize that because of how the biglm package works, iterating over chunks of the data, using factor() variables on the RHS is problematic whenever a factor level is not represented in a chunk. In other words, if a factor variable has 5 levels but only 4 of them appear in a given data chunk, I get an error during estimation.
There are several questions and comments about this on Stack Overflow.
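A sketch of one possible workaround (my own, assuming DT is a plain data.frame): convert the categorical columns to factors once, over the full data, and drop the factor() calls from the formula. Subsetting a factor keeps the complete level set, so every chunk should then produce a design matrix with the same columns:

library(biglm)

# Hypothetical workaround: fix the factor levels on the full data first,
# so each chunk seen by bigglm() carries all levels even when some of
# them are not observed in that chunk.
fvars <- c("T", "X1", "X2", "X3", "YEAR", "PLACE")
for (v in fvars) DT[[v]] <- factor(DT[[v]])

myprobit <- bigglm(
  Y ~ T + X1 + X2 + X3 + YEAR + PLACE,   # no factor() calls in the formula;
  family = binomial(link = "probit"),    # Y assumed coded as numeric 0/1
  data = DT,
  chunksize = 100000
)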
Source: https://stackoverflow.com/questions/60419930/error-during-wrapup-long-vectors-not-supported-yet-in-glm-function