Question
I'm trying to build a model with the glmnet package, but I get the error below when I run this line:
#library('glmnet')
x = model.matrix(response ~ ., data = acgh_frame[,c(3:ncol(acgh_frame))])
Error: protect(): protection stack overflow
I know this is due to the large number of variables (26k+) in my data frame; with fewer variables the error doesn't appear. I know how to solve this in command-line R, but I need to stay in RStudio, so I want to fix it from there. How do I do this?
Answer 1:
@Ansjovis86
You can set the size of the pointer protection stack (ppsize) with a command-line argument when launching RStudio. R accepts values up to 500000 for this option:
rstudio.exe --max-ppsize=500000
You may also wish to raise the expressions option, either in your .Rprofile or at runtime with the options(expressions = 5e5) command:
> options(expressions = 5e5)
> ?options
...
expressions:
sets a limit on the number of nested expressions that will be evaluated. Valid values are 25...500000 with default 5000. If you increase it, you may also want to start R with a larger protection stack; see --max-ppsize in Memory. Note too that you may cause a segfault from overflow of the C stack, and on OSes where it is possible you may want to increase that. Once the limit is reached an error is thrown. The current number under evaluation can be found by calling Cstack_info.
Use Cstack_info() to check the current settings.
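To see what these settings look like inside a running RStudio session, here is a minimal sketch using only base R. The variable names (old, current, info) are just for illustration, and it only covers the expressions option; as far as I know the protection stack size itself can only be set at startup.

```r
# Raise the nested-expression limit for this session; options() returns the
# previous value so it can be restored afterwards.
old <- options(expressions = 5e5)

# Confirm the new limit took effect.
current <- getOption("expressions")
print(current)

# Cstack_info() reports C stack size, current usage, growth direction,
# and the current evaluation depth (eval_depth).
info <- Cstack_info()
print(info)

# Put the previous setting back once the heavy call is done.
options(old)
```

Putting the options(expressions = 5e5) line in your .Rprofile has the same effect for every session.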
Answer 2:
The root cause is the model.matrix function, which will 1) use a lot of memory; and 2) throw this error for a sufficiently large number of columns.
Try using my glmnetUtils package, which will get around both these problems. Rather than building the model matrix in one go, it does it term by term; and it also doesn't try to evaluate huge formulas. This is a lot faster, and doesn't risk blowing up the stack.
install.packages("glmnetUtils")
library(glmnetUtils)
glmnet(response ~ ., data = acgh_frame[3:ncol(acgh_frame)])
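To illustrate the "term by term" idea, here is a rough base-R sketch. It is not how glmnetUtils is actually implemented, just the general approach under some assumptions: toy is a made-up stand-in for acgh_frame with 200 numeric predictors, and whether this scales comfortably to 26k columns is untested here.

```r
# Hypothetical toy data: 50 rows, 200 numeric predictors plus a response.
set.seed(1)
toy <- as.data.frame(matrix(rnorm(50 * 200), nrow = 50))
toy$response <- rnorm(50)

predictors <- setdiff(names(toy), "response")

# Build one small design block per column, so R never has to expand
# a single huge formula such as response ~ V1 + V2 + ... + V200.
blocks <- lapply(predictors, function(nm) {
  # One-term formula, e.g. ~`V1`; backticks guard against odd column names.
  m <- model.matrix(reformulate(sprintf("`%s`", nm)), data = toy)
  # Drop the intercept column; glmnet fits its own intercept.
  m[, colnames(m) != "(Intercept)", drop = FALSE]
})

# Bind the blocks into the final predictor matrix.
x <- do.call(cbind, blocks)
y <- toy$response
print(dim(x))
```

The resulting x and y can then be passed to glmnet(x, y) in the usual way. For factor columns each block would expand to several dummy columns, which is the case the package handles for you.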
Source: https://stackoverflow.com/questions/32826906/how-to-solve-protection-stack-overflow-issue-in-r-studio