Question
I am trying to fit an additive mixed model using bam (mgcv library). My dataset has 10^6 observations from a longitudinal study on growth in 2*10^5 children nested in 300 health centers. I am looking for the slope for each center. The model is
bam(haz ~ s(month, bs = "cc", k = 12) + sex + s(age) + center + year + year*center + s(child, bs = "re"), data = data)
However, when I try to fit the model, the following error message appears:
Error: cannot allocate vector of size 99.6 Gb
In addition: Warning message:
In matrix(by, n, q) : data length exceeds size of matrix
I am working on a cluster with 500 GB of RAM.
Thank you for any help
Answer 1:
To diagnose more precisely where the problem is, try fitting your model with various terms left out. There are several terms in the model that could blow up on you:
- the fixed effects involving center will blow up to 300 columns * 10^6 rows; depending on whether year is numeric or a factor, the year*center term could blow up to 600 columns or (nyears*300) columns
- it's not clear to me whether bam uses sparse matrices for s(.,bs="re") terms; if not, you'll be in big trouble (2*10^5 columns * 10^6 rows)
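The column counts quoted above can be checked on a small synthetic grid before touching the full data. This is a rough sketch in base R only; the data frame `d`, the five years 2001 to 2005, and the variable names are invented for illustration and are not the question's data.

```r
# Hypothetical grid: 300 centers crossed with 5 years (illustrative values only)
n_centers <- 300
d <- expand.grid(center = factor(seq_len(n_centers)), year = 2001:2005)

# year as a numeric covariate: intercept + 299 center dummies
# + 1 year slope + 299 center:year slopes
ncol_numeric <- ncol(model.matrix(~ center * year, data = d))

# year as a factor: one column per center-by-year cell
d$year_f <- factor(d$year)
ncol_factor <- ncol(model.matrix(~ center * year_f, data = d))

print(c(numeric_year = ncol_numeric, factor_year = ncol_factor))
```

With a numeric year this gives 600 columns, and with a 5-level factor it gives 5 * 300 = 1500, matching the two cases in the list.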
As an order of magnitude, a vector of 10^6 numeric values (one column of your model matrix) takes about 7.6 MB, so 500 GB / 7.6 MB would be approximately 65,000 columns ...
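The same back-of-the-envelope estimate can be written out in base R. This sketch assumes dense storage at 8 bytes per double and binary units (MiB, GiB); the variable and function names are made up for illustration, and the result depends slightly on which unit convention is used.

```r
n_obs <- 1e6                      # rows in the data set
col_bytes <- n_obs * 8            # one dense numeric column, 8 bytes per double
col_mib <- col_bytes / 1024^2     # about 7.63 MiB per column

ram_bytes <- 500 * 1024^3         # 500 GiB of RAM (binary units assumed)
max_cols <- floor(ram_bytes / col_bytes)

# Size in GiB of a dense block with n_cols columns and n_obs rows
dense_gib <- function(n_cols) n_cols * col_bytes / 1024^3

center_gib   <- dense_gib(300)    # dummy columns for center
child_re_gib <- dense_gib(2e5)    # one indicator column per child, if stored densely

print(c(col_mib = col_mib, max_cols = max_cols,
        center_gib = center_gib, child_re_gib = child_re_gib))
```

Under these assumptions the center dummies need only a few GiB, whereas a dense indicator matrix for 2*10^5 children would need well over the 500 GB available.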
Just taking a guess here, but I would try out the gamm4 package. It's not specifically geared for low-memory use, but:
‘gamm4’ is most useful when the random effects are not i.i.d., or when there are large numbers of random coeffecients [sic] (more than several hundred), each applying to only a small proportion of the response data.
I would also make most of the terms into random effects:
gamm4::gamm4(haz ~ s(month, bs = "cc", k = 12) + sex + s(age),
random = ~ (1|center) + (1|year) + (1|year:center) + (1|child), data = data)
or, if there are not very many years in the data set, treat year as a fixed effect:
gamm4::gamm4(haz ~ s(month, bs = "cc", k = 12) + sex + s(age) + year,
random = ~ (1|center) + (1|year:center) + (1|child), data = data)
If there are a small number of years then (year|center) might make sense, to assess among-center variation and covariation among years ... if there are many years, consider making it a smooth term instead ...
Source: https://stackoverflow.com/questions/47999095/mgcv-bam-error-cannot-allocate-vector-of-size-99-6-gb