I have a very large matrix I'm trying to run through glmnet on a server with plenty of memory. It works fine even on very large data sets up to a certain point, after which I g
Since version 3, R supports long vectors. A long vector is indexed by a double. A long vector can be the base for a matrix or an array with more than two dimensions, as long as each dimension is small enough to be indexable by an integer. Long vectors cannot be passed to native code via .C and .Fortran. The error message you are getting is because a long vector is being passed via .C.
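To make the dimension point concrete, here is a minimal C sketch (the function names are made up for illustration, this is neither R's nor glmnet's code). It assumes column-major storage, which is how R lays out matrices, and shows that both dimensions can fit in a 32-bit integer while the element count and the element offsets do not:

```c
#include <assert.h>
#include <stdint.h>

/* Total number of elements of an nrow x ncol matrix, computed in 64 bits
   so the product cannot overflow a 32-bit integer. */
int64_t matrix_length(int32_t nrow, int32_t ncol) {
    assert(nrow >= 0 && ncol >= 0);
    return (int64_t)nrow * (int64_t)ncol;
}

/* Returns 1 if the backing vector would have more than 2^31 - 1 elements
   (i.e. it would have to be a long vector), 0 otherwise. */
int needs_long_vector(int32_t nrow, int32_t ncol) {
    return matrix_length(nrow, ncol) > (int64_t)INT32_MAX;
}

/* Zero-based column-major offset of element (i, j); the offset needs a
   64-bit type even though i, j and nrow each fit in 32 bits. */
int64_t column_major_offset(int32_t i, int32_t j, int32_t nrow) {
    assert(i >= 0 && i < nrow && j >= 0);
    return (int64_t)j * (int64_t)nrow + (int64_t)i;
}
```

For example, a 50000 x 50000 matrix has 2500000000 elements, which is more than 2^31 - 1 = 2147483647, so needs_long_vector(50000, 50000) returns 1, while a 40000 x 40000 matrix still fits in an ordinary vector.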
Long vectors can be passed via .Call. So, as long as the native code of glmnet supports long vectors (64-bit indexes), or can be modified or recompiled to support them, only the interface between R and the native code of glmnet would have to be modified. You can do this manually in C, and there is also a new package named dotCall64 for this task. Part of modifying the interface is deciding when to copy arguments: .C/.Fortran copy defensively, but you don't want to do this unnecessarily with large data structures.
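As a rough sketch of what the native side of such an interface could look like, here are two plain C routines (hypothetical names, not part of glmnet or dotCall64, and written without the R headers so they compile on their own). Both take a 64-bit length. The first only reads its input, so the caller can hand over the buffer without a copy; the second writes into a separate output buffer, so the input again stays untouched. In a real .Call entry point the length would come from R itself, for example XLENGTH(x), which returns an R_xlen_t.

```c
#include <assert.h>
#include <stdint.h>

/* Read-only reduction: x is never modified, so no defensive copy of the
   input is needed. The 64-bit length allows more than 2^31 - 1 elements. */
double sum_of_squares(const double *x, int64_t n) {
    assert(n >= 0);
    double s = 0.0;
    for (int64_t i = 0; i < n; i++) {
        s += x[i] * x[i];
    }
    return s;
}

/* Writes factor * x[i] into a caller-supplied output buffer. Only the
   output needs fresh memory; the input is left as it was. */
void scale_into(const double *x, int64_t n, double factor, double *out) {
    assert(n >= 0);
    for (int64_t i = 0; i < n; i++) {
        out[i] = factor * x[i];
    }
}
```

The const qualifier on the input is what documents that copying can be skipped for that argument.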
I think the difficulty of changing the native code of glmnet to support 64-bit indexes depends on the actual code (which I have only looked at, never worked with). It is easy to switch all integers (whether explicitly or implicitly 32-bit) in Fortran code to 64-bit. The trouble comes when some integers have to stay 32-bit, and this will happen, for example, for integer vectors passed between R and native code, because R uses 32-bit integers (even in long vectors). glmnet does pass such integer vectors. How hard the modification is then depends on how clean the original Fortran code is (e.g. whether it uses separate integer variables for indexing and for holding the values of integer arrays).
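The separation between index type and value type can be sketched in C (used here instead of Fortran purely for illustration; the function and its arguments are hypothetical and not taken from glmnet). The integer vector keeps its 32-bit elements, as R would pass them, while every position and length is 64-bit:

```c
#include <assert.h>
#include <stdint.h>

/* Copies x[idx[k] - 1] into out[k] for k = 0 .. nidx - 1.
   idx holds 32-bit, 1-based positions (the element type R uses for
   integer vectors); lengths and loop counters are 64-bit. */
void gather(const double *x, int64_t nx,
            const int32_t *idx, int64_t nidx, double *out) {
    assert(nx >= 0 && nidx >= 0);
    for (int64_t k = 0; k < nidx; k++) {
        /* Widen the 32-bit value before using it as a position, and
           convert from 1-based to 0-based. */
        int64_t pos = (int64_t)idx[k] - 1;
        assert(pos >= 0 && pos < nx);
        out[k] = x[pos];
    }
}
```

Code that reuses one integer variable both as a loop index and as storage for such array values is the kind that cannot simply be recompiled with 64-bit default integers.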
Experimental implementations of subsets of R, like Riposte, will not help.