R vector size limit: “long vectors (argument 5) are not supported in .C”

Posted by 冷眼眸甩不掉的悲伤 on 2019-12-20 10:23:57

Question


I have a very large matrix I'm trying to run through glmnet on a server with plenty of memory. It works fine even on very large data sets up to a certain point, after which I get the following error:

Error in elnet(x, ...) : long vectors (argument 5) are not supported in .C

If I understand correctly, this is caused by a limitation in R: no vector can be longer than INT_MAX. Is that correct? Are there any solutions that don't require a complete rewrite of glmnet? Do any of the alternative R interpreters (Riposte, etc.) address this limitation?

Thanks!


Answer 1:


Since version 3.0.0, R supports long vectors, that is, vectors with more than 2^31 - 1 elements. A long vector is indexed by a double rather than an integer. A long vector can back a matrix or a higher-dimensional array, as long as each dimension is small enough to be indexed by an integer. Long vectors cannot be passed to native code via .C or .Fortran. The error message you are getting means that a long vector is being passed via .C.
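To make the size condition above concrete, here is a minimal C sketch that checks whether a matrix with given dimensions would need a long vector as its backing store. The function names `matrix_length` and `is_long_vector` are hypothetical, chosen only for illustration, and the sketch assumes a platform where `int` is 32 bits, so that `INT_MAX` is 2^31 - 1.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: number of elements in an nrow x ncol matrix.
   The product is computed in 64 bits so it cannot overflow, even though
   each dimension on its own fits in a 32-bit int. */
int64_t matrix_length(int nrow, int ncol)
{
    assert(nrow >= 0 && ncol >= 0);
    return (int64_t)nrow * (int64_t)ncol;
}

/* Hypothetical helper: returns 1 when the element count exceeds INT_MAX,
   i.e. when the backing vector would be a long vector, and 0 otherwise. */
int is_long_vector(int nrow, int ncol)
{
    return matrix_length(nrow, ncol) > (int64_t)INT_MAX;
}
```

Under these assumptions, a 50000 x 50000 matrix has 2.5 billion elements and counts as a long vector, although both of its dimensions are far below INT_MAX.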

Long vectors can be passed via .Call. So, as long as the native code of glmnet supports long vectors (64-bit indexes), or can be modified and recompiled to support them, one would only have to modify the interface between R and the native code of glmnet. You can do this manually in C, and there is also a new package named dotCall64 for this task. Part of modifying the interface is deciding when to copy arguments: .C and .Fortran copy their arguments defensively, but you don't want to do this unnecessarily with large data structures.

I think the difficulty of changing the native code of glmnet to support 64-bit indexes depends on the actual code (which I have only looked at, never worked with). It is easy to switch all integers in Fortran code (whether explicitly or implicitly 32-bit) to 64-bit. The trouble comes when some integers have to stay 32-bit, and this will happen, for example, for integer vectors passed to and from R code, because R uses 32-bit integers for the elements of integer vectors, even when the vector itself is a long vector. glmnet does pass such integer vectors. How hard the modification is then depends on how clean the original Fortran code is (for example, whether it uses separate integer variables for indexing and for holding the values of integer arrays).
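The separation between index variables and integer values described above can be sketched in C. This is only an illustration under stated assumptions, not glmnet's code: the function names `col_major_offset` and `sum_int32` are hypothetical, and the layout assumed is the column-major storage used by both R and Fortran. Values and dimensions stay 32-bit, while offsets and loop counters are 64-bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: zero-based offset of element (i, j) in a column-major
   matrix with nrow rows. The row and column numbers are 32-bit, but they are
   widened to 64 bits before multiplying, so the offset can exceed 2^31 - 1. */
int64_t col_major_offset(int32_t i, int32_t j, int32_t nrow)
{
    assert(i >= 0 && i < nrow && j >= 0);
    return (int64_t)j * (int64_t)nrow + (int64_t)i;
}

/* Hypothetical helper: sums a vector of 32-bit integers whose length is given
   as a 64-bit count. The element type stays int32_t; only the loop index and
   the accumulator are 64-bit. */
int64_t sum_int32(const int32_t *values, int64_t n)
{
    int64_t total = 0;
    for (int64_t k = 0; k < n; ++k)
        total += values[k];
    return total;
}
```

Code that reuses one integer variable both as an index and as a stored value cannot be converted this mechanically, which is the kind of cleanliness issue the paragraph above refers to.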

Experimental implementations of subsets of R, like Riposte, will not help.




Answer 2:


There is a note in ?"long vector" which states:

However, compiled code typically needs quite extensive changes. Note that the .C and .Fortran interfaces do not accept long vectors, so .Call (or similar) has to be used.

elnet makes .Fortran calls. You would have to modify the function to use .Call, perhaps via a C wrapper that calls the FORTRAN code, and possibly rewrite and compile the relevant FORTRAN code to deal with long vectors.



Source: https://stackoverflow.com/questions/34165654/r-vector-size-limit-long-vectors-argument-5-are-not-supported-in-c
