Is there a limit on working with matrix in R with Rcpp?

隐瞒了意图╮ 2021-01-23 17:26

I was trying to develop a program in R to estimate a Spearman correlation with Rcpp. It works, but only for matrices with fewer than somewhere between 45 00 and 50 000 vectors.

2 answers
  •  囚心锁ツ
    2021-01-23 18:16

    A couple caveats

    Before we get started, I'm assuming:

    • R > 3.0.0
      • Long Vectors that allow for 2 ^ 52 elements are then supported
    • Rcpp > 0.12.0
      • Patch where thirdwing replaced instances of int and size_t with R_xlen_t and R_xlength. See release post for more details...
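The second caveat matters because a matrix is stored as one flat vector, so code that computes element offsets in a 32-bit int stops working once nrow * ncol exceeds 2^31 - 1. For a square matrix that happens at 46,341 rows, which falls inside the range mentioned in the question. This is offered as one plausible explanation, not a diagnosis of the asker's code. A minimal standalone C++ sketch of the arithmetic (no Rcpp required; the helper names are hypothetical):

```cpp
#include <cstdint>
#include <limits>

// Number of elements in an nrow x ncol matrix, computed in 64 bits so the
// product cannot overflow for dimensions that fit in a 32-bit int.
std::int64_t element_count(std::int32_t nrow, std::int32_t ncol) {
    return static_cast<std::int64_t>(nrow) * static_cast<std::int64_t>(ncol);
}

// True when every flat (column-major) offset of the matrix fits in a
// 32-bit int, i.e. when int-based indexing would still be safe.
bool fits_in_int_index(std::int32_t nrow, std::int32_t ncol) {
    return element_count(nrow, ncol) <= std::numeric_limits<std::int32_t>::max();
}
```

Here fits_in_int_index(46340, 46340) is true (2,147,395,600 elements), while fits_in_int_index(46341, 46341) is false (2,147,488,281 elements). Indexing with R_xlen_t instead of int avoids the problem.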

    Constructing a large NumericMatrix

    I think you may be running into a memory allocation issue, as the following works on my 32 GB machine:

    Rcpp::cppFunction("NumericMatrix make_matrix(){
                       NumericMatrix m(50000, 50000);
                       return m;
                      }")
    
    m = make_matrix()
    
    object.size(m)
    
    ## 20000000200 bytes # about 20.0000002 gb
    

    Running:

    # Creates a matrix of about 18.6 GiB (20 GB)
    m = matrix(0, ncol = 50000, nrow = 50000)
    
    Rcpp::cppFunction("void get_length(NumericMatrix m){
                       Rcout << m.nrow() << ' ' << m.ncol(); 
                }")
    
    get_length(m)
    ## 50000 50000
    
    object.size(m)
    ## 20000000200 bytes # about 20.0000002 gb
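
The 18.6 and 20 figures above describe the same object in different units. A small standalone C++ sketch of the arithmetic, assuming 8-byte doubles and ignoring R's small per-object header (the function names are hypothetical):

```cpp
#include <cstdint>

// Bytes needed for the data of an nrow x ncol matrix of doubles.
// Assumes sizeof(double) == 8, which holds on common platforms.
std::int64_t matrix_data_bytes(std::int64_t nrow, std::int64_t ncol) {
    return nrow * ncol * static_cast<std::int64_t>(sizeof(double));
}

// Convert a byte count to binary gigabytes (GiB, 2^30 bytes each).
double bytes_to_gib(std::int64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

For a 50000 x 50000 matrix, matrix_data_bytes gives 20,000,000,000 bytes, which is 20 GB in decimal units and roughly 18.63 GiB. The extra 200 bytes reported by object.size() are object overhead.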
    

    Matrix Bounds

    In theory, the total number of elements in a matrix can be at most (2^31 - 1)^2 = 4,611,686,014,132,420,609, per the following passage:

    Arrays (including matrices) can be based on long vectors provided each of their dimensions is at most 2^31 - 1: thus there are no 1-dimensional long arrays.

    See Long Vector

    Now, trying to create a matrix with 2^31 rows:

    m = matrix(nrow=2^31, ncol=1)
    

    Error in matrix(nrow = 2^31, ncol = 1) : invalid 'nrow' value (too large or NA)
    In addition: Warning message:
    In matrix(nrow = 2^31, ncol = 1) : NAs introduced by coercion to integer range

    The limit both R and Rcpp adhere to regarding the column/row is:

    .Machine$integer.max
    ## 2147483647
    

    Note that 2^31 exceeds this limit by exactly 1:

    2^31 = 2,147,483,648 > 2,147,483,647 = .Machine$integer.max
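
The per-dimension limit and the theoretical element ceiling it implies can be checked with a few lines of standalone C++. This is only a sketch of the arithmetic; the names are hypothetical and nothing here calls into R or Rcpp:

```cpp
#include <cstdint>
#include <limits>

// Largest value accepted for a single matrix dimension
// (the same number as .Machine$integer.max).
constexpr std::int64_t kMaxDim = std::numeric_limits<std::int32_t>::max();

// True when n is usable as nrow or ncol.
constexpr bool is_valid_dimension(std::int64_t n) {
    return n >= 0 && n <= kMaxDim;
}

// Ceiling on the element count implied by the per-dimension limit alone.
constexpr std::uint64_t kMaxElementsByDims =
    static_cast<std::uint64_t>(kMaxDim) * static_cast<std::uint64_t>(kMaxDim);

static_assert(kMaxElementsByDims == 4611686014132420609ULL,
              "(2^31 - 1)^2 should match the figure quoted in the text");
```

With these definitions, is_valid_dimension(2147483647) is true and is_valid_dimension(2147483648) is false, matching the error shown for nrow = 2^31.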

    Maximum Amount of Elements in a Vector

    However, the limit on the length of a plain atomic vector is given as 2^52 (even though one might expect it to be in the ballpark of 2^64 - 1). The following example reaches 2^32 elements by concatenating two vectors of 2^31 elements each:

    v = numeric(2^31)
    length(v)
    ## [1] 2147483648
    
    object.size(v)
    ## 17179869224 bytes # about 17.179869224 gb
    
    v2 = c(v,v)
    length(v2)
    ## 4294967296
    
    object.size(v2)
    ## 34359738408 bytes # about 34.359738408 gb
    

    Suggestions

    1. Use bigmemory via Rcpp
    2. Maintain your own stack of vectors.
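
One way to read the second suggestion is to split a single logical vector across several smaller allocations and translate a 64-bit global index into a (chunk, offset) pair. The class below is a hypothetical, minimal sketch of that idea in plain C++. It is not part of Rcpp or bigmemory, does no bounds checking, and assumes chunk_size is positive:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: one logical vector of doubles stored as
// fixed-size chunks and addressed with a 64-bit index.
class ChunkedVector {
public:
    ChunkedVector(std::int64_t length, std::int64_t chunk_size)
        : length_(length), chunk_size_(chunk_size) {
        // Allocate full chunks, then one shorter chunk for any remainder.
        for (std::int64_t start = 0; start < length; start += chunk_size) {
            const std::int64_t remaining = length - start;
            const std::int64_t n = remaining < chunk_size ? remaining : chunk_size;
            chunks_.emplace_back(static_cast<std::size_t>(n), 0.0);
        }
    }

    std::int64_t size() const { return length_; }
    std::size_t chunk_count() const { return chunks_.size(); }

    // Map a global index to (chunk, offset within chunk).
    double& operator[](std::int64_t i) {
        return chunks_[static_cast<std::size_t>(i / chunk_size_)]
                      [static_cast<std::size_t>(i % chunk_size_)];
    }

private:
    std::int64_t length_;
    std::int64_t chunk_size_;
    std::vector<std::vector<double>> chunks_;
};
```

For example, ChunkedVector v(10, 4) allocates three chunks of sizes 4, 4 and 2, and v[9] refers to the second element of the last chunk. Keeping each chunk well below 2^31 - 1 elements means no single allocation needs long-vector support.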
