Is there a limit on working with matrices in R with Rcpp?


I was trying to develop a program in R that estimates a Spearman correlation with Rcpp. I got it working, but it only works with matrices of fewer than roughly 45,000-50,000 vectors.

2 Answers

悲哀的现实

2021-01-23 18:03
To repeat more succinctly:
1. You can have more than 2^31 - 1 elements in a vector.
2. Matrices are vectors with a dim attribute.
3. You can have more than 2^31 - 1 elements in a matrix (i.e. n times k).
4. Your row and column indices are still limited to 2^31 - 1.
Example of a big vector:
```
R> n <- .Machine$integer.max + 100
R> tmpVec <- 1:n
R> length(tmpVec)
[1] 2147483747
R> newVec <- sqrt(tmpVec)
R> 
```
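Points 2 to 4 can be made concrete with a small C++ sketch. The function name `linear_index` is hypothetical (it is not part of Rcpp): a matrix is stored column by column in one underlying vector, so element (i, j) of a matrix with `nrow` rows sits at offset `i + j * nrow`. Each index fits in an `int`, but the offset can exceed 2^31 - 1, so this sketch computes it in a 64-bit type.

```cpp
#include <cassert>
#include <cstdint>

// R stores a matrix as one vector plus a "dim" attribute, filled column by
// column (column-major). Row and column indices fit in a 32-bit int, but the
// offset into the underlying vector may not, so it is computed in 64 bits.
std::int64_t linear_index(std::int32_t i, std::int32_t j, std::int32_t nrow) {
    assert(i >= 0 && i < nrow && j >= 0);  // zero-based indices, as in C++
    return static_cast<std::int64_t>(i) +
           static_cast<std::int64_t>(j) * static_cast<std::int64_t>(nrow);
}
```

For a 50,000 x 50,000 matrix, the last element (i = j = 49,999) lives at offset 2,499,999,999, which is already past `.Machine$integer.max`.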
囚心锁ツ

2021-01-23 18:16
A couple of caveats

Before we get started, I'm assuming:
- R >= 3.0.0
  - Long vectors, which allow up to 2^52 elements, are supported from this version on.
- Rcpp > 0.12.0
  - This includes the patch in which thirdwing replaced instances of `int` and `size_t` with `R_xlen_t` and `R_xlength`. See the release post for more details.
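The `int` versus `R_xlen_t` point may be directly relevant to the question. One plausible explanation, assuming the original code held an element count or a linear index in a 32-bit `int` and works on a square n x n matrix (as a correlation matrix would be), is that n * n overflows right in the reported 45,000-50,000 range. The sketch below uses a hypothetical helper, `count_fits_in_int`, to find the largest square matrix whose element count still fits in an `int`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: does an nrow x ncol element count fit in a 32-bit int?
// The product is formed in 64 bits so that the check itself cannot overflow
// (valid for dimensions up to about 3 billion each).
bool count_fits_in_int(std::int64_t nrow, std::int64_t ncol) {
    assert(nrow >= 0 && ncol >= 0);
    return nrow * ncol <= std::numeric_limits<std::int32_t>::max();
}

// Largest n such that an n x n matrix has at most 2^31 - 1 elements.
std::int64_t largest_int_square() {
    std::int64_t n = 0;
    while (count_fits_in_int(n + 1, n + 1)) ++n;
    return n;
}
```

The result is 46,340 (46,340^2 = 2,147,395,600, while 46,341^2 = 2,147,488,281 already exceeds 2,147,483,647), which falls inside the range reported in the question.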
Constructing a large `NumericMatrix`

I think you may be running into a memory allocation issue, as the following works on my 32 GB machine:
```
Rcpp::cppFunction("NumericMatrix make_matrix(){
                   NumericMatrix m(50000, 50000);
                   return m;
                  }")

m = make_matrix()

object.size(m)

## 20000000200 bytes # about 20.0000002 gb
```
Creating the matrix in R and passing it to Rcpp also works:
```
# Creates an 18.6 GiB (20 GB) matrix!!!
m = matrix(0, ncol = 50000, nrow = 50000)

Rcpp::cppFunction("void get_length(NumericMatrix m){
                   Rcout << m.nrow() << ' ' << m.ncol(); 
            }")

get_length(m)
## 50000 50000

object.size(m)
## 20000000200 bytes # about 20.0000002 gb
```
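The answer describes this matrix as both "18.6gb" and "about 20 gb"; both figures describe the same size, in binary and decimal units respectively. A short sketch of the arithmetic, assuming only that R's numeric type is an 8-byte double (the function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Bytes needed for the data of an nrow x ncol matrix of doubles. R's own
// object overhead comes on top of this and is ignored here.
std::uint64_t matrix_data_bytes(std::uint64_t nrow, std::uint64_t ncol) {
    // Guard against the product itself overflowing 64 bits.
    assert(nrow == 0 ||
           ncol <= std::numeric_limits<std::uint64_t>::max() / nrow / sizeof(double));
    return nrow * ncol * sizeof(double);
}

// Decimal gigabytes (10^9 bytes) versus binary gibibytes (2^30 bytes).
double to_gb(std::uint64_t bytes)  { return bytes / 1e9; }
double to_gib(std::uint64_t bytes) { return bytes / 1073741824.0; }
```

For 50,000 x 50,000 this gives 20,000,000,000 bytes, i.e. 20 GB or about 18.63 GiB. The 20,000,000,200 bytes reported by `object.size()` are 200 bytes more, which is R's own overhead for the object.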
Matrix Bounds

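In theory, each dimension is bounded by 2^31 - 1, so the dimension limit alone would allow up to (2^31 - 1)^2 = 4,611,686,014,132,420,609 elements. From R's documentation on long vectors:

> Arrays (including matrices) can be based on long vectors provided each of their dimensions is at most 2^31 - 1: thus there are no 1-dimensional long arrays.

In practice, the total number of elements is still capped by the long-vector length limit of 2^52, which is far smaller than (2^31 - 1)^2.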
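The two limits can be compared directly. This is a sketch with hypothetical names (`kMaxDim`, `kMaxLength`, `matrix_dims_allowed`), assuming the 2^31 - 1 per-dimension cap quoted above and the 2^52 long-vector cap mentioned in the caveats:

```cpp
#include <cassert>
#include <cstdint>

// Per-dimension cap on R arrays: .Machine$integer.max = 2^31 - 1.
constexpr std::uint64_t kMaxDim = (std::uint64_t{1} << 31) - 1;
// Cap on the length of any R vector, and therefore of a matrix: 2^52.
constexpr std::uint64_t kMaxLength = std::uint64_t{1} << 52;

// The dimension cap alone would allow far more elements than the length cap.
static_assert(kMaxDim * kMaxDim > kMaxLength, "length cap is the binding one");

// Hypothetical check: can an nrow x ncol matrix exist under both limits?
bool matrix_dims_allowed(std::uint64_t nrow, std::uint64_t ncol) {
    if (nrow > kMaxDim || ncol > kMaxDim) return false;
    // Both factors are below 2^31, so the product is below 2^62: no overflow.
    return nrow * ncol <= kMaxLength;
}
```

So a 50,000 x 50,000 matrix passes both checks, while a (2^31 - 1) x (2^31 - 1) matrix satisfies the per-dimension rule but not the length cap.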

Now, trying to fit 2^31 rows into a matrix fails:
```
m = matrix(nrow=2^31, ncol=1)
## Error in matrix(nrow = 2^31, ncol = 1) : 
##   invalid 'nrow' value (too large or NA)
## In addition: Warning message:
## In matrix(nrow = 2^31, ncol = 1) :
##   NAs introduced by coercion to integer range
```

The limit that both R and Rcpp enforce on the number of rows or columns is:
```
.Machine$integer.max
## 2147483647
```
Note that 2^31 exceeds this limit by exactly one:

2^31 = 2,147,483,648 > 2,147,483,647 = `.Machine$integer.max`
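The error above arises because `matrix()` coerces `nrow` to R's 32-bit integer type, and 2^31 has no integer representation, so it becomes NA (hence the warning). Below is a rough C++ model of that coercion. It is not R's source: the function name is hypothetical, and `std::nullopt` stands in for `NA_integer_`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <optional>

// Rough model of coercing a double (e.g. 2^31) to R's integer type. R reserves
// INT_MIN for NA, so only values strictly between -2^31 and 2^31 survive;
// anything else (or NaN) becomes NA, modelled here as std::nullopt.
std::optional<std::int32_t> coerce_to_r_int(double x) {
    const double int_max = std::numeric_limits<std::int32_t>::max();  // 2^31 - 1
    if (std::isnan(x) || x >= int_max + 1.0 || x <= -(int_max + 1.0)) {
        return std::nullopt;
    }
    return static_cast<std::int32_t>(x);  // truncates toward zero
}
```

Under this model, `coerce_to_r_int(2147483648.0)` (that is, 2^31) yields no value, while 2,147,483,647 and 50,000 convert unchanged.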

Maximum Amount of Elements in a Vector

However, the limit on the length of a plain atomic vector is given as 2^52 (even though a 64-bit index type could in principle address far more elements). The following example builds a vector of 2^32 elements by concatenating two vectors of 2^31 elements each:
```
v = numeric(2^31)
length(v)
## [1] 2147483648

object.size(v)
## 17179869224 bytes # about 17.179869224 gb

v2 = c(v,v)
length(v2)
## [1] 4294967296

object.size(v2)
## 34359738408 bytes # about 34.359738408 gb
```
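The sizes above follow from 8 bytes per double; in this session `object.size()` reports exactly 40 bytes more than n x 8 for both `v` and `v2`. A related detail: for vectors longer than 2^31 - 1, R's `length()` returns a double, and every integer up to 2^53 is exactly representable as a double, so lengths up to the 2^52 cap survive that round trip. A sketch of both points (function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// R's numeric type is an 8-byte double, so a length-n numeric vector needs
// 8 * n bytes of data, before R's per-object overhead.
std::uint64_t numeric_data_bytes(std::uint64_t n) { return n * sizeof(double); }

// Long-vector length cap: 2^52.
constexpr std::uint64_t kMaxLength = std::uint64_t{1} << 52;

// True if n converts to double and back without losing precision.
bool round_trips_through_double(std::uint64_t n) {
    return static_cast<std::uint64_t>(static_cast<double>(n)) == n;
}
```

For example, `numeric_data_bytes(2^31) + 40` gives the 17,179,869,224 bytes shown for `v`, and `round_trips_through_double` holds for 2^52 but fails for 2^53 + 1.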
Suggestions
1. Use bigmemory (its `big.matrix` objects) via Rcpp.
2. Maintain your own stack of vectors.
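Suggestion 2 could look something like the following C++ sketch. `ColumnStack` is a hypothetical class, not an Rcpp type: each column is its own `std::vector<double>`, so no single allocation needs nrow * ncol elements and every index into a column stays below nrow. Total memory use is unchanged; the design only avoids one huge contiguous block and the large linear indices that come with it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical "stack of vectors": one separately allocated vector per column.
class ColumnStack {
public:
    explicit ColumnStack(std::size_t nrow) : nrow_(nrow) {}

    // Append one column; it must have exactly nrow entries.
    void push_column(std::vector<double> col) {
        assert(col.size() == nrow_);
        cols_.push_back(std::move(col));
    }

    // Element at zero-based row i, column j.
    double at(std::size_t i, std::size_t j) const {
        assert(i < nrow_ && j < cols_.size());
        return cols_[j][i];
    }

    std::size_t nrow() const { return nrow_; }
    std::size_t ncol() const { return cols_.size(); }

private:
    std::size_t nrow_;
    std::vector<std::vector<double>> cols_;
};
```

For a pairwise statistic such as a Spearman correlation between columns, a layout like this only ever needs to touch two columns at a time.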