Best way to allocate matrix in R, NULL vs NA?

深忆病人 2020-12-24 11:31

I am writing R code to create a square matrix. So my approach is:

  1. Allocate a matrix of the correct size
  2. Loop through each element of my matrix and fil
3 Answers
  • 2020-12-24 12:11

    According to this article we can do better than preallocating with NA by preallocating with NA_real_. From the article:

    as soon as you assign a numeric value to any of the cells in 'x', the matrix will first have to be coerced to numeric when a new value is assigned. The originally allocated logical matrix was allocated in vain and just adds an unnecessary memory footprint and extra work for the garbage collector. Instead allocate it using NA_real_ (or NA_integer_ for integers)

    As recommended: let's test it.

    testfloat = function(mat){
      n=nrow(mat)
      for(i in 1:n){
        mat[i,] = 1.2
      }
    }
    
    > system.time(testfloat(matrix(data=NA,nrow=1e4,ncol=1e4)))
    user  system elapsed 
    3.08    0.24    3.32 
    > system.time(testfloat(matrix(data=NA_real_,nrow=1e4,ncol=1e4)))
    user  system elapsed 
    2.91    0.23    3.14 
    

    And for integers:

    testint = function(mat){
      n=nrow(mat)
      for(i in 1:n){
        mat[i,] = 3   # note: a plain 3 is a double in R; 3L would be an integer
      }
    }
    
    > system.time(testint(matrix(data=NA,nrow=1e4,ncol=1e4)))
    user  system elapsed 
    2.96    0.29    3.31 
    > system.time(testint(matrix(data=NA_integer_,nrow=1e4,ncol=1e4)))
    user  system elapsed 
    2.92    0.35    3.28 
    

    The difference is small in my test cases (about 5% in elapsed time for doubles, about 1% for integers), but it's there.
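
    The coercion the article describes can be observed directly with typeof(). This is only a minimal sketch on a small 3 x 3 matrix; the variable names are illustrative and not taken from the article:

    ```r
    # Plain NA is a logical constant, so this allocates a logical matrix
    x <- matrix(NA, nrow = 3, ncol = 3)
    type_before <- typeof(x)        # "logical"
    x[1, ] <- 1.2                   # first numeric assignment coerces the whole matrix
    type_after <- typeof(x)         # "double"

    # NA_real_ allocates a double matrix up front, so no coercion is needed later
    y <- matrix(NA_real_, nrow = 3, ncol = 3)
    type_real <- typeof(y)          # "double"

    # NA_integer_ stays integer only while integer values are assigned
    z <- matrix(NA_integer_, nrow = 3, ncol = 3)
    z[1, ] <- 3L                    # 3L is an integer literal
    type_int <- typeof(z)           # "integer"
    z[2, ] <- 3                     # a plain 3 is a double
    type_int_after <- typeof(z)     # "double"
    ```

    Note that assigning a plain 3, as testint does, turns an integer matrix into a double one, which may be part of why the two integer timings are so close.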

  • 2020-12-24 12:17
    rows<-3
    cols<-3    
    x<-rep(NA, rows*cols)
    x1 <- matrix(x,nrow=rows,ncol=cols)
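
    Since matrix() recycles its data argument to fill the requested dimensions, the rep() step is probably unnecessary. A minimal sketch, reusing the rows and cols names from above (x2 and same are illustrative names):

    ```r
    rows <- 3
    cols <- 3
    # Explicitly repeated data, as in the snippet above
    x1 <- matrix(rep(NA, rows * cols), nrow = rows, ncol = cols)
    # A single NA is recycled to fill all rows * cols cells
    x2 <- matrix(NA, nrow = rows, ncol = cols)
    same <- identical(x1, x2)   # TRUE
    ```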
    
  • 2020-12-24 12:21

    When in doubt, test it yourself. The first approach (calling matrix() directly) is both easier and faster.

    > create.matrix <- function(size) {
    + x <- matrix()
    + length(x) <- size^2
    + dim(x) <- c(size,size)
    + x
    + }
    > 
    > system.time(x <- matrix(data=NA,nrow=10000,ncol=10000))
       user  system elapsed 
       4.59    0.23    4.84 
    > system.time(y <- create.matrix(size=10000))
       user  system elapsed 
       0.59    0.97   15.81 
    > identical(x,y)
    [1] TRUE
    

    Regarding the difference between NA and NULL:

    There are actually four special constants. As the R manual puts it:

    In addition, there are four special constants, NULL, NA, Inf, and NaN.

    NULL is used to indicate the empty object. NA is used for absent (“Not Available”) data values. Inf denotes infinity and NaN is not-a-number in the IEEE floating point calculus (results of the operations respectively 1/0 and 0/0, for instance).

    You can read more in the R manual on language definition.
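
    A short sketch of how these constants behave at the console, using base R only. It suggests why NA rather than NULL is the natural placeholder for matrix cells: NULL has length zero, so there is nothing to put in a cell.

    ```r
    # NULL is the empty object: it has length zero
    length(NULL)            # 0
    is.null(NULL)           # TRUE
    length(c(NULL, NULL))   # 0, combining NULLs still gives NULL

    # NA is a length-one placeholder for a missing value
    length(NA)              # 1
    typeof(NA)              # "logical"
    is.na(NA_real_)         # TRUE

    # Inf and NaN come from floating point arithmetic
    1 / 0                   # Inf
    0 / 0                   # NaN
    is.na(0 / 0)            # TRUE, is.na() also reports NaN as missing
    is.nan(NA)              # FALSE, NA is not NaN
    ```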
