Cumulative sum in a matrix

粉色の甜心 2021-01-11 09:51

I have a matrix like

A= [ 1 2 4
     2 3 1
     3 1 2 ]

and I would like to calculate its cumulative sum by row and by column, that is, I w

2 answers
  • 2021-01-11 10:31

    A one-liner:

    t(apply(apply(A, 2, cumsum), 1, cumsum))
    

    The underlying observation is that you can first compute the cumulative sums over the columns and then the cumulative sum of this matrix over the rows.

    Note: When doing the rows, you have to transpose the resulting matrix, because apply(X, 1, f) returns the result for each row as a column.

    Your example:

    > apply(A, 2, cumsum)
         [,1] [,2] [,3]
    [1,]    1    2    4
    [2,]    3    5    5
    [3,]    6    6    7
    
    > t(apply(apply(A, 2, cumsum), 1, cumsum))
         [,1] [,2] [,3]
    [1,]    1    3    7
    [2,]    3    8   13
    [3,]    6   12   19
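
    As a sanity check, the two-pass result can be compared against the direct definition, in which entry (i, j) is the sum of the block A[1:i, 1:j]. This is only a sketch: the helper name cumsum2d_naive is made up for illustration, and the double loop is far too slow for large matrices.

    ```r
    A <- matrix(c(1, 2, 4,
                  2, 3, 1,
                  3, 1, 2), nrow = 3, byrow = TRUE)

    # Two-pass version: cumulative sums down the columns, then along the rows
    # (transposed back, because apply(X, 1, f) returns rows as columns).
    two_pass <- t(apply(apply(A, 2, cumsum), 1, cumsum))

    # Direct definition (hypothetical helper): entry (i, j) sums the block A[1:i, 1:j].
    cumsum2d_naive <- function(M) {
      out <- matrix(0, nrow(M), ncol(M))
      for (i in seq_len(nrow(M))) {
        for (j in seq_len(ncol(M))) {
          out[i, j] <- sum(M[1:i, 1:j])
        }
      }
      out
    }

    naive <- cumsum2d_naive(A)
    all.equal(two_pass, naive)
    ```

    Both versions should agree on any numeric matrix; the two-pass version just reuses the partial sums instead of recomputing each block.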
    

    About performance: I have no idea how well this approach scales to big matrices. Complexity-wise, it should be close to optimal. apply is usually not that bad performance-wise either.


    Edit

    Now I got curious: which approach is the faster one? A short benchmark:

    > A <- matrix(runif(1000*1000, 1, 500), 1000)
    > 
    > system.time(
    +   B <- t(apply(apply(A, 2, cumsum), 1, cumsum))
    + )
           User      System     elapsed 
          0.082       0.011       0.093 
    > 
    > system.time(
    +   C <- lower.tri(diag(nrow(A)), diag = TRUE) %*% A %*% upper.tri(diag(ncol(A)), diag = TRUE)
    + )
           User      System     elapsed 
          1.519       0.016       1.530 
    

    Thus: apply outperforms matrix multiplication by a factor of about 16 here (0.093 s vs. 1.530 s elapsed). (Just for comparison: MATLAB needed 0.10719 seconds.) The result is not really surprising: the apply version needs O(n^2) operations, while dense matrix multiplication needs roughly O(n^2.8) (Strassen-type algorithms) to O(n^3) (the schoolbook algorithm). So for large enough n, the asymptotic gap outweighs whatever low-level optimizations the matrix multiplication routines offer.
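
    To make the comparison concrete, here is a small sketch of why the matrix-multiplication variant yields the same result on the 3x3 example from the question. The names L, U, by_matmul and by_apply are chosen only for this illustration.

    ```r
    A <- matrix(c(1, 2, 4,
                  2, 3, 1,
                  3, 1, 2), nrow = 3, byrow = TRUE)

    # Lower-triangular matrix of ones: L %*% A accumulates down each column.
    L <- lower.tri(diag(nrow(A)), diag = TRUE) * 1
    # Upper-triangular matrix of ones: X %*% U accumulates along each row.
    U <- upper.tri(diag(ncol(A)), diag = TRUE) * 1

    by_matmul <- L %*% A %*% U
    by_apply  <- t(apply(apply(A, 2, cumsum), 1, cumsum))

    all.equal(by_matmul, by_apply)
    ```

    Each product is a full dense multiplication even though about half of the entries in L and U are zero, which is where the extra work compared with the two cumsum passes comes from.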

  • 2021-01-11 10:36

    Here is a more efficient implementation using the matrixStats package and a larger example matrix:

    library(matrixStats)
    A <- matrix(runif(10000*10000, 1, 500), 10000)
    
    # Thilo's answer
    system.time(B <- t(apply(apply(A, 2, cumsum), 1, cumsum)))
    user  system elapsed 
    3.684   0.504   4.201
    
    # using matrixStats
    system.time(C <- colCumsums(rowCumsums(A)))
    user  system elapsed 
    0.164   0.068   0.233 
    
    all.equal(B, C)
    [1] TRUE
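
    For a quick check on the small matrix from the question, the following sketch compares both orders of the matrixStats calls with the base R version. It assumes the matrixStats package is installed and skips the comparison otherwise; the variable names are illustrative only.

    ```r
    A <- matrix(c(1, 2, 4,
                  2, 3, 1,
                  3, 1, 2), nrow = 3, byrow = TRUE)

    # Base R reference result.
    base_result <- t(apply(apply(A, 2, cumsum), 1, cumsum))

    # Assumption: matrixStats is available; the block is skipped if it is not.
    has_matrixStats <- requireNamespace("matrixStats", quietly = TRUE)
    if (has_matrixStats) {
      # Rows first, then columns ...
      rows_first <- matrixStats::colCumsums(matrixStats::rowCumsums(A))
      # ... or columns first, then rows.
      cols_first <- matrixStats::rowCumsums(matrixStats::colCumsums(A))
      print(all.equal(rows_first, base_result))
      print(all.equal(cols_first, base_result))
    }
    ```

    The order of the two passes should not matter for the result, apart from floating-point rounding on non-integer data.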
    