R convert matrix or data frame to sparseMatrix

日久生厌 2020-12-23 20:46

I have a regular matrix (non-sparse) that I would like to convert to a sparseMatrix (using the Matrix package). Is there a function to do this or

3 Answers
  • 2020-12-23 21:02

    Here are two options:

    library(Matrix)
    
    A <- as(regMat, "sparseMatrix")       # see also `vignette("Intro2Matrix")`
    B <- Matrix(regMat, sparse = TRUE)    # Thanks to Aaron for pointing this out
    
    identical(A, B)
    # [1] TRUE
    A
    # 10 x 10 sparse Matrix of class "dgCMatrix"
    #                              
    #  [1,] . . .  .  . 45 .  . . .
    #  [2,] . . .  .  .  . . 59 . .
    #  [3,] . . .  . 95  . .  . . .
    #  [4,] . . .  .  .  . .  . . .
    #  [5,] . . .  .  .  . .  . . .
    #  [6,] . . .  .  .  . .  . . .
    #  [7,] . . . 23  .  . .  . . .
    #  [8,] . . . 63  .  . .  . . .
    #  [9,] . . .  .  .  . .  . . .
    # [10,] . . .  .  .  . .  . . .
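
    The snippet above assumes `regMat` already exists. As a minimal self-contained sketch, the block below builds a hypothetical stand-in for it (a 10 x 10 base matrix holding the same five non-zero cells as the printed output) and then applies both conversions. The construction of `regMat` is my assumption, not something given in the question.

    ```r
    library(Matrix)

    # Hypothetical stand-in for the question's `regMat`: a dense 10 x 10 base
    # matrix that is zero everywhere except five cells.
    regMat <- matrix(0, nrow = 10, ncol = 10)
    regMat[cbind(c(1, 2, 3, 7, 8), c(6, 8, 5, 4, 4))] <- c(45, 59, 95, 23, 63)

    # The two conversions from the answer
    A <- as(regMat, "sparseMatrix")
    B <- Matrix(regMat, sparse = TRUE)

    # Both results hold the same values as the dense input
    all(as.matrix(A) == regMat)
    all(as.matrix(B) == regMat)
    ```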
    
  • 2020-12-23 21:15

    Josh's answer is fine, but here are more options and explanation.

    Nitpick: "I have a regular matrix (non-sparse)..." Strictly speaking, you do have a sparse matrix (a matrix of mostly 0s); it is just stored in uncompressed format. Your goal is to put it into a compressed storage format.

    Sparse matrices can be compressed into multiple storage formats. Compressed Sparse Column (CSC) and Compressed Sparse Row (CSR) are the two dominant formats. as(regMat, "sparseMatrix") converts your matrix to class dgCMatrix, which is compressed sparse column. This is usually what you want, but I prefer to be explicit about it.

    library(Matrix)
    
    matCSC <- as(regMat, "dgCMatrix")  # compressed sparse column CSC
    matCSC
    # 10 x 10 sparse Matrix of class "dgCMatrix"
    #
    #  [1,] . . .  .  . 57 .  . . .
    #  [2,] . . .  .  .  . . 27 . .
    #  [3,] . . .  . 90  . .  . . .
    #  [4,] . . .  .  .  . .  . . .
    #  [5,] . . .  .  .  . .  . . .
    #  [6,] . . .  .  .  . .  . . .
    #  [7,] . . . 91  .  . .  . . .
    #  [8,] . . . 37  .  . .  . . .
    #  [9,] . . .  .  .  . .  . . .
    # [10,] . . .  .  .  . .  . . .
    
    matCSR <- as(regMat, "dgRMatrix")  # compressed sparse row CSR
    matCSR
    # 10 x 10 sparse Matrix of class "dgRMatrix"
    #
    #  [1,] . . .  .  . 57 .  . . .
    #  [2,] . . .  .  .  . . 27 . .
    #  [3,] . . .  . 90  . .  . . .
    #  [4,] . . .  .  .  . .  . . .
    #  [5,] . . .  .  .  . .  . . .
    #  [6,] . . .  .  .  . .  . . .
    #  [7,] . . . 91  .  . .  . . .
    #  [8,] . . . 37  .  . .  . . .
    #  [9,] . . .  .  .  . .  . . .
    # [10,] . . .  .  .  . .  . . .
    

    While these look and behave the same on the surface, internally they store data differently. CSC is faster for retrieving columns of data while CSR is faster for retrieving rows. They also take up different amounts of space depending on the structure of your data.
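
    To make the internal difference concrete, here is a small sketch that inspects the slots of each format. It rebuilds the example matrix from its five non-zero cells (values copied from the printed output above) and coerces to the virtual class "RsparseMatrix" instead of "dgRMatrix"; that coercion target is my choice, on the assumption that your installed Matrix version supports it.

    ```r
    library(Matrix)

    # Same five non-zero cells as the example matrices above
    matCSC <- sparseMatrix(
      i = c(3, 2, 8, 1, 7),
      j = c(5, 8, 4, 6, 4),
      x = c(90, 27, 37, 57, 91),
      dims = c(10, 10)
    )
    matCSR <- as(matCSC, "RsparseMatrix")

    # CSC: values in column-major order, 0-based row index of each value,
    # and one pointer per column boundary (length ncol + 1)
    matCSC@x   # 91 37 90 57 27
    matCSC@i   # 6 7 2 0 1
    matCSC@p   # 0 0 0 0 2 3 4 4 5 5 5

    # CSR: values in row-major order, 0-based column index of each value,
    # and one pointer per row boundary (length nrow + 1)
    matCSR@x   # 57 27 90 91 37
    matCSR@j   # 5 7 4 3 3
    matCSR@p   # 0 1 2 3 3 3 3 4 5 5 5
    ```

    Reading column 4 from the CSC form only needs the slice between `p[4]` and `p[5]`, whereas the CSR form has to be scanned row by row to find the same entries. That is the reason column access favours CSC and row access favours CSR.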

    Furthermore, in this example you're converting an uncompressed sparse matrix to a compressed one. Usually you do this to save memory, so building an uncompressed matrix just to convert it to compressed form defeats the purpose. In practice it's more common to construct a compressed sparse matrix from a table of (row, column, value) triplets. You can do this with Matrix's sparseMatrix() function.

    # Make data.frame of (row, column, value) triplets
    df <- data.frame(
      rowIdx = c(3,2,8,1,7),
      colIdx = c(5,8,4,6,4),
      val = round(runif(n = 5), 2) * 100  # random, so your values will differ
    )
    
    df
    #   rowIdx colIdx val
    # 1      3      5  90
    # 2      2      8  27
    # 3      8      4  37
    # 4      1      6  57
    # 5      7      4  91
    
    # Build CSC matrix
    matSparse <- sparseMatrix(
      i = df$rowIdx,
      j = df$colIdx, 
      x = df$val, 
      dims = c(10, 10)
    )
    
    matSparse
    # 10 x 10 sparse Matrix of class "dgCMatrix"
    #
    #  [1,] . . .  .  . 57 .  . . .
    #  [2,] . . .  .  .  . . 27 . .
    #  [3,] . . .  . 90  . .  . . .
    #  [4,] . . .  .  .  . .  . . .
    #  [5,] . . .  .  .  . .  . . .
    #  [6,] . . .  .  .  . .  . . .
    #  [7,] . . . 91  .  . .  . . .
    #  [8,] . . . 37  .  . .  . . .
    #  [9,] . . .  .  .  . .  . . .
    # [10,] . . .  .  .  . .  . . .
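
    The memory argument is easy to check with a rough sketch. The size `n` below is an arbitrary value picked for illustration, and the dense copy is created only so the two footprints can be compared; exact byte counts depend on your R and Matrix versions, so none are quoted here.

    ```r
    library(Matrix)

    n <- 1000  # arbitrary illustrative size

    # Built directly from (row, column, value) triplets; no dense matrix needed
    sp <- sparseMatrix(
      i = c(3, 2, 8, 1, 7),
      j = c(5, 8, 4, 6, 4),
      x = c(90, 27, 37, 57, 91),
      dims = c(n, n)
    )

    # Dense equivalent, materialised only for the comparison
    dense <- as.matrix(sp)

    object.size(dense)  # stores all n * n cells
    object.size(sp)     # stores the non-zero cells plus index vectors
    ```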
    

    Shameless plug: I have a blog article covering this and more, if you're interested.

  • 2020-12-23 21:20

    For a matrix, the other answers already cover it.

    For a data.table, the mltools package provides sparsify(), which does the job.

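    library(data.table)
    library(Matrix)
    library(mltools)

    # Example data for illustration only; substitute your own data.table
    x <- data.table(a = c(0, 1, 0), b = c(2, 0, 0))
    sparseM <- sparsify(x)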
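
    For a plain data.frame with only numeric columns, one possible route that needs nothing beyond Matrix is to go through a base matrix first. This is a sketch of my own rather than something from the answers, the data frame `df` is made up for illustration, and it does allocate a dense copy along the way, so it only suits data that fits in memory.

    ```r
    library(Matrix)

    # Hypothetical all-numeric data frame
    df <- data.frame(a = c(0, 1, 0), b = c(2, 0, 0))

    # data.matrix() turns the data frame into a numeric base matrix,
    # which Matrix() can then store in sparse form
    sparseDF <- Matrix(data.matrix(df), sparse = TRUE)

    sparseDF
    ```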
    