Convert Matrix to RowMatrix in Apache Spark using Scala

Asked 2021-01-06 00:59 by 自闭症患者 · 2 answers · 1148 views

I'd really like to convert my org.apache.spark.mllib.linalg.Matrix to org.apache.spark.mllib.linalg.distributed.RowMatrix

I can do it as such:

val         


        
2 Answers
  • 2021-01-06 01:27

    A small correction to the other answer's code: use Vectors.dense instead of new DenseVector

    val vectors = rows.map(row =>  Vectors.dense(row.toArray))
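
    With that fix applied, the whole helper from the other answer would read as follows (a sketch, not part of either original answer; it assumes a SparkContext named sc is in scope, as in spark-shell):

    ```scala
    import org.apache.spark.rdd.RDD
    import org.apache.spark.mllib.linalg.{Matrix, Vector, Vectors}

    def matrixToRDD(m: Matrix): RDD[Vector] = {
      // m.toArray is column-major, so grouped(numRows) yields the columns
      val columns = m.toArray.grouped(m.numRows)
      val rows = columns.toSeq.transpose
      // Vectors.dense returns the Vector interface type directly
      val vectors = rows.map(row => Vectors.dense(row.toArray))
      sc.parallelize(vectors) // assumes a live SparkContext named sc
    }
    ```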
    
  • 2021-01-06 01:29

    I suggest that you convert your Matrix to an RDD[Vector], which you can then wrap in a RowMatrix.

    So, let's consider the following example:

    import org.apache.spark.rdd._
    import org.apache.spark.mllib.linalg._
    
    
    val denseData = Seq(
      Vectors.dense(0.0, 1.0, 2.0),
      Vectors.dense(3.0, 4.0, 5.0),
      Vectors.dense(6.0, 7.0, 8.0),
      Vectors.dense(9.0, 0.0, 1.0)
    )

    // Column-major: this is the 3x2 matrix with rows (1.0, 2.0), (3.0, 4.0), (5.0, 6.0)
    val dm: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))
    

    We will need to define a method to convert that Matrix into an RDD[Vector]:

    def matrixToRDD(m: Matrix): RDD[Vector] = {
       // m.toArray is column-major, so grouped(numRows) yields the columns
       val columns = m.toArray.grouped(m.numRows)
       val rows = columns.toSeq.transpose // Skip this if you want a column-major RDD.
       val vectors = rows.map(row => new DenseVector(row.toArray))
       sc.parallelize(vectors) // assumes a SparkContext named sc, e.g. in spark-shell
    }
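
    The grouped/transpose trick works because Matrix.toArray returns the entries in column-major order. Here is a plain-Scala sketch of just that regrouping step (no Spark needed), using the 3x2 matrix from above:

    ```scala
    // Column-major entries of the 3x2 matrix with rows (1,2), (3,4), (5,6)
    val colMajor = Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0)
    val numRows = 3

    // grouped(numRows) yields the columns; transpose regroups them into rows.
    val columns = colMajor.grouped(numRows).toSeq.map(_.toSeq)
    val rows = columns.transpose

    // rows == Seq(Seq(1.0, 2.0), Seq(3.0, 4.0), Seq(5.0, 6.0))
    ```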
    

    and now we can apply that conversion to the main Matrix:

     import org.apache.spark.mllib.linalg.distributed.RowMatrix
     val rows = matrixToRDD(dm)
     val mat = new RowMatrix(rows)
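
    Putting it all together, a self-contained sketch you could compile and run outside the shell (the SparkSession setup and object wrapper are my additions, not part of the original answer):

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.rdd.RDD
    import org.apache.spark.mllib.linalg.{Matrices, Matrix, Vector, Vectors}
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    object MatrixToRowMatrix {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("matrixToRDD")
          .getOrCreate()
        val sc = spark.sparkContext

        def matrixToRDD(m: Matrix): RDD[Vector] = {
          val columns = m.toArray.grouped(m.numRows) // column-major groups
          val rows = columns.toSeq.transpose         // regroup into matrix rows
          sc.parallelize(rows.map(row => Vectors.dense(row.toArray)))
        }

        val dm: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))
        val mat = new RowMatrix(matrixToRDD(dm))
        println(s"${mat.numRows()} x ${mat.numCols()}") // 3 x 2
        spark.stop()
      }
    }
    ```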
    