I'd really like to convert my org.apache.spark.mllib.linalg.Matrix to org.apache.spark.mllib.linalg.distributed.RowMatrix.
I can do it as such:
val
Small correction to the answer's matrixToRDD code: use Vectors.dense instead of new DenseVector:
val vectors = rows.map(row => Vectors.dense(row.toArray))
I suggest that you convert your Matrix to an RDD[Vector], which you can then pass to the RowMatrix constructor.
So, let's consider the following example:
import org.apache.spark.rdd._
import org.apache.spark.mllib.linalg._
val denseData = Seq(
  Vectors.dense(0.0, 1.0, 2.0),
  Vectors.dense(3.0, 4.0, 5.0),
  Vectors.dense(6.0, 7.0, 8.0),
  Vectors.dense(9.0, 0.0, 1.0)
)
val dm: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))
We will need to define a method to convert that Matrix into an RDD[Vector]:
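def matrixToRDD(m: Matrix): RDD[Vector] = {
  // m.toArray is column-major, so each group of numRows values is one column.
  val columns = m.toArray.grouped(m.numRows)
  val rows = columns.toSeq.transpose // Skip this if you want a column-major RDD.
  val vectors = rows.map(row => new DenseVector(row.toArray))
  // sc is the SparkContext, e.g. the one provided by spark-shell.
  sc.parallelize(vectors)
}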
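To see why the grouped/transpose step produces rows, here is a minimal sketch in plain Scala (no Spark required) that applies the same two operations to the column-major array behind dm. The names values, numRows, columns and rowsOfDm are only illustrative and are not part of the answer's code.

```scala
// Column-major backing array of the 3 x 2 matrix dm from the example.
val values = Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0)
val numRows = 3

// Each run of numRows consecutive values is one column of the matrix.
val columns: List[List[Double]] = values.grouped(numRows).map(_.toList).toList
// columns == List(List(1.0, 3.0, 5.0), List(2.0, 4.0, 6.0))

// Transposing the list of columns yields the list of rows.
val rowsOfDm: List[List[Double]] = columns.transpose
// rowsOfDm == List(List(1.0, 2.0), List(3.0, 4.0), List(5.0, 6.0))
```

Skipping the transpose keeps one vector per column, which is what the inline comment in matrixToRDD refers to.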
Now we can apply that conversion to the main Matrix:
import org.apache.spark.mllib.linalg.distributed.RowMatrix
val rows = matrixToRDD(dm)
val mat = new RowMatrix(rows)
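As a quick sanity check, the sketch below rebuilds the same RowMatrix and inspects its dimensions and rows. It is self-contained, so it repeats the conversion (written with Vectors.dense). The local[*] master and the app name are assumptions for running outside spark-shell; inside spark-shell, SparkContext.getOrCreate should simply return the existing context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrices, Matrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD

// Assumption: local master and a made-up app name, for illustration only.
val conf = new SparkConf().setMaster("local[*]").setAppName("matrix-to-rowmatrix-check")
val sc = SparkContext.getOrCreate(conf)

// Same idea as the answer's matrixToRDD, using Vectors.dense.
def matrixToRDD(m: Matrix): RDD[Vector] = {
  val columns = m.toArray.grouped(m.numRows).map(_.toList).toList
  val vectors: Seq[Vector] = columns.transpose.map(row => Vectors.dense(row.toArray))
  sc.parallelize(vectors)
}

val dm: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))
val mat = new RowMatrix(matrixToRDD(dm))

// The distributed matrix should have the same shape as dm: 3 rows, 2 columns.
println(s"${mat.numRows()} x ${mat.numCols()}")

// Each element of mat.rows is one row of dm: (1.0, 2.0), (3.0, 4.0), (5.0, 6.0).
mat.rows.collect().foreach(println)
```

Note that a RowMatrix carries no meaningful row indices, so if later code depends on which row is which, IndexedRowMatrix may be the better target.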