Save Spark org.apache.spark.mllib.linalg.Matrix to a file

-上瘾入骨i 2021-01-03 03:29

The result of a correlation in Spark MLlib is of type org.apache.spark.mllib.linalg.Matrix (see http://spark.apache.org/docs/1.2.1/mllib-statistics.html#correlations). How can this matrix be saved to a file?

4 Answers
  •  礼貌的吻别
    2021-01-03 03:56

    Here is a simple approach that saves the Matrix to HDFS and lets you choose the separator.

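    (The transpose is needed because .toArray returns the entries in column-major order. Transposing first makes the array row-major with respect to the original matrix, so each group of numCols values is one row.)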
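    To see why the transpose matters, here is a small self-contained sketch in plain Scala (no Spark needed) using a 2 x 3 matrix stored column by column, the layout that `.toArray` returns. The object `ColumnMajorDemo` and its helper `rowsFromColumnMajor` are hypothetical names for illustration; the helper rebuilds the rows by index arithmetic, which is the same result `.transpose.toArray.grouped(numCols)` produces. Note that a correlation matrix is symmetric, so for it the row-major and column-major layouts coincide; the transpose only makes a difference for general matrices.

```scala
// Hypothetical demo, not part of Spark: shows how column-major storage
// (the layout of mllib's Matrix.toArray) maps back to rows.
object ColumnMajorDemo {
  // Entry (i, j) of a column-major numRows x numCols matrix is stored
  // at index i + j * numRows.
  def rowsFromColumnMajor(values: Array[Double], numRows: Int, numCols: Int): List[Array[Double]] = {
    require(values.length == numRows * numCols, "size mismatch")
    (0 until numRows).toList.map { i =>
      Array.tabulate(numCols)(j => values(i + j * numRows))
    }
  }

  def main(args: Array[String]): Unit = {
    // The 2 x 3 matrix
    //   1.0 2.0 3.0
    //   4.0 5.0 6.0
    // stored column by column:
    val colMajor = Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0)

    // Grouping the raw column-major array by numCols does NOT give the rows
    val naive = colMajor.grouped(3).map(_.mkString(" ")).toList
    println(naive) // List(1.0 4.0 2.0, 5.0 3.0 6.0)

    // Reading it correctly recovers the rows
    val rows = rowsFromColumnMajor(colMajor, numRows = 2, numCols = 3)
    println(rows.map(_.mkString(" "))) // List(1.0 2.0 3.0, 4.0 5.0 6.0)
  }
}
```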

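    // correlMatrix is an org.apache.spark.mllib.linalg.Matrix,
    // e.g. the result of Statistics.corr(data); sc is the SparkContext
    val localMatrix: List[Array[Double]] = correlMatrix
        .transpose  // Transpose since .toArray is column major
        .toArray
        .grouped(correlMatrix.numCols)  // each group of numCols values is one row
        .toList
    
    // One line per row; change the separator passed to mkString as needed
    val lines: List[String] = localMatrix
        .map(row => row.mkString(" "))
    
    // numSlices = 1 gives a single partition, hence a single part file.
    // Note: saveAsTextFile creates a directory at this path (holding part-00000).
    sc.parallelize(lines, 1)
        .saveAsTextFile("hdfs:///home/user/spark/correlMatrix.txt")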
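    Because an mllib Matrix is a local matrix held on the driver, a plain JVM file write is an alternative when a file on the driver's local filesystem is enough; it goes through the JVM's default filesystem, not HDFS. A minimal sketch under that assumption; `LocalMatrixWriter` and `writeRows` are hypothetical names, not part of Spark:

```scala
import java.io.PrintWriter
import java.nio.file.{Files, Path}

// Hypothetical helper: writes each row on its own line, values joined by sep.
// Uses the local filesystem only; it does not write to HDFS.
object LocalMatrixWriter {
  def writeRows(rows: Seq[Array[Double]], path: Path, sep: String = ","): Unit = {
    val out = new PrintWriter(Files.newBufferedWriter(path))
    try rows.foreach(row => out.println(row.mkString(sep)))
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val rows = List(Array(1.0, 0.5), Array(0.5, 1.0))
    val path = Files.createTempFile("correlMatrix", ".csv")
    writeRows(rows, path)
    println(new String(Files.readAllBytes(path), "UTF-8"))
  }
}
```

    With the localMatrix built in the answer's code, LocalMatrixWriter.writeRows(localMatrix, java.nio.file.Paths.get("correlMatrix.txt"), " ") would write one space-separated row per line. Unlike saveAsTextFile, this produces a single plain file rather than a directory of part files.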
