Converting RDD[org.apache.spark.sql.Row] to RDD[org.apache.spark.mllib.linalg.Vector]

天命终不由人 2021-01-11 21:36

I am relatively new to Spark and Scala.

I am starting with the following DataFrame (a single column holding a dense Vector of Doubles):

scala> v         


        
3 Answers
  • 2021-01-11 21:50
    import org.apache.spark.mllib.linalg.Vectors
    
    scaledDataOnly
       .rdd
       .map{
          row => Vectors.dense(row.getAs[Seq[Double]]("features").toArray)
         }
    
  • 2021-01-11 22:04

    Just found out:

    // Note: the import must be the MLlib Vector, not Scala's immutable Vector;
    // on Spark 2.x+ go through .rdd first to get an RDD[Row].
    import org.apache.spark.mllib.linalg.Vector

    val scaledDataOnly_rdd = scaledDataOnly_pruned.rdd.map { x: Row => x.getAs[Vector](0) }
    
  • 2021-01-11 22:06

    EDIT: use a more sophisticated way to interpret the fields in each Row.

    This worked for me:

    import org.apache.spark.mllib.linalg.Vectors

    // On Spark 2.x+ go through .rdd to get an RDD[Row]; coerce each field
    // to Double, falling back to 0.0 for unrecognized types.
    val featureVectors = features.rdd.map(row => {
      Vectors.dense(row.toSeq.toArray.map({
        case d: Double => d
        case l: Long => l.toDouble
        case s: String => s.toDouble
        case _ => 0.0
      }))
    })
    

    features is a Spark SQL DataFrame.
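    The field-coercion pattern above can be exercised without a Spark cluster. Here is a minimal plain-Scala sketch of the same pattern match, where toDoubles is a hypothetical helper name and a Row's fields are stood in for by a Seq[Any]:

    ```scala
    // Coerce heterogeneous row fields to Double, mirroring the match above.
    // Unrecognized types (e.g. Boolean) fall back to 0.0.
    def toDoubles(row: Seq[Any]): Array[Double] =
      row.map {
        case d: Double => d
        case l: Long   => l.toDouble
        case s: String => s.toDouble
        case _         => 0.0
      }.toArray

    // e.g. toDoubles(Seq(1.5, 2L, "3.0", true)) yields Array(1.5, 2.0, 3.0, 0.0)
    ```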
