Converting RDD[org.apache.spark.sql.Row] to RDD[org.apache.spark.mllib.linalg.Vector]

天命终不由人 2021-01-11 21:36

I am relatively new to Spark and Scala.

I am starting with the following DataFrame (a single column holding a dense Vector of Doubles):

scala> v         


        
3 Answers
  •  星月不相逢
    2021-01-11 22:06

    EDIT: use a more sophisticated way to interpret the fields in a Row.

    This worked for me:

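    import org.apache.spark.mllib.linalg.Vectors

    // Go through .rdd so the result is an RDD[Vector] on Spark 2.x as well
    val featureVectors = features.rdd.map(row => {
      // Convert every field of the Row to a Double
      Vectors.dense(row.toSeq.toArray.map({
        case s: String => s.toDouble
        case l: Long => l.toDouble
        case d: Double => d   // keep Double fields instead of zeroing them
        case _ => 0.0         // any other type falls back to 0.0
      }))
    })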
    

    Here, features is a Spark SQL DataFrame.
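
    The per-field pattern match can also be pulled out into a small helper, so it can be checked without a Spark cluster. This is only a sketch: the names RowFields, fieldToDouble and fieldsToDoubles are made up for illustration and are not part of Spark, and treating unparseable strings and nulls as a default value is an assumption that may not suit every dataset.

    ```scala
    import scala.util.Try

    object RowFields {
      // Hypothetical helper (not a Spark API): interpret one Row field as a Double.
      // Anything that cannot be read as a number falls back to `default`.
      def fieldToDouble(field: Any, default: Double = 0.0): Double = field match {
        case d: Double => d
        case f: Float  => f.toDouble
        case l: Long   => l.toDouble
        case i: Int    => i.toDouble
        // Unlike a bare s.toDouble, a malformed string does not throw here
        case s: String => Try(s.trim.toDouble).getOrElse(default)
        case _         => default // null, Boolean, nested types, ...
      }

      // Convert all fields of one row, given as a Seq (for example from row.toSeq).
      def fieldsToDoubles(fields: Seq[Any]): Array[Double] =
        fields.map(fieldToDouble(_)).toArray
    }
    ```

    With Spark on the classpath, the body of the map would then shrink to Vectors.dense(RowFields.fieldsToDoubles(row.toSeq)).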
