Convert DataFrame back to RDD of case class in Spark

有刺的猬 2021-01-22 16:37

I am trying to convert a DataFrame of multiple case classes to an RDD of these case classes. I can't find any solution. This WrappedArray has driven me cra

2 Answers
  • 2021-01-22 17:16

    You can convert indirectly using Dataset[randomClass3]:

    aDF.select($"_2.*").as[randomClass3].rdd
    

    A Spark DataFrame (Dataset[Row]) represents data as Row objects, using the type mapping described in the Spark SQL, DataFrames and Datasets Guide. Any call to getAs should use this mapping.

    For the second column, which is struct<a: string, b: string>, it would be a Row as well:

    aDF.rdd.map { _.getAs[Row]("_2") }
    

    As Tzach Zohar commented, to get back the full RDD you'll need:

    aDF.as[(randomClass2, randomClass3)].rdd 
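
The three approaches above can be put side by side in a minimal sketch. The field names of randomClass3 (a, b) follow the struct<a: string, b: string> schema mentioned in this answer; the fields of randomClass2, the sample data, and the helper names (sampleDF, viaDataset, viaRows, fullRdd) are assumptions made only for illustration, since the original case classes are not shown in the thread.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Hypothetical stand-ins for the question's case classes.
// randomClass3 mirrors struct<a: string, b: string>; randomClass2 is invented.
case class randomClass2(x: String, y: String)
case class randomClass3(a: String, b: String)

object DataFrameToCaseClassRdd {

  // Builds a small DataFrame with two struct columns named _1 and _2.
  def sampleDF(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (randomClass2("x1", "y1"), randomClass3("a1", "b1")),
      (randomClass2("x2", "y2"), randomClass3("a2", "b2"))
    ).toDF()
  }

  // Approach 1: expand the struct column and let the encoder rebuild the case class.
  def viaDataset(spark: SparkSession, df: DataFrame): RDD[randomClass3] = {
    import spark.implicits._
    df.select($"_2.*").as[randomClass3].rdd
  }

  // Approach 2: stay on the Row API. The struct column comes back as a Row,
  // so the case class has to be constructed by hand.
  def viaRows(df: DataFrame): RDD[randomClass3] =
    df.rdd.map { row =>
      val r = row.getAs[Row]("_2")
      randomClass3(r.getAs[String]("a"), r.getAs[String]("b"))
    }

  // Approach 3: decode both columns at once into a tuple of case classes.
  def fullRdd(spark: SparkSession, df: DataFrame): RDD[(randomClass2, randomClass3)] = {
    import spark.implicits._
    df.as[(randomClass2, randomClass3)].rdd
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("df-to-case-class-rdd")
      .getOrCreate()

    val aDF = sampleDF(spark)
    viaDataset(spark, aDF).collect().foreach(println)
    viaRows(aDF).collect().foreach(println)
    fullRdd(spark, aDF).collect().foreach(println)

    spark.stop()
  }
}
```

The case classes are declared at the top level rather than inside a method because Spark's implicit encoders generally cannot be derived for classes defined in a method body.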
    
  • 2021-01-22 17:38

    I don't know the Scala API, but have you considered the rdd value?

    Maybe something like :

    aDF.rdd.map { case r: Row => r.getAs[randomClass3]("_2") }
    