Convert Dataframe back to RDD of case class in Spark

有刺的猬 2021-01-22 16:37

I am trying to convert a DataFrame of multiple case classes to an RDD of these case classes. I can't find any solution. This WrappedArray has driven me cra

2 Answers
  •  礼貌的吻别
    2021-01-22 17:16

    You can convert indirectly using Dataset[randomClass3]:

    aDF.select($"_2.*").as[randomClass3].rdd
    

    A Spark DataFrame (Dataset[Row]) represents data as Row objects, using the type mapping described in the Spark SQL, DataFrames and Datasets Guide. Any call to getAs should use this mapping.

    For the second column, which is a struct, the value is a Row as well:

    aDF.rdd.map { _.getAs[Row]("_2") }
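
    The question does not show the case class definitions, so the following is only a sketch with hypothetical fields for randomClass1, randomClass2 and randomClass3. It illustrates the mapping in practice: a struct column is read with getAs[Row], and an array-of-struct field is read with getAs[Seq[Row]] (the WrappedArray from the question is a Seq), after which each Row can be turned back into a case class by hand. The object name RowMapping and the helpers toClass2 and toClass3 are made up for this example.

    ```scala
    import org.apache.spark.sql.{Row, SparkSession}

    // Hypothetical stand-ins for the question's case classes (their real fields are not shown).
    case class randomClass1(a: String, b: Double)
    case class randomClass2(a: String, b: Seq[randomClass1])
    case class randomClass3(a: String, b: String)

    object RowMapping {
      // An array-of-struct field comes back as Seq[Row], so each element is mapped by hand.
      def toClass2(r: Row): randomClass2 =
        randomClass2(
          r.getAs[String]("a"),
          r.getAs[Seq[Row]]("b").map(x => randomClass1(x.getAs[String]("a"), x.getAs[Double]("b")))
        )

      // A plain struct with two string fields.
      def toClass3(r: Row): randomClass3 =
        randomClass3(r.getAs[String]("a"), r.getAs[String]("b"))

      def run(): Seq[(randomClass2, randomClass3)] = {
        val spark = SparkSession.builder().master("local[1]").appName("row-mapping").getOrCreate()
        import spark.implicits._
        try {
          // A DataFrame built from tuples gets the default column names _1 and _2.
          val aDF = Seq(
            (randomClass2("k", Seq(randomClass1("x", 1.0), randomClass1("y", 2.0))), randomClass3("p", "q"))
          ).toDF()

          aDF.rdd
            .map(row => (toClass2(row.getAs[Row]("_1")), toClass3(row.getAs[Row]("_2"))))
            .collect()
            .toSeq
        } finally spark.stop()
      }
    }
    ```

    This manual route is verbose and easy to get wrong when the schema changes, which is why the Dataset-based conversions in this answer are usually preferable.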
    

    As commented by Tzach Zohar, to get back a full RDD you'll need:

    aDF.as[(randomClass2, randomClass3)].rdd 
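
    For reference, here is a minimal end-to-end sketch of both Dataset-based conversions. The case class fields are assumptions (the question does not show them), and the object name DatasetRoundTrip is made up for this example. The case classes are declared at the top level rather than inside the method, which is generally needed for Spark to derive encoders for them.

    ```scala
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.SparkSession

    // Hypothetical stand-ins for the question's case classes (their real fields are not shown).
    case class randomClass1(a: String, b: Double)
    case class randomClass2(a: String, b: Seq[randomClass1])
    case class randomClass3(a: String, b: String)

    object DatasetRoundTrip {
      def run(): (Seq[randomClass3], Seq[(randomClass2, randomClass3)]) = {
        val spark = SparkSession.builder().master("local[1]").appName("round-trip").getOrCreate()
        import spark.implicits._ // brings the encoders and the $"..." column syntax into scope
        try {
          // A DataFrame built from tuples gets the default column names _1 and _2.
          val aDF = Seq(
            (randomClass2("k", Seq(randomClass1("x", 1.0), randomClass1("y", 2.0))), randomClass3("p", "q"))
          ).toDF()

          // Flatten the struct in _2 into top-level columns a and b, then decode by field name.
          val secondOnly: RDD[randomClass3] = aDF.select($"_2.*").as[randomClass3].rdd

          // Decode both struct columns at once into a tuple of case classes.
          val full: RDD[(randomClass2, randomClass3)] = aDF.as[(randomClass2, randomClass3)].rdd

          (secondOnly.collect().toSeq, full.collect().toSeq)
        } finally spark.stop()
      }
    }
    ```

    With the encoders doing the decoding, the nested Seq[randomClass1] is rebuilt automatically, so no manual handling of Row or WrappedArray is required.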
    
