DataFrame to RDD[(String, String)] conversion

不知归路 2021-01-26 13:32

I want to convert an org.apache.spark.sql.DataFrame to org.apache.spark.rdd.RDD[(String, String)] in Databricks. Can anyone help?

1 Answer
  •  小蘑菇
     2021-01-26 14:22

    You can use df.map(row => ...) followed by .rdd to convert the DataFrame to an RDD, mapping each row to the element type you want.

    For example:

    val df = Seq(("table1", 432),
          ("table2", 567),
          ("table3", 987),
          ("table1", 789)).
          toDF("tablename", "Code")
    
        df.show()
    
        +---------+----+
        |tablename|Code|
        +---------+----+
        |   table1| 432|
        |   table2| 567|
        |   table3| 987|
        |   table1| 789|
        +---------+----+
    
        val rdd = df.rdd.map(r => (r(0).toString, r(1).toString)) // Type: RDD[(String,String)]
    
        OR
    
        val rdd = df.map(r => (r(0).toString, r(1).toString)).rdd // Type: RDD[(String,String)]

    Note that mapping to the raw values, as in df.map(r => (r(0), r(1))), will not compile with the Dataset API, because r(0) and r(1) are typed Any and Spark has no encoder for Any. Convert each field to a concrete type (here, String) inside the map.
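    If you prefer typed access over positional indexing, you can also select the columns and convert the DataFrame to a typed Dataset first. This is a sketch using the column names from the example above:

        import spark.implicits._  // assumes an active SparkSession named `spark`

        val typedRdd = df.select("tablename", "Code")
          .as[(String, Int)]                              // typed Dataset[(String, Int)]
          .rdd
          .map { case (name, code) => (name, code.toString) } // RDD[(String, String)]

    The .as[(String, Int)] call fails fast at analysis time if the column types don't match, which is usually easier to debug than a runtime ClassCastException from r(1).toString.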
    

    Please refer to https://community.hortonworks.com/questions/106500/error-in-spark-streaming-kafka-integration-structu.html regarding AnalysisException: Queries with streaming sources must be executed with writeStream.start()

    You need to wait for the termination of the query using query.awaitTermination() to prevent the process from exiting while the query is active.
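    For the streaming case, the usual pattern looks like the sketch below. The console sink and streamingDf are placeholders, not from the original question:

        // streamingDf: a DataFrame backed by a streaming source (e.g. Kafka)
        val query = streamingDf.writeStream
          .format("console")      // placeholder sink for illustration
          .outputMode("append")
          .start()                // required for streaming sources

        query.awaitTermination()  // block so the driver doesn't exit while the query runs

    Without the awaitTermination() call, the driver process can return immediately and tear down the query before it produces any output.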
