How to convert an RDD object to a DataFrame in Spark

慢半拍i 2020-11-22 14:59

How can I convert an RDD (org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]) to a DataFrame (org.apache.spark.sql.DataFrame)? I converted a datafram…

11 answers
  •  栀梦 (OP)
    2020-11-22 15:30

    Suppose you have a DataFrame and want to modify its field data by converting it to an RDD[Row].

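    import org.apache.spark.sql.Row
    val aRdd = aDF.rdd.map(x => Row(x.getAs[Long]("id"), x.getAs[Seq[String]]("role").head)) // .rdd yields RDD[Row]; array columns come back as Seq, not List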
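A minimal, self-contained sketch of this first step, assuming Spark 3.x on Scala 2.13 with a local SparkSession. The session setup, the sample id/role data and the name rows are illustrative assumptions, not part of the answer. On Spark 2.x and later, DataFrame.map returns a Dataset and needs an Encoder for its result, so the sketch goes through .rdd to get a plain RDD[Row].

```scala
//> using scala 2.13
//> using dep org.apache.spark::spark-sql:3.5.1
// Assumption: the Spark and Scala versions above are illustrative.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

// Local session for this sketch only.
val spark = SparkSession.builder().master("local[1]").appName("df-to-rdd-sketch").getOrCreate()
import spark.implicits._

// Hypothetical input: a Long id column and an array-of-strings role column.
val aDF = Seq((1L, Seq("admin", "dev")), (2L, Seq("user"))).toDF("id", "role")

// .rdd turns the DataFrame into RDD[Row]; the map keeps only the first role.
val aRdd: RDD[Row] = aDF.rdd.map(x => Row(x.getAs[Long]("id"), x.getAs[Seq[String]]("role").head))

// Collect before stopping the session so the rows remain available.
val rows = aRdd.collect().toList
spark.stop()
```

Each resulting Row now holds a Long and a String, which is exactly what the schema in the next step has to describe.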
    

    To convert the RDD back to a DataFrame, we need to define its schema: a StructType describing the fields of each Row.

    Each Scala type maps to a Spark SQL type in the schema: a Long field becomes LongType, and a String field becomes StringType.

    import org.apache.spark.sql.types._
    val aStruct = new StructType(Array(StructField("id", LongType, nullable = true), StructField("role", StringType, nullable = true)))
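The same schema can also be built field by field with StructType.add, whose two-argument form sets nullable to true. A small sketch, assuming only spark-sql on the classpath (no session is needed for schema objects); the name bySteps is illustrative:

```scala
//> using scala 2.13
//> using dep org.apache.spark::spark-sql:3.5.1
import org.apache.spark.sql.types._

// The schema from the answer: both columns nullable.
val aStruct = new StructType(Array(
  StructField("id", LongType, nullable = true),
  StructField("role", StringType, nullable = true)))

// Equivalent schema built incrementally; add(name, dataType) defaults nullable to true.
val bySteps = new StructType().add("id", LongType).add("role", StringType)

// Field order matters: it must match the order of the values inside each Row.
println(bySteps.treeString)
```

Either form works; the incremental one is handy when the field list is assembled in a loop.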
    

    Now you can convert the RDD to a DataFrame with the createDataFrame method.

    val aNamedDF = sqlContext.createDataFrame(aRdd,aStruct)
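On Spark 2.x and later the same call is also available on SparkSession as createDataFrame(rowRDD, schema). A minimal end-to-end sketch, assuming a local session and hand-made sample rows (the values and the names names/roles are illustrative). Row values are checked against the schema lazily, so a mismatch such as an Int where LongType is declared typically only fails once an action like collect runs.

```scala
//> using scala 2.13
//> using dep org.apache.spark::spark-sql:3.5.1
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[1]").appName("rdd-to-df-sketch").getOrCreate()

// Illustrative RDD[Row]: each Row holds a Long, then a String, matching the schema below.
val aRdd: RDD[Row] = spark.sparkContext.parallelize(Seq(Row(1L, "admin"), Row(2L, "user")))

val aStruct = new StructType(Array(
  StructField("id", LongType, nullable = true),
  StructField("role", StringType, nullable = true)))

// SparkSession.createDataFrame(RDD[Row], StructType) attaches the schema to the rows.
val aNamedDF: DataFrame = spark.createDataFrame(aRdd, aStruct)

// Read the column names and data back before stopping the session.
val names = aNamedDF.columns.toList
val roles = aNamedDF.orderBy("id").collect().map(_.getAs[String]("role")).toList
spark.stop()
```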
    
