How can I convert an RDD (org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]) to a DataFrame (org.apache.spark.sql.DataFrame)?
I converted a DataFrame to an RDD and now want to convert it back.
Suppose you have a DataFrame and you want to modify the data in its fields by converting it to an RDD[Row].
// .rdd yields an RDD[Row]; array columns are read back as Seq, not List
val aRdd = aDF.rdd.map(x => Row(x.getAs[Long]("id"), x.getAs[Seq[String]]("role").head))
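The .head call above throws if a row's role array is empty. Below is a minimal sketch of a more defensive row mapper. The object FirstRole and its method firstRole are hypothetical names. The sketch also assumes that column 0 holds the Long id and column 1 holds the array of roles, and it reads them by position rather than by name.

```scala
import org.apache.spark.sql.Row

object FirstRole {
  // Hypothetical helper: keeps the id and only the first role of each row.
  // Assumes column 0 is a Long id and column 1 is an array<string> of roles.
  def firstRole(x: Row): Row = {
    val roles = x.getSeq[String](1)
    // headOption avoids an exception on an empty array; a null array also yields null
    val first: String = if (roles == null) null else roles.headOption.orNull
    Row(x.getLong(0), first)
  }
}
```

With that column layout, the mapping step becomes aDF.rdd.map(FirstRole.firstRole), and empty role lists become null instead of failing the job.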
To convert the RDD back to a DataFrame, we need to define its structure (schema) as a StructType.
Each field's Scala type maps to a Spark SQL type in the schema: a Long field becomes LongType, and a String field becomes StringType.
import org.apache.spark.sql.types._
val aStruct = new StructType(Array(StructField("id", LongType, nullable = true), StructField("role", StringType, nullable = true)))
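If you prefer not to spell out the Array of StructFields, you can build the same schema incrementally with StructType's add(name, dataType) method. That method defaults nullable to true. The sketch below is small, and the object name SchemaBuilderExample is hypothetical.

```scala
import org.apache.spark.sql.types._

object SchemaBuilderExample {
  // Builder style: each add(name, dataType) appends a nullable StructField.
  val aStructBuilt: StructType = new StructType()
    .add("id", LongType)
    .add("role", StringType)

  // Explicit style, as in the answer, for comparison.
  val aStruct: StructType = new StructType(Array(
    StructField("id", LongType, nullable = true),
    StructField("role", StringType, nullable = true)))
}
```

Both values describe the same two nullable columns, so either one can be passed to createDataFrame.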
Now you can convert the RDD to a DataFrame with SQLContext's createDataFrame method, passing the RDD and the schema:
val aNamedDF = sqlContext.createDataFrame(aRdd,aStruct)
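The next sketch is for readers on Spark 2.x or later, where SparkSession is the entry point, and shows the whole round trip end to end. The sample data, the object name RddToDataFrameExample and the local master setting are assumptions made for illustration. Here spark.createDataFrame(rowRDD, schema) plays the role of sqlContext.createDataFrame.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object RddToDataFrameExample {
  // Builds a sample DataFrame, round-trips it through RDD[Row], and rebuilds a DataFrame.
  def convert(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical sample data: an id plus an array of roles per row.
    val aDF = Seq((1L, Seq("admin", "dev")), (2L, Seq("ops"))).toDF("id", "role")

    // DataFrame -> RDD[Row], keeping only the first role of each row.
    val aRdd = aDF.rdd.map(x => Row(x.getAs[Long]("id"), x.getAs[Seq[String]]("role").head))

    // Schema describing the Rows produced above.
    val aStruct = new StructType(Array(
      StructField("id", LongType, nullable = true),
      StructField("role", StringType, nullable = true)))

    // RDD[Row] + schema -> DataFrame (SparkSession instead of sqlContext).
    spark.createDataFrame(aRdd, aStruct)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x or later, running locally.
    val spark = SparkSession.builder().appName("rdd-to-dataframe").master("local[*]").getOrCreate()
    try convert(spark).show()
    finally spark.stop()
  }
}
```

Mapping over aDF.rdd rather than aDF keeps the lambda on an RDD[Row], so no Encoder for Row is needed. The Row objects you build must line up with the schema's field order and types, or the failure will only surface when the DataFrame is evaluated.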