The Spark documentation shows how to create a DataFrame from an RDD, using Scala case classes to infer a schema. I am trying to reproduce this approach using sqlContext.
All you need is:
val dogDF = sqlContext.createDataFrame(dogRDD)
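For context, a minimal sketch of the setup this assumes: a Dog case class with a single name field (my assumption about its shape), defined at the top level so Spark's reflection can see it.

case class Dog(name: String)  // assumed shape; define outside any method

val dogRDD = sc.parallelize(Seq(Dog("Rex"), Dog("Fido")))
val dogDF = sqlContext.createDataFrame(dogRDD)
dogDF.printSchema()
// root
//  |-- name: string (nullable = true)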
The second parameter is part of the Java API and expects your class to follow the Java Beans convention (getters/setters). A case class doesn't follow this convention, so no properties are detected, which leads to an empty DataFrame with no columns.
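If you actually wanted the two-argument, bean-based overload, the class would need Java-style getters/setters and a no-arg constructor. A sketch, using a hypothetical DogBean class (not from the question):

import scala.beans.BeanProperty

// @BeanProperty generates getName/setName, satisfying the bean convention
class DogBean(@BeanProperty var name: String) extends Serializable {
  def this() = this(null)  // no-arg constructor required by the convention
}

val beanRDD = sc.parallelize(Seq(new DogBean("Rex"), new DogBean("Fido")))
val dogDF = sqlContext.createDataFrame(beanRDD, classOf[DogBean])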
The case class approach won't work in cluster mode; it will throw a ClassNotFoundException for the case class you defined.
Convert it to an RDD[Row], define the schema of your RDD with StructField, and then call createDataFrame, like this:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val rdd = data.map { attrs => Row(attrs(0), attrs(1)) }
val rddStruct = StructType(Array(
  StructField("id", StringType, nullable = true),
  StructField("pos", StringType, nullable = true)))
sqlContext.createDataFrame(rdd, rddStruct)
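As a quick sanity check, printing the schema of the result should show the two string columns declared above:

sqlContext.createDataFrame(rdd, rddStruct).printSchema()
// root
//  |-- id: string (nullable = true)
//  |-- pos: string (nullable = true)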
toDF() won't work either.
You can create a DataFrame directly from a Seq of case class instances using toDF (after import sqlContext.implicits._), as follows:
val dogDf = Seq(Dog("Rex"), Dog("Fido")).toDF
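Assuming Dog has the single name field as above, this yields:

dogDf.show()
// +----+
// |name|
// +----+
// | Rex|
// |Fido|
// +----+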