How to convert a case-class-based RDD into a DataFrame?

暗喜 2021-01-04 08:33

The Spark documentation shows how to create a DataFrame from an RDD, using Scala case classes to infer a schema. I am trying to reproduce this concept using sqlContext

3 Answers
  • 2021-01-04 08:38

    All you need is:

    val dogDF = sqlContext.createDataFrame(dogRDD)
    

    The second parameter is part of the Java API and expects your class to follow the JavaBeans convention (getters/setters). Your case class doesn't follow this convention, so no properties are detected, which leads to an empty DataFrame with no columns.
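
To see why no properties are detected, here is a minimal sketch that inspects a case class with the standard `java.beans.Introspector`, without involving Spark at all. It assumes the bean-based overload discovers columns through ordinary JavaBeans introspection; the field name `name`, the `BeanDog` class, and the `BeanCheck` helper are hypothetical and only serve the illustration.

```scala
import java.beans.Introspector
import scala.beans.BeanProperty

// Hypothetical case class; the field name `name` is an assumption.
// Scala generates an accessor called name(), not getName().
case class Dog(name: String)

// Hypothetical bean-style counterpart: @BeanProperty adds getName().
class BeanDog(@BeanProperty val name: String)

object BeanCheck {
  // Names of the readable JavaBeans properties declared on a class,
  // ignoring anything inherited from java.lang.Object (e.g. "class").
  def beanProperties(cls: Class[_]): Seq[String] =
    Introspector.getBeanInfo(cls, classOf[Object])
      .getPropertyDescriptors
      .filter(_.getReadMethod != null)
      .map(_.getName)
      .toSeq

  def main(args: Array[String]): Unit = {
    println(beanProperties(classOf[Dog]))     // case class: no bean getters
    println(beanProperties(classOf[BeanDog])) // bean-style class: getName()
  }
}
```

The case class exposes no getter in the `getX` form, so introspection finds nothing to turn into a column, while the annotated class exposes one property.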

  • 2021-01-04 08:49

    The case class approach may not work in cluster mode: it can throw a ClassNotFoundException for the case class you defined.

    Instead, convert it to an RDD[Row], define the schema of your RDD with StructType and StructField, and then call createDataFrame, like this:

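    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    // Wrap the first two fields of each record in a Row
    val rdd = data.map { attrs => Row(attrs(0), attrs(1)) }

    // Explicit schema: two nullable string columns
    val rddStruct = new StructType(Array(StructField("id", StringType, nullable = true), StructField("pos", StringType, nullable = true)))

    sqlContext.createDataFrame(rdd, rddStruct)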
    

    toDF() won't work either.

  • 2021-01-04 08:50

    You can create a DataFrame directly from a Seq of case class instances using toDF as follows:

    val dogDf = Seq(Dog("Rex"), Dog("Fido")).toDF
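
For `toDF` to resolve, the implicit conversions of your `SQLContext` instance have to be in scope, and the case class should be defined at the top level rather than inside the method that uses it. A minimal sketch, assuming a `Dog` case class with a single `name` field (the field name, the `DogFrames` object, and the `dogsToDF` helper are hypothetical):

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Defined at the top level, not inside a method.
// The field name `name` is an assumption.
case class Dog(name: String)

object DogFrames {
  // Hypothetical helper: builds a one-column DataFrame from dog names.
  def dogsToDF(sqlContext: SQLContext, names: Seq[String]): DataFrame = {
    // Brings the implicit conversion that provides toDF into scope
    import sqlContext.implicits._
    names.map(name => Dog(name)).toDF()
  }
}
```

Without the `implicits` import, the compiler typically reports that `toDF` is not a member of `Seq[Dog]`.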
    