Problems to create DataFrame from Rows containing Option[T]

Asked 2021-01-22 19:27 by 猫巷女王i · 2 answers · 630 views

I'm migrating some code from Spark 1.6 to Spark 2.1 and struggling with the following issue:

This worked perfectly in Spark 1.6

import org.apache.spark.         
2 Answers
  • 2021-01-22 19:53

    The error message is clear: a Some is being passed where a bigint is required:

    scala.Some is not a valid external type for schema of bigint
    

    So you need to combine Option with getOrElse, which supplies null when the Option is empty. The following code should work for you:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{LongType, StructField, StructType}

    val sc = ss.sparkContext
    val sqlContext = ss.sqlContext
    val schema = StructType(Seq(StructField("i", LongType, nullable = true)))
    val rows = sc.parallelize(Seq(Row(Option(1L).getOrElse(null))))
    sqlContext.createDataFrame(rows, schema).show
    

    I hope this answer is helpful
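    The `getOrElse(null)` call is the key step: `Row` expects the plain, possibly-null value, not the `Option` wrapper. A minimal pure-Scala sketch of that unwrapping (no Spark required; the values here are illustrative):

    ```scala
    // Unwrap each Option to the raw value Row expects, substituting null for None.
    // (Option.orNull would be more idiomatic, but it does not compile for
    // primitive element types such as Long, so getOrElse(null) is used instead.)
    val values: Seq[Option[Long]] = Seq(Some(1L), None, Some(3L))
    val unwrapped: Seq[Any] = values.map(_.getOrElse(null))
    // unwrapped == Seq(1L, null, 3L)
    ```

    Each unwrapped element can then be placed directly in a `Row` against a schema whose fields are declared `nullable = true`.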

  • 2021-01-22 20:04

    There is actually a JIRA ticket about this, SPARK-19056, which was resolved as not a bug.

    So this behavior is intentional.

    Allowing Option in Row was never documented, and it causes a lot of trouble when the encoder framework is applied to all typed operations. Since Spark 2.0, please use Dataset for typed operations and custom objects, e.g.:

    import spark.implicits._ // assumes a SparkSession in scope as `spark`

    val ds = Seq(1 -> None, 2 -> Some("str")).toDS
    ds.toDF // schema: <_1: int, _2: string>
    
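    The same None/Some handling works through a case class, which gives the columns meaningful names. A hedged sketch (assumes a SparkSession in scope named `spark`; `Record` is a hypothetical class name, not from the original thread):

    ```scala
    // Hypothetical example: Option fields in a case class become nullable columns.
    // Assumes a SparkSession is available as `spark`.
    import spark.implicits._

    case class Record(id: Int, label: Option[String])

    val ds = Seq(Record(1, None), Record(2, Some("str"))).toDS
    ds.toDF.printSchema
    // `label` is inferred as a nullable string column; None rows become null.
    ```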