Problems creating a DataFrame from Rows containing Option[T]

Submitted by 半腔热情 on 2019-12-02 01:29:03

There is a JIRA ticket about this, SPARK-19056, but it turns out not to be a bug.

So this behavior is intentional.

Allowing Option in Row was never documented and causes a lot of trouble when the encoder framework is applied to all typed operations. Since Spark 2.0, please use Dataset for typed operations and custom objects, e.g.:

// assumes a SparkSession named spark is in scope
import spark.implicits._
val ds = Seq(1 -> None, 2 -> Some("str")).toDS
ds.toDF // schema: <_1: int, _2: string>

The error message states the problem clearly: a scala.Some was supplied where the schema requires a bigint:

scala.Some is not a valid external type for schema of bigint
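To see why the Some is rejected, here is a simplified model of that check. It is a hypothetical sketch, not Spark's actual code: it assumes a LongType (bigint) column accepts only a boxed java.lang.Long or null, and the names ExternalTypeCheck and isValidForLong are made up for illustration.

```scala
object ExternalTypeCheck {
  // Hypothetical, simplified model of the runtime check for a LongType (bigint)
  // column: a boxed java.lang.Long or null passes, anything else is rejected.
  def isValidForLong(value: Any): Boolean = value match {
    case null              => true
    case _: java.lang.Long => true
    case _                 => false
  }

  def main(args: Array[String]): Unit = {
    // The Option wrapper itself is not a Long, so it fails
    println(isValidForLong(Some(1L)))
    // getOrElse(null) unwraps Some(1L) to a boxed Long, so it passes
    println(isValidForLong(Option(1L).getOrElse(null)))
    // None.getOrElse(null) yields null, which a nullable column accepts
    println(isValidForLong(None.getOrElse(null)))
  }
}
```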

So you need to unwrap the Option with getOrElse(null), so that a Some(v) becomes the plain value v and a None becomes null in the Row. The following code should work for you:

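import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// ss is an existing SparkSession
val sc = ss.sparkContext
val sqlContext = ss.sqlContext
val schema = StructType(Seq(StructField("i", LongType, nullable = true)))
// unwrap the Option: Some(1L) becomes 1L (a None would become null)
val rows = sc.parallelize(Seq(Row(Option(1L).getOrElse(null))))
sqlContext.createDataFrame(rows, schema).show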
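When a row has several Option fields, calling getOrElse(null) on each one gets repetitive. A minimal sketch, assuming the values arrive as a plain Seq[Any]: the hypothetical helper unwrapOptions (not part of Spark) replaces Some(v) with v and None with null and leaves every other value untouched.

```scala
object OptionUnwrap {
  // Hypothetical helper (not part of Spark): replace Some(v) with v and None with null,
  // so each value matches what a nullable column in the schema expects.
  // Only one level of Option is unwrapped.
  def unwrapOptions(values: Seq[Any]): Seq[Any] =
    values.map {
      case Some(v) => v
      case None    => null
      case other   => other
    }

  def main(args: Array[String]): Unit = {
    val raw = Seq(Option(1L), None, "plain")
    println(unwrapOptions(raw)) // List(1, null, plain)
  }
}
```

With Spark on the classpath, the result could then be turned into a row with Row.fromSeq(OptionUnwrap.unwrapOptions(values)).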

I hope this answer is helpful.
