Question
This question is a follow-up to "How to store custom objects in Dataset?"
Spark version: 3.0.1
Storing a non-nested custom type works:
import spark.implicits._
import org.apache.spark.sql.{Encoder, Encoders}
class AnObj(val a: Int, val b: String)
implicit val myEncoder: Encoder[AnObj] = Encoders.kryo[AnObj]
val d = spark.createDataset(Seq(new AnObj(1, "a")))
d.printSchema
root
|-- value: binary (nullable = true)
However, if the custom type is nested inside a product type (i.e. a case class), it gives an error:
java.lang.UnsupportedOperationException: No Encoder found for InnerObj
import spark.implicits._
import org.apache.spark.sql.{Encoder, Encoders}
class InnerObj(val a: Int, val b: String)
case class MyObj(val i: Int, val j: InnerObj)
implicit val myEncoder: Encoder[InnerObj] = Encoders.kryo[InnerObj]
// fails at runtime:
val d = spark.createDataset(Seq(new MyObj(1, new InnerObj(0, "a"))))
// java.lang.UnsupportedOperationException: No Encoder found for InnerObj
How can we create a Dataset with a nested custom type?
Answer 1:
Adding the encoders for both MyObj and InnerObj should make it work.
import org.apache.spark.sql.{Encoder, Encoders}
class InnerObj(val a: Int, val b: String)
case class MyObj(val i: Int, j: InnerObj)
implicit val myEncoder: Encoder[InnerObj] = Encoders.kryo[InnerObj]
implicit val objEncoder: Encoder[MyObj] = Encoders.kryo[MyObj]
The above snippet compiles and runs fine.
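As a rough illustration of where this answer leads, the sketch below builds the Dataset with a Kryo encoder for the outer type and reads one element back. The error in the question suggests that the case-class encoder from spark.implicits._ does not pick up the implicit Encoder[InnerObj] for the nested field, which would explain why the outer type needs its own encoder. The sketch passes that encoder explicitly instead of relying on implicit resolution. The names KryoNestedSketch and buildDataset and the local SparkSession settings are assumptions made for the example and are not part of the answer.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Same shapes as in the question, declared at top level.
class InnerObj(val a: Int, val b: String)
case class MyObj(i: Int, j: InnerObj)

object KryoNestedSketch {
  // Hypothetical helper: encodes the whole MyObj (InnerObj included) with Kryo.
  def buildDataset(spark: SparkSession): Dataset[MyObj] = {
    val objEncoder: Encoder[MyObj] = Encoders.kryo[MyObj]
    // The encoder is passed explicitly, so no implicit lookup is involved.
    spark.createDataset(Seq(MyObj(1, new InnerObj(0, "a"))))(objEncoder)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("kryo-nested-sketch")
      .getOrCreate()

    val ds = buildDataset(spark)
    ds.printSchema() // one binary column named "value", as in the question

    // Collecting deserializes the Kryo bytes back into MyObj instances.
    val restored = ds.collect().head
    println(s"${restored.i}, ${restored.j.a}, ${restored.j.b}")

    spark.stop()
  }
}
```

Note that the whole object ends up in that single binary column, so fields such as i are not available to column expressions. Typed operations that take Scala functions still see the deserialized MyObj.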
Answer 2:
Another solution apart from sujesh's:
import spark.implicits._
import org.apache.spark.sql.{Encoder, Encoders}
class InnerObj(val a: Int, val b: String)
case class MyObj[T](val i: Int, val j: T)
implicit val myEncoder: Encoder[MyObj[InnerObj]] = Encoders.kryo[MyObj[InnerObj]]
// works
val d = spark.createDataset(Seq(new MyObj(1, new InnerObj(0, "a"))))
This also shows the difference between the case where the inner type can be deduced from the type parameter and the case where it cannot.
In the former case, a single encoder for the parameterized type is enough:
implicit val myEncoder: Encoder[MyObj[InnerObj]] = Encoders.kryo[MyObj[InnerObj]]
In the latter case, define an encoder for each type:
implicit val myEncoder1: Encoder[InnerObj] = Encoders.kryo[InnerObj]
implicit val myEncoder2: Encoder[MyObj] = Encoders.kryo[MyObj]
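Both approaches above serialize the whole outer object with Kryo, so the Dataset has only the single binary column seen in the question's printSchema output. If the Int field should remain a regular column, one possible variation (not proposed in either answer) is to store the data as a tuple and combine a primitive encoder with a Kryo encoder through Encoders.tuple. The sketch below assumes the non-generic MyObj from the question. The names TupleEncoderSketch, pairEncoder and buildDataset and the local session settings are hypothetical.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Same shapes as in the question, declared at top level.
class InnerObj(val a: Int, val b: String)
case class MyObj(i: Int, j: InnerObj)

object TupleEncoderSketch {
  // The Int stays a regular column; only InnerObj is Kryo-serialized.
  val pairEncoder: Encoder[(Int, InnerObj)] =
    Encoders.tuple(Encoders.scalaInt, Encoders.kryo[InnerObj])

  // Hypothetical helper: flattens each MyObj into an (Int, InnerObj) pair.
  def buildDataset(spark: SparkSession, objs: Seq[MyObj]): Dataset[(Int, InnerObj)] =
    spark.createDataset(objs.map(o => (o.i, o.j)))(pairEncoder)

  def main(args: Array[String]): Unit = {
    // Assumed local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("tuple-encoder-sketch")
      .getOrCreate()

    val ds = buildDataset(spark, Seq(MyObj(1, new InnerObj(0, "a"))))
    ds.printSchema() // two columns, _1 and _2, instead of a single "value"

    // Rebuild the case class on the driver after collecting.
    val restored = ds.collect().map { case (i, j) => MyObj(i, j) }
    println(restored.map(o => s"${o.i}, ${o.j.a}, ${o.j.b}").mkString("; "))

    spark.stop()
  }
}
```

The trade-off is that the Dataset is typed as a tuple rather than as MyObj, so the conversion to and from the case class has to be written by hand.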
Source: https://stackoverflow.com/questions/64190037/how-to-store-nested-custom-objects-in-spark-dataset