I cannot make Spark read JSON
(or CSV, for that matter) as a Dataset
of a case class with Option[_]
fields where not all fields are defined.
Here is an even simpler solution:
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.catalyst.ScalaReflection

// Derive the schema from the case class, so every field (including the Option[_] ones) is declared.
val structSchema = ScalaReflection.schemaFor[CustomData].dataType.asInstanceOf[StructType]
// With an explicit schema Spark skips inference; fields missing from the JSON are read as null.
val df = spark.read.schema(structSchema).json(jsonRDD)
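
For completeness, here is a minimal sketch of the whole round trip, ending in a typed Dataset rather than a DataFrame. Everything named here is an assumption made for illustration: the fields of CustomData (the original post does not show them), the sample records, and the helper readCustomData. It also swaps in Encoders.product[CustomData].schema, a public API that yields the same kind of StructType as the ScalaReflection call above, and passes the input as a Dataset[String], since the RDD[String] overload of json is deprecated from Spark 2.2 onward.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Hypothetical case class: the real CustomData is not shown in the post.
case class CustomData(id: Long, name: Option[String], score: Option[Double])

object OptionalFieldsExample {

  // Hypothetical helper: reads JSON lines using a schema derived from the case class.
  def readCustomData(spark: SparkSession, jsonLines: Dataset[String]): Dataset[CustomData] = {
    import spark.implicits._ // provides the Encoder that .as[CustomData] needs

    // Schema taken from the case class, so columns absent from the data still exist.
    val schema = Encoders.product[CustomData].schema

    spark.read.schema(schema).json(jsonLines).as[CustomData]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("optional-fields")
      .getOrCreate()
    import spark.implicits._

    // No record mentions "score", and the second one also omits "name".
    val jsonLines = Seq("""{"id": 1, "name": "a"}""", """{"id": 2}""").toDS()

    readCustomData(spark, jsonLines).collect().foreach(println)
    spark.stop()
  }
}
```

Because the schema is supplied up front, the missing fields should come back as None instead of causing an analysis error on .as[CustomData]. Non-Option fields such as id still have to be present in every record.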