“Task not serializable” with java.time in spark-shell (or Zeppelin) but not in spark-submit
Question


Oddly, I have several times seen code behave differently under spark-submit than under spark-shell (or Zeppelin), even though I initially didn't believe such a difference was possible.

With some code, spark-shell (or Zeppelin) throws this exception, while spark-submit works fine:

org.apache.spark.SparkException: Task not serializable
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:345)
  at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:335)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2292)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:844)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:843)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
  at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:843)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:608)

Here is an example of code (I will try to simplify it further) that can cause the problem:

import java.time.LocalDate
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.functions.udf

def formatter1 = DateTimeFormatter.ofPattern("MM_dd_yy")

// Zero-pad each component ("3_1_20" -> "03_01_20"), then parse to an ISO date string.
val date1 = udf((date: String) => {
  val d = date.split("_").map(x => if (x.length < 2) "0" + x else x).mkString("_")
  LocalDate.from(formatter1.parse(d)).toString
})

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

// Unpivot the toMelt columns into (column, row) key/value pairs, keeping toPreserve.
def melt(toPreserve: Seq[String], toMelt: Seq[String],
         column: String, row: String, df: DataFrame): DataFrame = {
  val _vars_and_vals = array(
    (for (c <- toMelt) yield struct(lit(c).alias(column), col(c).alias(row))): _*)
  val _tmp = df.withColumn("_vars_and_vals", explode(_vars_and_vals))
  val cols = toPreserve.map(col _) ++
    (for (x <- List(column, row)) yield col("_vars_and_vals")(x).alias(x))
  _tmp.select(cols: _*)
}
val cNullState = melt(preserves, melts, "Date", "Confirmed", confirmed)
  .withColumn("Date", date1(col("Date")))
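
(preserves, melts, and confirmed are defined elsewhere in the original notebook.) A smaller reproduction that leaves melt out entirely, using made-up sample data of my own, would be:

import org.apache.spark.sql.functions.col
import spark.implicits._ // spark is the SparkSession the shell or Zeppelin provides

// Two made-up rows; calling the UDF makes Spark run the closure cleaner,
// which is where "Task not serializable" is raised.
val sample = Seq("3_1_20", "12_15_20").toDF("Date")
sample.withColumn("Date", date1(col("Date"))).show()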

The behavior is also flaky: the exception sometimes appears and sometimes does not.

I understand the basics of "Task not serializable" (the closure must be serialized and shipped to every executor node), but in this specific example I cannot figure out the following (see the sketch after this list):

  1. What is wrong with this code?
  2. If something is wrong, why does spark-submit work fine?
  3. If nothing is wrong, why do spark-shell and Zeppelin throw the exception?
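
For reference, here is my gloss on the usual explanation, not something stated in the original post: spark-shell and Zeppelin compile each input line into a wrapper class, so a def such as formatter1 becomes an instance method, and the UDF closure has to capture the whole wrapper instance, along with whatever non-serializable state it happens to hold, just to call it. In a compiled spark-submit application the same def usually lives on a top-level object and is invoked statically, so nothing extra is captured. A sketch of a REPL-safe variant (date1Safe is my own name) that builds the formatter inside the lambda:

import java.time.LocalDate
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.functions.udf

// Everything the closure needs is local, so only the function literal
// itself is serialized, never the enclosing REPL wrapper object.
val date1Safe = udf((date: String) => {
  val formatter = DateTimeFormatter.ofPattern("MM_dd_yy")
  val d = date.split("_").map(x => if (x.length < 2) "0" + x else x).mkString("_")
  LocalDate.from(formatter.parse(d)).toString
})

Since def formatter1 already re-created the formatter on every call, moving the construction into the lambda costs nothing extra per row. It would also explain the flakiness: whether the captured wrapper is serializable depends on what else has been defined in the session.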

UPDATE: I found the trigger, though I still don't fully understand it. The exception is caused by

.withColumn("Date", date1(col("Date")))

where the date1 UDF uses java.time classes. Why java.time causes the problem, I don't know. I have updated the title to mention java.time.
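
A side note from my own checking, not from the original post: rewriting def formatter1 as a captured val would not have helped either, because java.time.format.DateTimeFormatter does not implement java.io.Serializable (unlike value types such as LocalDate). You can confirm this directly:

import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.time.format.DateTimeFormatter

// Plain Java serialization of the formatter fails on its own,
// independent of Spark's closure cleaner.
val oos = new ObjectOutputStream(new ByteArrayOutputStream())
try oos.writeObject(DateTimeFormatter.ofPattern("MM_dd_yy"))
catch { case e: NotSerializableException => println(s"Not serializable: ${e.getMessage}") }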

Source: https://stackoverflow.com/questions/60922044/task-not-serializable-with-java-time-in-spark-shell-or-zeppelin-but-not-in-s
