Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects

悲&欢浪女 2020-11-22 05:29

Getting strange behavior when calling a function outside of a closure:

  • when the function is in an object everything works
  • when the function is in a class it fails with: Task not serializable: java.io.NotSerializableException (a minimal sketch of the pattern follows below)
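
For context, here is a minimal sketch of the pattern being described (the names are illustrative, not the asker's actual code): calling a method defined on an object works because nothing extra has to be captured, while calling a method of a non-serializable class from inside the closure drags the whole enclosing instance into it.

    import org.apache.spark.{SparkConf, SparkContext}

    object Working {
      // methods on an object are effectively static; the closure captures nothing extra
      def addOne(x: Int): Int = x + 1
    }

    class NotWorking {
      // calling this from a closure captures the enclosing instance (this),
      // and the class does not extend Serializable
      def addOne(x: Int): Int = x + 1

      def run(sc: SparkContext): Array[Int] =
        sc.parallelize(Seq(1, 2, 3)).map(addOne).collect() // throws Task not serializable
    }

    object Demo extends App {
      val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[*]"))
      println(sc.parallelize(Seq(1, 2, 3)).map(Working.addOne).collect().toList) // works
      new NotWorking().run(sc)                                                   // fails
    }
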
9 answers
  •  时光说笑
    2020-11-22 05:47

    FYI: in Spark 2.4 many of you will probably run into this issue. Kryo serialization has gotten better, but in many cases you cannot use spark.kryo.unsafe=true or the naive Kryo serializer.

    For a quick fix, try changing the following in your Spark configuration:

    spark.kryo.unsafe="false"
    

    OR

    spark.serializer="org.apache.spark.serializer.JavaSerializer"
    
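
    These settings can also be applied programmatically when building the SparkSession; a minimal sketch (the app name and master here are placeholders, not part of the original answer):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("my-app")    // placeholder
      .master("local[*]")   // placeholder
      .config("spark.kryo.unsafe", "false")
      .config("spark.serializer", "org.apache.spark.serializer.JavaSerializer")
      .getOrCreate()
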

    To fix serialization problems in custom RDD transformations that I encounter or write myself, I use explicit broadcast variables together with the inbuilt twitter-chill API, converting them from rdd.map(row => ...) functions to rdd.mapPartitions(partition => { ... }) functions.

    Example

    Old (not-great) Way

    val sampleMap = Map("index1" -> 1234, "index2" -> 2345)
    // sampleMap is captured by the closure, so it (and possibly its enclosing
    // class) has to be serialized and shipped with every task
    val outputRDD = rdd.map(row => {
        val value = sampleMap.get(row._1)
        value
    })
    

    Alternative (better) Way

    import com.twitter.chill.MeatLocker
    val sampleMap = Map("index1" -> 1234, "index2" -> 2345)
    // MeatLocker wraps the map so the broadcast payload serializes cleanly
    val brdSerSampleMap = spark.sparkContext.broadcast(MeatLocker(sampleMap))
    
    rdd.mapPartitions(partition => {
        // unwrap the broadcast value once per partition, not once per row
        val deSerSampleMap = brdSerSampleMap.value.get
        partition.map(row => {
            val value = deSerSampleMap.get(row._1)
            value
        })
    })
    

    This new way unwraps the broadcast value only once per partition instead of once per row, which is better. You will still need to fall back to Java serialization if you do not register your classes with Kryo; a registration sketch follows below.
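
    If you would rather keep Kryo than fall back to Java serialization, registering the classes you ship through closures and broadcasts usually avoids the problem; a sketch, assuming a hypothetical case class MyRecord:

    import org.apache.spark.SparkConf

    case class MyRecord(index: String, value: Int) // hypothetical example class

    val conf = new SparkConf()
      .setAppName("my-app") // placeholder
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[MyRecord]))

    Setting spark.kryo.registrationRequired=true makes Spark fail fast on any class you forgot to register, which helps track down what still needs to be added.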
