Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects

Asked 2020-11-22 05:29

Getting strange behavior when calling a function outside of a closure:

  • when the function is in an object, everything works
  • when the function is in a class, I get a Task not serializable exception (a minimal repro sketch is below)
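
A minimal sketch of the two cases (the names below are illustrative, not my actual code): a method defined on an object can be used inside a transformation because the object is a singleton and nothing extra has to be captured, while a method on a class instance drags the whole instance (this) into the closure, and if that instance is not serializable the task fails.

    import org.apache.spark.sql.SparkSession

    object UpperObject {                            // works: singleton, nothing extra is captured
      def toUpper(s: String): String = s.toUpperCase
    }

    class UpperClass {                              // fails: the closure captures `this`
      def toUpper(s: String): String = s.toUpperCase
    }

    val spark = SparkSession.builder().appName("repro").master("local[*]").getOrCreate()
    val rdd = spark.sparkContext.parallelize(Seq("a", "b"))

    rdd.map(UpperObject.toUpper).collect()          // fine
    rdd.map(new UpperClass().toUpper).collect()     // org.apache.spark.SparkException: Task not serializable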
9 Answers
  • 2020-11-22 05:55
    // Non-working version: the UDF calls a helper defined outside any serializable object.
    import org.apache.spark.sql.functions.udf
    import spark.implicits._

    def upper(name: String): String = {
      name.toUpperCase()
    }

    val toUpperName = udf { (empName: String) => upper(empName) }

    val emp_details = """[{"id": "1","name": "James Butt","country": "USA"},
    {"id": "2", "name": "Josephine Darakjy","country": "USA"},
    {"id": "3", "name": "Art Venere","country": "USA"},
    {"id": "4", "name": "Lenna Paprocki","country": "USA"},
    {"id": "5", "name": "Donette Foller","country": "USA"},
    {"id": "6", "name": "Leota Dilliard","country": "USA"}]"""

    val df_emp = spark.read.json(Seq(emp_details).toDS())
    val df_name = df_emp.select($"id", $"name")
    val df_upperName = df_name.withColumn("name", toUpperName($"name")).filter("id='5'")
    display(df_upperName)

    This fails with: org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)

    Solution:

    import java.io.Serializable
    import org.apache.spark.sql.functions.udf

    // Working version: the helper and the UDF live in a serializable top-level object.
    object obj_upper extends Serializable {
      def upper(name: String): String = {
        name.toUpperCase()
      }

      val toUpperName = udf { (empName: String) => upper(empName) }
    }

    val df_upperName =
      df_name.withColumn("name", obj_upper.toUpperName($"name")).filter("id='5'")
    display(df_upperName)
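
    If the helper is only ever used by the UDF, a related option (a sketch, not part of the original answer) is to put the logic directly inside the function literal passed to udf, so the closure does not reference any method on a surrounding class at all:

    import org.apache.spark.sql.functions.udf

    // Hypothetical inline variant: everything the closure needs lives in the lambda itself.
    val toUpperNameInline = udf { (empName: String) => empName.toUpperCase }

    val df_upperName = df_name
      .withColumn("name", toUpperNameInline($"name"))
      .filter("id='5'")
    display(df_upperName)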
    
  • 2020-11-22 05:58

    I had a similar experience.

    The error was triggered when I initialized a variable on the driver (master) and then tried to use it on one of the workers. When that happens, Spark Streaming tries to serialize the object to send it over to the worker, and fails if the object is not serializable.

    I solved the error by making the variable static.

    Previous non-working code

      private final PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
    

    Working code

      private static final PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
    

    Credits:

    1. https://docs.microsoft.com/en-us/answers/questions/35812/sparkexception-job-aborted-due-to-stage-failure-ta.html (the answer from pradeepcheekatla-msft)
    2. https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/javaionotserializableexception.html
  • 2020-11-22 06:01

    I'm not entirely certain that this applies to Scala, but in Java I solved the NotSerializableException by refactoring my code so that the closure did not access a non-serializable final field.
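
    The same refactoring carries over to Scala/Spark. A minimal sketch (the CountryLookup helper and the surrounding Job class are hypothetical names): extract the serializable data the closure actually needs into a local val, so the closure never references the non-serializable field or the enclosing instance.

    // Hypothetical helper that is NOT Serializable.
    class CountryLookup {
      def codeFor(name: String): String = if (name == "USA") "US" else "??"
    }

    class Job(spark: org.apache.spark.sql.SparkSession) {
      private val lookup = new CountryLookup()   // non-serializable, driver-side field

      def run(): Array[String] = {
        val rdd = spark.sparkContext.parallelize(Seq("USA", "France"))

        // Fails: referencing `lookup` forces Spark to serialize `this`, and Job is not serializable.
        // rdd.map(name => lookup.codeFor(name)).collect()

        // Works: the closure captures only a plain, serializable local value computed on the driver.
        val codes: Map[String, String] = Map("USA" -> "US")
        rdd.map(name => codes.getOrElse(name, "??")).collect()
      }
    }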
