Spark Task not serializable with lag Window function

余生分开走 2021-02-19 20:17

I've noticed that after I apply a Window function to a DataFrame, calling map() with a function makes Spark throw a "Task not serializable" exception. This is my code:

1 answer
  •  南旧 (original poster)
    2021-02-19 20:46

    lag returns an org.apache.spark.sql.Column, which is not serializable. The same applies to WindowSpec. In interactive mode (the REPL) these objects may be captured as part of the closure passed to map:

    scala> import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.expressions.Window
    
    scala> val df = Seq(("foo", 1), ("bar", 2)).toDF("x", "y")
    df: org.apache.spark.sql.DataFrame = [x: string, y: int]
    
    scala> val w = Window.partitionBy("x").orderBy("y")
    w: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@307a0097
    
    scala> val lag_y = lag(col("y"), 1).over(w)
    lag_y: org.apache.spark.sql.Column = 'lag(y,1,null) windowspecdefinition(x,y ASC,UnspecifiedFrame)
    
    scala> def f(x: Any) = x.toString
    f: (x: Any)String
    
    scala> df.select(lag_y).map(f _).first
    org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
    ...
    Caused by: java.io.NotSerializableException: org.apache.spark.sql.expressions.WindowSpec
    Serialization stack:
        - object not serializable (class: org.apache.spark.sql.expressions.WindowSpec, value: org.apache.spark.sql.expressions.WindowSpec@307a0097)
    

    A simple solution is to mark both values as @transient, so that they are skipped when the closure is serialized:

    scala> @transient val w = Window.partitionBy("x").orderBy("y")
    w: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@7dda1470
    
    scala> @transient val lag_y = lag(col("y"), 1).over(w)
    lag_y: org.apache.spark.sql.Column = 'lag(y,1,null) windowspecdefinition(x,y ASC,UnspecifiedFrame)
    
    scala> df.select(lag_y).map(f _).first
    res1: String = [null]     
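
The effect of @transient can be reproduced without Spark. The sketch below is only an illustration under assumptions: NotSer is a hypothetical stand-in for a non-serializable object such as WindowSpec or Column, and Plain and WithTransient are hypothetical holders that loosely play the role of the wrapper object the REPL generates around each line. It uses plain Java serialization, which is the default mechanism Spark applies to closures.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a non-serializable object (e.g. WindowSpec or Column).
class NotSer(val name: String)

// Holder that keeps the helper as an ordinary field.
// Serializing the holder also tries to serialize the helper, which fails.
class Plain extends Serializable {
  val helper = new NotSer("w")
  def f(x: Any): String = x.toString
}

// Same holder, but the field is @transient, so Java serialization skips it.
class WithTransient extends Serializable {
  @transient val helper = new NotSer("w")
  def f(x: Any): String = x.toString
}

object TransientDemo {
  // Returns true if obj can be written with Java serialization, false otherwise.
  def serializes(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    println(serializes(new Plain))         // prints false
    println(serializes(new WithTransient)) // prints true
  }
}
```

Note that a @transient field is not restored on deserialization (it comes back as null), so this approach is only safe when the field is not used inside the function that runs on the executors, as is the case for w and lag_y above.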
    
