NotSerializableException with json4s on Spark


Spark serializes the closures of RDD transformations and 'ships' them to the workers for distributed execution. That mandates that all code within the closure (and often also in the containing object) be serializable.
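As a minimal, hypothetical sketch of that failure mode (class and names invented for illustration), a closure that captures any non-serializable driver-side object fails as soon as an action runs:

class Helper {
  // Hypothetical helper that is NOT Serializable
  def tag(s: String): String = s"tagged:$s"
}

object ClosureDemo {
  def main(args: Array[String]): Unit = {
    import org.apache.spark.{SparkConf, SparkContext}
    val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[*]"))
    val helper = new Helper // created on the driver

    // `map(helper.tag)` captures `helper` in the closure; Spark must
    // serialize it to ship the task, so this fails with
    // "Task not serializable", caused by java.io.NotSerializableException
    sc.parallelize(Seq("a", "b")).map(helper.tag).collect()

    sc.stop()
  }
}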

Looking at the implementation of org.json4s.DefaultFormats$ (the companion object of that trait):

object DefaultFormats extends DefaultFormats {
    val losslessDate = new ThreadLocal(new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
    val UTC = TimeZone.getTimeZone("UTC")
}

It's clear that this object is not serializable and cannot be made so: a ThreadLocal is, by its very nature, non-serializable.

You don't seem to be using Date types in your code, so could you get rid of implicit val formats = DefaultFormats, or replace DefaultFormats with something serializable?
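For json4s versions before 3.3.0, a common workaround is to construct the formats on the executors rather than capturing them in a driver-side field. A minimal sketch, assuming an RDD[String] named jsonRdd holding JSON documents with an "id" field:

import org.json4s._
import org.json4s.jackson.JsonMethods._ // assuming the jackson backend; use org.json4s.native.JsonMethods._ for json4s-native

// Building the Formats inside mapPartitions means it is constructed
// on the executor and never serialized as part of the closure
val ids = jsonRdd.mapPartitions { iter =>
  implicit val formats: Formats = DefaultFormats
  iter.map(s => (parse(s) \ "id").extract[String])
}

Another commonly used option is marking a driver-side formats field as @transient lazy val, which excludes it from serialization and re-initializes it on each worker.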

This has actually now been fixed; JSON4S is serializable as of version 3.3.0: https://github.com/json4s/json4s/issues/137
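If upgrading is an option, bumping the dependency removes the need for any workaround. An sbt sketch (assuming the jackson backend):

// build.sbt -- json4s is serializable from 3.3.0 onward
libraryDependencies += "org.json4s" %% "json4s-jackson" % "3.3.0"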

What solved my issue was moving implicit val formats = DefaultFormats inside the rdd.foreach {} loop. Because DefaultFormats is then referenced only on the executor, nothing non-serializable is captured by the closure, and the serialization exception went away.

Here's my code snippet which solved the issue:

import org.json4s._
import org.json4s.jackson.JsonMethods._ // assuming the jackson backend

case class rfId(rfId: String)

// ... some code here ...

rdd.foreach { record =>
  val value = record.value()

  // Create the Formats inside the closure, on the executor, so the
  // non-serializable DefaultFormats companion is never shipped from the driver
  implicit val formats = DefaultFormats
  val json = parse(value)
  println(json.camelizeKeys.extract[rfId]) // Prints `rfId(ABC12345678)`
}