I have an RDD that I am trying to serialize and then reconstruct by deserializing. I am trying to see whether this is possible in Apache Spark.
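Roughly what I am attempting, as a minimal sketch (the stream setup is just plain Java serialization; the names are illustrative):

```scala
import java.io._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

val sc  = new SparkContext(new SparkConf().setAppName("rdd-serde").setMaster("local[*]"))
val rdd = sc.parallelize(1 to 100)

// Serialize the RDD object itself with ordinary Java serialization.
val buf = new ByteArrayOutputStream()
val out = new ObjectOutputStream(buf)
out.writeObject(rdd)
out.close()

// Deserialize it back and try to use the copy.
val in   = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
val copy = in.readObject().asInstanceOf[RDD[Int]]
copy.count()  // this is where it fails for me
```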
I'm the author of this warning message.
Spark does not support performing actions and transformations on copies of RDDs that are created via deserialization. RDDs are serializable so that certain methods on them can be invoked in executors, but end users shouldn't try to manually perform RDD serialization.
When an RDD is serialized, it loses its reference to the SparkContext that created it, preventing jobs from being launched with it (see here). In earlier versions of Spark, your code would result in a NullPointerException when Spark tried to access the private, null RDD.sc field.
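If your goal is just to persist an RDD's contents and rebuild an equivalent RDD later, the supported route is to save the data rather than the RDD object itself, for example via saveAsObjectFile (a sketch; the path is illustrative):

```scala
// Persist the RDD's *data*, not the RDD object.
rdd.saveAsObjectFile("/tmp/rdd-snapshot")

// Later (possibly in a different application), rebuild a fresh RDD from
// that data using a live SparkContext.
val restored = sc.objectFile[Int]("/tmp/rdd-snapshot")
restored.count()  // works: this RDD was created by a live context
```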
This error message was worded this way because users were frequently running into confusing NullPointerExceptions when trying to do things like rdd1.map { _ => rdd2.count() }, which caused actions to be invoked on deserialized RDDs on executor machines. I didn't anticipate that anyone would try to manually serialize / deserialize their RDDs on the driver, so I can see how this error message could be slightly misleading.
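For what it's worth, the usual fix for that nested pattern is to perform the inner action on the driver first, so the closure only captures a plain value (a sketch, assuming the inner result is small):

```scala
// Anti-pattern: rdd1.map { _ => rdd2.count() } runs the inner action on
// executors, where rdd2 is a dead deserialized copy.

// Restructured: run the action on the driver and capture the result.
val n = rdd2.count()               // executes on the driver
val result = rdd1.map { _ => n }   // closure captures a plain Long
```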