To broadcast a variable so that it occurs exactly once in memory per node on a cluster, one can do:
val myVarBroadcasted = sc.broadcast(myVar)
then retriev
If you want to remove the broadcast variable from both the executors and the driver, you have to use destroy; using unpersist only removes it from the executors:
myVarBroadcasted.destroy()
This method is blocking.
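As a rough sketch of what destroy means in practice, the snippet below broadcasts a small lookup table, destroys it, and then tries to read it again. The object name, the local[2] master and the sample data are assumptions made for illustration; the only Spark calls used are sc.broadcast, value and destroy. It assumes that reading a destroyed broadcast raises a SparkException.

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkException}

// Hypothetical demo object; the names and data are illustrative only.
object DestroyBroadcastSketch {
  // Returns true if reading the broadcast after destroy() fails.
  def readFailsAfterDestroy(sc: SparkContext): Boolean = {
    val myVar = Map("a" -> 1, "b" -> 2)
    val myVarBroadcasted = sc.broadcast(myVar)

    // Removes the data from the executors and from the driver.
    myVarBroadcasted.destroy()

    try {
      myVarBroadcasted.value // no longer usable after destroy()
      false
    } catch {
      case _: SparkException => true
    }
  }

  def main(args: Array[String]): Unit = {
    // Local master chosen so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("destroy-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(readFailsAfterDestroy(sc))
    finally sc.stop()
  }
}
```

Once destroyed, the variable cannot be brought back; a new call to sc.broadcast is needed if the data is required again.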
You are looking for unpersist, available since Spark 1.0.0:
myVarBroadcasted.unpersist(blocking = true)
Broadcast variables are stored as ArrayBuffers of deserialized Java objects or serialized ByteBuffers. (Storage-wise they are treated similarly to RDDs; confirmation needed.)
The unpersist method removes them from both memory and disk on each executor node. The value stays on the driver node, however, so it can be re-broadcast.
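The following is a minimal sketch of that behaviour, not a definitive recipe: a broadcast is used in a job, unpersisted, and then used again, at which point Spark should re-send it from the driver. The object name, helper method, local[2] master and sample data are all hypothetical; the Spark calls themselves (sc.broadcast, value, unpersist(blocking = true)) are the ones discussed above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast

// Hypothetical demo object; the names and data are illustrative only.
object UnpersistBroadcastSketch {
  // Sums the looked-up values for each key, reading the table on the executors.
  def sumLookups(sc: SparkContext, keys: Seq[String],
                 table: Broadcast[Map[String, Int]]): Int =
    sc.parallelize(keys).map(k => table.value.getOrElse(k, 0)).reduce(_ + _)

  // Runs the same job before and after unpersist and returns both results.
  def run(sc: SparkContext): (Int, Int) = {
    val myVar = Map("a" -> 1, "b" -> 2)
    val myVarBroadcasted = sc.broadcast(myVar)
    val keys = Seq("a", "b", "a")

    val before = sumLookups(sc, keys, myVarBroadcasted)

    // Drops the cached copies on the executors and waits until that is done.
    myVarBroadcasted.unpersist(blocking = true)

    // The driver still holds the value, so the broadcast can be used again.
    val after = sumLookups(sc, keys, myVarBroadcasted)
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    // Local master chosen so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("unpersist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(run(sc))
    finally sc.stop()
  }
}
```

Both jobs should give the same result; the difference is only that the second one pays the cost of shipping the data to the executors again.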