Spark streaming and mutable broadcast variable


Question


I found this link https://gist.github.com/BenFradet/c47c5c7247c5d5d0f076 which shows an implementation in Spark where a broadcast variable is updated. Is this a valid implementation, i.e. will executors see the latest value of the broadcast variable?


Answer 1:


The code you are referring to uses the Broadcast.unpersist() method. The Spark API documentation for Broadcast.unpersist() says: "Asynchronously delete cached copies of this broadcast on the executors. If the broadcast is used after this is called, it will need to be re-sent to each executor." There is also an overloaded unpersist(blocking: Boolean) which blocks until unpersisting has completed. So it depends on how you use the broadcast variable in your Spark application. Spark does not automatically re-broadcast when you mutate a broadcast variable; the driver has to resend it. The Spark documentation says you should not modify a broadcast variable (it is meant to be immutable) to avoid inconsistencies in processing on the executor nodes, but the unpersist() and destroy() methods are available if you want to control the broadcast variable's life cycle. See the Spark JIRA https://issues.apache.org/jira/browse/SPARK-6404. A sketch of the unpersist-and-rebroadcast pattern is shown below.
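Here is a minimal sketch of the pattern the linked gist follows; the class name BroadcastWrapper and the exact serialization hooks are illustrative assumptions, not the gist's verbatim code. The idea is that the wrapper keeps the current Broadcast handle in a @transient field and serializes it explicitly, so each batch's closure carries whatever handle is current when the closure is serialized. After the driver calls unpersist() and re-broadcasts, executors see the new value from the next batch onward.

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}

import scala.reflect.ClassTag

import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.StreamingContext

// Hypothetical wrapper illustrating the unpersist-and-rebroadcast pattern.
// The @transient Broadcast handle is written/read explicitly so that the
// closure shipped with each batch carries the handle that is current
// at serialization time.
case class BroadcastWrapper[T: ClassTag](
    @transient private val ssc: StreamingContext,
    @transient private val initial: T) {

  @transient private var v: Broadcast[T] = ssc.sparkContext.broadcast(initial)

  // Executors (and the driver) read the broadcast value through here.
  def value: T = v.value

  // Driver-side refresh: drop the cached copies on the executors, then
  // broadcast the new value. blocking = true waits until the old copies
  // are actually removed before re-broadcasting.
  def update(newValue: T, blocking: Boolean = false): Unit = {
    v.unpersist(blocking)
    v = ssc.sparkContext.broadcast(newValue)
  }

  // Custom Java serialization: persist the current Broadcast handle so the
  // wrapper stays usable inside closures sent to executors.
  private def writeObject(out: ObjectOutputStream): Unit = {
    out.writeObject(v)
  }

  private def readObject(in: ObjectInputStream): Unit = {
    v = in.readObject().asInstanceOf[Broadcast[T]]
  }
}
```

In a streaming job you would call wrapper.update(newValue, blocking = true) on the driver (for example at the start of foreachRDD when the reference data has changed) and read wrapper.value inside transformations; the update itself only ever runs on the driver.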



Source: https://stackoverflow.com/questions/39729059/spark-streaming-and-mutable-broadcast-variable
