Understanding closures in Spark

不知归路 2021-01-16 07:02

In cluster mode, how do I write a closure function f so that every worker can access a copy of the variable N?

N=5
lines=sc.parallelize([\'         


        
1 Answer
  • 2021-01-16 07:29

    In other words, can a worker access the variable N, which is defined on the driver node outside f1 but used inside f1?

    Kind of.

    • There is no shared memory between nodes, and that includes the workers and the driver. So a worker cannot access a variable that lives on the driver.
    • However, when this code is executed, Spark analyzes the definition of f1, determines which variables are present in its closure, and serializes them along with f1.

      So when the function is actually invoked on a worker, a local copy of the parent environment will be present in its scope.
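The capture, serialize, and rebuild steps described above can be imitated without a cluster. The following is only a sketch using the Python standard library, not Spark's actual code path: PySpark serializes functions with a cloudpickle-based serializer, whereas here `ship_to_worker` is a hypothetical helper that pickles just the captured variables and rebuilds the function around the copy. The body of `f1` is also an assumption, since the original question's code is truncated.

```python
import builtins
import inspect
import pickle
import types

N = 5  # defined on the "driver", outside f1


def f1(x):
    # N is not a parameter; f1 picks it up from the enclosing scope
    return x + N


def ship_to_worker(func):
    """Hypothetical helper: imitate sending func and its closure to a worker."""
    # 1. Determine which outside variables the function refers to
    captured = inspect.getclosurevars(func).globals
    # 2. Serialize them, as would be needed to cross the network
    payload = pickle.dumps(captured)
    # 3. "On the worker": rebuild an independent environment from the bytes
    worker_env = pickle.loads(payload)
    worker_env["__builtins__"] = builtins
    # 4. Bind the same code object to the worker-side environment
    return types.FunctionType(func.__code__, worker_env, func.__name__)


worker_f1 = ship_to_worker(f1)
results = [worker_f1(x) for x in [1, 2, 3]]
print(results)  # [6, 7, 8]
```

The rebuilt function never touches the driver's `N`; it only sees the copy that travelled with it. In PySpark the shipping step is implicit: passing `f1` to a transformation such as `rdd.map(f1)` is enough.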

    Keeping these two things in mind we can answer the question:

    I don't have any cluster and I really want to know if it will work in cluster modes?

    Yes, it will work just fine on the distributed cluster.

    However, if you try to modify an object passed through the closure, the changes won't be propagated back to the driver and will affect only the local copies (in other words, don't even try).
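Why such modifications stay local can be seen in another cluster-free sketch. As before, this imitates the behaviour with the standard library only. `ship_to_worker`, `counter`, and `add_to_counter` are hypothetical names introduced for illustration, and the pickle round trip stands in for Spark serializing the closure.

```python
import builtins
import inspect
import pickle
import types

counter = {"total": 0}  # object that lives on the "driver"


def add_to_counter(x):
    # Mutates whichever `counter` exists in the function's own environment
    counter["total"] += x
    return counter["total"]


def ship_to_worker(func):
    """Hypothetical helper: imitate sending func and its closure to a worker."""
    captured = inspect.getclosurevars(func).globals
    # The round trip through bytes produces a separate copy of each object
    worker_env = pickle.loads(pickle.dumps(captured))
    worker_env["__builtins__"] = builtins
    return types.FunctionType(func.__code__, worker_env, func.__name__)


worker_fn = ship_to_worker(add_to_counter)
worker_totals = [worker_fn(x) for x in [1, 2, 3]]

print(worker_totals)     # [1, 3, 6]  (the worker's copy accumulates)
print(counter["total"])  # 0          (the driver's object is untouched)
```

The simulated worker updates its own deserialized copy, so the driver-side `counter` still reads 0 afterwards.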
