Spark: Difference between an accumulator and a local variable

盖世英雄少女心 2021-01-17 04:34

While exploring Spark accumulators, I tried to understand and showcase the difference between an accumulator and a regular variable in Spark. But the output does not seem to matc

1 answer
  •  天涯浪人
    2021-01-17 05:13

    `counter` is a local variable. It may appear to work in your program because `.master("local[3]")` runs everything in a single JVM together with the driver (the Spark docs note that in local mode the closure may, in some circumstances, reference the same original variable and actually update it). Now imagine you are running in YARN mode. The logic then runs in a distributed way: each task receives its own serialized copy of the closure, so only the executors' copies of `counter` are incremented, and the driver's `counter` never changes. An accumulator, on the other hand, is a shared variable managed by Spark. Suppose you have 2 executors running the program: each executor adds its updates to the accumulator, and Spark sends those updates back to the driver and merges them. So in YARN distributed mode your `cntAccum` ends up with the total from all executors, whereas the local variable `counter` does not.
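To make the difference concrete, here is a minimal plain-Python sketch (not Spark code) that mimics what happens on a cluster, under the assumption that `pickle` is a fair stand-in for Spark's closure serialization. Each "task" works on a deserialized copy of the driver's variable, so its increments are lost; the accumulator-style path instead returns each task's partial count to the "driver", which merges them. The names `Counter`, `run_tasks_with_local_counter`, and `run_tasks_with_merge` are hypothetical and only illustrate the mechanism.

```python
import pickle


class Counter:
    """Hypothetical driver-side mutable holder, like a plain local `counter`."""

    def __init__(self):
        self.value = 0


def run_tasks_with_local_counter(partitions):
    counter = Counter()  # lives on the "driver"
    for part in partitions:
        # Simulate shipping the closure to an executor: the task gets a copy.
        task_counter = pickle.loads(pickle.dumps(counter))
        for _ in part:
            task_counter.value += 1  # updates only the executor-side copy
        # The copy is discarded when the task ends; nothing flows back.
    return counter.value


def run_tasks_with_merge(partitions):
    driver_total = 0  # plays the role of the accumulator's value on the driver
    for part in partitions:
        partial = 0  # each task keeps its own partial count...
        for _ in part:
            partial += 1
        driver_total += partial  # ...which the driver merges after the task finishes
    return driver_total


partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(run_tasks_with_local_counter(partitions))  # 0
print(run_tasks_with_merge(partitions))  # 9
```

The local-counter version prints 0 no matter how many elements are processed, which is the behavior you would expect from `counter` on YARN; the merge version counts all 9 elements.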

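    Note that accumulators are not read-write from the tasks' side: tasks can only add to them, and only the driver program can read their value. See the accumulators section of the Spark programming guide.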
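The Spark programming guide states that tasks can add to an accumulator but cannot read its value; only the driver can. The sketch below models that rule in plain Python. `WriteOnlyAccumulator` and `_TaskSideAccumulator` are hypothetical classes, not Spark's API (PySpark, for example, raises an error if `Accumulator.value` is accessed inside a task; this sketch uses `RuntimeError` to imitate that). Each task also starts from its own zero partial sum, so tasks never see each other's updates.

```python
class _TaskSideAccumulator:
    """Hypothetical task-side view: add-only, with its own partial sum."""

    def __init__(self):
        self._partial = 0

    def add(self, amount):
        self._partial += amount

    @property
    def value(self):
        # Models the rule that tasks cannot read the accumulator's value.
        raise RuntimeError("accumulator value can only be read on the driver")


class WriteOnlyAccumulator:
    """Hypothetical driver-side accumulator that merges task partial sums."""

    def __init__(self, initial=0):
        self._driver_value = initial

    def task_view(self):
        # Each task gets a fresh add-only view, not the driver's total.
        return _TaskSideAccumulator()

    def merge(self, task_side):
        # Runs on the "driver" after a task finishes.
        self._driver_value += task_side._partial

    @property
    def value(self):
        return self._driver_value


acc = WriteOnlyAccumulator()
for part in [[1, 2, 3], [4, 5]]:
    task_acc = acc.task_view()
    for _ in part:
        task_acc.add(1)
    acc.merge(task_acc)

print(acc.value)  # 5
```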

    In the screenshot, the executor ID is `localhost`. If you run on YARN with 2-3 executors, it will show the actual executor IDs instead. Hope that helps.
