Scala return value calculated in foreach

Asked by 慢半拍i on 2020-12-22 14:12

I am new to Scala and Spark and trying to understand a few basic things here.

The Spark version used is 1.5.

Why does the value of sum not get updated when I compute it inside a foreach over a DataFrame?
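
The snippet itself was cut off, but judging from the answers below, the code was presumably along these lines (a hedged reconstruction only; the toy data, the DataFrame `df`, and the column name "column1" are placeholders, assuming a Spark 1.5 shell where `sc` and `sqlContext` are predefined):

    import sqlContext.implicits._

    // Toy DataFrame standing in for the question's data (contents invented for illustration).
    val df = sc.parallelize(Seq(1, 2, 2, 3).map(Tuple1(_))).toDF("column1")

    var sum = 0L
    // foreach runs on the executors; each task mutates its own deserialized copy of `sum`.
    df.select("column1").distinct.foreach(_ => sum += 1)

    // The driver's `sum` was never touched, so this prints 0 rather than 3.
    println("sum = " + sum)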

2 Answers
  • 2020-12-22 14:58

    The way you reason about the program is wrong. foreach is executed independently on each executor and modifies its own copy of sum. There is no global shared state here. Just count values directly:

    df.select("column1").distinct.count
    

    If you really want to handle this manually you'll need some type of reduce:

    df.select("column1").distinct.rdd.map(_ => 1L).reduce(_ + _)
    
  • 2020-12-22 14:59

    Read the Programming Guide; it has a section devoted to this: Understanding Closures. If you actually need to collect some state, you can use Accumulators (but note that tasks can only add to an accumulator; its value can be read only on the driver, not on the executor nodes). Try doing without them first: think in terms of the available transformations instead of mutating state.
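
    For completeness, a minimal sketch of the accumulator route on the Spark 1.5 API (`SparkContext.accumulator`; Spark 2.x later replaced it with `longAccumulator`). Here `df` stands in for the question's DataFrame, which is an assumption:

        // Driver-side accumulator; tasks can only add to it, and only the driver can read .value.
        val distinctCount = sc.accumulator(0L, "distinct count")

        // foreach is an action, so accumulator updates are applied only once per task
        // (updates made inside transformations can be re-applied on task retries).
        df.select("column1").distinct.foreach(_ => distinctCount += 1L)

        println(distinctCount.value)

    That said, this just reimplements count by hand, so the plain `df.select("column1").distinct.count` from the first answer remains the simpler route.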
