Why inconsistent results using subtraction in reduce?

既然无缘 2021-01-19 13:50

Given the following:

val rdd = List(1,2,3)

I assumed that rdd.reduce((x,y) => (x - y)) would return -4 (i.e. (1 - 2) - 3 = -4), but the results are inconsistent. Why?

3 Answers
  • 2021-01-19 14:31

    You can easily replace the subtraction v1 - v2 - ... - vN with v1 - (v2 + ... + vN). Addition is commutative and associative, so it is safe to use in reduce, and your code can look like this:

    val v1 = 1
    val values = Seq(2, 3)
    val sum = sc.parallelize(values).reduce(_ + _)
    val result = v1 - sum
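
The same identity can be checked without a cluster. The following is a minimal plain-Scala sketch, not Spark code: the object name `SubtractViaSum` and the helper `subtractAll` are hypothetical names introduced only for illustration, and `Seq.sum` stands in for the distributed `reduce(_ + _)`.

```scala
// Plain Scala stand-in for the Spark job above; no SparkContext is needed.
// SubtractViaSum and subtractAll are illustrative names, not part of any API.
object SubtractViaSum {
  // Computes first - r1 - r2 - ... as first - (r1 + r2 + ...).
  // The sum is order-independent, so it would be safe to parallelize.
  def subtractAll(first: Int, rest: Seq[Int]): Int = first - rest.sum

  def main(args: Array[String]): Unit = {
    val v1 = 1
    val values = Seq(2, 3)
    println(subtractAll(v1, values))          // -4
    println((v1 +: values).reduceLeft(_ - _)) // -4, the strict left-to-right result
  }
}
```

Only the first element is treated specially; everything after it goes through an operation whose result does not depend on evaluation order.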
    
  • 2021-01-19 14:36

    As @TzachZohar mentioned, the function must satisfy two properties (commutativity and associativity) for the parallel computation to be sound. Collecting the RDD first returns a local array, and reduce on a local collection does not depend on those properties, so it produces the result of a sequential (non-parallel) computation, namely,

    val rdd = sc.parallelize(1 to 3)
    
    rdd.collect.reduce((x,y) => (x-y))
    Int = -4
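
To see where -4 comes from on a local collection, the sequential evaluation can be traced step by step. This is a small plain-Scala sketch under the assumption that the collected data is `Seq(1, 2, 3)`; the object name `SequentialReduce` is made up for the example.

```scala
// Traces a sequential, left-to-right reduction on a local collection.
// SequentialReduce is an illustrative name; no Spark is involved.
object SequentialReduce {
  val xs: Seq[Int] = Seq(1, 2, 3)

  // reduceLeft spells out the order explicitly: (1 - 2) - 3
  val leftToRight: Int = xs.reduceLeft(_ - _)

  // scanLeft keeps every intermediate accumulator value
  val steps: Seq[Int] = xs.tail.scanLeft(xs.head)(_ - _)

  def main(args: Array[String]): Unit = {
    println(leftToRight) // -4
    println(steps)       // List(1, -1, -4)
  }
}
```

Keep in mind that collect brings the whole dataset to the driver, so this approach only suits data that fits in driver memory.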
    
  • 2021-01-19 14:39

    From the RDD source code (and docs):

    /**
    * Reduces the elements of this RDD using the specified commutative and
    * associative binary operator.
    */
    def reduce(f: (T, T) => T): T
    

    reduce is a monoidal reduction, so it assumes the function is commutative and associative. As a result, the order in which the function is applied to the elements is not guaranteed.

    Your function (x,y)=>(x-y) is neither commutative nor associative.
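
Both properties can be checked directly on the values from the question. A minimal sketch in plain Scala follows; `SubtractionLaws` and its member names are hypothetical and exist only for this illustration.

```scala
// Checks commutativity and associativity on the concrete values 1, 2, 3.
// A single failing case is enough to show a law does not hold in general.
object SubtractionLaws {
  val sub: (Int, Int) => Int = _ - _
  val add: (Int, Int) => Int = _ + _

  // Commutative: f(a, b) == f(b, a)
  def commutesOn(f: (Int, Int) => Int, a: Int, b: Int): Boolean =
    f(a, b) == f(b, a)

  // Associative: f(f(a, b), c) == f(a, f(b, c))
  def associatesOn(f: (Int, Int) => Int, a: Int, b: Int, c: Int): Boolean =
    f(f(a, b), c) == f(a, f(b, c))

  def main(args: Array[String]): Unit = {
    println(commutesOn(sub, 1, 2))      // false: 1 - 2 = -1, 2 - 1 = 1
    println(associatesOn(sub, 1, 2, 3)) // false: (1 - 2) - 3 = -4, 1 - (2 - 3) = 2
    println(commutesOn(add, 1, 2))      // true
    println(associatesOn(add, 1, 2, 3)) // true
  }
}
```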

    In your case, the reduce might have been applied this way:

    3 - (2 - 1) = 2
    

    or

    1 - (2 - 3) = 2
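
The effect of partitioning can be imitated in plain Scala. The sketch below is a simplified model, not Spark's actual scheduling: it assumes each partition is reduced locally and the partial results are then combined in the order given. `PartitionedReduceSketch` and `partitionedReduce` are hypothetical names.

```scala
// Simplified model of a partitioned reduce: reduce inside each partition,
// then combine the partial results in the order the partitions are listed.
// This only imitates the idea; it is not how Spark is implemented.
object PartitionedReduceSketch {
  def partitionedReduce(parts: Seq[Seq[Int]])(f: (Int, Int) => Int): Int =
    parts.filter(_.nonEmpty)   // empty partitions contribute nothing
      .map(_.reduceLeft(f))    // local result per partition
      .reduceLeft(f)           // merge of the partial results

  def main(args: Array[String]): Unit = {
    // Subtraction: the answer depends on how the data is split and ordered.
    println(partitionedReduce(Seq(Seq(1), Seq(2, 3)))(_ - _)) // 2, i.e. 1 - (2 - 3)
    println(partitionedReduce(Seq(Seq(3), Seq(2, 1)))(_ - _)) // 2, i.e. 3 - (2 - 1)
    println(partitionedReduce(Seq(Seq(1, 2), Seq(3)))(_ - _)) // -4, i.e. (1 - 2) - 3

    // Addition: every split gives the same answer.
    println(partitionedReduce(Seq(Seq(1), Seq(2, 3)))(_ + _)) // 6
    println(partitionedReduce(Seq(Seq(3), Seq(2, 1)))(_ + _)) // 6
  }
}
```

With a commutative and associative function, neither the partition boundaries nor the merge order can change the result, which is exactly the guarantee reduce relies on.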
    