Given the following:
val rdd = List(1,2,3)
I assumed that rdd.reduce((x,y) => (x - y))
would return -4
(i.e. <
You can easily replace the subtraction v1 - v2 - ... - vN with v1 - (v2 + ... + vN), so your code can look like this:
val v1 = 1
val values = Seq(2, 3)
val sum = sc.parallelize(values).reduce(_ + _)
val result = v1 - sum
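The identity behind this rewrite can be checked on a local collection, without Spark. This is only a minimal sketch: the object name SubtractionAsSum and its members are made up for illustration and are not part of the original answer.

```scala
object SubtractionAsSum {
  val v1 = 1
  val values = Seq(2, 3)

  // Sequential left-to-right subtraction: (1 - 2) - 3
  val chained: Int = values.foldLeft(v1)(_ - _)

  // Same value via addition, which is commutative and associative
  // and is therefore the kind of function RDD.reduce expects
  val viaSum: Int = v1 - values.reduce(_ + _)

  def main(args: Array[String]): Unit =
    println(s"chained = $chained, viaSum = $viaSum")
}
```

Both values agree, which is why only the sum needs to be computed in parallel.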
As @TzachZohar mentioned, the function must be both commutative and associative for the parallel computation to be sound. Collecting the RDD first turns it into a local array, so reduce no longer runs in parallel and no longer depends on those properties.
It then produces the result of a sequential (non-parallel), left-to-right computation, namely,
val rdd = sc.parallelize(1 to 3)
rdd.collect.reduce((x,y) => (x-y))
Int = -4
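As far as I know, the Scala collection docs leave the evaluation order of plain reduce unspecified, so reduceLeft may be the safer way to make the left-to-right order explicit. A small sketch, assuming a local Array that stands in for what rdd.collect would return (the object name CollectedReduce is hypothetical):

```scala
object CollectedReduce {
  // Stand-in for rdd.collect: a local Array, no Spark needed
  val collected: Array[Int] = (1 to 3).toArray

  // reduceLeft always starts from the first element: (1 - 2) - 3
  val result: Int = collected.reduceLeft(_ - _)

  def main(args: Array[String]): Unit =
    println(result)
}
```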
From the RDD source code (and docs):
/**
* Reduces the elements of this RDD using the specified commutative and
* associative binary operator.
*/
def reduce(f: (T, T) => T): T
reduce is a monoidal reduction: it assumes the function is commutative and associative, so the order in which it is applied to the elements is not guaranteed.
Your function (x,y)=>(x-y) is neither commutative nor associative.
In your case, the reduce might have been applied this way:
3 - (2 - 1) = 2
or
1 - (2 - 3) = 2
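The effect of grouping can be reproduced with plain Scala collections. The sketch below assumes a hypothetical split of the data into two partitions, Seq(1) and Seq(2, 3); Spark's actual partitioning and merge order may differ, and the names used here are illustrative only.

```scala
object ReduceOrderDemo {
  val f: (Int, Int) => Int = (x, y) => x - y

  // Hypothetical split of 1, 2, 3 across two partitions (an assumption)
  val partitions: Seq[Seq[Int]] = Seq(Seq(1), Seq(2, 3))

  // Each partition is reduced locally first: 1 and (2 - 3)
  val partials: Seq[Int] = partitions.map(_.reduceLeft(f))

  // The partial results are then merged: 1 - (2 - 3)
  val merged: Int = partials.reduceLeft(f)

  // Purely sequential evaluation for comparison: (1 - 2) - 3
  val sequential: Int = Seq(1, 2, 3).reduceLeft(f)

  def main(args: Array[String]): Unit =
    println(s"merged = $merged, sequential = $sequential")
}
```

With this grouping the merged value is 2 while the sequential value is -4, matching the second case above.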