I'm new to Spark and want to understand how MapReduce gets done under the hood to ensure I use it properly. This post provided a great answer, but my results don't seem to
It happens because subtraction is neither associative nor commutative. Let's start with associativity:
(- (- (- 14 78) 73) 42)
(- (- -64 73) 42)
(- -137 42)
-179
is not the same as
(- (- 14 78) (- 73 42))
(- -64 (- 73 42))
(- -64 31)
-95
Now it's time for commutativity:
(- (- (- 14 78) 73) 42) ;; From the previous example
is not the same as
(- (- (- 42 73) 78) 14)
(- (- -31 78) 14)
(- -109 14)
-123
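The arithmetic above can be double-checked with a few lines of plain Python. This is only a sketch for verifying the numbers, not Spark code; the variable names are made up for illustration.

```python
from functools import reduce
from operator import sub

values = [14, 78, 73, 42]

# Left fold over the original order: ((14 - 78) - 73) - 42
left_fold = reduce(sub, values)

# Same order, different grouping: (14 - 78) - (73 - 42)
regrouped = sub(sub(14, 78), sub(73, 42))

# Left fold over a reordered sequence: ((42 - 73) - 78) - 14
reordered = reduce(sub, [42, 73, 78, 14])

print(left_fold, regrouped, reordered)  # -179 -95 -123
```

Three different answers from the same four numbers, which is exactly what you should expect from an operation that is neither associative nor commutative.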
Spark first applies reduce on individual partitions and then merges the partial results in arbitrary order. If the function you use is not both associative and commutative, the final result can be non-deterministic.
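To see the effect without a cluster, here is a rough pure-Python model of that behaviour. It is a simplification and an assumption on my part, not Spark's actual implementation: `partitioned_reduce` and `possible_results` are hypothetical helpers, and the split of the data into two partitions is made up for the example.

```python
from functools import reduce
from itertools import permutations
from operator import add, sub


def partitioned_reduce(f, partitions, merge_order):
    """Reduce each partition locally, then merge the partial results in merge_order."""
    partials = [reduce(f, part) for part in partitions]
    return reduce(f, [partials[i] for i in merge_order])


def possible_results(f, partitions):
    """Collect every result obtainable by merging the partial results in any order."""
    orders = permutations(range(len(partitions)))
    return {partitioned_reduce(f, partitions, order) for order in orders}


# Hypothetical layouts of the same data: one partition vs. two partitions.
single = [[14, 78, 73, 42]]
split = [[14, 78], [73, 42]]

print(sorted(possible_results(sub, single)))  # [-179]
print(sorted(possible_results(sub, split)))   # [-95, 95]
print(sorted(possible_results(add, split)))   # [207]
```

With subtraction the answer depends both on how the data is partitioned and on which partial result arrives first. Addition is associative and commutative, so every layout and merge order gives the same value.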