Why inconsistent results using subtraction in reduce?

Given the following:

val rdd = List(1,2,3)

I assumed that rdd.reduce((x,y) => (x - y)) would return -4 (i.e. <

3 Answers

无人及你

2021-01-19 14:31
Because v1 - v2 - ... - vN equals v1 - (v2 + ... + vN), you can replace the chained subtraction with a sum, which is safe to reduce in parallel. Your code could then look like this:
```
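val v1 = 1
val values = Seq(2, 3)
// Addition is commutative and associative, so partition order does not matter.
val sum = sc.parallelize(values).reduce(_ + _)
val result = v1 - sum  // 1 - (2 + 3) = -4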
```
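As a quick sanity check of that identity, here is a small sketch in plain Scala collections (no Spark involved). The object and method names are made up for illustration, and both methods assume a non-empty sequence.

```scala
object SubtractViaSum {
  // Left-to-right subtraction: ((v1 - v2) - v3) - ... - vN.
  // Assumes `values` is non-empty.
  def sequentialDifference(values: Seq[Int]): Int =
    values.reduceLeft(_ - _)

  // Same value, but the only reduction is a sum over the tail,
  // and a sum does not depend on evaluation order.
  def differenceViaSum(values: Seq[Int]): Int =
    values.head - values.tail.sum

  def main(args: Array[String]): Unit = {
    val values = Seq(1, 2, 3)
    println(sequentialDifference(values)) // -4
    println(differenceViaSum(values))     // -4
  }
}
```

The second form is the one that carries over to an RDD, since only the sum has to be computed across partitions.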
时光说笑

2021-01-19 14:36
As @TzachZohar mentioned, the function must be both commutative and associative for the parallel computation to be sound. If you collect the RDD first, the elements come back to the driver as a local array, and reduce on that array is a sequential (non-parallel) computation, so those properties are no longer required. For example:
```
val rdd = sc.parallelize(1 to 3)

rdd.collect().reduce((x, y) => x - y)
// Int = -4
```
无人及你

2021-01-19 14:39
From the RDD source code (and docs):
```
/**
* Reduces the elements of this RDD using the specified commutative and
* associative binary operator.
*/
def reduce(f: (T, T) => T): T
```
reduce is a monoidal reduction, so it assumes the function is commutative and associative; the order in which the function is applied to the elements is not guaranteed.

Your function (x, y) => x - y is neither commutative nor associative.

In your case, reduce might have been applied like this:
```
3 - (2 - 1) = 2
```
or
```
1 - (2 - 3) = 2
```
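To see why the grouping matters, here is a minimal sketch in plain Scala (no Spark required) that mimics a partitioned reduce: each partition is reduced on its own, then the partial results are combined. The names ReduceOrderDemo and partitionedReduce are made up for illustration, and the particular splits are an assumption; how Spark actually distributes and merges elements is more involved.

```scala
object ReduceOrderDemo {
  val sub: (Int, Int) => Int = _ - _
  val add: (Int, Int) => Int = _ + _

  // Reduce each simulated partition first, then combine the partial results.
  // Empty partitions are skipped; at least one partition must be non-empty.
  def partitionedReduce(parts: Seq[Seq[Int]], f: (Int, Int) => Int): Int =
    parts.filter(_.nonEmpty).map(_.reduceLeft(f)).reduceLeft(f)

  def main(args: Array[String]): Unit = {
    val onePartition  = Seq(Seq(1, 2, 3))
    val twoPartitions = Seq(Seq(1), Seq(2, 3))

    // Subtraction: the result depends on how the elements are split.
    println(partitionedReduce(onePartition, sub))  // (1 - 2) - 3 = -4
    println(partitionedReduce(twoPartitions, sub)) // 1 - (2 - 3) = 2

    // Addition: the same result for either split.
    println(partitionedReduce(onePartition, add))  // 6
    println(partitionedReduce(twoPartitions, add)) // 6
  }
}
```

With subtraction the answer changes with the split, while addition gives 6 either way, which is the behavior the commutative and associative requirement is meant to guarantee.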