I do not get to understand yet the aggregate function:
For example, having:
val x = List(1,2,3,4,5,6)
val y = x.par.aggregate((0, 0))((x, y) => (x._1
From the documentation:
def aggregate[B](z: ⇒ B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
Aggregates the results of applying an operator to subsequent elements.
This is a more general form of fold and reduce. It has similar semantics, but does not require the result to be a supertype of the element type. It traverses the elements in different partitions sequentially, using seqop to update the result, and then applies combop to results from different partitions. The implementation of this operation may operate on an arbitrary number of collection partitions, so combop may be invoked an arbitrary number of times.
For example, one might want to process some elements and then produce a Set. In this case, seqop would process an element and append it to the list, while combop would concatenate two lists from different partitions together. The initial value z would be an empty set.
pc.aggregate(Set[Int]())(_ += process(_), _ ++ _)
Another example is calculating geometric mean from a collection of doubles (one would typically require big doubles for this). B the type of accumulated results z the initial value for the accumulated result of the partition - this will typically be the neutral element for the seqop operator (e.g. Nil for list concatenation or 0 for summation) and may be evaluated more than once seqop an operator used to accumulate results within a partition combop an associative operator used to combine results from different partitions
In your example B
is a Tuple2[Int, Int]
. The method seqop
then takes a single element from the list, scoped as y
, and updates the aggregate B
to (x._1 + y, x._2 + 1)
. So it increments the second element in the tuple. This effectively puts the sum of elements into the first element of the tuple and the number of elements into the second element of the tuple.
The method combop
then takes the results from each parallel execution thread and combines them. Combination by addition provides the same results as if it were run on the list sequentially.
Using B
as a tuple is likely the confusing piece of this. You can break the problem down into two sub problems to get a better idea of what this is doing. res0
is the first element in the result tuple, and res1
is the second element in the result tuple.
// Sums all elements in parallel.
scala> x.par.aggregate(0)((x, y) => x + y, (x, y) => x + y)
res0: Int = 21
// Counts all elements in parallel.
scala> x.par.aggregate(0)((x, y) => x + 1, (x, y) => x + y)
res1: Int = 6