Explanation of the aggregate scala function

前端 未结 5 1771
我寻月下人不归
我寻月下人不归 2021-01-31 04:35

I do not get to understand yet the aggregate function:

For example, having:

val x = List(1,2,3,4,5,6)
val y = x.par.aggregate((0, 0))((x, y) => (x._1          


        
5条回答
  •  醉梦人生
    2021-01-31 05:16

    From the documentation:

    def aggregate[B](z: ⇒ B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
    

    Aggregates the results of applying an operator to subsequent elements.

    This is a more general form of fold and reduce. It has similar semantics, but does not require the result to be a supertype of the element type. It traverses the elements in different partitions sequentially, using seqop to update the result, and then applies combop to results from different partitions. The implementation of this operation may operate on an arbitrary number of collection partitions, so combop may be invoked an arbitrary number of times.

    For example, one might want to process some elements and then produce a Set. In this case, seqop would process an element and append it to the list, while combop would concatenate two lists from different partitions together. The initial value z would be an empty set.

    pc.aggregate(Set[Int]())(_ += process(_), _ ++ _)

    Another example is calculating geometric mean from a collection of doubles (one would typically require big doubles for this). B the type of accumulated results z the initial value for the accumulated result of the partition - this will typically be the neutral element for the seqop operator (e.g. Nil for list concatenation or 0 for summation) and may be evaluated more than once seqop an operator used to accumulate results within a partition combop an associative operator used to combine results from different partitions

    In your example B is a Tuple2[Int, Int]. The method seqop then takes a single element from the list, scoped as y, and updates the aggregate B to (x._1 + y, x._2 + 1). So it increments the second element in the tuple. This effectively puts the sum of elements into the first element of the tuple and the number of elements into the second element of the tuple.

    The method combop then takes the results from each parallel execution thread and combines them. Combination by addition provides the same results as if it were run on the list sequentially.

    Using B as a tuple is likely the confusing piece of this. You can break the problem down into two sub problems to get a better idea of what this is doing. res0 is the first element in the result tuple, and res1 is the second element in the result tuple.

    // Sums all elements in parallel.
    scala> x.par.aggregate(0)((x, y) => x + y, (x, y) => x + y)
    res0: Int = 21
    
    // Counts all elements in parallel.    
    scala> x.par.aggregate(0)((x, y) => x + 1, (x, y) => x + y)
    res1: Int = 6
    

提交回复
热议问题