Fold and foldLeft method difference

后端 未结 7 802
终归单人心
终归单人心 2021-01-31 01:51

I am not sure what is the difference between fold and foldLeft in Scala.

The question Difference between fold and foldLeft or foldRight? has an

相关标签:
7条回答
  • 2021-01-31 02:05

    I am not familiar with Scala, but Scala's collection library has a similar design to Haskell's. This answer is based on Haskell and is probably accurate for Scala as well.

    Because foldLeft processes its inputs from left to right, it can have different input and output types. On the other hand, fold can process its inputs in various orders and so the inputs and output must all have the same type. This is easiest to see by expanding out the fold expressions. foldLeft operates in a specific order:

    Array("1","2","3").foldLeft(0)(_ + _.toInt)
    = ((0 + "1".toInt) + "2".toInt) + "3".toInt
    

    Note that array elements are never used as the first parameter to the combining function. They always appear on the right of the +.

    fold does not guarantee a specific order. It could do various things, such as:

    Array("1","2","3").fold(0)(_ + _.toInt)
    =  ((0 + "1".toInt) + "2".toInt) + "3".toInt
    or (0 + "1".toInt) + ("2" + "3".toInt).toInt
    or "1" + ("2" + ("3" + 0.toInt).toInt).toInt
    

    Array elements can appear in either parameter of the combining function. But your combining function expects its first argument to be an int. If you don't respect that constraint, you end up adding strings to ints! This error is caught by the type system.

    The neutral element may be introduced multiple times because, generally, a parallel fold is implemented by splitting up the input and executing multiple sequential folds. A sequential fold introduces the neutral element once. Imagine one particular execution of Array(1,2,3,4).fold(0)(_ + _) where the array is split into two separate arrays, and these are folded sequentially in two threads. (Of course, the real fold function does not spit the array into multiple arrays.) One thread executes Array(1,2).fold(0)(_ + _), computing 0 + 1 + 2. The other thread executes Array(3,4).fold(0)(_ + _), computing 0 + 3 + 4. Finally, the partial sums from the two threads are added together. Note that the neutral element, 0, appears twice.

    0 讨论(0)
  • 2021-01-31 02:07

    The error. You are getting a compile-time error because the signature of fold only allows folding values of type which is the supertype of the type of the values in the collection, and the only supertype of String (your collection type) and Int (the type of your provided zero element) is Any. So, the type of the fold result is inferred to be Any - and Any does not have a method toInt.

    Note that the two versions of fold have different signatures:

    fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1
    
    foldLeft[B](z: B)(f: (B, A) => B): B
    

    Why do they have different signatures? This is because fold could be implemented in parallel, as is the case with parallel collections. When multiple processors fold over the values in the collections, each one of the processors takes a subset of elements of type A and produces the folded value of type A1, by consecutively applying op. The results produced by those processors must be combined together into a final folding value - this is done using the op function, which does exactly that.

    Now, note that this cannot be done using the f in foldLeft, because each of the processors produces a folded value of type B. Several values of type B cannot be combined using f, because f only combines value B with another value of type A - there is no correspondance between types A and B.

    Example. In your example, assume the 1st processor takes elements "1", "2" and the 2nd takes element "3". The first one will produce the folded value 3, and the second will produce another folded value 3. Now they have to combine their results to get the final folded value - this is impossible, because the closure _ + _.toInt only knows how to combine an Int and String, and not 2 Int values.

    For situations where these types differ, use aggregate, in which you have to define how to combine two values of type B:

    def aggregate[B](z: B)(seqop: (B, A) => B, combop: (B, B) => B): B
    

    The combop above defines how to do the final step when the fold result and the elements in the collection have different types.

    The neutral element. As described above, multiple processors may fold over subsets of elements in the collection. Each one of them will start its folded value by adding the neutral element.

    In the following example:

    List(1, 2, 3).foldLeft(4)(_ + _)
    

    always returns 10 = 4 + 1 + 2 + 3.

    However, 4 should not be used with fold, as it is not a neutral element:

    List(1, 2, 3).fold(4)(_ + _)
    

    The above may return (4 + 1 + 2) + (4 + 3) = 14 or (4 + 1) + (4 + 2) + (4 + 3) = 18. If you don't use the neutral element for the fold, the results are nondeterministic. In the same way, you can use the Nil as a neutral element, but not a non-empty list.

    0 讨论(0)
  • 2021-01-31 02:13

    NOTE: I could be completely wrong here. My scala is less than perfect.

    I think that the difference is in the signature of the methods:

    def fold[A1 >: A](z: A1)(op: (A1, A1) ⇒ A1): A1
    

    vs

    def foldLeft[B](z: B)(op: (B, T) ⇒ B): B
    

    In short, fold is defined as operating on some type A1 which is a supertype of the array's type, which for your string array the compiler defines as "Any" (probably because it needs a type that can store your String or an int- notice that the combiner method passed to fold Fold takes two parameters of the same type?) That's also what the documentation means when it talks about z- the implementation of Fold could be such that it combines your inputs in parallel, for instance:

    "1" + "2" --\
                 --> 3 + 3 -> 6
    "3" + *z* --/
    

    On the other hand, foldLeft operates on type B (unconstrained) and only asks that you provide a combiner method that takes a parameter of type B and another of your array's type (String, in your case), and produces a B.

    0 讨论(0)
  • 2021-01-31 02:13

    It has been mentioned, but without example: If you want to allow parallelism with different data types for output and input, you could use aggregate:

    Array("1","2","3").aggregate(0)(_ + _.toInt, _ + _)
    

    The first function gets called first. Its results are then reduced with the second function. See Explanation of the aggregate scala function.

    0 讨论(0)
  • 2021-01-31 02:15

    As another answer points out, the fold method is primarily there to support a parallel fold. You can see this as follows. First we can define a kind of wrapper for integers that allows us to keep track of the operations that have been performed on its instances.

    case class TrackInt(v: Int) {
      val log = collection.mutable.Buffer.empty[Int]
      def plus(that: TrackInt) = {
        this.log += that.v
        that.log += this.v
        new TrackInt(this.v + that.v)
      }
    }
    

    Next we can create a parallel collection of these things and an identity element:

    val xs = (1 to 10).map(TrackInt(_)).par
    val zero = TrackInt(0)
    

    First we'll try foldLeft:

    scala> xs.foldLeft(zero)(_ plus _)
    res0: TrackInt = TrackInt(55)
    
    scala> zero.log
    res1: scala.collection.mutable.Buffer[Int] = ArrayBuffer(1)
    

    So our zero value is only used once, as we'd expect, since foldLeft performs a sequential fold. Next we can clear the log and try fold:

    scala> zero.log.clear()
    
    scala> xs.fold(zero)(_ plus _)
    res2: TrackInt = TrackInt(55)
    
    scala> zero.log
    res3: scala.collection.mutable.Buffer[Int] = ArrayBuffer(1, 6, 2, 7, 8)
    

    So we can see that the fold has been parallelized in such a way that the zero value is used multiple times. If we ran this again we'd be likely to see different values in the log.

    0 讨论(0)
  • 2021-01-31 02:25

    As defined by Scala, foldLeft is a linear operation while fold is allowed to be a tree operation. For example:

    List(1,2,3,4,5).foldLeft(0)(_ + _)
    // This is the only valid order of operations
    0+1 = 1
          1+2 = 3
                3+3 = 6
                      6+4 = 10
                            10 + 5 = 15
                                     15  // done
    
    List(1,2,3,4,5).fold(0)(_ + _)
    // This is valid
    0+1 = 1             0+3 = 3           0+5 = 5
          1+2 = 3             3+4 = 7           5
                3         +         7=10        5
                                      10    +   5 = 15
                                                    15  // done
    

    In order to allow arbitrary tree decompositions of a sequential list, you must have a zero that doesn't do anything (so you can add it wherever you need it in the tree) and you must create the same sort of thing that you take as your binary arguments so the types don't change on you depending on how you decompose the tree.

    (Being able to evaluate as a tree is nice for parallelization. If you want to be able to transform your output time as you go, you need both a combination operator and a standard start-value-transforms-sequence-element-to-desired-type function just like foldLeft has. Scala has this and calls it aggregate, but in some ways this is more like foldLeft than fold is.)

    0 讨论(0)
提交回复
热议问题