What do the Stream reduce() requirements exactly entail?

无人及你 2020-12-06 11:41

When using the reduce() operation on a parallel stream, the OCP exam book states that there are certain principles the reduce() arguments must adhere to. What exactly do these requirements entail?

3 Answers
  • 2020-12-06 11:43

    why isn't the accumulator associative?

    It's not associative since the order of subtraction operations determines the final result.

    If you run a serial Stream, you'll get the expected result of:

    0 - 1 - 2 - 3 - 4 - 5 - 6 = -21
    

    On the other hand, for parallel Streams the work is split across multiple threads. For example, if reduce is executed in parallel on 6 threads and the intermediate results are then combined, you can get a different result:

    Each of the 6 chunks starts from the identity 0:

    0 - 1 = -1   0 - 2 = -2   0 - 3 = -3   0 - 4 = -4   0 - 5 = -5   0 - 6 = -6

    The partial results are then combined pairwise:

    -1 - (-2) = 1      -3 - (-4) = 1      -5 - (-6) = 1
     1 - 1     = 0
     0 - 1     = -1
    

    Or, to make a long example short:

    (1 - 2) - 3 = -4
    1 - (2 - 3) =  2
    

    Therefore subtraction is not associative.

    In contrast, a+b doesn't cause the same problem, since addition is an associative operator (i.e. (a+b)+c == a+(b+c)).
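
    As a quick sketch of the contrast (my own example, not from the book, assuming java.util.stream.Stream is imported), summing the same values gives 21 whether or not the stream is parallel, while the subtraction version may not match the sequential -21:

    int sum = Stream.of(1, 2, 3, 4, 5, 6)
            .parallel()
            .reduce(0, Integer::sum);      // always 21: addition is associative and 0 is its identity

    int diff = Stream.of(1, 2, 3, 4, 5, 6)
            .parallel()
            .reduce(0, (a, b) -> a - b);   // may differ from the sequential -21, depending on how the work is split

    System.out.println(sum + " " + diff);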

    The problem with the identity example is that when reduce is executed in parallel on multiple threads, the "X" is prepended to each intermediate result, so it appears once per chunk instead of just once overall.

    What exactly would be a proper identity to use then?

    If you change the identity value to "":

    System.out.println(Arrays.asList("w", "o", "l", "f")
            .parallelStream()
            .reduce("", String::concat));
    

    you'll get "wolf" instead of "XwXoXlXf".
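
    For contrast, the book's version presumably looked something like this (a reconstruction with "X" as the identity, not the exact book code):

    System.out.println(Arrays.asList("w", "o", "l", "f")
            .parallelStream()
            .reduce("X", String::concat));

    Because "X" + s is not equal to s, the identity requirement is violated, and a run that splits the work four ways can print "XwXoXlXf".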

  • 2020-12-06 12:00

    While the question has already been answered and accepted, I think it can be answered in a simpler, more practical way.

    If you don't have a valid identity and an associative accumulator/combiner, the result of the reduce operation will depend on:

    1. the Stream content
    2. the number of threads processing the Stream

    Associativity

    Let's try an example with a non-associative accumulator/combiner (basically, we reduce a list of 50 numbers sequentially and then in parallel, varying the number of threads):

    System.out.println("sequential: reduce="+
        IntStream.rangeClosed(1, 50).boxed()
            .reduce(
                0, 
                (a,b)->a-b, 
                (a,b)->a-b));
    for (int n=1; n<6; n++) {
        ForkJoinPool pool = new ForkJoinPool(n);
        final int finalN = n;
        try {
            pool.submit(()->{
                System.out.println(finalN+" threads : reduce="+
                    IntStream.rangeClosed(1, 50).boxed()
                        .parallel()
                        .reduce(
                            0, 
                            (a,b)->a-b, 
                            (a,b)->a-b));
                }).get();
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            } finally {
                pool.shutdown();
            }
        }
    

    This displays the following results (Oracle JDK 10.0.1):

    sequential: reduce=-1275
    1 threads : reduce=325
    2 threads : reduce=-175
    3 threads : reduce=-25
    4 threads : reduce=75
    5 threads : reduce=-25
    

    This shows that the result depends on the number of threads involved in the reduce calculation.

    Notes:

    • Interestingly, the sequential reduce and the parallel reduce on one thread do not produce the same result. I could not find a good explanation for this.
    • From my experiments, the same Stream content and the same number of threads always lead to the same reduced value when run several times. I suppose this is because the parallel stream uses a deterministic Spliterator.
    • I would not use the Boyarsky & Selikoff OCP 8 book example, because its stream is too small (1, 2, 3, 4, 5, 6) and produces (on my machine) the same reduced value of 3 for a ForkJoinPool of 1, 2, 3, 4 or 5 threads.
    • The default number of threads for a parallel stream is the number of available CPU cores, which is why you may not get the same reduce result on every machine (see the snippet below).
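
    A quick way to check what your machine will actually use (just a sketch with standard JDK calls, Runtime and java.util.concurrent.ForkJoinPool):

    System.out.println("cores: " + Runtime.getRuntime().availableProcessors());              // CPU cores visible to the JVM
    System.out.println("common pool parallelism: " + ForkJoinPool.commonPool().getParallelism()); // parallelism of the pool parallel streams use by default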

    Identity

    For identity, as Eran wrote with the "XwXoXlXf" example, with 4 threads each thread will start by using the identity as a kind of String prefix. But pay attention: while the OCP book suggests that "" and 0 are valid identities, that depends on the accumulator/combiner functions. For example:

    • 0 is a valid identity for accumulator (a,b)->a+b (because a+0=a)
    • 1 is a valid identity for accumulator (a,b)->a*b (because a*1=a), but 0 is not a valid identity for it, because a*0=0! (see the sketch below)
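
    A short sketch of the multiplication case (my own example, not from the book):

    int product = Stream.of(1, 2, 3, 4)
            .parallel()
            .reduce(1, (a, b) -> a * b);
    System.out.println(product); // 24: 1 really is an identity for *, so the splitting doesn't matter

    int broken = Stream.of(1, 2, 3, 4)
            .parallel()
            .reduce(0, (a, b) -> a * b);
    System.out.println(broken);  // 0: every chunk starts from 0, and a*0 = 0
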
  • 2020-12-06 12:02

    Let me give two examples. First, one where the identity is broken:

    int result = Stream.of(1, 2, 3, 4, 5, 6)
            .parallel()
            .reduce(10, (a, b) -> a + b);
    
    System.out.println(result); // 81 on my run
    

    Basically you have broken this rule: The identity value must be an identity for the accumulator function.  This means that for all u, accumulator(identity, u) is equal to u.

    Or, to make it simpler, let's see if that rule holds for some random data from our Stream:

     Integer identity = 10;
     BinaryOperator<Integer> combiner = (x, y) -> x + y;
     boolean identityRespected = combiner.apply(identity, 1) == 1;
     System.out.println(identityRespected); // prints false
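
    If the intent was to add 10 on top of the sum, a safer sketch (mine, not from the original answer) keeps 0 as the identity and applies the offset after the reduction:

    int result = 10 + Stream.of(1, 2, 3, 4, 5, 6)
            .parallel()
            .reduce(0, Integer::sum);
    System.out.println(result); // 31 on every run, no matter how the stream is split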
    

    And a second example:

    /**
     * count letters, adding a bit more all the time
     */
    private static int howMany(List<String> tokens) {
        return tokens.stream()
                .parallel()
                .reduce(0, // identity
                        (i, s) -> { // accumulator
                            return s.length() + i;
                        }, (left, right) -> { // combiner
                            return left + right + left; // notice the extra left here
                        });
    }
    

    And you invoke this with:

    List<String> left = Arrays.asList("aa", "bbb", "cccc", "ffffffffd", "eeeeee");
    List<String> right = Arrays.asList("aa", "bbb", "cccc", "ffffffffd", "eeeeee", "");
    
    System.out.println(howMany(left));  // 38 on my run
    System.out.println(howMany(right)); // 50 on my run
    

    Basically you have broken this rule: Additionally, the combiner function must be compatible with the accumulator function. Or, in code:

    // this must hold!
    // combiner.apply(u, accumulator.apply(identity, t)) == accumulator.apply(u, t)
    
    Integer identity = 0;
    String t = "aa";
    Integer u = 3; // "bbb"
    BiFunction<Integer, String, Integer> accumulator = (Integer i, String s) -> i + s.length();
    BinaryOperator<Integer> combiner = (left, right) -> left + right + left;
    
    int first = accumulator.apply(identity, t); // 2
    int second = combiner.apply(u, first); // 3 + 2 + 3 = 8
    
    Integer shouldBe8 = accumulator.apply(u, t); // actually 5 (3 + "aa".length()), not 8
    
    System.out.println(shouldBe8 == second); // false
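
    A compatible combiner simply adds the two partial counts. With that change (a sketch using my own method name, not the author's code), combiner.apply(u, accumulator.apply(identity, t)) == accumulator.apply(u, t) holds, and both example lists reduce to their true total length of 24:

    private static int howManyFixed(List<String> tokens) {
        return tokens.stream()
                .parallel()
                .reduce(0,                                // identity
                        (count, s) -> count + s.length(), // accumulator: add one token's length
                        Integer::sum);                    // combiner compatible with the accumulator
    }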
    