Java 8 stream operations execution order

隐瞒了意图╮ 2020-12-28 18:35
List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
List<Integer> twoEvenSquares = numbers.stream()
    .filter(n -> { System.out.println("filtering " + n); return n % 2 == 0; })
    .map(n -> { System.out.println("mapping " + n); return n * n; })
    .limit(2)
    .collect(Collectors.toList());

This prints

    filtering 1
    filtering 2
    mapping 2
    filtering 3
    filtering 4
    mapping 4

Why are the filter and map calls interleaved per element instead of all elements being filtered before any of them is mapped?
5 Answers
  • 2020-12-28 19:14

    This is the result of the lazy execution/evaluation of intermediate stream operations.

    The chain of operations is pulled in reverse order, from collect() back to filter(): each step consumes values as soon as the previous step produces them.

    To describe more clearly what's happening:

    1. The only terminal operation, collect(), starts the evaluation of the chain.
    2. limit() starts the evaluation of its ancestor.
    3. map() starts the evaluation of its ancestor.
    4. filter() starts consuming values from the source stream.
    5. filter() evaluates 1 (rejected) and then 2 (accepted), producing the first value.
    6. map() consumes the first value returned by its ancestor and produces a value too.
    7. limit() consumes that value.
    8. collect() collects the first value.
    9. limit() requests another value from map().
    10. map() requests another value from its ancestor.
    11. filter() resumes the evaluation; after rejecting 3 and accepting 4 it produces the new value 4.
    12. map() consumes it and produces a new value.
    13. limit() consumes the new value and passes it on.
    14. collect() collects the last value.

    From the java.util.stream docs:

    Stream operations are divided into intermediate and terminal operations, and are combined to form stream pipelines. A stream pipeline consists of a source (such as a Collection, an array, a generator function, or an I/O channel); followed by zero or more intermediate operations such as Stream.filter or Stream.map; and a terminal operation such as Stream.forEach or Stream.reduce.

    Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.
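    As a quick way to see that last sentence in action, here is a stand-alone sketch (my own illustration, not part of the original question) that builds the same pipeline without a terminal operation; nothing is printed until collect() is called:

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class LazyDemo {
        public static void main(String[] args) {
            List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

            // Building the pipeline runs nothing: no "filtering ..." line appears here.
            Stream<Integer> pipeline = numbers.stream()
                .filter(n -> { System.out.println("filtering " + n); return n % 2 == 0; })
                .map(n -> { System.out.println("mapping " + n); return n * n; })
                .limit(2);
            System.out.println("pipeline built, nothing executed yet");

            // Only the terminal operation starts pulling elements through the chain.
            List<Integer> result = pipeline.collect(Collectors.toList());
            System.out.println(result); // [4, 16]
        }
    }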

  • 2020-12-28 19:15

    filter and map are intermediate operations. As the doc states:

    Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.

    [...]

    Processing streams lazily allows for significant efficiencies; in a pipeline such as the filter-map-sum example above, filtering, mapping, and summing can be fused into a single pass on the data, with minimal intermediate state.

    So when you call your terminal operation (i.e. collect()), you can think of something like the code below. It is heavily simplified (in reality the collector accumulates the pipeline's content, Streams are not Iterable, and so on) and does not compile, but it helps to visualize what happens:

    public List<Integer> collectToList() {
        List<Integer> list = new ArrayList<>();
        for (Integer e : this) {
            if (filter.test(e)) {           // here you see the "filtering" println
                e = mapping.apply(e);       // here you see the "mapping" println
                list.add(e);
                if (list.size() >= limit)   // stop once the limit is reached
                    break;
            }
        }
        return list;
    }
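
    For completeness, here is a compilable stand-alone loop (my own sketch under the same simplification, not the answerer's code) that mimics that fused pass and reproduces the exact print order from the question:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.function.Function;
    import java.util.function.Predicate;

    public class FusedLoopSketch {
        public static void main(String[] args) {
            List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
            Predicate<Integer> filter = n -> { System.out.println("filtering " + n); return n % 2 == 0; };
            Function<Integer, Integer> mapping = n -> { System.out.println("mapping " + n); return n * n; };
            int limit = 2;

            List<Integer> list = new ArrayList<>();
            for (Integer e : numbers) {
                if (filter.test(e)) {           // prints "filtering n"
                    list.add(mapping.apply(e)); // prints "mapping n"
                    if (list.size() >= limit)   // short-circuit after two results
                        break;
                }
            }
            System.out.println(list); // [4, 16], with the same interleaved print order as the stream
        }
    }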
    
  • 2020-12-28 19:19

    Streams are pull-based. Only a terminal operation (like collect) causes items to be consumed.

    Conceptually this means that collect asks limit for an item, limit asks map, map asks filter, and filter pulls from the source stream.

    Schematically the code in your question leads to

    collect
      limit (0)
        map
          filter
            stream (returns 1)
          /filter (false)
          filter
            stream (returns 2)
          /filter (true)
        /map (returns 4)
      /limit (1)
      limit (1)
        map
          filter
            stream (returns 3)
          /filter (false)
          filter
            stream (returns 4)
          /filter (true)
        /map (returns 16)
      /limit (2)
      limit (2)
      /limit (no more items; limit reached)
    /collect
    

    And this conforms to your first printout.
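
    A simple way to convince yourself of this pull model (my own illustration, not part of the original answer) is to replace the finite source with an infinite one; only as many elements are pulled as the downstream operations actually request:

    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class PullDemo {
        public static void main(String[] args) {
            // An infinite source: 1, 2, 3, ...
            List<Integer> twoEvenSquares = Stream.iterate(1, n -> n + 1)
                .filter(n -> { System.out.println("filtering " + n); return n % 2 == 0; })
                .map(n -> { System.out.println("mapping " + n); return n * n; })
                .limit(2)   // downstream never asks for more than two results
                .collect(Collectors.toList());
            // Only 1, 2, 3 and 4 are ever pulled from the infinite source; without the
            // pull-based model this pipeline could not terminate at all.
            System.out.println(twoEvenSquares); // [4, 16]
        }
    }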

  • 2020-12-28 19:22

    The behavior you noticed is the correct one. To find out whether a number makes it into the result, that number has to be run through all the pipeline steps before the next one is taken from the source.

    filtering 1 // 1 doesn't pass the filter
    filtering 2 // 2 passes the filter, moves on to map
    mapping 2 // 2 passes the map and limit steps and is added to output list
    filtering 3 // 3 doesn't pass the filter
    filtering 4 // 4 passes the filter, moves on to map 
    mapping 4 // 4 passes the map and limit steps and is added to output list
    

    Now the pipeline can end, since two numbers have made it all the way through.
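
    If you want to watch each element travel through the stages without putting println calls inside your own lambdas, peek() can be inserted between the operations (a debugging sketch of my own, not from the answer):

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    public class PeekTrace {
        public static void main(String[] args) {
            List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
            List<Integer> twoEvenSquares = numbers.stream()
                .peek(n -> System.out.println("from source:   " + n))
                .filter(n -> n % 2 == 0)
                .peek(n -> System.out.println("passed filter: " + n))
                .map(n -> n * n)
                .peek(n -> System.out.println("after map:     " + n))
                .limit(2)
                .collect(Collectors.toList());
            System.out.println(twoEvenSquares); // [4, 16]
        }
    }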

  • 2020-12-28 19:32

    The Stream API is not meant to provide guarantees regarding the order in which the operations are executed. That's why you should use side-effect-free functions. The "short circuiting" does not change anything about this; it only means that no more operations are performed than necessary (and that the pipeline can complete in finite time where possible, even for infinite stream sources). And when you look at your output you'll find that everything works right: the performed operations match the ones you expected, and so does the result.

    Only the order doesn't match, and that is not a problem with the concept but with your assumption about the implementation. If you think about what an implementation that does not use intermediate storage has to look like, you will come to the conclusion that it must behave exactly as observed: a Stream processes each item one after another, filtering, mapping and collecting it before moving on to the next one.
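
    To see why side-effect-free functions matter here, consider a parallel variant of the question's pipeline (my own sketch, not part of the answer): the println side effects may appear in a different, non-deterministic order, and some elements may even be processed speculatively, yet the collected result stays the same because it does not depend on those side effects:

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    public class SideEffectOrder {
        public static void main(String[] args) {
            List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
            List<Integer> twoEvenSquares = numbers.parallelStream()
                .filter(n -> { System.out.println("filtering " + n); return n % 2 == 0; })
                .map(n -> { System.out.println("mapping " + n); return n * n; })
                .limit(2)
                .collect(Collectors.toList());
            // The trace lines interleave differently from run to run, but the encounter
            // order is preserved by filter/map/limit/collect, so the result is stable.
            System.out.println(twoEvenSquares); // always [4, 16]
        }
    }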
