List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
List<Integer> twoEvenSquares = numbers.stream()
    .filter(n -> {
        System.out.println("filtering " + n);
        return n % 2 == 0;
    })
    .map(n -> {
        System.out.println("mapping " + n);
        return n * n;
    })
    .limit(2)
    .collect(Collectors.toList());
This is the result of the lazy execution/evaluation of intermediate stream operations. The chain of operations is lazily evaluated in reverse order, going from collect() to filter(); values are consumed by each step as soon as they are produced by the previous step.

To describe more clearly what's happening:

- collect() starts the evaluation of the chain
- limit() starts the evaluation of its ancestor
- map() starts the evaluation of its ancestor
- filter() starts consuming values from the source stream
- 1 is evaluated (and rejected by the filter), 2 is evaluated and the first value is produced
- map() consumes the first value returned by its ancestor and produces a value too
- limit() consumes that value
- collect() collects the first value
- limit() requires another value from its map() source
- map() requires another value from its ancestor
- filter() resumes the evaluation to produce another result and, after evaluating 3 and 4, produces the new value 4
- map() consumes it and produces a new value (16)
- limit() consumes the new value and returns it
- collect() collects the last value

From the java.util.stream docs:
Stream operations are divided into intermediate and terminal operations, and are combined to form stream pipelines. A stream pipeline consists of a source (such as a Collection, an array, a generator function, or an I/O channel); followed by zero or more intermediate operations such as Stream.filter or Stream.map; and a terminal operation such as Stream.forEach or Stream.reduce.
Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.
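The interleaved, per-element order described above can be checked by recording each operation into a list instead of printing (the `log` list here is illustrative, not part of the original question's code):

```java
import java.util.*;
import java.util.stream.*;

public class LazyOrder {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>(); // records the order in which operations run
        List<Integer> result = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8).stream()
            .filter(n -> { log.add("filter " + n); return n % 2 == 0; })
            .map(n -> { log.add("map " + n); return n * n; })
            .limit(2)
            .collect(Collectors.toList());
        // Per-element interleaving, not phase-by-phase:
        System.out.println(log);    // [filter 1, filter 2, map 2, filter 3, filter 4, map 4]
        System.out.println(result); // [4, 16]
    }
}
```

Note that `filter 3` appears after `map 2`: the second element is fully processed before the third is even looked at.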
filter and map are intermediate operations. As the doc states:
Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.
[...]
Processing streams lazily allows for significant efficiencies; in a pipeline such as the filter-map-sum example above, filtering, mapping, and summing can be fused into a single pass on the data, with minimal intermediate state.
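The "always lazy" part is easy to verify: building the pipeline runs nothing, and the predicate only executes once the terminal operation starts pulling. A small sketch using a counter (the counter is illustrative):

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.*;

public class LazyDemo {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> s = Stream.of(1, 2, 3)
            .filter(n -> { calls.incrementAndGet(); return n > 1; });
        // The predicate has not run yet: filter() only built a new stream stage.
        System.out.println(calls.get()); // 0
        List<Integer> out = s.collect(Collectors.toList());
        // Traversal happened during collect(), so the predicate ran once per element.
        System.out.println(calls.get()); // 3
        System.out.println(out);         // [2, 3]
    }
}
```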
So when you call your terminal operation (i.e. collect()), you can think of something like this (this is really simplified: you'd use the collector to accumulate the pipeline's content, Streams are not iterable, and so on, so it does not compile, but it's just to visualize things):
public List<T> collectToList() {
    List<T> list = new ArrayList<>();
    for (T e : this) {
        if (filter.test(e)) {          // here you see the filter println
            e = mapping.apply(e);      // here you see the mapping println
            list.add(e);
            if (list.size() >= limit)  // limit reached: stop traversing the source
                break;
        }
    }
    return list;
}
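A compilable version of that sketch, specialized to this pipeline (even-number filter, squaring map, limit of two; the loop and variable names are illustrative):

```java
import java.util.*;

public class ManualPipeline {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int e : Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8)) {
            if (e % 2 == 0) {            // filter: keep even numbers
                int mapped = e * e;      // map: square the value
                list.add(mapped);
                if (list.size() >= 2) {  // limit(2): stop once two results are collected
                    break;
                }
            }
        }
        System.out.println(list); // [4, 16]
    }
}
```

Like the real pipeline, this single loop never touches 5, 6, 7, or 8, and never builds an intermediate collection.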
Streams are pull-based. Only a terminal operation (like collect) will cause items to be consumed.

Conceptually this means that collect will ask an item from limit, limit from map, map from filter, and filter from the stream.
Schematically the code in your question leads to

collect
  limit (0)
    map
      filter
        stream (returns 1)
      /filter (false)
      filter
        stream (returns 2)
      /filter (true)
    /map (returns 4)
  /limit (1)
  limit (1)
    map
      filter
        stream (returns 3)
      /filter (false)
      filter
        stream (returns 4)
      /filter (true)
    /map (returns 16)
  /limit (2)
  limit (2)
  /limit (no more items; limit reached)
/collect
And this conforms to your first printout.
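That pull-based ask-your-upstream chain can be sketched with plain Iterators. This is a hypothetical model, not the JDK's actual implementation (which is built on Spliterators), but it shows the same mechanics: each stage pulls one element at a time from the stage before it.

```java
import java.util.*;
import java.util.function.*;

public class PullDemo {
    // filter stage: pulls from upstream until an element passes the predicate
    static <T> Iterator<T> filter(Iterator<T> up, Predicate<T> p) {
        return new Iterator<T>() {
            T next; boolean has;
            public boolean hasNext() {
                while (!has && up.hasNext()) {
                    T t = up.next();
                    if (p.test(t)) { next = t; has = true; }
                }
                return has;
            }
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                has = false;
                return next;
            }
        };
    }
    // map stage: pulls one element and transforms it on demand
    static <T, R> Iterator<R> map(Iterator<T> up, Function<T, R> f) {
        return new Iterator<R>() {
            public boolean hasNext() { return up.hasNext(); }
            public R next() { return f.apply(up.next()); }
        };
    }
    // limit stage: stops pulling after max elements have been handed out
    static <T> Iterator<T> limit(Iterator<T> up, int max) {
        return new Iterator<T>() {
            int seen;
            public boolean hasNext() { return seen < max && up.hasNext(); }
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                seen++;
                return up.next();
            }
        };
    }
    public static void main(String[] args) {
        Iterator<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8).iterator();
        Iterator<Integer> pipeline =
            limit(map(filter(source, n -> n % 2 == 0), n -> n * n), 2);
        List<Integer> result = new ArrayList<>();
        while (pipeline.hasNext()) result.add(pipeline.next()); // "collect" pulls from limit
        System.out.println(result); // [4, 16]
    }
}
```

Elements 5 through 8 are never pulled from the source, because limit stops asking after two results.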
The behavior you noticed is the correct one. In order to find out if a number passes the entire Stream pipeline, you have to run that number through all the pipeline steps.
filtering 1 // 1 doesn't pass the filter
filtering 2 // 2 passes the filter, moves on to map
mapping 2 // 2 passes the map and limit steps and is added to output list
filtering 3 // 3 doesn't pass the filter
filtering 4 // 4 passes the filter, moves on to map
mapping 4 // 4 passes the map and limit steps and is added to output list
now the pipeline can end, since we have two numbers that passed the pipeline.
The Stream API is not meant to provide guarantees regarding the order of execution of the operations. That's why you should use side-effect-free functions. The "short-circuiting" does not change anything about that; it's only about not performing more operations than necessary (and completing in finite time when possible, even for infinite stream sources). And when you look at your output you'll find that everything works right: the performed operations match the ones you expected, and so does the result.

Only the order doesn't match, and that's not because of the concept but because of your wrong assumption about the implementation. But if you think about what an implementation that does not use intermediate storage has to look like, you will come to the conclusion that it has to be exactly as observed: a Stream will process each item one after another, filtering, mapping, and collecting it before moving on to the next one.
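The "finite time even for infinite stream sources" point can be seen by swapping the finite list for an unbounded generator; the limit(2) short-circuit is what makes this terminate:

```java
import java.util.*;
import java.util.stream.*;

public class InfiniteShortCircuit {
    public static void main(String[] args) {
        // Stream.iterate produces 1, 2, 3, ... without end, yet the
        // pipeline finishes as soon as limit(2) has seen two results.
        List<Integer> twoEvenSquares = Stream.iterate(1, n -> n + 1)
            .filter(n -> n % 2 == 0)
            .map(n -> n * n)
            .limit(2)
            .collect(Collectors.toList());
        System.out.println(twoEvenSquares); // [4, 16]
    }
}
```

With an eager, phase-by-phase evaluation this could never complete: the filter step alone would consume the infinite source forever.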