Question
Consider this code:
Object found = collection.stream()
        .filter(s -> myPredicate1(s))
        .filter(s -> myPredicate2(s))
        .findAny();
Will it process the entire stream and call both myPredicate1 and myPredicate2 for all elements of the collection? Or will only as many predicates be called as are needed to actually find the value?
Answer 1:
Yes, it is, as the Stream.findAny() documentation states:
This is a short-circuiting terminal operation.
It's a common misconception that objects in a stream are "pushed" towards the consuming operation. It's actually the other way around: the consuming operation pulls each element.
For sequential streams, only as many predicates will be called as are needed to find a matching value. Parallel streams may execute more predicates, but will also stop execution as soon as an element is found.
import java.util.Arrays;
import java.util.Optional;

public class StreamFilterLazyTest {
    // Shared counter used to give each T a sequential id
    static int stI = 0;

    static class T {
        int i;

        public T() {
            super();
            this.i = ++stI;
        }

        // Logs every access so we can see which elements the stream touches
        int getI() {
            System.err.println("getI: " + i);
            return i;
        }
    }

    public static void main(String[] args) {
        T[] arr = {new T(), new T(), new T(), new T(), new T(), new T(), new T(), new T(), new T(), new T()};
        Optional<T> found = Arrays.stream(arr).filter(t -> t.getI() == 3).findAny();
        System.out.println("Found: " + found.get().getI());
    }
}
will print:
getI: 1
getI: 2
getI: 3
Found: 3
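To tie this back to the two-filter pipeline in the question, here is a minimal sketch that counts how often each predicate runs. The predicate bodies, the counters calls1 and calls2, and the input list are assumptions made up for illustration; only the shape of the pipeline comes from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

public class TwoFilterShortCircuit {
    // Hypothetical counters, added only to observe how often each predicate runs
    static final AtomicInteger calls1 = new AtomicInteger();
    static final AtomicInteger calls2 = new AtomicInteger();

    // Assumed predicate body: keeps even numbers
    static boolean myPredicate1(int n) {
        calls1.incrementAndGet();
        return n % 2 == 0;
    }

    // Assumed predicate body: keeps numbers greater than 3
    static boolean myPredicate2(int n) {
        calls2.incrementAndGet();
        return n > 3;
    }

    static Optional<Integer> run() {
        calls1.set(0);
        calls2.set(0);
        List<Integer> collection = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        return collection.stream()
                .filter(n -> myPredicate1(n))
                .filter(n -> myPredicate2(n))
                .findAny();
    }

    public static void main(String[] args) {
        Optional<Integer> found = run();
        System.out.println("Found: " + found.get());
        System.out.println("myPredicate1 calls: " + calls1.get());
        System.out.println("myPredicate2 calls: " + calls2.get());
    }
}
```

On a sequential stream with OpenJDK this reports Found: 4, with 4 calls to myPredicate1 and 2 calls to myPredicate2. The second predicate only sees elements that passed the first (2 and 4), and elements 5 through 10 are never touched.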
Answer 2:
The javadoc for findAny() states:
"This is a short-circuiting terminal operation."
"The behavior of this operation is explicitly nondeterministic; it is free to select any element in the stream. This is to allow for maximal performance in parallel operations ..."
This means that findAny() on a sequential stream will only "pull" enough elements to find the first match. On a parallel stream, it could pull more than that, depending on the implementation.
The package javadoc also states:
"Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed."
This means that the filter() predicates are only evaluated when the findAny() terminal operation pulls elements through them.
In short:
Q: Is filter + findAny still a short-circuit operation?
A: Yes.
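The laziness described in the package javadoc can be observed directly. The following is a small sketch, not taken from the original answer: the class name LazyFilterDemo, the log list, and the sample strings are all assumptions for illustration. It records when the predicate actually runs relative to when the pipeline is built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

public class LazyFilterDemo {
    // Returns the sequence of events in the order they happened
    static List<String> run() {
        List<String> log = new ArrayList<>();

        // Building the pipeline: the predicate is registered but not executed yet
        Stream<String> pipeline = Stream.of("a", "bb", "ccc")
                .filter(s -> {
                    log.add("filter " + s);
                    return s.length() > 1;
                });
        log.add("pipeline built");

        // Only the terminal operation starts pulling elements through filter()
        Optional<String> found = pipeline.findAny();
        log.add("found " + found.orElse("NONE"));
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

With a sequential stream on OpenJDK the log reads: pipeline built, filter a, filter bb, found bb. The predicate runs only after findAny() is called, and "ccc" is never tested.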
Answer 3:
It does not matter whether sequential or parallel streams are used; they will still stop once they have traversed enough elements to find one that matches. It might be different if you use findFirst and you have a Stream made from an ordered collection. In this case findFirst has to preserve the encounter order.
Due to parallelism, the second and then the third elements might be processed before the first, but still only the first match will be returned.
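A short sketch of that difference on a parallel, ordered stream follows. The class name, the range, and the predicate are assumptions chosen for illustration. findFirst() must return the first match in encounter order, while findAny() may return any match.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

public class FindFirstVsFindAny {
    // IntStream.range is ordered, so findFirst() has a well-defined answer
    static OptionalInt first() {
        return IntStream.range(0, 1_000)
                .parallel()
                .filter(i -> i % 7 == 3)
                .findFirst();
    }

    // findAny() is free to return whichever matching element a worker finds
    static OptionalInt any() {
        return IntStream.range(0, 1_000)
                .parallel()
                .filter(i -> i % 7 == 3)
                .findAny();
    }

    public static void main(String[] args) {
        System.out.println("findFirst: " + first().getAsInt());
        System.out.println("findAny:   " + any().getAsInt());
    }
}
```

Here findFirst always yields 3, the first match in encounter order. The findAny result can vary between runs and machines; the only guarantee is that it satisfies the predicate.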
Answer 4:
Stream#findAny is a short-circuiting terminal operation. For each element, the Predicates are tested one by one and short-circuit on the first failure, since each Stream#filter call returns a new stream that only passes matching elements on:
Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.
As @Holger mentioned in a comment, this makes chained filters short-circuit as if they were written like this:
if(predicate1.test(value) && predicate2.test(value)){
....
}
Test:
Iterator<Predicate<Integer>> predicates = Stream.<Predicate<Integer>>of(
        it -> false,
        it -> {
            throw new AssertionError("Can't be short-circuited!");
        }
).iterator();
Predicate<Integer> expectsToBeShortCircuited = it -> predicates.next().test(it);
Stream.of(1)
        .filter(expectsToBeShortCircuited)
        // short-circuited here: the first filter rejected the only element,
        // so the second filter never sees anything and never throws
        .filter(expectsToBeShortCircuited)
        .findAny();
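The && analogy can also be expressed with Predicate.and. The sketch below is an illustration rather than part of the original answer: the class name, the word list, and the counters are assumptions. It counts predicate calls for chained filters and for a single combined filter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

public class ChainedVsCombined {
    // Returns {calls to the first predicate, calls to the second predicate}
    static int[] count(boolean combined) {
        AtomicInteger c1 = new AtomicInteger();
        AtomicInteger c2 = new AtomicInteger();
        Predicate<String> p1 = s -> { c1.incrementAndGet(); return s.contains("a"); };
        Predicate<String> p2 = s -> { c2.incrementAndGet(); return s.startsWith("F"); };

        List<String> words = Arrays.asList("Alpha", "Beta", "Gamma", "Dolphin", "Fargo", "Mustard");
        if (combined) {
            // One filter; Predicate.and skips p2 whenever p1 returns false
            words.stream().filter(p1.and(p2)).findAny();
        } else {
            // Two filters; the second only receives elements the first let through
            words.stream().filter(p1).filter(p2).findAny();
        }
        return new int[] { c1.get(), c2.get() };
    }

    public static void main(String[] args) {
        System.out.println("chained:  " + Arrays.toString(count(false)));
        System.out.println("combined: " + Arrays.toString(count(true)));
    }
}
```

On a sequential stream both variants report [5, 4]: "Dolphin" fails the first predicate so the second is skipped for it, and "Mustard" is never reached because "Fargo" already matched.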
Answer 5:
You can use peek to verify this:
== Sequential ==
Alpha1 Alpha2 Beta1 Beta2 Gamma1 Gamma2 Dolphin1 Fargo1 Fargo2 Found: Fargo Applications: 9
== Parallel ==
Arnold1 Jim1 Loke1 Alpha1 Mustard1 Lenny1 Mustard2 Mark1 Alpha2 Mark2 Beta1 Beta2 Gamma1 Fargo1 Gamma2 Dolphin1 Fargo2 Found: Fargo Applications: 17
YMMV for the parallel run, depending on the number of cores etc.
Produced by the code below:
package test.test;

import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class Snippet {
    // Counts every peek() call, i.e. every element that reaches a filter
    static AtomicInteger predicateApplications;

    public static void main(String[] args) {
        System.out.println("== Sequential == \n");
        sequential();
        System.out.println(" == Parallel == \n");
        parallel();
    }

    private static void sequential() {
        Stream<String> stream = Stream.of("Alpha", "Beta", "Gamma", "Dolphin", "Fargo", "Mustard", "Lenny", "Mark",
                "Jim", "Arnold", "Loke");
        execute(stream);
    }

    private static void parallel() {
        Stream<String> parallelStream = Stream
                .of("Alpha", "Beta", "Gamma", "Dolphin", "Fargo", "Mustard", "Lenny", "Mark", "Jim", "Arnold", "Loke")
                .parallel();
        execute(parallelStream);
    }

    private static void execute(Stream<String> stream) {
        predicateApplications = new AtomicInteger(0);
        // The suffix "1" marks an element entering the first filter, "2" the second
        Optional<String> findAny = stream.peek(s -> print(s + "1")).filter(s -> s.contains("a"))
                .peek(s -> print(s + "2")).filter(s -> s.startsWith("F")).findAny();
        String found = findAny.orElse("NONE");
        System.out.println("\nFound: " + found);
        System.out.println("Applications: " + predicateApplications.get());
    }

    private static void print(String s) {
        System.out.print(s + " ");
        predicateApplications.incrementAndGet();
    }
}
Source: https://stackoverflow.com/questions/44180155/is-stream-findany-a-short-circuit-operation