Question
I expected that simple intermediate stream operations, such as limit(), have very little overhead. But the difference in throughput between these examples is actually significant:
final long MAX = 5_000_000_000L;

LongStream.rangeClosed(0, MAX)
    .count();
// throughput: 1.7 bn values/second

LongStream.rangeClosed(0, MAX)
    .limit(MAX)
    .count();
// throughput: 780m values/second

LongStream.rangeClosed(0, MAX)
    .limit(MAX)
    .limit(MAX)
    .count();
// throughput: 130m values/second

LongStream.rangeClosed(0, MAX)
    .limit(MAX)
    .limit(MAX)
    .limit(MAX)
    .count();
// throughput: 65m values/second
I am curious: what is the reason for the quickly degrading throughput? Is this a consistent pattern with chained stream operations, or an artifact of my test setup? (I have not used JMH so far; I just set up a quick experiment with a stopwatch.)
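For what it's worth, a proper benchmark of this experiment could look roughly like the sketch below. This is only a minimal outline, assuming the org.openjdk.jmh artifacts are on the classpath; the class name, the smaller bound, and the annotation choices are mine, not taken from the original experiment:

import java.util.concurrent.TimeUnit;
import java.util.stream.LongStream;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class LimitBenchmark {

    // Smaller than the 5_000_000_000L above so each iteration finishes quickly.
    private static final long MAX = 50_000_000L;

    @Benchmark
    public long plainCount() {
        return LongStream.rangeClosed(0, MAX).count();
    }

    @Benchmark
    public long oneLimit() {
        return LongStream.rangeClosed(0, MAX).limit(MAX).count();
    }

    @Benchmark
    public long twoLimits() {
        return LongStream.rangeClosed(0, MAX).limit(MAX).limit(MAX).count();
    }
}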
Answer 1:
limit will result in a slice being made of the stream, with a split iterator (for parallel operation). In a word: inefficient. That is a large overhead for what is effectively a no-op here. And that two consecutive limit calls result in two slices is a shame.
You should take a look at the implementation of IntStream.limit.
As Streams are still relatively new, optimization should come last, once production code exists. Calling limit three times does seem a bit far-fetched, though.
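One cheap way to make the extra wrapping visible, without reading the JDK sources, is to print the class of the spliterator each pipeline produces. The exact class names are JDK internals and vary between versions, so treat this purely as an observation aid:

import java.util.stream.LongStream;

public class SliceInspection {
    public static void main(String[] args) {
        // Plain range: the source spliterator is used directly.
        System.out.println(LongStream.rangeClosed(0, 10)
            .spliterator().getClass());
        // With limit: the pipeline hands back a wrapping spliterator
        // that carries the slice logic.
        System.out.println(LongStream.rangeClosed(0, 10)
            .limit(10)
            .spliterator().getClass());
    }
}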
Answer 2:
This is an under-implemented corner of the Stream API (I don't know what else to call it).
In the first example, the count is known without actually counting: there are no operations (filter, for example) that might clear the internal flag called SIZED. It actually gets a bit interesting if you change the range and inspect the flag:
System.out.println(
    LongStream.rangeClosed(0, Long.MAX_VALUE)
        .spliterator()
        .hasCharacteristics(Spliterator.SIZED)); // reports false

System.out.println(
    LongStream.rangeClosed(0, Long.MAX_VALUE - 1) // -1 here
        .spliterator()
        .hasCharacteristics(Spliterator.SIZED)); // reports true
(The first range reports false because it contains Long.MAX_VALUE + 1 elements, a size that overflows a long, so no exact size can be reported.) And limit, even though there are no fundamental limitations (AFAIK), does not introduce the SIZED flag:
System.out.println(LongStream.rangeClosed(0, MAX)
    .limit(MAX)
    .spliterator()
    .hasCharacteristics(Spliterator.SIZED)); // reports false
Since you call count everywhere and the Stream API internally does not know whether the stream is SIZED, it just counts element by element; whereas if the stream is SIZED, reporting the count would be, well, instant.
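The same distinction can be observed with the standard Spliterator.getExactSizeIfKnown() method, which returns the size for a SIZED spliterator and -1 otherwise. A small check along these lines (the bound is my own, chosen smaller for convenience):

import java.util.stream.LongStream;

public class ExactSize {
    public static void main(String[] args) {
        long max = 5_000_000L;
        // SIZED: the exact size is available without traversal.
        System.out.println(LongStream.rangeClosed(0, max)
            .spliterator().getExactSizeIfKnown()); // max + 1
        // limit drops the SIZED flag, so no exact size is known.
        System.out.println(LongStream.rangeClosed(0, max)
            .limit(max)
            .spliterator().getExactSizeIfKnown()); // -1
    }
}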
When you add limit a few times, you are just making it worse, since the pipeline has to limit those limits, every single time.
Things have improved in java-9, for example for this case:
System.out.println(LongStream.rangeClosed(0, MAX)
    .map(x -> {
        System.out.println(x);
        return x;
    })
    .count());
In this case map is not computed at all, since there is no need for it: no intermediate operation changes the size of the stream.
Theoretically the Stream API might see that you are limiting and 1) introduce the SIZED flag, or 2) see that you have multiple calls of limit and collapse them into the most restrictive one. At the moment this is not done, but this has a very limited scope: how many people would abuse limit this way? So don't expect any improvements on this part soon.
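In the meantime, a caller can do that collapsing by hand, since limit(a).limit(b) is equivalent to limit(min(a, b)). The helper below is a hypothetical user-side workaround, not part of any library; the name and shape are mine:

import java.util.stream.LongStream;

public class CoalescedLimit {
    // Apply a single limit equal to the smallest of the requested limits,
    // so the pipeline gets one slice stage instead of one per call.
    static LongStream limitOnce(LongStream stream, long... limits) {
        long effective = Long.MAX_VALUE;
        for (long l : limits) {
            effective = Math.min(effective, l);
        }
        return effective == Long.MAX_VALUE ? stream : stream.limit(effective);
    }

    public static void main(String[] args) {
        long max = 1_000_000L;
        // Same result as chaining .limit(max) three times, with one slice stage.
        System.out.println(limitOnce(LongStream.rangeClosed(0, max), max, max, max).count());
    }
}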
Source: https://stackoverflow.com/questions/52646345/quickly-degrading-stream-throughput-with-chained-operations