I watched a talk by José Paumard on InfoQ: http://www.infoq.com/fr/presentations/jdk8-lambdas-streams-collectors (in French)
The thing is, I got stuck on this one point.
This is an implementation-dependent limitation. One thing developers concerned about parallel performance have to understand is that predictable stream sizes generally help parallel performance, as they allow balanced splitting of the workload.
The issue here is that the combination of an infinite stream, as created via Stream.generate(), and limit() does not produce a stream with a predictable size, even though it looks perfectly predictable to us.
We can examine it using the following helper method:
We can examine it using the following helper method:
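// prints the exact size if known (-1 otherwise) and the estimated size of the stream's spliterator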
static void sizeOf(String op, IntStream stream) {
final Spliterator.OfInt s = stream.spliterator();
System.out.printf("%-18s%5d, %d%n", op, s.getExactSizeIfKnown(), s.estimateSize());
}
Then
sizeOf("randoms with size", ThreadLocalRandom.current().ints(1000));
sizeOf("randoms with limit", ThreadLocalRandom.current().ints().limit(1000));
sizeOf("range", IntStream.range(0, 100));
sizeOf("range map", IntStream.range(0, 100).map(i->i));
sizeOf("range filter", IntStream.range(0, 100).filter(i->true));
sizeOf("range limit", IntStream.range(0, 100).limit(10));
sizeOf("generate limit", IntStream.generate(()->42).limit(10));
will print
randoms with size  1000, 1000
randoms with limit   -1, 9223372036854775807
range               100, 100
range map           100, 100
range filter         -1, 100
range limit          -1, 100
generate limit       -1, 9223372036854775807
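For completeness: getExactSizeIfKnown() reports -1 exactly when the spliterator lacks the SIZED characteristic, so the same information can also be read via hasCharacteristics (a minimal sketch, reusing the streams from above):
System.out.println(IntStream.range(0, 100)
    .spliterator().hasCharacteristics(Spliterator.SIZED));           // true
System.out.println(IntStream.range(0, 100).limit(10)
    .spliterator().hasCharacteristics(Spliterator.SIZED));           // false
System.out.println(IntStream.generate(() -> 42).limit(10)
    .spliterator().hasCharacteristics(Spliterator.SIZED));           // false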
So we see from the printed output that certain sources like Random.ints(size) or IntStream.range(…) produce streams with a predictable size, and certain intermediate operations like map are capable of carrying that information along, as they know the size is not affected. Others, like filter and limit, do not propagate the size (as a known exact size).
It’s clear that filter cannot predict the actual number of elements, but it provides the source size as an estimate, which is reasonable insofar as that is the maximum number of elements that can ever pass the filter.
In contrast, the current limit implementation does not provide a size, even when the source has an exact size and we know the predictable size is as simple as min(source size, limit). Instead, it even reports a nonsensical estimated size (the source’s size), despite the fact that the resulting size can never be higher than the limit. In the case of an infinite stream we have the additional obstacle that the Spliterator interface, on which streams are based, doesn’t have a way to report that it is infinite. In these cases, infinite stream + limit returns Long.MAX_VALUE as an estimate, which means “I can’t even guess”.
Thus, as a rule of thumb, with the current implementation, a programmer should avoid using limit when there is a way to specify the desired size beforehand at the stream’s source. But since limit also has significant (documented) drawbacks in the case of ordered parallel streams (which do not apply to randoms or generate), most developers avoid limit anyway.
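For example (a small sketch reusing the sizeOf helper from above; the constant generator merely stands in for an arbitrary supplier): prefer a sized factory method where one exists, and where none exists a range-based source can often replace generate(…).limit(n):
// a sized factory method keeps the exact size known
sizeOf("sized randoms", ThreadLocalRandom.current().ints(10));      // 10, 10

// generate(...).limit(n) loses it
sizeOf("generate limit", IntStream.generate(() -> 42).limit(10));   // -1, 9223372036854775807

// a range-based source with map keeps the size (note: range is an ordered source,
// unlike generate); the mapper deliberately ignores its index here
sizeOf("range map", IntStream.range(0, 10).map(i -> 42));           // 10, 10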
The JDK docs have a good explanation of this behavior: it is the ordering constraint that kills performance for parallel processing.
Text from the docs for the limit function: https://docs.oracle.com/javase/8/docs/api/java/util/stream/LongStream.html
While limit() is generally a cheap operation on sequential stream pipelines, it can be quite expensive on ordered parallel pipelines, especially for large values of maxSize, since limit(n) is constrained to return not just any n elements, but the first n elements in the encounter order. Using an unordered stream source (such as generate(LongSupplier)) or removing the ordering constraint with BaseStream.unordered() may result in significant speedups of limit() in parallel pipelines, if the semantics of your situation permit. If consistency with encounter order is required, and you are experiencing poor performance or memory utilization with limit() in parallel pipelines, switching to sequential execution with sequential() may improve performance.
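In other words (a minimal sketch, not taken from the docs; the range-based source and the sizes are just placeholders): if any n elements are acceptable, lifting the ordering constraint lets limit() short-circuit cheaply in a parallel pipeline:
// ordered parallel pipeline: limit(10) must return the first 10 elements
// in encounter order, which is expensive to coordinate across subtasks
long[] firstTen = LongStream.range(0, 1_000_000L).parallel()
    .limit(10)
    .toArray();

// unordered: any 10 elements will do, so limit() can short-circuit
// without the subtasks agreeing on encounter order
long[] anyTen = LongStream.range(0, 1_000_000L).parallel()
    .unordered()
    .limit(10)
    .toArray();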
Why is the second code better?
In the first case you create an infinite source, split it for parallel execution into a bunch of tasks each providing an infinite number of elements, and then limit the overall size of the result. Even though the source is unordered, this implies some overhead: the individual tasks have to talk to each other to check when the overall size has been reached. If they talk often, this increases contention. If they talk less often, they actually produce more numbers than necessary and then drop some of them. I believe the actual Stream API implementation makes the tasks talk less, but this leads to producing more numbers than necessary. It also increases memory consumption and puts more pressure on the garbage collector.
In contrast, in the second case you create a finite source of known size. When the task is split into subtasks, their sizes are also well defined, and in total they produce exactly the requested number of random numbers without having to talk to each other at all. That’s why it’s faster.
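A hypothetical sketch of the two variants being compared (the question’s actual code is not shown here, so the size, the source and the use of List<Long> are assumptions based on the discussion below):
// first variant: infinite source + limit; the parallel subtasks must coordinate
// on when 10_000_000 elements have been produced overall, and typically
// overshoot and discard the surplus
List<Long> viaLimit = Stream.generate(() -> ThreadLocalRandom.current().nextLong())
    .parallel()
    .limit(10_000_000)
    .collect(Collectors.toList());

// second variant: a sized source; each subtask knows exactly how many elements
// it has to produce, so no coordination between tasks is needed
List<Long> viaSizedSource = ThreadLocalRandom.current().longs(10_000_000)
    .parallel()
    .boxed()
    .collect(Collectors.toList());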
Is there a better, or at least less costly way to do it?
The biggest problem in your code samples is boxing. If you need 10_000_000 random numbers, it’s a very bad idea to box each of them and store them in a List<Long>: you create tons of unnecessary objects, perform many heap allocations, and so on. Replace this with primitive streams:
long[] randomNumbers = ThreadLocalRandom.current().longs(10_000_000).parallel().toArray();
This will be much, much faster (probably by an order of magnitude).
You may also consider the new Java 8 SplittableRandom class. It provides roughly the same performance, but the generated random numbers are of much higher quality (including passing DieHarder 3.31.1):
long[] randomNumbers = new SplittableRandom().longs(10_000_000).parallel().toArray();