When should streams be preferred over traditional loops for best performance? Do streams take advantage of branch-prediction?

前端未结

关注

 5  1101

旧时难觅i

I just read about Branch-Prediction and wanted to try how this works with Java 8 Streams.

However the performance with Streams is always turning out to be wors

相关标签:

5条回答

失恋的感觉

2020-12-13 17:49
How can my Java program run fast?

Long story short, Java programs can be accelerated by:
1. Multithreading
2. JIT
Do streams relate to Java program speedup?

Yes!
1. Note Collection.parallelStream() and Stream.parallel() methods for multithreading
2. One can write for cycle that is long enough for JIT to skip. Lambdas are typically small and can be compiled by JIT => there's possibility to gain performance
What is the scenario stream can be faster than for loop?

Let's take a look at jdk/src/share/vm/runtime/globals.hpp
```
develop(intx, HugeMethodLimit,  8000,
        "Don't compile methods larger than this if "
        "+DontCompileHugeMethods")
```
If you have long enough cycle, it won't be compiled by JIT and will run slowly. If you rewrite such a cycle to stream you'll probably use map, filter, flatMap methods that split code to pieces and every piece can be small enough to fit under limit. For sure, writing huge methods has other downsides apart from JIT compilation. This scenario can be considered if, for example, you've got a lot of generated code.

What's about branch prediction?

Of course streams take advantage of branch prediction as every other code does. However branch prediction isn't the technology explicitly used to make streams faster AFAIK.

So, when do I rewrite my loops to streams to achieve the best performance?

Never.

Premature optimization is the root of all evil ©Donald Knuth

Try to optimize algorithm instead. Streams are the interface for functional-like programming, not a tool to speedup loops.
0 讨论(0)
发布评论:

提交评论
- 加载中...

春和景丽

2020-12-13 17:53

Everything is said, but I want to show you how your code should look like using JMH.

@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@Measurement(iterations = 10, timeUnit = TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@Threads(1)
@Warmup(iterations = 5, timeUnit = TimeUnit.NANOSECONDS)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MyBenchmark {

  private final int totalSize = 32_768;
  private final int filterValue = 1_280;
  private final int loopCount = 10_000;
  // private Random rnd;

  private int[] array;

  @Setup
  public void setup() {
    array = IntStream.range(0, totalSize).toArray();

    // rnd = new Random(0);
    // array = rnd.ints(totalSize).map(i -> i % 2560).toArray();
  }

  @Benchmark
  public long conditionalOperatorTime() {
    long sum = 0;
    for (int j = 0; j < loopCount; j++) {
      for (int c = 0; c < totalSize; ++c) {
        sum += array[c] >= filterValue ? array[c] : 0;
      }
    }
    return sum;
  }

  @Benchmark
  public long branchStatementTime() {
    long sum = 0;
    for (int j = 0; j < loopCount; j++) {
      for (int c = 0; c < totalSize; ++c) {
        if (array[c] >= filterValue) {
          sum += array[c];
        }
      }
    }
    return sum;
  }

  @Benchmark
  public long streamsTime() {
    long sum = 0;
    for (int j = 0; j < loopCount; j++) {
      sum += IntStream.of(array).filter(value -> value >= filterValue).sum();
    }
    return sum;
  }

  @Benchmark
  public long parallelStreamsTime() {
    long sum = 0;
    for (int j = 0; j < loopCount; j++) {
      sum += IntStream.of(array).parallel().filter(value -> value >= filterValue).sum();
    }
    return sum;
  }
}

The results for a sorted array:

Benchmark                            Mode  Cnt           Score           Error  Units
MyBenchmark.branchStatementTime      avgt   30   119833793,881 ±   1345228,723  ns/op
MyBenchmark.conditionalOperatorTime  avgt   30   118146194,368 ±   1748693,962  ns/op
MyBenchmark.parallelStreamsTime      avgt   30   499436897,422 ±   7344346,333  ns/op
MyBenchmark.streamsTime              avgt   30  1126768177,407 ± 198712604,716  ns/op

Results for unsorted data:

Benchmark                            Mode  Cnt           Score           Error  Units
MyBenchmark.branchStatementTime      avgt   30   534932594,083 ±   3622551,550  ns/op
MyBenchmark.conditionalOperatorTime  avgt   30   530641033,317 ±   8849037,036  ns/op
MyBenchmark.parallelStreamsTime      avgt   30   489184423,406 ±   5716369,132  ns/op
MyBenchmark.streamsTime              avgt   30  1232020250,900 ± 185772971,366  ns/op

I only can say that there are many possibilities of JVM optimizations and maybe branch-prediction is also involved. Now it is up to you to interpret the benchmark results.

0 讨论(0)

生来不讨喜

2020-12-13 17:59

I will add my 0.02$ here.

I just read about Branch-Prediction and wanted to try how this works with Java 8 Streams

Branch Prediction is a CPU feature, it has nothing to do with JVM. It is needed to keep CPU pipeline full and ready to do something. Measuring or predicting the branch prediction is extremely hard (unless you actually know the EXACT things that the CPU will do). This will depend on at least the load that the CPU is having right now (that might be a lot more than your program only).

However the performance with Streams is always turning out to be worse than traditional loops

This statement and the previous one are un-related. Yes, streams will be slower for simple examples like yours, up to 30% slower, which is OK. You could measure for a particular case how slower they are or faster via JMH as others have suggested, but that proves only that case, only that load.

At the same time you might be working with Spring/Hibernate/Services, etc etc that do things in milliseconds and your streams in nano-seconds and you worry about the performance? You are questioning the speed of your fastest part of the code? That's of course a theoretical thing.

And about your last point that you tried with sorted and un-sorted arrays and it gives you bad results. This is absolutely no indication of branch prediction or not - you have no idea at which point the prediction happened and if it did unless you can look inside the actual CPU pipelines - which you did not.

0 讨论(0)
发布评论:

提交评论
- 加载中...
南笙

2020-12-13 18:03

Java is a high level language saving the programmer from considering low level performance optimization.

Never choose a certain approach for performance reasons unless you have proven that this is a problem in your real application.

Your measurements show some negative effect for streams, but the difference is below observability. Therefore, it's not a Problem. Also, this Test is a "synthetic" situation and the code may behave completely different in a heavy duty production environment. Furthermore, the machine code created from your Java (byte) code by the JIT may change in future Java (maintenance) releases and make your measurements obsolete.

In conclusion: Choose the syntax or approach that most expresses your (the programmer's) intention. Keep to that same approach or syntax throughout the program unless you have a good reason to change.

0 讨论(0)
发布评论:

提交评论
- 加载中...
有刺的猬

2020-12-13 18:11

I agree to the point that programming with streams is nice and easier for some scenarios but when we're losing out on performance, why do we need to use them?

Performance is rarely an issue. It would be usual for 10% of your streams would need to be rewritten as loops to get the performance you need.

Is there something I'm missing out on?

Using parallelStream() is much easier using streams and possibly more efficient as it's hard to write efficient concurrent code.

Which is the scenario in which streams perform equal to loops? Is it only in the case where your function defined takes a lot of time, resulting in a negligible loop performance?

Your benchmark is flawed in the sense that the code hasn't been compiled when it starts. I would do the whole test in a loop as JMH does, or I would use JMH.

In none of the scenario's I could see streams taking advantage of branch-prediction

Branch prediction is a CPU feature not a JVM or streams feature.

0 讨论(0)
发布评论:

提交评论
- 加载中...

When should streams be preferred over traditional loops for best performance? Do streams take advantage of branch-prediction?

How can my Java program run fast?

Do streams relate to Java program speedup?

What is the scenario stream can be faster than for loop?

What's about branch prediction?

So, when do I rewrite my loops to streams to achieve the best performance?

What is the scenario stream can be faster than `for` loop?