I've been playing with the Java 8 Streams API and decided to microbenchmark stream() and parallelStream() streams. As expected the
When you use the unsorted list, all tuples are accessed in memory order, because they were allocated consecutively in RAM. CPUs love accessing memory sequentially because they can speculatively request the next cache line, so it is usually already present when needed.
When you sort the list, you put it into random order relative to memory, because your sort keys are randomly generated. This means that the memory accesses to tuple members are unpredictable. The CPU cannot prefetch memory, and almost every access to a tuple is a cache miss.
This is a nice example of a specific advantage of garbage-collected memory management: data structures that are allocated together and used together perform very nicely. They have great locality of reference.
The penalty from cache misses outweighs the saved branch prediction penalty in this case.
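To make the effect concrete, here is a minimal sketch of the kind of comparison described above. The `Tuple` class, the list size, and the method names are hypothetical and only chosen for illustration, and the timing loop is deliberately naive. It assumes the JVM places consecutively allocated objects next to each other, which is typical but not guaranteed, and for trustworthy numbers a harness such as JMH would be the better tool.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class LocalityDemo {
    // Hypothetical tuple type: a random sort key plus a payload value.
    static final class Tuple {
        final long key;
        final long value;

        Tuple(long key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    // Allocates n tuples one after another, so list order matches allocation order.
    static List<Tuple> allocate(int n, long seed) {
        Random rnd = new Random(seed);
        List<Tuple> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(new Tuple(rnd.nextLong(), i));
        }
        return list;
    }

    // Touches every tuple in list order; the access pattern is what we want to compare.
    static long sumValues(List<Tuple> list) {
        long sum = 0;
        for (Tuple t : list) {
            sum += t.value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Tuple> inAllocationOrder = allocate(1_000_000, 42L);

        // Same objects, but ordered by the random key, so list order no longer
        // follows allocation order.
        List<Tuple> sortedByKey = new ArrayList<>(inAllocationOrder);
        sortedByKey.sort(Comparator.comparingLong((Tuple t) -> t.key));

        // Several rounds so the JIT has a chance to warm up; still only a rough measurement.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sumValues(inAllocationOrder);
            long t1 = System.nanoTime();
            long b = sumValues(sortedByKey);
            long t2 = System.nanoTime();
            System.out.printf("allocation order: %d ms, sorted by key: %d ms (sums %d / %d)%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a, b);
        }
    }
}
```

Both loops do exactly the same work on the same objects; only the order in which the objects are visited differs. How large the gap is (if any) depends on the JVM, the garbage collector, and how far the data exceeds the CPU caches.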
The accepted answer to this question answers my question too: Why is processing a sorted array slower than an unsorted array?
When I create the original List already sorted, i.e. its elements are sequential in memory, there is no difference in execution time: it is equal to that of the unsorted version where the List is filled with random numbers.
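As a rough illustration of that observation, the sketch below builds a list whose keys are already ascending at allocation time and checks that sorting leaves every element in its original position. The `Item` class and the method names are made up for this example, and "sequential in memory" is an assumption about typical JVM allocation behaviour rather than something the language guarantees.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PresortedDemo {
    // Hypothetical element type with a single sort key.
    static final class Item {
        final int key;

        Item(int key) {
            this.key = key;
        }
    }

    // Allocates items with ascending keys, so sorted order equals allocation order.
    static List<Item> allocateAscending(int n) {
        List<Item> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(new Item(i));
        }
        return list;
    }

    // Returns true if sorting by key leaves every element at its original index.
    static boolean sortKeepsOrder(List<Item> list) {
        List<Item> sorted = new ArrayList<>(list);
        sorted.sort(Comparator.comparingInt((Item it) -> it.key));
        for (int i = 0; i < list.size(); i++) {
            if (sorted.get(i) != list.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Item> items = allocateAscending(1_000);
        System.out.println("sorted order equals allocation order: " + sortKeepsOrder(items));
    }
}
```

When the check holds, iterating the sorted list visits the objects in the same order they were allocated, which would explain why the sorted and unsorted timings come out the same in that setup.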