Faulty benchmarking. A non-exhaustive list of what is wrong (a sketch of the offending pattern follows the list):
- No warmup: single-shot measurements are almost always wrong;
- Mixing several codepaths in a single method: we probably start compiling the method with the profile data available only for the first loop in the method;
- Sources are predictable: should the loop compile, we can actually predict the result;
- Results are dead-code eliminated: should the loop compile, we can throw the loop away.
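For concreteness, here is a minimal sketch of the naive pattern these points describe (a hypothetical reconstruction, not the original code):

import java.util.Arrays;

public class NaiveSortBenchmark {
    public static void main(String[] args) {
        long[] prim = new long[10];          // predictable source: all zeros
        Long[] ref = new Long[10];
        Arrays.fill(ref, 0L);

        long start = System.nanoTime();      // single shot: no warmup
        for (int i = 0; i < 100_000; i++) {
            Arrays.sort(prim);               // result unused: dead-code candidate
        }
        long mid = System.nanoTime();
        for (int i = 0; i < 100_000; i++) {  // second codepath in the same method:
            Arrays.sort(ref);                // compiled with the first loop's profile
        }
        long end = System.nanoTime();

        System.out.println("prim: " + (mid - start) + " ns, ref: " + (end - mid) + " ns");
    }
}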
Here is how you do it, arguably right, with JMH:
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 3, time = 1)
@Fork(10)
@State(Scope.Thread)
public class Longs {

    public static final int COUNT = 10;

    private Long[] refLongs;
    private long[] primLongs;

    /*
     * Implementation notes:
     *   - copying the array from the field keeps the constant
     *     optimizations away, but we are implicitly counting the
     *     cost of arraycopy() in;
     *   - two additional baseline experiments quantify the
     *     scale of the arraycopy effects (note you can't directly
     *     subtract the baseline scores from the test scores,
     *     because the code is mixed together);
     *   - the resulting arrays are always fed back into JMH
     *     to prevent dead-code elimination.
     */

    @Setup
    public void setup() {
        primLongs = new long[COUNT];
        for (int i = 0; i < COUNT; i++) {
            primLongs[i] = 12L;
        }
        refLongs = new Long[COUNT];
        for (int i = 0; i < COUNT; i++) {
            refLongs[i] = 12L;
        }
    }

    @GenerateMicroBenchmark
    public long[] prim_baseline() {
        long[] d = new long[COUNT];
        System.arraycopy(primLongs, 0, d, 0, COUNT);
        return d;
    }

    @GenerateMicroBenchmark
    public long[] prim_sort() {
        long[] d = new long[COUNT];
        System.arraycopy(primLongs, 0, d, 0, COUNT);
        Arrays.sort(d);
        return d;
    }

    @GenerateMicroBenchmark
    public Long[] ref_baseline() {
        Long[] d = new Long[COUNT];
        System.arraycopy(refLongs, 0, d, 0, COUNT);
        return d;
    }

    @GenerateMicroBenchmark
    public Long[] ref_sort() {
        Long[] d = new Long[COUNT];
        System.arraycopy(refLongs, 0, d, 0, COUNT);
        Arrays.sort(d);
        return d;
    }
}
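To reproduce, build the benchmark jar and run it with a regexp selecting the class; the exact jar name depends on your JMH setup (the one below is what the Maven archetype of that era produced, so treat it as an assumption):

$ mvn clean install
$ java -jar target/microbenchmarks.jar ".*Longs.*"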
...this yields:
Benchmark                  Mode   Samples     Mean   Mean error   Units
o.s.Longs.prim_baseline    avgt        30   19.604        0.327   ns/op
o.s.Longs.prim_sort        avgt        30   51.217        1.873   ns/op
o.s.Longs.ref_baseline     avgt        30   16.935        0.087   ns/op
o.s.Longs.ref_sort         avgt        30   25.199        0.430   ns/op
At this point you may start to wonder why sorting Long[] and sorting long[] take different amounts of time. The answer lies in the Arrays.sort() overloads: OpenJDK sorts primitive and reference arrays with different algorithms (reference arrays with TimSort, primitive arrays with dual-pivot quicksort).
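The choice happens through plain overload resolution at compile time; a minimal illustration (variable names are mine):

import java.util.Arrays;

public class SortDispatch {
    public static void main(String[] args) {
        long[] prims = { 3L, 1L, 2L };
        Long[] refs  = { 3L, 1L, 2L };
        Arrays.sort(prims); // resolves to sort(long[]): dual-pivot quicksort
        Arrays.sort(refs);  // resolves to sort(Object[]): TimSort, or legacy merge sort
        System.out.println(Arrays.toString(prims) + " " + Arrays.toString(refs));
    }
}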
Here is what the same benchmark yields with -Djava.util.Arrays.useLegacyMergeSort=true, which falls back to merge sort for reference arrays:
Benchmark                  Mode   Samples     Mean   Mean error   Units
o.s.Longs.prim_baseline    avgt        30   19.675        0.291   ns/op
o.s.Longs.prim_sort        avgt        30   50.882        1.550   ns/op
o.s.Longs.ref_baseline     avgt        30   16.742        0.089   ns/op
o.s.Longs.ref_sort         avgt        30   64.207        1.047   ns/op
Hope that helps to explain the difference.
The explanation above barely scratches the surface of sorting performance. Performance differs greatly with different source data, including the presence of pre-sorted subsequences, their patterns and run lengths, and the size of the data itself.
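If you want to explore that, a natural extension is to parameterize the benchmark over size and data shape. A sketch, assuming a JMH version with @Param support; the parameter names and patterns are mine, not from the measurements above:

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 3, time = 1)
@Fork(10)
@State(Scope.Thread)
public class LongsShapes {

    @Param({"10", "1000", "100000"})
    int count;

    @Param({"random", "sorted", "reversed"})
    String pattern;

    private long[] primLongs;

    @Setup
    public void setup() {
        primLongs = new long[count];
        Random r = new Random(42);           // fixed seed for reproducibility
        for (int i = 0; i < count; i++) {
            primLongs[i] = r.nextLong();
        }
        if (!pattern.equals("random")) {
            Arrays.sort(primLongs);
        }
        if (pattern.equals("reversed")) {    // reverse the sorted array in place
            for (int i = 0, j = count - 1; i < j; i++, j--) {
                long t = primLongs[i];
                primLongs[i] = primLongs[j];
                primLongs[j] = t;
            }
        }
    }

    @GenerateMicroBenchmark
    public long[] prim_sort() {
        long[] d = new long[count];
        System.arraycopy(primLongs, 0, d, 0, count);
        Arrays.sort(d);
        return d;
    }
}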