I observed some strange behaviour in one of my Java programs. I have tried to strip the code down as much as possible while still being able to replicate the behaviour. Code in
The complete answer is a combination of k5_'s and Tony's answers.
The code that the OP posted omits a warmup loop to trigger HotSpot compilation before running the benchmark. Hence the 10-fold speedup (on my computer) when the print statements are included combines two things: the time HotSpot spends compiling the bytecode to CPU instructions, and the time spent actually running those instructions.
If I add a separate warmup loop before the timing loop, there is only a 2.5-fold speedup with the print statement.
That indicates two things: the HotSpot/JIT compilation takes longer when the method is inlined (as Tony explained), and the compiled code itself runs more slowly, probably because of worse cache or branch-prediction/pipelining performance, as k5_ showed.
public static void main(String[] args) {
    // Added the following warmup loop before the timing loop
    for (int i = 0; i < 50000; i++) {
        functionA(6, 0);
    }
    long startTime = System.nanoTime();
    for (int i = 0; i < 50000; i++) {
        functionA(6, 0);
    }
    long endTime = System.nanoTime();
    System.out.format("%.2f seconds elapsed.\n", (endTime - startTime) / 1000.0 / 1000 / 1000);
}
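
To see the compilation share directly, the same loop can be timed twice in a row and the two passes compared. The following is only a rough sketch of that idea: `workload`, `timeRun`, `sink` and the class name are hypothetical names made up for this illustration, and `workload` is a stand-in for the OP's `functionA`, whose body is in the question and not reproduced here.

```java
public class WarmupBenchmark {

    // Written after each run so the JIT cannot discard the computation as dead code.
    static volatile long sink;

    // Hypothetical CPU-bound stand-in for functionA (NOT the OP's method).
    static int workload(int depth, int acc) {
        if (depth == 0) {
            return acc + 1;
        }
        int sum = 0;
        for (int i = 0; i < 3; i++) {
            sum += workload(depth - 1, acc + i);
        }
        return sum;
    }

    // Runs the workload `iterations` times and returns the elapsed nanoseconds.
    static long timeRun(int iterations) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            total += workload(6, 0);
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        return elapsed;
    }

    public static void main(String[] args) {
        // First pass: includes interpretation and JIT compilation time.
        long cold = timeRun(50000);
        // Second pass: mostly runs code that is already compiled.
        long warm = timeRun(50000);
        System.out.format("cold: %.3f s, warm: %.3f s%n", cold / 1e9, warm / 1e9);
    }
}
```

The gap between the cold and warm figures roughly corresponds to the compilation overhead described above. The exact numbers depend on the JVM version, flags and hardware, so treat a single run as indicative only.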