In a recent discussion about how to optimize some code, I was told that breaking code up into lots of small methods can significantly increase performance, because the JIT compiler can optimize small methods more effectively. Is that true?
I don't really understand how it works, but based on the link AurA provided, I would guess that the JIT compiler will have to compile less bytecode if the same pieces are reused, rather than having to compile different bytecode that is merely similar across different methods.
Aside from that, the more you break your code down into meaningful pieces, the more reuse you will get out of it, and that gives the VM running it more to work with when optimizing (you are providing more structure to work with).
However, I doubt it will have any positive impact if you break your code down arbitrarily, in a way that provides no code reuse.
The HotSpot JIT only inlines methods that are less than a certain (configurable) size. So using smaller methods allows more inlining, which is good.
See the various inlining options on this page.
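If you want to see what those thresholds are on the JVM you are actually running, you can dump the final flag values and filter for the inlining-related ones. On a HotSpot JVM, something like this works:

    java -XX:+PrintFlagsFinal -version | grep Inline

Among others, this shows MaxInlineSize (the size limit for inlining ordinary methods, typically 35 bytecodes by default) and FreqInlineSize (the larger limit applied to hot call sites).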
EDIT
To elaborate a little:
Example (the full file is shown, so that you get the same source line numbers referenced below if you try it):
package javaapplication27;

public class TestInline {
    private int count = 0;

    public static void main(String[] args) throws Exception {
        TestInline t = new TestInline();
        int sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += t.m();
        }
        System.out.println(sum);
    }

    public int m() {
        int i = count;
        if (i % 10 == 0) {
            i += 1;
        } else if (i % 10 == 1) {
            i += 2;
        } else if (i % 10 == 2) {
            i += 3;
        }
        i += count;
        i *= count;
        i++;
        return i;
    }
}
When running this code with the following JVM flags: -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:FreqInlineSize=50 -XX:MaxInlineSize=50 -XX:+PrintInlining
(Yes, I have used values that prove my case: m is too big, but both the refactored m and m2 are below the threshold; with other values you might get a different output.)
You will see that m() and main() get compiled, but m() does not get inlined:
56 1 javaapplication27.TestInline::m (62 bytes)
57 1 % javaapplication27.TestInline::main @ 12 (53 bytes)
@ 20 javaapplication27.TestInline::m (62 bytes) too big
You can also inspect the generated assembly to confirm that m is not inlined (I used these JVM flags: -XX:+PrintAssembly -XX:PrintAssemblyOptions=intel; note that -XX:+PrintAssembly also requires the hsdis disassembler library to be available). It will look like this:
0x0000000002780624: int3 ;*invokevirtual m
; - javaapplication27.TestInline::main@20 (line 10)
If you refactor the code like this (I have extracted the if/else into a separate method):
public int m() {
    int i = count;
    i = m2(i);
    i += count;
    i *= count;
    i++;
    return i;
}

public int m2(int i) {
    if (i % 10 == 0) {
        i += 1;
    } else if (i % 10 == 1) {
        i += 2;
    } else if (i % 10 == 2) {
        i += 3;
    }
    return i;
}
You will see the following compilation actions:
60 1 javaapplication27.TestInline::m (30 bytes)
60 2 javaapplication27.TestInline::m2 (40 bytes)
@ 7 javaapplication27.TestInline::m2 (40 bytes) inline (hot)
63 1 % javaapplication27.TestInline::main @ 12 (53 bytes)
@ 20 javaapplication27.TestInline::m (30 bytes) inline (hot)
@ 7 javaapplication27.TestInline::m2 (40 bytes) inline (hot)
So m2 gets inlined into m, as you would expect, so we are back to the original scenario. But when main gets compiled, it actually inlines the whole thing. At the assembly level, this means you won't find any invokevirtual instructions any more. Instead you will find lines like this:
0x00000000026d0121: add ecx,edi ;*iinc
; - javaapplication27.TestInline::m2@7 (line 33)
; - javaapplication27.TestInline::m@7 (line 24)
; - javaapplication27.TestInline::main@20 (line 10)
where you can see that the instructions of main, m and m2 have effectively been merged ("mutualised") into one compiled body.
Conclusion
I am not saying that this example is representative, but it seems to demonstrate a few points: a method that is too big to be inlined can, once split into smaller methods, be inlined entirely into its caller, and code that appears to make extra method calls can end up paying no call overhead at all at runtime.
And finally: if a portion of your code is so critical for performance that these considerations matter, you should examine the JIT output to fine-tune your code, and most importantly profile before and after.
If you take the exact same code and just break it up into lots of small methods, that is not going to help the JIT at all.
A better way to put it is that modern HotSpot JVMs do not penalize you for writing a lot of small methods. They do get aggressively inlined, so at runtime you do not really pay the cost of function calls. This is true even for virtual calls (invokevirtual, or invokeinterface when calling through an interface).
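To illustrate (this is my own hypothetical example, not code from the question): the loop below calls through an interface, so the bytecode contains an invokeinterface, yet at a hot call site that only ever sees one implementation, HotSpot will typically devirtualize the call and inline apply() as if it were a direct call. You can check this with the -XX:+PrintInlining flags shown earlier.

    interface Op {
        int apply(int x);
    }

    class AddOne implements Op {
        public int apply(int x) {
            return x + 1; // tiny body: an easy inlining candidate
        }
    }

    public class InlineDemo {
        public static void main(String[] args) {
            Op op = new AddOne(); // the only Op implementation ever loaded
            int sum = 0;
            for (int i = 0; i < 10_000_000; i++) {
                // An interface call in the bytecode, but the JIT can treat it
                // as monomorphic and inline the body of AddOne.apply().
                sum += op.apply(i);
            }
            System.out.println(sum);
        }
    }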
I wrote a blog post several years ago that describes how you can see that the JVM is inlining methods. The technique is still applicable to modern JVMs. I also found it useful to look at the discussions related to invokedynamic, where the way modern HotSpot JVMs compile Java bytecode is discussed extensively.
I've read numerous articles which have stated that smaller methods (as measured in the number of bytes required to represent the method as Java bytecode) are more likely to be eligible for inlining by the JIT (just-in-time compiler) when it compiles hot methods (those which are being run most frequently) into machine code. And they describe how method inlining produces better performance of the resulting machine code. In short: smaller methods give the JIT more options in terms of how to compile bytecode into machine code when it identifies a hot method, and this allows more sophisticated optimizations.
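(As a practical aside, the bytecode size that the JIT compares against its thresholds is easy to inspect yourself: the JDK's javap disassembler prints every bytecode instruction with its offset, so the final offset of a method gives you a good idea of its size in bytes. For example, for a hypothetical class com.example.MyClass:

    javap -c -p com.example.MyClass

The -p flag includes private methods in the output.)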
To test this theory, I created a JMH class with two benchmark methods, each containing identical behaviour but factored differently. The first benchmark is named monolithicMethod (all code in a single method), and the second benchmark is named smallFocusedMethods and has been refactored so that each major behaviour has been moved out into its own method. The smallFocusedMethods benchmark looks like this:
@Benchmark
public void smallFocusedMethods(TestState state) {
    int i = state.value;
    if (i < 90) {
        actionOne(i, state);
    } else {
        actionTwo(i, state);
    }
}

private void actionOne(int i, TestState state) {
    state.sb.append(Integer.toString(i)).append(
            ": has triggered the first type of action.");
    int result = i;
    for (int j = 0; j < i; ++j) {
        result += j;
    }
    state.sb.append("Calculation gives result ").append(Integer.toString(
            result));
}

private void actionTwo(int i, TestState state) {
    state.sb.append(i).append(" has triggered the second type of action.");
    int result = i;
    for (int j = 0; j < 3; ++j) {
        for (int k = 0; k < 3; ++k) {
            result *= k * j + i;
        }
    }
    state.sb.append("Calculation gives result ").append(Integer.toString(
            result));
}
and you can imagine how monolithicMethod looks (same code, but entirely contained within the one method). The TestState class simply does the work of creating a new StringBuilder (so that the creation of this object is not counted in the benchmark time) and of choosing a random number between 0 and 100 for each invocation (this has been deliberately configured so that both benchmarks use exactly the same sequence of random numbers, to avoid the risk of bias).
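A minimal sketch of what such a TestState class could look like (the fixed seed, setup levels and field names other than sb and value are illustrative assumptions, not the original code):

    import java.util.Random;
    import org.openjdk.jmh.annotations.Level;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    public class TestState {
        public StringBuilder sb;
        public int value;
        private Random random;

        @Setup(Level.Trial)
        public void initRandom() {
            // Fixed seed: every trial (and hence both benchmarks) sees the
            // same sequence of "random" numbers.
            random = new Random(42);
        }

        @Setup(Level.Invocation)
        public void perInvocation() {
            // A fresh StringBuilder per invocation, so its construction is
            // not counted in the benchmark time.
            sb = new StringBuilder();
            value = random.nextInt(100); // 0..99
        }
    }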
After running the benchmark with six "forks", each involving five warmups of one second, followed by six iterations of five seconds, the results look like this:
Benchmark                                          Mode  Cnt        Score        Error   Units
monolithicMethod                                  thrpt   30  7609784.687 ± 118863.736   ops/s
monolithicMethod:·gc.alloc.rate                   thrpt   30     1368.296 ±     15.834  MB/sec
monolithicMethod:·gc.alloc.rate.norm              thrpt   30      270.328 ±      0.016    B/op
monolithicMethod:·gc.churn.G1_Eden_Space          thrpt   30     1357.303 ±     16.951  MB/sec
monolithicMethod:·gc.churn.G1_Eden_Space.norm     thrpt   30      268.156 ±      1.264    B/op
monolithicMethod:·gc.churn.G1_Old_Gen             thrpt   30        0.186 ±      0.001  MB/sec
monolithicMethod:·gc.churn.G1_Old_Gen.norm        thrpt   30        0.037 ±      0.001    B/op
monolithicMethod:·gc.count                        thrpt   30     2123.000               counts
monolithicMethod:·gc.time                         thrpt   30     1060.000                   ms
smallFocusedMethods                               thrpt   30  7855677.144 ±  48987.206   ops/s
smallFocusedMethods:·gc.alloc.rate                thrpt   30     1404.228 ±      8.831  MB/sec
smallFocusedMethods:·gc.alloc.rate.norm           thrpt   30      270.320 ±      0.001    B/op
smallFocusedMethods:·gc.churn.G1_Eden_Space       thrpt   30     1393.473 ±     10.493  MB/sec
smallFocusedMethods:·gc.churn.G1_Eden_Space.norm  thrpt   30      268.250 ±      1.193    B/op
smallFocusedMethods:·gc.churn.G1_Old_Gen          thrpt   30        0.186 ±      0.001  MB/sec
smallFocusedMethods:·gc.churn.G1_Old_Gen.norm     thrpt   30        0.036 ±      0.001    B/op
smallFocusedMethods:·gc.count                     thrpt   30     1986.000               counts
smallFocusedMethods:·gc.time                      thrpt   30     1011.000                   ms
In short, these numbers show that the smallFocusedMethods approach ran 3.2% faster, and the difference was statistically significant (with 99.9% confidence). Note also that the memory usage (based on garbage collection profiling) was not significantly different. So you get faster performance without increased overhead.
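(For reference, the gc.* rows come from JMH's built-in GC profiler. If you drive JMH from the command line with the usual Maven-built uberjar, a run matching the configuration described above would look roughly like this:

    java -jar target/benchmarks.jar -f 6 -wi 5 -w 1s -i 6 -r 5s -prof gc

where -f is the number of forks, -wi/-w control the warmup iterations and their duration, -i/-r control the measurement iterations, and -prof gc enables the GC profiler.)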
I've run a variety of similar benchmarks to test whether small, focused methods give better throughput, and I've found that the improvement is between 3% and 7% in all cases I've tried. But the actual gain likely depends strongly on the version of the JVM being used, on the distribution of executions across your if/else blocks (I chose 90% on the first and 10% on the second to exaggerate the heat on the first "action", but I've seen throughput improvements even with a more even spread across a chain of if/else blocks), and on the actual complexity of the work done by each of the possible actions. So be sure to write your own specific benchmarks if you need to determine what works for your specific application.
My advice is this: write small, focused methods because it makes the code tidier, easier to read, and much easier to override specific behaviours when inheritance is involved. The fact that the JIT is likely to reward you with slightly better performance is a bonus, but tidy code should be your main goal in the majority of cases. Oh, and it's also important to give each method a clear, descriptive name which exactly summarises the responsibility of the method (unlike the terrible names I've used in my benchmark).