In a recent discussion about how to optimize some code, I was told that breaking code up into lots of small methods can significantly increase performance, because the JIT compiler is then more likely to be able to inline them.
I've read numerous articles stating that smaller methods (measured by the number of bytes of Java bytecode needed to represent them) are more likely to be eligible for inlining when the JIT (just-in-time) compiler turns hot methods (those being run most frequently) into machine code, and that this inlining produces faster machine code. In short: smaller methods give the JIT more options when it compiles a hot method into machine code, and this allows more sophisticated optimizations.
To test this theory, I created a JMH class with two benchmark methods, each containing identical behaviour but factored differently. The first benchmark is named monolithicMethod (all of the code in a single method), and the second is named smallFocusedMethods, refactored so that each major behaviour is moved out into its own method. The smallFocusedMethods benchmark looks like this:
@Benchmark
public void smallFocusedMethods(TestState state) {
    int i = state.value;
    if (i < 90) {
        actionOne(i, state);   // ~90% of the random values land here
    } else {
        actionTwo(i, state);   // the remaining ~10%
    }
}

private void actionOne(int i, TestState state) {
    state.sb.append(Integer.toString(i)).append(": has triggered the first type of action.");
    int result = i;
    for (int j = 0; j < i; ++j) {
        result += j;
    }
    state.sb.append("Calculation gives result ").append(Integer.toString(result));
}

private void actionTwo(int i, TestState state) {
    state.sb.append(i).append(" has triggered the second type of action.");
    int result = i;
    for (int j = 0; j < 3; ++j) {
        for (int k = 0; k < 3; ++k) {
            result *= k * j + i;
        }
    }
    state.sb.append("Calculation gives result ").append(Integer.toString(result));
}
You can imagine how monolithicMethod looks: the same code, but contained entirely within the one method. The TestState class simply does the work of creating a new StringBuilder (so that the creation of this object is not counted in the benchmark time) and of choosing a random number between 0 and 100 for each invocation. It has been deliberately configured so that both benchmarks see exactly the same sequence of random numbers, to avoid any risk of bias.
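For completeness, a state class along those lines can be written something like this (a sketch rather than my exact code; the seed value and method name are illustrative, and it is the fixed seed that keeps the sequence of random numbers identical for both benchmarks):

import java.util.Random;

import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class TestState {

    // Fixed seed, so every run of either benchmark consumes the same sequence.
    private final Random random = new Random(42);

    StringBuilder sb;
    int value;

    @Setup(Level.Invocation)
    public void prepare() {
        sb = new StringBuilder();     // created in setup, so the allocation is not timed
        value = random.nextInt(100);  // 0 to 99 inclusive, so ~90% of values fall below 90
    }
}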
After running the benchmark with six "forks", each involving five warmup iterations of one second followed by six measurement iterations of five seconds, the results look like this:
Benchmark                                          Mode  Cnt        Score        Error  Units
monolithicMethod                                  thrpt   30  7609784.687 ± 118863.736  ops/s
monolithicMethod:·gc.alloc.rate                   thrpt   30     1368.296 ±     15.834  MB/sec
monolithicMethod:·gc.alloc.rate.norm              thrpt   30      270.328 ±      0.016  B/op
monolithicMethod:·gc.churn.G1_Eden_Space          thrpt   30     1357.303 ±     16.951  MB/sec
monolithicMethod:·gc.churn.G1_Eden_Space.norm     thrpt   30      268.156 ±      1.264  B/op
monolithicMethod:·gc.churn.G1_Old_Gen             thrpt   30        0.186 ±      0.001  MB/sec
monolithicMethod:·gc.churn.G1_Old_Gen.norm        thrpt   30        0.037 ±      0.001  B/op
monolithicMethod:·gc.count                        thrpt   30     2123.000               counts
monolithicMethod:·gc.time                         thrpt   30     1060.000               ms
smallFocusedMethods                               thrpt   30  7855677.144 ±  48987.206  ops/s
smallFocusedMethods:·gc.alloc.rate                thrpt   30     1404.228 ±      8.831  MB/sec
smallFocusedMethods:·gc.alloc.rate.norm           thrpt   30      270.320 ±      0.001  B/op
smallFocusedMethods:·gc.churn.G1_Eden_Space       thrpt   30     1393.473 ±     10.493  MB/sec
smallFocusedMethods:·gc.churn.G1_Eden_Space.norm  thrpt   30      268.250 ±      1.193  B/op
smallFocusedMethods:·gc.churn.G1_Old_Gen          thrpt   30        0.186 ±      0.001  MB/sec
smallFocusedMethods:·gc.churn.G1_Old_Gen.norm     thrpt   30        0.036 ±      0.001  B/op
smallFocusedMethods:·gc.count                     thrpt   30     1986.000               counts
smallFocusedMethods:·gc.time                      thrpt   30     1011.000               ms
In short, these numbers show that the smallFocusedMethods approach ran about 3.2% faster, and the difference is statistically significant with 99.9% confidence: the 99.9% confidence intervals reported by JMH (7,609,785 ± 118,864 ops/s versus 7,855,677 ± 48,987 ops/s) do not overlap. Note also that the memory usage (based on the garbage collection profiling) was not significantly different, so you get the faster performance without any increase in overhead.
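If you want to confirm that inlining really is behind a difference like this, HotSpot can be asked to log its inlining decisions. One convenient way to do that from JMH is to append the relevant diagnostic flags to the forked JVM, roughly as in the sketch below (the class and benchmark here are placeholders, and the exact log format varies between JVM versions):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;

public class InliningLogSketch {

    // -XX:+PrintInlining is a diagnostic flag, so diagnostic options must be
    // unlocked first. The forked JVM then reports, for each hot method it
    // compiles, which callees it inlined and which it rejected.
    @Fork(jvmArgsAppend = {"-XX:+UnlockDiagnosticVMOptions", "-XX:+PrintInlining"})
    @Benchmark
    public long placeholder() {
        return System.nanoTime();
    }
}

Adding the same @Fork line to the smallFocusedMethods benchmark and searching the fork's output for actionOne and actionTwo shows whether (and why) they were inlined.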
I've run a variety of similar benchmarks to test whether small, focused methods give better throughput, and in every case I've tried the improvement has been somewhere between 3% and 7%. The actual gain is likely to depend strongly on the version of the JVM being used, on the distribution of executions across your if/else blocks (I went for 90% on the first and 10% on the second to exaggerate the heat on the first "action", but I've seen throughput improvements even with a more even spread across a chain of if/else blocks), and on the complexity of the work done by each of the possible actions. So be sure to write your own benchmarks if you need to determine what works for your specific application.
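If you do write your own version, the run configuration described above can be expressed with JMH's programmatic API roughly as follows (the include pattern is illustrative and should match your own benchmark class; the GC profiler is what produces the ·gc.* rows in the results):

import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

public class BenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include("MethodSizeBenchmark")        // illustrative: regex matching the benchmark class
                .forks(6)
                .warmupIterations(5)
                .warmupTime(TimeValue.seconds(1))
                .measurementIterations(6)
                .measurementTime(TimeValue.seconds(5))
                .addProfiler(GCProfiler.class)         // adds the ·gc.* rows to the output
                .build();
        new Runner(options).run();
    }
}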
My advice is this: write small, focused methods because it makes the code tidier, easier to read, and much easier to override specific behaviours when inheritance is involved. The fact that the JIT is likely to reward you with slightly better performance is a bonus, but tidy code should be your main goal in the majority of cases. Oh, and it's also important to give each method a clear, descriptive name which exactly summarises the responsibility of the method (unlike the terrible names I've used in my benchmark).