I have a piece of code where it appears, in every test I\'ve run, that function calls have a significant amount of overhead. The code is a tight loop, performing a very simp
A method call is not a problem since hot methods are often inlined. A virtual call is an issue.
In your code the type profiler is fooled by the initialization method Image.random
. When Image.process
is JIT-compiled for the first time, it is optimized for calling random.nextInt()
. So the next invocations of Image.process
will result in the inline-cache miss followed by a non-optimized virtual call to Shader.apply
.
Remove an Image.process
call from the initialization method and JIT will then inline the useful calls to Shader.apply
.
After BlurShader.apply
is inlined you can help JIT to perform Common subexpression elimination optimization by replacing
final int p = s * y + x;
with
final int p = y * s + x;
The latter expression is also met in Image.process
, so JIT will not calculate the same expression twice.
After applying these two changes I've achieved the ideal benchmark score:
Benchmark Mode Samples Mean Mean error Units
s.ShaderBench.testProcessInline thrpt 5 36,483 1,255 ops/s
s.ShaderBench.testProcessLambda thrpt 5 36,323 0,936 ops/s
s.ShaderBench.testProcessProc thrpt 5 36,163 1,421 ops/s