Why does Java switch on contiguous ints appear to run faster with added cases?

后端 未结 4 1852
南旧
南旧 2020-11-28 00:24

I am working on some Java code which needs to be highly optimized as it will run in hot functions that are invoked at many points in my main program logic. Part of this code

相关标签:
4条回答
  • 2020-11-28 00:56

    The answer lies in the bytecode:

    SwitchTest10.java

    public class SwitchTest10 {
    
        public static void main(String[] args) {
            int n = 0;
    
            switcher(n);
        }
    
        public static void switcher(int n) {
            switch(n) {
                case 0: System.out.println(0);
                        break;
    
                case 1: System.out.println(1);
                        break;
    
                case 2: System.out.println(2);
                        break;
    
                case 3: System.out.println(3);
                        break;
    
                case 4: System.out.println(4);
                        break;
    
                case 5: System.out.println(5);
                        break;
    
                case 6: System.out.println(6);
                        break;
    
                case 7: System.out.println(7);
                        break;
    
                case 8: System.out.println(8);
                        break;
    
                case 9: System.out.println(9);
                        break;
    
                case 10: System.out.println(10);
                        break;
    
                default: System.out.println("test");
            }
        }       
    }
    

    Corresponding bytecode; only relevant parts shown:

    public static void switcher(int);
      Code:
       0:   iload_0
       1:   tableswitch{ //0 to 10
            0: 60;
            1: 70;
            2: 80;
            3: 90;
            4: 100;
            5: 110;
            6: 120;
            7: 131;
            8: 142;
            9: 153;
            10: 164;
            default: 175 }
    

    SwitchTest22.java:

    public class SwitchTest22 {
    
        public static void main(String[] args) {
            int n = 0;
    
            switcher(n);
        }
    
        public static void switcher(int n) {
            switch(n) {
                case 0: System.out.println(0);
                        break;
    
                case 1: System.out.println(1);
                        break;
    
                case 2: System.out.println(2);
                        break;
    
                case 3: System.out.println(3);
                        break;
    
                case 4: System.out.println(4);
                        break;
    
                case 5: System.out.println(5);
                        break;
    
                case 6: System.out.println(6);
                        break;
    
                case 7: System.out.println(7);
                        break;
    
                case 8: System.out.println(8);
                        break;
    
                case 9: System.out.println(9);
                        break;
    
                case 100: System.out.println(10);
                        break;
    
                case 110: System.out.println(10);
                        break;
                case 120: System.out.println(10);
                        break;
                case 130: System.out.println(10);
                        break;
                case 140: System.out.println(10);
                        break;
                case 150: System.out.println(10);
                        break;
                case 160: System.out.println(10);
                        break;
                case 170: System.out.println(10);
                        break;
                case 180: System.out.println(10);
                        break;
                case 190: System.out.println(10);
                        break;
                case 200: System.out.println(10);
                        break;
                case 210: System.out.println(10);
                        break;
    
                case 220: System.out.println(10);
                        break;
    
                default: System.out.println("test");
            }
        }       
    }
    

    Corresponding bytecode; again, only relevant parts shown:

    public static void switcher(int);
      Code:
       0:   iload_0
       1:   lookupswitch{ //23
            0: 196;
            1: 206;
            2: 216;
            3: 226;
            4: 236;
            5: 246;
            6: 256;
            7: 267;
            8: 278;
            9: 289;
            100: 300;
            110: 311;
            120: 322;
            130: 333;
            140: 344;
            150: 355;
            160: 366;
            170: 377;
            180: 388;
            190: 399;
            200: 410;
            210: 421;
            220: 432;
            default: 443 }
    

    In the first case, with narrow ranges, the compiled bytecode uses a tableswitch. In the second case, the compiled bytecode uses a lookupswitch.

    In tableswitch, the integer value on the top of the stack is used to index into the table, to find the branch/jump target. This jump/branch is then performed immediately. Hence, this is an O(1) operation.

    A lookupswitch is more complicated. In this case, the integer value needs to be compared against all the keys in the table until the correct key is found. After the key is found, the branch/jump target (that this key is mapped to) is used for the jump. The table that is used in lookupswitch is sorted and a binary-search algorithm can be used to find the correct key. Performance for a binary search is O(log n), and the entire process is also O(log n), because the jump is still O(1). So the reason the performance is lower in the case of sparse ranges is that the correct key must first be looked up because you cannot index into the table directly.

    If there are sparse values and you only had a tableswitch to use, table would essentially contain dummy entries that point to the default option. For example, assuming that the last entry in SwitchTest10.java was 21 instead of 10, you get:

    public static void switcher(int);
      Code:
       0:   iload_0
       1:   tableswitch{ //0 to 21
            0: 104;
            1: 114;
            2: 124;
            3: 134;
            4: 144;
            5: 154;
            6: 164;
            7: 175;
            8: 186;
            9: 197;
            10: 219;
            11: 219;
            12: 219;
            13: 219;
            14: 219;
            15: 219;
            16: 219;
            17: 219;
            18: 219;
            19: 219;
            20: 219;
            21: 208;
            default: 219 }
    

    So the compiler basically creates this huge table containing dummy entries between the gaps, pointing to the branch target of the default instruction. Even if there isn't a default, it will contain entries pointing to the instruction after the switch block. I did some basic tests, and I found that if the gap between the last index and the previous one (9) is greater than 35, it uses a lookupswitch instead of a tableswitch.

    The behavior of the switch statement is defined in Java Virtual Machine Specification (§3.10):

    Where the cases of the switch are sparse, the table representation of the tableswitch instruction becomes inefficient in terms of space. The lookupswitch instruction may be used instead. The lookupswitch instruction pairs int keys (the values of the case labels) with target offsets in a table. When a lookupswitch instruction is executed, the value of the expression of the switch is compared against the keys in the table. If one of the keys matches the value of the expression, execution continues at the associated target offset. If no key matches, execution continues at the default target. [...]

    0 讨论(0)
  • 2020-11-28 00:57

    Switch - case is faster if the case values are placed in a narrow range Eg.

    case 1:
    case 2:
    case 3:
    ..
    ..
    case n:
    

    Because, in this case the compiler can avoid performing a comparison for every case leg in the switch statement. The compiler make a jump table which contains addresses of the actions to be taken on different legs. The value on which the switch is being performed is manipulated to convert it into an index in to the jump table. In this implementation , the time taken in the switch statement is much less than the time taken in an equivalent if-else-if statement cascade. Also the time taken in the switch statement is independent of the number of case legs in the switch statement.

    As stated in wikipedia about switch statement in Compilation section.

    If the range of input values is identifiably 'small' and has only a few gaps, some compilers that incorporate an optimizer may actually implement the switch statement as a branch table or an array of indexed function pointers instead of a lengthy series of conditional instructions. This allows the switch statement to determine instantly what branch to execute without having to go through a list of comparisons.

    0 讨论(0)
  • 2020-11-28 01:11

    As pointed out by the other answer, because the case values are contiguous (as opposed to sparse), the generated bytecode for your various tests uses a switch table (bytecode instruction tableswitch).

    However, once the JIT starts its job and compiles the bytecode into assembly, the tableswitch instruction does not always result in an array of pointers: sometimes the switch table is transformed into what looks like a lookupswitch (similar to an if/else if structure).

    Decompiling the assembly generated by the JIT (hotspot JDK 1.7) shows that it uses a succession of if/else if when there are 17 cases or less, an array of pointers when there are more than 18 (more efficient).

    The reason why this magic number of 18 is used seems to come down to the default value of the MinJumpTableSize JVM flag (around line 352 in the code).

    I have raised the issue on the hotspot compiler list and it seems to be a legacy of past testing. Note that this default value has been removed in JDK 8 after more benchmarking was performed.

    Finally, when the method becomes too long (> 25 cases in my tests), it is in not inlined any longer with the default JVM settings - that is the likeliest cause for the drop in performance at that point.


    With 5 cases, the decompiled code looks like this (notice the cmp/je/jg/jmp instructions, the assembly for if/goto):

    [Verified Entry Point]
      # {method} 'multiplyByPowerOfTen' '(DI)D' in 'javaapplication4/Test1'
      # parm0:    xmm0:xmm0   = double
      # parm1:    rdx       = int
      #           [sp+0x20]  (sp of caller)
      0x00000000024f0160: mov    DWORD PTR [rsp-0x6000],eax
                                                    ;   {no_reloc}
      0x00000000024f0167: push   rbp
      0x00000000024f0168: sub    rsp,0x10           ;*synchronization entry
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@-1 (line 56)
      0x00000000024f016c: cmp    edx,0x3
      0x00000000024f016f: je     0x00000000024f01c3
      0x00000000024f0171: cmp    edx,0x3
      0x00000000024f0174: jg     0x00000000024f01a5
      0x00000000024f0176: cmp    edx,0x1
      0x00000000024f0179: je     0x00000000024f019b
      0x00000000024f017b: cmp    edx,0x1
      0x00000000024f017e: jg     0x00000000024f0191
      0x00000000024f0180: test   edx,edx
      0x00000000024f0182: je     0x00000000024f01cb
      0x00000000024f0184: mov    ebp,edx
      0x00000000024f0186: mov    edx,0x17
      0x00000000024f018b: call   0x00000000024c90a0  ; OopMap{off=48}
                                                    ;*new  ; - javaapplication4.Test1::multiplyByPowerOfTen@72 (line 83)
                                                    ;   {runtime_call}
      0x00000000024f0190: int3                      ;*new  ; - javaapplication4.Test1::multiplyByPowerOfTen@72 (line 83)
      0x00000000024f0191: mulsd  xmm0,QWORD PTR [rip+0xffffffffffffffa7]        # 0x00000000024f0140
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@52 (line 62)
                                                    ;   {section_word}
      0x00000000024f0199: jmp    0x00000000024f01cb
      0x00000000024f019b: mulsd  xmm0,QWORD PTR [rip+0xffffffffffffff8d]        # 0x00000000024f0130
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@46 (line 60)
                                                    ;   {section_word}
      0x00000000024f01a3: jmp    0x00000000024f01cb
      0x00000000024f01a5: cmp    edx,0x5
      0x00000000024f01a8: je     0x00000000024f01b9
      0x00000000024f01aa: cmp    edx,0x5
      0x00000000024f01ad: jg     0x00000000024f0184  ;*tableswitch
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
      0x00000000024f01af: mulsd  xmm0,QWORD PTR [rip+0xffffffffffffff81]        # 0x00000000024f0138
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@64 (line 66)
                                                    ;   {section_word}
      0x00000000024f01b7: jmp    0x00000000024f01cb
      0x00000000024f01b9: mulsd  xmm0,QWORD PTR [rip+0xffffffffffffff67]        # 0x00000000024f0128
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@70 (line 68)
                                                    ;   {section_word}
      0x00000000024f01c1: jmp    0x00000000024f01cb
      0x00000000024f01c3: mulsd  xmm0,QWORD PTR [rip+0xffffffffffffff55]        # 0x00000000024f0120
                                                    ;*tableswitch
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
                                                    ;   {section_word}
      0x00000000024f01cb: add    rsp,0x10
      0x00000000024f01cf: pop    rbp
      0x00000000024f01d0: test   DWORD PTR [rip+0xfffffffffdf3fe2a],eax        # 0x0000000000430000
                                                    ;   {poll_return}
      0x00000000024f01d6: ret    
    

    With 18 cases, the assembly looks like this (notice the array of pointers which is used and suppresses the need for all the comparisons: jmp QWORD PTR [r8+r10*1] jumps directly to the right multiplication) - that is the likely reason for the performance improvement:

    [Verified Entry Point]
      # {method} 'multiplyByPowerOfTen' '(DI)D' in 'javaapplication4/Test1'
      # parm0:    xmm0:xmm0   = double
      # parm1:    rdx       = int
      #           [sp+0x20]  (sp of caller)
      0x000000000287fe20: mov    DWORD PTR [rsp-0x6000],eax
                                                    ;   {no_reloc}
      0x000000000287fe27: push   rbp
      0x000000000287fe28: sub    rsp,0x10           ;*synchronization entry
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@-1 (line 56)
      0x000000000287fe2c: cmp    edx,0x13
      0x000000000287fe2f: jae    0x000000000287fe46
      0x000000000287fe31: movsxd r10,edx
      0x000000000287fe34: shl    r10,0x3
      0x000000000287fe38: movabs r8,0x287fd70       ;   {section_word}
      0x000000000287fe42: jmp    QWORD PTR [r8+r10*1]  ;*tableswitch
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
      0x000000000287fe46: mov    ebp,edx
      0x000000000287fe48: mov    edx,0x31
      0x000000000287fe4d: xchg   ax,ax
      0x000000000287fe4f: call   0x00000000028590a0  ; OopMap{off=52}
                                                    ;*new  ; - javaapplication4.Test1::multiplyByPowerOfTen@202 (line 96)
                                                    ;   {runtime_call}
      0x000000000287fe54: int3                      ;*new  ; - javaapplication4.Test1::multiplyByPowerOfTen@202 (line 96)
      0x000000000287fe55: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe8b]        # 0x000000000287fce8
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@194 (line 92)
                                                    ;   {section_word}
      0x000000000287fe5d: jmp    0x000000000287ff16
      0x000000000287fe62: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe86]        # 0x000000000287fcf0
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@188 (line 90)
                                                    ;   {section_word}
      0x000000000287fe6a: jmp    0x000000000287ff16
      0x000000000287fe6f: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe81]        # 0x000000000287fcf8
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@182 (line 88)
                                                    ;   {section_word}
      0x000000000287fe77: jmp    0x000000000287ff16
      0x000000000287fe7c: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe7c]        # 0x000000000287fd00
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@176 (line 86)
                                                    ;   {section_word}
      0x000000000287fe84: jmp    0x000000000287ff16
      0x000000000287fe89: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe77]        # 0x000000000287fd08
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@170 (line 84)
                                                    ;   {section_word}
      0x000000000287fe91: jmp    0x000000000287ff16
      0x000000000287fe96: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe72]        # 0x000000000287fd10
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@164 (line 82)
                                                    ;   {section_word}
      0x000000000287fe9e: jmp    0x000000000287ff16
      0x000000000287fea0: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe70]        # 0x000000000287fd18
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@158 (line 80)
                                                    ;   {section_word}
      0x000000000287fea8: jmp    0x000000000287ff16
      0x000000000287feaa: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe6e]        # 0x000000000287fd20
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@152 (line 78)
                                                    ;   {section_word}
      0x000000000287feb2: jmp    0x000000000287ff16
      0x000000000287feb4: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe24]        # 0x000000000287fce0
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@146 (line 76)
                                                    ;   {section_word}
      0x000000000287febc: jmp    0x000000000287ff16
      0x000000000287febe: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe6a]        # 0x000000000287fd30
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@140 (line 74)
                                                    ;   {section_word}
      0x000000000287fec6: jmp    0x000000000287ff16
      0x000000000287fec8: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe68]        # 0x000000000287fd38
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@134 (line 72)
                                                    ;   {section_word}
      0x000000000287fed0: jmp    0x000000000287ff16
      0x000000000287fed2: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe66]        # 0x000000000287fd40
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@128 (line 70)
                                                    ;   {section_word}
      0x000000000287feda: jmp    0x000000000287ff16
      0x000000000287fedc: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe64]        # 0x000000000287fd48
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@122 (line 68)
                                                    ;   {section_word}
      0x000000000287fee4: jmp    0x000000000287ff16
      0x000000000287fee6: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe62]        # 0x000000000287fd50
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@116 (line 66)
                                                    ;   {section_word}
      0x000000000287feee: jmp    0x000000000287ff16
      0x000000000287fef0: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe60]        # 0x000000000287fd58
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@110 (line 64)
                                                    ;   {section_word}
      0x000000000287fef8: jmp    0x000000000287ff16
      0x000000000287fefa: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe5e]        # 0x000000000287fd60
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@104 (line 62)
                                                    ;   {section_word}
      0x000000000287ff02: jmp    0x000000000287ff16
      0x000000000287ff04: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe5c]        # 0x000000000287fd68
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@98 (line 60)
                                                    ;   {section_word}
      0x000000000287ff0c: jmp    0x000000000287ff16
      0x000000000287ff0e: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe12]        # 0x000000000287fd28
                                                    ;*tableswitch
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
                                                    ;   {section_word}
      0x000000000287ff16: add    rsp,0x10
      0x000000000287ff1a: pop    rbp
      0x000000000287ff1b: test   DWORD PTR [rip+0xfffffffffd9b00df],eax        # 0x0000000000230000
                                                    ;   {poll_return}
      0x000000000287ff21: ret    
    

    And finally the assembly with 30 cases (below) looks similar to 18 cases, except for the additional movapd xmm0,xmm1 that appears towards the middle of the code, as spotted by @cHao - however the likeliest reason for the drop in performance is that the method is too long to be inlined with the default JVM settings:

    [Verified Entry Point]
      # {method} 'multiplyByPowerOfTen' '(DI)D' in 'javaapplication4/Test1'
      # parm0:    xmm0:xmm0   = double
      # parm1:    rdx       = int
      #           [sp+0x20]  (sp of caller)
      0x0000000002524560: mov    DWORD PTR [rsp-0x6000],eax
                                                    ;   {no_reloc}
      0x0000000002524567: push   rbp
      0x0000000002524568: sub    rsp,0x10           ;*synchronization entry
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@-1 (line 56)
      0x000000000252456c: movapd xmm1,xmm0
      0x0000000002524570: cmp    edx,0x1f
      0x0000000002524573: jae    0x0000000002524592  ;*tableswitch
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
      0x0000000002524575: movsxd r10,edx
      0x0000000002524578: shl    r10,0x3
      0x000000000252457c: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe3c]        # 0x00000000025243c0
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@364 (line 118)
                                                    ;   {section_word}
      0x0000000002524584: movabs r8,0x2524450       ;   {section_word}
      0x000000000252458e: jmp    QWORD PTR [r8+r10*1]  ;*tableswitch
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@1 (line 56)
      0x0000000002524592: mov    ebp,edx
      0x0000000002524594: mov    edx,0x31
      0x0000000002524599: xchg   ax,ax
      0x000000000252459b: call   0x00000000024f90a0  ; OopMap{off=64}
                                                    ;*new  ; - javaapplication4.Test1::multiplyByPowerOfTen@370 (line 120)
                                                    ;   {runtime_call}
      0x00000000025245a0: int3                      ;*new  ; - javaapplication4.Test1::multiplyByPowerOfTen@370 (line 120)
      0x00000000025245a1: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe27]        # 0x00000000025243d0
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@358 (line 116)
                                                    ;   {section_word}
      0x00000000025245a9: jmp    0x0000000002524744
      0x00000000025245ae: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe22]        # 0x00000000025243d8
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@348 (line 114)
                                                    ;   {section_word}
      0x00000000025245b6: jmp    0x0000000002524744
      0x00000000025245bb: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe1d]        # 0x00000000025243e0
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@338 (line 112)
                                                    ;   {section_word}
      0x00000000025245c3: jmp    0x0000000002524744
      0x00000000025245c8: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe18]        # 0x00000000025243e8
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@328 (line 110)
                                                    ;   {section_word}
      0x00000000025245d0: jmp    0x0000000002524744
      0x00000000025245d5: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe13]        # 0x00000000025243f0
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@318 (line 108)
                                                    ;   {section_word}
      0x00000000025245dd: jmp    0x0000000002524744
      0x00000000025245e2: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe0e]        # 0x00000000025243f8
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@308 (line 106)
                                                    ;   {section_word}
      0x00000000025245ea: jmp    0x0000000002524744
      0x00000000025245ef: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe09]        # 0x0000000002524400
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@298 (line 104)
                                                    ;   {section_word}
      0x00000000025245f7: jmp    0x0000000002524744
      0x00000000025245fc: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe04]        # 0x0000000002524408
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@288 (line 102)
                                                    ;   {section_word}
      0x0000000002524604: jmp    0x0000000002524744
      0x0000000002524609: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffdff]        # 0x0000000002524410
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@278 (line 100)
                                                    ;   {section_word}
      0x0000000002524611: jmp    0x0000000002524744
      0x0000000002524616: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffdfa]        # 0x0000000002524418
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@268 (line 98)
                                                    ;   {section_word}
      0x000000000252461e: jmp    0x0000000002524744
      0x0000000002524623: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffd9d]        # 0x00000000025243c8
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@258 (line 96)
                                                    ;   {section_word}
      0x000000000252462b: jmp    0x0000000002524744
      0x0000000002524630: movapd xmm0,xmm1
      0x0000000002524634: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffe0c]        # 0x0000000002524448
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@242 (line 92)
                                                    ;   {section_word}
      0x000000000252463c: jmp    0x0000000002524744
      0x0000000002524641: movapd xmm0,xmm1
      0x0000000002524645: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffddb]        # 0x0000000002524428
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@236 (line 90)
                                                    ;   {section_word}
      0x000000000252464d: jmp    0x0000000002524744
      0x0000000002524652: movapd xmm0,xmm1
      0x0000000002524656: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffdd2]        # 0x0000000002524430
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@230 (line 88)
                                                    ;   {section_word}
      0x000000000252465e: jmp    0x0000000002524744
      0x0000000002524663: movapd xmm0,xmm1
      0x0000000002524667: mulsd  xmm0,QWORD PTR [rip+0xfffffffffffffdc9]        # 0x0000000002524438
                                                    ;*dmul
                                                    ; - javaapplication4.Test1::multiplyByPowerOfTen@224 (line 86)
                                                    ;   {section_word}
    
    [etc.]
    
      0x0000000002524744: add    rsp,0x10
      0x0000000002524748: pop    rbp
      0x0000000002524749: test   DWORD PTR [rip+0xfffffffffde1b8b1],eax        # 0x0000000000340000
                                                    ;   {poll_return}
      0x000000000252474f: ret    
    
    0 讨论(0)
  • 2020-11-28 01:16

    Since the question is already answered (more or less), here is some tip. Use

    private static final double[] mul={1d, 10d...};
    static double multiplyByPowerOfTen(final double d, final int exponent) {
          if (exponent<0 || exponent>=mul.length) throw new ParseException();//or just leave the IOOBE be
          return mul[exponent]*d;
    }
    

    That code uses significantly less IC (instruction cache) and will be always inlined. The array will be in L1 data cache if the code is hot. The lookup table is almost always a win. (esp. on microbenchmarks :D )

    Edit: if you wish the method to be hot-inlined, consider the non-fast paths like throw new ParseException() to be as short as minimum or move them to separate static method (hence making them short as minimum). That is throw new ParseException("Unhandled power of ten " + power, 0); is a weak idea b/c it eats a lot of the inlining budget for code that can be just interpreted - string concatenation is quite verbose in bytecode . More info and a real case w/ ArrayList

    0 讨论(0)
提交回复
热议问题