How do Java runtimes targeting pre-SSE2 processors implement floating-point basic operations?

滥情空心 2020-11-29 11:15

How does (or did) a Java runtime targeting an Intel processor without SSE2 deal with floating-point denormals when strictfp is set?

Even when the 387 FPU i

1 answer
  • 2020-11-29 11:57

    It looks to me, from a very trivial test case, like the JVM round-trips every double computation through memory to get the rounding it wants. It also seems to do something weird with a couple of magic constants. Here's what it did for me for a simple "compute 2^n naively" program:

    0xb1e444b0: fld1
    0xb1e444b2: jmp    0xb1e444dd         ;*iload
                                          ; - fptest::calc@9 (line 6)
    0xb1e444b7: nop
    0xb1e444b8: fldt   0xb523a2c8         ;   {external_word}
    0xb1e444be: fmulp  %st,%st(1)
    0xb1e444c0: fmull  0xb1e44490         ;   {section_word}
    0xb1e444c6: fldt   0xb523a2bc         ;   {external_word}
    0xb1e444cc: fmulp  %st,%st(1)
    0xb1e444ce: fstpl  0x10(%esp)
    0xb1e444d2: inc    %esi               ; OopMap{off=51}
                                          ;*goto
                                          ; - fptest::calc@22 (line 6)
    0xb1e444d3: test   %eax,0xb3f8d100    ;   {poll}
    0xb1e444d9: fldl   0x10(%esp)         ;*goto
                                          ; - fptest::calc@22 (line 6)
    0xb1e444dd: cmp    %ecx,%esi
    0xb1e444df: jl     0xb1e444b8         ;*if_icmpge
                                          ; - fptest::calc@12 (line 6)
    

    I believe 0xb523a2c8 and 0xb523a2bc are _fpu_subnormal_bias1 and _fpu_subnormal_bias2 from the HotSpot source code. _fpu_subnormal_bias1 looks to be 0x03ff8000000000000000 (2^-15360 as an 80-bit long double) and _fpu_subnormal_bias2 looks to be 0x7bff8000000000000000 (2^15360). Multiplying by _fpu_subnormal_bias1 scales the smallest normal double to the smallest normal long double, so a product that would be a subnormal double is also subnormal in the register; if the FPU rounds to 53 bits, the "right thing" will happen. Multiplying by _fpu_subnormal_bias2 afterwards scales the result back.
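
    The scaling claim can be checked by decoding the exponent fields of the two constants. The sketch below is not HotSpot code: the class name BiasDecode and the helper unbiasedExponent are made up for illustration, and it assumes the usual x87 80-bit layout (1 sign bit, 15 exponent bits with bias 16383, 64-bit significand with an explicit integer bit).

```java
import java.math.BigInteger;

// Hypothetical helper class, for illustration only.
class BiasDecode {
    // Exponent bias of the x87 80-bit extended format (assumed layout).
    static final int EXT_BIAS = 16383;

    // Unbiased exponent of an 80-bit value written as 20 hex digits:
    // the top 16 bits hold sign + exponent, the low 64 bits the significand.
    static int unbiasedExponent(String hex80) {
        BigInteger bits = new BigInteger(hex80, 16);
        int signAndExp = bits.shiftRight(64).intValue();
        return (signAndExp & 0x7fff) - EXT_BIAS;
    }

    public static void main(String[] args) {
        // Both significands are 0x8000000000000000, i.e. exactly 1.0,
        // so each constant is a pure power of two.
        int e1 = unbiasedExponent("03ff8000000000000000");
        int e2 = unbiasedExponent("7bff8000000000000000");
        System.out.println("bias1 = 2^" + e1);
        System.out.println("bias2 = 2^" + e2);

        // Smallest normal double is 2^-1022; scale it by bias1.
        int scaled = Math.getExponent(Double.MIN_NORMAL) + e1;
        // Smallest normal 80-bit value has biased exponent 1.
        int minNormalExt = 1 - EXT_BIAS;
        System.out.println("scaled min normal double = 2^" + scaled);
        System.out.println("min normal long double   = 2^" + minNormalExt);
    }
}
```

    The two exponents come out as -15360 and +15360, and -1022 + (-15360) equals 1 - 16383, which is the alignment described above.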

    I'd speculate that the seemingly pointless test instruction (annotated {poll} in the disassembly) is a safepoint poll: it reads a page that the JVM can mark unreadable, so that the thread can be interrupted in the event that a GC is necessary.

    Here's the Java code:

    import java.io.*;
    public strictfp class fptest {
     public static double calc(int k) {
      double a = 2.0;
      double b = 1.0;
      for (int i = 0; i < k; i++) {
       b *= a;
      }
      return b;
     }
     public static double intest() {
      double d = 0;
      for (int i = 0; i < 4100; i++) d += calc(i);
      return d;
     }
     public static void main(String[] args) throws Exception {
      for (int i = 0; i < 100; i++)
       System.out.println(intest());
     }
    }
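
    Note that this loop only ever grows b, so it overflows to infinity rather than producing subnormals. A companion check that does land in the subnormal range might look like the sketch below. The class name SubnormalCheck and the method halve are made up for illustration; the strictfp modifier is left off because Java 17 and later evaluate everything strictly, and on an older pre-SSE2 runtime you would add it as in the class above.

```java
// Hypothetical companion test, for illustration only.
class SubnormalCheck {
    // Runtime multiply whose exact result needs one bit more than a
    // subnormal double can hold, so it must be rounded exactly once.
    static double halve(double x) {
        return x * 0.5;
    }

    public static void main(String[] args) {
        // 2^-1022 + 3 * 2^-1074, exactly representable as a normal double.
        double x = Double.MIN_NORMAL + 3 * Double.MIN_VALUE;

        // Exact half is 2^-1023 + 1.5 * 2^-1074, a tie between two
        // subnormals; round-to-nearest-even picks 2^-1023 + 2 * 2^-1074.
        double expected = Double.MIN_NORMAL / 2 + 2 * Double.MIN_VALUE;

        double got = halve(x);
        System.out.println(Double.toHexString(got));
        System.out.println(got == expected); // true under IEEE 754 semantics
    }
}
```

    A runtime that rounded the product to a 53-bit significand first and only then squeezed it into the subnormal range on the store could round twice, which is the kind of discrepancy a test like this is meant to expose.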
    

    Digging further, the code for these operations is in plain sight in the OpenJDK code in hotspot/src/cpu/x86/vm/x86_32.ad. Relevant snippets:

    instruct strictfp_mulD_reg(regDPR1 dst, regnotDPR1 src) %{
      predicate( UseSSE<=1 && Compile::current()->has_method() && Compile::current()
    ->method()->is_strict() );
      match(Set dst (MulD dst src));
      ins_cost(1);   // Select this instruction for all strict FP double multiplies
    
      format %{ "FLD    StubRoutines::_fpu_subnormal_bias1\n\t"
                "DMULp  $dst,ST\n\t"
                "FLD    $src\n\t"
                "DMULp  $dst,ST\n\t"
                "FLD    StubRoutines::_fpu_subnormal_bias2\n\t"
                "DMULp  $dst,ST\n\t" %}
      opcode(0xDE, 0x1); /* DE C8+i or DE /1*/
      ins_encode( strictfp_bias1(dst),
                  Push_Reg_D(src),
                  OpcP, RegOpc(dst),
                  strictfp_bias2(dst) );
      ins_pipe( fpu_reg_reg );
    %}
    
    instruct strictfp_divD_reg(regDPR1 dst, regnotDPR1 src) %{
      predicate (UseSSE<=1);
      match(Set dst (DivD dst src));
      predicate( UseSSE<=1 && Compile::current()->has_method() && Compile::current()
    ->method()->is_strict() );
      ins_cost(01);
    
      format %{ "FLD    StubRoutines::_fpu_subnormal_bias1\n\t"
                "DMULp  $dst,ST\n\t"
                "FLD    $src\n\t"
                "FDIVp  $dst,ST\n\t"
                "FLD    StubRoutines::_fpu_subnormal_bias2\n\t"
                "DMULp  $dst,ST\n\t" %}
      opcode(0xDE, 0x7); /* DE F8+i or DE /7*/
      ins_encode( strictfp_bias1(dst),
                  Push_Reg_D(src),
                  OpcP, RegOpc(dst),
                  strictfp_bias2(dst) );
      ins_pipe( fpu_reg_reg );
    %}
    

    I see nothing for addition and subtraction, but I'd bet they just do an add/subtract with the FPU in 53-bit mode and then round-trip the result through memory. I'm a little curious whether there's a tricky overflow case that they get wrong, but I'm not curious enough to find out.
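
    One reason addition and subtraction may not need the bias trick: every double is an integer multiple of 2^-1074, so a sum of two doubles that falls below the smallest normal double is itself exactly representable, and no rounding (single or double) happens in that range. The sketch below checks that property with exact BigDecimal arithmetic. It is not taken from HotSpot, and the class and method names are made up for illustration.

```java
import java.math.BigDecimal;
import java.util.Random;

// Hypothetical check, for illustration only.
class SubnormalAddCheck {
    // True when the double sum x + y equals the exact mathematical sum.
    // new BigDecimal(double) converts the binary value exactly.
    static boolean sumIsExact(double x, double y) {
        BigDecimal exact = new BigDecimal(x).add(new BigDecimal(y));
        return exact.compareTo(new BigDecimal(x + y)) == 0;
    }

    // Counts inexact sums among random pairs whose sum is below MIN_NORMAL.
    static int countInexactTinySums(long seed, int trials) {
        Random rnd = new Random(seed);
        int inexact = 0;
        for (int i = 0; i < trials; i++) {
            // Random operands with magnitude up to 4 * MIN_NORMAL, either sign.
            double x = (rnd.nextDouble() - 0.5) * 8 * Double.MIN_NORMAL;
            double y = (rnd.nextDouble() - 0.5) * 8 * Double.MIN_NORMAL;
            double s = x + y;
            if (Math.abs(s) < Double.MIN_NORMAL && !sumIsExact(x, y)) {
                inexact++;
            }
        }
        return inexact;
    }

    public static void main(String[] args) {
        System.out.println(countInexactTinySums(42L, 100_000)); // prints 0
    }
}
```

    This says nothing about the overflow end of the range, which is where the doubt above remains.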
