Denormalized floating point in Objective-C?

不思量自难忘° 2020-12-31 20:10

What is the relevance of the Stack Overflow question and answer "Why does changing 0.1f to 0 slow down performance by 10x?" for Objective-C? If there is any relevance, h

1 answer
  • 2020-12-31 20:42

    As I said in response to your comment there:

    It is more of a CPU issue than a language issue, so it is probably relevant for Objective-C on x86. (The iPhone's ARMv7 doesn't seem to support denormalized floats, at least with the default runtime/build settings.)

    Update

    I just tested. On Mac OS X on x86 the slowdown is observed; on iOS on ARMv7 it is not (default build settings).

    As expected, when running on the iOS simulator (on x86), denormalized floats appear again.

    Interestingly, FLT_MIN and DBL_MIN are defined as the smallest non-denormalized float and double, respectively (on iOS, Mac OS X, and Linux). Strange things happen when you use

    DBL_MIN/2.0
    

    in your code: the compiler happily emits a denormalized constant, but as soon as the (ARM) CPU touches it, it is flushed to zero:

    double test = DBL_MIN/2.0;
    printf("test      == 0.0 %d\n",test==0.0);
    printf("DBL_MIN/2 == 0.0 %d\n",DBL_MIN/2.0==0.0);
    

    Outputs:

    test      == 0.0 1  // computer says YES
    DBL_MIN/2 == 0.0 0  // compiler says NO
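To make the distinction between normalized and denormalized values concrete, here is a minimal sketch in standard C (which also compiles as Objective-C) that asks `fpclassify` from `<math.h>` how these constants are classified. The helper names `fp_class_name` and `print_min_classes` are made up for illustration, and `DBL_TRUE_MIN` is a C11 addition that is only used if the headers define it.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Short name for the floating-point class of x (fpclassify is C99). */
static const char *fp_class_name(double x)
{
    switch (fpclassify(x)) {
    case FP_ZERO:      return "zero";
    case FP_SUBNORMAL: return "subnormal";
    case FP_NORMAL:    return "normal";
    case FP_INFINITE:  return "infinite";
    default:           return "nan";
    }
}

/* Hypothetical helper: prints how the limits discussed here are classified. */
static void print_min_classes(void)
{
    /* By definition DBL_MIN is the smallest positive normalized double. */
    assert(fpclassify(DBL_MIN) == FP_NORMAL);
    printf("DBL_MIN      is %s\n", fp_class_name(DBL_MIN));
    printf("DBL_MIN/2.0  is %s\n", fp_class_name(DBL_MIN / 2.0));
#ifdef DBL_TRUE_MIN   /* C11: smallest positive double, subnormal if supported */
    printf("DBL_TRUE_MIN is %s\n", fp_class_name(DBL_TRUE_MIN));
#endif
}
```

On a platform that preserves denormals, `print_min_classes()` should report DBL_MIN as normal and DBL_MIN/2.0 as subnormal. What a flush-to-zero CPU reports may depend on how the C library implements `fpclassify`, so treat the output as a hint rather than proof.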
    

    So a quick runtime check for whether denormalization is supported can be:

    #define SUPPORT_DENORMALIZATION ({volatile double t=DBL_MIN/2.0;t!=0.0;})
    

    ("given without even the implied warranty of fitness for any purpose")
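The macro relies on a GNU statement expression (`({ ... })`), which GCC and Clang accept but which is not standard C. A portable variant, sketched here as plain functions with hypothetical names, performs the division at runtime so that the CPU rather than the compiler decides the result:

```c
#include <assert.h>
#include <float.h>

/* Returns 1 if halving DBL_MIN at runtime yields a nonzero (denormalized)
   value, 0 if the CPU flushed the result to zero. */
static int supports_subnormal_double(void)
{
    volatile double t = DBL_MIN;  /* volatile: prevents compile-time folding */
    assert(t > 0.0);              /* DBL_MIN is normalized, so never flushed */
    t = t / 2.0;                  /* computed by the CPU, not the compiler */
    return t != 0.0;
}

/* Same check for single precision. */
static int supports_subnormal_float(void)
{
    volatile float t = FLT_MIN;
    assert(t > 0.0f);
    t = t / 2.0f;
    return t != 0.0f;
}
```

Both functions return 1 where denormals are preserved and 0 where they are flushed to zero, so a caller could write `if (!supports_subnormal_double()) { /* treat tiny values as zero */ }`. The same lack of warranty applies.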

    This is what ARM has to say about flush-to-zero mode: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204h/Bcfheche.html

    Update<<1

    This is how you disable flush-to-zero mode on ARMv7:

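    int x;
    // volatile: the asm changes CPU state, so it must not be removed or reordered
    asm volatile(
        "vmrs %[result],FPSCR \n"               // read FPSCR into a register
        "bic %[result],%[result],#16777216 \n"  // clear bit 24 (FZ); 16777216 == 1<<24
        "vmsr FPSCR,%[result]"                  // write the modified value back
        :[result] "=r" (x) : :
    );
    printf("ARM FPSCR: %08x\n",x);              // prints the new FPSCR value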
    

    with the following surprising result. Each row of the output has three columns:

    • Column 1: a float, halved on every iteration
    • Column 2: the binary representation of this float (printed least significant bit first)
    • Column 3: the time taken to sum this float 1e7 times

    You can clearly see that denormalization comes at zero cost on an iPad 2. (On an iPhone 4, it comes at a small cost: a 10% slowdown.)

    0.000000000000000000000000000000000100000004670110: 10111100001101110010000011100000 110 ms
    0.000000000000000000000000000000000050000002335055: 10111100001101110010000101100000 110 ms
    0.000000000000000000000000000000000025000001167528: 10111100001101110010000001100000 110 ms
    0.000000000000000000000000000000000012500000583764: 10111100001101110010000110100000 110 ms
    0.000000000000000000000000000000000006250000291882: 10111100001101110010000010100000 111 ms
    0.000000000000000000000000000000000003125000145941: 10111100001101110010000100100000 110 ms
    0.000000000000000000000000000000000001562500072970: 10111100001101110010000000100000 110 ms
    0.000000000000000000000000000000000000781250036485: 10111100001101110010000111000000 110 ms
    0.000000000000000000000000000000000000390625018243: 10111100001101110010000011000000 110 ms
    0.000000000000000000000000000000000000195312509121: 10111100001101110010000101000000 110 ms
    0.000000000000000000000000000000000000097656254561: 10111100001101110010000001000000 110 ms
    0.000000000000000000000000000000000000048828127280: 10111100001101110010000110000000 110 ms
    0.000000000000000000000000000000000000024414063640: 10111100001101110010000010000000 110 ms
    0.000000000000000000000000000000000000012207031820: 10111100001101110010000100000000 111 ms
    0.000000000000000000000000000000000000006103515209: 01111000011011100100001000000000 110 ms
    0.000000000000000000000000000000000000003051757605: 11110000110111001000010000000000 110 ms
    0.000000000000000000000000000000000000001525879503: 00010001101110010000100000000000 110 ms
    0.000000000000000000000000000000000000000762939751: 00100011011100100001000000000000 110 ms
    0.000000000000000000000000000000000000000381469876: 01000110111001000010000000000000 112 ms
    0.000000000000000000000000000000000000000190734938: 10001101110010000100000000000000 110 ms
    0.000000000000000000000000000000000000000095366768: 00011011100100001000000000000000 110 ms
    0.000000000000000000000000000000000000000047683384: 00110111001000010000000000000000 110 ms
    0.000000000000000000000000000000000000000023841692: 01101110010000100000000000000000 111 ms
    0.000000000000000000000000000000000000000011920846: 11011100100001000000000000000000 110 ms
    0.000000000000000000000000000000000000000005961124: 01111001000010000000000000000000 110 ms
    0.000000000000000000000000000000000000000002980562: 11110010000100000000000000000000 110 ms
    0.000000000000000000000000000000000000000001490982: 00010100001000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000745491: 00101000010000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000372745: 01010000100000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000186373: 10100001000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000092486: 01000010000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000046243: 10000100000000000000000000000000 111 ms
    0.000000000000000000000000000000000000000000022421: 00001000000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000011210: 00010000000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000005605: 00100000000000000000000000000000 111 ms
    0.000000000000000000000000000000000000000000002803: 01000000000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000001401: 10000000000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000000000: 00000000000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000000000: 00000000000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000000000: 00000000000000000000000000000000 110 ms
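
The benchmark code itself is not shown in this answer. A rough sketch of how such a table could be produced, assuming the loop simply halves a float, prints its bits least significant bit first (the order column 2 appears to use, since the smallest denormal is shown with a leading 1), and times the additions with `clock()`, might look like this. All function names are hypothetical, and the timing loop is a guess at the original measurement, not the author's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Raw IEEE 754 bit pattern of a float, copied via memcpy to avoid aliasing. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof f == sizeof u);   /* assumes a 32-bit float */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Writes the 32 bits of u into out, least significant bit first. */
static void bits_lsb_first(uint32_t u, char out[33])
{
    for (int i = 0; i < 32; i++)
        out[i] = ((u >> i) & 1u) ? '1' : '0';
    out[32] = '\0';
}

/* Milliseconds of CPU time spent adding value to a running sum n times. */
static double time_sum_ms(float value, long n)
{
    volatile float v = value;    /* volatile keeps the loads and adds in the loop */
    volatile float sum = 0.0f;
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        sum = sum + v;
    return 1000.0 * (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Prints one row per halving: value, bit pattern, time for n additions. */
static void run_halving_benchmark(float start, int rows, long n)
{
    char bits[33];
    float f = start;
    for (int r = 0; r < rows; r++) {
        bits_lsb_first(float_bits(f), bits);
        printf("%.48f: %s %.0f ms\n", f, bits, time_sum_ms(f, n));
        f /= 2.0f;
    }
}
```

For instance, `run_halving_benchmark(1e-34f, 40, 10000000L)` would print 40 rows in the same layout as the table; the timings will differ by device and compiler settings.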
    