Why doesn't a compiler optimize floating-point *2 into an exponent increment?

悲&欢浪女 2021-02-06 20:45

I've often noticed gcc converting multiplications into shifts in the executable. Something similar might happen when multiplying an int and a float. F

9 answers
  • 2021-02-06 21:26

    Common floating-point formats, particularly IEEE 754, do not store the exponent as a simple integer, and treating it as an integer will not produce correct results.

    In 32-bit float or 64-bit double, the exponent field is 8 or 11 bits, respectively. The exponent codes 1 to 254 (in float) or 1 to 2046 (in double) do act like integers: If you add one to one of these values and the result is one of these values, then the represented value doubles. However, adding one fails in these situations:

    • The initial value is 0 or subnormal. In this case, the exponent field starts at zero, and adding one to it adds 2^-126 (in float) or 2^-1022 (in double) to the number; it does not double the number.
    • The initial value exceeds 2^127 (in float) or 2^1023 (in double). In this case, the exponent field starts at 254 or 2046, and adding one to it changes the number to a NaN; it does not double the number.
    • The initial value is infinity or a NaN. In this case, the exponent field starts at 255 or 2047, and adding one to it changes it to zero (and is likely to overflow into the sign bit). The result is zero or a subnormal but should be infinity or a NaN, respectively.

    (The above is for positive signs. The situation is symmetric with negative signs.)

    As others have noted, some processors do not have facilities for manipulating the bits of floating-point values quickly. Even on those that do, the exponent field is not isolated from the other bits, so you typically cannot add one to it without overflowing into the sign bit in the last case above.

    Although some applications can tolerate shortcuts such as neglecting subnormals or NaNs or even infinities, it is rare that applications can ignore zero. Since adding one to the exponent fails to handle zero properly, it is not usable.
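
    The failure cases listed above can be reproduced with a short C sketch. It assumes a 32-bit IEEE 754 float; the helper name naive_double is hypothetical and only stands in for the "add one to the exponent field" shortcut.

    ```c
    #include <assert.h>
    #include <float.h>   /* FLT_MIN, FLT_MAX (used when trying the cases below) */
    #include <math.h>    /* INFINITY */
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: "double" x by adding 1 to its 8-bit exponent field,
       i.e. by adding 1 << 23 to the bit pattern. Assumes 32-bit IEEE 754 float. */
    static float naive_double(float x)
    {
        uint32_t bits;
        assert(sizeof bits == sizeof x);
        memcpy(&bits, &x, sizeof bits);   /* well-defined way to read the bits */
        bits += UINT32_C(1) << 23;        /* exponent field occupies bits 23..30 */
        memcpy(&x, &bits, sizeof bits);
        return x;
    }

    /* Results under the assumptions above:
       naive_double(1.5f)     -> 3.0f     (correct)
       naive_double(0.0f)     -> FLT_MIN  (2^-126; should stay 0)
       naive_double(FLT_MAX)  -> NaN      (should overflow to infinity)
       naive_double(INFINITY) -> -0.0f    (carry into the sign bit; should stay infinity) */
    ```

    Only the first case matches what a real multiplication by two would return.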

  • 2021-02-06 21:27

    On modern CPUs, multiplication typically has one-per-cycle throughput and low latency. If the value is already in a floating-point register, there's no way you'll beat that by juggling it around to do integer arithmetic on the representation. If it's in memory to begin with, and if you're assuming that neither the current value nor the correct result would be zero, denormal, NaN, or infinity, then it might be faster to perform something like

    addl $0x100000, 4(%eax)   # x86 asm example
    

    to multiply by two; the only time I could see this being beneficial is if you're operating on a whole array of floating-point data that's bounded away from zero and infinity, and scaling by a power of two is the only operation you'll be performing (so you don't have any existing reason to be loading the data into floating point registers).
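
    As a rough C counterpart of that instruction, here is a minimal sketch of the in-memory array case. It assumes 64-bit IEEE 754 doubles and that every element and every result is normal, finite, and nonzero; the function name scale_by_two_in_place is hypothetical.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: double each element by an integer add on its bit
       pattern. Only valid if no element or result is zero, subnormal,
       infinite, or NaN (the assumption stated in the answer). */
    static void scale_by_two_in_place(double *a, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            uint64_t bits;
            assert(sizeof bits == sizeof a[i]);
            memcpy(&bits, &a[i], sizeof bits);
            bits += UINT64_C(1) << 52;   /* same bit as 0x100000 in the high 32-bit word */
            memcpy(&a[i], &bits, sizeof bits);
        }
    }
    ```

    Whether this beats an ordinary floating-point multiply depends on the target and should be measured rather than assumed.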

  • 2021-02-06 21:29

    It may be useful for embedded-systems compilers to have a special scale-by-power-of-two pseudo-op which the code generator could translate in whatever fashion is optimal for the machine in question, since on some embedded processors operating on the exponent alone may be an order of magnitude faster than doing a full multiplication by a power of two. On the embedded micros where multiplication is slowest, however, a compiler could probably achieve a bigger performance boost by having the floating-point-multiply routine check its arguments at run time so as to skip over parts of the mantissa that are zero.

  • 2021-02-06 21:30

    Here's an actual compiler optimization I'm seeing with GCC 10:

    x = 2.0 * hi * lo;
    

    Generates this code:

    mulsd   %xmm1, %xmm0      # x = hi * lo;
    addsd   %xmm0, %xmm0      # x += x;
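
    A plausible reason this rewrite is safe is that, in IEEE 754 arithmetic, x + x and 2.0 * x are both the correctly rounded value of the same exact result, so they agree even for zeros, subnormals, infinities, and NaNs. The sketch below checks that on a given input; it assumes 64-bit IEEE 754 doubles, and the helper names are hypothetical.

    ```c
    #include <assert.h>
    #include <float.h>   /* DBL_MAX, DBL_MIN (handy inputs to try) */
    #include <math.h>    /* INFINITY, NAN */
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: return the raw bit pattern of a double. */
    static uint64_t bits_of(double d)
    {
        uint64_t b;
        assert(sizeof b == sizeof d);
        memcpy(&b, &d, sizeof b);
        return b;
    }

    /* Hypothetical check: returns 1 if x + x and 2.0 * x agree, treating any
       two NaNs as equal and distinguishing +0.0 from -0.0. */
    static int add_matches_mul(double x)
    {
        volatile double v = x;            /* discourage constant folding */
        double by_add = v + v;
        double by_mul = 2.0 * v;
        if (by_add != by_add && by_mul != by_mul)
            return 1;                     /* both results are NaN */
        return bits_of(by_add) == bits_of(by_mul);
    }
    ```

    Inputs worth trying include 0.0, -0.0, DBL_MIN / 4.0 (a subnormal), DBL_MAX, INFINITY, and NAN.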
    
  • 2021-02-06 21:32

    It's not about compilers or compiler writers not being smart. It's more about obeying standards and producing all the necessary "side effects" such as Infs, NaNs, and denormals.

    Also it can be about not producing other side effects that are not called for, such as reading memory. But I do recognize that it can be faster in some circumstances.

  • 2021-02-06 21:35

    Actually, this is what happens in the hardware.

    The 2 is also passed into the FPU as a floating-point number, with a mantissa of 1.0 and an exponent of 1 (that is, a scale factor of 2^1). For the multiplication, the exponents are added and the mantissas multiplied.

    Given that there is dedicated hardware to handle the complex case (multiplying with values that are not powers of two), and the special case is not handled any worse than it would be using dedicated hardware, there is no point in having additional circuitry and instructions.
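
    To make the "exponents are added, mantissas multiplied" description concrete, here is a small C sketch that splits a float into its fields. It assumes a 32-bit IEEE 754 float and normal (not zero, subnormal, infinite, or NaN) inputs; the struct and function names are hypothetical.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical view of a normal 32-bit IEEE 754 float. */
    struct float_fields {
        unsigned sign;       /* 0 or 1 */
        int      exponent;   /* unbiased: stored field minus 127 */
        uint32_t fraction;   /* low 23 bits; the mantissa is 1 + fraction / 2^23 */
    };

    static struct float_fields decode_float(float x)
    {
        uint32_t bits;
        struct float_fields f;
        assert(sizeof bits == sizeof x);
        memcpy(&bits, &x, sizeof bits);
        f.sign     = (unsigned)(bits >> 31);
        f.exponent = (int)((bits >> 23) & 0xFF) - 127;
        f.fraction = bits & UINT32_C(0x7FFFFF);
        return f;
    }
    ```

    With this sketch, 2.0f decodes to exponent 1 and fraction 0 (mantissa 1.0). Comparing 3.0f with 6.0f shows the same fraction and an exponent that goes from 1 to 2, which is the exponent addition the hardware multiplier performs.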
