I've often noticed gcc converting multiplications into shifts in the executable. Something similar might happen when multiplying an int and a float. For example, 2 * f might simply increment the exponent of f by 1, saving some cycles.
This simply isn't true.
First, there are too many corner cases: zero, infinity, NaN, and denormals. Then there is the performance issue, and the misunderstanding here is that incrementing the exponent is not actually faster than doing a multiplication.
If you look at the hardware instructions, there is no direct way to increment the exponent. So what you need to do instead is:

1. Move the value from the floating-point register into an integer register (a bit-cast).
2. Add 1 to the exponent field with an integer add.
3. Move the result back into a floating-point register.
There is generally a medium to large latency for moving data between the integer and floating-point execution units. So in the end, this "optimization" becomes much worse than a simple floating-point multiply.
So the reason the compiler doesn't do this "optimization" is that it isn't any faster.
A previous Stack Overflow question asked about multiplication by powers of 2. The consensus, backed by actual implementations, was that unfortunately there is currently no way to be more efficient than standard multiplication.
If you think that multiplying by two means increasing the exponent by 1, think again. Here are the possible cases for IEEE 754 floating-point arithmetic:
Case 1: Infinity and NaN stay unchanged.
Case 2: Floating-point numbers with the largest finite exponent are changed to Infinity by increasing the exponent and setting the mantissa to zero (the sign bit is kept).
Case 3: Normalised floating-point numbers with exponent less than the maximum possible exponent have their exponent increased by one. Yippee!!!
Case 4: Denormalised floating-point numbers with the highest mantissa bit set have their exponent increased by one, turning them into normalised numbers.
Case 5: Denormalised floating-point numbers with the highest mantissa bit cleared, including +0 and -0, have their mantissa shifted to the left by one bit position, leaving the exponent unchanged.
I very much doubt that a compiler producing integer code handling all these cases correctly would be anywhere near as fast as the floating-point multiply built into the processor. And it is only suitable for multiplication by 2.0; for multiplication by 4.0 or 0.5, a whole new set of rules applies. For the case of multiplication by 2.0, you might try to replace x * 2.0 with x + x, and many compilers do this. They do it because a processor may, for example, be able to execute one addition and one multiplication at the same time, but not two operations of the same kind. So sometimes you would prefer x * 2.0 and sometimes x + x, depending on what other operations need doing at the same time.