Why are denormalized floats so much slower than other floats, from a hardware architecture viewpoint?

北海茫月 2021-01-01 19:45

Operations on denormals are known to be severely slower, by 100x or so, than the same operations on normals. This frequently causes unexpected software problems.
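
For readers unfamiliar with the term, here is a minimal sketch of what counts as a denormal (subnormal) value. It is written in Python purely for convenience and assumes the platform's float is an IEEE 754 binary64 double, which holds for CPython on mainstream hardware. The helper name `is_subnormal` is made up for illustration.

```python
import sys

# Smallest positive *normal* double; any nonzero magnitude below it is subnormal.
SMALLEST_NORMAL = sys.float_info.min

def is_subnormal(x):
    """Return True if x is nonzero and lies below the normal range."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL

# Gradual underflow: halving the smallest normal does not give zero,
# it gives a subnormal value with reduced precision.
tiny = SMALLEST_NORMAL / 2

print(is_subnormal(1.0))              # False
print(is_subnormal(SMALLEST_NORMAL))  # False
print(is_subnormal(tiny))             # True
print(is_subnormal(0.0))              # False
```

This only shows which values are involved; it does not measure the slowdown, which depends on the CPU and on the operation.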

I'm curious, from CPU Architect

2 Answers
  • 2021-01-01 20:22

    On most x86 systems, the cause of slowness is that denormal values trigger an FP_ASSIST which is very costly as it switches to a micro-code flow (very much like a fault).

    See, for example: https://software.intel.com/en-us/forums/intel-performance-bottleneck-analyzer/topic/487262

    The likely reason is that the architects chose to optimize the HW for normal values by speculating that every value is normalized (by far the more common case), rather than risk the performance of the frequent case for the sake of rare corner cases. This speculation is usually right, so you only pay the penalty when it is wrong. Such trade-offs are very common in CPU design, since any investment in one case usually adds overhead to the entire system.

    In this case, if you were to design a system that tries to optimize all types of irregular FP values, you would have to add HW to detect and record the state of each value after each operation (multiplied by the number of physical FP registers, execution units, RS entries and so on, totaling a significant number of transistors and wires). Alternatively, you would have to add some mechanism to check the value on read, which would slow down the reading of any FP value (even the normal ones).

    Furthermore, depending on the type of value, you may or may not need to perform some correction. On x86 this is the purpose of the assist code, but without the speculation you would have to run this flow conditionally on each value, which would already put a large chunk of that overhead on the common path.
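
To make the per-value check described above concrete, here is a minimal software model of it in Python, assuming IEEE 754 binary64 doubles. A value is subnormal when its exponent field is all zeros and its fraction field is nonzero. The names `fields` and `needs_slow_path` are hypothetical, and this sketch says nothing about how any real CPU implements the detection or the assist.

```python
import struct

EXP_MASK = 0x7FF            # 11-bit biased exponent field of binary64
FRAC_MASK = (1 << 52) - 1   # 52-bit fraction field

def fields(x):
    """Split a double into (sign, biased exponent, fraction)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits >> 63, (bits >> 52) & EXP_MASK, bits & FRAC_MASK

def needs_slow_path(x):
    """Model of the check: zero exponent field plus nonzero fraction."""
    _, exponent, fraction = fields(x)
    return exponent == 0 and fraction != 0

for value in (1.0, 0.0, 5e-324):
    print(value, fields(value), needs_slow_path(value))
```

Hardware that did not speculate would have to evaluate the equivalent of `needs_slow_path` for every operand, which is the cost on the common path that the answer refers to.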

  • 2021-01-01 20:29

    Denormals are not handled by the FPU (hardware) in many architectures, which leaves the implementation to software.

    There's a good basic intro here: https://en.wikipedia.org/wiki/Denormal_number

    Under Performance issues -
