If I perform a float (single-precision) operation on the host and on a device (GPU arch sm_13), will the results be different?
A good discussion of this is available in a whitepaper from NVIDIA. Basically:
- IEEE-754 is implemented by almost everything currently;
- Even between faithful implementations of the standard, you can still see differences in results (famously, Intel's x87 FPU computing with 80-bit precision internally for double-precision operands), and aggressive compiler optimization settings can also change results
- Compute capability 2.0 and later NVIDIA cards support IEEE-754 in both single and double precision, with only very small caveats
- Some rounding modes aren't supported for some operations; this only matters if you explicitly select rounding modes in your code (a rounding-mode sketch follows this list)
- There are some subtleties involving fused multiply-add (FMA); see the FMA sketch after this list
- CUDA also provides faster but (slightly) lower-precision implementations of several operations, and of course if you use those, explicitly or implicitly (via compiler options), you naturally won't get full IEEE-754 results; see the fast-math sketch after this list
- Compute capability 1.3 cards support IEEE-754 as above in double precision but not in single precision: single precision doesn't support denormal (i.e. very small) numbers, has no FMA, and square root and division aren't fully accurate
- Compute capability 1.2 cards only have single precision, and it isn't full IEEE-754 as above.
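
To make the rounding-mode point concrete, here is a minimal sketch: CUDA exposes the rounding mode per operation through intrinsics such as __fadd_rn/_rz/_ru/_rd rather than through a global rounding-mode state. The input values are my own choice, picked so that the exact sum is not representable in single precision and the chosen rounding direction shows up in the result.

```cuda
// Sketch of per-operation rounding on the device via the __fadd_* intrinsics.
#include <cstdio>
#include <cmath>

__global__ void rounding_kernel(float a, float b, float *out)
{
    out[0] = __fadd_rn(a, b); // round to nearest even (the default)
    out[1] = __fadd_rz(a, b); // round toward zero
    out[2] = __fadd_ru(a, b); // round toward +infinity
    out[3] = __fadd_rd(a, b); // round toward -infinity
}

int main()
{
    // Exact sum is 1 + 1.5*2^-24, which lies between the floats 1.0 and
    // 1 + 2^-23: rn and ru give 1.00000012, rz and rd give 1.0.
    float a = 1.0f;
    float b = 3.0f * ldexpf(1.0f, -25);

    float *d_out, h_out[4];
    cudaMalloc((void **)&d_out, 4 * sizeof(float));
    rounding_kernel<<<1, 1>>>(a, b, d_out);
    cudaMemcpy(h_out, d_out, 4 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("rn: %.8f  rz: %.8f  ru: %.8f  rd: %.8f\n",
           h_out[0], h_out[1], h_out[2], h_out[3]);
    return 0;
}
```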
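
For the FMA subtlety, here is a minimal sketch for compute capability 2.0 and later (the inputs are my own choice, picked so the difference is visible): on the device, a*b + c may be contracted into a single fused multiply-add with one rounding under nvcc's default -fmad=true, while a typical host build rounds after the multiply and again after the add, so the last bits can differ even though both sides are IEEE-754-compliant.

```cuda
// Sketch comparing contracted FMA, explicit fmaf(), and separately rounded mul+add.
#include <cstdio>
#include <cmath>

__global__ void mad_kernel(float a, float b, float c, float *out)
{
    out[0] = a * b + c;           // may be contracted to one FMA (one rounding)
    out[1] = fmaf(a, b, c);       // explicitly fused: exact product, single rounding
    out[2] = __fmul_rn(a, b) + c; // force a separately rounded multiply, then add
}

int main()
{
    // a*b = 1 - 2^-46 exactly; rounded to float it becomes 1.0, so the
    // separately rounded path gives 0 while the fused path gives -2^-46.
    float a = 1.0f + ldexpf(1.0f, -23);
    float b = 1.0f - ldexpf(1.0f, -23);
    float c = -1.0f;

    float *d_out, h_out[3];
    cudaMalloc((void **)&d_out, 3 * sizeof(float));
    mad_kernel<<<1, 1>>>(a, b, c, d_out);
    cudaMemcpy(h_out, d_out, 3 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("host   a*b + c       : %.10e\n", a * b + c); // usually 0 on the host
    printf("device a*b + c       : %.10e\n", h_out[0]);
    printf("device fmaf(a, b, c) : %.10e\n", h_out[1]);  // -2^-46
    printf("device mul then add  : %.10e\n", h_out[2]);  // 0
    return 0;
}
```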
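
And a fast-math sketch for the lower-precision-but-faster operations: intrinsics such as __fdividef() and __sinf() are hardware approximations that are a few ulps less accurate than the default code paths, and nvcc's -use_fast_math flag switches to such approximations implicitly. The inputs below are arbitrary; any differences show up only in the last bits.

```cuda
// Sketch comparing the default (accurate) operations with their fast intrinsics.
#include <cstdio>

__global__ void fast_math_kernel(float x, float y, float *out)
{
    out[0] = x / y;            // correctly rounded division on CC 2.0+ (default flags)
    out[1] = __fdividef(x, y); // fast approximate division intrinsic
    out[2] = sinf(x);          // accurate single-precision sine
    out[3] = __sinf(x);        // fast hardware approximation of sine
}

int main()
{
    float *d_out, h_out[4];
    cudaMalloc((void **)&d_out, 4 * sizeof(float));
    fast_math_kernel<<<1, 1>>>(0.123456f, 7.89f, d_out);
    cudaMemcpy(h_out, d_out, 4 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("x / y      : %.10e\n", h_out[0]);
    printf("__fdividef : %.10e\n", h_out[1]);
    printf("sinf       : %.10e\n", h_out[2]);
    printf("__sinf     : %.10e\n", h_out[3]);
    return 0;
}
```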
Source: https://stackoverflow.com/questions/10334334/ieee-754-standard-on-nvidia-gpu-sm-13