Question
Please see the following code in C#.
float a = 10.0f;
float b = 0.1f;
float c = a / b;
int indirect = (int)(c);
// Value of indirect is 100 always
int direct = (int)(a / b);
// Value of direct is 99 in a 32-bit process (?)
// Value of direct is 100 in a 64-bit process
Why do we get 99 in 32-bit processes?
I am using VS2013.
Answer 1:
When you operate directly, it's permissible for the operations to be performed at a higher precision, and for that higher precision to be maintained across multiple operations.
From section 4.1.6 of the C# 5 specification:
Floating-point operations may be performed with higher precision than the result type of the operation. For example, some hardware architectures support an “extended” or “long double” floating-point type with greater range and precision than the double type, and implicitly perform all floating-point operations using this higher precision type. Only at excessive cost in performance can such hardware architectures be made to perform floating-point operations with less precision, and rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating-point operations. Other than delivering more precise results, this rarely has any measurable effects. However, in expressions of the form x * y / z, where the multiplication produces a result that is outside the double range, but the subsequent division brings the temporary result back into the double range, the fact that the expression is evaluated in a higher range format may cause a finite result to be produced instead of an infinity.
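To see the x * y / z scenario from that quoted paragraph in isolation, here is a minimal sketch (the class name ExtendedPrecisionDemo and the chosen values are illustrative, not from the original answer; which of the two outputs you actually see depends on the JIT and hardware, which is exactly the spec's point):

using System;

class ExtendedPrecisionDemo
{
    static void Main()
    {
        double x = 1e308;   // close to double.MaxValue
        double y = 10.0;    // x * y overflows the double range...
        double z = 10.0;    // ...but the true value of x * y / z fits comfortably

        // If the intermediate x * y is rounded to a 64-bit double, it becomes
        // +Infinity and the later division cannot bring it back to a finite value.
        // If the hardware keeps the intermediate in an extended-precision register,
        // the finite result 1E+308 can be produced instead.
        double r = x * y / z;
        Console.WriteLine(r);   // either Infinity or 1E+308, depending on the runtime
    }
}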
I'd expect that in some optimization scenarios, it would even be possible for the answer to be "wrong" with the extra local variable, if the JIT decides that it never really needs the value as a float. (I've seen cases where just adding logging changes the behaviour here...)
In this case, I believe that the division is effectively being performed using 64-bit arithmetic and then cast from double straight to int rather than going via float first.
Here's some code to demonstrate that, using a DoubleConverter class which allows you to find the exact decimal representation of a floating binary point number:
using System;

class Test
{
    static void Main()
    {
        float a = 10f;
        float b = 0.1f;
        float c = a / b;                     // float division, stored in a float
        double d = (double) a / (double) b;  // the same division carried out in double
        float e = (float) d;                 // the double quotient rounded back to float

        Console.WriteLine(DoubleConverter.ToExactString(c));
        Console.WriteLine(DoubleConverter.ToExactString(d));
        Console.WriteLine(DoubleConverter.ToExactString(e));
        Console.WriteLine((int) c);
        Console.WriteLine((int) d);
        Console.WriteLine((int) e);
    }
}
Output:
100
99.999998509883909036943805404007434844970703125
100
100
99
100
Note that the operation may not just be performed in 64 bits - it may be performed at even higher precision, e.g. 80 bits.
This is just one of the joys of floating binary point arithmetic - and an example of why you need to be very careful about what you're doing.
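(Not from the original answer, but a note on working around it:) if you need the truncation to behave the same way in 32-bit and 64-bit processes, the usual suggestion is to force the quotient back to genuine float precision before converting to int, either via an intermediate float variable as in the question, or with an explicit cast. The sketch below assumes the cast does force that rounding; the class name is illustrative and the exact behaviour of the older 32-bit JIT is not verified here:

using System;

class ForceFloatPrecision
{
    static void Main()
    {
        float a = 10.0f;
        float b = 0.1f;

        // The explicit (float) cast asks for the quotient to be rounded to
        // true float precision before the conversion to int, mirroring what
        // the intermediate variable in the question achieves.
        int direct = (int)(float)(a / b);
        Console.WriteLine(direct);   // expected: 100, assuming the cast forces the rounding
    }
}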
Note that 0.1f is exactly 0.100000001490116119384765625 - so more than 0.1. Given that it's more than 0.1, I would expect 10/b to be a little less than 100 - if that "little less" is representable, then truncating the result is going to naturally lead to 99.
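A quick way to see both halves of that argument with nothing but the standard library (a sketch; the class name is illustrative, and the digits in the comments are roughly what the round-trip "R" format prints):

using System;

class TruncationDemo
{
    static void Main()
    {
        float b = 0.1f;

        // 0.1f is stored as a value slightly GREATER than 0.1.
        Console.WriteLine(((double) b).ToString("R"));   // ~0.100000001490116...

        // Dividing 10 by something slightly greater than 0.1 gives slightly LESS than 100.
        double quotient = 10.0 / b;
        Console.WriteLine(quotient.ToString("R"));       // 99.9999985098839...

        // The cast to int truncates towards zero, so "slightly less than 100" becomes 99.
        Console.WriteLine((int) quotient);               // 99
    }
}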
Source: https://stackoverflow.com/questions/24304011/wrong-value-after-type-casting-in-32-bit-process