Question
I'm using Delphi XE2 Update 3. There are precision issues with even the simplest floating-point numbers (like 3.7). Given this code (a 32-bit console app):
program Project1;
{$APPTYPE CONSOLE}
{$R *.res}
uses System.SysUtils;
var s: Single; d: Double; x: Extended;
begin
Write('Size of Single ----- '); Writeln(SizeOf(Single));
Write('Size of Double ----- '); Writeln(SizeOf(Double));
Write('Size of Extended --- '); Writeln(SizeOf(Extended)); Writeln;
s := 3.7; d := 3.7; x := 3.7;
Write('"s" is '); Writeln(s);
Write('"d" is '); Writeln(d);
Write('"x" is '); Writeln(x); Writeln;
Writeln('Single Comparison');
Write('"s > 3.7" is '); Writeln(s > 3.7);
Write('"s = 3.7" is '); Writeln(s = 3.7);
Write('"s < 3.7" is '); Writeln(s < 3.7); Writeln;
Writeln('Double Comparison');
Write('"d > 3.7" is '); Writeln(d > 3.7);
Write('"d = 3.7" is '); Writeln(d = 3.7);
Write('"d < 3.7" is '); Writeln(d < 3.7); Writeln;
Writeln('Extended Comparison');
Write('"x > 3.7" is '); Writeln(x > 3.7);
Write('"x = 3.7" is '); Writeln(x = 3.7);
Write('"x < 3.7" is '); Writeln(x < 3.7); Readln;
end.
I get this output:
Size of Single ----- 4
Size of Double ----- 8
Size of Extended --- 10
"s" is 3.70000004768372E+0000
"d" is 3.70000000000000E+0000
"x" is 3.70000000000000E+0000
Single Comparison
"s > 3.7" is TRUE
"s = 3.7" is FALSE
"s < 3.7" is FALSE
Double Comparison
"d > 3.7" is TRUE
"d = 3.7" is FALSE
"d < 3.7" is FALSE
Extended Comparison
"x > 3.7" is FALSE
"x = 3.7" is TRUE
"x < 3.7" is FALSE
You can see that Extended is the only type that evaluates correctly. I thought precision was only an issue when using a complex floating-point number like 3.14159265358979323846, not something as simple as 3.7. The issue when using Single kind of makes sense. But why doesn't Double work?
Answer 1:
Required reading: What Every Computer Scientist Should Know About Floating-Point Arithmetic, David Goldberg.
The issue is not one of precision. Rather, the issue is one of representability. First of all, let us recap that floating-point numbers are used to represent real numbers. There are infinitely many real numbers. Of course, the same can be said of integers. The difference is that within a particular range there are finitely many integers but infinitely many real numbers. Indeed, as Cantor originally showed, any interval of real numbers, however short, contains uncountably many real values.
So it is clear that we cannot represent all real numbers on a finite machine. So, which numbers can we represent? Well, that depends on the data type. Delphi floating-point data types use binary representation. The single (32 bit) and double (64 bit) types adhere to the IEEE-754 standard. The extended (80 bit) type is an Intel-specific type. In binary floating point a representable number has the form k × 2^n where k and n are integers. Note that I am not claiming that all numbers of this form are representable. That is not possible because there are infinitely many such numbers. Rather, my point is that all representable numbers are of this form.
Some examples of representable binary floating point numbers include: 1, 0.5, 0.25, 0.75, 1.25, 0.125, 0.375. Your value, 3.7, is not representable as a binary floating point value.
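To make the distinction concrete, here is a minimal sketch (not part of the original answer) using three of the values listed above. Because 0.5, 0.25 and 0.75 are each of the form k × 2^n, they are stored without rounding, and so is their sum. The program name and variable names are purely illustrative.

```delphi
program RepresentableDemo;
{$APPTYPE CONSOLE}
var
  a, b, c: Double;
begin
  // 0.5 = 1 * 2^-1, 0.25 = 1 * 2^-2 and 0.75 = 3 * 2^-2 are all stored
  // exactly, and the sum 0.5 + 0.25 needs no rounding either.
  a := 0.5;
  b := 0.25;
  c := 0.75;
  Writeln('0.5 + 0.25 = 0.75 is ', a + b = c);
end.
```

This comparison should report TRUE, since no rounding happens anywhere. The value 3.7 is different: written as a fraction it is 37/10, and no power of two is divisible by 5, so it can never be written as k × 2^n.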
What this means in relation to your code is that none of it is doing what you expect it to do. You are hoping to compare against the value 3.7. But instead you are comparing against the nearest exactly representable value to 3.7. As an implementation detail, the compiler evaluates that literal at extended precision, so the nearest representable value is the extended one. That is why the version using extended appears to do what you expect. However, do not take this to mean that your variable x is equal to 3.7. In fact it is equal to the nearest representable extended-precision value to 3.7.
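One way to see the effect of the literal's precision is to compare against typed constants instead of the bare literal. The sketch below is an illustration under the assumption of a 32-bit Delphi target like the one in the question. The names SingleTarget and DoubleTarget are hypothetical. It is not a recommendation for how to compare floating-point values in general.

```delphi
program TypedConstDemo;
{$APPTYPE CONSOLE}
const
  // Typed constants are rounded to the precision of their declared type,
  // unlike the bare literal 3.7 used in the question.
  SingleTarget: Single = 3.7;
  DoubleTarget: Double = 3.7;
var
  s: Single;
  d: Double;
begin
  s := 3.7;
  d := 3.7;
  // Both sides of each comparison were rounded to the same type.
  Writeln('"s = SingleTarget" is ', s = SingleTarget);
  Writeln('"d = DoubleTarget" is ', d = DoubleTarget);
  // The comparison from the question, against the bare literal.
  Writeln('"d = 3.7" is ', d = 3.7);
end.
```

When both operands have been rounded to the same type they hold the same stored value, so the first two comparisons should be true, while the last one reproduces the FALSE from the question's output.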
Rob Kennedy's most useful website can show you the closest representable values to a specific number. In the case of 3.7 these are:
3.7 = + 3.70000 00000 00000 00004 33680 86899 42017 73602 98112 03479 76684 57031 25
3.7 = + 3.70000 00000 00000 17763 56839 40025 04646 77810 66894 53125
3.7 = + 3.70000 00476 83715 82031 25
These are presented in the order extended, double, single. In other words these are the values of your variables x, d and s respectively.
If you look at these values, and compare them with the closest extended to 3.7, you will see why your program produces the output that it does. Both the single and double precision values here are greater than the extended value, which is exactly what your program told you.
I don't want to make any blanket recommendations as to how to compare floating point values. The best way to do that always depends very critically on the specific problem. No blanket advice can be usefully given.
Answer 2:
Short answer: 0.7 cannot be represented exactly (binary floating-point values are always fractions whose denominator is a power of 2). The precision of the data type you're storing it in, and the one the compiler selects for the constant you're comparing against, can affect the representation of that number and therefore the result of the comparison.
Moral: Never directly compare two floating point values for equality unless they're exactly the same data type and assigned the same exact value.
Obligatory link: What Every Computer Scientist Should Know About Floating-Point Arithmetic
Another link that might be helpful is Delphi's Math.SameValue function, which lets you compare two floating-point values for approximate equality within a specified allowable delta (difference).
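As a rough sketch of how SameValue might be used with the question's values: the tolerance of 1E-6 below is an arbitrary choice made for this illustration, and the names Tolerance and sWidened are hypothetical. What tolerance is appropriate depends entirely on the problem at hand.

```delphi
program SameValueDemo;
{$APPTYPE CONSOLE}
uses
  System.Math;
const
  // Arbitrary tolerance chosen for this illustration only.
  Tolerance: Double = 1E-6;
var
  s: Single;
  d, sWidened: Double;
begin
  s := 3.7;
  d := 3.7;
  // Widening a Single to a Double is exact, so sWidened keeps the
  // Single-precision approximation of 3.7.
  sWidened := s;
  // Exact comparison: the two stored approximations of 3.7 differ.
  Writeln('"sWidened = d" is ', sWidened = d);
  // Approximate comparison: true when |A - B| is within the tolerance.
  Writeln('"SameValue(sWidened, d, Tolerance)" is ',
    SameValue(sWidened, d, Tolerance));
end.
```

With the values listed in the first answer, the two approximations differ by roughly 4.8E-8, so the exact test is expected to fail while the tolerant one succeeds.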
Source: https://stackoverflow.com/questions/23667061/simple-floating-point-numbers-lose-precision