问题
I am using the Newton-Raphson Algorithm to divide IEEE-754 single-precision floating point values using single precision hardware.
I am using the method described at these two links:
- Wikipedia Newton-Raphson Division
- Newton-Raphson Method I'm Using
However, despite computing Xi to X_3 (i.e. using 3 iterations), my answers are still off a bit. I'm wondering why this is so? I'm comparing my results using MATLAB.
Here is the output showing an example of an incorrect result
===== DIVISION RESULT =====
Dividend: 94C5D0C9
Divisor: D3ACDF9A
My answer: 009277C0
My answer: +0.00000
Correct answer: 009277C2
Correct answer: +0.00000
error =
2.384186e-007
Error: 34800000
k =
212823
I have attached my MATLAB code here:
% Testing Newton-Raphson Division Algorithm
% Clear window and variables
clc;
clear;
% Open file for reading
fid = fopen('generatedfloats_for_fpdiv.txt','r');
% Put all data into an array
data = fscanf(fid,'%x');
% Loop through data array 2 values at a time
for k=1:2:length(data)
% Formatting floating point numbers into four parts
% Split upper and lower 16 bits of each number into a separate variable
dividend = typecast(uint32(data(k)),'single');
divisor = typecast(uint32(data(k+1)),'single');
% Get upper 16 bits of divided and divisor
a_inword1 = (bitshift(data(k),-16));
b_inword1 = (bitshift(data(k+1),-16));
% Get exponents
a_exponent = bitshift(a_inword1,-7); % Shift exponent right
a_exponent = bitand(a_exponent,hex2dec('00ff')); % Remove sign bit from exponent
b_exponent = bitshift(b_inword1,-7); % Shift exponent right
b_exponent = bitand(b_exponent,hex2dec('00ff')); % Remove sign bit from exponent
% Create Scaled Divisor and Dividend
divisor_scaled = abs(divisor) * 2^((-1) - (b_exponent-127));
dividend_scaled = abs(dividend) * 2^((a_exponent-b_exponent -1) - (a_exponent-127));
% Solve for X0 = 48/17-32/17*D'
% Perform multiplication 32/17*D'
mult1 = single(single(32/17) * single(divisor_scaled));
% Perform subtraction 48/17 - (32/17*D')
x0 = single(single(48/17) - single(mult1));
xi = x0;
% Solve for Xi + Xi(1-D'Xi)
for i=1:3
% Multiply D'Xi
mult2 = single(single(divisor_scaled)*single(xi));
% Subtract 1 - (D'Xi)
sub2 = single(single(1) - single(mult2));
% Multiply Xi * (1-D'Xi)
mult3 = single(single(xi) * single(sub2));
% Add Xi + (Xi(1-D'Xi))
xi = single(single(xi) + single(mult3));
end
% Perform final multiplication N'X3
myresult = single(single(dividend_scaled) * single(xi));
% Invert sign if result is supposed to be negative
if (divisor * dividend < 0)
myresult = -myresult;
end
fprintf('===== DIVISION RESULT =====\n');
fprintf('Dividend: %08X\n', data(k));
fprintf('Divisor: %08X\n', data(k+1));
fprintf('My answer: %08X\n',hex2dec(num2hex(single(myresult))));
fprintf('My answer: %+1.5f\n',single(myresult));
fprintf('Correct answer: %08X\n',hex2dec(num2hex(single(single(dividend)/single(divisor)))))
fprintf('Correct answer: %+1.5f\n\n',single(single(dividend)/single(divisor)));
% Calculate Error
if(dividend == 0)
error = abs(single(0-myresult));
else
error = abs(single(single(single(dividend)/single(divisor))/single(myresult) - 1));
end
% Pause if there is an error greater than 1 ulp
if (error > 2^-23)
error
fprintf('Error: %08X\n', hex2dec(num2hex(single(error))));
k
pause
end
% Close Input File
fclose(fid);
Here is the file containing the testing inputs (this should be stored in a file called generatedfloats_for_fpdiv.txt):
7A0C006C
FAA1654F
FF7FFFFF
FF7FFFFF
00000000
FF7FFFFF
7F7FFFFF
FF7FFFFF
00000000
D9E6C924
80800000
80800000
BF800000
80800000
00000000
80800000
3F800000
80800000
00800000
80800000
FF7FFFFF
BF800000
80F329D0
BF800000
80800000
BF800000
BF800000
BF800000
00000000
BF800000
3F800000
BF800000
00800000
BF800000
4D2798AD
BF800000
7F7FFFFF
BF800000
FF7FFFFF
3F800000
C5F4B01A
3F800000
80800000
3F800000
BF800000
3F800000
00000000
3F800000
3F800000
3F800000
00800000
3F800000
02A8DEF8
3F800000
7F7FFFFF
3F800000
A2C4DB20
00800000
80800000
00800000
BF800000
00800000
00000000
00800000
3F800000
00800000
00800000
00800000
00000000
6F301E6E
FF7FFFFF
7F7FFFFF
00000000
7F7FFFFF
7F7FFFFF
7F7FFFFF
CFFFFFFF
10000000
E7DCD51B
3F800000
3DD8AD6C
FCD8AD6C
0CB82AD0
8CB82AD0
8A856DBC
8A856DBC
ADE5A620
ECE5A620
4F7FFFFF
0F800000
C1023D04
C1023CF4
8F6A27BC
8F6A2782
76E0BCED
76D9ED81
B59F5AE2
F49F5A43
12AC77FD
51AC6271
3520744A
740EA042
5D594D83
1D5B038B
AC104C35
2C104C11
9CB2D620
5BB2D61D
AB1AC0B0
6A1AC0AB
1F7C4A96
DE7C4A77
95AD8847
54AD5CF0
1C22AD4A
DB225C1C
7BED0479
BBED047B
C094DEEC
0094DEF6
4DC9E15F
8DC9E1C5
FBC3003F
3BC30103
FEC93614
3EC9686F
436A5FC8
836B4B14
D52DD147
153967C4
A6666AC4
29800000
F98E9ABA
FB000000
36FBF9DF
EF000000
402409D5
7A800000
C81946D2
7C915A00
EC1D18DD
B2E597A5
709EA505
321F01DC
FE51BE2A
4DB9A1BC
6F69A411
60F5814E
08CCC56E
B0A84889
643A641A
6EAE2570
AC0C952C
57000000
D534A524
FE800000
B05CD51B
58000000
8F4D8AD7
95800000
FD2ECBC8
51B61A06
FCCB65E4
CFA4FECB
6EAE153B
BFE119AC
C1A97E44
913647B0
1F730C8E
8EAA321D
C3BC9DEC
A51D0BC8
30A4350D
9058EA7E
9CE2C0D1
9AF8176B
CA3FFFFD
8A400000
F008F561
B008F563
F0754048
B075404B
7D7FFFFD
3D800000
52B209A5
12B209A7
7EFE1868
3EFE186A
C5FFFFFE
86000000
E39A4BC8
A39A4BC9
455EDB0A
055EDB0B
677FFFFF
27800000
C44D976B
044D976C
ED126E29
2D126E2A
DFFFFFFE
20000000
4ED6F4EB
8ED6F4ED
58397C68
98397C6A
C2FFFFFD
03000000
EDED365A
2DED365D
6D3FFFFD
AD400000
E460628E
26000000
E821B21E
29978BE6
5749CDF0
97D2299C
55BF2E4A
95E1CA69
3F50B86B
00D65D71
61DC531D
23000000
FB52DDC0
BC4BE0E8
F6744295
B6B52B80
BDAD17D0
FCAD17D0
03C572DD
42C572DC
90800001
CF800000
8AE965BE
C9E965BC
846E5BBB
C36E5BB8
30800002
6F800000
BC512F61
FB512F5D
BA15F623
F915F620
30800003
6F800000
91800003
50800000
27F87CF2
E6F87CED
1CDDA7B7
DBDDA7B3
88800002
47800000
1AA27658
D9A27656
A1C7E937
60C7E935
16800001
D5800000
35F35105
F4F35105
1F45A1EB
5DE7B505
0E20D62E
CC4E27A5
ADAA8A46
6B800000
B9D2BA3D
F7000000
B87ACA34
F5000000
087E87BE
C4C7E789
FF0C0958
5F1010C0
CF808800
AA2060A1
AAAAAAAA
BFDAFA00
F1199999
7DC06112
5DE66666
EB9295A3
56FFFDB9
AD400241
F0FEEFFF
421859B2
7D800289
F3A82C04
0A11D000
93041140
03800000
BF844014
88100400
AE100000
412AAAAA
7F000010
48199999
63820000
41CE400D
40002000
5AE13FFF
22040000
15000744
1B000002
2E7FFFFF
1FC00000
BA800000
8C080000
DDFD5AB9
AC200080
D8AAAAAA
5B820002
45555555
F9809000
ABE66666
00A00100
857FFF30
10040004
D2302FFF
B5800003
AC0003FF
4A004001
12000000
88000280
702AAAAA
3997DBFD
A9145FFF
1EF56DE9
038000BB
08EE3FFF
B0687000
A25FD21D
43C20004
E5FFFFDF
11810040
447FDFFF
4DFB7FFF
A1FFFFFB
C72AAAAA
8AFFBFFF
54555555
77FFFDFF
274F8658
48FFBFFF
B47FF8A6
6BFFEFFF
474C2000
117FFFFB
45800200
DCDF7FFF
5D808001
6CF5FFFF
40FD5AF7
A1FF3FFF
FC7FEFFF
C27DF7FF
BEAAAAAA
B7FF3FFF
EB555555
59FBEFFF
58199999
D7EFFFFB
31E66666
CFEFFFFE
BE9EAB13
017EFFEF
2B7FFD61
80BFFFFD
6B7FFFFF
5F7DFFFB
D7800000
F7FFBFBF
71832006
72AAAAAA
CA800001
A7AAAAAA
3F904000
AC2AAAAA
84D7DF6F
C1AAAAAA
827FFFFB
36AAAAAA
15AAAAAA
9A2AAAAA
2F199999
63AAAAAA
BBE66666
1E2AAAAA
15ED4FCE
8EAAAAAA
82FFFA1D
B02AAAAA
6F004FFF
4AAAAAAA
487DB000
CD2AAAAA
C2FFFFFF
1DAAAAAA
4E800000
D8AAAAAA
F1900000
B7D55555
C9080200
9CD55555
A86FF7CF
D1D55555
BBFFDFDF
8C555555
ACAAAAAA
3A555555
86199999
BCD55555
52E66666
77555555
19FFFED8
0FD55555
A6B99FFF
50D55555
5300038D
2BD55555
59FFFFFF
BDD55555
FAE0C1AD
E2199999
8F800400
87999999
AE024000
6C999999
A1DF7DFF
A1999999
CFDFFFFF
36999999
736EFFFF
6C199999
442AAAAA
99999999
D0D55555
DA999999
1D999999
1C199999
E9E66666
D6999999
641FF15D
27199999
F0FFFBAC
68999999
7D9DDFFF
69999999
29800061
44199999
571B3000
EC199999
BD800000
D7999999
D3080080
7CE66666
5F7FFEFF
C7666666
3ADFFF7F
3C666666
DB2AAAAA
39666666
68555555
7A666666
D4999999
9BE66666
41E66666
BCE66666
CE02CDE5
77666666
487FF867
C8666666
D4EC7FFF
C9666666
8100051C
A3E66666
BAFFFFFF
06666666
B3800024
0CBBF6B5
91E7F3FF
318D3102
416FFFFF
66B808BC
207EFF7F
DBE3E077
52AAAAAA
B89D573A
FF555555
D9DE7340
6C199999
3B1F0F46
D8E66666
1C60AB4C
655172A1
D6F63010
3F7FFD22
47A7E55D
6C3A9FFF
68E90163
388001D7
237E0628
A54DD000
44BFA22E
127FFFFF
A600BE34
AC000000
D6B2F380
441C889C
627FFA61
65800040
B77FFA1C
35840100
1CFFF82A
15F779FF
01FFFA77
6FFEFFFF
56FFFA32
0377FEFF
ABFFF9EC
03199999
13FFFAC9
50666666
9BFFFE10
FCA0175C
767FFAD5
9000067A
09FFF82D
3C9C8000
E47FFCF2
A97FFFFF
05FFF8F8
23800000
567FFE45
12464A93
32494FFF
71880000
47746FFF
2CFFBFEB
D1AB8FFF
7DFFFEFF
66D69FFF
08FFF6FF
BC023FFF
53AAAAAA
A83B5FFF
9A199999
B3824FFF
93845430
8F596FFF
27800336
A938FFFF
B3EB6000
63CE7FFF
00FFFFFF
A50F5FFF
8D000000
A650AFFF
303024C9
B18007DF
BB810000
57800507
EFEFF7FF
6C0004D8
EB2AAAAA
C7800641
B7D55555
C2000305
31999999
13000052
BE666666
54000458
B7FFFB39
30000522
11BC6FFF
A080026F
BE000009
82000675
4ACF2000
BC800339
97FFFFFF
04800681
24800000
05800287
DF872000
F267D000
C0300000
2C643000
FEEFFFF7
7C20C000
42AAAAAA
6749E000
CF555555
A88B4000
A9199999
9290A000
0EFFFFF5
8FA8D000
BB9F4FFF
F0EA7000
558004C5
219BB000
E21DC000
DC313000
4EFFFFFF
3D725000
FB800000
5EB3E000
EFD80480
C27FFFFF
82080008
3C7FFFFF
AD3FFDDB
51FFFFFF
497FFFFE
A6FFFFFF
A16DFFFF
DBFFFFFF
392AAAAA
A07FFFFF
AD666666
537FFFFF
39DAEA49
54FFFFFF
E67FFCB0
2F7FFFFF
12EDEFFF
507FFFFF
CD000180
A17FFFFF
79D75000
C27FFFFF
52800000
FE7FFFFF
EE800080
B7000000
90100200
CC800000
BE7DFBFD
61800000
DD7FDFFF
C7000000
90AAAAAA
3F800000
3DD55555
60800000
C9999999
1B800000
64666666
4C000000
1DFFF953
35000000
EA3C0FFF
EF800000
D1267000
61800000
CA000000
7D800000
FF0C0958
5F1010C0
CF808800
AA2060A1
AAAAAAAA
BFDAFA00
F1199999
7DC06112
5DE66666
EB9295A3
56FFFDB9
AD400241
F0FEEFFF
421859B2
7D800289
F3A82C04
0A11D000
93041140
03800000
BF844014
88100400
AE100000
412AAAAA
7F000010
48199999
63820000
41CE400D
40002000
5AE13FFF
22040000
15000744
1B000002
2E7FFFFF
1FC00000
BA800000
8C080000
DDFD5AB9
AC200080
D8AAAAAA
5B820002
45555555
F9809000
ABE66666
00A00100
857FFF30
10040004
D2302FFF
B5800003
AC0003FF
4A004001
12000000
88000280
702AAAAA
3997DBFD
A9145FFF
1EF56DE9
038000BB
08EE3FFF
B0687000
A25FD21D
43C20004
E5FFFFDF
11810040
447FDFFF
4DFB7FFF
A1FFFFFB
C72AAAAA
8AFFBFFF
54555555
77FFFDFF
274F8658
48FFBFFF
B47FF8A6
6BFFEFFF
474C2000
117FFFFB
45800200
DCDF7FFF
5D808001
6CF5FFFF
40FD5AF7
A1FF3FFF
FC7FEFFF
C27DF7FF
BEAAAAAA
B7FF3FFF
EB555555
59FBEFFF
58199999
D7EFFFFB
31E66666
CFEFFFFE
BE9EAB13
017EFFEF
2B7FFD61
80BFFFFD
6B7FFFFF
5F7DFFFB
D7800000
F7FFBFBF
71832006
72AAAAAA
CA800001
A7AAAAAA
3F904000
AC2AAAAA
84D7DF6F
C1AAAAAA
827FFFFB
36AAAAAA
15AAAAAA
9A2AAAAA
2F199999
63AAAAAA
BBE66666
1E2AAAAA
15ED4FCE
8EAAAAAA
82FFFA1D
B02AAAAA
6F004FFF
4AAAAAAA
487DB000
CD2AAAAA
C2FFFFFF
1DAAAAAA
4E800000
D8AAAAAA
F1900000
B7D55555
C9080200
9CD55555
A86FF7CF
D1D55555
BBFFDFDF
8C555555
ACAAAAAA
3A555555
86199999
BCD55555
52E66666
77555555
19FFFED8
0FD55555
A6B99FFF
50D55555
5300038D
2BD55555
59FFFFFF
BDD55555
FAE0C1AD
E2199999
8F800400
87999999
AE024000
6C999999
A1DF7DFF
A1999999
CFDFFFFF
36999999
736EFFFF
6C199999
442AAAAA
99999999
D0D55555
DA999999
1D999999
1C199999
E9E66666
D6999999
641FF15D
27199999
F0FFFBAC
68999999
7D9DDFFF
69999999
29800061
44199999
571B3000
EC199999
BD800000
D7999999
D3080080
7CE66666
5F7FFEFF
C7666666
3ADFFF7F
3C666666
DB2AAAAA
39666666
68555555
7A666666
D4999999
9BE66666
41E66666
BCE66666
CE02CDE5
77666666
487FF867
C8666666
D4EC7FFF
C9666666
8100051C
A3E66666
BAFFFFFF
06666666
B3800024
0CBBF6B5
91E7F3FF
318D3102
416FFFFF
66B808BC
207EFF7F
DBE3E077
52AAAAAA
B89D573A
FF555555
D9DE7340
6C199999
3B1F0F46
D8E66666
1C60AB4C
655172A1
D6F63010
3F7FFD22
47A7E55D
6C3A9FFF
68E90163
388001D7
237E0628
A54DD000
44BFA22E
127FFFFF
A600BE34
AC000000
D6B2F380
441C889C
627FFA61
65800040
B77FFA1C
35840100
1CFFF82A
15F779FF
01FFFA77
6FFEFFFF
56FFFA32
0377FEFF
ABFFF9EC
03199999
13FFFAC9
50666666
9BFFFE10
FCA0175C
767FFAD5
9000067A
09FFF82D
3C9C8000
E47FFCF2
A97FFFFF
05FFF8F8
23800000
567FFE45
12464A93
32494FFF
71880000
47746FFF
2CFFBFEB
D1AB8FFF
7DFFFEFF
66D69FFF
08FFF6FF
BC023FFF
53AAAAAA
A83B5FFF
9A199999
B3824FFF
93845430
8F596FFF
27800336
A938FFFF
B3EB6000
63CE7FFF
00FFFFFF
A50F5FFF
8D000000
A650AFFF
303024C9
B18007DF
BB810000
57800507
EFEFF7FF
6C0004D8
EB2AAAAA
C7800641
B7D55555
C2000305
31999999
13000052
BE666666
54000458
B7FFFB39
30000522
11BC6FFF
A080026F
BE000009
82000675
4ACF2000
BC800339
97FFFFFF
04800681
24800000
05800287
DF872000
F267D000
C0300000
2C643000
FEEFFFF7
7C20C000
42AAAAAA
6749E000
CF555555
A88B4000
A9199999
9290A000
0EFFFFF5
8FA8D000
BB9F4FFF
F0EA7000
558004C5
219BB000
E21DC000
DC313000
4EFFFFFF
3D725000
FB800000
5EB3E000
EFD80480
C27FFFFF
82080008
3C7FFFFF
AD3FFDDB
51FFFFFF
497FFFFE
A6FFFFFF
A16DFFFF
DBFFFFFF
392AAAAA
A07FFFFF
AD666666
537FFFFF
39DAEA49
54FFFFFF
E67FFCB0
2F7FFFFF
12EDEFFF
507FFFFF
CD000180
A17FFFFF
79D75000
C27FFFFF
52800000
FE7FFFFF
EE800080
B7000000
90100200
CC800000
BE7DFBFD
61800000
DD7FDFFF
C7000000
90AAAAAA
3F800000
3DD55555
60800000
C9999999
1B800000
64666666
4C000000
1DFFF953
35000000
EA3C0FFF
EF800000
D1267000
61800000
CA000000
7D800000
1A6AA41C
B42C29BF
6E11A944
D480452D
105640A2
87959BD1
7422DC0D
6D0E43AA
55ACBB94
1D06D383
C80AA52A
5CE3851F
099B7C06
0397C72F
89D5073F
A25265A2
4476B3FE
1704BD92
4581E9FD
3461D127
3B815934
1E1EA9BE
D74C0685
CCE34EAC
3E52F9BF
30420D4A
B9EA85EB
8C3D45DE
A162F6DA
9459E167
9C29EA20
C8F796B1
C5BD9F02
B7168D05
F736D700
3946C8CE
E20B8F25
BC967BD2
61BAC3AA
5205DABA
257338C7
AC30E694
13C5DD3F
2307A194
0B9317AE
AC534B2B
99271710
47103D91
E5F8F6BF
C7800000
F145F346
63400000
C91F2994
97C00000
C1B8E7D5
18D40000
F5844BAD
6AFA0000
ACF90D5F
5D794000
DE3632EC
49804000
DDBE13CC
6F5B4000
204C6402
4E962400
EC2773F0
34935000
75FBB2EC
DCAE2600
76F79C0E
BADC18F0
00000000
A01F7AF0
1FACEE1B
30D0EFBD
回答1:
The algorithm you are showing is trying to evaluate the reciprocal of the divisor D then multiply by dividend N, or in other words single(N * single(1/D))
.
This will invariably fail for a set of bad operands. For example, take N=3 and D=7 and look at these results:
0.14285714924335479736328125 = float(1/7) = closest 32bits float to (1/7)
0.42857144773006439208984375 = 3*float(1/7) = 3 times closest float to (1/7)
0.4285714626312255859375 = float(3*float(1/7) = closest float to above number
0.4285714328289031982421875 = float(3/7) = closest float to (3/7)
By virtue of IEEE 754 properties, float(float(3)/float(7)) is closest float to 3/7.
So if your inner loop perfectly finds the closest float to the inverse of 7, you can't simply reconstruct the closest float to 3/7 by multiplying it by 3 in single precision...
We have an overestimate here:
float(float(3)*float(1/7)) > float(3/7) > (3/7)
But if we take the next representable float before float(1/7), we have an underestimate:
float(3/7) > (3/7) > float(float(3)*predecessor(float(1/7)))
So, there is more: we just shown that there is not any float Q such that float(float(3)*Q) = float(float(3)/float(7))...
Even, if your inner reciprocal loop were imperfect, the outer single(N*reciprocal(D))
will still fail...
EDIT: How to proceed then?
Once you get a reasonnable approximate of the reciprocal, one possible way to correctly compute the division is to compute the reconstruction error and compensate it like demonstrated with this pseudo-code, but you'll need to emulate the fused-multiply-add:
z := float(1/y) ; // Estimate of the reciprocal - i.e. with your inner loop
q := float(x * z); // Estimate of the quotient
r := float(x - y*q);// Estimate of the error using a FMA
return float(q + r*z); // correct the error using another FMA
That's how I did it in my Smalltalk implementation of arbitrary precision floating point (https://code.google.com/p/arbitrary-precision-float/).
You'll find other algorithms on the Internet.
One reference I like about division is:
Accelerating Correctly Rounded Floating-Point Division when the Divisor Is Known in Advance - Nicolas Brisebarre, Jean-Michel Muller, Member, IEEE, and Saurabh Kumar Raina - http://perso.ens-lyon.fr/jean-michel.muller/DivIEEETC-aug04.pdf
回答2:
I think your answer is within computational accuracy of single precision floating point numbers. The maximum precision for floating point numbers is only 7 digits (for info check the wikipedia article or the IEEE-754 standard)
Single precision
http://en.wikipedia.org/wiki/Single_precision
You can check the accuracy by dividing the answers (not subtracting them), accuracy is not in terms of magnitude but in digits in exponential notation.
-340282346638528860000000000000000000000.00000 / -340282306073709650000000000000000000000.00000 = 1.000000119209311
An error of about ~1e-7 thus in the range of machine accuracy for single floating point numbers.
Or look at the number of matching decimal digits which is 7 which is the limit for single precision floating point:
3402823 46638528860000000000000000000000.00000
3402823 06073709650000000000000000000000.00000
Double precision
If you go to double precision (64-bit doubles) then the accuracy increases to approximately 16 significant digits.
http://en.wikipedia.org/wiki/Double-precision_floating-point_format
Accumulated machine accuracy (AMA)
Accumulated machine accuracy is something we used to describe the effect of round-off errors stacking. Assume that every single precision number has the exact answer plus some small error eps (epsilon, the machine accuracy for a single operation). Now sum A/B + C/D. First of all the numbers A, B, C, and D are often not represented exactly by floating point operations and can contain an error as large as eps. Secondly the operation / could introduce and error of eps. Now let the answer to A/B be E plus some again a small error eps, and F the answer to C/D plus some small error eps. This means that A/B + C/D = E+eps + F+eps = E+F+2eps.
Of course sometimes the 'in between' answer (such as E and F) are an overestimate and underestimate thus cancelling the errors a bit. I always use the rule of thumb that during iterative schemes (especially in FEM, FVM, or VOF) the maximum achievable accuracy (pretty certain, no guarantee), is the number elements in your sum times eps.
In other words: your code looks fine ;)
回答3:
Doen't Wikipedia suggest using a pair of fused-multiply-add?
You might try and emulate fused-multiply-add with double precision
function y=fmadd(a,x,b)
y = single(a*x+b)
And use it with:
divisor_scaled = single(divisor_scaled); % put this out of the loop
% ...snip...
xi = fmadd( fmadd(-divisor_scaled,xi,1) , xi , xi);
EDIT
As explained in my other answer, the Newton Raphson iteration is there to get a close approximation of the reciprocal 1/D
, but this is not sufficient for producing an exactly rounded division N/D
by simply multiplying this reciprocal approximation by N, further steps are required.
So the FMA I proposed certainly won't make the whole division work as it is programmed now.
But it would be worth to test that it indeed helps getting an exactly rounded approximation of the reciprocal, because this is required by some division algorithms like the one I proposed in my other answer.
If it does, then there are ways to emulate the FMA with only single precision + - and *
来源:https://stackoverflow.com/questions/24792966/error-using-newton-raphson-iteration-method-for-floating-point-division