floating-point-precision

Ruby - Multiplication issue

人盡茶涼 submitted on 2019-12-04 09:55:30
My output looks like this: ruby-1.9.2-p290 :011 > 2.32 * 3 => 6.959999999999999 I remember that some time ago, on another machine, I got 2.32 * 3 = 6. What is my mistake? Thanks a ton for reading this. :) If you really want to round down to an integer, then just use (3 * 2.32).to_i, but I think that's unlikely. Usually you just want to format the slightly imprecise floating-point number, like this: "%0.2f" % (3 * 2.32) => "6.96" If you really want to work with the exact representation, you can use BigDecimal: require 'bigdecimal' (3 * BigDecimal.new("2.32")).to_s("F") => "6
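The three options in the answer (truncate, format, or use exact decimal arithmetic) can be sketched as follows. This is an illustration in Python rather than the original Ruby; Python's `decimal.Decimal` stands in for Ruby's BigDecimal, and the variable names are made up for the example.

```python
from decimal import Decimal

product = 3 * 2.32                 # binary floating point: slightly imprecise

truncated = int(product)           # option 1: drop the fractional part
formatted = "%0.2f" % product      # option 2: round only when displaying
exact = 3 * Decimal("2.32")        # option 3: exact decimal arithmetic

print(truncated)   # 6
print(formatted)   # 6.96
print(exact)       # 6.96
```

Note that the decimal value is built from the string "2.32", not from the float 2.32; constructing it from the float would carry the binary rounding error along.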

Java float is more precise than double?

强颜欢笑 submitted on 2019-12-04 08:41:59
Code: class Main { public static void main (String[] args) { System.out.print("float: "); System.out.println(1.35f-0.00026f); System.out.print("double: "); System.out.println(1.35-0.00026); } } Output: float: 1.34974 double: 1.3497400000000002 The float gives the right answer, but the double adds extra digits out of nowhere. Why? Isn't double supposed to be more precise than float? A float is 4 bytes wide, whereas a double is 8 bytes wide. See "What Every Computer Scientist Should Know About Floating-Point Arithmetic". Surely the double has more precision, so it has slightly less rounding error.
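One way to check the claim that the double really carries less rounding error is to compare both results against the exact decimal answer. The sketch below does this in Python rather than Java; the helper `to_f32` is a hypothetical name that emulates a Java `float` by round-tripping through a 4-byte IEEE 754 encoding.

```python
import struct
from decimal import Decimal

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

exact = Decimal("1.35") - Decimal("0.00026")   # exactly 1.34974

double_result = 1.35 - 0.00026
float_result = to_f32(to_f32(1.35) - to_f32(0.00026))

# Decimal(float) converts the stored binary value without further rounding.
double_err = abs(Decimal(double_result) - exact)
float_err = abs(Decimal(float_result) - exact)

print("double error:", double_err)
print("float  error:", float_err)
```

The float prints as the shorter "1.34974" only because fewer digits are needed to identify a 32-bit value uniquely, not because the stored value is closer to the true result.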

simple floating-point numbers lose precision

雨燕双飞 submitted on 2019-12-04 07:32:42
I'm using Delphi XE2 Update 3. There are precision issues with even the simplest floating-point numbers (like 3.7). Given this code (a 32-bit console app): program Project1; {$APPTYPE CONSOLE} {$R *.res} uses System.SysUtils; var s: Single; d: Double; x: Extended; begin Write('Size of Single ----- '); Writeln(SizeOf(Single)); Write('Size of Double ----- '); Writeln(SizeOf(Double)); Write('Size of Extended --- '); Writeln(SizeOf(Extended)); Writeln; s := 3.7; d := 3.7; x := 3.7; Write('"s" is '); Writeln(s); Write('"d" is '); Writeln(d); Write('"x" is '); Writeln(x); Writeln; Writeln('Single
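The underlying reason a value like 3.7 looks imprecise is that it has no finite binary expansion, so every binary floating-point type stores a nearby value instead. A minimal sketch in Python (not Delphi) that prints what is actually stored; `to_f32` is a hypothetical helper emulating a 4-byte Single, and the 10-byte Extended type is not covered because Python has no equivalent.

```python
import struct
from decimal import Decimal
from fractions import Fraction

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

as_double = 3.7           # corresponds to Delphi's Double
as_single = to_f32(3.7)   # corresponds to Delphi's Single

# Decimal(float) shows the exact stored value, digit for digit.
print("double stores:", Decimal(as_double))
print("single stores:", Decimal(as_single))

# The stored value is a fraction whose denominator is a power of two.
print("as a fraction:", Fraction(as_double))
```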

How to make numbers not be shown in scientific form?

我是研究僧i submitted on 2019-12-04 06:41:15
Question: I want to write an array of floating-point numbers to a file: <?php $x=[0.000455,0.000123,0.00005690330203]; $fname='test.txt'; $str=''; foreach($x as $elem){ $str .= "$elem\n"; } file_put_contents($fname,$str); ?> but in test.txt I see 0.000455 0.000123 5.690330203E-5 I don't want the floating-point numbers to be shown in scientific/exponential form; I would like them to keep their original form. Besides, there are also large integers like 12430120340, so if I use special format for floating point
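The general idea, formatting each number in plain positional notation without fixing the number of decimals in advance, can be sketched like this. The example is in Python rather than PHP, and `plain` is a hypothetical helper name; it relies on Python's `decimal` module, whose "f" format expands an exponent without padding or cutting digits.

```python
from decimal import Decimal

def plain(x) -> str:
    """Format an int or float positionally, never in exponent form."""
    # repr() yields the shortest string that round-trips the value;
    # Decimal then rewrites any exponent as ordinary digits.
    return format(Decimal(repr(x)), "f")

values = [0.000455, 0.000123, 0.00005690330203, 12430120340]
lines = [plain(v) for v in values]
print("\n".join(lines))
```

Because the integer goes through the same function unchanged, no special case is needed for mixed arrays of small floats and large integers.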

Is this a valid float comparison that accounts for a set number of decimal places?

偶尔善良 submitted on 2019-12-04 02:44:20
I'm writing an extension method to compare two floats using a set number of decimal points (significant figures) to determine if they are equal instead of a tolerance or percentage difference. Looking through the other questions regarding float comparison I see complex implementations. Have I oversimplified or is this valid? /// <summary> /// Determines if the float value is equal to (==) the float parameter according to the defined precision. /// </summary> /// <param name="float1">The float1.</param> /// <param name="float2">The float2.</param> /// <param name="precision">The precision. The
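The asker's implementation is cut off above, so the following is only a sketch under an assumption: that "equal to N decimal places" is implemented by rounding both values and comparing. Both function names are hypothetical, and the example is in Python rather than C#. It shows one known weakness of the rounding approach: two values that are almost identical can fall on opposite sides of a rounding boundary.

```python
def equal_by_rounding(a: float, b: float, places: int) -> bool:
    """Assumed approach: round both values, then compare exactly."""
    return round(a, places) == round(b, places)

def equal_by_tolerance(a: float, b: float, places: int) -> bool:
    """Alternative: compare the difference against half a unit in the last place."""
    return abs(a - b) < 0.5 * 10 ** -places

a, b = 1.0049999, 1.0050001   # differ by about 2e-7

print(equal_by_rounding(a, b, 2))    # False: rounds to 1.0 and 1.01
print(equal_by_tolerance(a, b, 2))   # True
```

Whether this matters depends on the use case; a tolerance check avoids the boundary effect but is no longer transitive, which is a trade-off rather than a strict improvement.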

Strange output when using float instead of double

妖精的绣舞 submitted on 2019-12-03 20:47:42
I get strange output when I use float instead of double. #include <stdio.h> void main() { double p,p1,cost,cost1=30; for (p = 0.1; p < 10;p=p+0.1) { cost = 30-6*p+p*p; if (cost<cost1) { cost1=cost; p1=p; } else { break; } printf("%lf\t%lf\n",p,cost); } printf("%lf\t%lf\n",p1,cost1); } This gives the expected output at p = 3. But when I use float, the output is a little weird. #include <stdio.h> void main() { float p,p1,cost,cost1=40; for (p = 0.1; p < 10;p=p+0.1) { cost = 30-6*p+p*p; if (cost<cost1) { cost1=cost; p1=p; } else { break; } printf("%f\t%f\n",p,cost); } printf("%f\t%f\n",p1,cost1); } Why is the
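The question is cut off, so the exact symptom is unknown, but one likely ingredient is accumulated rounding in the loop counter: 0.1 is not exactly representable, and each `p=p+0.1` rounds again. Since the cost equals 21 + (p - 3)^2, it is very flat near p = 3, where small errors in p can change which iteration looks like the minimum. The sketch below emulates both loops in Python; `to_f32` is a hypothetical helper, and it assumes the C compiler evaluates `p + 0.1` in double and then narrows the result to float on assignment.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

p_double = 0.1
p_float = to_f32(0.1)

# 29 further additions of 0.1 nominally bring p to 3.0.
for _ in range(29):
    p_double = p_double + 0.1
    p_float = to_f32(p_float + 0.1)   # assumed: add in double, store as float

print("double p:", repr(p_double), "drift:", abs(p_double - 3.0))
print("float  p:", repr(p_float), "drift:", abs(p_float - 3.0))
```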

Is it possible to use extended precision (80-bit) floating point arithmetic in GHC/Haskell?

吃可爱长大的小学妹 submitted on 2019-12-03 10:26:56
Haskell's standard Double uses standard double-precision arithmetic: data Double Double-precision floating point numbers. It is desirable that this type be at least equal in range and precision to the IEEE double-precision type. Does GHC/Haskell also offer extended-precision (80-bit) floating-point numbers somewhere, perhaps through an external library? As chuff has pointed out, you might want to take a look at the numbers package on Hackage. You can install it with cabal install numbers. Here is an example: import Data.Number.CReal -- from numbers main :: IO () main = putStrLn

How to safely floor or ceil a CGFloat to int?

天涯浪子 submitted on 2019-12-03 06:40:27
Question: I often need to floor or ceil a CGFloat to an int, for calculation of an array index. The problem I constantly see with floorf(theCGFloat) or ceilf(theCGFloat) is that floating-point inaccuracies can cause trouble. What if my CGFloat is 2.0f but internally it is represented as 1.999999999999f or something like that? I call floorf and get 1.0f, which is a float again. And yet I must cast this beast to int, which may introduce another problem. Is there a best practice how to floor

How to use Gcc 4.6.0 libquadmath and __float128 on x86 and x86_64

☆樱花仙子☆ submitted on 2019-12-02 19:33:17
I have a medium-sized C99 program which uses the long double type (80-bit) for floating-point computation. I want to improve precision with the new GCC 4.6 extension __float128. As I understand it, this is software-emulated 128-bit precision math. How should I convert my program from the classic 80-bit long double to 128-bit quad floats with software emulation of full precision? What do I need to change? Compiler flags, sources? My program reads full-precision values with strtod, performs many different operations on them (such as + - * /, sin, cos, exp and others from <math.h>), and prints them with printf. PS:

How to safely floor or ceil a CGFloat to int?

我与影子孤独终老i submitted on 2019-12-02 19:13:45
I often need to floor or ceil a CGFloat to an int, for calculation of an array index. The problem I constantly see with floorf(theCGFloat) or ceilf(theCGFloat) is that floating-point inaccuracies can cause trouble. What if my CGFloat is 2.0f but internally it is represented as 1.999999999999f or something like that? I call floorf and get 1.0f, which is a float again. And yet I must cast this beast to int, which may introduce another problem. Is there a best practice how to floor or ceil a float to an int such that something like 2.0 would never accidentally get floored to 1 and
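One common way to guard against this is to nudge the value by a small tolerance before flooring or taking the ceiling. The sketch below is in Python rather than Objective-C; `safe_floor`, `safe_ceil`, and the `EPSILON` value are all hypothetical choices for illustration, and the right tolerance depends on how much error the preceding computation can accumulate. Note that a literal 2.0 is exactly representable in binary floating point, so the concern only arises for values that result from earlier arithmetic.

```python
import math

EPSILON = 1e-6   # assumed tolerance; tune it to the computation's error

def safe_floor(x: float) -> int:
    """Floor, but treat values just below an integer as that integer."""
    return math.floor(x + EPSILON)

def safe_ceil(x: float) -> int:
    """Ceil, but treat values just above an integer as that integer."""
    return math.ceil(x - EPSILON)

print(safe_floor(1.9999999))   # 2, where plain floor would give 1
print(safe_floor(2.5))         # 2
print(safe_ceil(2.0000001))    # 2, where plain ceil would give 3
print(safe_ceil(2.5))          # 3
```

The trade-off is that genuine values within the tolerance of an integer are also snapped to it, which is usually acceptable for array indices.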