double-precision

How to set double precision in C++ on MacOSX?

僤鯓⒐⒋嵵緔 submitted on 2019-12-10 19:45:25
Question: I'm trying to port _controlfp( _CW_DEFAULT, 0xffffffff ); from WIN32 to Mac OS X / Intel. I have absolutely no idea how to port this instruction. Do you? Thanks!
Answer 1: Try section 8.6 of Gough's Introduction to GCC, which demonstrates the x86 FLDCW instruction. But it helps if you tell us why you need it: if you want your doubles to be IEEE-754 64-bit doubles, the easiest way is to compile with -msse2 -mfpmath=sse (SSE2 is the first SSE level with double-precision arithmetic).
Answer 2: What precision elements are you controlling? According to Microsoft's

How to know if a double string is round-trip safe?

﹥>﹥吖頭↗ submitted on 2019-12-10 16:44:14
Question: I have a text representation of a double and want to know whether it is safe to round-trip it to double and back. How do I know this if I also want to accept any number style in the input? Or how do I know whether any precision is lost when a double string is parsed with Double.Parse? Or how do I ToString a double so that it matches the format of another double string? An answer to any of these questions would be a solution, I think.
Answer 1: Use the R format specifier to convert the double to a string :

Understanding DBL_MAX

谁说胖子不能爱 submitted on 2019-12-10 11:41:47
Question: I just read about the IEEE 754 standard in order to understand how single-precision and double-precision floating-point numbers are implemented. So I wrote this to check my understanding:
    #include <stdio.h>
    #include <float.h>
    int main() {
        double foo = 9007199254740992; // 2^53
        double bar = 9007199254740993; // 2^53 + 1
        printf("%zu\n\n", sizeof(double)); // Outputs 8. Good
        printf("%f\n\n", foo); // 9007199254740992.000000. Ok
        printf("%f\n", bar); // 9007199254740992.000000. Ok because Mantissa is 52

Java - maximum loss of precision in one double addition/subtraction

拟墨画扇 submitted on 2019-12-09 17:28:06
Question: Is it possible to establish, even roughly, what the maximum precision loss would be when adding or subtracting two double values in Java? Probably the worst-case scenario is when the two numbers cannot be represented exactly and the operation on them produces a value that also cannot be represented exactly.
Answer 1: Have a look at Math.ulp(double). The ulp of a double is the delta to the next highest value. For instance, if you add two numbers and one is smaller

Understanding double precision operations in C

*爱你&永不变心* submitted on 2019-12-08 12:23:16
Question: I would like to understand why this code:
    double r,d,rc;
    scanf("%lf %lf", &r, &d);
    rc = (r * r) - (d/2) * (d/2);
    printf("%.2f\n", M_PI * rc);
returns a more precise result than this one (without the rc variable assignment):
    double r,d,rc;
    scanf("%lf %lf", &r, &d);
    printf("%.2f\n", M_PI * (r * r) - (d/2) * (d/2));
Another, related question: why is n * n better than pow(n,2)?
Answer 1: The first code sample computes:
    M_PI * ((r * r) - (d/2) * (d/2));
The second computes:
    (M_PI * (r * r)) - (d/2) * (d/2)

How are double-precision floating-point numbers converted to single-precision floating-point format?

放肆的年华 submitted on 2019-12-08 09:06:55
Question: Converting numbers from double-precision floating-point format to single-precision floating-point format results in a loss of precision. What algorithm is used for this conversion? Are numbers greater than 3.4028234e+38 or less than -3.4028234e+38 simply reduced to the respective limits? I feel that the conversion process is a bit more involved than this, but I couldn't find documentation for it.
Answer 1: The most common floating-point formats are the binary floating-point formats

SQL Error - missing keyword

一笑奈何 submitted on 2019-12-07 22:09:08
Question: What's wrong with this query? I am getting the following error:
    SQL Error: ORA-00905: missing keyword
    00905. 00000 - "missing keyword"
It says the error is at the 4th row. Please advise.
    CREATE TABLE ORDERS (
        ID INT NOT NULL,
        ord_date DATE,
        AMOUNT double,
        CUSTOMER_ID INT references CUSTOMERS(ID),
        PRIMARY KEY (ID)
    );
Answer 1: You forgot to add PRECISION to the double data type:
    CREATE TABLE ORDERS (
        ID INT NOT NULL,
        ord_date DATE,
        AMOUNT double precision,
        CUSTOMER_ID INT references CUSTOMERS(ID),
        PRIMARY KEY (ID)
    );

C++: difference between 0. and 0.0?

╄→尐↘猪︶ㄣ submitted on 2019-12-07 12:53:13
Question: I am well aware of the difference between 0 and 0.0 (int and double). But is there any difference between 0. and 0.0 (please note the .)? Thanks a lot in advance, Axel
Answer 1: There is no difference. Both literals are double. From the C++ grammar:
    fractional-constant:
        digit-sequence(opt) . digit-sequence
        digit-sequence .
See: Hyperlinked C++ BNF Grammar
Answer 2: No, there is not.
Answer 3: No. You can also write .0 as far as I know.
Answer 4: Just having the . as part of the number identifies it as a floating

Understanding DBL_MAX

半世苍凉 submitted on 2019-12-06 15:30:33
I just read about the IEEE 754 standard in order to understand how single-precision and double-precision floating-point numbers are implemented. So I wrote this to check my understanding:
    #include <stdio.h>
    #include <float.h>
    int main() {
        double foo = 9007199254740992; // 2^53
        double bar = 9007199254740993; // 2^53 + 1
        printf("%zu\n\n", sizeof(double)); // Outputs 8. Good
        printf("%f\n\n", foo); // 9007199254740992.000000. Ok
        printf("%f\n", bar); // 9007199254740992.000000. Ok because Mantissa is 52 bits
        printf("%f\n\n", DBL_MAX); // ??
        return 0;
    }
Output: 8 9007199254740992.000000 9007199254740992

Double precision integer subtraction with 32-bit registers (MIPS)

女生的网名这么多〃 submitted on 2019-12-06 11:23:05
Question: I am learning computer arithmetic. The book I use (Patterson and Hennessy) lists the question below. Write MIPS code to conduct double precision integer subtraction for 64-bit data. Assume the first operand to be in registers $t4(hi) and $t5(lo), and the second in $t6(hi) and $t7(lo). My solution is:
    sub $t3, $t5, $t7    # Subtract lo parts of operands. t3 = t5 - t7
    sltu $t2, $t5, $t7   # If the lo part of the 1st operand is less than the 2nd,
                         # it means a borrow must be made from the hi part