long-double | 易学教程

Sum of two “np.longdouble”s yields big numerical error

Question: Good morning, I'm reading two numbers from a FITS file (representing the integer and floating point parts of a single number), converting them to long doubles (128-bit on my machine), and then summing them. The result is not as precise as I would expect from 128-bit floats. Here is the code:

    a_int = np.longdouble(read_header_key(fits_file, 'I'))
    print "I %.25f" % a_int, type(a_int)
    a_float = np.longdouble(read_header_key(fits_file, 'F'))
    print "F %.25f" % a_float, a_float.dtype
    a = a

long double math library implementations?

Question: What are the available portable implementations of the C99 long double math library functions (expl, cosl, logl, etc.), if any? I've looked in the fdlibm (Sun-based), NetBSD (UCB-based), etc. sources and not seen them. Answer 1: You should be able to see it in the Sun-based libraries (used in pretty much all the open C libraries I am aware of, including glibc and the FreeBSD one). I generally prefer BSD code for math code (more readable IMO). See here for the 80-bit (Intel) long double format. For a

Why would you use float over double, or double over long double?

Question: I'm still a beginner at programming and I always have more questions than our book or internet searches can answer (unless I missed something). So I apologize in advance if this was answered but I couldn't find it. I understand that float has a smaller range than double making it less precise, and from what I understand, long double is even more precise(?). So my question is why would you want to use a variable that is less precise in the first place? Does it have something to do with
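One concrete trade-off behind this question is storage size against precision. As a rough illustration (a sketch using Python's standard `struct` module, not taken from the original thread; the helper name `roundtrip` is hypothetical), the value 0.1 can be packed into a 4-byte IEEE 754 single and an 8-byte double and then read back:

```python
import struct


def roundtrip(fmt, x):
    """Pack x into the given struct format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


x = 0.1
as_float = roundtrip("<f", x)    # stored in 4 bytes (single precision)
as_double = roundtrip("<d", x)   # stored in 8 bytes (double precision)

print(struct.calcsize("<f"), "bytes:", repr(as_float))
print(struct.calcsize("<d"), "bytes:", repr(as_double))
print("error introduced by float storage:", abs(as_float - x))
```

The single-precision copy takes half the memory but no longer compares equal to the original Python float, while the double-precision copy does. Whether that saving matters depends on how many values are stored and how much precision the computation needs.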

How to print binary representation of a long double as in computer memory?

Question: I have to print the binary representation of a long double number for some reasons. I want to see the exact format as it remains in the computer memory. I went through the following questions where taking a union was the solution. For float, the alternate datatype was unsigned int as both are 32-bit. For double, it was unsigned long int as both are 64-bit. But in the case of long double, it is 96-bit/128-bit (depending on platform), which has no similar equivalent memory consumer. So, what would be
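A general way to inspect the stored form of an object of any size is to read it one byte at a time (in C, through an `unsigned char` pointer), which avoids needing an integer type of the same width. The sketch below shows the same idea from Python, assuming the standard `ctypes` module and its `c_longdouble` type; the helper name `long_double_bits` is hypothetical, and this is not the answer from the original thread.

```python
import ctypes
import sys


def long_double_bits(value: float) -> str:
    """Return the stored bytes of a C long double as groups of 8 bits,
    most significant byte first."""
    raw = bytes(ctypes.c_longdouble(value))  # copy of the object's memory
    if sys.byteorder == "little":
        raw = raw[::-1]                      # reorder so the top byte comes first
    return " ".join(f"{byte:08b}" for byte in raw)


# The size (and therefore the number of groups) is platform dependent.
print(ctypes.sizeof(ctypes.c_longdouble), "bytes")
print(long_double_bits(1.0))
```

The number of bytes printed depends on the platform and compiler. Where the format is the 80-bit x87 one stored in 12 or 16 bytes, the extra bytes are padding and carry no part of the value.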

print long double on windows

Question: I use Microsoft Visual Studio 2010 and Intel C++ Compiler XE 14.0. The project configuration is ‘x64’. I need to work with the long double data type. I added the compile option /Qlong_double and wrote a test example:

    long double ld = 2;
    long double res = std::sqrt(ld);
    printf("long double size: %i, value: %Lf", sizeof(res), res);
    printf("double value of res: %f", (double)res);

Output:

    long double size: 16, value: 0,000000 double value of res: 1,414214

I found that the problem is that standard C

Performance implications of long double. Why does C choose 64-bits instead of the hardware's 80-bit for its default?

Question: Specifically, I am talking about the x87 PC architecture and the C compiler. I am writing my own interpreter and the reasoning behind the double datatype confuses me, especially where efficiency is concerned. Could someone explain WHY C has decided on a 64-bit double and not the hardware-native 80-bit double? And why has the hardware settled on an 80-bit double, since that is not aligned? What are the performance implications of each? I would like to use an 80-bit double for my default numeric

How do I force usage of long doubles with Cython?

Question: I apologize in advance for my poor knowledge of C: I use Python to code and have written a few modules with Cython using the standard C functions to effect a great increase in speed. However, I need a range higher than 1e308 (yes, you read it right), which is what I currently get by using the type double complex and the functions cexp and cabs. I tried to use the functions cexpl and cabsl, and declared my variables to be of type long double complex, but I still encounter overflows after 1e308. This probably means that my compiler converts long doubles to doubles, is that right? But

x86-64 long double precision

Question: What is the actual precision of long double on Intel 64-bit platforms? Is it 80 bits padded to 128, or actual 128-bit? If the former, besides going gmp, is there another option to achieve true 128-bit precision? Answer 1: x86-64 precision is the same as regular x86. Extended double is 80 bits, using the x87 ISA, with 6 padding bytes added. There is no 128-bit FP hardware. A software implementation of quad or extended quad precision might benefit from the x86-64 64x64 => 128 integer multiply instruction, though. I would recommend using MPFR. It is a more sophisticated multiple-precision floating point library

Why would you use float over double, or double over long double?

I'm still a beginner at programming and I always have more questions than our book or internet searches can answer (unless I missed something). So I apologize in advance if this was answered but I couldn't find it. I understand that float has a smaller range than double making it less precise, and from what I understand, long double is even more precise(?). So my question is why would you want to use a variable that is less precise in the first place? Does it have something to do with different platforms, different OS versions, different compilers? Or are there specific moments in programming

Substitutions for Eigen::MatrixXd typedefs

Question: What is the simplest way to replace all Eigen::MatrixXds and Eigen::VectorXds with vectors and matrices that have long double elements? Every basic floating point variable in my code is of type long double. Also, every time I use a matrix or vector, I use the following typedefs: typedef Eigen::VectorXd Vec; typedef Eigen::MatrixXd Mat; What's the best thing to switch these typedefs to? What happens if I leave them as they are? Answer 1: Simply define your own typedefs based on Eigen's own global matrix typedefs. If you use Eigen::MatrixXd and fill it with elements of type long double, those values