long-double

Sum of two “np.longdouble”s yields big numerical error

北城余情 submitted on 2019-12-10 21:35:32
Question: Good morning, I'm reading two numbers from a FITS file (representing the integer and floating point parts of a single number), converting them to long doubles (128 bits on my machine), and then summing them up. The result is not as precise as I would expect from using 128-bit floats. Here is the code: a_int = np.longdouble(read_header_key(fits_file, 'I')) print "I %.25f" % a_int, type(a_int) a_float = np.longdouble(read_header_key(fits_file, 'F')) print "F %.25f" % a_float, a_float.dtype a = a

long double math library implementations?

谁说胖子不能爱 submitted on 2019-12-10 04:20:40
Question: What are the available portable implementations of the C99 long double math library functions ( expl , cosl , logl , etc.), if any? I've looked in the fdlibm (Sun-based), NetBSD (UCB-based), etc. sources and not seen them. Answer 1: You should be able to see it in the Sun-based libraries (used in pretty much all the open C libraries I am aware of, including glibc and the FreeBSD one). I generally prefer BSD code for math code (more readable IMO). See here for the 80-bit (Intel) long double format. For a

Why would you use float over double, or double over long double?

时光毁灭记忆、已成空白 submitted on 2019-12-09 08:21:47
Question: I'm still a beginner at programming and I always have more questions than our book or internet searches can answer (unless I missed something). So I apologize in advance if this was answered but I couldn't find it. I understand that float has a smaller range than double making it less precise, and from what I understand, long double is even more precise(?). So my question is why would you want to use a variable that is less precise in the first place? Does it have something to do with

How to print binary representation of a long double as in computer memory?

陌路散爱 submitted on 2019-12-08 13:59:53
Question: I have to print the binary representation of a long double number for some reasons. I want to see the exact format as it is stored in computer memory. I went through the following questions, where using a union was the solution. For float , the alternate datatype was unsigned int as both are 32-bit. For double , it was unsigned long int as both are 64-bit. But in the case of long double , it is 96-bit/128-bit (depending on platform), which has no equivalent type of the same size. So, what would be
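A common way to inspect a type that has no same-sized integer counterpart is to copy its bytes into an unsigned char buffer rather than use a union. The following is a minimal sketch of that idea, not the answer from the original thread; the function names format_bits and print_long_double_bits are hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Write the bytes of an object as '0'/'1' characters into out.
   Bytes appear in memory order (lowest address first), each byte with its
   most significant bit first, separated by single spaces.
   Returns the number of characters written (excluding the terminator),
   or 0 if size is 0 or the buffer is too small. */
size_t format_bits(const void *obj, size_t size, char *out, size_t out_size)
{
    const unsigned char *bytes = obj;
    size_t needed = size * (CHAR_BIT + 1); /* bits plus a separator or '\0' per byte */
    size_t pos = 0;

    if (size == 0 || out_size < needed) {
        return 0;
    }
    for (size_t i = 0; i < size; i++) {
        for (int bit = CHAR_BIT - 1; bit >= 0; bit--) {
            out[pos++] = ((bytes[i] >> bit) & 1u) ? '1' : '0';
        }
        out[pos++] = (i + 1 < size) ? ' ' : '\0';
    }
    return pos - 1;
}

/* Print the in-memory representation of a long double.
   Assumption: any padding bytes in the type are printed too, and their
   contents are not meaningful. */
void print_long_double_bits(long double value)
{
    char text[sizeof(long double) * (CHAR_BIT + 1)];

    if (format_bits(&value, sizeof value, text, sizeof text) > 0) {
        printf("%s\n", text);
    }
}
```

Calling print_long_double_bits(1.0L) from main prints sizeof(long double) groups of bits. On a little-endian machine the least significant byte comes first, so the groups read in reverse of the usual sign/exponent/significand diagrams.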

print long double on windows

↘锁芯ラ submitted on 2019-12-08 12:17:45
Question: I use Microsoft Visual Studio 2010 and Intel C++ Compiler XE 14.0. Project configuration is ‘x64’. I need to work with the long double data type. I added a compile option /Qlong_double and wrote a test example: long double ld = 2; long double res = std::sqrt(ld); printf("long double size: %i, value: %Lf", sizeof(res), res); printf("double value of res: %f", (double)res); Output: long double size: 16, value: 0,000000 double value of res: 1,414214 I found that the problem is that standard C

Performance implications of long double. Why does C choose 64-bits instead of the hardware's 80-bit for its default?

删除回忆录丶 submitted on 2019-12-07 06:27:10
Question: For specifics I am talking about x87 PC architecture and the C compiler. I am writing my own interpreter and the reasoning behind the double datatype confuses me. Especially where efficiency is concerned. Could someone explain WHY C has decided on a 64-bit double and not the hardware native 80-bit double ? And why has the hardware settled on an 80-bit double , since that is not aligned? What are the performance implications of each? I would like to use an 80-bit double for my default numeric

How do I force usage of long doubles with Cython?

北城余情 submitted on 2019-12-05 18:42:18
I apologize in advance for my poor knowledge of C: I use Python to code and have written a few modules with Cython using the standard C functions to effect a great increase in speed. However, I need a range higher than 1e308 (yes, you read it right), which is what I currently get by using the type double complex and the functions cexp and cabs . I tried to use the functions cexpl and cabsl , and declared my variables to be of type long double complex , but I still encounter overflows after 1e308 . This probably means that my compiler converts long doubles to doubles, is that right? But
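Whether long double actually extends past 1e308 depends on the compiler and platform, and the limits in float.h show this directly. Below is a small sketch in plain C (not Cython) for checking the suspicion raised above; the function names are hypothetical and it is only an illustration of the check, not the thread's answer.

```c
#include <float.h>
#include <stdio.h>

/* Returns 1 if long double can represent larger magnitudes than double,
   judged by the largest representable power of ten. */
int long_double_has_wider_range(void)
{
    return LDBL_MAX_10_EXP > DBL_MAX_10_EXP;
}

/* Multiplies DBL_MAX by 10 in long double arithmetic.
   Returns 1 if the result is still finite, 0 if it overflowed to infinity.
   volatile keeps the compiler from skipping the runtime multiplication. */
int survives_past_double_max(void)
{
    volatile long double x = DBL_MAX;

    x = x * 10.0L;
    return x <= LDBL_MAX; /* infinity compares greater than LDBL_MAX */
}

/* Print the limits of both types side by side. */
void report_long_double_range(void)
{
    printf("DBL_MAX  = %e (max base-10 exponent %d)\n", DBL_MAX, DBL_MAX_10_EXP);
    printf("LDBL_MAX = %Le (max base-10 exponent %d)\n", LDBL_MAX, LDBL_MAX_10_EXP);
    printf("wider range than double: %s\n",
           long_double_has_wider_range() ? "yes" : "no");
}
```

With the x87 80-bit extended format, LDBL_MAX_10_EXP is 4932. With compilers that map long double onto the same format as double (MSVC does this), both exponents are 308, and overflow just past 1e308 is expected no matter which l-suffixed functions are called.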

x86-64 long double precision

亡梦爱人 submitted on 2019-12-04 05:36:53
Question: What is the actual precision of long double on Intel 64-bit platforms? Is it 80 bits padded to 128, or actual 128-bit? If the former, besides using GMP, is there another option to achieve true 128-bit precision? Answer 1: x86-64 precision is the same as regular x86. Extended double is 80 bits, using the x87 ISA, with 6 padding bytes added. There is no 128-bit FP hardware. A software implementation of quad or extended quad precision might benefit from the x86-64 64x64 => 128 integer multiply instruction, though. I would recommend using MPFR . It is a more sophisticated multiple-precision floating point library
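The difference between storage size and real precision can be read from float.h on any given compiler. A minimal sketch, with hypothetical function names:

```c
#include <float.h>
#include <stdio.h>

/* Number of significand bits long double really carries. */
int long_double_significand_bits(void)
{
    return LDBL_MANT_DIG;
}

/* Bytes of storage long double occupies, padding included. */
unsigned long long_double_storage_bytes(void)
{
    return (unsigned long)sizeof(long double);
}

/* Print storage size next to the precision actually available. */
void describe_long_double(void)
{
    printf("storage:          %lu bytes\n", long_double_storage_bytes());
    printf("significand bits: %d\n", long_double_significand_bits());
    printf("decimal digits:   %d\n", LDBL_DIG);
}
```

A result of 64 significand bits indicates the x87 80-bit extended format, however many bytes sizeof reports. 113 would indicate IEEE binary128, and 53 means long double has the same precision as double.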

Why would you use float over double, or double over long double?

故事扮演 submitted on 2019-12-03 09:49:11
I'm still a beginner at programming and I always have more questions than our book or internet searches can answer (unless I missed something). So I apologize in advance if this was answered but I couldn't find it. I understand that float has a smaller range than double making it less precise, and from what I understand, long double is even more precise(?). So my question is why would you want to use a variable that is less precise in the first place? Does it have something to do with different platforms, different OS versions, different compilers? Or are there specific moments in programming

Substitutions for Eigen::MatrixXd typedefs

孤者浪人 submitted on 2019-12-02 03:20:12
Question: What is the simplest way to replace all Eigen::MatrixXd s and Eigen::VectorXd s with Vectors and Matrices that have long double elements? Every basic floating point variable in my code is of type long double . Also, every time I use a matrix or vector, I use the following typedefs. typedef Eigen::VectorXd Vec; typedef Eigen::MatrixXd Mat; What's the best thing to switch these typedefs to? What happens if I leave them as they are? Answer 1: Simply define your own typedefs based on Eigen's own global matrix typedefs . If you use Eigen::MatrixXd and fill it with elements of type long double , those values