IEEE-754

Portable serialisation of IEEE754 floating-point values

徘徊边缘 submitted on 2019-11-29 06:11:14
I've recently been working on a system that needs to store and load large quantities of data, including single-precision floating-point values. I decided to standardise on network byte order for integers, and also decided to store floating-point values in big-endian format, i.e.:

    |-- Byte 0 --| |-- Byte 1 --|   Byte 2     Byte 3
    #  #######     #  #######      ########   ########

    Sign: 1b    Exponent: 8b, MSB first    Mantissa: 23b, MSB first

Ideally, I want to provide functions like htonl() and ntohl(), since I have already been using these for swabbing integers, and I also want to implement this in a way that has as…
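The excerpt is cut off, but a minimal sketch of the htonf()/ntohf() pair the question is asking for might look like the following. It assumes float is a 32-bit IEEE 754 value (true on essentially all current platforms) and reuses the POSIX htonl()/ntohl(); the names htonf/ntohf are chosen by analogy with those functions, not a standard API:

    #include <arpa/inet.h>  // htonl/ntohl (POSIX)
    #include <cstdint>
    #include <cstring>

    // Hypothetical htonf/ntohf, by analogy with htonl/ntohl. The float's
    // bytes are reinterpreted as a 32-bit integer (no numeric conversion),
    // which is then byte-swapped to network order like any other uint32_t.
    std::uint32_t htonf(float f) {
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);
        return htonl(bits);
    }

    float ntohf(std::uint32_t net) {
        std::uint32_t bits = ntohl(net);
        float f;
        std::memcpy(&f, &bits, sizeof f);
        return f;
    }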

Reading 32-bit signed IEEE 754 floating-point numbers from a binary file with Python?

我的未来我决定 submitted on 2019-11-29 06:09:30
Question: I have a binary file which is simply a list of signed 32-bit IEEE 754 floating-point numbers. They are not separated by anything, and simply appear one after another until EOF. How would I read from this file and interpret them correctly as floating-point numbers? I tried using read(4), but it automatically converts them to a string with ascii encoding. I also tried using bytearray, but that only takes it in 1 byte at a time instead of 4 bytes at a time as I need.

Answer 1: struct.unpack('f', file.read(4))
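The one-liner above is Python's struct module doing the bit reinterpretation; for comparison, the same read-until-EOF loop sketched in C++ (assuming the file holds native-byte-order 32-bit IEEE 754 floats, and with data.bin as a stand-in filename):

    #include <cstdio>
    #include <fstream>
    #include <vector>

    int main() {
        // Read back-to-back 32-bit floats until EOF (native byte order).
        std::ifstream in("data.bin", std::ios::binary);
        std::vector<float> values;
        float f;
        while (in.read(reinterpret_cast<char*>(&f), sizeof f))
            values.push_back(f);
        for (float v : values) std::printf("%g\n", v);
    }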

What is long double on x86-64?

心不动则不痛 submitted on 2019-11-29 04:26:37
Someone told me that:

    Under x86-64, FP arithmetic is done with SSE, and therefore long double is 64 bits.

But the x86-64 ABI says:

    C type      | sizeof | alignment | AMD64 Architecture
    long double |   16   |    16     | 80-bit extended (IEEE-754)

See: amd64-abi.pdf. And gcc says sizeof(long double) is 16, and gives DBL_MAX = 1.79769e+308 and LDBL_MAX = 1.18973e+4932. So I'm confused: how is long double 64 bits? I thought it was an 80-bit representation.

    Under x86-64, FP arithmetic is done with SSE, and therefore long double is 64 bits.

That's what usually happens under x86-64 (where the presence of SSE…
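Both facts are easy to confirm from the compiler: the storage size is 16 bytes (padding included), while the precision is the 64-bit-mantissa x87 extended format. A sketch; the values in the comments are what gcc on x86-64 Linux reports, and other platforms will differ:

    #include <cfloat>
    #include <cstdio>

    int main() {
        // On x86-64 Linux with gcc: sizeof is 16 (storage with padding),
        // but LDBL_MANT_DIG is 64, i.e. the 80-bit x87 extended format.
        std::printf("sizeof(long double) = %zu\n", sizeof(long double));
        std::printf("LDBL_MANT_DIG = %d\n", LDBL_MANT_DIG);
        std::printf("LDBL_MAX = %Lg\n", LDBL_MAX);
    }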

How cross-platform is Google's Protocol Buffer's handling of floating-point types in practice?

会有一股神秘感。 submitted on 2019-11-29 02:31:46
Question: Google's Protocol Buffers allows you to store floats and doubles in messages. I looked through the implementation source code wondering how they managed to do this in a cross-platform manner, and what I stumbled upon was:

    inline uint32 WireFormatLite::EncodeFloat(float value) {
      union {float f; uint32 i;};
      f = value;
      return i;
    }

    inline float WireFormatLite::DecodeFloat(uint32 value) {
      union {float f; uint32 i;};
      i = value;
      return f;
    }

    inline uint64 WireFormatLite::EncodeDouble(double value) {…
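The union trick works because the wire value of a float is just its IEEE 754 bit pattern reinterpreted as an integer; note that this kind of type punning is a documented GCC extension but formally undefined in ISO C++. A standards-clean sketch of the same idea (not protobuf's actual code) uses memcpy, which compilers optimize down to the same instructions:

    #include <cstdint>
    #include <cstring>

    // memcpy-based equivalents of EncodeFloat/DecodeFloat: the same bit
    // reinterpretation as the union, but well-defined in ISO C++.
    inline std::uint32_t EncodeFloat(float value) {
        std::uint32_t i;
        std::memcpy(&i, &value, sizeof i);
        return i;
    }

    inline float DecodeFloat(std::uint32_t value) {
        float f;
        std::memcpy(&f, &value, sizeof f);
        return f;
    }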

In binary notation, what is the meaning of the digits after the radix point “.”?

谁都会走 submitted on 2019-11-29 00:18:33
Question: I have this example of how to convert a base-10 number to IEEE 754 float representation:

    Number: 45.25 (base 10) = 101101.01 (base 2)
    Sign: 0
    Normalized form N = 1.0110101 * 2^5
    Exponent esp = 5; E = 5 + 127 = 132 (base 10) = 10000100 (base 2)
    IEEE 754: 0 10000100 01101010000000000000000

This makes sense to me except for one passage: 45.25 (base 10) = 101101.01 (base 2). 45 is 101101 in binary and that's okay… but how did they obtain the 0.25 as .01?

Answer 1: You can convert the part after the radix point by noting that its digits have weights 2^-1, 2^-2, 2^-3, and so on: .01 (base 2) = 0·(1/2) + 1·(1/4) = 0.25.
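Going the other direction — producing those digits from 0.25 — uses repeated multiplication by 2: each multiplication shifts the radix point right by one place, and the integer part that pops out is the next binary digit. A small sketch:

    #include <cstdio>

    int main() {
        // Each multiplication by 2 shifts the radix point right by one:
        // the integer part that pops out is the next binary digit.
        double frac = 0.25;
        std::printf(".");
        for (int i = 0; i < 8 && frac > 0.0; ++i) {
            frac *= 2.0;
            int digit = static_cast<int>(frac);  // 0 or 1
            std::printf("%d", digit);
            frac -= digit;
        }
        std::printf("\n");  // prints .01
    }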

The Double Byte Size in 32-bit and 64-bit OS

一世执手 submitted on 2019-11-28 22:32:23
Is there a difference in the size of a double when I run my app in a 32-bit versus a 64-bit environment? If I am not mistaken, a double in a 32-bit environment takes up 16 digits after 0, whereas a double in 64-bit takes up 32 bits; am I right?

No: an IEEE 754 double-precision floating-point number is always 64 bits. Similarly, a single-precision float is always 32 bits. If your question is about C# and/or .NET specifically (as your tag would indicate), all of the data type sizes are fixed, independent of your system architecture. This is the same as Java, but different from C and C++, where type sizes do…
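A quick sanity check that compiles the claim in: these assertions hold on any platform that uses IEEE 754 float and double, which is effectively every mainstream 32- and 64-bit target:

    #include <cstdio>

    // double is 8 bytes and float is 4 bytes regardless of whether the
    // program is compiled for a 32-bit or a 64-bit target.
    static_assert(sizeof(double) == 8, "double is not 64 bits here");
    static_assert(sizeof(float) == 4, "float is not 32 bits here");

    int main() {
        std::printf("sizeof(float)=%zu sizeof(double)=%zu\n",
                    sizeof(float), sizeof(double));
    }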

Why is Number.MAX_SAFE_INTEGER 9,007,199,254,740,991 and not 9,007,199,254,740,992?

家住魔仙堡 submitted on 2019-11-28 22:02:11
Question: ECMAScript 6's Number.MAX_SAFE_INTEGER supposedly represents the maximum numerical value JavaScript can store before issues arise with floating-point precision. However, it is a requirement that the number 1 added to this value must also be representable as a Number:

    Number.MAX_SAFE_INTEGER
    NOTE: The value of Number.MAX_SAFE_INTEGER is the largest integer n such that n and n + 1 are both exactly representable as a Number value. The value of Number.MAX_SAFE_INTEGER is 9007199254740991 (2^53 − 1).
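A JavaScript Number is an IEEE 754 double, so the boundary can be demonstrated in any language with 64-bit doubles; a C++ sketch: 2^53 − 1 and 2^53 are distinct, but 2^53 + 1 rounds back to 2^53, which is why the constant stops at 2^53 − 1.

    #include <cstdio>

    int main() {
        double max_safe = 9007199254740991.0;  // 2^53 - 1
        // 2^53 + 1 is not representable and rounds (ties-to-even) to 2^53:
        std::printf("%d\n", max_safe + 1 == max_safe + 2);  // prints 1
        // ...whereas 2^53 - 1 and 2^53 are both exact and distinct:
        std::printf("%d\n", max_safe == max_safe + 1);      // prints 0
    }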

What are the applications/benefits of an 80-bit extended precision data type?

北战南征 submitted on 2019-11-28 21:26:36
Yeah, I meant to say 80-bit. That's not a typo… My experience with floating-point variables has always involved 4-byte multiples, like singles (32 bits), doubles (64 bits), and long doubles (which I've seen referred to as either 96-bit or 128-bit). That's why I was a bit confused when I came across an 80-bit extended precision data type while I was working on some code to read and write AIFF (Audio Interchange File Format) files: an extended precision variable was chosen to store the sampling rate of the audio track. When I skimmed through Wikipedia, I found the link above along with a…
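As a concrete illustration of the AIFF use case, here is a sketch that decodes the 10-byte big-endian 80-bit extended value from a COMM chunk into a double. It assumes the x87 layout (1 sign bit, 15-bit exponent biased by 16383, 64-bit mantissa with an explicit integer bit) and ignores infinities, NaNs, and denormals for brevity; extended_to_double is a hypothetical helper name:

    #include <cmath>
    #include <cstdint>
    #include <cstdio>

    // Decode a 10-byte big-endian 80-bit extended value into a double.
    double extended_to_double(const unsigned char b[10]) {
        int sign = b[0] >> 7;
        int exponent = ((b[0] & 0x7F) << 8) | b[1];
        std::uint64_t mantissa = 0;
        for (int i = 2; i < 10; ++i)
            mantissa = (mantissa << 8) | b[i];
        if (exponent == 0 && mantissa == 0)
            return sign ? -0.0 : 0.0;
        // The mantissa carries the integer bit at position 63, so the
        // value is mantissa * 2^(exponent - 16383 - 63).
        double value = std::ldexp(static_cast<double>(mantissa),
                                  exponent - 16383 - 63);
        return sign ? -value : value;
    }

    int main() {
        // 44100 Hz as 80-bit extended: 0x400E AC44 0000 0000 0000
        const unsigned char rate[10] = {0x40, 0x0E, 0xAC, 0x44,
                                        0, 0, 0, 0, 0, 0};
        std::printf("%f\n", extended_to_double(rate));  // 44100.000000
    }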

Convert IEEE 754 float to hex with C - printf

主宰稳场 submitted on 2019-11-28 21:22:39
Ideally the following code would take a float in IEEE 754 representation and convert it into hexadecimal:

    void convert() //gets the float input from user and turns it into hexadecimal
    {
        float f;
        printf("Enter float: ");
        scanf("%f", &f);
        printf("hex is %x", f);
    }

I'm not too sure what's going wrong. It's converting the number into a hexadecimal number, but a very wrong one: 123.1443 gives 40000000, 43.3 gives 60000000, and 8 gives 0. So it's doing something, I'm just not too sure what. Help would be appreciated.

James McNellis: When you pass a float as an argument to a variadic function (like printf()), …
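The answer is cut off, but the standard fix is to print the float's bit pattern through a 32-bit integer rather than passing the float itself to %x (which is undefined behaviour, since variadic arguments promote float to double). A minimal corrected sketch, not necessarily the answerer's exact code:

    #include <cinttypes>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    int main() {
        float f;
        std::printf("Enter float: ");
        if (std::scanf("%f", &f) != 1) return 1;

        // Copy the float's object representation into a 32-bit integer
        // and print that; %x then receives the type it expects.
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);
        std::printf("hex is %08" PRIx32 "\n", bits);
    }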

Does the C++ standard specify anything on the representation of floating point numbers?

妖精的绣舞 submitted on 2019-11-28 20:09:17
For types T for which std::is_floating_point<T>::value is true, does the C++ standard specify anything about the way that T should be implemented? For example, does T even have to follow a sign/mantissa/exponent representation? Or can it be completely arbitrary?

From N3337, [basic.fundamental]/8:

    There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of…
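So the representation is left implementation-defined; the standard does, however, provide std::numeric_limits<T>::is_iec559 to report whether T conforms to IEC 559 (IEEE 754), which lets code assert the common case at compile time. A short sketch:

    #include <limits>

    // The standard does not mandate IEEE 754, but is_iec559 reports
    // whether a type conforms to it on this implementation.
    static_assert(std::numeric_limits<float>::is_iec559,
                  "float is not IEEE 754 single precision");
    static_assert(std::numeric_limits<double>::is_iec559,
                  "double is not IEEE 754 double precision");

    int main() {}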