I was going through this example which has a function outputting a hex bit pattern to represent an arbitrary float.
void ExamineFloat(float fValue)
{
pr
(unsigned long)fValue
This converts the float value to an unsigned long value by discarding the fractional part (C11 §6.3.1.4, "Real floating and integer"). The result is the numeric value, not the bit pattern.
*(unsigned long *)&fValue
The intention here is to take the address at which fValue is stored, pretend that there is not a float but an unsigned long at this address, and then read that unsigned long. The purpose is to examine the bit pattern which is used to store the float in memory.
As shown, this causes undefined behavior though.
Reason: You may not access an object through a pointer to a type that is not "compatible" with the object's type. "Compatible" types are, for example, (unsigned) char and every other type, or structures that share the same initial members (speaking of C here). See §6.5/7 of N1570 for the detailed (C11) list. (Note that my use of "compatible" is different, more broad, than in the referenced text.)
Solution: Cast to unsigned char *, access the individual bytes of the object, and assemble an unsigned long out of them:
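/* needs <limits.h> for CHAR_BIT and <stddef.h> for size_t */
unsigned long pattern = 0;
unsigned char *access = (unsigned char *)&fValue;
for (size_t i = 0; i < sizeof(float); ++i) {
    pattern <<= CHAR_BIT;   /* make room first, so the last byte is not shifted away */
    pattern |= *access;     /* append the next byte of the float */
    ++access;
}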
Note that (as @CodesInChaos pointed out) the above treats the floating point value as being stored with its most significant byte first ("big endian"). If your system uses a different byte order for floating point values, you'd need to adjust for that (or rearrange the bytes of the above unsigned long, whichever is more practical for you).
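For a system that stores the least significant byte first ("little endian"), the adjustment could look roughly like the sketch below. The function name float_pattern_le is made up for illustration, and the code assumes that sizeof(float) <= sizeof(unsigned long).

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: assemble the bytes of a float, assuming the
   least significant byte is stored at the lowest address. */
static unsigned long float_pattern_le(float fValue)
{
    unsigned long pattern = 0;
    const unsigned char *access = (const unsigned char *)&fValue;

    /* Walk from the highest address (most significant byte) down to the lowest. */
    for (size_t i = sizeof(float); i-- > 0; ) {
        pattern <<= CHAR_BIT;   /* make room for the next byte */
        pattern |= access[i];   /* append it as the new low byte */
    }
    return pattern;
}
```

The result can then be printed with printf("%08lx\n", float_pattern_le(fValue)).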
Floating-point values have memory representations: for example the bytes can represent a floating-point value using IEEE 754.
The first expression, *(unsigned long *)&fValue, interprets these bytes as if they were the representation of an unsigned long value. In fact, in standard C this results in undefined behavior (according to the so-called "strict aliasing rule"). In practice, there are also issues such as endianness that have to be taken into account.
The second expression, (unsigned long)fValue, is C standard compliant. It has a precise meaning:
C11 (n1570), § 6.3.1.4 Real floating and integer
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
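Since the out-of-range case is undefined, one might guard the cast with a range check. The following is only a sketch of that idea. The helper name float_to_ulong_checked is invented here, and it assumes that ULONG_MAX + 1 is a power of two that float can represent exactly (true for the usual 32-bit and 64-bit unsigned long).

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: convert only when the truncated value fits. */
static bool float_to_ulong_checked(float value, unsigned long *out)
{
    /* ULONG_MAX + 1 computed as a float without overflowing the integer type. */
    const float limit = 2.0f * (float)(ULONG_MAX / 2 + 1);

    /* Values in (-1, 0) truncate to 0 and are fine; NaN fails both comparisons. */
    if (!(value > -1.0f && value < limit))
        return false;

    *out = (unsigned long)value;   /* fractional part is discarded */
    return true;
}
```

With this, 3.99f converts to 3, while -1.0f and 1e30f are rejected instead of triggering undefined behavior.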
Typecasting in C does both a type conversion and a value conversion. The floating point → unsigned long conversion truncates the fractional portion of the floating point number. If the truncated value does not fit in an unsigned long, the behavior is undefined rather than clamped to the range. Converting from one type of pointer to another has no required change in value, so using the pointer typecast is a way to keep the same in-memory representation while changing the type associated with that representation.
In this case, it's a way to be able to output the binary representation of the floating point value.
As others have already noted, casting a pointer to a non-char type to a pointer to a different non-char type and then dereferencing is undefined behavior.
That printf("%08lx\n", *(unsigned long *)&fValue) invokes undefined behavior does not necessarily mean that running a program that attempts to perform such a travesty will result in hard drive erasure or make nasal demons erupt from one's nose (the two hallmarks of undefined behavior). On a computer in which sizeof(unsigned long)==sizeof(float) and on which both types have the same alignment requirements, that printf will almost certainly do what one expects it to do, which is to print the hex representation of the floating point value in question.
This shouldn't be surprising. The C standard openly invites implementations to extend the language. Many of these extensions are in areas that are, strictly speaking, undefined behavior. For example, the POSIX function dlsym returns a void*, but this function is typically used to find the address of a function rather than a global variable. This means the void pointer returned by dlsym needs to be cast to a function pointer, which is then used to call the function. As far as ISO C is concerned this is undefined behavior, but it nonetheless works on any POSIX-compliant platform. It would not work on a Harvard architecture machine on which pointers to functions have different sizes than pointers to data.
Similarly, casting a pointer to a float to a pointer to an unsigned integer and then dereferencing happens to work on almost any computer with almost any compiler in which the size and alignment requirements of that unsigned integer are the same as those of a float.
That said, using unsigned long might well get you into trouble. On my computer, an unsigned long is 64 bits long and has 64-bit alignment requirements. This is not compatible with a float. It would be better to use uint32_t (on my computer, that is).
The union hack is one way around this mess:
typedef union {
    float fval;
    uint32_t ival;
} float_uint32_t;
Assigning to the fval member of a float_uint32_t and then reading its ival member used to be undefined behavior in C. That is no longer the case: the bytes are reinterpreted as the type of the member being read. No compiler that I know of blows nasal demons for the union hack. In C++, reading a union member other than the one most recently written is still undefined behavior, even though common compilers tolerate it.
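As a rough sketch of how the union hack could be wrapped up for use in C: the helper name float_bits_union is hypothetical, and the static assertion spells out the assumption that float is exactly 32 bits wide.

```c
#include <stdint.h>

typedef union {
    float    fval;
    uint32_t ival;
} float_uint32_t;

/* The trick only makes sense if both members occupy the same number of bytes. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical helper: write the float member, read back the integer member. */
static uint32_t float_bits_union(float f)
{
    float_uint32_t pun;
    pun.fval = f;      /* store the float's representation */
    return pun.ival;   /* reinterpret the same bytes as uint32_t (valid in C) */
}
```

The value can be printed with printf("%08" PRIx32 "\n", float_bits_union(fValue)) after including <inttypes.h> and <stdio.h>.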
An even better way around this mess is to use the %a format, which has been part of the C standard since 1999:
printf ("%a\n", fValue);
This is simple, easy, portable, and there is no chance of undefined behavior. It prints the double precision floating point value in hexadecimal floating-point notation (hexadecimal significand, binary exponent), which shows the value exactly, though not its raw bytes. Since printf is an archaic function, all float arguments are converted to double prior to the call to printf. This conversion must be exact per the 1999 version of the C standard. One can pick up that exact value via a call to scanf or its sisters.
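The round trip through %a can be sketched as follows. The function name roundtrips_via_hex is invented for this example. It uses snprintf and sscanf (two of the "sisters"), and relies on the C99 rule that the scanf family accepts hexadecimal floating-point input for %f.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the float survives a trip through "%a" text. */
static int roundtrips_via_hex(float value)
{
    char text[64];
    float back = 0.0f;

    /* The float is promoted to double; "%a" prints that double exactly. */
    snprintf(text, sizeof text, "%a", value);

    /* C99 scanf-family functions parse hexadecimal floating-point input. */
    if (sscanf(text, "%f", &back) != 1)
        return 0;

    return back == value;
}
```

For finite values such as 0.1f or 3.0f this should report success. NaN would compare unequal to itself, so it is outside what this sketch checks.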
*(unsigned long *)&fValue is not equivalent to a direct cast to an unsigned long.
The conversion (unsigned long)fValue converts the value of fValue into an unsigned long, using the normal rules for conversion of a float value to an unsigned long value. The representation of that value in an unsigned long (for example, in terms of the bits) can be quite different from how that same value is represented in a float.
The expression *(unsigned long *)&fValue formally has undefined behaviour. It interprets the memory occupied by fValue as if it were an unsigned long. Practically (i.e. this is what often happens, even though the behaviour is undefined) this will often yield a value quite different from fValue.
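To make the difference concrete, here is a small sketch that puts the two operations side by side. Note that it swaps the pointer cast for memcpy into a uint32_t, which is not the method discussed above but a well-defined way to copy the bytes. The names as_value and as_bits are made up, and a 32-bit float is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion: keeps the numeric value, drops the fraction. */
static unsigned long as_value(float f)
{
    return (unsigned long)f;   /* defined only when the truncated value fits */
}

/* Representation: copies the bytes of the float unchanged. */
static uint32_t as_bits(float f)
{
    uint32_t bits;
    _Static_assert(sizeof bits == sizeof f, "assumes a 32-bit float");
    memcpy(&bits, &f, sizeof bits);   /* byte copy, no aliasing violation */
    return bits;
}
```

On a platform with IEEE 754 single precision floats, as_value(3.75f) yields 3, whereas as_bits(3.75f) yields 0x40700000.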