How do I byte-swap a signed number in C?

Question

I understand that casting from an unsigned type to a signed type of equal rank produces an implementation-defined value:

C99 6.3.1.3:

Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

This means I don't know how to byte-swap a signed number. For instance, suppose I am receiving two-byte, two's-complement signed values in little-endian order from a peripheral device, and processing them on a big-endian CPU. The byte-swapping primitives in the C library (like ntohs) are defined to work on unsigned values. If I convert my data to unsigned so I can byte-swap it, how do I reliably recover a signed value afterward?
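To make the problem concrete, here is a minimal sketch of the approach the question describes. It uses a hypothetical helper `swap_bytes16` rather than a library primitive such as `ntohs`. The byte swap is done entirely in unsigned arithmetic and is fully defined; only the final conversion to `int16_t` is implementation-defined, and only when the swapped value exceeds `INT16_MAX`.

```c
#include <stdint.h>

/* Hypothetical helper: swap the two bytes of a 16-bit unsigned value.
   The shifts are done in unsigned int, so no signed overflow can occur. */
uint16_t swap_bytes16(uint16_t x)
{
    return (uint16_t)(((unsigned int)x << 8) | ((unsigned int)x >> 8));
}

/* Naive approach: swap, then reinterpret as signed.
   When swapped > INT16_MAX, this conversion is implementation-defined
   (C99 6.3.1.3p3); that is exactly the step the question is about. */
int16_t naive_swap_signed(uint16_t raw)
{
    uint16_t swapped = swap_bytes16(raw);
    return (int16_t)swapped;
}
```

On typical two's complement compilers such as GCC, `naive_swap_signed(0xFEFF)` yields -2, but the standard does not promise it.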

Answer 1:

I understand that casting from an unsigned type to a signed type of equal rank produces an implementation-defined value.

It is implementation-defined because C leaves the representation of signed integers (two's complement, ones' complement, or sign and magnitude) up to the implementation, and likewise the result of converting an out-of-range value. On two's complement machines, compilers in practice define the conversion as wrapping; GCC, for example, documents that the value is reduced modulo 2^N.

So the only real issue is if either side of the transmission does not use two's complement, which is unlikely to happen in the real world. I would not bother designing programs to be portable to obscure, extinct ones' complement computers from the dark ages.

This means I don't know how to byte-swap a signed number. For instance, suppose I am receiving two-byte, two's-complement signed values in little-endian order from a peripheral device, and processing them on a big-endian CPU.

I suspect a source of confusion here is that you think a generic two's complement number will be transmitted from a sender that is either big or little endian and received by one that is either big or little endian. Data transmission protocols don't work like that, though: they explicitly specify endianness and signedness format, so both sides have to adapt to the protocol.
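For the sending side, the same idea works in reverse. A minimal sketch, assuming the protocol specifies little-endian two's complement; the function name `int16_to_le` is hypothetical. Converting a signed value to an unsigned type is fully defined by C (the value is reduced modulo 2^N), so only the receiving direction involves implementation-defined behavior.

```c
#include <stdint.h>

/* Hypothetical encoder: write a 16-bit signed value as two little-endian bytes.
   Signed-to-unsigned conversion is well defined: -2 becomes 0xFFFE. */
void int16_to_le(int16_t val, unsigned char out[2])
{
    uint16_t u = (uint16_t)val;
    out[0] = (unsigned char)(u & 0xFFu);        /* low byte first */
    out[1] = (unsigned char)((u >> 8) & 0xFFu); /* then high byte */
}
```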

And once that's specified, there's really no rocket science here: you are receiving 2 raw bytes. Store them in an array of raw data. Then assign them to your two's complement variable. Suppose the protocol specified little endian:

int16_t val;
uint8_t little[2];

val = (little[1]<<8) | little[0];

Bit shifting has the advantage of being endian-independent, so the above code works whether your CPU is big or little endian. It does contain plenty of ugly implicit promotions; C is guaranteed to treat the above as this:

val = (int16_t)( ((int)((int)little[1]<<8)) | (int)little[0] );

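The result type of the shift operator is that of its promoted left operand. The result type of | is the common type given by the usual arithmetic conversions.

Left-shifting a negative signed value is undefined behavior, but we avoid that here because the individual bytes are unsigned: when they are promoted to int, their values are still nonnegative.

int is guaranteed to be at least 16 bits, so on the usual platforms with a 32-bit int the shifted value always fits. On a platform where int is exactly 16 bits, however, shifting a byte of 0x80 or more left by 8 exceeds INT_MAX, which is undefined behavior; casting the bytes to an unsigned type before shifting, as in the pedantic version below, avoids that.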
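The promotion rules described above can be observed directly with C11 `_Generic`, which reports an expression's type without evaluating it. A small sketch, assuming a platform with a 32-bit int; the function names are hypothetical.

```c
/* 1 if (hi << 8) | lo has type int after promotion (C11 _Generic; not evaluated). */
int combined_is_int(unsigned char hi, unsigned char lo)
{
    return _Generic((hi << 8) | lo, int: 1, default: 0);
}

/* Both bytes are promoted to nonnegative int values before the shift,
   so the result is nonnegative too: hi = 0xFF, lo = 0xFE gives 65534. */
int combined_value(unsigned char hi, unsigned char lo)
{
    return (hi << 8) | lo;
}
```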

Alternatively, you could use a pedantic style that avoids implicit promotions and conversions entirely:

val = (int16_t) ( ((uint32_t)little[1] << 8) | (uint32_t)little[0] );

But this comes at the cost of readability.
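A quick way to convince yourself that the two styles compute the same thing is to compare them over every possible pair of bytes. A minimal sketch (the function names are hypothetical). For negative results, both end with an out-of-range conversion to `int16_t`, which is implementation-defined; the expected values assume the wrapping behavior of compilers such as GCC.

```c
#include <stdint.h>

/* The answer's expression, relying on implicit promotion to int. */
int16_t le16_implicit(const uint8_t little[2])
{
    return (int16_t)((little[1] << 8) | little[0]);
}

/* The "pedantic" form with explicit casts to uint32_t. */
int16_t le16_explicit(const uint8_t little[2])
{
    return (int16_t)(((uint32_t)little[1] << 8) | (uint32_t)little[0]);
}

/* Returns 1 if both forms agree for all 65536 byte pairs. */
int le16_forms_agree(void)
{
    for (unsigned hi = 0; hi <= 0xFF; hi++) {
        for (unsigned lo = 0; lo <= 0xFF; lo++) {
            uint8_t b[2] = { (uint8_t)lo, (uint8_t)hi };
            if (le16_implicit(b) != le16_explicit(b))
                return 0;
        }
    }
    return 1;
}
```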

Answer 2:

As you say in your question, the result is implementation-defined or an implementation-defined signal is raised; that is, what happens depends on the platform and compiler.
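Since the outcome depends on the compiler, one option is to state the assumption explicitly so the build fails on an implementation that behaves differently. A minimal sketch using a C11 `_Static_assert`; the helper name `u16_to_i16` is hypothetical, and the check only covers the compiler's behavior for one constant, not every value.

```c
#include <stdint.h>

/* Fail the build if this compiler does not convert 0xFFFE to -2
   when narrowing to int16_t (i.e. does not wrap modulo 2^16). */
_Static_assert((int16_t)(uint16_t)0xFFFEu == -2,
               "unsigned-to-signed conversion does not wrap");

int16_t u16_to_i16(uint16_t u)
{
    return (int16_t)u; /* implementation-defined when u > INT16_MAX */
}
```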

Answer 3:

To byte-swap a signed number while avoiding as much implementation-defined behavior as possible, you can make use of a wider signed intermediate, one that can represent the entire range of the unsigned type with the same width as the signed value you wanted to byte-swap. Taking your example of little-endian, 16-bit numbers:

// Code below assumes CHAR_BIT == 8, INT_MAX is at least 65535, and
// signed numbers are two's complement.
#include <stdint.h>

int16_t
sl16_to_host(unsigned char b[2])
{
    unsigned int n = ((unsigned int)b[0]) | (((unsigned int)b[1]) << 8);
    int v = n;
    if (n & 0x8000) {
        v -= 0x10000;
    }
    return (int16_t)v;
}

Here's what this does. First, it converts the little-endian value in b to a host-endian unsigned value (regardless of which endianness the host actually is). Then it stores that value in a wider, signed variable. Its value is still in the range [0, 65535], but it is now a signed quantity. Because int can represent all the values in that range, the conversion is fully defined by the standard.

Now comes the key step. We test the high bit of the unsigned value, which is the sign bit, and if it's set we subtract 65536 (0x10000) from the signed value. That maps the range [32768, 65535] to [-32768, -1], which is precisely how a two's complement signed number is encoded. This is still happening in the wider type, so we are guaranteed that all the values in the range are representable.

Finally, we convert the wider type to int16_t. At this point v is in [-32768, 32767], and int16_t, when it exists, is required to be exactly 16 bits wide with a two's complement representation, so every such value is representable and the conversion is fully defined. On an implementation that uses sign-and-magnitude or ones' complement representation and has no 16-bit two's complement type, int16_t is simply not provided and the code will not compile. I wouldn't bother worrying about it.
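The same mapping can also be written without a branch, using the common sign-extension idiom of flipping the sign bit and then subtracting it. This is a swapped-in variant, not the answer's code; a minimal sketch under the same assumptions (int holds every value up to 65535), with hypothetical function names and a helper that checks it agrees with the branching form for every 16-bit input.

```c
/* Branching form, as in the answer: subtract 0x10000 when bit 15 is set.
   Expects n in [0, 65535]. */
int sign_extend16_branch(unsigned int n)
{
    int v = (int)n;
    if (n & 0x8000u)
        v -= 0x10000;
    return v;
}

/* Branch-free form: flip bit 15, then subtract 0x8000.
   0xFFFE -> 0x7FFE - 0x8000 = -2; 0x1234 -> 0x9234 - 0x8000 = 0x1234. */
int sign_extend16_xor(unsigned int n)
{
    return (int)(n ^ 0x8000u) - 0x8000;
}

/* Returns 1 if both forms agree for every n in [0, 65535]. */
int sign_extend16_forms_agree(void)
{
    for (unsigned int n = 0; n <= 0xFFFFu; n++) {
        if (sign_extend16_branch(n) != sign_extend16_xor(n))
            return 0;
    }
    return 1;
}
```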
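The assumptions listed in the comment at the top of `sl16_to_host` can also be checked by the compiler. A minimal sketch, assuming a C11 compiler for `_Static_assert`: `<stdint.h>` defines `INT16_MAX` only when it provides `int16_t`, so its absence can be turned into a build error. The function body is the answer's, with `const` added and comments on why each step is defined.

```c
#include <limits.h>
#include <stdint.h>

/* Compile-time checks for the stated assumptions. */
_Static_assert(CHAR_BIT == 8, "expects 8-bit bytes");
_Static_assert(INT_MAX >= 65535, "int must hold every 16-bit unsigned value");
#ifndef INT16_MAX
#error "int16_t is not available on this implementation"
#endif

int16_t sl16_to_host(const unsigned char b[2])
{
    unsigned int n = (unsigned int)b[0] | ((unsigned int)b[1] << 8);
    int v = (int)n;       /* in [0, 65535]; representable since INT_MAX >= 65535 */
    if (n & 0x8000u) {
        v -= 0x10000;     /* map [32768, 65535] to [-32768, -1] */
    }
    return (int16_t)v;    /* v is now within int16_t's range */
}
```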

Another approach, which may be useful for byte-swapping 32-bit numbers when you don't have a 64-bit type available, is to mask out the sign bit and handle it separately:

int32_t
sl32_to_host(unsigned char b[4])
{
    uint32_t mag = ((((uint32_t)b[0]) & 0xFF) <<  0) |
                   ((((uint32_t)b[1]) & 0xFF) <<  8) |
                   ((((uint32_t)b[2]) & 0xFF) << 16) |
                   ((((uint32_t)b[3]) & 0x7F) << 24);
    int32_t val = mag;
    if (b[3] & 0x80) {
        val = (val - 0x7fffffff) - 1;
    }
    return val;
}
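One way to exercise `sl32_to_host` is to feed it a few byte patterns whose expected values are easy to work out by hand. The copy below is the answer's function with a `const` qualifier and explanatory comments added; the expected values in the checks assume little-endian two's complement input, as in the question.

```c
#include <stdint.h>

int32_t sl32_to_host(const unsigned char b[4])
{
    /* Low 31 bits; the & 0xFF masks only matter if CHAR_BIT > 8. */
    uint32_t mag = ((((uint32_t)b[0]) & 0xFF) <<  0) |
                   ((((uint32_t)b[1]) & 0xFF) <<  8) |
                   ((((uint32_t)b[2]) & 0xFF) << 16) |
                   ((((uint32_t)b[3]) & 0x7F) << 24);
    int32_t val = mag;                 /* fits: mag <= 0x7FFFFFFF */
    if (b[3] & 0x80) {
        val = (val - 0x7fffffff) - 1;  /* subtract 2^31 in signed arithmetic */
    }
    return val;
}
```

For example, the bytes FE FF FF FF decode to -2, and 00 00 00 80 decode to INT32_MIN.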

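I've written (val - 0x7fffffff) - 1 here, instead of just val - 0x80000000, to ensure that the subtraction happens in a signed type. With a 32-bit int, the constant 0x80000000 does not fit in int, so it has type unsigned int and would pull val into unsigned arithmetic.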
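The type difference between the two expressions can be checked with C11 `_Generic`. A small sketch, assuming a platform where int is 32 bits and `int32_t` is `int` (true on common desktop targets); the function names are hypothetical.

```c
#include <stdint.h>

/* 1 if the expression has type unsigned int (C11 _Generic; not evaluated). */
#define IS_UNSIGNED_INT(e) _Generic((e), unsigned int: 1, default: 0)

/* With a 32-bit int, 0x80000000 has type unsigned int,
   so val is converted to unsigned before subtracting. */
int naive_form_is_unsigned(int32_t val)
{
    return IS_UNSIGNED_INT(val - 0x80000000);
}

/* 0x7fffffff and 1 both have type int, so this stays signed. */
int answer_form_is_unsigned(int32_t val)
{
    return IS_UNSIGNED_INT((val - 0x7fffffff) - 1);
}
```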

Source: https://stackoverflow.com/questions/10435665/how-do-i-byte-swap-a-signed-number-in-c

Tags

standards

unsigned

signed