I\'m working on a function that stores a 64-bit value into memory in big endian format. I was hoping that I could write portable C99 code that works on both little a
This seems to do the trick:
void encode_bigend_u64(uint64_t value, void* dest)
{
value =
((value & 0xFF00000000000000u) >> 56u) |
((value & 0x00FF000000000000u) >> 40u) |
((value & 0x0000FF0000000000u) >> 24u) |
((value & 0x000000FF00000000u) >> 8u) |
((value & 0x00000000FF000000u) << 8u) |
((value & 0x0000000000FF0000u) << 24u) |
((value & 0x000000000000FF00u) << 40u) |
((value & 0x00000000000000FFu) << 56u);
memcpy(dest, &value, sizeof(uint64_t));
}
-O3
encode_bigend_u64(unsigned long, void*):
bswapq %rdi
movq %rdi, (%rsi)
retq
-O3 -march=native
encode_bigend_u64(unsigned long, void*):
movbeq %rdi, (%rsi)
retq
-O3
encode_bigend_u64(unsigned long, void*):
bswap %rdi
movq %rdi, (%rsi)
ret
-O3 -march=native
encode_bigend_u64(unsigned long, void*):
movbe %rdi, (%rsi)
ret
Tested with clang 3.8.0 and gcc 5.3.0 on http://gcc.godbolt.org/ (so I don't know exactly what processor is underneath (for the -march=native
) but I strongly suspect a recent x86_64 processor)
If you want a function which works for big endian architectures too, you can use the answers from here to detect the endianness of the system and add an if
. Both the union and the pointer casts versions work and are optimized by both gcc
and clang
resulting in the exact same assembly (no branches). Full code on godebolt:
int is_big_endian(void)
{
union {
uint32_t i;
char c[4];
} bint = {0x01020304};
return bint.c[0] == 1;
}
void encode_bigend_u64_union(uint64_t value, void* dest)
{
if (!is_big_endian())
//...
memcpy(dest, &value, sizeof(uint64_t));
}
Intel® 64 and IA-32 Architectures Instruction Set Reference (3-542 Vol. 2A):
MOVBE—Move Data After Swapping Bytes
Performs a byte swap operation on the data copied from the second operand (source operand) and store the result in the first operand (destination operand). [...]
The MOVBE instruction is provided for swapping the bytes on a read from memory or on a write to memory; thus providing support for converting little-endian values to big-endian format and vice versa.
All functions in this answer with asm output on the Godbolt Compiler Explorer
GNU C has a uint64_t __builtin_bswap64 (uint64_t x), since GNU C 4.3. This is apparently the most reliable way to get gcc / clang to generate code that doesn't suck for this.
glibc provides htobe64
, htole64
, and similar host to/from BE and LE functions that swap or not, depending on the endianness of the machine. See the docs for <endian.h>. The man page says they were added to glibc in version 2.9 (released 2008-11).
#define _BSD_SOURCE /* See feature_test_macros(7) */
#include <stdint.h>
#include <endian.h>
// ideal code with clang from 3.0 onwards, probably earlier
// ideal code with gcc from 4.4.7 onwards, probably earlier
uint64_t load_be64_endian_h(const uint64_t *be_src) { return be64toh(*be_src); }
movq (%rdi), %rax
bswap %rax
void store_be64_endian_h(uint64_t *be_dst, uint64_t data) { *be_dst = htobe64(data); }
bswap %rsi
movq %rsi, (%rdi)
// check that the compiler understands the data movement and optimizes away a double-conversion (which inline-asm `bswap` wouldn't)
// it does optimize away with gcc 4.9.3 and later, but not with gcc 4.9.0 (2x bswap)
// optimizes away with clang 3.7.0 and later, but not clang 3.6 or earlier (2x bswap)
uint64_t double_convert(uint64_t data) {
uint64_t tmp;
store_be64_endian_h(&tmp, data);
return load_be64_endian_h(&tmp);
}
movq %rdi, %rax
You safely get good code even at -O1
from those functions, and they use movbe
when -march
is set to a CPU that supports that insn.
If you're targeting GNU C, but not glibc, you can borrow the definition from glibc (remember it's LGPLed code, though):
#ifdef __GNUC__
# if __GNUC_PREREQ (4, 3)
static __inline unsigned int
__bswap_32 (unsigned int __bsx) { return __builtin_bswap32 (__bsx); }
# elif __GNUC__ >= 2
// ... some fallback stuff you only need if you're using an ancient gcc version, using inline asm for non-compile-time-constant args
# endif // gcc version
#endif // __GNUC__
If you really need a fallback that might compile well on compilers that don't support GNU C builtins, the code from @bolov's answer could be used to implement a bswap that compiles nicely. Pre-processor macros could be used to choose whether to swap or not (like glibc does), to implement host-to-BE and host-to-LE functions. The bswap used by glibc when __builtin_bswap
or x86 asm isn't available uses the mask-and-shift idiom that bolov found was good. gcc recognizes it better than just shifting.
The code from this Endian-agnostic coding blog post compiles to bswap with gcc, but not with clang. IDK if there's anything that both their pattern-recognizers will recognize.
// Note that this is a load, not a store like the code in the question.
uint64_t be64_to_host(unsigned char* data) {
return
((uint64_t)data[7]<<0) | ((uint64_t)data[6]<<8 ) |
((uint64_t)data[5]<<16) | ((uint64_t)data[4]<<24) |
((uint64_t)data[3]<<32) | ((uint64_t)data[2]<<40) |
((uint64_t)data[1]<<48) | ((uint64_t)data[0]<<56);
}
## gcc 5.3 -O3 -march=haswell
movbe (%rdi), %rax
ret
## clang 3.8 -O3 -march=haswell
movzbl 7(%rdi), %eax
movzbl 6(%rdi), %ecx
shlq $8, %rcx
orq %rax, %rcx
... completely naive implementation
The htonll
from this answer compiles to two 32bit bswap
s combined with shift/or. This kind of sucks, but isn't terrible with either gcc or clang.
I didn't have any luck with a union { uint64_t a; uint8_t b[8]; }
version of the OP's code. clang still compiles it to a 64bit bswap
, but I think compiles to even worse code with gcc. (See the godbolt link).
I like Peter's solution, but here's something else you can use on Haswell. Haswell has the movbe
instruction, which is 3 uops there (no cheaper than bswap r64
+ a normal load or store), but is faster on Atom / Silvermont (https://agner.org/optimize/):
// AT&T syntax, compile without -masm=intel
inline
uint64_t load_bigend_u64(uint64_t value)
{
__asm__ ("movbe %[src], %[dst]" // x86-64 only
: [dst] "=r" (value)
: [src] "m" (value)
);
return value;
}
Use it with something like uint64_t tmp = load_bigend_u64(array[i]);
You could reverse this to make a store_bigend
function, or use bswap
to modify a value in a register and let the compiler load/store it.
I change the function to return value
because alignment of vdest
was not clear to me.
Usually a feature is guarded by a preprocessor macro. I'd expect __MOVBE__
to be used for the movbe
feature flag, but its not present (this machine has the feature):
$ gcc -march=native -dM -E - < /dev/null | sort
...
#define __LWP__ 1
#define __LZCNT__ 1
#define __MMX__ 1
#define __MWAITX__ 1
#define __NO_INLINE__ 1
#define __ORDER_BIG_ENDIAN__ 4321
...