How to write a constexpr
function to swap endianess of an integer, without relying on compiler extensions and can you give an example on how to do it?
Inspired by ecatmur I suggest the following solution, that has potentially better performance when bswap is not detected by the compiler (O(log(n)) vs O(N)). Given that N is usually <=8 this is probably irrelevant, still:
template <typename T>
typename std::enable_if<std::is_unsigned<T>::value,T>::type
constexpr alternating_bitmask(const size_t step){
T mask(0);
for (size_t i=0;i<digits<T>();i+=2*step){
mask|=(~T(0)>>(digits<T>()-step))<<i;
}
return mask;
}
template <typename T>
typename std::enable_if<std::is_unsigned<T>::value,T>::type
constexpr bswap(T n){
for (size_t i=digits<unsigned char>();i<digits<T>();i*=2){
n = ((n&(~(alternating_bitmask<T>(i))))>>i)|
((n&( (alternating_bitmask<T>(i))))<<i);
}
return n;
}
As this form is more complex than ecatmur's solution the compiler has a harder job optimizing, but clang still finds out that we mean bswap.
Yes, it's pretty easy; here's a recursive (C++11-compatible) implementation (unsigned integral types only):
#include <climits>
#include <cstdint>
#include <type_traits>
template<class T>
constexpr typename std::enable_if<std::is_unsigned<T>::value, T>::type
bswap(T i, T j = 0u, std::size_t n = 0u) {
return n == sizeof(T) ? j :
bswap<T>(i >> CHAR_BIT, (j << CHAR_BIT) | (i & (T)(unsigned char)(-1)), n + 1);
}
Example.
Here I'm using j
as the accumulator and n
as the loop counter (indexing bytes).
If you have a compiler supporting C++17 fold expressions, it's possible to write something that expands out into exactly what you'd write by hand:
template<class T, std::size_t... N>
constexpr T bswap_impl(T i, std::index_sequence<N...>) {
return ((((i >> (N * CHAR_BIT)) & (T)(unsigned char)(-1)) <<
((sizeof(T) - 1 - N) * CHAR_BIT)) | ...);
}; // ^~~~~ fold expression
template<class T, class U = typename std::make_unsigned<T>::type>
constexpr U bswap(T i) {
return bswap_impl<U>(i, std::make_index_sequence<sizeof(T)>{});
}
The advantage of this form is that because it doesn't use loops or recursion, you're pretty much guaranteed to get optimal assembly output - on x86-64, clang even manages to work out to use the bswap instruction.