Detecting Aligned Memory requirement on target CPU

I'm currently trying to build code that is supposed to work on a wide range of machines, from handheld devices and sensors to big servers in data centers.

One of the (many) differences between these architectures is the requirement for aligned memory access.

Aligned memory access is not required on "standard" x86 CPUs, but many other CPUs need it and raise an exception if the rule is not respected.

Up to now, I've been dealing with it by forcing the compiler to be cautious on specific data accesses which are known to be risky, using the packed attribute (or pragma). And it works fine.

The problem is, the compiler is so cautious that a lot of performance is lost in the process.

Since performance is important, we would be better off rewriting some portions of the code to work specifically on strict-alignment CPUs. Such code would, on the other hand, be slower on CPUs which support unaligned memory access (such as x86), so we want to use it only on CPUs which require strictly aligned memory access.

And now the question: how to detect, at compile time, that the target architecture requires strictly aligned memory access? (Or the other way round.)

No C implementation that I know of provides any preprocessor macro to help you figure this out. Since your code supposedly runs on a wide range of machines, I assume that you have access to a wide variety of machines for testing, so you can figure out the answer with a test program. Then you can write your own macro, something like below:

#if defined(__sparc__)
/* Unaligned access will crash your app on a SPARC */
#define ALIGN_ACCESS 1
#elif defined(__ppc__) || defined(__POWERPC__) || defined(_M_PPC)
/* Unaligned access is too slow on a PowerPC (maybe?) */
#define ALIGN_ACCESS 1
#elif defined(__i386__) || defined(__x86_64__) || \
      defined(_M_IX86) || defined(_M_X64)
/* x86 / x64 are fairly forgiving */
#define ALIGN_ACCESS 0
#else
#warning "Unsupported architecture"
#define ALIGN_ACCESS 1
#endif
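
As a rough sketch of how such a macro might then be used, the snippet below selects between two ways of reading a 32-bit value. The helper name read_u32 is hypothetical, and the fallback definition of ALIGN_ACCESS is only there so the snippet compiles on its own. Note the assumption in the second branch: the direct cast is formally undefined in ISO C for misaligned pointers and only works where the hardware and compiler tolerate it.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: if the detection block above was not included,
   default to the cautious path. */
#ifndef ALIGN_ACCESS
#define ALIGN_ACCESS 1
#endif

/* Hypothetical helper: read a 32-bit value from an address that may
   or may not be aligned. */
static inline uint32_t read_u32(const void *p)
{
#if ALIGN_ACCESS
    /* memcpy is valid for any alignment; the compiler is free to
       lower it to whatever loads the target allows. */
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
#else
    /* Direct load: relies on the target tolerating unaligned access.
       Formally undefined behavior in ISO C when p is misaligned. */
    return *(const uint32_t *)p;
#endif
}
```

On many compilers the memcpy branch already compiles to a single load on x86, so it may be worth measuring before keeping two code paths at all.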

Note that the speed of an unaligned access will depend on the boundaries which it crosses. For example, if the access crosses a 4k page boundary it will be much slower, and there may be other boundaries which cause it to be slower still. Even on x86, some unaligned accesses are not handled by the processor and are instead handled by the OS kernel. That is incredibly slow.

There is also no guarantee that a future (or current) implementation will not suddenly change the performance characteristics of unaligned accesses. This has happened in the past and may happen in the future; the PowerPC 601 was very forgiving of unaligned access but the PowerPC 603e was not.

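Complicating things even further is the fact that the code you'd write to emulate an unaligned access would differ in implementation across platforms. For example, on PowerPC it's simplified by the fact that the machine's shift instructions give 0 for x << 32 and x >> 32 when x is 32 bits, but on x86 you have no such luck, because the shift count is masked to 5 bits and a shift by 32 leaves x unchanged. In C itself, shifting a 32-bit value by 32 is undefined behavior, so portable code has to special-case it.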
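
To illustrate the shift problem, here is a minimal sketch that assembles a 32-bit value from two aligned words. The function name load32_from_words is made up for this example, and it assumes little-endian ordering of the word values (byte 0 is the low byte of words[0]). The zero-offset case is handled separately precisely because hi << 32 cannot be relied on.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: extract 32 bits starting at byte offset
   byte_off from an array of aligned 32-bit words, treating each word
   as little-endian (byte 0 is the least significant byte).
   The caller must ensure words[byte_off / 4 + 1] exists whenever
   byte_off is not a multiple of 4. */
static uint32_t load32_from_words(const uint32_t *words, size_t byte_off)
{
    size_t idx = byte_off / 4;
    unsigned shift = (unsigned)(byte_off % 4) * 8;
    uint32_t lo = words[idx];

    if (shift == 0)
        return lo;  /* aligned case: avoids hi << 32, undefined in C */

    uint32_t hi = words[idx + 1];
    return (lo >> shift) | (hi << (32 - shift));
}
```

Only aligned word loads are issued, so this kind of routine is safe on strict-alignment targets, at the cost of an extra load and a few shifts per access.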

Writing your code for strict memory alignment is a good idea anyway. Even on x86 systems which allow unaligned access, an unaligned read or write can require two memory accesses (for example, when it straddles a cache line), and some performance will be lost. It's not difficult to write efficient code which works on all CPU architectures. The simple rule to remember is that the pointer must be aligned to the size of the object you're reading or writing. For example, if writing a DWORD (32 bits), then ((uintptr_t)dest_pointer & 3) == 0 must hold; the parentheses matter, because in C == binds more tightly than &. Using a crutch such as "UNALIGNED_PTR" types will cause the compiler to generate inefficient code. If you've got a large amount of legacy code that must work immediately, then it makes sense to use the compiler to "fix" the situation, but if it's your code, then write it from the start to work on all systems.
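
The alignment rule can be wrapped in a small check. This is only a sketch: is_aligned is a hypothetical helper name, and it assumes that converting a pointer to uintptr_t yields the numeric address, which holds on mainstream platforms but is implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero if p is a multiple of align.
   align must be a power of two (1, 2, 4, 8, ...). */
static int is_aligned(const void *p, size_t align)
{
    /* A power of two has exactly one bit set. */
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Assumption: the integer value of the pointer is its address. */
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

A typical use would be to test is_aligned(dest_pointer, 4) before a DWORD store and fall back to a byte-wise path when it fails.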

Source: https://stackoverflow.com/questions/9336764/detecting-aligned-memory-requirement-on-target-cpu

Tags

cpu-architecture

memory-alignment

predefined-macro