Detecting Aligned Memory requirement on target CPU

三世轮回, submitted on 2019-12-04 12:39:29

No C implementation that I know of provides a preprocessor macro to help you figure this out. Since your code supposedly runs on a wide range of machines, I assume you have access to a wide variety of machines for testing, so you can find the answer with a test program. Then you can write your own macro, something like the following:

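#if defined(__sparc__) || defined(__sparc)
/* Unaligned access raises SIGBUS and crashes your app on SPARC */
#define ALIGN_ACCESS 1
#elif defined(__ppc__) || defined(__powerpc__) || \
      defined(__POWERPC__) || defined(_M_PPC)
/* Unaligned access may be too slow on PowerPC, depending on the model */
#define ALIGN_ACCESS 1
#elif defined(__i386__) || defined(__x86_64__) || \
      defined(_M_IX86) || defined(_M_X64)
/* x86 / x64 are fairly forgiving */
#define ALIGN_ACCESS 0
#else
/* Unknown CPU: be conservative and require aligned access.
   #warning is a GCC/Clang extension (standard only since C23); MSVC uses #pragma message. */
#if defined(_MSC_VER)
#pragma message("Unsupported architecture")
#else
#warning "Unsupported architecture"
#endif
#define ALIGN_ACCESS 1
#endif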
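The test program suggested above could be as simple as the following sketch. It assumes a POSIX system (fork() and waitpid()), and the function name unaligned_load_traps is hypothetical. The deliberately misaligned load is undefined behavior in C, so it runs in a throwaway child process and the parent only inspects how the child ended. It detects hard traps only: it says nothing about speed, and an access that the OS silently emulates in software will look like a success.

```c
#include <signal.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if an unaligned 32-bit load kills the
   process with SIGBUS or SIGSEGV, 0 if it completes, -1 on any other outcome. */
static int unaligned_load_traps(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */
    if (pid == 0) {
        /* Child: read 4 bytes starting at byte offset 1 of a word array.
           Undefined behavior in C; done only inside this disposable process. */
        static uint32_t storage[2] = {0x11223344u, 0x55667788u};
        volatile uint32_t *p =
            (volatile uint32_t *)((unsigned char *)storage + 1);
        uint32_t value = *p;            /* volatile: the load is not optimized away */
        (void)value;
        _exit(0);                       /* reached only if the load did not trap */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;                       /* load completed: no hard trap */
    if (WIFSIGNALED(status) &&
        (WTERMSIG(status) == SIGBUS || WTERMSIG(status) == SIGSEGV))
        return 1;                       /* load trapped */
    return -1;
}
```

Run it on each target machine (not on the build host when cross-compiling) and use the results to fill in the architecture list in your own macro.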

Note that the speed of an unaligned access depends on which boundaries it crosses. For example, an access that crosses a 4 KiB page boundary is much slower, and crossing other boundaries, such as cache lines, can add cost as well. On some architectures, certain unaligned accesses are not handled by the processor at all; they trap to the OS kernel, which emulates them in software. That is incredibly slow.
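Whether a given access crosses one of these boundaries is simple arithmetic on the address. The sketch below assumes 64-byte cache lines and 4 KiB pages; both are common but not universal, and the names crosses_boundary and split_level are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: common on current CPUs, but check your target. */
#define ASSUMED_CACHE_LINE 64u
#define ASSUMED_PAGE_SIZE  4096u

/* True if an access of `size` bytes at `addr` spans a multiple of `boundary`. */
static bool crosses_boundary(uintptr_t addr, size_t size, uintptr_t boundary) {
    return (addr % boundary) + size > boundary;
}

/* 0 = stays within one cache line, 1 = splits a cache line,
   2 = also splits a page (a page split is the more expensive case). */
static int split_level(uintptr_t addr, size_t size) {
    if (crosses_boundary(addr, size, ASSUMED_PAGE_SIZE))
        return 2;
    if (crosses_boundary(addr, size, ASSUMED_CACHE_LINE))
        return 1;
    return 0;
}
```

For instance, a 4-byte read at address 62 splits a cache line, while one at address 60 does not.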

There is also no guarantee that a future (or current) implementation will not suddenly change the performance characteristics of unaligned accesses. This has happened in the past and may happen in the future; the PowerPC 601 was very forgiving of unaligned access but the PowerPC 603e was not.

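Complicating things even further is the fact that the code you'd write to emulate an unaligned access (two aligned loads combined with shifts) differs across platforms. For example, on PowerPC it's simplified by the fact that the shift instructions produce 0 for x << 32 and x >> 32 when x is 32 bits, but on x86 you have no such luck: the shift count is masked to 5 bits, so a shift by 32 leaves x unchanged. In C, shifting a 32-bit value by 32 is undefined behavior in any case, so portable code has to handle that case separately.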
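A sketch of that emulation, assuming the data was laid out on a little-endian machine, so that byte k of the buffer sits in bits 8*(k%4) through 8*(k%4)+7 of word k/4. It uses only aligned 32-bit loads; load_unaligned_le32 is a hypothetical name. Note the explicit aligned case: without it, offset 0 would compute w[1] << 32, which is undefined in C, and a naive x86 translation would merge w[1] into the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read the 32-bit little-endian value that starts at
   byte offset `off` of `words`, using only aligned word loads.
   Precondition: when off % 4 != 0, words must hold at least off/4 + 2 elements. */
static uint32_t load_unaligned_le32(const uint32_t *words, size_t off) {
    const uint32_t *w = words + off / 4;
    unsigned shift = (unsigned)(off % 4) * 8;   /* 0, 8, 16 or 24 */
    if (shift == 0)
        return w[0];            /* aligned: avoid the undefined shift by 32 */
    /* Low bytes come from the top of w[0], high bytes from the bottom of w[1]. */
    return (w[0] >> shift) | (uint32_t)(w[1] << (32 - shift));
}
```

With words = {0x44332211, 0x88776655} (bytes 11 22 33 44 55 66 77 88 in memory on a little-endian machine), offset 1 yields 0x55443322.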

Writing your code for strict memory alignment is a good idea anyway. Even on x86 systems, which allow unaligned access, an unaligned read or write can require two memory accesses, so some performance is lost. It's not difficult to write efficient code that works on all CPU architectures. The simple rule to remember is that the pointer must be aligned to the size of the object you're reading or writing. For example, if you're writing a DWORD (a 32-bit value), then ((uintptr_t)dest_pointer & 3) == 0 must hold; the parentheses matter, because == binds more tightly than &. Using a crutch such as "UNALIGNED_PTR" types will cause the compiler to generate inefficient code. If you've got a large amount of legacy code that must work immediately, then it makes sense to use the compiler to "fix" the situation, but if it's your code, write it from the start to work on all systems.
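A minimal sketch of the alignment arithmetic, with the DWORD test written so that it parses as intended. The helper names is_aligned and align_up are hypothetical; both assume the alignment is a power of two, and converting a pointer to uintptr_t assumes a flat address space, which holds on mainstream platforms but is not guaranteed by the C standard.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if p is a multiple of `align` (a power of two).
   Written as `p & 3 == 0`, this would parse as `p & (3 == 0)`, i.e. always 0. */
static bool is_aligned(const void *p, uintptr_t align) {
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Round `offset` up to the next multiple of `align` (a power of two). */
static uintptr_t align_up(uintptr_t offset, uintptr_t align) {
    return (offset + align - 1) & ~(align - 1);
}
```

align_up is handy when you lay out a record by hand: place each field at align_up(offset, field alignment) so that every access to it stays aligned.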
