Are word-aligned loads faster than unaligned loads on x64 processors?

佛祖请我去吃肉 2021-01-12 06:57

Are loads of variables that are aligned on word boundaries faster than unaligned load operations on x86/64 (Intel/AMD 64 bit) processors?

A colleague of mine argues

5 answers
  •  迷失自我
    2021-01-12 07:45

    Unaligned loads and stores should never be used, but the reason is not performance. The reason is that the C language forbids them (through both its alignment rules and its aliasing rules), and on many systems they do not work without extremely slow emulation code. That emulation may also break the C11 memory model, which multi-threaded code relies on for correct behavior, unless it is done purely byte by byte.
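
To illustrate the language-level point, here is a minimal sketch of how C code can read or write a 32-bit value at an address with no alignment guarantee without casting to a `uint32_t *`. The helper names `load_u32_unaligned` and `store_u32_unaligned` are hypothetical, not from any library; the only real call used is the standard `memcpy`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from an address that may be
 * misaligned. Going through memcpy avoids dereferencing a misaligned
 * uint32_t pointer, which the alignment and aliasing rules do not allow. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);   /* copies exactly 4 bytes, byte-wise semantics */
    return v;
}

/* Hypothetical helper: the matching store to a possibly misaligned address. */
static void store_u32_unaligned(void *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```

Calling `store_u32_unaligned(buf + 1, x)` on a byte buffer and reading it back with `load_u32_unaligned(buf + 1)` returns `x` regardless of how `buf` is aligned. Optimizing compilers for x86-64 typically lower such a fixed-size `memcpy` to a single move instruction, though that is a property of the compiler rather than something the language promises.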

    On x86 and x86_64, misaligned loads and stores are allowed for most operations (some SSE instructions are the exception), but that doesn't mean they're as fast as aligned accesses. It just means the CPU does the emulation for you, and does it somewhat more efficiently than you could yourself. For example, a memcpy-type loop doing misaligned word-size reads and writes will be moderately slower than the same loop doing aligned accesses, but still faster than a hand-written byte-by-byte copy loop.
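
The two kinds of copy loop compared above could look roughly like the sketch below. The function names `copy_bytes` and `copy_words` are made up for illustration, and the word-size variant uses `memcpy` for each 8-byte access so that it stays valid C even when the pointers are misaligned. No timings are claimed here; how the loops compare depends on the compiler, the optimization level, and the CPU, so it has to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-by-byte copy: one byte per iteration, no alignment concerns. */
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Hypothetical word-size copy: moves 8 bytes per iteration and accepts
 * misaligned dst/src. Each word access goes through memcpy, so no
 * misaligned uint64_t pointer is ever dereferenced. */
static void copy_words(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);

    /* Main loop: whole 8-byte words. */
    for (; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, src + i, sizeof w);
        memcpy(dst + i, &w, sizeof w);
    }

    /* Tail: the remaining 0..7 bytes. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Both functions produce the same bytes in `dst` for non-overlapping buffers; to see the effect of alignment, one could time `copy_words` with `src` at an 8-byte boundary and again with `src + 1`.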
