Are loads of variables that are aligned on word boundaries faster than unaligned load operations on x86/64 (Intel/AMD 64 bit) processors?
A colleague of mine argues
Unaligned 32 and 64 bit access is NOT cheap.
I ran tests to verify this. My results on a Core i5 M460 (64-bit) were as follows: the fastest integer type was 32 bits wide. 64-bit alignment was slightly slower, but almost the same. 16-bit and 8-bit alignment were both noticeably slower than 32-bit and 64-bit alignment, with 16-bit being slower than 8-bit. By far the slowest form of access was unaligned 32-bit access, which was 3.5 times slower than aligned 32-bit access (the fastest of them all). Unaligned 32-bit access was even 40% slower than unaligned 64-bit access.
Results: https://github.com/mkschreder/align-test/blob/master/results-i5-64bit.jpg?raw=true Source code: https://github.com/mkschreder/align-test
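For readers who want to try a comparison like this themselves, here is a minimal sketch in C. It is not the code from the repository linked above. The names `load_u32`, `sum_u32`, `time_sweeps` and `compare_alignment`, and the buffer size and pass count, are all made up for illustration. It assumes `malloc` returns memory aligned to at least 4 bytes, so that byte offset 0 is aligned for a 32-bit load and byte offset 1 is not.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Load a 32-bit value from any address. memcpy avoids the undefined
   behaviour of dereferencing a misaligned uint32_t pointer; on x86
   compilers typically turn it into a single mov. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Sum `count` consecutive 32-bit values starting `offset` bytes into buf. */
static uint32_t sum_u32(const unsigned char *buf, size_t offset, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += load_u32(buf + offset + i * sizeof(uint32_t));
    return sum;
}

/* Run `passes` sweeps over the buffer and return the CPU time in seconds.
   One byte is changed before every pass so the compiler cannot compute
   the sum once and reuse it. */
static double time_sweeps(unsigned char *buf, size_t offset, size_t count,
                          int passes, uint32_t *sink)
{
    assert(passes > 0);
    clock_t start = clock();
    for (int p = 0; p < passes; p++) {
        buf[offset] = (unsigned char)p;
        *sink += sum_u32(buf, offset, count);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Hypothetical driver: times aligned (offset 0) against unaligned
   (offset 1) 32-bit loads over a 16 KiB buffer. Returns 0 on success. */
int compare_alignment(void)
{
    enum { COUNT = 4096, PASSES = 100000 };
    size_t bytes = (size_t)COUNT * sizeof(uint32_t) + sizeof(uint32_t);
    unsigned char *buf = malloc(bytes);
    if (buf == NULL)
        return -1;
    memset(buf, 1, bytes);

    uint32_t sink = 0;
    double aligned = time_sweeps(buf, 0, COUNT, PASSES, &sink);
    double unaligned = time_sweeps(buf, 1, COUNT, PASSES, &sink);

    /* Printing sink keeps the loads from being optimised away. */
    printf("aligned:   %.3f s\nunaligned: %.3f s\n(sink %u)\n",
           aligned, unaligned, (unsigned)sink);
    free(buf);
    return 0;
}
```

Calling `compare_alignment()` from `main` prints the two timings. Assuming 64-byte cache lines, one in every sixteen loads at offset 1 straddles a cache line boundary, so this sketch mixes the "unaligned within a line" and "unaligned across lines" cases. The numbers will also depend heavily on compiler flags (for example whether the loop gets vectorised) and on the CPU generation, so treat them as a rough indication only.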