Can someone provide an example where casting a pointer from one type to another fails due to misalignment?
In the comments to this answer, bothie states that doing s
If you read up on the Core i7 architecture (specifically, Intel's optimization literature), Intel has actually put a TON of hardware in there to make misaligned memory accesses nearly free. As far as I can tell, only a misalignment that crosses a cache-line boundary has any extra cost at all, and even then it is minimal. AMD also has very little trouble with misaligned accesses (cycle-wise), as far as I remember (it's been a while, though).
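To make "crosses a cache-line boundary" concrete: an access of `size` bytes at address `addr` splits when its offset within the line plus its size exceeds the line size. Below is a minimal sketch in C. It assumes 64-byte cache lines, which is typical for current Intel and AMD cores but is an assumption here. `crosses_boundary` and `ASSUMED_CACHE_LINE` are hypothetical names. Passing 4096 as the boundary gives the corresponding 4k page-split check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size: 64 bytes is common on x86 today, but it is not
   architectural; query the CPU if you need the real value. */
#define ASSUMED_CACHE_LINE 64u

/* True if an access of `size` bytes starting at `addr` touches two
   `boundary`-sized blocks (boundary must be a power of two). */
static bool crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    assert(size > 0 && size <= boundary);
    /* offset inside the block, plus the access length, past the block end? */
    return (addr & (boundary - 1)) + size > boundary;
}
```

For example, a 4-byte load at offset 61 of a line covers bytes 61 to 64 and therefore touches the next line. The same load at offset 60 stays inside one line.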
For what it's worth, I did set that flag in EFLAGS (the AC bit, alignment check) when I was getting carried away optimizing a project I was working on. It turns out that Windows is FULL of misaligned accesses. I was bombarded with so many of them in libraries and Windows code that I never managed to locate any in our own code, and I didn't have time to continue.
Perhaps the lesson is that when CPUs make things free or very low cost, programmers WILL become complacent and do things that carry a little extra overhead. Perhaps Intel's engineers did some of that investigation and found that typical x86 desktop software does millions of misaligned accesses per second, so they put incredibly fast misaligned-access hardware in the Core i7.
HTH
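char *foo = "....";
foo++;                  /* foo now points one byte past the start of the string */
int *bar = (int *)foo;  /* misaligned for int if the string was word-aligned */
int value = *bar;       /* the misaligned read happens here */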
The compiler would likely put the string on a word boundary. After you increment foo, it points at word+1, which is misaligned for an int pointer.
The storage foo points to is probably already aligned to an int boundary. Try this:
int bar = *(int *)(foo + 1);
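Whether foo + 1 is safe to use as an int * depends only on the address it holds. The small hypothetical helper `is_aligned` below makes that check explicit. Converting a pointer to `uintptr_t` is implementation-defined in ISO C. On mainstream compilers the low bits are the address bits, so this is a sketch that relies on that assumption.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of `align` (a power of two), i.e. it can be
   dereferenced as a type with that alignment requirement. */
static bool is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

With an int-aligned buffer, `is_aligned(buf, alignof(int))` holds. `is_aligned(buf + 1, alignof(int))` does not hold on any target where int needs more than 1-byte alignment.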
There is an additional condition, not mentioned above, for EFLAGS.AC to actually take effect: CR0.AM must also be set. That extra switch exists so that INT 17h doesn't trip on older OSes, predating the 486, that have no handler for this exception. Unfortunately, Windows does not set it by default, so you need to write a kernel-mode driver to set it.
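The gating can be summarized as a predicate over register values. This is only a hypothetical model for reasoning about register dumps. Nothing here reads CR0, which requires ring 0, and the names are made up. The model adds one condition from Intel's manuals that the comment doesn't mention: #AC is raised only for accesses made at CPL 3 (user mode). Both flags happen to live in bit 18 of their registers, which is also the 1<<18 used in the gdb and asm snippets in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_AM    (UINT32_C(1) << 18)  /* CR0 alignment mask */
#define EFLAGS_AC (UINT32_C(1) << 18)  /* EFLAGS alignment check */

/* Models when a misaligned data access raises #AC (vector 17):
   CR0.AM and EFLAGS.AC both set, and the access is made at CPL 3. */
static bool alignment_check_active(uint32_t cr0, uint32_t eflags, unsigned cpl)
{
    assert(cpl <= 3);
    return (cr0 & CR0_AM) != 0 && (eflags & EFLAGS_AC) != 0 && cpl == 3;
}
```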
Situations where unaligned access causes problems on x86 (beyond the memory access taking longer) are uncommon. Here are some of the ones I've heard about:
You might not count this as an x86 issue, but SSE operations benefit from alignment. Aligned data can be used directly as a memory source operand, which saves a separate load instruction. Packed legacy-SSE instructions require their 16-byte memory operands to be aligned. Unaligned-load instructions like movups are slower than movaps on microarchitectures before Nehalem. On Nehalem and later (and AMD Bulldozer-family), unaligned 16-byte loads/stores are about as efficient as unaligned 8-byte loads/stores. Each is a single uop with no penalty at all if the data happens to be aligned at runtime or doesn't cross a cache-line boundary, and there is efficient hardware support for cache-line splits otherwise. 4k splits (accesses that cross a page boundary) were very expensive (~100 cycles) before Skylake, which brought them down to ~10 cycles, like a cache-line split. See https://agner.org/optimize/ and the performance links in the x86 tag wiki for more info.
Interlocked operations (like lock add [mem], eax) are very slow if they aren't sufficiently aligned. This is especially true if they cross a cache-line boundary, because then they can't just use a cache lock inside the CPU core. On older (buggy) SMP systems, they might actually fail to be atomic (see https://blogs.msdn.com/oldnewthing/archive/2004/08/30/222631.aspx).
Another possibility, discussed by Raymond Chen, arises when dealing with devices that have hardware-banked memory (admittedly an oddball situation): https://blogs.msdn.com/oldnewthing/archive/2004/08/27/221486.aspx
I recall (but don't have a reference, so I'm not sure about this one) similar problems with unaligned accesses that straddle page boundaries and also involve a page fault. I'll see if I can dig up a reference for this.
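For the SSE item, the data has to actually be 16-byte aligned before movaps or a legacy-SSE memory operand can use it. Below is a minimal C11 sketch that obtains such storage on the heap, with `alloc_floats_16` as a hypothetical helper. The helper rounds the size up because aligned_alloc, at least as C11 originally specified it, requires the size to be a multiple of the alignment. For stack or static arrays, alignas(16) does the same job.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns heap storage for `count` floats whose address is a multiple of
   16, suitable for aligned 16-byte SSE loads/stores. Release with free(). */
static float *alloc_floats_16(size_t count)
{
    assert(count <= SIZE_MAX / sizeof(float) - 16);  /* no overflow below */
    size_t bytes = count * sizeof(float);
    bytes = (bytes + 15) & ~(size_t)15;  /* round up to a multiple of 16 */
    if (bytes == 0)
        bytes = 16;                      /* never request a zero-size block */
    return aligned_alloc(16, bytes);
}
```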
And I learned something new when looking into this question (I was wondering about the "$ps |= (1<<18)" GDB command that was mentioned in a couple of places). I didn't realize that x86 CPUs (starting with the 486, it seems) have the ability to raise an exception when a misaligned access is performed.
From Jeffrey Richter's "Programming Applications for Windows, 4th Ed":
Let's take a closer look at how the x86 CPU handles data alignment. The x86 CPU contains a special bit flag in its EFLAGS register called the AC (alignment check) flag. By default, this flag is set to zero when the CPU first receives power. When this flag is zero, the CPU automatically does whatever it has to in order to successfully access misaligned data values. However, if this flag is set to 1, the CPU issues an INT 17H interrupt whenever there is an attempt to access misaligned data. The x86 version of Windows 2000 and Windows 98 never alters this CPU flag bit. Therefore, you will never see a data misalignment exception occur in an application when it is running on an x86 processor.
This was news to me.
Of course the big problem with misaligned accesses is that when you eventually go to compile the code for a non-x86/x64 processor you end up having to track down and fix a whole bunch of stuff, since virtually all other 32-bit or larger processors are sensitive to alignment issues.
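When code has to survive such a port, the usual portable replacement for a cast like *(int *)(p + 1) is memcpy into a properly aligned local. Here is a minimal sketch, where `load_int` is a hypothetical name. memcpy has no alignment requirement and also sidesteps strict-aliasing problems. Compilers such as GCC and Clang typically turn a small fixed-size memcpy into a single load on x86, and into whatever is safe on strict-alignment targets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reads an int stored at any byte offset of buf, regardless of alignment. */
static int load_int(const unsigned char *buf, size_t offset)
{
    int value;  /* a properly aligned local */
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```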
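#include <stdio.h>

int main(int argc, char **argv)
{
    /* 4 bytes on x86, so the int read below stays in bounds; a char array
       is only guaranteed 1-byte alignment, so the read may be misaligned */
    char c[] = "abc";
    printf("%d\n", *(int*)(c));
    return 0;
}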
This gives me a SIGBUS after running "set $ps |= (1<<18)" in gdb. SIGBUS is apparently raised when address alignment is incorrect (amongst other reasons).
EDIT: It's fairly easy to raise SIGBUS:
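/* 32-bit x86 only (e.g. gcc -m32 -O0): the asm uses %esp and a 4-byte pushf */
int main(int argc, char **argv)
{
    /* EDIT: enable AC check (EFLAGS bit 18) */
    asm("pushf; "
        "orl $(1<<18), (%esp); "
        "popf;");
    /* in the build disassembled below, d (9 bytes) lands at the odd offset
       -0x19(%ebp), so the 4-byte stores that initialize it are misaligned */
    char c[] = "1234567";
    char d[] = "12345678";
    return 0;
}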
Looking at main's disassembly in gdb:
Dump of assembler code for function main:
....
0x08048406 <main+34>: mov 0x8048510,%eax
0x0804840b <main+39>: mov 0x8048514,%edx
0x08048411 <main+45>: mov %eax,-0x10(%ebp)
0x08048414 <main+48>: mov %edx,-0xc(%ebp)
0x08048417 <main+51>: movl $0x34333231,-0x19(%ebp) <== BAM! SIGBUS
0x0804841e <main+58>: movl $0x38373635,-0x15(%ebp)
0x08048425 <main+65>: movb $0x0,-0x11(%ebp)
Anyhow, Christoph, your test program fails under Linux, raising a SIGBUS as it should. It's probably a Windows thing?
You can enable the Alignment Check bit in code using this snippet:
/* enable AC check */
asm("pushf; "
"orl $(1<<18), (%esp); "
"popf;");
Also, ensure that the flag was indeed set:
/* read EFLAGS back (32-bit x86) and print 1 if AC is set, 0 otherwise */
unsigned int flags;
asm("pushf; "
    "movl (%%esp), %0; "
    "popf; " : "=r"(flags));
fprintf(stderr, "%u\n", (flags >> 18) & 1u);