Question
I recently got thinking about alignment... It's something that we don't ordinarily have to consider, but I've realized that some processors require objects to be aligned along 4-byte boundaries. What exactly does this mean, and which specific systems have alignment requirements?
Suppose I have an arbitrary pointer:
unsigned char* ptr
Now, I'm trying to retrieve a double value from a memory location:
double d = *((double*)ptr);
Is this going to cause problems?
Answer 1:
It can definitely cause problems on some systems.
For example, on ARM-based systems you cannot address a 32-bit word that is not aligned to a 4-byte boundary. Doing so will result in an access violation exception. On x86 you can access such non-aligned data, though the performance suffers a little since two words have to be fetched from memory instead of just one.
Answer 2:
Here's what the Intel x86/x64 Reference Manual says about alignments:
4.1.1 Alignment of Words, Doublewords, Quadwords, and Double Quadwords
Words, doublewords, and quadwords do not need to be aligned in memory on natural boundaries. The natural boundaries for words, double words, and quadwords are even-numbered addresses, addresses evenly divisible by four, and addresses evenly divisible by eight, respectively. However, to improve the performance of programs, data structures (especially stacks) should be aligned on natural boundaries whenever possible. The reason for this is that the processor requires two memory accesses to make an unaligned memory access; aligned accesses require only one memory access. A word or doubleword operand that crosses a 4-byte boundary or a quadword operand that crosses an 8-byte boundary is considered unaligned and requires two separate memory bus cycles for access.
Some instructions that operate on double quadwords require memory operands to be aligned on a natural boundary. These instructions generate a general-protection exception (#GP) if an unaligned operand is specified. A natural boundary for a double quadword is any address evenly divisible by 16. Other instructions that operate on double quadwords permit unaligned access (without generating a general-protection exception). However, additional memory bus cycles are required to access unaligned data from memory.
Don't forget, reference manuals are the ultimate source of information for the responsible developer and engineer, so if you're dealing with something well documented such as Intel CPUs, just look up what the reference manual says about the issue.
Answer 3:
Alignment affects the layout of structs. Consider this struct:
struct S {
char a;
long b;
};
On a 32-bit CPU the layout of this struct will often be:
a _ _ _ b b b b
The requirement is that a 32-bit value has to be aligned on a 32-bit boundary. If the struct is changed like this:
struct S {
char a;
short b;
long c;
};
the layout will be this:
a _ b b c c c c
The 16-bit value is aligned on a 16-bit boundary.
Sometimes you want to pack the structs, perhaps to match the struct with a data format. By using a compiler option or perhaps a #pragma you are able to remove the excess space:
a b b b b
a b b c c c c
However, accessing an unaligned member of a packed struct will often be much slower on modern CPUs, or may even result in an exception.
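The padding described above can be inspected with offsetof, sizeof and alignof. This is only a sketch: the struct names Padded2 and Padded3 and the function describe_layout are made up for illustration, and std::int16_t / std::int32_t stand in for short and long so the member sizes are fixed. The exact numbers printed depend on the compiler and target.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // std::int16_t, std::int32_t
#include <cstdio>    // std::printf

// Hypothetical stand-in for the first struct S (char followed by a 32-bit value).
struct Padded2 {
    char a;
    std::int32_t b;
};

// Hypothetical stand-in for the second struct S (char, 16-bit value, 32-bit value).
struct Padded3 {
    char a;
    std::int16_t b;
    std::int32_t c;
};

// Prints where the compiler placed each member; the gaps are the padding.
void describe_layout() {
    std::printf("Padded2: size=%zu align=%zu offset(b)=%zu\n",
                sizeof(Padded2), alignof(Padded2), offsetof(Padded2, b));
    std::printf("Padded3: size=%zu align=%zu offset(b)=%zu offset(c)=%zu\n",
                sizeof(Padded3), alignof(Padded3),
                offsetof(Padded3, b), offsetof(Padded3, c));
}
```

Calling describe_layout() on your own platform shows whether it matches the diagrams above. Whatever the numbers are, each member's offset is a multiple of that member's alignment.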
Answer 4:
Yes, that can cause a number of problems. The C++ standard doesn't actually guarantee that it'll work. You can't just arbitrarily cast between pointer types.
When you cast a char pointer to a double pointer, it uses a reinterpret_cast, which applies an implementation-defined mapping. You're not guaranteed that the resulting pointer will contain the same bit pattern, or that it will point to the same address or, well, anything else. In more practical terms, you're also not guaranteed that the value you're reading is aligned properly. If the data was written as a series of chars, then they will use char's alignment requirements.
As for what alignment means, it essentially just means that the starting address of the value should be divisible by the alignment size. Address 16 is aligned on 1, 2, 4, 8 and 16-byte boundaries, for example, so on typical CPUs, values of these sizes can be stored there.
Address 6 isn't aligned on a 4-byte boundary, so we should not store 4-byte values there.
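The divisibility rule can be written down as a small check. This is a sketch, not a standard facility: is_aligned is a hypothetical helper, and it assumes that converting a pointer to std::uintptr_t yields the numeric address, which holds on mainstream platforms but is implementation-defined.

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t

// Hypothetical helper: true if the address held in `p` is a multiple of
// `alignment` bytes. Assumes the pointer-to-integer conversion gives the
// plain numeric address (implementation-defined, true on common platforms).
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

For a buffer declared with alignas(16), is_aligned(buf, 4) holds while is_aligned(buf + 6, 4) does not, which mirrors the address 16 and address 6 cases above.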
It's worth noting that even on CPUs that don't enforce or require alignment, you typically still get a significant slowdown from accessing unaligned values.
Answer 5:
Yes, that could cause problems.
4-alignment simply means that the pointer, when considered as a numeric address, is a multiple of 4. If the pointer is not a multiple of the required alignment, then it is unaligned. There are two reasons why compilers place alignment restrictions on certain types:
1. Because the hardware cannot load that datatype from an unaligned pointer (at least, not using the instructions which the compiler wants to emit for loads and stores).
2. Because the hardware loads that datatype more quickly from aligned pointers.
If you're in case (1), and double is 4-aligned, and you try your code with a char * pointer which is not 4-aligned, then you'll most likely get a hardware trap. Some hardware does not trap. It just loads a nonsense value and continues. However, the C++ standard doesn't define what can happen (undefined behavior), so this code could set your computer on fire.
On x86, you're never in case (1), because the standard load instructions can handle unaligned pointers. On ARM, there are no unaligned loads, and if you attempt one then your program crashes, if you're lucky (some ARMs silently fail).
Coming back to your example, the question is why you're trying this with a char * that isn't 4-aligned. If you successfully wrote a double there via a double *, then you'll be able to read it back. So if you originally had a "proper" pointer to double, which you cast to char * and you're now casting back, you don't have to worry about alignment.
But you said arbitrary char *, so I guess that's not what you have. If you read a chunk of data out of a file, which contains a serialized double, then you must ensure that the alignment requirements for your platform are met in order to do this cast. If you have 8 bytes representing a double in some file format, then you cannot just read it willy-nilly into a char * buffer at any offset and then cast to double *.
The easiest way to do this is to make sure that you read the file data into a suitable struct. You're also helped by the fact that memory allocations are always aligned to the maximum alignment requirement of any type they're big enough to contain. So if you allocate a buffer big enough to contain a double, then the start of that buffer has whatever alignment is required by double. So then you can read the 8 bytes representing the double into the start of the buffer, cast (or use a union) and read the double out.
Alternatively, you could do something like this:
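#include <algorithm>  // std::copy
#include <cstring>    // std::memcpy

double readUnalignedDouble(const char *un_ptr) {
    double d;
    // Either of these copies the bytes into d; only one is needed.
    std::memcpy(&d, un_ptr, sizeof(d));
    // std::copy(un_ptr, un_ptr + sizeof(d), reinterpret_cast<char *>(&d));
    return d;
}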
This is guaranteed to be valid (assuming un_ptr really points to the bytes of a valid double representation for your platform), because double is POD and hence can be copied byte-by-byte. It may not be the fastest solution, if you have a lot of doubles to load.
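To illustrate the file-buffer case, here is a minimal sketch that stores a double at an odd offset in a byte buffer and reads it back without ever forming a misaligned double *. The helper names write_double_at and read_double_at are hypothetical, and the sketch assumes the bytes were produced on the same platform, so representation and byte order match.

```cpp
#include <cstddef>   // std::size_t
#include <cstring>   // std::memcpy

// Hypothetical helper: copies the bytes of `value` into `buf` at `offset`.
// The destination does not need to be aligned for double.
void write_double_at(unsigned char* buf, std::size_t offset, double value) {
    std::memcpy(buf + offset, &value, sizeof value);
}

// Hypothetical helper: reads a double whose bytes start at `buf + offset`.
// The caller must ensure at least sizeof(double) bytes are available there.
double read_double_at(const unsigned char* buf, std::size_t offset) {
    double d;
    std::memcpy(&d, buf + offset, sizeof d);
    return d;
}
```

For example, after write_double_at(buf, 3, 3.25), the call read_double_at(buf, 3) returns 3.25 even though offset 3 is not a multiple of any plausible alignment for double. Compilers commonly turn such a fixed-size memcpy into a plain load on targets that allow unaligned access, though that is an optimization, not a guarantee.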
If you are reading from a file, there's actually a bit more to it than that if you're worried about platforms with non-IEEE double representations, or with 9 bit bytes, or some other unusual properties, where there might be non-value bits in the stored representation of a double. But you didn't actually ask about files, I just made it up as an example, and in any case those platforms are much rarer than the issue you're asking about, which is for double to have an alignment requirement.
Finally, nothing at all to do with alignment, you also have strict aliasing to worry about if you got that char * via a cast from a pointer which is not alias-compatible with double *. Aliasing is valid between char * itself and anything else, though.
Answer 6:
On the x86 it's always going to run, of course more efficiently when aligned.
But if you're multithreading, then watch for read-write tearing. With a 64-bit value you need an x64 machine to give you atomic reads and writes between threads.
If, say, one thread is incrementing the value from 0x00000000.FFFFFFFF to 0x00000001.00000000, then another thread reading it at that moment might in theory see either 0 or 0x00000001.FFFFFFFF, especially if the value straddles a cache-line boundary.
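One way to avoid torn reads in C++11 and later is to put the 64-bit value in a std::atomic, which is a technique from the standard library and not something the answer above prescribes. The SharedCounter type below is a hypothetical illustration; whether the operations are lock-free on a given target can be checked with is_lock_free().

```cpp
#include <atomic>    // std::atomic
#include <cstdint>   // std::uint64_t

// Hypothetical wrapper around a 64-bit counter shared between threads.
// std::atomic guarantees that a reader sees either the old or the new
// value of each update, never a mix of the two 32-bit halves.
struct SharedCounter {
    std::atomic<std::uint64_t> value{0};

    // Atomically adds one; safe to call from several threads at once.
    void increment() { value.fetch_add(1); }

    // Atomically reads the whole 64-bit value.
    std::uint64_t read() const { return value.load(); }
};
```

With this wrapper, incrementing from 0x00000000FFFFFFFF yields 0x0000000100000000 for every observer, and std::atomic also keeps the object suitably aligned for the target.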
I recommend Duffy's "Concurrent Programming on Windows" for its nice discussion of memory models, even mentioning alignment gotchas on multiprocessors when .NET does a GC. You want to stay away from the Itanium!
Answer 7:
SPARC (Solaris machines) is another architecture (at least some in times past) that will choke (give a SIGBUS error) if you try to use an unaligned value.
An addendum to Martin York: malloc also is aligned to the largest possible type, i.e., it's safe for everything, like 'new'. In fact, frequently 'new' just uses malloc.
Answer 8:
An example of an alignment requirement is when using vectorization (SIMD) instructions. (They can be used without alignment, but it is much faster if you use the kind of instruction which requires alignment.)
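As a sketch of how data can be prepared for the aligned kind of SIMD load, the type below requests 16-byte alignment with alignas. Vec4 and add are hypothetical names, and the code uses no intrinsics: whether the compiler actually emits aligned vector instructions for the loop is an assumption that depends on the compiler and its flags.

```cpp
#include <cstdint>   // std::uintptr_t (used when checking addresses)

// Hypothetical 4-float vector whose storage always starts on a
// 16-byte boundary, the natural boundary for a double quadword.
struct alignas(16) Vec4 {
    float v[4];
};

// Element-wise sum. A vectorizing compiler may use one aligned 128-bit
// load per operand here, but that is not guaranteed.
Vec4 add(const Vec4& a, const Vec4& b) {
    Vec4 r;
    for (int i = 0; i < 4; ++i) {
        r.v[i] = a.v[i] + b.v[i];
    }
    return r;
}
```

Because the alignment is part of the type, every Vec4 object, including array elements and struct members, satisfies it without manual checks.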
Answer 9:
Enforced memory alignment is much more common in RISC based architectures such as MIPS.
The main thinking for these types of processors, AFAIK, is really a speed issue.
RISC methodology was all about having a set of simple and fast instructions (usually one memory cycle per instruction). This does not necessarily mean that it has fewer instructions than a CISC processor, more that it has simpler, faster instructions.
Many MIPS processors, although 8 byte addressable, would do a word-aligned load (32 bits typically, but not always) and then mask off the appropriate bits.
The idea being that it is faster to do an aligned load plus a bit mask than to try to do an unaligned load.
Typically ( and of course this really depends on chipset ), doing an un-aligned load would generate a bus error so RISC processors would offer an 'unaligned load/store' instruction but this would often be much slower than the corresponding aligned load/store.
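The "aligned load plus mask" idea can be sketched in software. The function below is a hypothetical illustration, not how any particular MIPS core works: it models memory as an array of aligned 32-bit words, numbers bytes little-endian within each word, and builds an unaligned 32-bit value from at most two aligned word reads using shifts.

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uint32_t

// Hypothetical helper: returns the 32-bit value that starts at byte offset
// `byte_off`, using only aligned reads of `words`. Assumes little-endian
// byte numbering inside each word, and that words[byte_off / 4 + 1] exists
// whenever byte_off is not a multiple of 4.
std::uint32_t load_unaligned_le(const std::uint32_t* words, std::size_t byte_off) {
    const std::size_t idx = byte_off / 4;          // aligned word holding the first byte
    const unsigned shift = (byte_off % 4) * 8;     // bit position of that byte in the word
    const std::uint32_t lo = words[idx];
    if (shift == 0) {
        return lo;                                 // already aligned: one read is enough
    }
    const std::uint32_t hi = words[idx + 1];       // second aligned read
    return (lo >> shift) | (hi << (32 - shift));   // stitch the two halves together
}
```

With words {0x44332211, 0x88776655}, offset 0 gives 0x44332211 and offset 1 gives 0x55443322. The second case needs two reads, which is the extra cost the unaligned path pays.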
Of course this still doesn't answer the question as to why they do this, i.e., what advantage does having memory word aligned give you?
I'm no hardware expert and I'm sure someone on here can give a better answer but my two best guesses are:
1. It can be much faster to fetch from the cache when word aligned because many caches are organised into cache-lines ( anything from 8 to 512 bytes ) and as cache memory is typically much more expensive than RAM, you want to make the most of it.
2. It may be much faster to access each memory address as it allows you to read through 'Burst Mode' (i.e., fetching the next sequential address before it's needed).
Note that none of the above is strictly impossible with non-aligned stores. I'm guessing (though I don't know) that a lot of it comes down to hardware design choices and cost.
Source: https://stackoverflow.com/questions/1237963/alignment-along-4-byte-boundaries