Is 8-byte alignment for “double” type necessary?

前端未结

关注

 5  819

滥情空心

I understand word-alignment, which makes the cpu only need to read once when reading an integer into a register.

But is 8-byte alignment (let\'s assume 32bit system

相关标签:

5条回答

悲哀的现实

2021-01-20 03:57

I just found the answer:

"6. When memory reading is efficient in reading 4 bytes at a time on 32 bit machine, why should a double type be aligned on 8 byte boundary?

It is important to note that most of the processors will have math co-processor, called Floating Point Unit (FPU). Any floating point operation in the code will be translated into FPU instructions. The main processor is nothing to do with floating point execution. All this will be done behind the scenes.

As per standard, double type will occupy 8 bytes. And, every floating point operation performed in FPU will be of 64 bit length. Even float types will be promoted to 64 bit prior to execution.

The 64 bit length of FPU registers forces double type to be allocated on 8 byte boundary. I am assuming (I don’t have concrete information) in case of FPU operations, data fetch might be different, I mean the data bus, since it goes to FPU. Hence, the address decoding will be different for double types (which is expected to be on 8 byte boundary). It means, the address decoding circuits of floating point unit will not have last 3 pins."

0 讨论(0)
发布评论:

提交评论
- 加载中...
轻奢々

2021-01-20 04:03

Edited:

The advantage of byte alignment is to reduce the number of memory cycles to retrieve the data. For example, an 8 byte which might take a single cycle if it is aligned might now take 2 cycles since a part of it is obtained the first time and the second part in the next memory cycle.

I came across this: "Aligned access is faster because the external bus to memory is not a single byte wide - it is typically 4 or 8 bytes wide (or even wider). So the CPU doesn't fetch a single byte at a time - it fetches 4 or 8 bytes starting at the requested address. Therefore, the 2 or 3 least significant bits of the memory address are not actually sent by the CPU - the external memory can only be read or written at addresses that are a multiple of the bus width. If you requested a byte at address "9", the CPU would actually ask the memory for the block of bytes beginning at address 8, and load the second one into your register (discarding the others).

This implies that a misaligned access can require two reads from memory: If you ask for 8 bytes beginning at address 9, the CPU must fetch the 8 bytes beginning at address 8 as well as the 8 bytes beginning at address 16, then mask out the bytes you wanted. On the other hand, if you ask for the 8 bytes beginning at address 8, then only a single fetch is needed. Some CPUs will not even perform such a misaligned load - they will simply raise an exception (or even silently load the wrong data!)."

You might see this link for more details. http://www.ibm.com/developerworks/library/pa-dalign/

0 讨论(0)
发布评论:

提交评论
- 加载中...
猫巷女王i

2021-01-20 04:09
- I seem to remember that the recommendation for 486 was to align double on 32 bits boundaries, so requiring 64 bits alignment is not mandatory.
- You seem to think that there is a relationship between the data bus width and the processor bitness. While it is often the case, you can find variation in both direction. For instance the Pentium was a 32-bit processor, but its data bus size was 64 bits.
- Caches offer something else which may explain the usefulness of having 64-bit alignment for 64-bit types. Here the external bus is not a factor, it is the cache line size which is important. Data crossing the line cache is costlier to access than data not crossing it (even if it is unaligned in both cases). Aligning types on their size makes it sure that they won't cross cache lines as long as cache line size is a multiple of the type size.
0 讨论(0)
发布评论:

提交评论
- 加载中...
既然无缘

2021-01-20 04:17
There are multiple hardware components that may be adversely affected by unaligned loads or stores.
- The interface to memory might be eight bytes wide and only able to access memory at multiples of eight bytes. Loading an unaligned eight-byte double then requires two reads on the bus. Stores are worse, because an aligned eight-byte store can simply write eight bytes to memory, but an unaligned eight-byte store must read two eight-byte pieces, merge the new data with the old data, and write two eight-byte pieces.
- Cache lines are typically 32 or 64 bytes. If eight-byte objects are aligned to multiples of eight bytes, then each object is in just one cache line. If they are unaligned, then some of the objects are partly in one cache line and partly in another. Loading or storing these objects then requires using two cache lines instead of one. This effect occurs at all levels of cache (three levels is not uncommon in modern processors).
- Memory system pages are typically 512 bytes or more. Again, each aligned object is in just one page, but some unaligned objects are in multiple pages. Each page that is accessed requires hardware resources: The virtual address must be translated to a physical address, this may require accessing translation tables, and address collisions must be detected. (Processors may have multiple load and store operations in operation simultaneously. Even though your program may appear to be single-threaded, the processor reads instructions in advance and tries to execute those that it can. So a processor may start a load instruction before preceding instructions have completed. However, to be sure this does not cause an error, the processor checks each load instruction to be sure it is not loading from an address that a prior store instruction is changing. If an access crosses a page boundary, the two parts of the loaded data have to be checked separately.)
The response of the system to unaligned operations varies from system to system. Some systems are designed to support only aligned accesses. In these cases, unaligned accesses either cause exceptions that lead to program termination or exceptions that cause execution of special handlers that emulate unaligned operations in software (by performing aligned operations and merging the data as necessary). Software handlers such as these are much slower than hardware operations.

Some systems support unaligned accesses, but this usually consumes more hardware resources than aligned accesses. In the best case, the hardware performs two operations instead of one. But some hardware is designed to start operations as if they were aligned and then, upon discovering the operation is not aligned, to abort it and start over using different paths in the hardware to handle the unaligned operation. In such systems, unaligned accesses have a significant performance penalty, although it is not as great as in systems where software handles unaligned accesses.

In some systems, the hardware may have multiple load-store execution units that can perform the two operations required of unaligned accesses just as quickly as one unit can perform the operation of aligned accesses. So there is no direct performance degradation of unaligned accesses. However, because multiple execution units are kept busy by unaligned accesses, they are unavailable to perform other operations. Thus, programs that perform many load-store operations, normally in parallel, will execute more slowly with unaligned accesses than with aligned accesses.
0 讨论(0)
发布评论:

提交评论
- 加载中...
爱一瞬间的悲伤

2021-01-20 04:20

On many architectures, unaligned access of any load/store unit (short, int, long) is simply an exception. Compilers are responsible for ensuring it doesn't happen on potentially mis-aligned data, by emitting smaller access instructions and re-assembling in registers if they can't prove a given pointer is OK.

Performance-wise, 8-byte alignment of doubles on 32-bit systems can be valuable for a few reasons. The most apparent is that 4-byte alignment of an 8-byte double means that one element could cross the boundary of two cache lines. Memory access occurs in units of whole cache lines, and so misalignment doubles the cost of access.

0 讨论(0)
发布评论:

提交评论
- 加载中...