Why is dynamically allocated memory always 16 bytes aligned?

Question

I wrote a simple example:

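#include <cstdlib>   // std::malloc, std::free
#include <iostream>
#include <new>       // ::operator new, ::operator delete

int main() {
    // Three separate 1-byte allocations: two through operator new, one through malloc.
    void* byte1 = ::operator new(1);
    void* byte2 = ::operator new(1);
    void* byte3 = std::malloc(1);
    std::cout << "byte1: " << byte1 << std::endl;
    std::cout << "byte2: " << byte2 << std::endl;
    std::cout << "byte3: " << byte3 << std::endl;
    // Release each block with the deallocator that matches its allocator.
    ::operator delete(byte1);
    ::operator delete(byte2);
    std::free(byte3);
    return 0;
}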

Running the example, I get the following results:

byte1: 0x1f53e70
byte2: 0x1f53e90
byte3: 0x1f53eb0

Each time I allocate a single byte of memory, it's always 16 bytes aligned. Why does this happen?

I tested this code on GCC 5.4.0 as well as GCC 7.4.0, and got the same results.

Answer 1:

Why does this happen?

Because the standard says so. More specifically, it says that the dynamic allocations¹ are aligned to at least the maximum fundamental² alignment (it may have stricter alignment). There is a pre-defined macro (since C++17) just for the purpose of telling you exactly what this guaranteed alignment is: __STDCPP_DEFAULT_NEW_ALIGNMENT__. Why this might be 16 in your example... that is a choice of the language implementation, restricted by what is allowed by the target hardware architecture.

This is (was) a necessary design, considering that there is (was) no way to pass information about the needed alignment to the allocation function (until C++17 which introduced aligned-new syntax for the purpose of allocating "over-aligned" memory).

malloc doesn't know anything about the types of objects that you intend to create in the memory. One might think that new could in theory deduce the alignment since it is given a type... but what if you wanted to reuse that memory for other objects with stricter alignment, as for example in an implementation of std::vector? And once you know the API of operator new, void* operator new ( std::size_t count ), you can see that neither the type nor its alignment is an argument that could affect the alignment of the allocation.
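To make the aligned-new form mentioned above concrete, here is a minimal sketch assuming a C++17 compiler. The type CacheLine and the function over_aligned_new_is_honoured are hypothetical names chosen for illustration; 64 is just an example alignment larger than the usual default.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical over-aligned type: its alignment exceeds the usual default new alignment.
struct alignas(64) CacheLine {
    unsigned char bytes[64];
};

// Returns true if both the raw and the typed aligned-new forms
// produce addresses that are multiples of the requested alignment.
inline bool over_aligned_new_is_honoured() {
    // Raw form: the alignment is passed explicitly as a second argument.
    void* raw = ::operator new(1, std::align_val_t{64});
    const bool raw_ok = reinterpret_cast<std::uintptr_t>(raw) % 64 == 0;
    ::operator delete(raw, std::align_val_t{64});  // must pass the same alignment back

    // Typed form: for an over-aligned type the compiler selects the aligned overload itself.
    CacheLine* line = new CacheLine;
    const bool typed_ok = reinterpret_cast<std::uintptr_t>(line) % alignof(CacheLine) == 0;
    delete line;

    return raw_ok && typed_ok;
}
```

Calling over_aligned_new_is_honoured() from main should return true on a conforming C++17 implementation. Note that the plain one-argument operator new still has no way to learn the alignment, which is the point made above.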

¹ Made by the default allocator, or malloc family of functions.

² The maximum fundamental alignment is alignof(std::max_align_t). No fundamental type (arithmetic types, pointers) has stricter alignment than this.
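As a rough way to check the guarantee described in this answer on your own toolchain, the sketch below compares an address against the default new alignment. The helper names (kDefaultNewAlignment, observed_alignment, meets_default_new_alignment) are made up for illustration, and falling back to alignof(std::max_align_t) on pre-C++17 compilers is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Alignment promised by plain operator new. The macro exists since C++17;
// the fallback for older compilers is an assumption of this sketch.
#ifdef __STDCPP_DEFAULT_NEW_ALIGNMENT__
constexpr std::size_t kDefaultNewAlignment = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#else
constexpr std::size_t kDefaultNewAlignment = alignof(std::max_align_t);
#endif

// Largest power of two that divides the address, i.e. the alignment it happens to have.
// Returns 0 for a null pointer.
inline std::size_t observed_alignment(const void* p) {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    return static_cast<std::size_t>(addr & (~addr + 1));  // isolates the lowest set bit
}

// True if the address satisfies the alignment that plain operator new guarantees.
inline bool meets_default_new_alignment(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kDefaultNewAlignment == 0;
}
```

Applied to the addresses from the question (0x1f53e70, 0x1f53e90, 0x1f53eb0), observed_alignment gives 16 for each, because each address ends in exactly one hexadecimal zero.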

Answer 2:

It's probably a consequence of how the memory allocator gets the necessary information to the deallocation function. A deallocation function (such as free or the general, global operator delete) takes exactly one argument, the pointer to the allocated memory, with no indication of the size of the block that was requested (or the size that was actually allocated, if that is larger). That information (and much more) therefore has to reach the deallocation function in some other form.

The simplest yet efficient approach is to allocate room for that additional information plus the requested bytes, and return a pointer to the end of the information block; let's call it IB. The size and alignment of IB automatically align the address returned by either malloc or operator new, even if you allocate a minuscule amount: the real amount allocated by malloc(s) is sizeof(IB)+s.

For such small allocations the approach is relatively wasteful and other strategies might be used, but having multiple allocation methods complicates deallocation, as the function must first determine which method was used.
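The information-block scheme described in this answer can be sketched as a toy wrapper around malloc. This is only an illustration under stated assumptions, not how any particular allocator is implemented: InfoBlock, toy_alloc, toy_size and toy_free are hypothetical names, and the header here stores nothing but the requested size.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical information block (IB) stored directly in front of the user's bytes.
// Aligning it like std::max_align_t makes sizeof(InfoBlock) a multiple of that alignment,
// so the address just past it is suitably aligned as well.
struct alignas(std::max_align_t) InfoBlock {
    std::size_t requested;  // size the caller asked for
};

// Allocates sizeof(InfoBlock) + s bytes and returns a pointer to the end of the IB.
inline void* toy_alloc(std::size_t s) {
    void* raw = std::malloc(sizeof(InfoBlock) + s);
    if (raw == nullptr) return nullptr;
    InfoBlock* ib = new (raw) InfoBlock{s};  // construct the header in place
    return ib + 1;                           // first byte after the header
}

// Recovers the requested size from nothing but the user pointer.
inline std::size_t toy_size(const void* p) {
    return (static_cast<const InfoBlock*>(p) - 1)->requested;
}

// Steps back over the header to find the address malloc originally returned.
inline void toy_free(void* p) {
    if (p != nullptr) std::free(static_cast<InfoBlock*>(p) - 1);
}
```

With this layout, toy_alloc(1) consumes at least sizeof(InfoBlock) + 1 bytes, which shows why tiny requests are relatively wasteful under this approach.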

Answer 3:

There are actually two reasons. The first is that some kinds of objects have alignment requirements. Usually these requirements are soft: a misaligned access is "just" slower (possibly by orders of magnitude). They can also be hard: on the PPC, for instance, you simply could not access a vector in memory if that vector was not aligned to 16 bytes. Alignment is not optional; it must be considered whenever memory is allocated. Always.

Note that there is no way to specify an alignment to malloc(). There's simply no argument for it. As such, malloc() must be implemented to provide a pointer that is correctly aligned for any purposes on the platform. The ::operator new() in C++ follows the same principle.

How much alignment is needed is fully platform dependent. On a PPC, there is no way that you can get away with less than 16 bytes alignment. X86 is a bit more lenient in this, afaik.

The second reason is the inner workings of an allocator function. Typical implementations have an allocator overhead of at least 2 pointers: Whenever you request a byte from malloc() it will usually need to allocate space for at least two additional pointers to do its own bookkeeping (the exact amount depends on the implementation). On a 64 bit architecture, that's 16 bytes. As such, it is not sensible for malloc() to think in terms of bytes, it's more efficient to think in terms of 16 byte blocks. At least. You see that with your example code: The resulting pointers are actually 32 bytes apart. Each memory block occupies 16 bytes payload + 16 bytes internal bookkeeping memory.
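The arithmetic in the paragraph above can be written out as a small model. This is a sketch of the answer's simplified picture only (payload rounded up to a 16-byte granule plus two pointers of bookkeeping); real allocators lay out their blocks differently, and the names kGranule, kBookkeeping, round_up and modelled_block_size are hypothetical.

```cpp
#include <cstddef>

// Assumed granule: the allocator hands out payload in multiples of 16 bytes.
constexpr std::size_t kGranule = 16;

// Assumed bookkeeping overhead: two pointers per block (16 bytes on a 64-bit platform).
constexpr std::size_t kBookkeeping = 2 * sizeof(void*);

// Rounds n up to the next multiple of `multiple` (which must be non-zero).
constexpr std::size_t round_up(std::size_t n, std::size_t multiple) {
    return (n + multiple - 1) / multiple * multiple;
}

// Total bytes one block occupies in this simplified model.
constexpr std::size_t modelled_block_size(std::size_t request) {
    return round_up(request, kGranule) + kBookkeeping;
}
```

On a 64-bit platform modelled_block_size(1) is 16 + 16 = 32, which matches the 0x20 distance between the consecutive addresses printed in the question.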

Since the allocators request entire memory pages from the kernel (4096 bytes, 4096 bytes aligned!), the resulting memory blocks are naturally 16 bytes aligned on a 64 bit platform. It's simply not practical to provide less aligned memory allocations.

Taken together, these two reasons make it both practical and required for an allocator function to provide seriously aligned memory blocks. The exact amount of alignment depends on the platform, but will usually not be less than the size of two pointers.

Answer 4:

Why does this happen?

Because in the general case the library does not know what kind of data you are going to store in that memory, so the memory has to be aligned for the most strictly aligned data type on that platform. If you store data unaligned, you pay a significant hardware performance penalty. On some platforms you will even get a segfault if you try to access unaligned data.
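To see which types a default allocation is already suitable for, one can compare alignof(T) against alignof(std::max_align_t) at compile time. This is a small sketch; fits_default_alignment and OverAligned are hypothetical names introduced for the example.

```cpp
#include <cstddef>

// True if an object of type T can live in memory that is aligned
// only as strictly as std::max_align_t.
template <typename T>
constexpr bool fits_default_alignment() {
    return alignof(T) <= alignof(std::max_align_t);
}

// Hypothetical type that deliberately asks for twice the maximum fundamental alignment.
struct alignas(2 * alignof(std::max_align_t)) OverAligned {
    unsigned char bytes[2 * alignof(std::max_align_t)];
};

// Scalar types never need more than std::max_align_t provides.
static_assert(fits_default_alignment<char>(), "char fits");
static_assert(fits_default_alignment<long double>(), "long double fits");
static_assert(fits_default_alignment<void*>(), "pointers fit");

// An explicitly over-aligned type does not.
static_assert(!fits_default_alignment<OverAligned>(), "over-aligned type needs more");
```

The checks run at compile time, so simply compiling the snippet is enough to confirm them on a given platform.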

Answer 5:

It is due to the platform. On x86 alignment isn't necessary, but it improves the performance of memory operations. As far as I know it makes no difference on newer models, but the compiler still goes for the optimum. On other platforms misalignment is fatal: for example, accessing a long that is not properly aligned on an m68k processor will crash.
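When data really does sit at an arbitrary offset (inside a byte buffer, say), a common way to avoid a misaligned access of the kind described above is to copy the bytes with std::memcpy instead of dereferencing a cast pointer. A minimal sketch, with load_u32 and store_u32 as hypothetical helper names:

```cpp
#include <cstdint>
#include <cstring>

// Reads a 32-bit value from an address with no alignment requirement.
// memcpy copies byte by byte as far as the language is concerned,
// so no misaligned 32-bit access is ever expressed in the source.
inline std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}

// Writes a 32-bit value to an address with no alignment requirement.
inline void store_u32(unsigned char* p, std::uint32_t value) {
    std::memcpy(p, &value, sizeof value);
}
```

For example, store_u32(buffer + 1, v) followed by load_u32(buffer + 1) round-trips v even though buffer + 1 is generally not 4-byte aligned.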

Answer 6:

It isn't. It depends on the OS/CPU requirements. On 32-bit versions of Linux/Win32, allocated memory is always 8-byte aligned. On 64-bit versions of Linux/Win32, since all 64-bit CPUs have SSE2 at a minimum, it made sense at the time to align all memory to 16 bytes (because working with SSE2 was less efficient on unaligned memory). With the latest AVX-based CPUs, this performance penalty for unaligned memory has been removed, so in principle they could allocate on any boundary.

If you think about it, aligning the addresses of memory allocations to 16 bytes leaves 4 bits of blank space in the pointer address. This may be useful internally for storing some additional flags (e.g. readable, writable, executable, etc.).
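The spare low bits mentioned above are the basis of what is usually called pointer tagging. The sketch below assumes the pointer is 16-byte aligned, so its low 4 bits are zero; pack, pointer_of and tag_of are hypothetical names, and this is not a claim about what any particular OS or allocator actually stores there.

```cpp
#include <cassert>
#include <cstdint>

// A 16-byte-aligned address always has its low 4 bits clear.
constexpr std::uintptr_t kTagMask = 0xF;

// Stores a small tag (0..15) in the unused low bits of an aligned pointer.
inline std::uintptr_t pack(void* p, unsigned tag) {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    assert((addr & kTagMask) == 0 && "pointer must be 16-byte aligned");
    assert(tag <= kTagMask && "tag must fit in 4 bits");
    return addr | tag;
}

// Recovers the original pointer by clearing the tag bits.
inline void* pointer_of(std::uintptr_t packed) {
    return reinterpret_cast<void*>(packed & ~kTagMask);
}

// Recovers the tag from the low 4 bits.
inline unsigned tag_of(std::uintptr_t packed) {
    return static_cast<unsigned>(packed & kTagMask);
}
```

The packed value must be passed through pointer_of before it is dereferenced; using it directly as a pointer would refer to the wrong address.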

At the end of the day, the reasoning is entirely dictated by the OS and/or hardware requirements. It's nothing to do with the language.

Source: https://stackoverflow.com/questions/59098246/why-is-dynamically-allocated-memory-always-16-bytes-aligned

Tags

c++

dynamic-memory-allocation

memory-alignment