There are a few options for acquiring an aligned block of memory, but they're very similar, and the issue mostly boils down to which language standard and platforms you're targeting.
It's possible to take an existing C compiler that does not presently happen to use the identifiers `_mm_malloc` and `_mm_free`, and define functions with those names that will behave as required. This could be done in one of two ways. The first is to have `_mm_malloc` act as a wrapper on `malloc()`: it asks for a slightly oversized allocation, finds the first suitably aligned address within it that is at least one byte from the beginning, and stores the number of bytes skipped immediately before that address. The second is to have `_mm_malloc` request large chunks of memory from `malloc()` and then dispense them piecemeal. In either case, the pointers returned by `_mm_malloc()` would not be pointers that `free()` would generally know how to do anything with. With the first approach, calling `_mm_free` would use the byte immediately preceding the allocation to find the real start of the block received from `malloc`, and then pass that to `free`.
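A minimal sketch of the first approach, assuming a C99-or-later compiler. The names `my_mm_malloc`/`my_mm_free` are hypothetical stand-ins, not Intel's implementation. Because the skip count is kept in a single byte, this sketch only accepts power-of-two alignments up to 128.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names: an illustration of the "skip count byte" technique. */
void *my_mm_malloc(size_t size, size_t align)
{
    /* align must be a power of two; one byte can record at most 255 skipped bytes */
    if (align == 0 || (align & (align - 1)) != 0 || align > 128)
        return NULL;
    if (size > SIZE_MAX - align)
        return NULL;
    /* Over-allocate by `align`: we skip between 1 and `align` bytes. */
    unsigned char *raw = malloc(size + align);
    if (raw == NULL)
        return NULL;
    uintptr_t start = (uintptr_t)raw + 1;            /* leave room for the count byte */
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    unsigned char *p = (unsigned char *)aligned;
    p[-1] = (unsigned char)(p - raw);                /* bytes skipped, in 1..align */
    return p;
}

void my_mm_free(void *ptr)
{
    if (ptr == NULL)
        return;
    unsigned char *p = ptr;
    free(p - p[-1]);                                 /* recover malloc's original pointer */
}
```

A pointer from `my_mm_malloc` must go back through `my_mm_free`; passing it to plain `free()` would hand `malloc` an address it never returned.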
If an aligned-allocate function is allowed to use the internals of the `malloc` and `free` functions, however, that may eliminate the need for the extra layer of wrapping. It's possible to write `_mm_malloc()`/`_mm_free()` functions that wrap `malloc`/`free` without knowing anything about their internals, but doing so requires `_mm_malloc()` to keep book-keeping information that is separate from that used by `malloc`/`free`.
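One way to keep that separate book-keeping, sketched under the assumption that a power-of-two alignment is requested: store the pointer originally returned by `malloc` in the slot just below the aligned address. Unlike a one-byte skip count, a full pointer works for any alignment. The names `hdr_aligned_malloc`/`hdr_aligned_free` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: the original malloc() pointer is stored just below
   the aligned block, so free() never needs to know about the wrapper. */
void *hdr_aligned_malloc(size_t size, size_t align)
{
    if (align < sizeof(void *))
        align = sizeof(void *);          /* keeps the header slot pointer-aligned */
    if ((align & (align - 1)) != 0)
        return NULL;                     /* alignment must be a power of two */
    size_t extra = align - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL;
    void *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    ((void **)aligned)[-1] = raw;        /* book-keeping: remember malloc's pointer */
    return (void *)aligned;
}

void hdr_aligned_free(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);        /* hand malloc's own pointer back to free */
}
```

The cost is up to `align - 1 + sizeof(void *)` wasted bytes per allocation, which is the over-allocation a wrapper pays for not knowing the allocator's internals.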
If the author of an aligned-allocate function knows how `malloc` and `free` are implemented, it will often be possible to coordinate the design of all the allocation/free functions so that `free` can distinguish all kinds of allocations and handle them appropriately. No single aligned-allocate implementation would be usable on all `malloc`/`free` implementations, however.
I would suggest that the most portable way to write code would probably be to select a couple of symbols that are not used anywhere else for your own allocate and free functions, so that you could then say, e.g.
```c
/* _mm_malloc takes (size, alignment), so the arguments are swapped here */
#define a_alloc(align,sz) _mm_malloc((sz),(align))
#define a_free(ptr) _mm_free((ptr))
```
on compilers that support that, or
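```c
#include <stdlib.h>

static inline void *aa_alloc(size_t align, size_t size)
{
    void *ret = NULL;
    /* posix_memalign returns 0 on success; align must be a power of two
       that is a multiple of sizeof(void *) */
    if (posix_memalign(&ret, align, size) != 0)
        return NULL;
    return ret;
}
#define a_alloc(align,sz) aa_alloc((align),(sz))
#define a_free(ptr) free((ptr))
```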
on POSIX systems, etc. For every system it should be possible to define macros or functions that will yield the necessary behavior. [I think it's probably better to use macros consistently than to sometimes use macros and sometimes functions, so as to allow `#if defined macroname` to test whether things are defined yet.]
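Following that advice, a single header can pick a backend with the preprocessor. This is a sketch rather than a tested portability layer: the branch conditions are assumptions about common toolchains, and `a_alloc`/`a_free` keep the names used above. The outer `#ifndef a_alloc` lets a project supply its own definition first, which is the kind of `#if defined` check the macro-only convention is meant to enable.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifndef a_alloc
#if defined(_MSC_VER) || defined(__MINGW32__)
#include <malloc.h>
/* Windows CRT: _aligned_malloc takes (size, alignment) and needs _aligned_free */
#define a_alloc(align, sz) _aligned_malloc((sz), (align))
#define a_free(ptr) _aligned_free((ptr))
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C11 aligned_alloc: C11 requires size to be a multiple of align, so round up */
static inline void *aa_alloc_c11(size_t align, size_t size)
{
    if (align == 0 || size > SIZE_MAX - (align - 1))
        return NULL;
    return aligned_alloc(align, (size + align - 1) / align * align);
}
#define a_alloc(align, sz) aa_alloc_c11((align), (sz))
#define a_free(ptr) free((ptr))
#else
/* Pre-C11 POSIX toolchains: posix_memalign, released with plain free */
static void *aa_alloc_posix(size_t align, size_t size)
{
    void *ret = NULL;
    return posix_memalign(&ret, align, size) == 0 ? ret : NULL;
}
#define a_alloc(align, sz) aa_alloc_posix((align), (sz))
#define a_free(ptr) free((ptr))
#endif
#endif /* a_alloc */
```

Calling code then uses only `a_alloc(64, n)` and `a_free(p)`, and never needs to know which pairing of allocate and free is underneath.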
`_mm_malloc` seems to have been created before there was a standard `aligned_alloc` function (it was added in C11), and the need to use `_mm_free` is a quirk of the implementation.
My guess is that, unlike `posix_memalign`, it doesn't need to over-allocate in order to guarantee alignment; instead, it uses a separate alignment-aware allocator. This would save memory when allocating types whose alignment differs from the default alignment (typically 8 or 16 bytes).
Intel compilers support both POSIX (Linux) and non-POSIX (Windows) operating systems, and hence cannot rely on either the POSIX or the Windows function. Thus, a compiler-specific but OS-agnostic solution was chosen.
C11 is a great solution, but Microsoft doesn't even support C99 yet, so who knows if they will ever support C11.
Update: Unlike the C11 and POSIX allocation functions, whose results are released with plain `free`, the ICC intrinsics include a dedicated deallocation function. This allows the API to use a heap manager separate from the default one. I don't know if or when it actually does that, but it can be useful to support this model.
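To illustrate why a dedicated deallocation function makes a separate heap possible, here is a minimal fixed-size block pool in the spirit of the "request large chunks from `malloc()` and dispense them piecemeal" approach mentioned earlier. It is a hypothetical example (`block_pool`, `pool_init` and the other names are not any real API) and says nothing about what ICC actually does. Blocks are carved from one `malloc` chunk, so they must be returned with `pool_free`, never `free`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size pool carved out of one malloc() chunk. */
typedef struct block_pool {
    void *chunk;     /* pointer returned by malloc(); released by pool_destroy */
    void *free_list; /* free blocks, linked through their first word */
} block_pool;

/* Returns 0 on success, -1 on bad arguments or allocation failure. */
int pool_init(block_pool *p, size_t block_size, size_t align, size_t count)
{
    if (align < sizeof(void *))
        align = sizeof(void *);          /* each free block stores a pointer */
    if ((align & (align - 1)) != 0 || count == 0)
        return -1;                       /* alignment must be a power of two */
    if (block_size < sizeof(void *))
        block_size = sizeof(void *);
    if (block_size > SIZE_MAX - (align - 1))
        return -1;
    /* round the stride up so every block starts on an aligned address */
    size_t stride = (block_size + align - 1) & ~(align - 1);
    if (count > (SIZE_MAX - (align - 1)) / stride)
        return -1;
    unsigned char *raw = malloc(stride * count + align - 1);
    if (raw == NULL)
        return -1;
    uintptr_t first = ((uintptr_t)raw + align - 1) & ~(uintptr_t)(align - 1);
    p->chunk = raw;
    p->free_list = NULL;
    /* push blocks in reverse so the lowest address is handed out first */
    for (size_t i = count; i-- > 0;) {
        void **blk = (void **)(first + i * stride);
        *blk = p->free_list;
        p->free_list = blk;
    }
    return 0;
}

void *pool_alloc(block_pool *p)
{
    void **blk = p->free_list;
    if (blk == NULL)
        return NULL;                     /* pool exhausted */
    p->free_list = *blk;
    return blk;
}

void pool_free(block_pool *p, void *ptr)
{
    if (ptr == NULL)
        return;
    *(void **)ptr = p->free_list;        /* push the block back onto the list */
    p->free_list = ptr;
}

void pool_destroy(block_pool *p)
{
    free(p->chunk);                      /* the only pointer malloc() ever returned */
    p->chunk = NULL;
    p->free_list = NULL;
}
```

The design trades flexibility for simplicity: every block has the same size and alignment, so freeing is a constant-time push onto a free list, and no per-block over-allocation is needed once the chunk itself is aligned.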
Disclaimer: I work for Intel but have no special knowledge of these decisions, which happened long before I joined the company.