Explanation of aligned malloc implementation

旧巷少年郎 2021-02-04 07:33

This is not homework, this is purely for my own personal education.

I couldn't figure out how to implement an aligned malloc, so I looked online and found this website. Fo

4 Answers
  •  孤独总比滥情好
    2021-02-04 08:02

    I have a few issues with this code. I have compiled them into the list below:

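    1. p1 = (void*)malloc: You do not need to cast the return value of malloc in C. It returns void *, which converts implicitly to any object pointer type.
    2. free(((void**)p)[-1]); You do not need to cast the argument to free either, for the same reason. (The (void**) cast applied to p is a different matter: it is what lets [-1] read back the pointer stored just before the aligned block.)
    3. if ((p1 = (void*)malloc(required_bytes + offset)) == NULL): Do not put an assignment inside the condition of an if statement. I know a lot of people do this, but in my mind it is bad form and makes the code harder to read.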
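    Applying those three points, the routine under discussion might look like the sketch below. The question's code is not reproduced on this page, so the names aligned_malloc and aligned_free and the exact arithmetic are assumptions modeled on the usual version of this technique; alignment is assumed to be a power of 2.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch (hypothetical names): return `size` bytes whose address is a
 * multiple of `alignment`. Assumes alignment is a power of 2. */
void *aligned_malloc(size_t size, size_t alignment)
{
    /* Raise small alignments to sizeof(void *) so the slot holding the
     * stashed pointer is itself suitably aligned; a stronger alignment
     * still satisfies the caller's request. */
    if (alignment < sizeof(void *)) {
        alignment = sizeof(void *);
    }

    /* Worst case: alignment - 1 bytes of padding to reach the boundary,
     * plus one slot to stash the pointer malloc really returned. */
    size_t offset = alignment - 1 + sizeof(void *);
    if (size > SIZE_MAX - offset) {
        return NULL;                      /* size + offset would overflow */
    }

    void *raw = malloc(size + offset);    /* point 1: no cast needed in C */
    if (raw == NULL) {                    /* point 3: assignment kept out of the if */
        return NULL;
    }

    /* Round raw + offset down to a multiple of alignment; the result lies
     * at least sizeof(void *) bytes past raw, leaving room for the slot. */
    uintptr_t addr = ((uintptr_t)raw + offset) & ~((uintptr_t)alignment - 1);
    void **aligned = (void **)addr;

    aligned[-1] = raw;                    /* remember the original block */
    return aligned;
}

void aligned_free(void *p)
{
    if (p != NULL) {
        free(((void **)p)[-1]);           /* point 2: no cast on free's argument */
    }
}
```

    Memory from aligned_malloc must be released with aligned_free, never with plain free, because free has to receive the pointer malloc originally returned.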

    What they are doing here is storing the original pointer inside the allocated block, just before the aligned address. Only the aligned pointer is returned to the user; the user never sees the pointer that malloc actually returned. You still have to keep that pointer, though, because free needs it: in a typical implementation, free uses it to find the block's header, unlink the block from the allocated list, and put it on the free list. At the head of every memory block, malloc puts some housekeeping information, such as next/prev pointers, the block size, and the allocation status. Some debug versions of malloc also use guard words to detect buffer overflows. The alignment passed to the routine MUST be a power of 2, because the rounding is typically done with a bit mask of the form ~(alignment - 1), which only works for powers of 2.
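    To see why the alignment must be a power of 2, here is a small sketch of the rounding arithmetic (the helper names are hypothetical). The mask trick clears the low bits, which only yields a multiple of alignment when alignment - 1 is a run of low 1 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when x has exactly one bit set, i.e. x is 1, 2, 4, 8, ... */
bool is_power_of_two(uintptr_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Round addr up to a multiple of alignment by clearing the low bits.
 * Only correct when alignment is a power of 2. */
uintptr_t align_up_mask(uintptr_t addr, uintptr_t alignment)
{
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* General version using division; correct for any non-zero alignment. */
uintptr_t align_up_div(uintptr_t addr, uintptr_t alignment)
{
    return (addr + alignment - 1) / alignment * alignment;
}
```

    For example, align_up_mask(13, 8) gives 16, but align_up_mask(1, 48) also gives 16, which is not a multiple of 48; the division-based version returns 48.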

    When I wrote my own version of malloc for use in a pooled memory allocator, the minimum block size I used was 8 bytes. Including the header, the total was 28 bytes on a 32-bit system (20 bytes of header) and 40 bytes on a 64-bit system (32 bytes of header). Most systems perform better when data is aligned to its natural boundary (typically 4 or 8 bytes on modern machines). This is because the machine can fetch an aligned word in one bus cycle; an unaligned word can straddle two words in memory, so it takes two bus cycles to fetch, and the processor then has to assemble the value from the pieces. This is why compilers place variables on 4- or 8-byte boundaries, which means the low 2 or 3 bits of such an address are always zero.
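    The last point, that aligned addresses have their low bits clear, is easy to check directly. A sketch with a hypothetical is_aligned helper; the exact padding the compiler inserts in struct padded is platform-dependent, so only the multiple-of-alignment property is guaranteed.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* For a power-of-two alignment, an address is aligned exactly when its
 * low log2(alignment) bits are zero (3 bits for 8, 2 bits for 4). */
bool is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* The compiler pads after `c` so that `d` starts at a multiple of
 * alignof(double); how much padding that takes is platform-dependent. */
struct padded {
    char c;
    double d;
};
```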

    I know that some hardware requires more alignment than the default 4 or 8 bytes. Nvidia's CUDA, if I remember correctly, requires certain data to be aligned to 256 bytes, and that is a hardware requirement.
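    Outside CUDA, C11 provides aligned_alloc in <stdlib.h>, so on a C11 toolchain the hand-rolled pointer-stashing scheme is often unnecessary. Below is a sketch requesting a large alignment such as the 256 bytes mentioned above; the wrapper name alloc_aligned_c11 is hypothetical, and this is ordinary host memory, not a replacement for CUDA's own allocators.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch (hypothetical name): host memory aligned to `alignment` bytes
 * via C11 aligned_alloc. C11 as originally published requires size to
 * be a multiple of alignment, so round it up first; alignment must be
 * a power of 2 the implementation supports. Release with plain free(). */
void *alloc_aligned_c11(size_t size, size_t alignment)
{
    if (size > SIZE_MAX - (alignment - 1)) {
        return NULL;                      /* rounding would overflow */
    }
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
}
```

    Unlike the pointer-stashing scheme in the question, memory from aligned_alloc is released with plain free, since the C library tracks the block itself.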

    This has been asked before though. See: How to allocate aligned memory only using the standard library?

    Hope this helps.
