Does Linux malloc() behave differently on ARM vs x86?

礼貌的吻别 2021-02-04 21:37

There are a lot of questions about memory allocation on this site, but I couldn't find one that specifically addresses my concern. This question seems closest, and it led me to

1 answer
  • 2021-02-04 22:14

    A little background

    malloc() doesn't lie, your kernel's Virtual Memory subsystem does, and this is common practice on most modern operating systems. When you use malloc(), what's really happening is something like this:

    1. The libc implementation of malloc() checks its internal state and tries to optimize your request using a variety of strategies (like reusing a preallocated chunk, or allocating more memory than requested in advance...). This means the implementation affects performance and slightly changes the amount of memory requested from the kernel, but this is not really relevant when checking the "big numbers", like you're doing in your tests.

    2. If there's no space in a preallocated chunk of memory (remember, chunks of memory are usually pretty small, on the order of 128KB to 1MB), it will ask the kernel for more memory. The actual syscall varies from one kernel to another (mmap(), vm_allocate()...) but its purpose is mostly the same.

    3. The VM subsystem of the kernel will process the request, and if it finds it to be "acceptable" (more on this subject later), it will create a new entry in the memory map of the requesting task (I'm using UNIX terminology, where a task is a process with all its state and threads), and return the starting address of said map entry to malloc().

    4. malloc() will take note of the newly allocated chunk of memory, and will return the appropriate answer to your program.
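
    The four steps above can be sketched with a toy allocator. This is only an illustration under loud assumptions: toy_malloc, TOY_CHUNK_SIZE and toy_kernel_requests are hypothetical names, there is no free(), and a real libc malloc() is far more elaborate. It only shows the "use the preallocated chunk, otherwise ask the kernel" pattern, using Linux's mmap():

    ```c
    #define _DEFAULT_SOURCE
    #include <stddef.h>
    #include <sys/mman.h>

    /* Hypothetical chunk size, picked from the 128KB figure mentioned above. */
    #define TOY_CHUNK_SIZE (128 * 1024)

    static char *toy_chunk = NULL;      /* current preallocated chunk          */
    static size_t toy_used = 0;         /* bytes already handed out from it    */
    static int toy_kernel_requests = 0; /* how many times we asked the kernel  */

    /* Toy bump allocator: never frees, only illustrates steps 1 to 4. */
    void *toy_malloc(size_t size) {
        if (size == 0 || size > TOY_CHUNK_SIZE)
            return NULL;
        size = (size + 15) & ~(size_t)15; /* step 1: round up for alignment */

        /* step 2: no room in the current chunk, so ask the kernel for more */
        if (toy_chunk == NULL || toy_used + size > TOY_CHUNK_SIZE) {
            /* step 3: the kernel creates a new map entry and returns its start */
            void *p = mmap(NULL, TOY_CHUNK_SIZE, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                return NULL;
            toy_chunk = p;
            toy_used = 0;
            toy_kernel_requests++;
        }

        /* step 4: take note of what was handed out and return it */
        void *result = toy_chunk + toy_used;
        toy_used += size;
        return result;
    }
    ```

    Two small requests in a row should be served from the same chunk with a single mmap() call, while a request that no longer fits triggers a second one.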

    OK, so now your program has successfully malloc'ed some memory, but the truth is that not a single page (4KB on x86) of physical memory has actually been allocated to your request yet (well, this is an oversimplification, as some pages could have been used along the way to store info about the state of the memory pool, but it makes it easier to illustrate the point).

    So, what happens when you try to access this recently malloc'ed memory? A fault: the same kind of hardware exception that, when it can't be resolved, ends in a segmentation fault. This is a relatively little-known fact, but your system is generating these page faults all the time. Your program is interrupted, the kernel takes control, checks whether the faulting address corresponds to a valid map entry, and if so takes one or more physical pages and links them to the task's map.

    If your program tries to access an address which is not inside a map entry in your task, the kernel will not be able to resolve the fault, and will send it a signal (SIGSEGV, or the equivalent mechanism on non-UNIX systems) pointing out the problem. If the program doesn't handle that signal by itself, it will be killed with the infamous Segmentation Fault error.
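
    A minimal sketch of that last case, assuming Linux and POSIX signals (unmapped_read_faults and on_segv are hypothetical names). It writes to a page while its map entry exists, removes the entry with munmap(), and then reads the same address with a SIGSEGV handler installed:

    ```c
    #define _DEFAULT_SOURCE
    #include <setjmp.h>
    #include <signal.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static sigjmp_buf fault_jump;

    /* SIGSEGV handler: jump back to the point saved by sigsetjmp(). */
    static void on_segv(int sig) {
        (void)sig;
        siglongjmp(fault_jump, 1);
    }

    /* Returns 1 if reading an unmapped page raised SIGSEGV, 0 if it did not,
     * and -1 if the setup itself failed. */
    int unmapped_read_faults(void) {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        volatile char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return -1;

        p[0] = 42; /* valid map entry: the kernel resolves this fault silently */

        if (munmap((void *)p, page) != 0) /* the map entry is gone now */
            return -1;

        struct sigaction sa, old;
        sa.sa_handler = on_segv;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        if (sigaction(SIGSEGV, &sa, &old) != 0)
            return -1;

        int faulted;
        if (sigsetjmp(fault_jump, 1) == 0) {
            char c = p[0]; /* no map entry: the kernel cannot resolve this */
            (void)c;
            faulted = 0;
        } else {
            faulted = 1;   /* we got here through the signal handler */
        }

        sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
        return faulted;
    }
    ```

    Without the handler, the second access would simply kill the process, which is the familiar behavior.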

    So physical memory is not allocated when you call malloc(), but when you actually access that memory. This allows the OS to do some nifty tricks like disk paging, ballooning and overcommitting.
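
    One way to watch this happen, sketched under the assumption of a Linux system where mincore() reports per-page residency (touch_demo and resident_pages are hypothetical names): map some anonymous pages, count the resident ones, touch each page once, and count again.

    ```c
    #define _DEFAULT_SOURCE
    #include <stddef.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Count how many of the npages pages starting at addr are resident,
     * or return -1 on error. addr must be page-aligned. */
    static long resident_pages(void *addr, size_t npages) {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        unsigned char *vec = malloc(npages);
        if (vec == NULL)
            return -1;
        long count = -1;
        if (mincore(addr, npages * page, vec) == 0) {
            count = 0;
            for (size_t i = 0; i < npages; i++)
                count += vec[i] & 1; /* bit 0 set means "resident" */
        }
        free(vec);
        return count;
    }

    /* Map npages anonymous pages and report how many are resident before
     * and after writing one byte to each. Returns 0 on success. */
    int touch_demo(size_t npages, long *before, long *after) {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        char *p = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return -1;
        *before = resident_pages(p, npages);
        for (size_t i = 0; i < npages; i++)
            p[i * page] = 1; /* first access to each page triggers a fault */
        *after = resident_pages(p, npages);
        munmap(p, npages * page);
        return 0;
    }
    ```

    On a stock Linux kernel, before should normally come back as 0 and after as npages, although sandboxes or emulated kernels may report residency differently.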

    This way, when you ask how much memory a specific process is using, you need to look at two different numbers:

    • Virtual Size: The amount of memory that has been requested, even if it's not actually used.

    • Resident Size: The memory which it is really using, backed by physical pages.
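
    On Linux both numbers can be read for the current process from /proc/self/statm, whose first two fields are the total program size and the resident set size, in pages. A small sketch (read_vm_sizes is a hypothetical helper name):

    ```c
    #include <stdio.h>
    #include <unistd.h>

    /* Read the virtual and resident sizes of the calling process, in bytes,
     * from /proc/self/statm (Linux-specific). Returns 0 on success. */
    int read_vm_sizes(unsigned long *virt_bytes, unsigned long *res_bytes) {
        unsigned long vpages, rpages;
        FILE *f = fopen("/proc/self/statm", "r");
        if (f == NULL)
            return -1;
        int n = fscanf(f, "%lu %lu", &vpages, &rpages);
        fclose(f);
        if (n != 2)
            return -1;
        unsigned long page = (unsigned long)sysconf(_SC_PAGESIZE);
        *virt_bytes = vpages * page; /* Virtual Size  */
        *res_bytes = rpages * page;  /* Resident Size */
        return 0;
    }
    ```

    Calling it before and after a large malloc(), and again after writing to that memory, is a simple way to see the virtual size jump first and the resident size follow only on access.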

    How much overcommit is enough?

    In computing, resource management is a complex issue. You have a wide range of strategies, from the strictest capability-based systems to the much more relaxed behavior of kernels like Linux (with vm.overcommit_memory == 0), which will basically allow you to request memory up to the maximum map size allowed for a task (a limit that depends on the architecture).

    In the middle, you have OSes like Solaris (mentioned in your article), which limit the amount of virtual memory for a task to the sum of (physical pages + swap disk pages). But don't be fooled by the article you referenced, this is not always a good idea. If you're running a Samba or Apache server with hundreds to thousands of independent processes running at the same time (which leads to a lot of wasted virtual memory due to fragmentation), you'll have to configure a ridiculous amount of swap disk, or your system will run out of virtual memory while still having a lot of free RAM.
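
    To get a feeling for where such a "physical + swap" ceiling would sit on a given machine, the sum can be computed with Linux's sysinfo(). This is only the arithmetic from the paragraph above, done with a Linux call; it is not how Solaris exposes or enforces its limit, and ram_plus_swap_bytes is a hypothetical name:

    ```c
    #include <sys/sysinfo.h>

    /* Physical RAM plus swap, in bytes, as reported by Linux's sysinfo().
     * Returns 0 if the call fails. */
    unsigned long long ram_plus_swap_bytes(void) {
        struct sysinfo si;
        if (sysinfo(&si) != 0)
            return 0;
        /* totalram and totalswap are expressed in units of mem_unit bytes */
        return ((unsigned long long)si.totalram + si.totalswap) * si.mem_unit;
    }
    ```

    Comparing this figure with the sum of the virtual sizes of all running processes shows how quickly a strict policy would run out of room.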

    But why does memory overcommit work differently on ARM?

    It doesn't. At least it shouldn't, but ARM vendors have an insane tendency to introduce arbitrary changes to the kernels they distribute with their systems.

    In your test case, the x86 machine is working as expected. As you're allocating memory in small chunks and have vm.overcommit_memory set to 0, you're allowed to fill all your virtual address space, which ends somewhere around the 3GB mark because you're running on a 32-bit machine (if you try this on 64 bits, the loop will run until n==N). Obviously, when you try to use that memory, the kernel detects that physical memory is getting scarce and activates the OOM killer.
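
    The test program from the question is not reproduced here, so the following is only a hypothetical reconstruction of its malloc phase (count_untouched_allocs, chunk_size and max_chunks are my names, with max_chunks playing the role of N): allocate chunk after chunk without touching any of them, and report how many calls succeeded.

    ```c
    #include <stdlib.h>

    /* Try up to max_chunks allocations of chunk_size bytes each, never
     * touching the memory. Returns how many succeeded, then frees them. */
    size_t count_untouched_allocs(size_t chunk_size, size_t max_chunks) {
        void **chunks = calloc(max_chunks, sizeof *chunks);
        if (chunks == NULL)
            return 0;
        size_t n = 0;
        while (n < max_chunks) {
            chunks[n] = malloc(chunk_size);
            if (chunks[n] == NULL)
                break; /* virtual space (or the commit limit) is exhausted */
            n++;
        }
        for (size_t i = 0; i < n; i++)
            free(chunks[i]);
        free(chunks);
        return n;
    }
    ```

    With a large max_chunks, a 32-bit process should stop when its virtual address space fills up, while a 64-bit one with overcommit_memory at 0 would typically reach n == max_chunks.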

    On ARM it should be the same. As it isn't, two possibilities come to mind:

    1. overcommit_memory is set to the NEVER (2) policy, perhaps because someone has forced it this way in the kernel.

    2. You're reaching the maximum allowed map size for the task.

    Since you get different values for the malloc phase on each run on ARM, I would rule out the second option. Make sure overcommit_memory is set to 0 and rerun your test. If you have access to those kernel sources, take a look at them to make sure the kernel honors this sysctl (as I said, some ARM vendors like to do nasty things to their kernels).
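
    The current policy can be read from /proc/sys/vm/overcommit_memory (the same value that sysctl vm.overcommit_memory prints). A small sketch, where read_overcommit_mode is a hypothetical helper name:

    ```c
    #include <stdio.h>

    /* Returns the value of vm.overcommit_memory (0 = heuristic, 1 = always,
     * 2 = never), or -1 if it cannot be read. */
    int read_overcommit_mode(void) {
        int mode = -1;
        FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
        if (f == NULL)
            return -1;
        if (fscanf(f, "%d", &mode) != 1)
            mode = -1;
        fclose(f);
        return mode;
    }
    ```

    If this returns 2 on the ARM board, the first possibility above would explain malloc() failing early.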

    As a reference, I've run demo3 under QEMU emulating versatilepb and on an Efika MX (i.MX515). The first one stopped malloc'ing at the 3 GB mark, as expected on a 32-bit machine, while the other stopped earlier, at 2 GB. This may come as a surprise, but if you take a look at its kernel config (https://github.com/genesi/linux-legacy/blob/master/arch/arm/configs/mx51_efikamx_defconfig), you'll see this:

    CONFIG_VMSPLIT_2G=y
    # CONFIG_VMSPLIT_1G is not set
    CONFIG_PAGE_OFFSET=0x80000000
    

    The kernel is configured with a 2GB/2GB split, so the system is behaving as expected.
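
    The arithmetic behind that split, as a sketch for a 32-bit address space (user_space_bytes and kernel_space_bytes are hypothetical names): everything below PAGE_OFFSET is available to user space, everything from PAGE_OFFSET up belongs to the kernel.

    ```c
    #include <stdint.h>

    /* Upper bound on user address space for a given PAGE_OFFSET on a 32-bit
     * kernel. The limit a task can really use is somewhat lower. */
    uint64_t user_space_bytes(uint32_t page_offset) {
        return page_offset;
    }

    /* Address space reserved for the kernel: from PAGE_OFFSET up to 4 GiB. */
    uint64_t kernel_space_bytes(uint32_t page_offset) {
        return (UINT64_C(1) << 32) - page_offset;
    }
    ```

    With CONFIG_PAGE_OFFSET=0x80000000 both functions give 2 GiB, which matches the point where the Efika MX stopped; a PAGE_OFFSET of 0xC0000000 would give the more common 3 GiB/1 GiB split.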
