Question
I have an application that uses hugepages, and it suddenly crashed due to a bug. After the crash, because the application did not release the hugepages properly, the free hugepage count in sysfs does not go back up.
$ sudo cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages
0
$ sudo cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
1024
Is there a way to release the hugepages by force?
Answer 1:
HugeTLB can be used either for shared memory (Mark J. Bobak's answer would deal with that case) or by an app that mmaps files created in a hugetlbfs filesystem. If the app crashes without removing those files, they survive and keep the corresponding memory 'allocated'.
Check the hugetlbfs filesystem and see whether there are any leftover files from the app. Removing them releases the memory.
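A minimal shell sketch of that check; /mnt/huge is an assumed mount point, use whatever path mount actually reports on your system:

# Find hugetlbfs mounts, then inspect them for leftover files from the crashed app.
mount -t hugetlbfs                # list all hugetlbfs mount points
ls -lh /mnt/huge                  # leftover files, if any, show up here
sudo rm /mnt/huge/leftover_file   # removing a leftover file frees its hugepages
cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages   # count should go back up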
Answer 2:
Sometimes you need to check every directory where hugetlbfs is mounted. So:
1) Find the mounted directories with
mount | grep huge
2) Check every directory reported, not only the default
/dev/hugepages
3) Delete all 2 MB-sized files there (2 MB is the hugepage size); see the sketch after this list.
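A sketch of that procedure; /dev/hugepages is just one example mount point, and matching files by their 2 MB size follows the tip above rather than being a general rule:

mount | grep huge                       # find every hugetlbfs mount point
ls -lh /dev/hugepages                   # repeat for each mount point reported above
find /dev/hugepages -type f -size 2M    # files backed by 2 MB hugepages
# Remove them only after confirming they belong to the crashed application:
sudo find /dev/hugepages -type f -size 2M -delete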
Answer 3:
Use ipcs -m to list the shared memory segments.
Use ipcrm to remove the left-over shared memory segments.
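For example, a minimal sketch (the shmid value is illustrative only):

ipcs -m                  # list segments; note owner, size and nattch (attached processes)
sudo ipcrm -m 1234567    # remove the orphaned segment by its shmid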
Edit on 06/24/2019: OK, so the above answer, while correct as far as it goes, was a bit brief. In particular, if you have a host with multiple DB instances and only one has crashed, how can you determine which (if any) memory segments should be cleaned up?
Well, this too can be done. For each running instance, connect with / as sysdba, then do oradebug setmypid (any PID will do, as all Oracle PIDs connect to the SGA). Then do oradebug ipc. That will (hopefully) return IPC information written to the trace file. So, go to the udump (or diag_dest) directory and look for your trace file. It will contain all the IPC information for the instance, including the ShmId. Look through the file for the ShmId(s) that this instance is using. Now look at the output of ipcs -m.
When you have done that for all the running instances, any memory segment in the ipcs -m output that shows a non-zero memory allocation, and that you cannot account for in the oradebug ipc information from any running instance, must be a left-over segment from the crashed instance. Use ipcrm to remove it/them, as in the sketch below.
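A hedged sketch of that per-instance check; the trace file path, the grep pattern and the shmid are placeholders, and the exact trace format varies by Oracle version:

# For each *running* instance, capture the shared memory IDs it actually uses.
sqlplus / as sysdba <<'EOF'
oradebug setmypid
oradebug ipc
oradebug tracefile_name
exit
EOF
# Pull the ShmId values out of the trace file reported above:
grep -i shmid /path/to/trace_file.trc
# Compare against the system view; any segment not owned by a running instance is a candidate:
ipcs -m
sudo ipcrm -m ORPHANED_SHMID    # only after you are sure it is not a live SGA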
When doing this on a host with multiple running instances, this can be a bit fraught. Please proceed with caution. You don't want to remove the SGA of a running instance!
Hope that helps....
Answer 4:
If you follow the instructions below, you can get rid of the allocated hugepages:
1) First, check the hugepages that are free after a restart
dpdk@dpdkvm:~$ ls /mnt/huge/
empty
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ cat /proc/meminfo
...
HugePages_Total: 256
HugePages_Free: 256
...
2) Start a DPDK application with wrong parameters, producing an error
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ sudo ./build/kni -c 0x03 -n 2 -- -P -p 0x03 --config="(0,0,1),(1,0,1)"
...
EAL: Error - exiting with code: 1
Cause: No supported Ethernet device found
3) When I check hugepages, none are free
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ cat /proc/meminfo
...
HugePages_Total: 256
HugePages_Free: 0
...
4) Now, when I check the mounted hugepage directory, I can see the files that were not given back to the OS by the DPDK application.
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ ls /mnt/huge/
...
rtemap_0 rtemap_137 rtemap_176 rtemap_214 rtemap_253 rtemap_62
...
5) Finally, if you remove the files starting with rtemap, you can give the hugepages back (a small script sketching this cleanup follows the output below)
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ sudo rm /mnt/huge/*
[sudo] password for dpdk:
dpdk@dpdkvm:~/dpdk-1.8.0/examples/kni$ cat /proc/meminfo
...
HugePages_Total: 256
HugePages_Free: 256
...
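If this happens often, the cleanup can be wrapped in a small script. A sketch, assuming the DPDK hugepage mount is /mnt/huge and the file prefix is rtemap_ (both taken from the listing above):

# Free hugepages left behind by a crashed DPDK app.
HUGE_MNT=/mnt/huge                  # hugetlbfs mount used by DPDK (assumption from the listing above)
grep Huge /proc/meminfo             # before: HugePages_Free likely 0
sudo rm -f "$HUGE_MNT"/rtemap_*     # remove the per-page mapping files DPDK left behind
grep Huge /proc/meminfo             # after: HugePages_Free should equal HugePages_Total again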
Answer 5:
Your hugetlb pages may be used by shared memory or by mmap'ed files. Try removing the shared memory segments or unmounting the hugetlbfs filesystem.
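A brief sketch of the unmount route, assuming /mnt/huge as the mount point; unmounting a hugetlbfs mount discards its files and frees the pages, but it will fail with "device is busy" while a process still has them mapped:

ipcs -m                                    # first check for SysV shared memory segments (see Answer 3)
sudo umount /mnt/huge                      # dropping the mount discards its files and frees the pages
sudo mount -t hugetlbfs nodev /mnt/huge    # remount for the next run
cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages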
Source: https://stackoverflow.com/questions/20366181/how-to-release-hugepages-from-the-crashed-application