Why is the Linux kernel's ZONE_NORMAL limited to 896 MB?

小蘑菇 2021-02-01 09:24

A newbie question. I'm doing some kernel study and got confused by the 896 MB size limit of ZONE_NORMAL. I don't understand why the kernel cannot map all 4 GB of physical memory into its address space.

2 Answers
  •  北海茫月
    2021-02-01 09:44

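    The limit is a consequence of how a 32-bit kernel splits the 4 GB virtual address space between userspace and itself, and that split is chosen for performance reasons. With the default split, the kernel owns only the top 1 GB of virtual addresses. It keeps 128 MB of that for vmalloc areas, kmap and fixed mappings, leaving 1024 - 128 = 896 MB in which to map physical memory directly. ZONE_NORMAL ends where that direct mapping ends; physical memory above 896 MB goes into ZONE_HIGHMEM, which the kernel must map on demand before it can access it.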

    The more address space the kernel takes, the less is left for userspace. Under the default 3/1 split (3 GB for userspace, 1 GB for the kernel), a user process can allocate at most 3 gigabytes of address space; in practice, because of address-space fragmentation, allocations seem to start failing at around 2.5 gigabytes.
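
    The arithmetic behind the 896 MB figure can be sketched in a few lines of C. This is a simplified model, not kernel code: it assumes the default 3G/1G split (kernel addresses start at 0xC0000000) and treats the whole top 128 MB as the reserve for vmalloc, kmap and fixmap; real kernels may carve out slightly different amounts. The names SPLIT_PAGE_OFFSET, KERNEL_VA_RESERVE, direct_map_bytes and highmem_bytes are hypothetical, chosen for illustration.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MB (1024ULL * 1024ULL)
    #define VA_SPACE (4096ULL * MB)  /* 2^32 bytes: the whole 32-bit virtual address space */

    /* Assumptions: default 3G/1G split and a 128 MB reserve for vmalloc, kmap and fixmap. */
    #define SPLIT_PAGE_OFFSET 0xC0000000ULL
    #define KERNEL_VA_RESERVE (128ULL * MB)

    /* Bytes of physical memory the kernel can map one-to-one (lowmem). */
    static uint64_t direct_map_bytes(uint64_t page_offset, uint64_t reserve) {
        uint64_t kernel_space = VA_SPACE - page_offset;  /* PAGE_OFFSET up to 4 GB */
        return kernel_space - reserve;
    }

    /* Physical memory beyond the direct map, i.e. what would land in ZONE_HIGHMEM. */
    static uint64_t highmem_bytes(uint64_t ram, uint64_t lowmem) {
        return ram > lowmem ? ram - lowmem : 0;
    }

    int main(void) {
        uint64_t lowmem = direct_map_bytes(SPLIT_PAGE_OFFSET, KERNEL_VA_RESERVE);
        assert(lowmem == 896ULL * MB);  /* 1024 MB - 128 MB */

        printf("user space:    %llu MB\n", (unsigned long long)(SPLIT_PAGE_OFFSET / MB));
        printf("kernel space:  %llu MB\n",
               (unsigned long long)((VA_SPACE - SPLIT_PAGE_OFFSET) / MB));
        printf("direct-mapped: %llu MB, ending at virtual 0x%llX\n",
               (unsigned long long)(lowmem / MB),
               (unsigned long long)(SPLIT_PAGE_OFFSET + lowmem));
        /* Hypothetical machine with 4 GB of RAM, as in the question. */
        printf("highmem with 4 GB RAM: %llu MB\n",
               (unsigned long long)(highmem_bytes(4096ULL * MB, lowmem) / MB));
        return 0;
    }
    ```

    Under these assumptions it reports 3072 MB of user space, 1024 MB of kernel space, 896 MB directly mapped (ending at virtual address 0xF8000000), and 3200 MB of a 4 GB machine left over as highmem, which is why the kernel cannot keep all 4 GB permanently mapped.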

    Other splits are available: a 2/2 split, which gives the kernel and userspace two gigabytes of address space each, and a 1/3 split, which gives the kernel three gigabytes and userspace only one. (This Firefox is currently using 1249 megabytes, so it could not fit into a 1/3 split kernel.)
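
    A small sketch can tabulate the trade-off between the splits. It assumes each split is described only by where kernel addresses begin (0xC0000000, 0x80000000, 0x40000000) and that the same 128 MB kernel reserve applies to every split; both are simplifications for illustration. The struct and function names are hypothetical, and 1249 MB is the figure quoted above.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MB (1024ULL * 1024ULL)
    #define VA_SPACE (4096ULL * MB)
    #define RESERVE (128ULL * MB)  /* assumed vmalloc/kmap/fixmap reserve, same for every split */

    struct split {
        const char *name;      /* "user/kernel", in gigabytes */
        uint64_t page_offset;  /* first kernel virtual address */
    };

    /* Can a process needing `need` bytes of address space fit below the kernel? */
    static bool fits_in_userspace(const struct split *s, uint64_t need) {
        return need <= s->page_offset;
    }

    /* Physical memory the kernel can map directly under this split. */
    static uint64_t lowmem_bytes(const struct split *s) {
        return (VA_SPACE - s->page_offset) - RESERVE;
    }

    int main(void) {
        const struct split splits[] = {
            {"3/1", 0xC0000000ULL},
            {"2/2", 0x80000000ULL},
            {"1/3", 0x40000000ULL},
        };
        const uint64_t firefox = 1249ULL * MB;

        for (size_t i = 0; i < sizeof splits / sizeof splits[0]; i++) {
            const struct split *s = &splits[i];
            printf("%s: user %4llu MB, kernel lowmem %4llu MB, 1249 MB process fits: %s\n",
                   s->name,
                   (unsigned long long)(s->page_offset / MB),
                   (unsigned long long)(lowmem_bytes(s) / MB),
                   fits_in_userspace(s, firefox) ? "yes" : "no");
        }
        assert(!fits_in_userspace(&splits[2], firefox));  /* 1 GB of user space is too small */
        return 0;
    }
    ```

    Giving the kernel more address space raises the directly mapped limit (896, 1920 and 2944 MB in this model) but shrinks what each process can use, so the 1/3 row is the only one where the 1249 MB process does not fit.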

    Some kernels (perhaps only vendor kernels?) support what is known as the 4:4 split: four gigabytes of address space for the kernel and a separate four gigabytes for userspace. These are extremely useful on 32-bit systems with 32 or 64 gigabytes of memory, since such a large system probably has many disks and a lot of I/O in flight, and needs significant buffering for both block devices and network traffic. However, a 4:4 kernel must flush the TLB on every entry into and every exit from a system call. Those flushes cause significant slowdowns on "small" systems and are worth it only on "large" systems, where the extra memory can cache enough disk and network data to improve overall performance.
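
    Why the per-syscall flush hurts can be shown with a toy model (not a hardware simulation, and not measurements). It assumes a TLB large enough never to evict anything, a process that touches 8 pages, and system calls that each touch 4 kernel pages. With a shared kernel mapping, every page misses only once; with 4:4-style flushes on kernel entry and exit, both kernel and user pages miss again on every call. The names tlb, touch and count_misses are hypothetical.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define TLB_SLOTS   64  /* toy TLB: one slot per page number, never evicts */
    #define KERNEL_BASE 32  /* toy numbering: pages 32..63 are kernel pages */

    struct tlb {
        bool cached[TLB_SLOTS];
        unsigned misses;
    };

    static void tlb_flush(struct tlb *t) {
        memset(t->cached, 0, sizeof t->cached);
    }

    /* Access one page: a miss stands for a page-table walk, then the slot is filled. */
    static void touch(struct tlb *t, unsigned page) {
        if (!t->cached[page]) {
            t->misses++;
            t->cached[page] = true;
        }
    }

    /* The process touches its pages, then makes `syscalls` system calls; each call
       touches kernel_pages kernel pages, after which the process touches its pages again. */
    static unsigned count_misses(bool flush_on_entry_exit, unsigned user_pages,
                                 unsigned kernel_pages, unsigned syscalls) {
        assert(user_pages <= KERNEL_BASE && kernel_pages <= TLB_SLOTS - KERNEL_BASE);
        struct tlb t = { .misses = 0 };  /* all slots start empty */

        for (unsigned p = 0; p < user_pages; p++)
            touch(&t, p);
        for (unsigned s = 0; s < syscalls; s++) {
            if (flush_on_entry_exit)
                tlb_flush(&t);  /* 4:4: switch to the kernel's own page tables */
            for (unsigned k = 0; k < kernel_pages; k++)
                touch(&t, KERNEL_BASE + k);
            if (flush_on_entry_exit)
                tlb_flush(&t);  /* 4:4: switch back to the process's page tables */
            for (unsigned p = 0; p < user_pages; p++)
                touch(&t, p);
        }
        return t.misses;
    }

    int main(void) {
        printf("shared kernel mapping: %u TLB misses\n", count_misses(false, 8, 4, 1000));
        printf("4:4 split:             %u TLB misses\n", count_misses(true, 8, 4, 1000));
        return 0;
    }
    ```

    In this model the shared mapping costs 12 misses in total, while the 4:4 variant costs 8 + 1000 x 12 = 12008, growing linearly with the number of system calls. Real TLBs are small and do evict entries, so the actual gap depends heavily on the workload.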

    The other splits avoid these TLB flushes because each page-table entry, and the TLB entry cached from it, carries a permission bit saying whether the page is accessible in user mode or only in supervisor mode. The kernel's pages are always mapped, but marked as accessible only while the CPU is running in supervisor mode, so entering the kernel and returning to the process that made the call needs no flush. When switching to a different process, of course, the TLB does need to be flushed.
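
    The permission check described above can be sketched as follows. The bit positions follow the x86 page-table entry layout (bit 0 present, bit 1 read/write, bit 2 user/supervisor); the function itself is a hypothetical, simplified illustration of what the MMU does in hardware, ignoring write permissions, the upper levels of the page walk, and later features such as SMAP.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* x86 page-table entry flag bits (low bits of a PTE). */
    #define PTE_PRESENT (1u << 0)
    #define PTE_RW      (1u << 1)
    #define PTE_USER    (1u << 2)  /* U/S: set = user accessible, clear = supervisor only */

    enum cpu_mode { MODE_SUPERVISOR, MODE_USER };

    /* Simplified read check for a single entry: a missing page faults in any mode,
       and a supervisor-only page faults when accessed from user mode. */
    static bool can_read(uint64_t pte, enum cpu_mode mode) {
        if (!(pte & PTE_PRESENT))
            return false;
        if (mode == MODE_USER && !(pte & PTE_USER))
            return false;
        return true;
    }

    int main(void) {
        const uint64_t kernel_pte = PTE_PRESENT | PTE_RW;             /* U/S clear */
        const uint64_t user_pte   = PTE_PRESENT | PTE_RW | PTE_USER;  /* U/S set */

        printf("kernel page, user mode:       %s\n",
               can_read(kernel_pte, MODE_USER) ? "allowed" : "fault");
        printf("kernel page, supervisor mode: %s\n",
               can_read(kernel_pte, MODE_SUPERVISOR) ? "allowed" : "fault");
        printf("user page, user mode:         %s\n",
               can_read(user_pte, MODE_USER) ? "allowed" : "fault");

        /* The property the shared split relies on: kernel pages stay mapped but unusable from userspace. */
        assert(!can_read(kernel_pte, MODE_USER));
        return 0;
    }
    ```

    Because the protection lives in the entry itself, the kernel's mappings can stay in every process's page tables, and crossing into the kernel only changes the CPU mode rather than the page tables.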
