Sequential access to hugepages in kernel driver

Question

I'm working on a driver that uses a buffer backed by hugepages, and I'm running into problems with the ordering of those hugepages.

In userspace, the program allocates a big hugepage-backed buffer with the mmap syscall. The buffer is then passed to the driver through an ioctl call. The driver calls get_user_pages to pin the buffer and obtain the struct page pointers that back it.
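A minimal userspace sketch of the allocation step, assuming Linux with hugepages already reserved in the pool. The helper name alloc_huge_buffer is hypothetical, and the mapping uses the system's default hugepage size. The ioctl that hands the address to the driver is driver-specific, so it is not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of anonymous memory backed by hugepages from the
 * pre-reserved pool (MAP_HUGETLB). Returns NULL if the pool cannot
 * satisfy the request, e.g. when no hugepages are reserved. */
static void *alloc_huge_buffer(size_t len)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

The returned address is the value that would then be handed to the driver through the ioctl.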

This works perfectly with a 1 GB buffer (one hugepage). get_user_pages returns many pages (HUGE_PAGE_SIZE / PAGE_SIZE of them), but they are all physically contiguous, so there's no problem: I just take the address of the first page with page_address and work with that. The driver can also map that buffer back to userspace with remap_pfn_range when another program calls mmap on the char device.
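The condition that makes page_address(pages[0]) safe as a base for the whole buffer can be written as a check. This is a userspace model rather than driver code: pfn[i] stands in for page_to_pfn(pages[i]), and the helper name pfns_contiguous is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* pfn[i] models page_to_pfn(pages[i]) for the i-th page returned by
 * get_user_pages. The buffer is one physically contiguous run only if
 * every page frame directly follows the previous one. */
static bool pfns_contiguous(const uint64_t *pfn, size_t n)
{
    assert(pfn != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        if (pfn[i] != pfn[0] + i)
            return false;
    return true;
}
```

With a single hugepage the check always passes; with several hugepages it fails as soon as one of them is not the physical successor of the one before it.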

However, things get complicated when the buffer is backed by more than one hugepage. It seems that the kernel can back a buffer with hugepages that are not physically consecutive. For example, if the hugepage pool's layout is something like this:

+------+------+------+------+
| HP 1 | HP 2 | HP 3 | HP 4 |
+------+------+------+------+

a request for a two-hugepage buffer could be satisfied with HP1 and HP4, or with HP3 followed by HP2. In the latter case, when I get the pages with get_user_pages, the address of page 0 is actually 1 GB above the address of page 262,144 (the head of the second hugepage).
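A small userspace model of the HP3-then-HP2 case, assuming 4 KB base pages and 1 GB hugepages (262,144 base pages each), as on x86-64. fill_pfns is a hypothetical helper, and pool hugepages are numbered from 0, so HP3 is index 2 and HP2 is index 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill pfn[] for a buffer backed by nhuge hugepages taken from a pool
 * whose hugepage k starts at frame pool_base_pfn + k * per_huge.
 * order[j] is the pool index of the hugepage backing the j-th
 * hugepage-sized chunk of the user buffer. */
static void fill_pfns(uint64_t *pfn, const size_t *order, size_t nhuge,
                      uint64_t per_huge, uint64_t pool_base_pfn)
{
    assert(per_huge > 0);
    for (size_t j = 0; j < nhuge; j++)
        for (uint64_t k = 0; k < per_huge; k++)
            pfn[j * per_huge + k] = pool_base_pfn + order[j] * per_huge + k;
}
```

With order {2, 1} and a pool starting at frame 0, pfn[0] is 524,288 and pfn[262144] is 262,144: page 0 sits 262,144 frames, or 1 GB at 4 KB per frame, above the head of the second hugepage.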

Is there any way to access those pages sequentially? I tried sorting the addresses and using the lowest one as the base of the whole buffer (e.g., if the kernel backs the buffer with HP3 and then HP2, I use HP2's address as the base), but that scrambles the data relative to userspace: offset 0 in the reordered buffer corresponds to offset 1 GB in the userspace buffer.
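The scrambling can be seen by comparing two ways of translating a buffer offset to a physical address. The sketch uses toy sizes (4 KB pages, "hugepages" of two pages each) with pfns {4, 5, 2, 3} modelling HP3 followed by HP2; both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-page translation: look up the frame that backs the page
 * containing off, the way the MMU walks the process page tables. */
static uint64_t phys_for_offset(const uint64_t *pfn, uint64_t off,
                                uint64_t page_size)
{
    return pfn[off / page_size] * page_size + off % page_size;
}

/* "Lowest address as base": treats the buffer as one contiguous run
 * starting at the smallest frame, which only holds if pfns are ordered. */
static uint64_t naive_phys_for_offset(const uint64_t *pfn, size_t n,
                                      uint64_t off, uint64_t page_size)
{
    uint64_t min;

    assert(n > 0);
    min = pfn[0];
    for (size_t i = 1; i < n; i++)
        if (pfn[i] < min)
            min = pfn[i];
    return min * page_size + off;
}
```

Here naive_phys_for_offset(pfn, 4, 0, 4096) returns 8192, which is the physical address userspace sees at offset 8192 (one hugepage in), not at offset 0: the two halves of the buffer are swapped.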

TL;DR: Given more than one unordered hugepage, is there any way to access them sequentially in a Linux kernel driver?

By the way, I'm working on a Linux machine with the 3.8.0-29-generic kernel.

Answer 1:

Using vm_map_ram, the function suggested by CL, I was able to remap the memory so it can be accessed sequentially, regardless of how many hugepages back it. Here is the code (without error handling) in case it helps anyone.

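struct page **pages;
int retval;
unsigned long npages, i;
unsigned long buffer_start = (unsigned long) huge->addr; // Address from user-space map.
void *remapped;
int nid;

// Round up to whole pages (same as DIV_ROUND_UP(bufsize, PAGE_SIZE) for bufsize > 0).
npages = 1 + ((bufsize - 1) / PAGE_SIZE);

pages = vmalloc(npages * sizeof(struct page *));

// Pin the user pages (3.8 argument order: tsk, mm, start, nr_pages, write, force, pages, vmas).
// retval is the number of pages pinned and should be checked against npages.
down_read(&current->mm->mmap_sem);
retval = get_user_pages(current, current->mm, buffer_start, npages,
                        1 /* Write enable */, 0 /* Force */, pages, NULL);
up_read(&current->mm->mmap_sem);

nid = page_to_nid(pages[0]); // Remap on the same NUMA node.

// Map the possibly scattered pages into one virtually contiguous kernel range.
remapped = vm_map_ram(pages, npages, nid, PAGE_KERNEL);

// Do work on remapped.

// Teardown: remove the kernel mapping, then drop the references taken by get_user_pages.
vm_unmap_ram(remapped, npages);
for (i = 0; i < npages; i++)
    put_page(pages[i]);
vfree(pages);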
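What vm_map_ram provides can be modelled in userspace: one logical range whose backing pages are scattered. In this sketch the page lookup is explicit, whereas vm_map_ram installs kernel page-table entries so that plain pointer arithmetic on remapped works. struct paged_view and view_read are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* pages[i] holds bytes [i * page_size, (i + 1) * page_size) of the
 * logical buffer, wherever that storage physically lives. */
struct paged_view {
    unsigned char **pages;
    size_t npages;
    size_t page_size;
};

/* Copy len bytes starting at logical offset off out of the view,
 * splitting the copy wherever it crosses a page boundary. */
static void view_read(const struct paged_view *v, size_t off,
                      void *dst, size_t len)
{
    unsigned char *out = dst;

    assert(off + len <= v->npages * v->page_size);
    while (len > 0) {
        size_t idx = off / v->page_size;
        size_t in_page = off % v->page_size;
        size_t chunk = v->page_size - in_page;

        if (chunk > len)
            chunk = len;
        memcpy(out, v->pages[idx] + in_page, chunk);
        out += chunk;
        off += chunk;
        len -= chunk;
    }
}
```

Reading logical bytes 6 to 9 of an 8-byte-page view crosses from the first page into the second, even if those two pages are stored in reverse order in memory.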

Source: https://stackoverflow.com/questions/25000494/sequential-access-to-hugepages-in-kernel-driver

Tags

memory

linux-kernel

huge-pages