问题
I'm working in a driver that uses a buffer backed by hugepages, and I'm finding some problems with the sequentality of the hugepages.
In userspace, the program allocates a big buffer backed by hugepages using the mmap
syscall. The buffer is then communicated to the driver through a ioctl
call. The driver uses the get_user_pages
function to get the memory address of that buffer.
This works perfectly with a buffer size of 1 GB (1 hugepage). get_user_pages
returns a lot of pages (HUGE_PAGE_SIZE / PAGE_SIZE
) but they're all contigous, so there's no problem. I just grab the address of the first page with page_address
and work with that. The driver can also map that buffer back to userspace with remap_pfn_range
when another program does a mmap
call on the char device.
However, things get complicated when the buffer is backed by more than one hugepage. It seems that the kernel can return a buffer backed by non-sequential hugepages. I.e, if the hugepage pool's layout is something like this
+------+------+------+------+
| HP 1 | HP 2 | HP 3 | HP 4 |
+------+------+------+------+
, a request for a hugepage-backed buffer could be fulfilled by reserving HP1 and HP4, or maybe HP3 and then HP2. That means that when I get the pages with get_user_pages
in the last case, the address of page 0 is actually 1 GB after the address of page 262.144 (the next hugepage's head).
Is there any way to sequentalize access to those pages? I tried reordering the addresses to find the lower one so I can use the whole buffer (e.g., if kernel gives me a buffer backed by HP3, HP2 I use as base address the one of HP2), but it seems that would scramble the data in userspace (offset 0 in that reordered buffer is maybe offset 1GB in the userspace buffer).
TL;DR: Given >1 unordered hugepages, is there any way to access them sequentially in a Linux kernel driver?
By the way, I'm working on a Linux machine with 3.8.0-29-generic kernel.
回答1:
Using the function suggested by CL, vm_map_ram, I was able to remap the memory so it can be accesed sequentially, independently of the number of hugepages mapped. I leave the code here (error control not included) in case it helps anyone.
struct page** pages;
int retval;
unsigned long npages;
unsigned long buffer_start = (unsigned long) huge->addr; // Address from user-space map.
void* remapped;
npages = 1 + ((bufsize- 1) / PAGE_SIZE);
pages = vmalloc(npages * sizeof(struct page *));
down_read(¤t->mm->mmap_sem);
retval = get_user_pages(current, current->mm, buffer_start, npages,
1 /* Write enable */, 0 /* Force */, pages, NULL);
up_read(¤t->mm->mmap_sem);
nid = page_to_nid(pages[0]); // Remap on the same NUMA node.
remapped = vm_map_ram(pages, npages, nid, PAGE_KERNEL);
// Do work on remapped.
来源:https://stackoverflow.com/questions/25000494/sequential-access-to-hugepages-in-kernel-driver