numa, mbind, segfault

Question

I have allocated memory using valloc, say an array A of 15 doubles (15*sizeof(double) bytes). I divided it into three pieces and want to bind each piece (of length 5) to one of three NUMA nodes (say 0, 1, and 2). Currently, I am doing the following:

double* A=(double*)valloc(15*sizeof(double));

piece=5; 
nodemask=1;
mbind(&A[0],piece*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);

nodemask=2;
mbind(&A[5],piece*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);

nodemask=4;
mbind(&A[10],piece*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);

My first question is: am I doing it right? For example, are there any problems with being properly aligned to the page size? With a size of 15 for array A it runs fine, but if I change the array size to something like 6156000 with piece=2052000, so that the three calls to mbind start at &A[0], &A[2052000], and &A[4104000], I get a segmentation fault (and sometimes it just hangs). Why does it run fine for the small size but segfault for the larger one? Thanks.

Answer 1:

For this to work, you need to deal with chunks of memory that are at least page-sized and page-aligned, which means 4KB on most systems. In your case, I suspect the page gets moved twice (possibly three times), because you call mbind() three times on it.
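To make the alignment requirement concrete, here is a minimal sketch that computes where an array element falls inside its page. The helper names `offset_in_page` and `element_page_offset` are made up for illustration, and the page size is passed in as a parameter (4096 is an assumption; use sysconf(_SC_PAGESIZE) on a real system). It also assumes the array itself starts on a page boundary, as valloc provides.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of a position within its page.
   A result of 0 means the position sits exactly on a page boundary.
   page_size is assumed to be a power of two. */
size_t offset_in_page(size_t byte_offset, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return byte_offset & (page_size - 1);
}

/* Offset within its page of element `index` of a page-aligned array
   (hypothetical helper, not part of any NUMA API). */
size_t element_page_offset(size_t index, size_t elem_size, size_t page_size)
{
    return offset_in_page(index * elem_size, page_size);
}
```

With 8-byte doubles and 4096-byte pages, element_page_offset(2052000, 8, 4096) is 3328 and element_page_offset(4104000, 8, 4096) is 2560, so neither &A[2052000] nor &A[4104000] is page-aligned. The mbind(2) man page lists EINVAL for an addr that is not a multiple of the system page size, so it is worth checking the return value of each call.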

The way NUMA memory is laid out is that CPU socket 0 has a range of 0..X-1 MB, socket 1 has X..2X-1, socket 2 has 2X..3X-1, etc. Of course, if you stick a 4GB stick of RAM next to socket 0 and a 16GB one next to socket 1, then the distribution isn't even. But the principle still stands that a large chunk of memory is allocated for each socket, according to where the memory is actually located.

As a consequence of how the memory is laid out, the physical location of the memory you are using has to be placed in the linear (virtual) address space by page-mapping.

So, for large "chunks" of memory, it is fine to move it around, but for small chunks, it won't work quite right - you certainly can't "split" a page into something that is affine to two different CPU sockets.
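As a small illustration of why a page cannot be split between nodes, the sketch below checks whether two array elements live on the same page. The helpers `page_index` and `same_page` are hypothetical names, the array is assumed to start on a page boundary, and the page size is a parameter rather than a queried value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Index of the page (counted from the start of a page-aligned array)
   that holds element `index`. */
size_t page_index(size_t index, size_t elem_size, size_t page_size)
{
    assert(page_size != 0);
    return (index * elem_size) / page_size;
}

/* True when elements i and j share a page, and therefore can only
   ever be placed on the same NUMA node. */
bool same_page(size_t i, size_t j, size_t elem_size, size_t page_size)
{
    return page_index(i, elem_size, page_size) ==
           page_index(j, elem_size, page_size);
}
```

For the 15-element array, same_page(0, 10, 8, 4096) is true: all three pieces sit in one page. For the large array, same_page(2051999, 2052000, 8, 4096) is also true, so the boundary between the first and second piece falls in the middle of a page.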

Edit:

To split an array, you first need to find the page-aligned size.

long page_size = sysconf(_SC_PAGESIZE);

size_t objs_per_page = page_size / sizeof(A[0]);
// There should be a whole number of "objects" per page. This checks that
// no object straddles a page boundary.
ASSERT(page_size % sizeof(A[0]) == 0);

size_t split_three = SIZE / 3;

// Round each piece down to a whole number of pages.
size_t aligned_size = (split_three / objs_per_page) * objs_per_page;

size_t remnant = SIZE - (aligned_size * 3);

size_t piece = aligned_size;

nodemask = 1;  // node 0
mbind(&A[0],piece*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);

nodemask = 2;  // node 1
mbind(&A[aligned_size],piece*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);

nodemask = 4;  // node 2; this piece starts on a page boundary and also takes the remnant
mbind(&A[aligned_size*2],(piece+remnant)*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);

Obviously, you will now need to split the three threads similarly using the aligned size and remnant as needed.
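To see what the split works out to for the sizes in the question, here is a minimal sketch of the same arithmetic as a standalone function. The names `page_split` and `split_on_pages` are invented for this example, and the page size is passed in so the numbers can be checked by hand (4096 below is an assumption).

```c
#include <assert.h>
#include <stddef.h>

struct page_split {
    size_t piece;    /* elements in each page-aligned piece */
    size_t remnant;  /* elements left over after `parts` pieces */
};

/* Divide `count` elements into `parts` pieces, each rounded down to a
   whole number of pages. Assumes elem_size divides page_size evenly. */
struct page_split split_on_pages(size_t count, size_t elem_size,
                                 size_t page_size, size_t parts)
{
    assert(parts != 0 && elem_size != 0 && page_size % elem_size == 0);
    size_t per_page = page_size / elem_size;
    struct page_split s;
    s.piece = (count / parts / per_page) * per_page;
    s.remnant = count - s.piece * parts;
    return s;
}
```

For 6156000 doubles and 4096-byte pages, split_on_pages(6156000, 8, 4096, 3) gives a piece of 2051584 elements (4007 pages) and a remnant of 1248 elements. For the 15-element array the piece size comes out as 0, which reflects that an array smaller than a page cannot be divided between nodes at all.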

Source: https://stackoverflow.com/questions/14528588/numa-mbind-segfault

Tags

Linux

numa