NUMA

How to get the size of the memory pointed to by a pointer?

Submitted by 耗尽温柔 on 2019-12-06 16:12:19
I am currently working on a NUMA machine. I am using numa_free to free my allocated memory. However, unlike free, numa_free needs to know how many bytes are to be freed. Is there any way to know how many bytes a pointer points to without tracking it myself?

There is no way to obtain the size via the underlying API; you must record the size somewhere during allocation. For example, you may write your own allocator that allocates 4 extra bytes, stores the size of the buffer in the first 4 bytes, and reads the size back from there during deallocation: void *my_alloc(size_t size) { void …

How is NUMA represented in virtual memory?

Submitted by 北战南征 on 2019-12-06 09:43:07
There are many resources describing the architecture of NUMA from a hardware perspective and the performance implications of writing NUMA-aware software, but I have not yet found information regarding how the mapping between virtual pages and physical frames is decided with respect to NUMA. More specifically, an application running on modern Linux still sees a single contiguous virtual address space. How can the application tell which parts of the address space are mapped onto local memory and which are mapped onto the memory of another NUMA node? If the answer is that the …

How to force two processes to run on the same CPU?

Submitted by 一个人想着一个人 on 2019-12-06 07:21:09
Question: Context: I'm programming a software system that consists of multiple processes. It is written in C++ under Linux, and the processes communicate among themselves using Linux shared memory. Usually in software development, performance optimization is done in the final stage, and here I ran into a big problem. The software has high performance requirements, but on machines with 4 or 8 CPU cores (usually with more than one CPU) it was only able to use 3 cores, wasting 25% of the CPU power on …

Which architecture to call Non-uniform memory access (NUMA)?

Submitted by 試著忘記壹切 on 2019-12-05 16:21:33
According to the wiki: Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to a processor. But it is not clear whether this covers any memory, including caches, or main memory only. For example, the Xeon Phi processor has the following architecture: access to main memory (GDDR) is the same for all cores, while access to L2 cache differs between cores, since a core's own L2 cache is checked first, and then the L2 caches of the other cores are checked via the ring. Is this a NUMA or a UMA architecture?

NUMA-aware named shared memory for Linux

Submitted by 天大地大妈咪最大 on 2019-12-05 09:57:21
The Windows API offers the CreateFileMappingNuma function ( http://msdn.microsoft.com/en-us/library/windows/desktop/aa366539(v=vs.85).aspx ) to create a named shared memory space on a specific NUMA node. So far, I have not found an equivalent function for Linux. My current approach looks like this: allocate named shared memory (using shm_open(...)); determine the pages' current NUMA node (using numa_move_pages(...)); move the pages to the target node (using numa_move_pages(...) again). Does anyone know a better approach? EDIT: For the record: my proposed implementation does work as expected! That sounds right. Note …

Many-core CPUs: programming techniques to avoid disappointing scalability

Submitted by 醉酒当歌 on 2019-12-05 05:35:51
We've just bought a 32-core Opteron machine, and the speedups we get are a little disappointing: beyond about 24 threads we see no speedup at all (it actually gets slower overall), and after about 6 threads scaling becomes significantly sub-linear. Our application is very thread-friendly: the job breaks down into about 170,000 little tasks which can each be executed separately, each taking 5-10 seconds. They all read from the same memory-mapped file of about 4 GB. They make occasional writes to it, but it might be 10,000 reads to each write - we just write a little bit of data at the end of each of …

How to force two processes to run on the same CPU?

Submitted by 两盒软妹~` on 2019-12-04 11:17:42
Context: I'm programming a software system that consists of multiple processes. It is written in C++ under Linux, and the processes communicate among themselves using Linux shared memory. Usually in software development, performance optimization is done in the final stage, and here I ran into a big problem. The software has high performance requirements, but on machines with 4 or 8 CPU cores (usually with more than one CPU) it was only able to use 3 cores, wasting 25% of the CPU power on the former and more than 60% on the latter. After much research, and having ruled out mutexes and …

How to confirm NUMA?

Submitted by ☆樱花仙子☆ on 2019-12-03 14:08:46
Question: How can I confirm that a host is NUMA-aware? The Oracle doc says that NUMA-awareness starts at kernel 2.6.19, but the NUMA man page says that it was introduced with 2.6.14. I'd like to be sure that a Java process started with -XX:+UseNUMA is actually taking advantage of something. Checking for numa_maps, I see that I have them: # find /proc -name numa_maps /proc/1/task/1/numa_maps /proc/1/numa_maps /proc/2/task/2/numa_maps /proc/2/numa_maps /proc/3/task/3/numa_maps Though my kernel is …

Does gcc, icc, or Microsoft's C/C++ compiler support or know anything about NUMA?

Submitted by 牧云@^-^@ on 2019-12-03 07:28:27
Question: If I have a multi-processor board with cache-coherent non-uniform memory access (NUMA), i.e. separate "northbridges" with separate RAM for each processor, does any compiler know how to automatically spread the data across the different memory systems, so that threads are mostly retrieving their data from the RAM associated with the processor they are running on? I have a setup where 1 GB is attached to processor 0, 1 GB is attached to processor 1, etc.

OpenMP and NUMA relation?

Submitted by 百般思念 on 2019-12-03 05:19:20
Question: I have a dual-socket Xeon E5522 2.26 GHz machine (with hyperthreading disabled) running Ubuntu Server on Linux kernel 3.0 with NUMA support. The layout is 4 physical cores per socket. An OpenMP application runs on this machine and I have the following questions: Does an OpenMP program automatically take advantage of NUMA (i.e., a thread and its private data are kept on one NUMA node throughout the execution) when running on a NUMA machine with a NUMA-aware kernel? If not, what can be done? What about NUMA …