numa

Does gcc, icc, or Microsoft's C/C++ compiler support or know anything about NUMA?

二次信任 submitted on 2019-12-02 21:02:17
If I have a multi-processor board with cache-coherent non-uniform memory access (NUMA), i.e. separate "northbridges" with separate RAM for each processor, does any compiler know how to automatically spread the data across the different memory systems so that each thread mostly retrieves its data from the RAM attached to the processor it is running on? I have a setup where 1 GB is attached to processor 0, 1 GB is attached to processor 1, etc., up to 4 processors. In the coherent memory space the physical memory for the RAM on the 1st processor

OpenMP and NUMA relation?

孤街浪徒 submitted on 2019-12-02 18:35:37
I have a dual-socket Xeon E5522 2.26 GHz machine (with hyperthreading disabled) running Ubuntu Server on Linux kernel 3.0 with NUMA support. The layout is 4 physical cores per socket. An OpenMP application runs on this machine and I have the following questions: Does an OpenMP program automatically take advantage of a NUMA machine with a NUMA-aware kernel (i.e. are a thread and its private data kept on one NUMA node throughout execution)? If not, what can be done? What about NUMA and per-thread private C++ STL data structures? The current OpenMP standard defines a boolean

Poor memcpy Performance on Linux

只谈情不闲聊 submitted on 2019-12-02 15:42:29
We recently purchased some new servers and are seeing poor memcpy performance: memcpy is 3x slower on the servers than on our laptops. Server specs: chassis and motherboard: SUPER MICRO 1027GR-TRF; CPU: 2x Intel Xeon E5-2680 @ 2.70 GHz; memory: 8x 16GB DDR3 1600MHz. Edit: I am also testing on another server with slightly higher specs and seeing the same results as on the server above. Server 2 specs: chassis and motherboard: SUPER MICRO 10227GR-TRFT; CPU: 2x Intel Xeon E5-2650 v2 @ 2.6 GHz; memory: 8x 16GB DDR3 1866MHz. Laptop specs: chassis: Lenovo W530; CPU: 1x Intel Core i7 i7-3720QM @ 2

Is mov + mfence safe on NUMA?

梦想的初衷 submitted on 2019-12-01 06:15:01
I see that g++ generates a simple mov for x.load() and mov + mfence for x.store(y). Consider this classic example: #include <atomic> #include <thread> std::atomic<bool> x, y; bool r1; bool r2; void go1() { x.store(true); } void go2() { y.store(true); } void go3() { bool a = x.load(); bool b = y.load(); r1 = a && !b; } void go4() { bool b = y.load(); bool a = x.load(); r2 = b && !a; } int main() { std::thread t1(go1); std::thread t2(go2); std::thread t3(go3); std::thread t4(go4); t1.join(); t2.join(); t3.join(); t4.join(); return r1*2 + r2; } in which according to https://godbolt.org/z/APS4ZY go1 and go2 are
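The four-thread pattern in the question is the "independent reads of independent writes" litmus test. A minimal sketch of a harness that repeats it and counts the outcome where the two readers disagree about the order of the stores follows. The names IriwResult, run_iriw_once and count_forbidden are hypothetical, chosen for this illustration. The C++ memory model forbids r1 and r2 both being true when every operation uses the default std::memory_order_seq_cst, so a conforming implementation should report zero; a zero count is only consistent with that guarantee and does not prove anything about a particular machine.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// What the two reader threads concluded in one run.
struct IriwResult {
    bool r1;  // reader 1 saw x set but y not yet set
    bool r2;  // reader 2 saw y set but x not yet set
};

// Runs the four-thread test once with fresh atomics.
// Every load and store uses the default std::memory_order_seq_cst.
IriwResult run_iriw_once() {
    std::atomic<bool> x{false}, y{false};
    IriwResult res{false, false};

    std::thread t1([&] { x.store(true); });
    std::thread t2([&] { y.store(true); });
    std::thread t3([&] {
        bool a = x.load();
        bool b = y.load();
        res.r1 = a && !b;  // written only by t3
    });
    std::thread t4([&] {
        bool b = y.load();
        bool a = x.load();
        res.r2 = b && !a;  // written only by t4
    });
    t1.join(); t2.join(); t3.join(); t4.join();
    return res;  // safe to read: all threads have been joined
}

// Counts the runs in which the readers disagree about the store order.
int count_forbidden(int trials) {
    assert(trials >= 0);
    int forbidden = 0;
    for (int i = 0; i < trials; ++i) {
        IriwResult r = run_iriw_once();
        if (r.r1 && r.r2) ++forbidden;
    }
    return forbidden;
}
```

Calling count_forbidden(1000) from main and printing the result is enough to try it; the sketch starts fresh threads for every trial, so it is slow and rarely hits the interesting interleavings.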

Is mov + mfence safe on NUMA?

老子叫甜甜 submitted on 2019-12-01 05:32:31
Question: I see that g++ generates a simple mov for x.load() and mov + mfence for x.store(y). Consider this classic example: #include <atomic> #include <thread> std::atomic<bool> x, y; bool r1; bool r2; void go1() { x.store(true); } void go2() { y.store(true); } void go3() { bool a = x.load(); bool b = y.load(); r1 = a && !b; } void go4() { bool b = y.load(); bool a = x.load(); r2 = b && !a; } int main() { std::thread t1(go1); std::thread t2(go2); std::thread t3(go3); std::thread t4(go4); t1.join(); t2.join(); t3

NUMA Get Current Node/Core

眉间皱痕 submitted on 2019-11-30 15:51:47
I'm using libnuma on Linux. My threads should be aware of the node/core they're running on. Is it possible to get the current thread's node/core somehow? I've been through the documentation, but I didn't find such a function... I found this solution: #define _GNU_SOURCE #include <stdio.h> #include <sched.h> int main(void) { printf("CPU: %d\n", sched_getcpu()); return 0; } Then, if you need the node of the CPU, you can use numa.h: int cpu = sched_getcpu(); int node = numa_node_of_cpu(cpu); A lighter-weight approach is to make use of the RDTSCP instruction (on x86 systems that support it; it will be listed as
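The CPU-then-node lookup described above can be sketched without linking against libnuma. This is a hedged illustration, not the answer's own code: current_cpu and node_of_cpu_sysfs are hypothetical names, and the sysfs lookup assumes a Linux kernel built with NUMA support, where each /sys/devices/system/cpu/cpuN directory contains an entry named nodeM. Where that entry is missing the helper returns -1. With libnuma available, numa_node_of_cpu from the answer is the more direct call.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // glibc declares sched_getcpu() only with this defined
#endif
#include <sched.h>
#include <dirent.h>

#include <cassert>
#include <cctype>
#include <cstdlib>
#include <cstring>
#include <string>

// CPU the calling thread is running on right now, or -1 on error.
// The value can be stale as soon as it is returned unless the thread is pinned.
int current_cpu() { return sched_getcpu(); }

// NUMA node of `cpu` according to sysfs, or -1 if it cannot be determined.
// Assumption: the kernel exposes /sys/devices/system/cpu/cpu<N>/node<M>.
int node_of_cpu_sysfs(int cpu) {
    if (cpu < 0) return -1;
    std::string dir = "/sys/devices/system/cpu/cpu" + std::to_string(cpu);
    DIR* d = opendir(dir.c_str());
    if (!d) return -1;

    int node = -1;
    while (dirent* e = readdir(d)) {
        const char* name = e->d_name;
        // Match entries of the form "node<digits>".
        if (std::strncmp(name, "node", 4) == 0 &&
            std::isdigit(static_cast<unsigned char>(name[4]))) {
            node = std::atoi(name + 4);
            assert(node >= 0);
            break;
        }
    }
    closedir(d);
    return node;
}
```

A thread would typically call node_of_cpu_sysfs(current_cpu()) once after being pinned, since an unpinned thread can be migrated between the two calls.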

numactl --physcpubind

筅森魡賤 submitted on 2019-11-30 07:35:22
I was using numactl with the --physcpubind option. The manual says: --physcpubind=cpus, -C cpus: Only execute process on cpus. Etc... Let's say I have a NUMA system with 3 NUMA nodes, each with 4 cores. NUMA node 0 has cores 0,1,2,3; NUMA node 1 has 4,5,6,7; and NUMA node 2 has 8,9,10,11. My question: let's say I run the program as follows: export OMP_NUM_THREADS=6; numactl --physcpubind=0,1,4,5,8,9 ./program i.e. I'll be running my program with 6 threads and I am requesting them to be on CPU cores 0,1,4,5,8,9. For example, if at some point during the program threads 0-5
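In terms of the kernel interface, restricting execution to a CPU list amounts to setting a CPU affinity mask: threads created afterwards inherit the mask, and the scheduler remains free to move a thread between the CPUs in it. A minimal sketch of that mechanism using sched_setaffinity follows. The helper names parse_cpu_list, bind_to_cpus and allowed_cpus are hypothetical, and the parser is a simplification that handles only comma-separated numbers and lo-hi ranges, not the full syntax numactl accepts.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // glibc exposes the cpu_set_t macros only with this defined
#endif
#include <sched.h>

#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Parses a list such as "0,1,4-5" into {0, 1, 4, 5}.
// Simplified: std::stoi throws on malformed input.
std::vector<int> parse_cpu_list(const std::string& spec) {
    std::vector<int> cpus;
    std::stringstream ss(spec);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.empty()) continue;
        std::size_t dash = item.find('-');
        int lo = std::stoi(item.substr(0, dash));
        int hi = (dash == std::string::npos) ? lo
                                             : std::stoi(item.substr(dash + 1));
        for (int c = lo; c <= hi; ++c) cpus.push_back(c);
    }
    return cpus;
}

// Restricts the calling thread to the given CPUs. Returns false on failure.
bool bind_to_cpus(const std::vector<int>& cpus) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int c : cpus) {
        assert(c >= 0 && c < CPU_SETSIZE);
        CPU_SET(c, &set);
    }
    // pid 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Returns the CPUs the calling thread is currently allowed to run on.
std::vector<int> allowed_cpus() {
    std::vector<int> out;
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return out;
    for (int c = 0; c < CPU_SETSIZE; ++c)
        if (CPU_ISSET(c, &set)) out.push_back(c);
    return out;
}
```

For the list in the question, bind_to_cpus(parse_cpu_list("0,1,4,5,8,9")) would request the same six CPUs; pinning each thread to exactly one CPU needs a separate call per thread with a single-element list.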

NUMA Get Current Node/Core

心已入冬 submitted on 2019-11-29 22:52:56
Question: I'm using libnuma on Linux. My threads should be aware of the node/core they're running on. Is it possible to get the current thread's node/core somehow? I've been through the documentation, but I didn't find such a function... Answer 1: I found this solution: #define _GNU_SOURCE #include <stdio.h> #include <sched.h> int main(void) { printf("CPU: %d\n", sched_getcpu()); return 0; } Then, if you need the node of the CPU, you can use numa.h: int cpu = sched_getcpu(); int node = numa_node_of_cpu(cpu); Answer 2: A lighter

Measuring NUMA (Non-Uniform Memory Access). No observable asymmetry. Why?

早过忘川 submitted on 2019-11-29 20:08:08
I've tried to measure the asymmetric memory access effects of NUMA, and failed. The experiment: performed on an Intel Xeon X5570 @ 2.93GHz, 2 CPUs, 8 cores. On a thread pinned to core 0, I allocate an array x of size 10,000,000 bytes on core 0's NUMA node with numa_alloc_local. Then I iterate over array x 50 times, reading and writing each byte in the array, and measure the elapsed time for the 50 iterations. Then, on each of the other cores in my server, I pin a new thread and again measure the elapsed time for 50 iterations of reading and writing every byte in array x. Array x is large to
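The timed loop at the centre of this experiment can be sketched as below. This is an illustration under stated assumptions, not the poster's code: time_passes is a hypothetical name, the buffer is an ordinary std::vector rather than memory from numa_alloc_local, and the sketch does no thread pinning, so on its own it shows only the read-modify-write pass and the timing. The accesses go through a volatile pointer so that the compiler keeps one load and one store per byte per pass.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Reads and writes every byte of x once per pass and returns the elapsed
// wall-clock time in seconds for all passes.
double time_passes(std::vector<unsigned char>& x, int passes) {
    assert(passes >= 0);
    // volatile: force a real load and store for each byte on each pass.
    volatile unsigned char* p = x.data();

    auto start = std::chrono::steady_clock::now();
    for (int pass = 0; pass < passes; ++pass) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            p[i] = static_cast<unsigned char>(p[i] + 1);
        }
    }
    auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double>(stop - start).count();
}
```

With the figures from the question, the call would be time_passes(x, 50) on a 10,000,000-byte buffer, repeated from a thread pinned to each core in turn and compared across cores.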

numactl --physcpubind

天涯浪子 submitted on 2019-11-29 10:28:31
Question: I was using numactl with the --physcpubind option. The manual says: --physcpubind=cpus, -C cpus: Only execute process on cpus. Etc... Let's say I have a NUMA system with 3 NUMA nodes, each with 4 cores. NUMA node 0 has cores 0,1,2,3; NUMA node 1 has 4,5,6,7; and NUMA node 2 has 8,9,10,11. My question: let's say I run the program as follows: export OMP_NUM_THREADS=6; numactl --physcpubind=0,1,4,5,8,9 ./program i.e. I'll be running my program with 6 threads and I am