Context:
I'm writing a software system that consists of multiple processes. It is written in C++ under Linux, and the processes communicate with each other through Linux shared memory.
Usually, in software development, performance optimization is done in the final stage. That is where I ran into a big problem. The software has high performance requirements, but on machines with 4 or 8 CPU cores (usually with more than one CPU), it was only able to use 3 cores, thus wasting 25% of the CPU power on the former and more than 60% on the latter. After a lot of research, and having ruled out mutex and lock contention, I found that the time was being spent in shmdt/shmat calls (detaching from and attaching to shared memory segments). After some more research, I found that these CPUs, usually AMD Opteron and Intel Xeon, use a memory architecture called NUMA, which basically means that each processor has its own fast "local memory", and accessing memory attached to other CPUs is expensive.
After doing some tests, the problem seems to be that the software is designed so that, basically, any process can pass shared memory segments to any other process, and to any thread in them. This seems to kill performance, as processes are constantly accessing memory from other processes.
Question:
Now, the question is: is there any way to force pairs of processes to execute on the same CPU? I don't mean forcing them to always execute on one particular processor, as I don't care which one they run on, although that would do the job. Ideally, there would be a way to tell the kernel: if you schedule this process on one processor, you must also schedule its "brother" process (the process it communicates with through shared memory) on that same processor, so that performance is not penalized.
I think you may be able to start with these manual pages:
$ apropos affinity
sched_getaffinity (2) - set and get a process's CPU affinity mask
sched_setaffinity (2) - set and get a process's CPU affinity mask
taskset (1) - retrieve or set a process's CPU affinity
$
depending on whether you want to do that from the source code or the shell. The pthread library also provides per-thread equivalents, pthread_setaffinity_np() and pthread_getaffinity_np().
In C, what you are looking for is most probably the sched_setaffinity() system call.
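To illustrate, here is a minimal sketch of how sched_setaffinity() could be used to restrict a process to a chosen set of CPUs. The helper names (pin_to_cpus, allowed_cpu_count, first_allowed_cpu) are made up for this example, and it assumes Linux with glibc, where g++ defines _GNU_SOURCE by default so the CPU_* macros are available. Calling pin_to_cpus() with the same CPU list for both "brother" PIDs keeps the pair together; which CPU numbers belong to which NUMA node is something you would have to look up on the target machine.

```cpp
#include <sched.h>      // sched_setaffinity, sched_getaffinity, CPU_* macros
#include <sys/types.h>  // pid_t
#include <vector>

// Hypothetical helper: restrict process `pid` (0 means the calling process)
// to the CPUs listed in `cpus`. Returns true on success.
bool pin_to_cpus(pid_t pid, const std::vector<int>& cpus) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu : cpus) {
        if (cpu < 0 || cpu >= CPU_SETSIZE) {
            return false;  // CPU number outside what cpu_set_t can hold
        }
        CPU_SET(cpu, &set);
    }
    return sched_setaffinity(pid, sizeof(set), &set) == 0;
}

// Hypothetical helper: number of CPUs `pid` is currently allowed to run on,
// or -1 if the affinity mask cannot be read.
int allowed_cpu_count(pid_t pid) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(pid, sizeof(set), &set) != 0) {
        return -1;
    }
    return CPU_COUNT(&set);
}

// Hypothetical helper: lowest-numbered CPU `pid` may run on, or -1 on error.
int first_allowed_cpu(pid_t pid) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(pid, sizeof(set), &set) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) {
            return cpu;
        }
    }
    return -1;
}
```

A child created with fork() inherits its parent's affinity mask, so pinning the parent before forking is another way to keep a pair of processes on the same CPUs. Note that pinning both processes to a single core makes them compete for that core; listing all the cores of one socket is usually the more useful choice.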
There is also the schedtool command-line utility if you do not want to (or cannot) modify your code.
Writing NUMA-aware apps involves a little more than just 'two processes run on the same CPU'. NUMA awareness permeates everything: memory allocation, I/O completion, thread scheduling, etc.
Have a look at libnuma.
Source: https://stackoverflow.com/questions/4664668/how-to-force-two-process-to-run-on-the-same-cpu