How to control which core a process runs on?

挽巷 2020-12-04 09:52

I can understand how one can write a program that uses multiple processes or threads: fork() a new process and use IPC, or create multiple threads and use those sorts of com…

9 Answers
  • 2020-12-04 10:09

    The OS knows how to do this; you do not have to. You could run into all sorts of issues if you specified which core to run on, some of which could actually slow the process down. Let the OS figure it out; you just need to start the new thread.

    For example, if you told a process to start on core x, but core x was already under a heavy load, you would be worse off than if you had just let the OS handle it.
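    A minimal sketch of what "just start the new thread" looks like, assuming Linux/glibc and POSIX threads: the workers are created without any affinity call, so the kernel alone decides where each one runs. sched_getcpu() is used only to observe where each worker happened to land; `start_workers`, `struct worker` and `MAX_WORKERS` are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define MAX_WORKERS 64

/* Hypothetical per-worker record: the CPU the worker observed itself on. */
struct worker {
    pthread_t tid;
    int cpu;
};

static void *worker_main(void *arg) {
    struct worker *w = arg;
    /* No affinity is set anywhere: the scheduler picks the core. */
    w->cpu = sched_getcpu();
    return NULL;
}

/* Start up to n workers, join them, and return how many reported a CPU. */
static int start_workers(int n) {
    struct worker workers[MAX_WORKERS];
    int started = 0, reported = 0;

    if (n > MAX_WORKERS)
        n = MAX_WORKERS;
    for (int i = 0; i < n; i++) {
        workers[i].cpu = -1;
        if (pthread_create(&workers[i].tid, NULL, worker_main, &workers[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(workers[i].tid, NULL);
        printf("worker %d ran on CPU %d\n", i, workers[i].cpu);
        if (workers[i].cpu >= 0)
            reported++;
    }
    return reported;
}
```

    Which CPUs get printed depends entirely on the machine and its current load, which is the point of the answer.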

  • 2020-12-04 10:13

    Nothing tells a core "now start running this process".

    A core does not see processes; it only knows about executable code, the various privilege levels it can run at, and the associated restrictions on which instructions can be executed.

    When the computer boots, for the sake of simplicity, only one core/processor is active and actually runs any code. Then, if the OS is multiprocessor-capable, it activates the other cores with some system-specific instruction; the other cores most likely pick up from exactly the same spot as the first core and run from there.

    So what the scheduler does is look through the OS's internal structures (the task/process/thread queue), pick one task, and mark it as running on its core. The scheduler instances running on other cores then won't touch that task until it is in the waiting state again (and only if it is not pinned to a specific core). After the task is marked as running, the scheduler switches to userland, and the task resumes at the point where it was previously suspended.
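    A deliberately simplified toy model of that description, in C: one shared queue of tasks, each with a state and an affinity bitmask, and a `pick_next` function that a core's scheduler instance would call. This is not how any real kernel is implemented (Linux, for instance, uses per-CPU run queues and proper locking), and the model ignores races between cores entirely; `struct task`, `pick_next` and `requeue_task` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum task_state { TASK_READY, TASK_RUNNING, TASK_WAITING };

/* Hypothetical task record for this toy model. */
struct task {
    int id;
    enum task_state state;
    unsigned affinity;   /* bit n set => task may run on core n */
    int core;            /* core it is running on, or -1 */
};

/* Called by the scheduler instance on `core`: claim the first READY task
   allowed on this core and mark it RUNNING there. Tasks RUNNING on another
   core or WAITING are skipped. Returns NULL if there is nothing to run.
   A real kernel would need locking (or per-CPU queues) around this. */
static struct task *pick_next(struct task *queue, size_t n, int core) {
    for (size_t i = 0; i < n; i++) {
        struct task *t = &queue[i];
        if (t->state == TASK_READY && (t->affinity & (1u << core))) {
            t->state = TASK_RUNNING;
            t->core = core;
            return t;
        }
    }
    return NULL;
}

/* The task gives up its core and goes back to the queue, where any core
   allowed by its affinity mask may pick it up next time. */
static void requeue_task(struct task *t) {
    t->state = TASK_READY;
    t->core = -1;
}
```

    Pinning, in this model, is simply an affinity mask with a single bit set: a core whose bit is clear never claims the task.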

    Technically, there is nothing whatsoever stopping cores from running the exact same code at the exact same time (and many unlocked functions do), but unless the code is written to expect that, it will probably piss all over itself.
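    A minimal sketch of code that is "written to expect that", assuming C11 atomics and POSIX threads: two threads run the very same function concurrently, possibly on different cores, and the shared counter is updated with `atomic_fetch_add`, so no increments are lost. Incrementing a plain `long` with `++` in the same place would be a data race. `worker`, `run_two_workers` and `ITERATIONS` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define ITERATIONS 1000000L

static atomic_long counter;

/* Both threads execute exactly this code at the same time. */
static void *worker(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERATIONS; i++)
        atomic_fetch_add(&counter, 1); /* indivisible read-modify-write */
    return NULL;
}

/* Run two copies of worker concurrently; return the final count, or -1. */
static long run_two_workers(void) {
    pthread_t a, b;

    atomic_store(&counter, 0);
    if (pthread_create(&a, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return atomic_load(&counter);
}
```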

    The scenario gets weirder with more exotic memory models (the above assumes the "usual" linear, single working memory space), where cores don't necessarily all see the same memory and there may be requirements on fetching code from another core's clutches, but that is much more easily handled by simply keeping the task pinned to a core (AFAIK the Sony PS3 architecture with its SPUs is like that).

  • 2020-12-04 10:19

    I don't know the assembly instructions, but the Windows API function is SetProcessAffinityMask. You can see an example of something I cobbled together a while ago to run Picasa on only one core.
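    A minimal sketch of a helper that pins the caller to one logical CPU on either platform. The Windows branch uses SetProcessAffinityMask, which applies to the whole process; the Linux branch uses sched_setaffinity(2) with pid 0, which applies only to the calling thread. `pin_self_to_cpu` is a hypothetical name, and the Windows branch assumes the CPU index fits in the DWORD_PTR mask of a single processor group.

```c
#define _GNU_SOURCE
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#endif

/* Restrict the caller to a single logical CPU. Returns 0 on success, -1 on error.
   Windows: whole process. Linux: calling thread only. */
static int pin_self_to_cpu(int cpu) {
#ifdef _WIN32
    if (cpu < 0 || cpu >= (int)(sizeof(DWORD_PTR) * 8))
        return -1;
    return SetProcessAffinityMask(GetCurrentProcess(), (DWORD_PTR)1 << cpu) ? 0 : -1;
#else
    cpu_set_t mask;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask); /* 0 or -1 */
#endif
}
```

    Pinning to the CPU the caller is already on (as reported by sched_getcpu() on Linux) is the least surprising choice, since that CPU is known to be allowed.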

  • 2020-12-04 10:20

    As others have mentioned, processor affinity is Operating System specific. If you want to do this outside the confines of the operating system, you're in for a lot of fun, and by that I mean pain.

    That said, others have mentioned SetProcessAffinityMask for Win32. Nobody has mentioned the Linux kernel way to set processor affinity, and so I shall. You need to use the sched_setaffinity(2) system call. Here's a nice tutorial on how.

    The command-line wrapper for this system call is taskset(1). For example, this restricts a perf stat of a busy loop to running on either core 2 or core 3 (it can still migrate between cores, but only between those two):
    
    taskset -c 2,3 perf stat awk 'BEGIN{for(i=0;i<100000000;i++){}}'
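    taskset's -c option takes a CPU list such as 2,3 or 1-3. As a sketch of how such a list becomes the cpu_set_t that sched_setaffinity(2) receives, here is a small parser for just the comma and range forms. `parse_cpu_list` is a hypothetical helper, not taskset's own code, and the real taskset parser may accept more forms than this.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <ctype.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical helper: parse a list such as "2,3" or "1-3,8" into *set.
   Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_cpu_list(const char *s, cpu_set_t *set) {
    CPU_ZERO(set);
    if (*s == '\0')
        return -1;
    for (;;) {
        char *end;
        long lo, hi;

        if (!isdigit((unsigned char)*s))
            return -1;
        lo = strtol(s, &end, 10);
        hi = lo;
        if (*end == '-') {                      /* range "lo-hi" */
            if (!isdigit((unsigned char)end[1]))
                return -1;
            hi = strtol(end + 1, &end, 10);
        }
        if (hi < lo || hi >= CPU_SETSIZE)
            return -1;
        for (long cpu = lo; cpu <= hi; cpu++)
            CPU_SET(cpu, set);
        if (*end == '\0')
            return 0;
        if (*end != ',')
            return -1;
        s = end + 1;                            /* next element after ',' */
    }
}
```

    The resulting set can then be passed straight to sched_setaffinity(0, sizeof(set), &set).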

  • 2020-12-04 10:22

    Linux sched_setaffinity C minimal runnable example

    In this example, we get the affinity, modify it, and check if it has taken effect with sched_getcpu().

    main.c

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    
    /* Print a 0/1 flag for CPUs 0 .. (online count - 1): 1 if we may run there. */
    void print_affinity(void) {
        cpu_set_t mask;
        long nproc, i;
    
        if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
            perror("sched_getaffinity");
            exit(EXIT_FAILURE); /* unlike assert(false), not compiled out by -DNDEBUG */
        }
        nproc = sysconf(_SC_NPROCESSORS_ONLN);
        printf("sched_getaffinity = ");
        for (i = 0; i < nproc; i++) {
            printf("%d ", CPU_ISSET(i, &mask));
        }
        printf("\n");
    }
    
    int main(void) {
        cpu_set_t mask;
    
        print_affinity();
        printf("sched_getcpu = %d\n", sched_getcpu());
        /* Allow only CPU 0. */
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);
        if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
            perror("sched_setaffinity");
            exit(EXIT_FAILURE);
        }
        print_affinity();
        /* TODO is it guaranteed to have taken effect already? Always worked on my tests. */
        printf("sched_getcpu = %d\n", sched_getcpu());
        return EXIT_SUCCESS;
    }
    

    GitHub upstream.

    Compile and run:

    gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
    ./main.out
    

    Sample output:

    sched_getaffinity = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
    sched_getcpu = 9
    sched_getaffinity = 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    sched_getcpu = 0
    

    Which means that:

    • initially, all of my 16 cores were enabled, and the process was randomly running on core 9 (the 10th one)
    • after we set the affinity to only the first core, the process was necessarily moved to core 0 (the first one)

    It is also fun to run this program through taskset:

    taskset -c 1-3 ./main.out
    

    Which gives output of form:

    sched_getaffinity = 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 
    sched_getcpu = 2
    sched_getaffinity = 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    sched_getcpu = 0
    

    and so we see that it limited the affinity from the start.

    This works because the affinity mask is inherited by children created with fork(2) and preserved across execve(2); taskset sets the mask on itself and then execs the command. See also: How to prevent inheriting CPU affinity by child forked process?

    Tested in Ubuntu 16.04.
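    A minimal sketch of that mechanism, assuming Linux/glibc: a hypothetical `run_pinned` helper that, like a stripped-down taskset, forks, sets the child's affinity with sched_setaffinity(2), then replaces the child with the target program via execvp(3). The program starts with the restricted mask because the mask survives execve(2). This is an illustration, not taskset's actual source.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (searched in PATH) with CPU affinity `mask`.
   Returns the program's exit status, or -1 on failure/abnormal exit. */
static int run_pinned(const cpu_set_t *mask, char *const argv[]) {
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: restrict itself first, then exec; the mask survives execve. */
        if (sched_setaffinity(0, sizeof(*mask), mask) == -1) {
            perror("sched_setaffinity");
            _exit(127);
        }
        execvp(argv[0], argv);
        perror("execvp"); /* only reached if exec failed */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    Building the mask from sched_getcpu() guarantees the chosen CPU is one the parent is already allowed to use.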

    x86 bare metal

    If you are that hardcore: What does multicore assembly language look like?

    How Linux implements it

    How does sched_setaffinity() work?

    Python: os.sched_getaffinity and os.sched_setaffinity

    See: How to find out the number of CPUs using python

  • 2020-12-04 10:23

    The OpenMPI project has a library to set the processor affinity on Linux in a portable way.

    A while back, I used this in a project and it worked fine.

    Caveat: I dimly remember that there were some issues in finding out how the operating system numbers the cores. I used this on a system with two Xeon CPUs with 4 cores each.

    A look at cat /proc/cpuinfo might help. On the box I used, the numbering is pretty weird; boiled-down output is at the end.

    Evidently, the even-numbered cores are on the first CPU and the odd-numbered cores are on the second CPU. However, if I remember correctly, there was an issue with the caches. On these Intel Xeon processors, two cores on each CPU share an L2 cache (I do not remember whether the processor has an L3 cache). I think that virtual processors 0 and 2 shared one L2 cache, 1 and 3 shared one, 4 and 6 shared one, and 5 and 7 shared one.

    Because of this weirdness (1.5 years back, I could not find any documentation on the processor numbering in Linux), I would be careful about doing this kind of low-level tuning. However, there clearly are some uses. If your code runs on only a few kinds of machines, it might be worth doing this kind of tuning. Another application would be a domain-specific language like StreamIt, where the compiler could do this dirty work and compute a smart schedule.

    processor       : 0
    physical id     : 0
    siblings        : 4
    core id         : 0
    cpu cores       : 4
    
    processor       : 1
    physical id     : 1
    siblings        : 4
    core id         : 0
    cpu cores       : 4
    
    processor       : 2
    physical id     : 0
    siblings        : 4
    core id         : 1
    cpu cores       : 4
    
    processor       : 3
    physical id     : 1
    siblings        : 4
    core id         : 1
    cpu cores       : 4
    
    processor       : 4
    physical id     : 0
    siblings        : 4
    core id         : 2
    cpu cores       : 4
    
    processor       : 5
    physical id     : 1
    siblings        : 4
    core id         : 2
    cpu cores       : 4
    
    processor       : 6
    physical id     : 0
    siblings        : 4
    core id         : 3
    cpu cores       : 4
    
    processor       : 7
    physical id     : 1
    siblings        : 4
    core id         : 3
    cpu cores       : 4
    
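    A minimal sketch, assuming the `key : value` layout shown above, that extracts the processor, physical id and core id fields from /proc/cpuinfo text, so that the logical-CPU-to-socket mapping can be checked rather than guessed. `parse_cpuinfo` and `struct cpu_topo` are hypothetical names. Linux also exposes the same data per CPU under /sys/devices/system/cpu/cpuN/topology/, and which CPUs share a cache under /sys/devices/system/cpu/cpuN/cache/indexM/shared_cpu_list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one logical CPU. */
struct cpu_topo {
    int processor;   /* logical CPU number, as used by taskset/sched_setaffinity */
    int physical_id; /* socket */
    int core_id;     /* core within the socket */
};

/* Parse cpuinfo-style text; return the number of entries filled (<= max). */
static int parse_cpuinfo(const char *text, struct cpu_topo *out, int max) {
    int n = 0;
    const char *line = text;

    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        size_t copy = len < 255 ? len : 255;
        char buf[256];
        int value;

        memcpy(buf, line, copy);
        buf[copy] = '\0';

        /* A space in a scanf format matches any run of blanks, incl. tabs. */
        if (sscanf(buf, " processor : %d", &value) == 1) {
            if (n == max)
                break;                     /* out[] is full */
            out[n].processor = value;
            out[n].physical_id = -1;       /* -1 = field not seen */
            out[n].core_id = -1;
            n++;
        } else if (n > 0 && sscanf(buf, " physical id : %d", &value) == 1) {
            out[n - 1].physical_id = value;
        } else if (n > 0 && sscanf(buf, " core id : %d", &value) == 1) {
            out[n - 1].core_id = value;
        }
        line = nl ? nl + 1 : line + len;   /* advance to the next line */
    }
    return n;
}
```

    Fed the contents of /proc/cpuinfo from the box above, this would give the processor-to-(socket, core) pairs that make the even/odd split visible at a glance.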