Linux not respecting SCHED_FIFO priority? (normal or GDB execution)

清歌不尽 2021-01-24 01:08

TL;DR

On multiprocessor/multicore machines, several RT SCHED_FIFO threads may be scheduled at the same time, each on its own execution unit. So thread wit

3 answers
  • 2021-01-24 01:08

    I tried many solutions but never got defect-free code. See also my other answer in this post.

    The code with the best success rate, though still not perfect, is the one below. It uses the traditional C pthread API, which allows creating each thread with the right attributes from the start.

    I am still astonished that I get errors even with this code (same as the question's MCVE, but using the pure pthread_* API).

    To stress the code, I used the following command:

    $ seq 1000 | parallel ./main | grep inf
    Result: inf
    Result: inf
    ....
    

    inf denotes the wrong result, a division by 0. In my case the defect rate is around 10 in 1000 runs.

    Sequential commands like for i in {1..1000}; do ./main ; done | grep inf take longer.

    Threads are launched from higher priority to lower priority.

    So now the divisor thread

    • is created first
    • has the higher RT priority (2 > 1, while main stays with non-RT SCHED_OTHER scheduling).

    So I wonder why I still get a division by 0...

    Finally, I tried reducing the CPU affinity with taskset. It runs OK when the shell is restricted to a single CPU:

    $ taskset -pc 0 $$
    pid 2414's current affinity list: 0,1
    pid 2414's new affinity list: 0
    $ for i in {1..1000}; do ./main_oss ; done   <<-- no need for parallel in this case
    Result: 0.333333
    Result: 0.333333
    Result: 0.333333
    Result: 0.333333
    Result: 0.333333
    ...
    

    but once more than one CPU is available, the defect comes back:

    $ taskset -pc 0,1 $$
    pid 2414's current affinity list: 0
    pid 2414's new affinity list: 0,1
    $ seq 1000 | parallel ./main_oss
    Result: 0.333333          | <<-- display by group of 2
    Result: 0.333333          |
    Result: inf               |   <<--
    Result: 0.333333          |
    ...
    

    Why is a lower-priority RT SCHED_FIFO thread run on another CPU when both threads belong to the same parent process?

    Unfortunately, PTHREAD_SCOPE_PROCESS is not supported on Linux.

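    #include <chrono>
    #include <iostream>
    #include <thread>
    #include <pthread.h>
    
    // Shared without any synchronization on purpose: this is the MCVE under test.
    double a = 1.0;
    double b = 0.0;
    
    // Lower-priority thread: prints a/b, which is inf while b is still 0.
    void * ratio(void*)
    {
        std::cout << "Result: " << a/b << "\n" << std::flush;
        return nullptr;
    }
    
    // Higher-priority thread: sets the divisor, then sleeps.
    void * divisor(void*)
    {
        b = 3.0;
        std::this_thread::sleep_for(std::chrono::milliseconds(500u));
        return nullptr;
    }
    
    
    int main(int argc, char * argv[])
    {
        struct sched_param param;
    
        pthread_t thr[2];
        pthread_attr_t attr;
        pthread_attr_init(&attr);
        // Use the policy and priority from attr instead of inheriting them from main.
        pthread_attr_setschedpolicy(&attr,SCHED_FIFO);
        pthread_attr_setinheritsched(&attr,PTHREAD_EXPLICIT_SCHED);
    
        // Creating SCHED_FIFO threads needs real-time privileges (e.g. root),
        // so check the return codes before joining.
        param.sched_priority = 2;
        pthread_attr_setschedparam(&attr,&param);
        if (pthread_create(&thr[0],&attr,divisor,nullptr) != 0) {
            std::cerr << "pthread_create(divisor) failed\n";
            return 1;
        }
    
        param.sched_priority = 1;
        pthread_attr_setschedparam(&attr,&param);
        if (pthread_create(&thr[1],&attr,ratio,nullptr) != 0) {
            std::cerr << "pthread_create(ratio) failed\n";
            return 1;
        }
    
        pthread_join(thr[0],nullptr);
        pthread_join(thr[1],nullptr);
        pthread_attr_destroy(&attr);
    
        return 0;
    }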
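
    The single-CPU result obtained with taskset can also be requested from inside the program. Below is a minimal sketch, assuming glibc's non-portable extension pthread_attr_setaffinity_np (GNU/Linux only). The helpers first_allowed_cpu, report_cpu and both_threads_run_on are hypothetical names, not part of the MCVE. The sketch leaves out the SCHED_FIFO settings so that it runs without privileges; the same attr object could carry them as in the listing above.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for sched_getcpu and pthread_attr_setaffinity_np
#endif
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Hypothetical helper: first CPU the current process may run on, or -1.
int first_allowed_cpu()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

// Thread body: record the CPU this thread is currently running on.
static void* report_cpu(void* out)
{
    *static_cast<int*>(out) = sched_getcpu();
    return nullptr;
}

// Hypothetical helper: start two threads pinned to `cpu` and return true
// if both of them observed that they were running on that CPU.
bool both_threads_run_on(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    int rc = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
    assert(rc == 0);

    pthread_t thr[2];
    int seen[2] = {-1, -1};
    for (int i = 0; i < 2; ++i) {
        rc = pthread_create(&thr[i], &attr, report_cpu, &seen[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < 2; ++i)
        pthread_join(thr[i], nullptr);
    pthread_attr_destroy(&attr);

    return seen[0] == cpu && seen[1] == cpu;
}
```

    Calling both_threads_run_on(first_allowed_cpu()) from main plays the role of taskset -c for the two threads only; main itself keeps its original affinity.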
    
  • 2021-01-24 01:13

    A new answer to gather the remaining problems I had with debugging.

    Answers such as "Setting application affinity in gdb" (Markus Ahlberg) and questions such as "gdb don't break when I use exec-wrapper script to exec my target binary" suggest the GDB option exec-wrapper. With it, however, I was not (always) able to set breakpoints in my code, even with my own wrapper.

    I finally came back to this solution: "Setting application affinity in gdb" (Craig Scratchley).

    The initial problem

    $ ./main
    Result: inf
    

    The solution for run-time

    taskset -c 0 ./main
    Result: 0.333333
    

    But for debug

    gdb -ex 'set exec-wrapper taskset -c 0' ./main
    --> mixed results depending on conditions (native or virtualized? number of cores?):
        sometimes 0.333333, sometimes inf
    --> problems setting breakpoints
    --> I still have work to do to summarize this issue
    

    or

    taskset -c 0 gdb main
    ...
    (gdb) r
    ...
    Result: inf
    

    and finally

    taskset -c N chrt 99 gdb main <<-- where N is a core number (*)
    ...                           <<-- 99 denotes here "your higher prio in your system"
    (gdb) r
    ...
    Result: 0.333333
    
    • I wrote N above because if your program main sets its affinity to processor M and you set the gdb affinity to N, you may run into the same original problem.
    • I used plain chrt 99 for GDB (chrt's default policy is SCHED_RR) even though I am interested in SCHED_FIFO rather than SCHED_RR, because gdb (or the IDE, see below) froze when option -f (for FIFO) was used. I suspect the round-robin mechanism is safer, as a thread will always release the CPU at some point.
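
    The value 99 above stands for the highest real-time priority on the system. Rather than hard-coding it, the valid range can be queried with the POSIX calls sched_get_priority_min and sched_get_priority_max. This is only a small sketch; the PriorityRange struct and the priority_range function are hypothetical names.

```cpp
#include <cassert>
#include <sched.h>

// Hypothetical type: inclusive range of static priorities for one policy.
struct PriorityRange {
    int min;
    int max;
};

// Query the priority range the running system accepts for `policy`
// (for example SCHED_FIFO or SCHED_RR).
PriorityRange priority_range(int policy)
{
    const int lo = sched_get_priority_min(policy);
    const int hi = sched_get_priority_max(policy);
    assert(lo != -1 && hi != -1);   // -1 signals an invalid policy
    return PriorityRange{lo, hi};
}
```

    For example, priority_range(SCHED_RR).max is the value that plays the role of 99 in the chrt command above. The Linux sched(7) manual page documents the range for SCHED_FIFO and SCHED_RR as 1 to 99.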

    If you use an IDE (and do not know how to configure gdb properly inside it), I was able to launch the IDE itself the same way:

    taskset -c N chrt 99 code
    
  • 2021-01-24 01:33

    There are a few things obviously wrong with your MCVE:

    1. You have a data race on b, i.e. undefined behavior, so anything can happen.

    2. You are expecting that the divisor thread will have finished its pthread_setschedparam call before the ratio thread gets to computing the ratio.

      But there is absolutely no guarantee that the first thread will not run to completion long before the second thread is even created.

      Indeed that is what's likely happening under GDB: it must trap thread creation and destruction events in order to keep track of all the threads, and so thread creation under GDB is significantly slower than outside of it.

    To fix the second problem, add a counting semaphore, and have both threads rendezvous after each has executed its pthread_setschedparam call.
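
    A rendezvous of this kind could be sketched as follows. This is only a sketch under stated assumptions: it uses two POSIX semaphores instead of a single counting one (a common form of the two-thread rendezvous), and the names divisor_ready, ratio_ready, set_fifo_priority and run_rendezvous are hypothetical. The priority request is best effort and its result is ignored, so the sketch also runs without real-time privileges. As a further assumption that the answer does not state, b is written before the divisor thread signals, so the semaphore also orders the write before the read and removes the data race from point 1.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>

// Hypothetical names for this sketch.
static sem_t divisor_ready;   // posted once the divisor thread has arrived
static sem_t ratio_ready;     // posted once the ratio thread has arrived
static double a = 1.0;
static double b = 0.0;
static double result = 0.0;

// Best effort: request SCHED_FIFO at `priority` for the calling thread.
// Returns 0 on success or an error number (e.g. EPERM when unprivileged).
static int set_fifo_priority(int priority)
{
    sched_param param{};
    param.sched_priority = priority;
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}

static void* divisor(void*)
{
    set_fifo_priority(2);
    b = 3.0;                    // written before signalling arrival
    sem_post(&divisor_ready);   // "I have arrived"
    sem_wait(&ratio_ready);     // wait until the other thread has arrived
    return nullptr;
}

static void* ratio(void*)
{
    set_fifo_priority(1);
    sem_post(&ratio_ready);     // "I have arrived"
    sem_wait(&divisor_ready);   // wait until b has been written
    result = a / b;
    return nullptr;
}

// Runs both threads once and returns the computed ratio.
double run_rendezvous()
{
    b = 0.0;
    result = 0.0;
    int rc = sem_init(&divisor_ready, 0, 0);
    assert(rc == 0);
    rc = sem_init(&ratio_ready, 0, 0);
    assert(rc == 0);

    pthread_t thr[2];
    rc = pthread_create(&thr[0], nullptr, divisor, nullptr);
    assert(rc == 0);
    rc = pthread_create(&thr[1], nullptr, ratio, nullptr);
    assert(rc == 0);
    pthread_join(thr[0], nullptr);
    pthread_join(thr[1], nullptr);

    sem_destroy(&divisor_ready);
    sem_destroy(&ratio_ready);
    return result;
}
```

    Because the ratio thread cannot pass sem_wait(&divisor_ready) before the divisor thread has posted, the result no longer depends on which CPU or priority each thread ends up with.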
