ubuntu: sem_timedwait not waking (C)

花落未央 2021-02-06 17:01

I have 3 processes which need to be synchronized. Process one does something then wakes process two and sleeps, which does something then wakes process three and sleeps, which d

6 answers
  •  隐瞒了意图╮
    2021-02-06 17:42

    (Sorry to give a second answer but this one would be too messy to clean up just with editing)

    The answer is, I think, already in the original post for the question:

    "So, my question is why does sem3 timeout, even though the semaphore has been triggered and the value is clearly 1? I would never expect to see line 08 in the output. If it times out (because, say thread 2 has crashed or is taking too long), the value should be 0. And why does it work fine for 3 or 4 hours first before getting into this state?"

    So the scenario is:

    1. thread 2 takes too long
    2. sem3 times out in sem_timedwait
    3. thread 3 is descheduled, or otherwise delayed, before it reaches its sem_getvalue call
    4. thread 2 wakes up and does its sem_post on sem3
    5. thread 3 issues its sem_getvalue and sees a 1
    6. thread 3 branches into the wrong branch and doesn't do its sem_post on sem1

    This race condition is hard to trigger: you have to hit the tiny time window between the moment a thread's wait on the semaphore fails and the moment that same thread reads the semaphore with sem_getvalue. How often that happens depends heavily on the environment (type of system, number of cores, load, IO interrupts), which explains why it only occurs after hours, if at all.
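The interleaving in steps 2 to 5 can be replayed deterministically in a single thread, which makes the stale reading easy to see. This is only a sketch: replay_late_post is a hypothetical name, and the "late post" is issued by the same thread rather than by a real thread 2.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Replays steps 2-5 of the scenario in one thread so the order of events
 * is fixed. Returns 0 on success and fills in both out-parameters.
 * (Hypothetical helper, not taken from the question's code.) */
int replay_late_post(int *timed_out, int *value_seen)
{
    sem_t sem3;
    struct timespec deadline;
    int rc;

    if (sem_init(&sem3, 0, 0) != 0)
        return -1;

    /* Step 2: the deadline is "now", so the wait on an empty semaphore
     * fails with ETIMEDOUT. */
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0) {
        sem_destroy(&sem3);
        return -1;
    }
    rc = sem_timedwait(&sem3, &deadline);
    *timed_out = (rc == -1 && errno == ETIMEDOUT);

    /* Step 4: the post that "thread 2" would issue arrives too late. */
    if (sem_post(&sem3) != 0) {
        sem_destroy(&sem3);
        return -1;
    }

    /* Step 5: sem_getvalue reports 1 even though the wait above failed. */
    rc = sem_getvalue(&sem3, value_seen);

    sem_destroy(&sem3);
    return rc == 0 ? 0 : -1;
}
```

On Linux this builds with gcc -pthread. After a call, the timed-out flag is set and the value read is 1, which is exactly the combination the question did not expect to see.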

    Making the control flow depend on sem_getvalue is generally a bad idea: the value it returns may already be out of date by the time you act on it. The only atomic non-blocking access to a sem_t is through sem_post and sem_trywait.
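As a rough illustration of the alternative, a thread that wants to know whether a post has arrived can ask with sem_trywait, which checks and consumes the token in one atomic step. The helper name take_if_posted is an assumption made for this sketch, not something from the question.

```c
#include <errno.h>
#include <semaphore.h>

/* Returns 1 if a token was available and has been consumed,
 * 0 if the semaphore was at zero, and -1 on any other error.
 * (Hypothetical helper for illustration.) */
int take_if_posted(sem_t *sem)
{
    if (sem_trywait(sem) == 0)
        return 1;               /* we own the token now */
    if (errno == EAGAIN)
        return 0;               /* nothing was posted */
    return -1;                  /* e.g. EINVAL: not a valid semaphore */
}
```

Unlike a sem_getvalue check followed by a branch, a return value of 1 here means the token really has been taken, so no other event can invalidate the decision afterwards.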

    So the example code from the question has that race condition. That does not mean the original problem code that gillez had has the same race condition. Perhaps the example is just too simplistic and merely shows the same symptom for him.

    My guess is that his original problem involved an unprotected sem_wait, that is, a sem_wait whose return value is checked but whose errno is not inspected when it fails. EINTR occurs quite naturally on sem_wait if the process does some IO. You just have to wrap the call in a do-while loop that resets errno, checks it after a failure, and retries when it is EINTR.
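A minimal sketch of such a protected wait, assuming the caller simply wants to retry after an interruption (sem_wait_retry is a hypothetical wrapper name):

```c
#include <errno.h>
#include <semaphore.h>

/* Waits on sem, restarting the wait whenever a signal handler
 * interrupts it. Returns 0 on success, or -1 with errno set to
 * something other than EINTR. (Hypothetical wrapper.) */
int sem_wait_retry(sem_t *sem)
{
    int rc;

    do {
        errno = 0;              /* reset before each attempt */
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);

    return rc;
}
```

The same loop shape works for sem_timedwait; because its deadline is an absolute time, the retry does not extend the overall timeout.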
