Why does this code deadlock?

Posted by 烈酒焚心 on 2019-12-04 14:38:59

I'm not sure this is the actual cause of the deadlock, but your code contains some serious errors.

First, look at while (count != CPU_COUNT);. You must not read a shared variable without holding a lock unless the read is atomic, and a plain read of count isn't guaranteed to be.

You must protect the read of count with the lock. You can replace your while loop with the following:

unsigned long local_count;
do {
    spin_lock(&lock);
    local_count = count;
    spin_unlock(&lock);
} while (local_count != CPU_COUNT);

Alternatively, you could use atomic types. Notice the absence of locking:

atomic_t count = ATOMIC_INIT(0);

...

void thread_sync() {
    atomic_inc(&count);
    while (atomic_read(&count) != CPU_COUNT);
}
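
To try the atomic counter idea outside the kernel, here is a minimal user-space sketch. It is an analogue, not kernel code: it assumes C11 <stdatomic.h> and POSIX threads in place of atomic_t and kernel threads, and THREAD_COUNT, worker, passed and run_barrier_demo are hypothetical names that stand in for CPU_COUNT and the original thread functions.

```c
#include <pthread.h>
#include <stdatomic.h>

#define THREAD_COUNT 4            /* assumption: stands in for CPU_COUNT */

static atomic_int count;          /* threads that have reached the barrier */
static atomic_int passed;         /* threads that have left the barrier */

/* Same shape as thread_sync() above: increment, then spin until all arrive. */
static void thread_sync(void)
{
    atomic_fetch_add(&count, 1);
    while (atomic_load(&count) != THREAD_COUNT)
        ;                         /* busy-wait; every read is atomic */
}

static void *worker(void *arg)
{
    (void)arg;
    thread_sync();
    atomic_fetch_add(&passed, 1);
    return NULL;
}

/* Starts THREAD_COUNT threads, waits for them, and returns how many
 * got past the barrier (or -1 if a thread could not be created). */
int run_barrier_demo(void)
{
    pthread_t threads[THREAD_COUNT];
    int i;

    atomic_store(&count, 0);
    atomic_store(&passed, 0);

    for (i = 0; i < THREAD_COUNT; i++) {
        /* Sketch only: on failure the threads already started keep spinning. */
        if (pthread_create(&threads[i], NULL, worker, NULL) != 0)
            return -1;
    }
    for (i = 0; i < THREAD_COUNT; i++)
        pthread_join(threads[i], NULL);

    return atomic_load(&passed);
}
```

Like the kernel version, this barrier is single-use: count is never decremented, so it has to be reset before the barrier can be reused.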

The second problem is with interrupts. I don't think you understand what these calls do.

local_irq_save() saves the current interrupt state and disables interrupts. You then disable interrupts again with local_irq_disable(), which is redundant. After some work, you restore the previous state with local_irq_restore() and then enable interrupts with local_irq_enable(). That last call is totally wrong: it enables interrupts regardless of their previous state.
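
The effect of that trailing enable can be shown with a small user-space model. This is only an illustration: a single bool stands in for the CPU's interrupt-enable state, and every model_* function and both state_after_* functions are hypothetical names, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical model: one flag stands in for "interrupts enabled on this CPU". */
static bool irq_enabled = true;

static bool model_irq_save(void)          /* save state, then disable */
{
    bool prev = irq_enabled;
    irq_enabled = false;
    return prev;
}

static void model_irq_restore(bool prev) { irq_enabled = prev; }
static void model_irq_disable(void)      { irq_enabled = false; }
static void model_irq_enable(void)       { irq_enabled = true; }

/* The sequence criticized above: save, disable, work, restore, enable. */
bool state_after_buggy_section(bool initially_enabled)
{
    bool flags;

    irq_enabled = initially_enabled;
    flags = model_irq_save();
    model_irq_disable();          /* redundant: already disabled by save */
    /* ... critical work ... */
    model_irq_restore(flags);
    model_irq_enable();           /* bug: forces interrupts on */
    return irq_enabled;
}

/* Save and restore alone put the state back exactly as it was. */
bool state_after_correct_section(bool initially_enabled)
{
    bool flags;

    irq_enabled = initially_enabled;
    flags = model_irq_save();
    /* ... critical work ... */
    model_irq_restore(flags);
    return irq_enabled;
}
```

If the caller entered with interrupts already disabled, the first function leaves them enabled, while the second leaves them disabled as the caller expects.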

Third problem. If the main thread isn't bound to a CPU, you should not use smp_processor_id() unless you are sure that the kernel will not reschedule the thread right after you get the CPU number. It's better to use get_cpu(), which disables kernel preemption and then returns the CPU id. When done, call put_cpu().

But once you have called get_cpu(), it is a bug to create and run other threads, because those calls can sleep and preemption stays disabled until put_cpu(). That's why you should set the affinity of the main thread instead.

Fourth. local_irq_save() and local_irq_restore() are macros that take a variable, not a pointer to unsigned long. (I got an error and some warnings when passing pointers; I wonder how you compiled your code.) Remove the & (address-of) operators.
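
Putting the third and fourth points together, a short kernel-side sketch might look like the following. It is not the code from the question: pinned_section_sketch is a hypothetical function name, and the work inside the section is assumed to be short and non-sleeping.

```c
#include <linux/irqflags.h>
#include <linux/printk.h>
#include <linux/smp.h>

/* Hypothetical helper: runs a short section on the current CPU. */
static void pinned_section_sketch(void)
{
    unsigned long flags;
    int cpu;

    cpu = get_cpu();              /* disables preemption, returns this CPU's id */
    local_irq_save(flags);        /* macro: pass the variable, not &flags */

    /* ... short work that must not be interrupted; no sleeping here ... */

    local_irq_restore(flags);     /* back to whatever state we entered with */
    pr_info("section ran on cpu %d\n", cpu);
    put_cpu();                    /* re-enables preemption */
}
```

There is no local_irq_disable() or local_irq_enable() here, because the save/restore pair already covers both.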

The final code is available here: http://pastebin.com/Ven6wqWf
